The explainability principle states that medical AI tools should provide clinically meaningful information about the logic behind their decisions. Although medicine is a high-stakes discipline that requires transparency, reliability, and accountability, machine learning techniques often produce complex models that are black boxes. Explainability is considered desirable from technological, medical, ethical, legal, and patient perspectives. It enables end users to interpret the AI model and its outputs, understand the capacities and limitations of the AI tool, and intervene when necessary, for example to decide whether or not to use it. However, explainability raises challenges that must be carefully addressed during AI development and evaluation to ensure that explanations are clinically meaningful and beneficial to end users. Two recommendations for explainability are defined in the FUTURE-AI framework.
| Recommendations | Operations | Examples |
|---|---|---|
| Define the need and requirements for explainability with end users (Explainability 1) | Engage end users to define explainability requirements | Clinicians, technicians, patients |
| | Specify if explainability is necessary | Not necessary for AI enabled image segmentation, but critical for AI enabled diagnosis |
| | Specify the objectives of AI explainability (if it is needed) | Understanding the AI model, aiding diagnostic reasoning, justifying treatment recommendations |
| | Define suitable explainability approaches | Visual explanations, feature importance, counterfactuals |
| | Adjust the design of the AI explanations for all end user subgroups | Heatmaps for clinicians, feature importance for patients |
| Evaluate explainability (Explainability 2) | Assess if explanations are clinically meaningful | Review by expert panels, alignment with current clinical guidelines, explanations not pointing to shortcuts |
| | Assess explainability quantitatively using objective measures | Fidelity, consistency, completeness, sensitivity to noise |
| | Assess explainability qualitatively with end users | User tests or questionnaires measuring confidence and effect on clinical decision making |
| | Evaluate if explanations cause end user overconfidence or overreliance | Measure changes in clinician confidence and performance with and without the AI tool |
| | Evaluate if explanations are sensitive to input data variations | Stress tests under perturbations to evaluate the stability of explanations |
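As an illustration of two of the operations listed above (the "feature importance" explanation approach and the "fidelity" measure), the sketch below trains a classifier on synthetic tabular data standing in for a clinical dataset, derives a feature-importance explanation with scikit-learn's permutation importance, and runs a simple deletion-style fidelity check: if the explanation is faithful, permuting the top-ranked features should degrade performance more than permuting the lowest-ranked ones. The dataset, model, and feature names are illustrative assumptions, not part of the FUTURE-AI framework.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic tabular data as a placeholder for a real clinical dataset.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Feature-importance explanation: permutation importance on held-out data.
result = permutation_importance(model, X_test, y_test, n_repeats=20, random_state=0)
ranking = np.argsort(result.importances_mean)[::-1]
for idx in ranking:
    print(f"feature_{idx}: importance = {result.importances_mean[idx]:.3f}")

# Deletion-style fidelity check: permute selected features and measure the AUC drop.
def auc_with_permuted(features):
    rng = np.random.default_rng(0)
    X_perturbed = X_test.copy()
    for f in features:
        X_perturbed[:, f] = rng.permutation(X_perturbed[:, f])
    return roc_auc_score(y_test, model.predict_proba(X_perturbed)[:, 1])

print("AUC with top-3 features permuted:   ", round(auc_with_permuted(ranking[:3]), 3))
print("AUC with bottom-3 features permuted:", round(auc_with_permuted(ranking[-3:]), 3))
```

A faithful explanation yields a markedly lower AUC when the top-ranked features are permuted than when the bottom-ranked ones are, giving end users objective evidence that the highlighted features actually drive the model's predictions.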
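The last operation in the table (stress tests under perturbations) can be sketched in the same setting: the explanation is recomputed on noise-perturbed copies of the evaluation data and compared against the original feature ranking with a rank correlation. The noise level and the use of Spearman correlation are illustrative choices, not prescribed by the framework.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Same synthetic placeholder setup as above, kept self-contained here.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

def importance_scores(X_eval):
    """Permutation importance of each feature on the given evaluation set."""
    res = permutation_importance(model, X_eval, y_test, n_repeats=10, random_state=0)
    return res.importances_mean

baseline = importance_scores(X_test)

# Stress test: add small Gaussian noise to the inputs and check whether the
# explanation (the feature ranking) remains stable across perturbations.
rng = np.random.default_rng(0)
correlations = []
for _ in range(5):
    noise = rng.normal(scale=0.05 * X_test.std(axis=0), size=X_test.shape)
    perturbed = importance_scores(X_test + noise)
    rho, _ = spearmanr(baseline, perturbed)
    correlations.append(rho)

print("Mean rank correlation of importances under perturbation:",
      round(float(np.mean(correlations)), 3))
```

A rank correlation close to 1 indicates that the explanation is stable under small input variations, whereas low or erratic correlations flag explanations that end users should not rely on.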