Explainability

The explainability principle states that medical AI tools should provide clinically meaningful information about the logic behind their decisions. Although medicine is a high-stakes discipline that demands transparency, reliability, and accountability, machine learning techniques often produce complex models that are black boxes in nature. Explainability is considered desirable from technological, medical, ethical, legal, and patient perspectives. It enables end users to interpret the AI model and its outputs, understand the capacities and limitations of the AI tool, and intervene when necessary, such as deciding whether or not to use it. However, explainability is a complex task that poses challenges which must be carefully addressed during AI development and evaluation to ensure that AI explanations are clinically meaningful and beneficial to end users. The FUTURE-AI framework defines two recommendations for explainability.

| Recommendations | Operations | Examples |
| --- | --- | --- |
| Define the need and requirements for explainability with end users (Explainability 1) | Engage end users to define explainability requirements | Clinicians, technicians, patients |
| | Specify whether explainability is necessary | Not necessary for an AI-enabled image segmentation tool; critical for AI-enabled diagnosis |
| | Specify the objectives of AI explainability (if it is needed) | Understanding the AI model, aiding diagnostic reasoning, justifying treatment recommendations |
| | Define suitable explainability approaches | Visual explanations, feature importance, counterfactuals |
| | Adjust the design of the AI explanations for each end-user subgroup | Heatmaps for clinicians, feature importance for patients |
| Evaluate explainability (Explainability 2) | Assess whether explanations are clinically meaningful | Review by expert panels, alignment with current clinical guidelines, explanations that do not point to shortcuts |
| | Assess explainability quantitatively using objective measures | Fidelity, consistency, completeness, sensitivity to noise |
| | Assess explainability qualitatively with end users | User tests or questionnaires measuring confidence and effects on clinical decision making |
| | Evaluate whether explanations cause end-user overconfidence or overreliance | Measure changes in clinician confidence and performance with and without the AI tool |
| | Evaluate whether explanations are sensitive to input data variations | Stress tests under perturbations to evaluate the stability of explanations |
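To make the "feature importance" approach listed above concrete, here is a minimal, model-agnostic sketch of permutation feature importance: shuffle one feature at a time and record how much the model's accuracy drops. The toy model, data, and importance threshold are hypothetical illustrations, not part of the FUTURE-AI framework.

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Model-agnostic importance: how much does shuffling each
    feature degrade the model's accuracy on (X, y)?"""
    rng = np.random.default_rng(seed)
    baseline = np.mean(model(X) == y)          # accuracy on intact data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])              # break the feature/label link
            drops.append(baseline - np.mean(model(Xp) == y))
        importances[j] = np.mean(drops)        # mean accuracy drop
    return importances

# Hypothetical binary "diagnosis" model that only uses feature 0.
model = lambda X: (X[:, 0] > 0).astype(int)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)

imp = permutation_importance(model, X, y)
```

Because the toy model ignores features 1 and 2, their importance scores stay near zero while feature 0 dominates, which is the kind of sanity check an expert panel could use when judging whether explanations point to clinically plausible features rather than shortcuts.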
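Among the objective measures listed above, fidelity can be quantified as the agreement between the black-box model and an interpretable surrogate fitted to its outputs. The sketch below uses an ordinary least-squares linear surrogate as a stand-in for a LIME-style explainer; the black-box rule and the 0.85 quality bar are illustrative assumptions.

```python
import numpy as np

def fidelity(black_box, surrogate, X):
    """Fraction of samples where the surrogate reproduces the
    black-box prediction (higher = more faithful explanation)."""
    return float(np.mean(black_box(X) == surrogate(X)))

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))

# Hypothetical black-box classifier (its internals are assumed hidden).
black_box = lambda X: (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Fit an interpretable linear surrogate to the black box's outputs.
A = np.c_[X, np.ones(len(X))]                  # features plus intercept
w, *_ = np.linalg.lstsq(A, black_box(X), rcond=None)
surrogate = lambda X: (np.c_[X, np.ones(len(X))] @ w > 0.5).astype(int)

f = fidelity(black_box, surrogate, X)
```

A low fidelity score would signal that the surrogate's coefficients should not be presented to clinicians as an explanation of the black box, regardless of how readable they are.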
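The final row above calls for stress tests of explanation stability under input perturbations. One way to operationalize this, sketched below under assumed choices (a linear attribution rule, Gaussian noise, cosine similarity as the stability score), is to compare the explanation for an input against explanations for slightly noisy copies of it.

```python
import numpy as np

def saliency(model_w, x):
    """Attribution for a linear score w.x: per-feature
    contribution w_j * x_j (an assumed explanation rule)."""
    return model_w * x

def stability(model_w, x, noise=0.05, n_trials=20, seed=0):
    """Mean cosine similarity between the explanation for x and
    explanations for noisy copies of x (1.0 = perfectly stable)."""
    rng = np.random.default_rng(seed)
    base = saliency(model_w, x)
    sims = []
    for _ in range(n_trials):
        xp = x + rng.normal(scale=noise, size=x.shape)  # perturbed input
        e = saliency(model_w, xp)
        sims.append(e @ base / (np.linalg.norm(e) * np.linalg.norm(base)))
    return float(np.mean(sims))

# Hypothetical model weights and patient feature vector.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.8, 0.3, -1.2])

s = stability(w, x)
```

Explanations whose stability score collapses under clinically negligible perturbations should not be trusted, since two near-identical patients would receive contradictory justifications for the same prediction.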