Explainability

The explainability principle states that medical AI tools should provide clinically meaningful information about the logic behind their decisions. Although medicine is a high-stakes discipline that demands transparency, reliability, and accountability, machine learning techniques often produce complex models that are black boxes in nature. Explainability is considered desirable from technological, medical, ethical, legal, and patient perspectives. It enables end users to interpret the AI model and its outputs, understand the capacities and limitations of the AI tool, and intervene when necessary, such as deciding whether or not to use it. However, explainability is a complex task that poses challenges which must be carefully addressed during AI development and evaluation to ensure that AI explanations are clinically meaningful and beneficial to end users. The FUTURE-AI framework defines two recommendations for explainability.

| Recommendations | Operations | Examples |
| --- | --- | --- |
| Define the need and requirements for explainability with end users (Explainability 1) | Engage end users to define explainability requirements | Clinicians, technicians, patients |
| | Specify whether explainability is necessary | Not necessary for an AI-enabled image segmentation tool; critical for AI-enabled diagnosis |
| | Specify the objectives of AI explainability (if it is needed) | Understanding the AI model, aiding diagnostic reasoning, justifying treatment recommendations |
| | Define suitable explainability approaches | Visual explanations, feature importance, counterfactuals |
| | Adjust the design of the AI explanations for each end-user subgroup | Heatmaps for clinicians, feature importance for patients |
| Evaluate explainability (Explainability 2) | Assess whether explanations are clinically meaningful | Review by expert panels, alignment with current clinical guidelines, explanations that do not point to shortcuts |
| | Assess explainability quantitatively using objective measures | Fidelity, consistency, completeness, sensitivity to noise |
| | Assess explainability qualitatively with end users | User tests or questionnaires measuring confidence and effects on clinical decision making |
| | Evaluate whether explanations cause end-user overconfidence or overreliance | Measure changes in clinician confidence and performance with and without the AI tool |
| | Evaluate whether explanations are sensitive to input data variations | Stress tests under perturbations to evaluate the stability of explanations |
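To make the "feature importance" approach listed above concrete, here is a minimal, model-agnostic sketch of permutation feature importance: shuffle one feature at a time and record how much the model's accuracy drops. The toy model, data, and importance threshold are hypothetical illustrations, not part of the FUTURE-AI framework.

```python
import numpy as np

def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Model-agnostic importance: how much does shuffling each
    feature degrade the model's accuracy on (X, y)?"""
    rng = np.random.default_rng(seed)
    baseline = np.mean(model(X) == y)          # accuracy on intact data
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])              # break the feature/label link
            drops.append(baseline - np.mean(model(Xp) == y))
        importances[j] = np.mean(drops)        # mean accuracy drop
    return importances

# Hypothetical binary "diagnosis" model that only uses feature 0.
model = lambda X: (X[:, 0] > 0).astype(int)

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)

imp = permutation_importance(model, X, y)
```

Because the toy model ignores features 1 and 2, their importance scores stay near zero while feature 0 dominates, which is the kind of sanity check an expert panel could use when judging whether explanations point to clinically plausible features rather than shortcuts.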
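Among the objective measures listed above, fidelity can be quantified as the agreement between the black-box model and an interpretable surrogate fitted to its outputs. The sketch below uses an ordinary least-squares linear surrogate as a stand-in for a LIME-style explainer; the black-box rule and the 0.85 quality bar are illustrative assumptions.

```python
import numpy as np

def fidelity(black_box, surrogate, X):
    """Fraction of samples where the surrogate reproduces the
    black-box prediction (higher = more faithful explanation)."""
    return float(np.mean(black_box(X) == surrogate(X)))

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))

# Hypothetical black-box classifier (its internals are assumed hidden).
black_box = lambda X: (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Fit an interpretable linear surrogate to the black box's outputs.
A = np.c_[X, np.ones(len(X))]                  # features plus intercept
w, *_ = np.linalg.lstsq(A, black_box(X), rcond=None)
surrogate = lambda X: (np.c_[X, np.ones(len(X))] @ w > 0.5).astype(int)

f = fidelity(black_box, surrogate, X)
```

A low fidelity score would signal that the surrogate's coefficients should not be presented to clinicians as an explanation of the black box, regardless of how readable they are.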
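The final row above calls for stress tests of explanation stability under input perturbations. One way to operationalize this, sketched below under assumed choices (a linear attribution rule, Gaussian noise, cosine similarity as the stability score), is to compare the explanation for an input against explanations for slightly noisy copies of it.

```python
import numpy as np

def saliency(model_w, x):
    """Attribution for a linear score w.x: per-feature
    contribution w_j * x_j (an assumed explanation rule)."""
    return model_w * x

def stability(model_w, x, noise=0.05, n_trials=20, seed=0):
    """Mean cosine similarity between the explanation for x and
    explanations for noisy copies of x (1.0 = perfectly stable)."""
    rng = np.random.default_rng(seed)
    base = saliency(model_w, x)
    sims = []
    for _ in range(n_trials):
        xp = x + rng.normal(scale=noise, size=x.shape)  # perturbed input
        e = saliency(model_w, xp)
        sims.append(e @ base / (np.linalg.norm(e) * np.linalg.norm(base)))
    return float(np.mean(sims))

# Hypothetical model weights and patient feature vector.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.8, 0.3, -1.2])

s = stability(w, x)
```

Explanations whose stability score collapses under clinically negligible perturbations should not be trusted, since two near-identical patients would receive contradictory justifications for the same prediction.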