The sixth and final principle of the FUTURE-AI guidelines is Explainability, which states that medical AI algorithms should provide clinicians with meaningful and actionable explanations for their predictions. Explainability offers insight into the algorithmic mechanisms behind AI decision making, allowing these decisions to be clinically validated and scrutinised. While local explainability highlights the reasons behind a particular prediction of the AI model for an individual image, global explainability identifies the common characteristics that the AI model considers important for a given image analysis task. Attribution maps (or heat-maps) are commonly used visual explainability methods in medical AI, which highlight the regions of the input image that the AI model considers relevant. To assess and achieve explainability in medical AI, we recommend the following quality checks:
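
An attribution map of this kind can be sketched with a simple occlusion analysis: mask one region of the input image at a time and record how much the model's output score drops. The snippet below is a minimal illustration on a toy NumPy "model"; the model, patch size and zero baseline are illustrative assumptions, not part of the guidelines:

```python
import numpy as np

def occlusion_map(model, image, patch=4, baseline=0.0):
    """Occlusion-based attribution: slide a patch of `baseline` values
    over the image and record the drop in the model's score."""
    h, w = image.shape
    base_score = model(image)
    attributions = np.zeros_like(image, dtype=float)
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i+patch, j:j+patch] = baseline
            # A large score drop means the occluded region mattered.
            attributions[i:i+patch, j:j+patch] = base_score - model(occluded)
    return attributions

# Toy "model": the score is the mean intensity of the top-left quadrant,
# so the attribution should concentrate there.
def toy_model(img):
    return img[:8, :8].mean()

img = np.ones((16, 16))
attr = occlusion_map(toy_model, img, patch=8)
```

In this toy setting the map assigns all relevance to the top-left quadrant and none elsewhere, mirroring how a heat-map would highlight the image regions driving a prediction.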

  1. Explainability requirements: During requirement analysis, different options for explainability should be presented to clinicians and end-users (e.g. using dummy examples) so that the most suitable explainability methods can be selected for the AI tool in question.
  2. Semantic annotations: When possible, semantic annotations that are relevant for the clinical task should be extracted from the images (e.g. texture, margins, shape, vascularity) and included as parameters of the AI algorithm to explicitly introduce a level of clinical interpretability.
  3. Multiple explanation methods: Multiple explainability methods that provide complementary explanations should be explored (e.g. local + global explainability) for a more holistic understanding of the AI-driven decision-making process.
  4. Explainable biomarkers: To increase clinical value, the AI developers and clinical teams should evaluate whether the explainability methods make it possible to identify variables, structures or patterns that can serve as known or novel biomarkers.
  5. Qualitative evaluation of explainability: The quality and utility of the AI explanations derived by the implemented explainability method(s) should be qualitatively assessed with expert clinicians and other end-users, for example by using the System Causability Scale.
  6. Quantitative evaluation of explainability: The explainability methods should also be evaluated quantitatively, using dedicated metrics such as the Attribution Sum and Density, or the Area Over the Perturbation Curve.
  7. Consistent explainability: The AI model and its explainability methods should be tested against adversarial examples to assess whether the explanations remain consistent when the input images are subjected to small perturbations.
  8. Explainability in practice: The impact of explainability on clinical practice (e.g. improved or biased decision making) should be evaluated through in silico collaborative human-AI studies that compare the clinicians' performance with the AI tool with and without explanations.
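
For the qualitative assessment in check 5, the System Causability Scale collects ten Likert-type ratings per evaluator. A per-evaluator score can then be aggregated; the normalization below (summed ratings divided by the maximum possible) is a common convention for such scales but should be checked against the original scale definition:

```python
def scs_score(ratings, scale_max=5):
    """System Causability Scale aggregation (sketch): combine 10
    Likert-scale ratings (1..scale_max) into a score in (0, 1].
    The normalization (sum / maximum possible) is an assumption."""
    if len(ratings) != 10:
        raise ValueError("The SCS uses 10 questionnaire items")
    return sum(ratings) / (scale_max * len(ratings))

# Example: an evaluator who mostly agrees with the ten statements.
score = scs_score([5, 4, 4, 5, 3, 4, 5, 4, 4, 5])  # 43/50 = 0.86
```

Scores from several clinicians can then be averaged per AI tool, and low-scoring items can point to where the explanations fail their users.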
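
For the Area Over the Perturbation Curve in check 6, the idea is to remove image regions in decreasing order of attributed relevance and average the resulting drop in model score: the larger the average drop, the more faithful the explanation. A minimal sketch, where the toy model, patch-wise "most relevant first" removal and zero baseline are all illustrative assumptions:

```python
import numpy as np

def aopc(model, image, attributions, steps=10, patch=4, baseline=0.0):
    """Area Over the Perturbation Curve (sketch): perturb patches in
    decreasing order of attribution and average the score drop."""
    h, w = image.shape
    # Rank patches by summed attribution (most relevant first).
    patches = [(i, j) for i in range(0, h, patch) for j in range(0, w, patch)]
    patches.sort(key=lambda p: attributions[p[0]:p[0]+patch,
                                            p[1]:p[1]+patch].sum(),
                 reverse=True)
    base_score = model(image)
    perturbed = image.copy()
    drops = []
    for (i, j) in patches[:steps]:
        perturbed[i:i+patch, j:j+patch] = baseline
        drops.append(base_score - model(perturbed))
    # A faithful explanation yields a large average drop.
    return float(np.mean(drops))

# Toy model: the score only depends on the top-left quadrant.
def toy_model(img):
    return img[:8, :8].mean()

img = np.ones((16, 16))
good = np.zeros((16, 16)); good[:8, :8] = 1.0  # attributes the true region
bad = np.zeros((16, 16)); bad[:8, 8:] = 1.0; bad[8:, :] = 1.0  # everything else
```

Here the attribution map that points at the truly relevant quadrant scores higher than the one that points everywhere else, which is the behaviour this metric is meant to capture.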
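
The consistency test in check 7 can be approximated, short of a full adversarial attack, by adding small random perturbations to the input and correlating the resulting attribution maps with the original one. A sketch under that simplifying assumption; the linear toy model, Gradient×Input explanation and Gaussian noise are illustrative:

```python
import numpy as np

def explanation_consistency(explain, image, noise_scale=0.01, trials=5, seed=0):
    """Stability check (sketch): correlate the attribution map of the
    original image with maps computed on slightly perturbed copies.
    Values near 1.0 indicate explanations robust to small input noise."""
    rng = np.random.default_rng(seed)
    reference = explain(image).ravel()
    correlations = []
    for _ in range(trials):
        noisy = image + rng.normal(0.0, noise_scale, image.shape)
        perturbed_map = explain(noisy).ravel()
        correlations.append(np.corrcoef(reference, perturbed_map)[0, 1])
    return float(np.mean(correlations))

# Gradient*Input explanation of a fixed linear model f(x) = sum(weights * x).
rng = np.random.default_rng(42)
weights = rng.normal(size=(8, 8))
image = rng.normal(size=(8, 8))
explain = lambda im: weights * im

score = explanation_consistency(explain, image, noise_scale=0.01)
```

A score well below 1.0 at clinically plausible noise levels would flag explanations that cannot be trusted patient to patient; stronger adversarial optimisation can then probe the worst case.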