Fairness

The first principle of the FUTURE-AI guidelines is that of Fairness, which states that medical AI algorithms should maintain the same performance when applied to similarly situated individuals (individual fairness) and across subgroups of individuals, including under-represented groups (group fairness). Healthcare, an expensive but critical service for society, should be provided equally to all patients regardless of their gender, ethnicity, income and geography. AI algorithms should not exacerbate existing health disparities, but should instead facilitate and enhance access to high-quality radiology services for all individuals and groups. Medical AI algorithms should be designed to address common as well as hidden biases in the training datasets. To assess and achieve fairness when developing medical AI algorithms, we propose the following specific recommendations:

  1. Inter-disciplinarity: The development of AI algorithms should take into account the diverse perspectives of multi-disciplinary teams comprising not only AI developers, radiologists and specialists, but also patients and social scientists (e.g. ethicists).
  2. Understanding bias: In collaboration with domain experts, potentially hidden and application-specific sources of bias (e.g. under-representation of high breast densities in breast imaging datasets) should be carefully identified and analysed, looking beyond standard categories such as sex or ethnicity.
  3. Metadata labelling: At data collection, metadata such as sex, age, ethnicity and income should be included and documented for both the training and testing datasets, to allow for the evaluation and optimisation of algorithmic fairness.
  4. Estimating data (im)balance: The distribution and balance of the training and testing datasets across patient groups should be carefully inspected and reported, to identify potential biases and apply appropriate corrective measures (the first code sketch after this list illustrates such a check).
  5. Multi-centre datasets: AI models should be trained and tested on multi-centre datasets to account for differences in populations, resources and geographies across radiology centres.
  6. Fairness evaluation: Algorithmic fairness should be thoroughly and continuously evaluated as an integral part of the AI evaluation process, using dedicated datasets with adequate diversity as well as dedicated metrics such as Statistical Parity, Equalised Odds and Predictive Equality (the second sketch after this list shows how these metrics can be computed).
  7. Fairness optimisation: When bias is detected, corrective measures such as re-sampling, generative learning or equalised post-processing should be investigated to neutralise discriminatory effects and optimise the fairness of the AI algorithm (a re-sampling sketch is given after this list).
  8. Information and training on fairness: Adequate information and training material should be provided to raise awareness and inform end-users on the fairness, biases and limitations of the AI algorithm.
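
To make recommendations 3 and 4 concrete, the following is a minimal sketch of an imbalance check over a tabular metadata file, assuming the dataset has been documented with attributes such as sex and ethnicity as in recommendation 3. The column names, data and 10% threshold are illustrative assumptions, not values prescribed by the guidelines.

```python
import pandas as pd

def report_group_balance(df: pd.DataFrame, attributes: list[str]) -> None:
    """Print the share of each subgroup for each metadata attribute."""
    for attr in attributes:
        shares = df[attr].value_counts(normalize=True).sort_index()
        print(f"--- {attr} ---")
        for group, share in shares.items():
            # The 10% cut-off is an arbitrary illustration, not a guideline value.
            flag = "  <- possibly under-represented" if share < 0.10 else ""
            print(f"{group}: {share:.1%}{flag}")

# Illustrative usage with made-up metadata:
studies = pd.DataFrame({
    "sex": ["F", "F", "M", "F", "M", "F", "F", "F", "F", "F"],
    "ethnicity": ["A", "A", "A", "B", "A", "A", "A", "A", "A", "A"],
})
report_group_balance(studies, ["sex", "ethnicity"])
```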
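The second sketch computes the three group-fairness metrics named in recommendation 6 for binary labels and predictions over two subgroups. The two-group setup, variable names and example data are illustrative assumptions; real evaluations would cover all documented subgroups.

```python
import numpy as np

def rates(y_true: np.ndarray, y_pred: np.ndarray):
    """Selection rate, true-positive rate and false-positive rate."""
    sel = y_pred.mean()
    tpr = y_pred[y_true == 1].mean()
    fpr = y_pred[y_true == 0].mean()
    return sel, tpr, fpr

def fairness_gaps(y_true, y_pred, group):
    """Gaps between two subgroups; 0 means fair with respect to that metric."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    sel_a, tpr_a, fpr_a = rates(y_true[group == 0], y_pred[group == 0])
    sel_b, tpr_b, fpr_b = rates(y_true[group == 1], y_pred[group == 1])
    return {
        # Statistical Parity: equal rates of positive predictions.
        "statistical_parity_diff": sel_a - sel_b,
        # Equalised Odds: equal TPR *and* FPR across groups.
        "equalised_odds_gap": max(abs(tpr_a - tpr_b), abs(fpr_a - fpr_b)),
        # Predictive Equality: equal FPR across groups.
        "predictive_equality_diff": fpr_a - fpr_b,
    }

# Illustrative usage with made-up predictions:
print(fairness_gaps(y_true=[1, 0, 1, 0, 1, 0],
                    y_pred=[1, 0, 1, 1, 0, 0],
                    group=[0, 0, 0, 1, 1, 1]))
```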
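Finally, the third sketch shows one of the corrective measures named in recommendation 7, re-sampling, by over-sampling the smaller of two subgroups until the groups are balanced; generative learning and equalised post-processing are alternatives not shown here. The function name and the two-group encoding are assumptions made for illustration.

```python
import numpy as np

def balanced_indices(group, rng=None):
    """Indices that equalise two subgroup sizes by over-sampling the smaller one."""
    if rng is None:
        rng = np.random.default_rng(0)
    group = np.asarray(group)
    idx_a = np.flatnonzero(group == 0)
    idx_b = np.flatnonzero(group == 1)
    minority, majority = sorted((idx_a, idx_b), key=len)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    return np.concatenate([majority, minority, extra])

# Illustrative usage: re-index images/labels before training.
group = np.array([0, 0, 0, 0, 1, 1])   # 4 vs 2 -> over-sample group 1
idx = balanced_indices(group)
# X_train, y_train = X[idx], y[idx]    # hypothetical data arrays
print(np.bincount(group[idx]))         # -> [4 4]
```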