The first principle of the FUTURE-AI guidelines is Fairness, which states that medical AI algorithms should maintain the same performance when applied to similarly situated individuals (individual fairness) and across subgroups of individuals, including under-represented groups (group fairness). Healthcare, an expensive but critical service for society, should be provided equally to all patients regardless of their gender, ethnicity, income or geography. AI algorithms should not exacerbate existing health disparities, but should instead facilitate and enhance access to high-quality radiology services for all individuals and groups. Medical AI algorithms should be built to address common as well as hidden biases in training datasets. To assess and achieve fairness when developing medical AI algorithms, we propose the following specific recommendations:

  1. Fairness definition: Definitions and requirements for fairness are application-specific and should be specified for each AI application. Requirements for AI fairness should be compiled during the early stages of the AI production lifecycle, including the identification of possible sources of bias (e.g. selection bias, confounders), as well as the specification of potential countermeasures.
  2. Metadata labelling: Training and testing data should be collected and reported with their subject characteristics (e.g. age, sex, ethnicity, confounders) to enable the assessment of bias and fairness.
  3. Fairness evaluation: Algorithmic fairness should be thoroughly and continuously evaluated as an integral part of the AI evaluation process, by using dedicated datasets with adequate diversity, as well as dedicated metrics such as Statistical Parity, Equalised Odds and Predictive Equality.
  4. Fairness optimisation: When bias is detected, corrective measures should be investigated such as re-sampling, generative learning or equalised post-processing, to neutralise discriminatory effects and optimise the fairness of the AI algorithm.
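The subgroup metrics named in recommendation 3 can be sketched in a few lines. This is a minimal illustration, not part of the guidelines: the function names, the single binary sensitive attribute and the toy data below are all assumptions made for the example.

```python
# Sketch of a subgroup fairness check for binary predictions, stratified by
# one sensitive attribute. Metric definitions follow the common formulations:
# statistical parity compares positive-prediction rates between groups;
# equalised odds compares true-positive and false-positive rates.

def statistical_parity_difference(y_pred, group):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = {}
    for g in set(group):
        preds = [p for p, gr in zip(y_pred, group) if gr == g]
        rates[g] = sum(preds) / len(preds)
    vals = sorted(rates.values())
    return vals[-1] - vals[0]

def equalised_odds_gap(y_true, y_pred, group):
    """Largest between-group gap in true-positive or false-positive rate."""
    def rate(cond_label, g):
        # Positive-prediction rate among cases with label cond_label in group g.
        pairs = [(t, p) for t, p, gr in zip(y_true, y_pred, group)
                 if gr == g and t == cond_label]
        return sum(p for _, p in pairs) / len(pairs) if pairs else 0.0
    groups = sorted(set(group))
    tpr_gap = abs(rate(1, groups[0]) - rate(1, groups[1]))  # equal opportunity
    fpr_gap = abs(rate(0, groups[0]) - rate(0, groups[1]))
    return max(tpr_gap, fpr_gap)

# Purely illustrative toy data: labels, model predictions, sensitive attribute.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
group  = ["A", "A", "A", "A", "B", "B", "B", "B"]

print(statistical_parity_difference(y_pred, group))  # prints 0.5
print(equalised_odds_gap(y_true, y_pred, group))     # prints 0.5
```

A gap of zero on these metrics indicates parity between the groups; in practice, the evaluation should be repeated for each sensitive attribute identified during the fairness-definition step.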
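Of the corrective measures named in recommendation 4, re-sampling is the simplest to illustrate. The sketch below oversamples under-represented subgroups until group sizes match; the function name, the `group_key` field and the record layout are hypothetical choices for this example, and real pipelines would typically also balance label distributions within each group.

```python
import random

def resample_to_balance(records, group_key):
    """Oversample under-represented subgroups so that every subgroup
    reaches the size of the largest one (a simple bias-correction sketch)."""
    by_group = {}
    for r in records:
        by_group.setdefault(r[group_key], []).append(r)
    target = max(len(members) for members in by_group.values())
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    balanced = []
    for members in by_group.values():
        balanced.extend(members)  # keep all original records
        # Draw additional samples (with replacement) from the minority group.
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced

# Illustrative: 6 subjects from group "A", only 2 from group "B".
records = [{"ethnicity": "A"}] * 6 + [{"ethnicity": "B"}] * 2
balanced = resample_to_balance(records, "ethnicity")
print(len(balanced))  # prints 12: both groups now have 6 records
```

Whichever mitigation is applied, the fairness metrics should be re-computed on the held-out evaluation data afterwards to confirm that the discriminatory effect has actually been reduced.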