Robustness

The robustness principle refers to the ability of a medical AI tool to maintain its performance and accuracy under expected or unexpected variations in the input data. Existing research has shown that even small, imperceptible variations in the input data can lead AI models to incorrect decisions. Biomedical and health data can be subject to major variations in the real world (both expected and unexpected), which can affect the performance of AI tools. Therefore, it is important that healthcare AI tools are designed and developed to be robust against real world variations, and evaluated and optimised accordingly. To this end, three recommendations for robustness are defined in the FUTURE-AI framework.

| Recommendations | Operations | Examples |
| --- | --- | --- |
| Define sources of data variations (robustness 1) | Engage relevant stakeholders to assess data heterogeneity | Clinicians, technicians, data managers, IT managers, radiologists, device vendors |
| | Identify equipment related data variations | Differences in medical devices, manufacturers, calibrations, machine ranges (from low cost to high end) |
| | Identify protocol related data variations | Differences in image sequences, data acquisition protocols, data annotation methods, sampling rates, preprocessing standards |
| | Identify operator related data variations | Differences in experience and proficiency, operator fatigue, subjective judgment, technique variability |
| | Identify sources of artefacts and noise | Image noise, motion artefacts, signal dropout, sensor malfunction |
| | Identify context specific data variations | Lower quality data acquisition in emergency units or during periods of high patient volume |
| Train with representative real world data (robustness 2) | Collect training data that reflect the demographic variations | According to age, sex, ethnicity, socioeconomics |
| | Collect training data that reflect the clinical variations | Disease subgroups, treatment protocols, clinical outcomes, rare cases |
| | Collect training data that reflect variations in real world practice | Data acquisition protocols, data annotations, medical equipment, operational variations (eg, patient motion during scanning) |
| | Artificially enhance the training data to mimic real world conditions | Data augmentation, data synthesis (eg, low quality data, noise addition), data harmonisation, data homogenisation |
| Evaluate and optimise robustness against real world variations (robustness 3) | Evaluate robustness under real world variations | Using test-retest datasets, multivendor datasets |
| | Evaluate robustness under simulated variations | Using simulated repeatability tests, synthetic noise and artefacts (eg, image blurring) |
| | Evaluate robustness against variations in end users | Different technicians or annotators |
| | Evaluate mitigation measures for robustness enhancement | Regularisation, data augmentation, noise addition, normalisation, resampling, domain adaptation |
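
As an illustration of the operation "Evaluate robustness under simulated variations", the following minimal Python sketch measures how the accuracy of a model changes as synthetic Gaussian noise of increasing strength is added to the test inputs. It is not part of the FUTURE-AI framework: the function names (add_gaussian_noise, accuracy, robustness_curve), the toy threshold classifier, and the noise levels are all hypothetical choices made for this example, and a real evaluation would use the actual trained model and clinically plausible perturbations.

```python
import random
from typing import Callable, Dict, List, Sequence

Sample = List[float]


def add_gaussian_noise(x: Sample, sigma: float, rng: random.Random) -> Sample:
    """Return a copy of x with zero-mean Gaussian noise of standard deviation sigma."""
    return [v + rng.gauss(0.0, sigma) for v in x]


def accuracy(predict: Callable[[Sample], int],
             inputs: Sequence[Sample],
             labels: Sequence[int]) -> float:
    """Fraction of samples for which the prediction matches the label."""
    correct = sum(1 for x, y in zip(inputs, labels) if predict(x) == y)
    return correct / len(labels)


def robustness_curve(predict: Callable[[Sample], int],
                     inputs: Sequence[Sample],
                     labels: Sequence[int],
                     sigmas: Sequence[float],
                     seed: int = 0) -> Dict[float, float]:
    """Accuracy of predict at each simulated noise level in sigmas."""
    rng = random.Random(seed)  # fixed seed so the simulated test is repeatable
    curve = {}
    for sigma in sigmas:
        noisy = [add_gaussian_noise(x, sigma, rng) for x in inputs]
        curve[sigma] = accuracy(predict, noisy, labels)
    return curve


def toy_predict(x: Sample) -> int:
    """Hypothetical stand-in for a trained model: thresholds the mean signal value."""
    return 1 if sum(x) / len(x) > 0.5 else 0


if __name__ == "__main__":
    # Synthetic two-class data (an assumption for illustration only).
    inputs = [[0.2] * 8 for _ in range(20)] + [[0.8] * 8 for _ in range(20)]
    labels = [0] * 20 + [1] * 20
    for sigma, acc in robustness_curve(toy_predict, inputs, labels,
                                       sigmas=[0.0, 0.5, 2.0]).items():
        print(f"sigma={sigma}: accuracy={acc:.2f}")
```

A steep drop in accuracy at small noise levels would suggest that mitigation measures such as those listed under robustness 3 (eg, data augmentation or noise addition during training) are worth evaluating. The same loop can be reused with other simulated perturbations by swapping out add_gaussian_noise.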