The robustness principle refers to the ability of a medical AI tool to maintain its performance and accuracy under expected or unexpected variations in the input data. Existing research has shown that even small, imperceptible variations in the input data can cause AI models to make incorrect decisions. In the real world, biomedical and health data are subject to major variations, both expected and unexpected, which can degrade the performance of AI tools. Therefore, healthcare AI tools should be designed and developed to be robust against real world variations, and evaluated and optimised accordingly. To this end, the FUTURE-AI framework defines three recommendations for robustness.
| Recommendations | Operations | Examples |
|---|---|---|
| Define sources of data variations (robustness 1) | Engage relevant stakeholders to assess data heterogeneity | Clinicians, technicians, data managers, IT managers, radiologists, device vendors |
| | Identify equipment related data variations | Differences in medical devices, manufacturers, calibrations, machine ranges (from low cost to high end) |
| | Identify protocol related data variations | Differences in image sequences, data acquisition protocols, data annotation methods, sampling rates, preprocessing standards |
| | Identify operator related data variations | Differences in experience and proficiency, operator fatigue, subjective judgment, technique variability |
| | Identify sources of artefacts and noise | Image noise, motion artefacts, signal dropout, sensor malfunction |
| | Identify context specific data variations | Lower quality data acquisition in emergency units or during periods of high patient volume |
| Train with representative real world data (robustness 2) | Collect training data that reflect demographic variations | Age, sex, ethnicity, socioeconomic status |
| | Collect training data that reflect clinical variations | Disease subgroups, treatment protocols, clinical outcomes, rare cases |
| | Collect training data that reflect variations in real world practice | Data acquisition protocols, data annotations, medical equipment, operational variations (eg, patient motion during scanning) |
| | Artificially enhance the training data to mimic real world conditions | Data augmentation, data synthesis (eg, low quality data, noise addition), data harmonisation, data homogenisation |
| Evaluate and optimise robustness against real world variations (robustness 3) | Evaluate robustness under real world variations | Using test-retest datasets, multivendor datasets |
| | Evaluate robustness under simulated variations | Using simulated repeatability tests, synthetic noise and artefacts (eg, image blurring) |
| | Evaluate robustness against variations in end users | Different technicians or annotators |
| | Evaluate mitigation measures for robustness enhancement | Regularisation, data augmentation, noise addition, normalisation, resampling, domain adaptation |
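For robustness 2, a simple first check on demographic representativeness is to compare the share of each subgroup in the training data with its share in the intended target population. The sketch below is illustrative only and is not part of the FUTURE-AI framework: the function name `representation_gaps`, the 0.05 absolute tolerance and the age bands used are hypothetical, and the reference proportions would have to come from a trustworthy source (eg, registry or census data for the deployment setting).

```python
from collections import Counter


def representation_gaps(train_groups, reference_props, tolerance=0.05):
    """Flag subgroups whose share in the training data departs from a reference.

    train_groups: one subgroup label per training record (eg, an age band).
    reference_props: expected share of each subgroup in the target population
        (hypothetical figures supplied by the user).
    tolerance: absolute difference in share above which a subgroup is flagged.
    Returns {subgroup: (observed_share, reference_share)} for flagged subgroups.
    Subgroups present in the training data but absent from the reference are ignored.
    """
    counts = Counter(train_groups)
    total = len(train_groups)
    gaps = {}
    for group, ref_share in reference_props.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - ref_share) > tolerance:
            gaps[group] = (round(observed, 3), ref_share)
    return gaps


if __name__ == "__main__":
    # Hypothetical training set of 100 records labelled by age band.
    train = ["<40"] * 36 + ["40-65"] * 54 + [">65"] * 10
    reference = {"<40": 0.35, "40-65": 0.40, ">65": 0.25}
    print(representation_gaps(train, reference))
```

The same comparison can be repeated for each attribute listed in the table (sex, ethnicity, socioeconomic status); a flagged subgroup suggests where additional data collection is needed, not a definitive verdict on the dataset.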
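One way to artificially enhance the training data to mimic real world conditions is on-the-fly augmentation. The sketch below applies three degradations named in the table (noise addition, blurring, signal dropout) to a one-dimensional biomedical signal such as an ECG trace. The function names, the parameter values (eg, `sigma=0.05`) and the restriction to 1D signals are illustrative assumptions; realistic degradation models should be derived from the sources of variation identified under robustness 1.

```python
import random


def add_gaussian_noise(signal, sigma, rng):
    """Add zero-mean Gaussian noise, mimicking sensor or acquisition noise."""
    return [x + rng.gauss(0.0, sigma) for x in signal]


def moving_average_blur(signal, width):
    """Smooth with a centred moving average, mimicking a loss of resolution."""
    half = width // 2
    out = []
    for i in range(len(signal)):
        # The window is truncated at the edges of the signal.
        window = signal[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def random_dropout(signal, max_len, rng):
    """Zero out one contiguous segment, mimicking signal dropout."""
    out = list(signal)
    length = rng.randint(1, min(max_len, len(out)))
    start = rng.randint(0, len(out) - length)
    out[start:start + length] = [0.0] * length
    return out


def augment(signal, rng, sigma=0.05, blur_width=3, max_dropout=10):
    """Return a copy of the signal with one randomly chosen degradation (or none)."""
    choice = rng.choice(["noise", "blur", "dropout", "none"])
    if choice == "noise":
        return add_gaussian_noise(signal, sigma, rng)
    if choice == "blur":
        return moving_average_blur(signal, blur_width)
    if choice == "dropout":
        return random_dropout(signal, max_dropout, rng)
    return list(signal)
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible for a given seed, which makes it easier to document and audit how the training data were modified.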
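For robustness 3, robustness under simulated variations can be quantified by re-scoring the test set under increasing synthetic perturbation and reporting the drop relative to clean performance. The sketch below assumes the model is exposed as a plain Python callable and uses additive Gaussian noise as the only perturbation; `toy_model`, `robustness_curve` and the noise levels are hypothetical placeholders. In practice, the perturbations (eg, image blurring, motion artefacts) and the metric should match the task.

```python
import random
from statistics import mean


def toy_model(signal):
    """Hypothetical stand-in classifier: predicts 1 if the signal mean exceeds 0.5."""
    return int(mean(signal) > 0.5)


def accuracy(model, inputs, labels):
    """Fraction of inputs for which the model's prediction matches the label."""
    correct = sum(1 for x, y in zip(inputs, labels) if model(x) == y)
    return correct / len(labels)


def robustness_curve(model, inputs, labels, noise_levels, seed=0):
    """Accuracy on clean inputs and under increasing synthetic Gaussian noise.

    Returns (noise_level, accuracy, drop_versus_clean) tuples, starting with
    the clean test set at noise level 0.0.
    """
    rng = random.Random(seed)
    clean_acc = accuracy(model, inputs, labels)
    results = [(0.0, clean_acc, 0.0)]
    for sigma in noise_levels:
        # Perturb a copy of every test input; the originals are left untouched.
        noisy = [[v + rng.gauss(0.0, sigma) for v in x] for x in inputs]
        acc = accuracy(model, noisy, labels)
        results.append((sigma, acc, clean_acc - acc))
    return results


if __name__ == "__main__":
    inputs = [[0.3] * 20 for _ in range(20)] + [[0.7] * 20 for _ in range(20)]
    labels = [0] * 20 + [1] * 20
    for sigma, acc, drop in robustness_curve(toy_model, inputs, labels, [0.01, 0.5, 10.0]):
        print(f"sigma={sigma}: accuracy={acc:.2f}, drop={drop:.2f}")
```

Reporting the whole curve, rather than a single perturbed score, shows how gracefully performance degrades as conditions worsen.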
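When a test-retest dataset is available, repeatability of the tool's outputs can be summarised with a within-subject coefficient of variation. The sketch below assumes exactly two acquisitions per subject and a continuous model output (eg, a predicted lesion volume); the function name `within_subject_cv` is hypothetical, and this is only one of several accepted repeatability measures (intraclass correlation or Bland-Altman limits of agreement are common alternatives).

```python
from math import sqrt
from statistics import mean


def within_subject_cv(test, retest):
    """Within-subject coefficient of variation for paired test-retest outputs.

    test[i] and retest[i] are the model's outputs for the same subject on two
    repeated acquisitions. The result is the root-mean-square within-subject
    standard deviation divided by the grand mean; lower values indicate more
    repeatable outputs. The grand mean must be non-zero.
    """
    if len(test) != len(retest) or not test:
        raise ValueError("expected non-empty, equal-length paired measurements")
    # With two repeats per subject, the within-subject variance is (a - b)^2 / 2.
    within_var = mean((a - b) ** 2 / 2 for a, b in zip(test, retest))
    grand_mean = mean(list(test) + list(retest))
    return sqrt(within_var) / grand_mean
```

For example, outputs of 10 and 10 on the first scans and 12 and 8 on the repeat scans give a within-subject variance of 2 and a grand mean of 10, so the coefficient of variation is about 0.14.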