Robustness

The Robustness principle refers to the ability of a medical AI tool to maintain its performance and accuracy under expected or unexpected variations in the input data. Existing research has shown that even small, imperceptible variations in the input data may lead AI models into incorrect decisions. Biomedical and health data can be subject to significant variations in the real world (both expected and unexpected), which can affect the performance of AI tools. Hence, it is important that medical AI tools are designed and developed to be robust against real-world variations, as well as evaluated and optimised accordingly.

To this end, three recommendations for Robustness are defined in the FUTURE-AI framework. At the design phase, the development team should first define robustness requirements for the medical AI application in question, by making an inventory of the potential sources of variation e.g. data-, equipment-, clinician-, patient- and centre-related variations (Robustness 1). Accordingly, the training datasets should be carefully selected, analysed and enriched to reflect these real-world variations as much as possible (Robustness 2). Subsequently, the robustness of the AI tool, as well as measures to enhance robustness, should be iteratively evaluated under conditions that reflect the variations of real-world clinical practice (Robustness 3).

Recommendation	Description
Robustness 1 Define sources of data variation	At the design phase, an inventory should be made of the application-specific sources of variation that may impact the AI tool’s robustness in the real world. These may include differences in equipment, technical fault of a machine, data heterogeneities during data acquisition or annotation, and/or adversarial attacks.
Robustness 2 Train with representative data	Clinicians, citizens and other stakeholders are more likely to trust the AI tool if it is trained on data that adequately represents the variations encountered in real-world clinical practice. Hence, the training datasets should be carefully selected, analysed and enriched according to the sources of variation identified at the design phase (see Robustness 1).
Robustness 3 Evaluate & optimise robustness	Evaluation studies should be implemented to evaluate the AI tool’s robustness (including stress tests and repeatability tests), by considering all potential sources of variation (see Robustness 1), such as data-, equipment-, clinician-, patient- and centre-related variations. Depending on the results, mitigation measures should be implemented to optimise the robustness of the AI model, such as regularisation, data augmentation, data harmonisation, or domain adaptation.