The Universality principle states that a medical AI tool should be generalisable outside the controlled environment where it was built. Specifically, the AI tool should be able to generalise to new patients and new users (e.g. new clinicians), and when applicable, to new clinical sites. Depending on the intended radius of application, medical AI tools should be as interoperable and as transferable as possible, so they can benefit citizens and clinicians at scale.
To this end, four recommendations for Universality are defined in the FUTURE-AI framework. First, AI developers should define the requirements for universality, i.e. the radius of application of their medical AI tool (e.g. clinical centres, countries, clinical settings), and accordingly anticipate potential obstacles to universality, such as differences in clinical workflows, medical equipment or digital infrastructures (Universality 1). To enhance interoperability, development teams should favour established community-defined standards (e.g. clinical definitions, medical ontologies, data annotations, technical standards) throughout the AI tool's production lifecycle (Universality 2). To enhance generalisability, the medical AI tool should be tested with external datasets and, when applicable, across multiple sites (Universality 3). Finally, medical AI tools should be evaluated for their local clinical validity, and if necessary, calibrated so that they perform well on the local populations and within the local clinical workflows (Universality 4).
| Recommendation | Description |
|----------------|-------------|
| Universality 1: Define clinical settings | At the design phase, the development team should specify the clinical settings in which the AI tool will be applied (e.g. primary healthcare centres, hospitals, home care, low- vs. high-resource settings, one or multiple countries), and anticipate potential obstacles to universality (e.g. differences in clinical definitions, medical equipment or IT infrastructures across settings). |
| Universality 2: Use existing standards | To ensure the quality and interoperability of the AI tool, it should be developed based on existing community-defined standards. These may include clinical definitions, medical ontologies and common data models (e.g. SNOMED CT, OMOP), interface standards (e.g. DICOM, HL7 FHIR), data annotations, evaluation criteria, and technical standards (e.g. IEEE or ISO). |
| Universality 3: Evaluate using external data | To assess generalisability, technical validation of the AI tool should be performed with external datasets that are distinct from those used for training. These may include reference or benchmarking datasets that are representative of the task in question (i.e. approximating the expected real-world variations). Except for AI tools intended for a single centre, clinical evaluation studies should be performed at multiple sites to assess performance and interoperability across clinical workflows. If the tool's generalisability is limited, mitigation measures (e.g. transfer learning or domain adaptation) should be considered, applied and tested. |
| Universality 4: Evaluate local clinical validity | Clinical settings vary in many aspects, such as populations, equipment, clinical workflows, and end-users. Hence, to ensure trust at each site, AI tools should be evaluated for their local clinical validity. In particular, the AI tool should fit the local clinical workflows and perform well on the local population. If performance decreases when evaluated locally, the AI model should be re-calibrated (e.g. through fine-tuning or retraining). |
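To make the interoperability recommendation (Universality 2) concrete, the sketch below builds a minimal HL7 FHIR-style Observation resource that carries a LOINC code for the measurement and a SNOMED CT code for the body site. This is an illustrative example, not part of the FUTURE-AI framework itself; the specific codes and the resource structure shown are assumptions for illustration and would need checking against the FHIR specification for a real deployment.

```python
import json

# Illustrative HL7 FHIR-style Observation: a systolic blood pressure reading.
# The LOINC and SNOMED CT codes below are examples; verify codes against the
# official terminologies before use in a real system.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {
        "coding": [{
            "system": "http://loinc.org",
            "code": "8480-6",  # LOINC: systolic blood pressure
            "display": "Systolic blood pressure",
        }]
    },
    "bodySite": {
        "coding": [{
            "system": "http://snomed.info/sct",
            "code": "368209003",  # illustrative SNOMED CT body-site code
            "display": "Right upper arm structure",
        }]
    },
    "valueQuantity": {
        "value": 128,
        "unit": "mmHg",
        "system": "http://unitsofmeasure.org",
        "code": "mm[Hg]",
    },
}

# Serialising to JSON yields a record exchangeable with FHIR-aware systems.
payload = json.dumps(observation)
```

Because the measurement, body site, and units are all expressed through shared code systems rather than free text, a receiving site can interpret the record without bespoke mappings.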
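One way to operationalise the external-validation recommendation (Universality 3) is to compare a model's performance on its internal test set against each external site and flag sites where performance degrades beyond a tolerance, marking them as candidates for mitigation (e.g. transfer learning or domain adaptation). The sketch below is a minimal illustration under stated assumptions: the "model" is a fixed decision threshold on a risk score, the data are synthetic placeholders, and accuracy stands in for whatever clinical metric the task requires.

```python
def accuracy(scores, labels, threshold=0.5):
    """Fraction of correct predictions at a fixed decision threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def external_validation(internal, external_sites, tolerance=0.05):
    """Evaluate per-site accuracy and flag sites with a generalisability gap."""
    baseline = accuracy(*internal)
    per_site, flagged = {}, []
    for site, (scores, labels) in external_sites.items():
        acc = accuracy(scores, labels)
        per_site[site] = acc
        if baseline - acc > tolerance:
            flagged.append(site)  # candidate for transfer learning / adaptation
    return baseline, per_site, flagged

# Synthetic example: site_B's score distribution differs from the
# development population, so the fixed threshold misclassifies cases.
internal = ([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
external = {
    "site_A": ([0.85, 0.7, 0.3, 0.2], [1, 1, 0, 0]),
    "site_B": ([0.6, 0.4, 0.55, 0.45], [1, 1, 0, 0]),
}
baseline, per_site, needs_mitigation = external_validation(internal, external)
# site_B is flagged for mitigation; site_A generalises within tolerance.
```

In practice the same pattern applies with clinically meaningful metrics (e.g. AUC, sensitivity at fixed specificity) and with confidence intervals per site, but the structure of the check is the same.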
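For local clinical validity (Universality 4), one lightweight form of re-calibration, short of the fine-tuning or retraining the recommendation mentions, is to re-select the model's decision threshold on a local validation set. The sketch below uses Youden's J statistic (sensitivity + specificity − 1) for this; the scores and labels are synthetic placeholders meant to mimic a local population whose score distribution skews lower than the development population's.

```python
def youden_threshold(scores, labels):
    """Pick the decision threshold maximising sensitivity + specificity - 1
    on a local validation set."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    best_t, best_j = 0.5, -1.0
    for t in sorted(set(scores)):
        sens = sum(s >= t for s in positives) / len(positives)
        spec = sum(s < t for s in negatives) / len(negatives)
        j = sens + spec - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t

# Local scores skew lower than in the development population, so the
# default threshold of 0.5 would miss every local positive case.
local_scores = [0.45, 0.40, 0.35, 0.20, 0.15, 0.10]
local_labels = [1, 1, 1, 0, 0, 0]
threshold = youden_threshold(local_scores, local_labels)  # 0.35 here
```

If threshold re-selection is insufficient to restore local performance, heavier interventions such as fine-tuning on local data follow, as the recommendation describes.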