No Stages Title Question(s)
Fairness  
1 1,2 Multi-disciplinarity Did you design your AI algorithm with a diverse team of stakeholders? Did you collect requirements from a diverse set of end-users?
2 1,2,4 Definition of fairness Did you define fairness for your specific imaging application? Did you ask clinicians about hidden sources of data imbalance?
3 4 Metadata labelling At data collection, did you record key metadata variables on individuals and groups?
4 5,6 Estimation of data (im)balance Did you inspect and ensure the diversity of the training and evaluation data?
5 4,5,6 Multi-centre datasets Did you train and assess your algorithm for multi-centre imaging samples? Does your algorithm maintain accuracy across radiology units and geographical locations? In particular, is it applicable in centres with reduced imaging quality (e.g. in resource-limited countries)?
6 5,6,7 Transparency of fairness Did you document the data characteristics, including existing (im)balances?
7 6 Fairness evaluation and metrics Did you thoroughly evaluate the fairness of your AI algorithm? Did you use a suitable dataset and dedicated metrics? Do you have a mechanism for continuous evaluation of your algorithm’s fairness? (A minimal metric sketch follows this Fairness checklist.)
8 5,6 Fairness corrective measures If you identified sources of bias in the data, did you implement mitigation measures?
9 6 Continuous monitoring of fairness Do you have a mechanism for continuous testing of your algorithm’s fairness over its lifetime?
10 6 Information and training on fairness Did you prepare information and training material for the radiologists and clinicians, to inform them about potential biases and to maximise fairness during the algorithm’s use?
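For item 7, a minimal illustration of subgroup-level fairness evaluation is sketched below, assuming Python with NumPy and scikit-learn (neither is prescribed by the checklist). It computes per-subgroup sensitivity and specificity and the largest gap between groups; the subgroup variable stands in for the metadata recorded under item 3, and all names and toy data are hypothetical.

```python
# Minimal sketch: per-subgroup performance gaps as a basic fairness check.
# Assumes binary ground truth and predictions plus a metadata variable
# (e.g. sex or scanner site) recorded at data collection (checklist item 3).
import numpy as np
from sklearn.metrics import confusion_matrix

def subgroup_report(y_true, y_pred, groups):
    """Return sensitivity/specificity per subgroup and the largest gaps."""
    results = {}
    for g in np.unique(groups):
        mask = groups == g
        tn, fp, fn, tp = confusion_matrix(
            y_true[mask], y_pred[mask], labels=[0, 1]
        ).ravel()
        results[g] = {
            "n": int(mask.sum()),
            "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        }
    sens = [r["sensitivity"] for r in results.values()]
    spec = [r["specificity"] for r in results.values()]
    gaps = {
        "sensitivity_gap": max(sens) - min(sens),
        "specificity_gap": max(spec) - min(spec),
    }
    return results, gaps

# Hypothetical usage with toy data:
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
groups = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
per_group, gaps = subgroup_report(y_true, y_pred, groups)
print(per_group)
print(gaps)  # large gaps flag a potential fairness issue to investigate
```

Large gaps across sites, sexes, or scanner types would then prompt the corrective measures of item 8 and the continuous monitoring of item 9.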
Universality
11 1,2 Definition of clinical task Did you use a universal definition of the clinical task?
12 2,3,5 Software standardisation Did you design and implement the imaging AI solution using proven libraries and framework standards that readily allow for extension and maintenance?
13 3,4 Image annotation standardisation Did you annotate your dataset in an objective, reproducible and standardised way?
14 3,4 Variation of quantified biomarkers Are the methods you used for feature quantification compliant with consensus provided by standards initiatives?
15 6 Evaluation metric selection and reporting Did you use universal, transparent, comparable, and reproducible criteria and metrics for your model’s performance assessment? (A minimal sketch follows this Universality checklist.)
16 6 Reference dataset evaluation Did you evaluate your model on at least one open access benchmark dataset that is representative of your model’s task and expected real-world data exposure after deployment?
17 6 Reporting standards compliance Did you adhere to a standardised reporting guideline when assessing and communicating the design and findings of your study?
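For item 15, the sketch below shows one way to report a widely used metric, the area under the ROC curve, together with a percentile bootstrap confidence interval and a fixed random seed so the result is comparable and reproducible. It assumes Python with NumPy and scikit-learn, which the checklist does not mandate; all variable names and toy data are illustrative.

```python
# Minimal sketch: reproducible reporting of a common evaluation metric
# (AUC) with a bootstrap 95% confidence interval (checklist item 15).
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate plus percentile bootstrap confidence interval."""
    rng = np.random.default_rng(seed)      # fixed seed for reproducibility
    point = roc_auc_score(y_true, y_score)
    stats = []
    n = len(y_true)
    while len(stats) < n_boot:
        idx = rng.integers(0, n, n)        # resample cases with replacement
        if len(np.unique(y_true[idx])) < 2:
            continue                       # skip resamples with one class only
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return point, (lo, hi)

# Hypothetical usage with toy scores:
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.5, 0.7])
auc, (lo, hi) = auc_with_ci(y_true, y_score)
print(f"AUC = {auc:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```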
Traceability
18 1,2,3 Model scope Did you agree with the clinicians/radiologists on a precise definition of the model’s scope? Did you precisely define the model’s intended use, the input imaging modalities, the necessary steps to provide the input to the AI model, the reference ground truth if any, the intended output and the use case scenarios? Did you check for any known limitations of the diagnostic/prognostic problem faced?
19 4 Data provenance Did you prepare a complete documentation of the imaging dataset you used? Did you include the relevant DICOM tags? Did you structurally list the related clinical/genomic/pathology data?
20 4,5 Data localisation and distribution Did you annotate the location of data over the network? Did you analyse dataset statistics with respect to their capability to represent the phenomenon at hand across the various clinical sites? Did you quantify missing values and any gaps or known biases?
21 4,5 Data-preparation documentation Did you keep track in a structured manner of the whole pre-processing pipeline of imaging and related data? Did you specify input/output, nature, prerequisites and requirements of your pre-processing and data preparation methods?
22 3,4,5 Specification of clinical references Did you include a clear description of the radiological/clinical standards or biomarkers used as reference? Did you include a complete record of the segmentation process, if any?
23 5 Training recording Did you record the details of the training process? Did you include a careful description of imaging and non-imaging features?
24 6 Validation documentation Did you document your validation process and the model selection approach agreed with clinicians?
25 5,6,7 Final model details Did you detail the characteristics of the final model released?
26 7 Traceability tool Did you equip your model with a traceability tool? Did you manage the dynamics of your model?
27 1-7 AI Model passport Did you prepare a full metadata record of all the relevant information about your model? (A minimal sketch follows this Traceability checklist.)
28 5,6,7 Accountability and risk specification Did you perform a risk analysis for your model? Did you prepare a tool to keep track of your model’s usage?
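For items 19, 23, 25, and 27, the sketch below illustrates one possible machine-readable "model passport", assuming only Python and its standard library. The field names and example values are hypothetical and do not constitute a standardised schema.

```python
# Minimal sketch: a machine-readable "model passport" recording key
# traceability fields (checklist items 19, 23, 25 and 27). The field
# names below are illustrative, not a standardised schema.
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class ModelPassport:
    model_name: str
    version: str
    intended_use: str
    input_modalities: list
    training_data: dict          # e.g. cohort sizes, sites, DICOM tags kept
    preprocessing: list          # ordered list of pre-processing steps
    training_details: dict       # e.g. hyper-parameters, hardware, seed
    validation: dict             # e.g. metrics, datasets, model selection
    known_limitations: list
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self, path):
        with open(path, "w") as f:
            json.dump(asdict(self), f, indent=2)

# Hypothetical usage (all values are placeholders):
passport = ModelPassport(
    model_name="lung-nodule-classifier",
    version="1.2.0",
    intended_use="Decision support for nodule malignancy triage",
    input_modalities=["CT"],
    training_data={"n_patients": 1200, "sites": 3},
    preprocessing=["resample 1mm isotropic", "intensity clipping [-1000, 400] HU"],
    training_details={"seed": 42, "epochs": 100},
    validation={"internal_auc": 0.91, "external_auc": 0.87},
    known_limitations=["not validated on paediatric scans"],
)
passport.to_json("model_passport.json")
```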
Usability
29 1 User engagement Did you engage users in the design and development of the AI tool?
30 2 Requirements definition Did you compile end-user requirements?
31 3 User interfaces Did you design appropriate user interfaces?
32 5 Usable explainability Did you implement any type of explainability that will be usable and actionable by the radiologist?
33 6 Usability testing Did you design an appropriate usability study?
34 6 In-silico validation Did you consider an in-silico validation of usability?
35 6 Usability metrics Did you define the appropriate usability metrics for evaluation? (A minimal sketch follows this Usability checklist.)
36 6,7 Clinical Integration Did you evaluate the usability of your tool after integration in the clinical workflows of the clinical sites?
37 6,7 Training material Did you provide end-users with resources to learn to adopt and appropriately work with your tool?
38 5,6,7 Usability monitoring Did you implement monitoring mechanisms to assess changes in user needs and re-evaluate the appropriateness of the AI solution through time?
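For item 35, the sketch below scores the System Usability Scale (SUS), one commonly used questionnaire-based usability metric. The checklist does not mandate SUS; it is shown only as a concrete example, and the example responses are hypothetical.

```python
# Minimal sketch: scoring the System Usability Scale (SUS), one common
# usability metric (checklist item 35).

def sus_score(responses):
    """responses: 10 answers on a 1-5 Likert scale (standard SUS form).

    Odd-numbered items are positively worded (contribution = answer - 1),
    even-numbered items negatively worded (contribution = 5 - answer).
    The summed contributions are scaled by 2.5 to give a 0-100 score.
    """
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects 10 responses in the range 1-5")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Hypothetical responses from one radiologist:
print(sus_score([4, 2, 5, 1, 4, 2, 5, 1, 4, 2]))  # -> 85.0
```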
Robustness
39 4 Image harmonisation Did you implement any image harmonisation solutions to account for image heterogeneity?
40 4 Feature harmonisation Did you perform any feature harmonisation study before developing your predictive models? Did you assess, minimise, and report the variation across features?
41 4 Intra- and inter-observer variability Did you perform any intra- and inter-observer annotation studies?
42 4 Quality control Did you use any quality control tools to identify abnormal deviations or artifacts in images?
43 4 Phantoms Did you use phantoms to harmonise patient images and/or measurements?
44 4,5 Data augmentation for model training Did you use data augmentation techniques to improve training of AI models?
45 5,6 Training on heterogeneous data Did you train and evaluate your tools with heterogeneous datasets from multiple clinical centres, vendors, and protocols?
46 5,6 Uncertainty estimation Did you report any kind of model uncertainty beyond the classifier’s discriminant or confidence score? (A minimal sketch follows this Robustness checklist.)
47 6,7 Equity in accessibility Did you optimise your tool with images from resource-limited settings in low-to-middle-income countries?
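For item 46, the sketch below illustrates Monte Carlo dropout, one simple way to report predictive uncertainty beyond a single confidence score. It assumes PyTorch, which the checklist does not prescribe; the tiny untrained network and random inputs are placeholders for a real imaging model and data.

```python
# Minimal sketch: Monte Carlo dropout as one simple way to report model
# uncertainty beyond a single confidence score (checklist item 46).
# Assumes PyTorch; the tiny network stands in for a real imaging model.
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self, n_features=16, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32),
            nn.ReLU(),
            nn.Dropout(p=0.5),           # kept active at test time below
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        return self.net(x)

def mc_dropout_predict(model, x, n_samples=50):
    """Mean softmax probability and its standard deviation over MC samples."""
    model.eval()
    for m in model.modules():            # re-enable only the dropout layers
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    return probs.mean(dim=0), probs.std(dim=0)

# Hypothetical usage on random features (an untrained placeholder model):
model = TinyClassifier()
x = torch.randn(4, 16)                   # 4 cases, 16 features each
mean_p, std_p = mc_dropout_predict(model, x)
print(mean_p)                            # average predicted probabilities
print(std_p)                             # high std flags uncertain cases
```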
Explainability
48 1,2 Clinical requirements on explainability Did you consult with the clinicians to determine which explainability methods suit them? Did you intuitively present the different explanation methods to the clinicians and did they develop a clear understanding of them?
49 1-5 Incorporation of clinical concepts Did you consider using clinical annotations and clinical concepts as parameters of the AI algorithms or neural networks to explicitly introduce a level of clinical interpretability?
50 1,3,5 Multiple explanation methods Did you explore multiple and complementary explainability methods?
51 6 Identifying explainable imaging biomarkers To increase clinical value, did you evaluate whether the explainability methods enable the identification of imaging features or structures that can serve as imaging biomarkers? Did you determine whether the identified imaging biomarkers were previously known?
52 6 Quantitative evaluation of explainability Did you use some quantitative evaluation tests to determine if the explanations are robust and trustworthy?
53 6 Qualitative evaluation of explainability Did you perform some qualitative evaluation tests with clinicians?
54 6 Robustness of explainability against adversarial attacks Did you evaluate robustness to adversarial attacks by assessing whether the explanations remain consistent when the input images are subjected to small perturbations and noise? (A minimal sketch follows.)
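For items 52 and 54, the sketch below compares a vanilla gradient saliency map before and after adding small Gaussian noise to the input and reports their cosine similarity as one possible stability score. It assumes PyTorch; the placeholder CNN, noise level, and random image are illustrative only, and random noise is used here as a proxy for the small perturbations mentioned in item 54 rather than a crafted adversarial attack, which would require a dedicated attack method.

```python
# Minimal sketch: checking whether a gradient saliency explanation stays
# stable under small input perturbations (checklist items 52 and 54).
# Assumes PyTorch; the tiny CNN and noise level are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def gradient_saliency(model, image, target_class):
    """Absolute input gradient of the target-class score (vanilla saliency)."""
    image = image.clone().requires_grad_(True)
    score = model(image)[0, target_class]
    score.backward()
    return image.grad.detach().abs()

def saliency_stability(model, image, target_class, noise_std=0.02):
    """Cosine similarity between saliency maps of clean and noisy inputs."""
    clean = gradient_saliency(model, image, target_class)
    noisy_input = image + noise_std * torch.randn_like(image)
    noisy = gradient_saliency(model, noisy_input, target_class)
    return F.cosine_similarity(clean.flatten(), noisy.flatten(), dim=0).item()

# Hypothetical usage with a placeholder CNN and a random "image":
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 2),
)
model.eval()
image = torch.randn(1, 1, 64, 64)        # one single-channel 64x64 image
score = saliency_stability(model, image, target_class=1)
print(f"saliency stability (cosine similarity): {score:.3f}")
# Scores far below 1.0 suggest the explanation is sensitive to small noise.
```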