The traceability principle states that medical AI algorithms should be developed together with mechanisms for documenting and monitoring the whole development lifecycle, as well as the functioning of the AI tools in their deployment environment. This increases transparency by providing detailed and complete information on the datasets used to train and evaluate the algorithms, the associated protocols and parameters, the variables included in the AI models, and any potential biases and limitations. Furthermore, continuous monitoring of AI tools after their deployment in real-world practice is of paramount importance for evaluating their performance and limitations over time, and hence for identifying potential sources of error or drift from the training settings or from previous states (e.g. due to a change in image quality or characteristics). For Traceability in AI for medicine and healthcare, we propose the following recommendations:

  1. Model passport: A model passport should be defined to accompany each AI algorithm during its entire lifecycle, as a standardised, up-to-date single source of information on the clinical scope, data provenance, model properties, evaluation, usage and maintenance.
  2. Model scope: A precise definition of the model’s scope should be agreed upon with the radiologists and/or clinicians, then described in terms of the model’s intended use, clinical scenarios, model outputs, key assumptions and any known limitations.
  3. Data provenance: The training and testing datasets should be documented, including clinical protocols and medical devices, sampled populations (groups, sample size), related data types (genetics, clinical, pathology) and data distributions/statistics (e.g. per group).
  4. Data curation: The data curation and pre-processing pipeline should be fully documented, including the manual annotation process, the clinical annotations/references, image pre-processing steps (e.g. normalisation or cropping) and data augmentations.
  5. Model properties: All details of the algorithm’s training and properties should be documented in the model passport, including learning methods/libraries, initialisations (e.g. pre-trained model), training datasets, as well as model details and hyper-parameters.
  6. Federated learning: When using privacy-preserving federated learning, the AI developers should document the location of the nodes, the data properties and compute servers, aggregation framework, as well as potential biases/instabilities.
  7. Model evaluation: The evaluation of the AI algorithm should be described in detail, including the evaluation datasets, criteria and metrics, the evaluation process (e.g. cross-validation), the obtained results (robustness, usability scores, confidence/uncertainty, etc.) and any identified limitations and biases.
  8. Monitoring tool: After clinical deployment, the AI solution should be equipped with a monitoring tool that tracks its usage and functioning, checks data quality (e.g. image or segmentation quality), detects failures and extreme cases, and tracks its evolution over time (incl. updates).
  9. Continuous evaluation: Mechanisms should be in place to enable continuous evaluation and periodic audits for estimating performance over time and detecting potential data drifts or performance degradations (e.g. due to changes in medical devices).
  10. Roles and accountability: The roles of the stakeholders involved in the algorithm’s lifecycle should be clearly defined (e.g. model developers, model owners, radiologists, specialists, other end-users) to ensure adequate accountability, enforce codes of conduct, and minimise error.
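As an illustration of the model passport (recommendations 1–7), the passport could be maintained as a structured, machine-readable record that is versioned alongside the model. The sketch below shows one minimal, hypothetical schema in Python; all field names and example values are our own assumptions, not a prescribed standard:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class ModelPassport:
    """Illustrative single-source-of-information record for an AI model."""
    model_name: str
    version: str
    # Model scope (recommendation 2)
    intended_use: str
    clinical_scenarios: list
    known_limitations: list
    # Data provenance (recommendation 3): site, device, population, sample size
    training_datasets: list
    # Model properties (recommendation 5)
    learning_framework: str
    hyperparameters: dict
    # Model evaluation (recommendation 7)
    evaluation_metrics: dict

passport = ModelPassport(
    model_name="cardiac-mri-segmenter",          # hypothetical example
    version="1.2.0",
    intended_use="Left-ventricle segmentation on cine MRI",
    clinical_scenarios=["pre-operative planning"],
    known_limitations=["not validated on paediatric scans"],
    training_datasets=[{"site": "Hospital A", "device": "Scanner X", "n": 1200}],
    learning_framework="pytorch 2.x",
    hyperparameters={"learning_rate": 1e-4, "epochs": 100},
    evaluation_metrics={"dice": 0.91},
)

# Serialising the passport makes it easy to version, audit and exchange.
print(json.dumps(asdict(passport), indent=2))
```

Keeping the passport as code-adjacent structured data (rather than free text) allows it to be updated automatically at each retraining and checked for completeness in continuous-integration pipelines.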
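To make recommendations 8 and 9 concrete, post-deployment drift can be screened with a simple distributional statistic comparing live inputs against a training-time reference. The sketch below uses the Population Stability Index (PSI) on a scalar feature (e.g. mean image intensity); the choice of feature, number of bins and alert threshold are illustrative assumptions, not part of the original recommendations:

```python
import math

def psi(reference, live, n_bins=10):
    """Population Stability Index between a reference (training-time) sample
    and a live (deployment-time) sample of a scalar feature.
    Bins are derived from the reference distribution."""
    lo, hi = min(reference), max(reference)
    edges = [lo + (hi - lo) * i / n_bins for i in range(n_bins + 1)]
    edges[0] = float("-inf")   # catch live values below the reference range
    edges[-1] = float("inf")   # ...and above it

    def proportions(sample):
        counts = [0] * n_bins
        for x in sample:
            for b in range(n_bins):
                if edges[b] <= x < edges[b + 1]:
                    counts[b] += 1
                    break
        # small epsilon avoids log(0) for empty bins
        return [(c + 1e-6) / (len(sample) + n_bins * 1e-6) for c in counts]

    p, q = proportions(reference), proportions(live)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

# Identical distributions yield PSI near 0; a shifted acquisition (e.g. a new
# scanner changing image intensities) raises it well above the alert level.
reference = [0.1 * i for i in range(100)]
shifted = [x + 5.0 for x in reference]
assert psi(reference, reference) < 0.01
assert psi(reference, shifted) > 0.25   # > 0.25 is a commonly used alert level
```

In a monitoring tool, such a check would run periodically on batches of incoming data and trigger an audit (recommendation 9) when the statistic exceeds the agreed threshold.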