The traceability principle states that medical AI algorithms should be developed together with mechanisms for documenting and monitoring the entire development lifecycle, as well as the functioning of the AI tools in their deployment environment. This increases transparency by providing detailed and complete information on, for example, the datasets used to train and evaluate the algorithms, the associated protocols and parameters, the variables included in the AI models, and any known biases and limitations. Furthermore, continuous monitoring of AI tools after their deployment in real-world practice is of paramount importance for evaluating performance and limitations over time, and hence for identifying potential sources of error or drift from the training settings or previous states (e.g. due to a change in image quality or characteristics). For traceability in AI for medicine and healthcare, we propose the following recommendations:

  1. Principled documentation: Key information about the AI tool should be specified and documented, e.g. intended use, target population, model properties, datasets, evaluation procedure, limitations, and usage and maintenance instructions.
  2. Monitoring tool: After clinical deployment, a monitoring tool should track the AI solution (e.g. for AI errors, input data quality, usage statistics) automatically and comprehensively. Similar to a flight recorder for planes, this will make it possible to analyse AI errors over time, visualise error distributions, and detect deviations and biases.
  3. Continuous evaluation: After deployment, the AI tools should be periodically evaluated and, where necessary, adapted. Mechanisms should be in place for continuous evaluation and periodic audits that estimate performance over time and detect data/concept drifts and performance degradations (e.g. due to changes in medical devices). The health organisations should define the frequency of the periodic tests/audits (e.g. every 6 or 12 months) to identify potential degradation and apply corrections (e.g. model fine-tuning). A centre-specific, representative dataset for testing and possibly re-calibrating the AI tools at each period should be defined.
  4. Reporting guidelines: AI studies should be reported using existing AI reporting guidelines such as TRIPOD-AI, CLAIM, or RQS.
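The principled documentation in recommendation 1 could take the form of a machine-readable "model card" stored alongside the deployed model. The following is a minimal sketch; all field names and values (e.g. the intended-use text, dataset identifiers, and metric figures) are hypothetical illustrations, not prescribed content.

```python
import json

# Hypothetical model card covering the documentation items listed above:
# intended use, target population, model properties, datasets, evaluation,
# limitations, and maintenance. Every value here is illustrative.
model_card = {
    "intended_use": "Triage of chest X-rays for suspected pneumonia",
    "target_population": "Adults (18+) in outpatient settings",
    "model": {"architecture": "DenseNet-121", "version": "1.3.0"},
    "datasets": {
        "training": "site_A_2020_2022",
        "evaluation": "site_B_held_out_2023",
    },
    "evaluation": {"auroc": 0.91, "sensitivity": 0.88, "specificity": 0.85},
    "limitations": [
        "Not validated on paediatric patients",
        "Performance may degrade on portable scanners",
    ],
    "maintenance": {"review_interval_months": 6},
}

# Persist the card so it can be versioned and audited with the model.
with open("model_card.json", "w") as f:
    json.dump(model_card, f, indent=2)
```

Keeping the card as structured data (rather than free text) makes it easy for audit tooling to check that mandatory fields are present before deployment.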
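The "flight recorder" monitoring tool of recommendation 2 can be sketched as an append-only event log plus a simple aggregation step. The event fields (case identifier, a 0-1 input-quality score, prediction, confidence) and the `ai_monitor.jsonl` path are assumptions for illustration.

```python
import json
import time
from statistics import mean

LOG_PATH = "ai_monitor.jsonl"  # hypothetical log location

def record_event(case_id, input_quality, prediction, confidence):
    """Append one prediction event to an append-only log, flight-recorder style."""
    event = {
        "ts": time.time(),
        "case_id": case_id,
        "input_quality": input_quality,  # hypothetical 0-1 quality score
        "prediction": prediction,
        "confidence": confidence,
    }
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(event) + "\n")

def usage_summary(quality_threshold=0.5):
    """Aggregate the log into simple usage and data-quality statistics."""
    with open(LOG_PATH) as f:
        events = [json.loads(line) for line in f]
    return {
        "n_cases": len(events),
        "mean_confidence": mean(e["confidence"] for e in events),
        "low_quality_inputs": sum(e["input_quality"] < quality_threshold
                                  for e in events),
    }

# Example usage: two hypothetical cases, one with a low-quality input.
record_event("case-001", input_quality=0.92, prediction="pneumonia", confidence=0.87)
record_event("case-002", input_quality=0.31, prediction="normal", confidence=0.55)
print(usage_summary())
```

An append-only log preserves the full history of the tool's behaviour, so error distributions and deviations can be analysed retrospectively over any time window.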
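One common way to implement the drift detection called for in recommendation 3 is a two-sample statistical test comparing a training-time reference distribution of some input feature against the distribution observed in production. The sketch below uses a Kolmogorov-Smirnov test on synthetic data; the feature, the simulated shift, and the significance threshold are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)

# Hypothetical distributions of one input feature (e.g. an image-intensity
# statistic): reference values captured at training time vs. current values,
# here simulated with a mean shift to mimic a scanner/protocol change.
reference = rng.normal(loc=0.0, scale=1.0, size=1000)
current = rng.normal(loc=0.4, scale=1.0, size=1000)

# Two-sample Kolmogorov-Smirnov test: a small p-value indicates the two
# samples are unlikely to come from the same distribution.
stat, p_value = ks_2samp(reference, current)
drift_detected = p_value < 0.01  # illustrative audit threshold

print(f"KS statistic={stat:.3f}, p={p_value:.2e}, drift={drift_detected}")
```

In a periodic audit, this check would run per feature over the centre-specific test dataset, and a detected drift would trigger re-calibration or fine-tuning.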