The Traceability principle states that medical AI tools should be developed together with mechanisms for documenting and monitoring the complete trajectory of the AI tool, from development and validation to deployment and usage. This increases transparency and accountability by providing clinicians, healthcare organisations, citizens and patients, AI developers and relevant authorities with detailed, continuous information on the AI tool throughout its lifetime. AI traceability also enables continuous auditing of AI models, identification of risks and limitations, and updating of the AI models when needed.
To this end, six recommendations for Traceability are defined in the FUTURE-AI framework. First, a system for risk management should be implemented throughout the AI lifecycle, including risk identification, assessment, mitigation, monitoring and reporting (Traceability 1). To increase transparency, relevant documentation should be provided for the stakeholder groups of interest, including AI information leaflets, technical documentation and/or scientific publications (Traceability 2). After deployment, continuous quality control of AI inputs and outputs should be implemented to identify inconsistent input data and implausible AI outputs (e.g. using uncertainty estimation), and to apply necessary model updates (Traceability 3). Furthermore, periodic auditing and updating of AI tools should be implemented (e.g. yearly) to detect and address any potential issues or performance degradation (Traceability 4). To increase traceability and accountability, an AI logging system should be implemented to keep a record of the usage of the AI tool, including, for instance, user actions, accessed and used datasets, and identified issues (Traceability 5). Finally, mechanisms for human oversight and governance should be implemented to enable selected users to flag AI errors or risks, overrule AI decisions, use human judgement instead, assign roles and responsibilities, and maintain the AI system over time (Traceability 6).
Implement risk management
Throughout the AI tool's lifecycle, the development team should analyse potential risks, assess each risk's likelihood, effects and risk-benefit balance, define risk mitigation measures, monitor the risks and mitigations continuously, and maintain a risk management file. The risks may include those explicitly covered by the FUTURE-AI guiding principles (e.g. bias, harm), as well as application-specific risks. Other risks to consider include human factors that may lead to misuse of the AI tool (e.g. not following the instructions, receiving insufficient training), application of the AI tool to individuals outside the target population, use of the tool by users other than the intended end-users (e.g. a technician instead of a physician), hardware failure, incorrect data annotations or input values, and adversarial attacks. Mitigation measures may include warnings to the users, system shutdown, re-processing of the input data, acquisition of new input data, or the use of an alternative procedure or human judgement alone.
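The risk-management steps above (identify, assess, mitigate, monitor, and maintain a file) can be sketched as a minimal in-memory risk register. The classes and field names below are illustrative assumptions only; a real risk management file would follow a recognised standard such as ISO 14971.

```python
# Minimal sketch of a risk-management file as an in-memory register.
# Classes and fields are illustrative, not a regulatory schema.
from dataclasses import dataclass, field
from enum import Enum

class Likelihood(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3

@dataclass
class Risk:
    description: str
    likelihood: Likelihood
    effect: str
    mitigation: str
    status: str = "open"  # "open" until mitigated and verified

@dataclass
class RiskManagementFile:
    risks: list = field(default_factory=list)

    def add(self, risk: Risk) -> None:
        self.risks.append(risk)

    def open_high_risks(self) -> list:
        # Risks that still need monitoring or further mitigation.
        return [r for r in self.risks
                if r.status == "open" and r.likelihood is Likelihood.HIGH]

rmf = RiskManagementFile()
rmf.add(Risk("Use outside the target population", Likelihood.HIGH,
             "Invalid predictions", "Check eligibility criteria at input"))
rmf.add(Risk("Hardware failure", Likelihood.LOW,
             "Tool unavailable", "Fall back to human judgement"))
high = rmf.open_high_risks()
```

Keeping the register queryable (e.g. filtering open, high-likelihood risks) supports the continuous monitoring and reporting that the recommendation calls for.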
Provide documentation
To increase transparency, traceability and accountability, adequate documentation should be created and maintained for the AI tool, which may include (i) an AI information leaflet to inform citizens and healthcare professionals about the tool's intended use, risks (e.g. biases) and instructions for use; (ii) a technical document to inform AI developers, health organisations and regulators about the AI model's properties (e.g. hyperparameters), training and testing data, evaluation criteria and results, biases and other limitations, and periodic audits and updates; (iii) a publication based on existing AI reporting standards; and (iv) a risk management file (see Traceability 1).
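As a toy illustration, an AI information leaflet of the kind described above could be rendered from structured metadata so that it stays in sync with the tool's technical documentation. The field names below are hypothetical, not a standardised schema.

```python
# Hypothetical renderer for a plain-text AI information leaflet.
# Field names are illustrative, not a standardised documentation schema.
def render_leaflet(meta):
    sections = [
        ("Intended use", meta["intended_use"]),
        ("Target population", meta["target_population"]),
        ("Known risks and biases", "; ".join(meta["risks"])),
        ("Instructions for use", meta["instructions"]),
    ]
    return "\n".join(f"{title}: {body}" for title, body in sections)

leaflet = render_leaflet({
    "intended_use": "Decision support for lung-nodule triage",
    "target_population": "Adults undergoing chest CT",
    "risks": ["lower sensitivity in underrepresented subgroups"],
    "instructions": "Review every AI output before acting on it",
})
```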
Implement continuous quality control
The AI tool should be developed and deployed with mechanisms for continuous monitoring and quality control of the AI inputs and outputs, to identify issues such as missing or out-of-range input variables, inconsistent data formats or units, incorrect annotations or data pre-processing, and erroneous or implausible AI outputs. For quality control of the AI decisions, calibrated uncertainty estimates should be provided to inform the end-users of the degree of confidence in the results. Finally, when necessary, model updates should be applied to address identified limitations and enhance the AI models over time.
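The input checks and the uncertainty gate described above might look like the following sketch. The variable ranges, field names, and the 0.2 uncertainty threshold are assumptions for illustration only, not values prescribed by the framework.

```python
# Sketch of continuous quality control: range checks on inputs and an
# uncertainty gate on outputs. Ranges and thresholds are illustrative.
def check_inputs(record, ranges):
    """Return a list of issues found in one input record."""
    issues = []
    for name, (lo, hi) in ranges.items():
        value = record.get(name)
        if value is None:
            issues.append(f"missing: {name}")
        elif not lo <= value <= hi:
            issues.append(f"out of range: {name}={value}")
    return issues

def gate_output(prediction, uncertainty, max_uncertainty=0.2):
    """Withhold low-confidence outputs and refer them for human review."""
    if uncertainty > max_uncertainty:
        return None, "referred: uncertainty above threshold"
    return prediction, "accepted"

ranges = {"age": (0, 120), "systolic_bp": (60, 250)}
issues = check_inputs({"age": 130, "systolic_bp": 120}, ranges)
decision, note = gate_output("malignant", uncertainty=0.35)
```

In a deployed tool, any flagged issue would trigger one of the mitigation measures described under Traceability 1, such as a warning to the user or re-acquisition of the input data.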
Implement periodic auditing
The AI tool should be developed and deployed with a configurable system for periodic auditing, which should define site-specific datasets and timelines for periodic evaluations (e.g. every year). The periodic auditing should enable the identification of data or concept drift, newly occurring biases, performance degradation or changes in the decision-making of the end-users. Accordingly, necessary updates to the AI models or AI tools should be applied.
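One simple way to detect the data drift mentioned above during a periodic audit is the population stability index (PSI) over fixed bins, comparing the audit-time distribution of a feature against its distribution at validation time. The bin edges and the conventional 0.2 alert threshold below are illustrative assumptions, not FUTURE-AI requirements.

```python
# Sketch of a periodic data-drift check using the population stability
# index (PSI). Bin edges and the 0.2 alert threshold are illustrative.
import math

def psi(expected, actual, edges):
    """PSI between a reference sample and a current sample over fixed bins."""
    def proportions(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(edges) - 1):
                if edges[i] <= v < edges[i + 1]:
                    counts[i] += 1
                    break
        total = len(values)
        # Small floor avoids log(0) for empty bins.
        return [max(c / total, 1e-6) for c in counts]

    p, q = proportions(expected), proportions(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))

reference = [0.1 * i for i in range(100)]    # feature values at validation
current = [0.1 * i + 5 for i in range(100)]  # shifted values at audit time
edges = [0.0, 5.0, 10.0, 15.0, 20.0]
drifted = psi(reference, current, edges) > 0.2  # flag drift for review
```

A flagged drift would then prompt the model updates or re-evaluation described in the recommendation.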
Implement AI logging
To increase traceability and accountability, an AI logging system should be implemented to trace the users' main actions in a privacy-preserving manner, specify the data that are accessed and used, record the AI predictions and clinical decisions, and log any encountered issues. Time-series statistics and visualisations should be used to inspect the usage of the AI tool over time.
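A privacy-preserving usage log of the kind described above might be sketched as follows. The record fields are illustrative, and hashing the user identifier is just one simple way to keep the log auditable without exposing identities.

```python
# Sketch of a privacy-preserving AI usage log. Record fields are
# illustrative; user identifiers are hashed before storage.
import hashlib
import time

def log_event(log, user_id, action, dataset, prediction=None, issue=None):
    # Hash the user ID so the log can be audited without exposing identities.
    user_hash = hashlib.sha256(user_id.encode()).hexdigest()[:16]
    log.append({
        "timestamp": time.time(),
        "user": user_hash,
        "action": action,
        "dataset": dataset,
        "prediction": prediction,
        "issue": issue,
    })

audit_log = []
log_event(audit_log, "dr.smith", "run_inference", "ct_scans_2024",
          prediction="malignant")
log_event(audit_log, "dr.smith", "flag_issue", "ct_scans_2024",
          issue="implausible output")
```

Structured records like these are what make the time-series statistics and usage visualisations mentioned above straightforward to compute.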
Implement human oversight
Given the high-stakes nature of medical AI, human oversight is essential and increasingly required by policymakers and regulators. Human-AI interfaces and human-in-the-loop mechanisms should be designed and implemented to perform specific quality checks (e.g. to flag biases, errors or implausible explanations) and to overrule the AI decisions when necessary. Furthermore, governance of the AI tool within the health organisation should be specified, including roles and responsibilities for risk management, periodic auditing, human oversight, and AI tool maintenance.
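A human-in-the-loop override of the kind described above can be sketched as follows: designated roles may overrule an AI decision, and the resulting record keeps both the AI suggestion and the final call so accountability is traceable. The role names and return fields are hypothetical assumptions.

```python
# Sketch of a human-oversight gate: designated roles may overrule an AI
# decision, and the record keeps both the AI suggestion and the final call.
# Role names and fields are illustrative assumptions.
from typing import Optional

ALLOWED_TO_OVERRIDE = {"physician", "clinical_lead"}

def resolve_decision(ai_decision: str, reviewer_role: str,
                     override: Optional[str] = None) -> dict:
    if override is not None and reviewer_role in ALLOWED_TO_OVERRIDE:
        return {"final": override, "source": "human",
                "ai_suggestion": ai_decision}
    return {"final": ai_decision, "source": "ai",
            "ai_suggestion": ai_decision}

out = resolve_decision("discharge", "physician",
                       override="keep for observation")
```

Restricting overrides to named roles is one way to encode the roles and responsibilities that the governance specification should define.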