The design phase of the FUTURE-AI framework focuses on early-stage planning and requirement gathering, and comprises 10 key recommendations spanning stakeholder engagement to risk management. Table 3 breaks down these design phase recommendations, detailing specific operations and examples for each: interdisciplinary stakeholder engagement (general 1), definition of intended use and user requirements (usability 1), specification of intended clinical settings (universality 1), identification of sources of data heterogeneity (robustness 1), definition of potential sources of bias (fairness 1), assessment of explainability needs (explainability 1), investigation of ethical and of social and environmental issues (general 6 and 7), use of community defined standards (universality 2), and implementation of a risk management process (traceability 1).
| Recommendations | Operations | Examples |
|---|---|---|
| Engage interdisciplinary stakeholders (general 1) | Identify all relevant stakeholders | Patients, GPs, nurses, ethicists, data managers |
| | Provide information on the AI tool and AI | Educational seminars, training materials, webinars |
| | Set up communication channels with stakeholders | Regular group meetings, one-to-one interviews, virtual platform |
| | Organise cocreation consensus meetings | One day cocreation workshop with n=15 multidisciplinary stakeholders |
| | Use qualitative methods to gather feedback | Online surveys, focus groups, narrative interviews |
| Define intended use and user requirements (usability 1) | Define the clinical need and the AI tool’s goal | Risk prediction, disease detection, image quantification |
| | Define the AI tool’s end users | Patients, cardiologists, radiologists, nurses |
| | Define the AI model’s inputs | Symptoms, heart rate, blood pressure, ECG, image scan, genetic test |
| | Define the AI tool’s functionalities and interfaces | Data upload, AI prediction, AI explainability, uncertainty estimation |
| | Define requirements for human oversight | Visual quality control, manual corrections |
| | Adjust user requirements for all end user subgroups | According to role, age group, digital literacy level |
| Define intended clinical settings and cross setting variations (universality 1) | Define the AI tool’s healthcare setting(s) | Primary care, hospital, remote care facility, home care |
| | Define the resources needed at each setting | Personnel (experience, digital literacy), medical equipment (eg, >1.5 T MRI scanner), IT infrastructure |
| | Specify if the AI tool is intended for high end and/or low resource settings | Facilities with MRI scanners >1.5 T v low field MRIs (eg, 0.5 T), high end v low cost portable ultrasound |
| | Identify all cross setting variations | Data formats, medical equipment, data protocols, IT infrastructure |
| Define sources of data heterogeneity (robustness 1) | Engage relevant stakeholders to assess data heterogeneity | Clinicians, technicians, data managers, IT managers, radiologists, device vendors |
| | Identify equipment related data variations | Differences in medical devices, manufacturers, calibrations, machine ranges (from low cost to high end) |
| | Identify protocol related data variations | Differences in image sequences, data acquisition protocols, data annotation methods, sampling rates, preprocessing standards |
| | Identify operator related data variations | Differences in experience and proficiency, operator fatigue, subjective judgment, technique variability |
| | Identify sources of artefacts and noise | Image noise, motion artefacts, signal dropout, sensor malfunction |
| | Identify context specific data variations | Lower quality data acquisition in emergency units or during high patient volume periods |
| Define any potential sources of bias (fairness 1) | Engage relevant stakeholders to define the sources of bias | Patients, clinicians, epidemiologists, ethicists, social carers |
| | Define standard attributes that might affect the AI tool’s fairness (see the subgroup performance sketch after the table) | Sex, age, socioeconomic status |
| | Identify application specific sources of bias beyond standard attributes | Skin colour for skin cancer detection, breast density for breast cancer detection |
| | Identify all possible human biases | Data labelling, data curation |
| Define the need and requirements for explainability with end users (explainability 1) | Engage end users to define explainability requirements | Clinicians, technicians, patients |
| | Specify if explainability is necessary | Not necessary for AI enabled image segmentation, critical for AI enabled diagnosis |
| | Specify the objectives of AI explainability (if it is needed) | Understanding the AI model, aiding diagnostic reasoning, justifying treatment recommendations |
| | Define suitable explainability approaches | Visual explanations, feature importance, counterfactuals |
| | Adjust the design of the AI explanations for all end user subgroups | Heatmaps for clinicians, feature importance for patients |
| Investigate ethical issues (general 6) | Consult ethicists on ethical considerations | Ethicists specialised in medical AI and/or in the application domain (eg, paediatrics) |
| | Assess if the AI tool’s design is aligned with relevant ethical values | Right to autonomy, information, consent, confidentiality, equity |
| | Identify application specific ethical issues | Ethical risks for a paediatric AI tool (eg, emotional impact on children) |
| | Comply with local ethical AI frameworks | AI ethical guidelines from Europe, United Kingdom, United States, Canada, China, India, Japan, Australia, etc |
| Investigate social and environmental issues (general 7) | Investigate the AI tool’s social and environmental impact | Workforce displacement, worsened working conditions and relations, deskilling, dehumanisation of care, reduced health literacy, increased carbon footprint, negative public perception |
| | Define mitigations to enhance the AI tool’s social and environmental impact | Interfaces for physician-patient communication, workforce training, educational programmes, energy efficient computing practices, public engagement initiatives |
| | Optimise algorithms for energy efficiency | Develop and use energy efficient algorithms that minimise computational demands; techniques such as model pruning, quantisation, and edge computing can reduce the energy required for AI tasks (see the pruning and quantisation sketch after the table) |
| | Promote responsible data usage | Collect and process only the necessary amount of data; implement federated learning techniques to minimise data transfers (a federated averaging sketch follows the table) |
| | Monitor and report the environmental impact of the AI tool | Regularly monitor and report on the environmental impact of AI systems used in healthcare, including energy usage, carbon emissions, and waste generation |
| Use community defined standards (universality 2) | Use a standard definition for the clinical task | Definition of heart failure by the American College of Cardiology |
| | Use a standard method for data labelling | BI-RADS for breast imaging |
| | Use a standard ontology for the AI inputs | DICOM for imaging data, SNOMED for clinical data |
| | Adopt technical standards | IEEE 2801-2022 for medical AI software |
| | Use standard evaluation criteria | See Maier-Hein et al[21] for medical imaging applications and for fairness evaluation |
| Implement a risk management process (traceability 1) | Identify all possible clinical, technical, ethical, and societal risks | Bias against under-represented subgroups, limited generalisability to low resource facilities, data drift, lack of acceptance by end users, sensitivity to noisy inputs |
| | Identify all possible operational risks | Misuse of the AI tool (owing to insufficient training or not following the instructions), application of the AI tool outside the target population (eg, individuals with implants), use of the tool by people other than the target end users (eg, a technician instead of a physician), hardware failure, incorrect data annotations, adversarial attacks |
| | Assess the likelihood of each risk | Very likely, likely, possible, rare |
| | Assess the consequences of each risk | Patient harm, discrimination, lack of transparency, loss of autonomy, patient reidentification |
| | Prioritise all the risks depending on their likelihood and consequences | Risk of bias (if no personal attributes are included in the model) v risk of patient reidentification (if personal attributes are collected) |
| | Define mitigation measures to be applied during AI development | Data enhancement, data augmentation, bias correction techniques, domain adaptation, transfer learning, continuous learning |
| | Define mitigation measures to be applied after deployment | Warnings to the users, system shutdown, reprocessing of the input data, acquisition of new input data, use of an alternative procedure, or human judgment only |
| | Set up a mechanism to monitor and manage risks over time | Periodic risk assessment every six months |
| | Create a comprehensive risk management file | Including all risks, their likelihood and consequences, risk mitigation measures, and the risk monitoring strategy (a minimal risk register sketch follows the table) |