The Ever-Evolving Regulatory Landscape Concerning Development and Clinical Application of Machine Intelligence: Practical Consequences for Spine Artificial Intelligence Research
Abstract
This paper analyzes the regulatory frameworks for artificial intelligence/machine learning (AI/ML)-enabled medical devices in the European Union (EU), the United States (US), and the Republic of Korea, with a focus on applications in spine surgery. The aim is to provide guidance for developers and researchers navigating regulatory pathways. A review of current literature, regulatory documents, and legislative frameworks was conducted. Key differences in regulatory bodies, risk classification, submission requirements, and approval pathways for AI/ML medical devices were examined in the EU, US, and Republic of Korea. The EU AI Act (2024) establishes a risk-based framework, requiring regulatory review commensurate with device risk, with high-risk devices subject to stricter oversight. The US applies a more flexible approach, allowing multiple submission pathways and incorporating a focus on continuous learning. The Republic of Korea emphasizes streamlined approval pathways and the growing use of real-world data to support validation. Developers must ensure regulatory alignment early in the development process, focusing on key aspects such as dataset quality, transparency, and continuous monitoring. Across all regions, technical documentation, quality management systems, and bias mitigation are essential for approval. Developers are encouraged to adopt adaptable strategies to comply with evolving regulatory standards, ensuring models remain transparent, fair, and reliable. The EU’s comprehensive AI Act enforces stricter oversight, while the US and Korea offer more flexible pathways. Developers of spine surgery AI/ML devices must tailor development strategies to align with regional regulations, emphasizing transparent development, quality assurance, and postmarket monitoring to ensure approval success.
INTRODUCTION
Various fields of machine intelligence, including both artificial intelligence (AI) and machine learning (ML), have become increasingly prominent topics in medical research. AI-related publications in healthcare rose exponentially, with a 45.15% annual increase from 2014 to 2019 [1-4]. Clinical use followed suit, with 691 US Food and Drug Administration (FDA)-approved devices by October 2023, most approved after 2019 [5]. This trend highlights the expanding role of AI in everyday medical practice, although its applications remain largely concentrated in radiology, with more limited adoption in specialties like spine surgery [5-7].
In parallel, regulatory authorities across multiple countries have been actively developing frameworks to address the challenges posed by AI/ML-enabled medical devices. Key concerns include accountability, ethics, safety, and data security, which are central to the regulatory discourse. These considerations complicate the regulatory landscape, as healthcare technologies must undergo rigorous evaluation to ensure safety and efficacy before clinical implementation is allowed. One of AI’s primary advantages is its ability to learn from real-world data and adapt over time; however, this also poses a challenge for regulatory authorities. Adaptive algorithms, which evolve continuously (or are simply updated over time), require novel regulatory approaches to maintain safety and effectiveness throughout their lifecycle. Additionally, the potential for algorithmic bias, which could negatively impact clinical outcomes, presents further complexity for regulators [4,8,9].
Regions have adopted varied regulatory approaches to balance innovation with patient safety in AI healthcare integration. The European Union (EU) has implemented the overarching AI Act, which regulates not only medical devices but also AI applications across other sectors. In contrast, the United States (US) and the Republic of Korea have adopted more targeted strategies, issuing specific guidelines for the submission and approval of medical devices. As AI continues to play an increasingly significant role in healthcare, effectively navigating these regulatory challenges will be essential for the seamless integration of these transformative technologies into routine clinical practice.
The purpose of this paper is to provide researchers and developers in the field of spine surgery with an understanding of the regulatory frameworks in 3 key regions (Europe, the US, and the Republic of Korea). This knowledge aims to help address regulatory challenges early in the research and development phase, facilitating smoother transitions to compliance and market approval.
EVOLUTION OF AI IN MEDICINE AND ITS REGULATORY LANDSCAPE
The concept underlying “artificial intelligence,” first explored by Turing in 1950, evolved into ML in 1952 and deep learning by the 2000s [10]. An early healthcare application was the CASNET model for glaucoma (1975) [11]. Notable advancements in AI for healthcare began in 1998 with ImageChecker for mammography and in 2004 with AI for retinal disease diagnosis [12]. In the fields of neurosurgery and neurology, advancements were slower, with the first computer-assisted segmentation of sectional brain images achieved in 2012 [12]. A major milestone occurred in 2018 when the FDA approved Viz.AI, an AI-assisted clinical decision support system for stroke triage [12]. In July 2022, Medtronic’s UNiD Spine Analyzer software became the first FDA-approved AI/ML-enabled device for spine surgery [13]. This AI-powered platform was designed to assist surgeons in planning complex spinal procedures. The software aids in both planning and predicting outcomes of spine surgeries, utilizing an algorithm trained on data from 10,000 surgical cases [13].
Currently, spine surgery research is advancing rapidly with AI/ML-enabled devices in 4 key areas: preoperative planning and patient selection, intraoperative care, postoperative monitoring, and rehabilitation [14]. These technologies include imaging analysis, laboratory value interpretation, robotics, and predictive analytics [14]. Regulatory requirements differ markedly among these categories, with fully autonomous robotic surgical devices encountering greater challenges than supervised software for predicting surgical outcomes.
Areas currently generating significant interest in spine surgery research are AI-assisted preoperative planning and diagnostic imaging, as well as robotics. These fields are at different stages of development. In preoperative planning and diagnostic imaging, one example is AI tools for measuring the Cobb angle in scoliosis diagnosis. Devices such as CobbAngle Pro, a deep learning-based app for evaluating spinal x-rays and measuring angles, demonstrate how accessible technologies can assist medical professionals while leaving final decisions to the clinician [15]. These diagnostic tools are less heavily regulated owing to their supportive nature and potential for widespread accessibility. In contrast, intraoperative robotics falls into a highly regulated category due to its complexity and potential risks. Robotics shows significant potential but remains largely in the research phase [16]. Robotic devices are classified by autonomy levels, from level 0 (no autonomy) to level 5 (full autonomy) [17]. AI integration begins at level 2, where devices perform tasks autonomously under human supervision [16,17]. Currently, 5 FDA-approved devices are on the market: Mazor X Stealth Edition, ROSA Spine, ExcelsiusGPS, Renaissance, and SpineAssist; however, these are limited to autonomy levels 0–2. Higher-autonomy devices are under development but not yet approved. Level 4 systems, capable of making medical decisions under physician supervision, represent the forefront of innovation but face regulatory and technical challenges [16].
AI regulation is a relatively recent development. The first comprehensive national strategy to address AI was introduced by Canada in 2017, followed by the EU’s Coordination Plan on Artificial Intelligence in 2018. In 2019, both the US and the Republic of Korea launched their respective American AI Initiative and National Strategy for Artificial Intelligence. The first legislative attempt to regulate AI occurred in 2016 with the EU’s General Data Protection Regulation (GDPR) [18]. Although GDPR primarily focuses on data protection, it includes provisions on automated decision-making and AI’s use of personal data [19]. This regulation set the stage for rapid developments in AI-related legislation globally. To date, 148 AI-related bills have been enacted across 32 countries since 2016 [20]. A major regulatory effort is the EU AI Act, proposed in 2021, which became the first comprehensive regulatory framework for AI upon its implementation in 2024 [20].
REGULATIONS FOR ML/AI-RELATED MEDICAL DEVICES
In this section, we will present the approaches taken by 3 different regions to regulate the approval of AI/ML-related medical devices. We will highlight some of the characteristics, limitations, and practical applications of each regulatory framework.
1. European Union
In July 2024, the EU implemented the EU AI Act (EUAIA), a landmark regulation designed to govern the use of AI across multiple sectors, including healthcare. This regulation holds the same legal authority as other pivotal EU healthcare regulations, such as the EU Medical Device Regulation (MDR), the In Vitro Diagnostic Medical Device Regulation (IVDR), the Clinical Trials Regulation, and the GDPR. The EUAIA is a significant step towards addressing concerns surrounding the development and application of AI technologies, while also promoting innovation in this rapidly evolving field [21].
The EUAIA introduces a risk-based classification system for AI technologies, ranging from minimal- to high-risk categories, with healthcare AI systems represented at all levels. Diagnostic and therapeutic AI models, given their direct impact on patient care, are classified as high-risk, while systems used for administrative purposes fall under minimal risk. The regulatory obligations vary according to the risk level, with higher-risk systems subject to stricter requirements for compliance [22].
Medical devices in general are categorized into different risk classes (I, IIa, IIb, III) under the MDR. AI-driven devices are classified as class IIa or higher, automatically designating them as high-risk under the EUAIA. High-risk AI/ML devices require Conformité Européenne (CE) marking via third-party assessment to prove safety and performance [23].
One of the key challenges with the introduction of the EUAIA is the substantial increase in the documentation burden for AI developers. For instance, providers must submit a “technical documentation” file that mirrors, in part, the requirements already established under the EU MDR. This comprehensive file must be submitted prior to the device being placed on the market or put into service. The documentation must detail design specifications, system architecture, training methodologies, and validation procedures [21].
While the technical documentation required by the EUAIA is distinct from the MDR documentation, the Act allows for some alignment in the quality management system (QMS) requirements. For example, developers may be able to align QMS processes required by the EUAIA with those required by the MDR or IVDR, offering some relief in the application process. This integration aims to ease the regulatory burden while maintaining high safety and performance standards [21].
An example of an AI/ML-enabled device that has been authorized in the EU is CoLumbo, an AI-powered software designed to detect various pathologies from magnetic resonance imaging (MRI) images, including disc herniation (protrusion/extrusion modifiers), central spinal stenosis, Schizas and Lee morphological grading, decreased vertebral body height, reduced disc height, hypo/hyperlordosis, nerve root impingement, and listhesis [24]. The product obtained CE marking in 2021 and has been available on the market since then [25]. It was trained on a diverse dataset of patients from the EU using a neural network to develop its diagnostic model [24]. Notably, CoLumbo was approved before the implementation of the EUAIA and thus adhered to different regulatory frameworks. Under the MDR, CoLumbo is classified as a class IIa device and underwent a conformity assessment conducted by a third-party regulatory body authorized by an EU member state. This process was required to obtain CE marking, demonstrating the device’s safety and effectiveness for market use [24]. Had the EUAIA been in place at the time of approval, the device would have been categorized as high-risk due to its role as a diagnostic tool. Consequently, CoLumbo would have been required to submit additional technical documentation specific to AI systems. This documentation would include comprehensive details on its neural network architecture, training parameters (e.g., batch size, learning rate, epochs), hardware specifications, and evaluation metrics to demonstrate its safety, reliability, and performance. The device’s CE marking approval defines its intended use: “The purpose of the device is to assist in reading MRI images of the lumbar spine by providing information on detected pathologies and abnormalities, as well as reducing the time required to write reports. The results are reviewed and confirmed by a radiologist.” [24] This underscores the critical role of human supervision, a requirement emphasized both in the MDR and the EUAIA, to ensure the accuracy and reliability of the device’s outputs.
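To make the scope of such a technical file concrete, the sketch below organizes the documentation items named above (architecture, training parameters, hardware, and evaluation metrics) into a single structured record. It is a minimal illustration only: the class name, field names, and values are hypothetical and are not prescribed by the EUAIA or the MDR.

```python
from dataclasses import dataclass, field

@dataclass
class AiTechnicalFile:
    """Illustrative subset of items an EUAIA technical file might record.
    Fields and values are hypothetical, not a prescribed format."""
    architecture: str          # network topology and layer summary
    training_data_origin: str  # provenance and population of the dataset
    batch_size: int            # training parameters named in the text
    learning_rate: float
    epochs: int
    hardware: str              # hardware specification used for training
    evaluation_metrics: dict = field(default_factory=dict)

# Hypothetical entry for a CoLumbo-like lumbar MRI diagnostic tool.
technical_file = AiTechnicalFile(
    architecture="Convolutional neural network for lumbar MRI pathology detection",
    training_data_origin="Multicenter, anonymized EU patient cohort",
    batch_size=16,
    learning_rate=1e-4,
    epochs=100,
    hardware="GPU cluster, 8 accelerators",
    evaluation_metrics={"sensitivity": 0.93, "specificity": 0.90},
)
```

Maintaining such a record from the first training run onward makes it far easier to assemble the conformity-assessment dossier later, rather than reconstructing training details retrospectively.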
Despite the Act’s objective of promoting safety, effectiveness, quality, and transparency in AI applications in healthcare, there are concerns regarding the potential for regulatory overlap. The multiple layers of requirements from different regulatory bodies could create confusion and hurdles for developers, especially those new to the field. A study in the field of orthopedics showed that the new MDR, applicable since May 2021, already poses substantial challenges for businesses seeking approval in the EU, and the addition of the EUAIA could introduce a further obstacle [26]. Additionally, critics have highlighted the Act’s lack of flexibility in addressing the fast-paced evolution of AI: such a comprehensive, overarching legislative framework may struggle to keep pace with technological advancements. However, it is important to recognize that the EUAIA is a pioneering attempt to regulate AI in a pragmatic way, serving as a critical first step in addressing the complex challenges posed by AI in healthcare [21].
The implementation of the EUAIA marks a significant moment for AI regulation, setting the stage for future developments and iterations that will likely refine and expand the scope of AI governance in the coming years.
2. United States
In the US, all medical devices, including those with AI or ML capabilities, must receive FDA approval. However, there is no specific legislation regulating AI/ML software in healthcare, with oversight falling to individual agencies. The FDA requires that AI/ML-enabled devices undergo the same approval processes as traditional medical devices [5].
To address the regulatory challenges associated with AI/ML in healthcare, the FDA has issued several key guidelines and documents aimed at enhancing oversight and ensuring the safe implementation of these technologies. In January 2021, it introduced the AI/ML Software as a Medical Device Action Plan, followed by the Good Machine Learning Practice for Medical Device Development: Guiding Principles in October 2021. More recently, the Predetermined Change Control Plans for Machine Learning-Enabled Medical Devices: Guiding Principles was issued in October 2023, followed by the Artificial Intelligence and Medical Products: How Center for Biologics Evaluation and Research, Center for Drug Evaluation and Research, Center for Devices and Radiological Health, and Office of Combination Products are Working Together in March 2024, and Transparency for Machine Learning-Enabled Medical Devices: Guiding Principles in June 2024 [27].
Similar to the EU guidelines, the FDA’s recent documents do not introduce a specific submission pathway tailored for AI/ML-enabled medical devices. Instead, these devices must follow 1 of 3 established regulatory pathways: de novo review, premarket approval, or premarket notification through the 510(k) process. Notably, a study found that most AI developers tend to favor the 510(k) submission process, considering it the most efficient and practical option for bringing AI/ML-enabled devices to market. This preference is likely due to the 510(k)’s emphasis on demonstrating substantial equivalence to existing approved devices, making it a more straightforward pathway for innovative technologies [5].
The regulatory process in the US after submission is similar to that under the EUAIA, but with some notable differences. One key distinction lies in the rigor and detailed prerequisites for developers in the approval phase. For instance, the FDA addresses the challenge of continuous learning from the real world in AI systems through its Predetermined Change Control Plans for Machine Learning-Enabled Medical Devices: Guiding Principles. Developers must create a plan detailing which aspects of an AI algorithm can change postmarket and how such changes are expected to occur after the system collects and adapts to real-world data. This plan ensures that any updates or modifications maintain the system’s safety and effectiveness. These changes may include performance improvements or expanding the algorithm’s applicability to new patient populations for which initial evidence was limited. The FDA provides general guidance that postmarket reporting should ensure the safety and effectiveness of adaptive AI systems, without specifying detailed requirements. In contrast, the EU regulator requires manufacturers to implement a more comprehensive postmarket monitoring system. This system must actively and systematically collect, document, and analyze performance data from users and other sources throughout the AI-enabled medical device’s lifecycle, with the aim of efficiently addressing any potential risks that arise [8].
The implementation of the above-mentioned plan aims to address the challenges surrounding generative AI. Generative AI refers to a class of AI models designed to generate new data or content, such as text, images, or simulations, by learning patterns from existing datasets. While widely researched in the medical field, it has yet to be integrated into clinical practice due to the inherent complexity of regulating adaptive algorithms that continuously evolve, posing significant legal and ethical challenges [28]. In November 2024, the FDA published a discussion paper (DHAC Executive Summary - Total Product Lifecycle Considerations for Generative AI-Enabled Devices) [28] on the regulation of generative AI in healthcare. While it does not provide specific guidelines, it highlights critical considerations for developers, including the need for transparency and explainability, robust premarket performance evaluation, and ongoing postmarket surveillance to detect and address model drift. Additionally, the paper emphasizes mitigating bias, ensuring data security, and managing the lifecycle of GenAI-enabled devices through a Total Product Life Cycle (TPLC) approach [28].
Another area of divergence is how algorithmic bias is addressed. The FDA’s AI/ML Software as a Medical Device Action Plan only briefly mentions that AI systems must be suitable for racially and ethnically diverse populations, without strict rules on training data selection. In contrast, the EU mandates that AI systems be trained on data from European populations, imposing limitations on non-European systems [8].
For example, the CoLumbo device, as previously mentioned, is also FDA-approved via the 510(k) pathway. To secure approval in both the EU and the US, the device required 2 separate training datasets: one derived from a patient population in the EU and the other from the US. Although the FDA does not mandate the exclusive use of a US-based population in the training process under its AI/ML Software as a Medical Device Action Plan, the CoLumbo software included a dataset of US patients to streamline the approval process and minimize the risk of bias in the training data [24].
Overall, the FDA’s approach to AI implementation in healthcare is less stringent and more flexible compared with Europe’s. This may lead to faster development and adoption of AI systems in the US. Additionally, the absence of a broad legislative framework allows for greater adaptability as the field of AI evolves [8].
3. Republic of Korea
The regulation of AI in the Republic of Korea’s medical field is overseen by the Personal Information Protection Commission, the Ministry of Food and Drug Safety (MFDS), and the Ministry of Health and Welfare. In 2022, the MFDS released 2 key documents: the Guidance on the Review and Approval of Artificial Intelligence (AI)-based Medical Devices and the Guidance on Clinical Trials Design of Artificial Intelligence (AI)-based Medical Devices [29,30]. These guidelines can be compared to similar frameworks published by the FDA, and they represent the latest regulations specifically addressing the use of AI in medical devices [31]. After 2022, general guidelines for medical devices were updated to incorporate regulations for AI/ML devices [32]. However, no additional specific guidelines have been published since then [31,32]. Additionally, the Ministry of Health and Welfare and the Health Insurance Review & Assessment Service have published guidelines that are particularly relevant to the unique healthcare insurance system in the Republic of Korea [33,34].
In the Republic of Korea, the national health insurance system, which mandates a structured reimbursement approach, indirectly shapes the integration of AI in healthcare. AI-based medical devices and software, rather than being reimbursed for simply streamlining tasks or enhancing efficiency, are evaluated on their demonstrated patient benefit beyond current treatments. Reimbursement eligibility for these technologies considers clinical outcomes and improvements in patient care, encouraging AI applications that provide added value to medical practice [33,34].
AI-enabled devices used in healthcare in the Republic of Korea must comply with regulations established by the MFDS. Developers seeking approval for AI/ML-enabled medical devices must follow the standard submission process, as no dedicated or preferential pathway exists for AI-based systems. However, the approval process can be expedited for products deemed equivalent to previously approved AI/ML medical software. In such cases, a technical document outlining the new software’s characteristics is sufficient. Equivalence is determined by comparing factors such as intended use, disease classification, specific conditions, training data, and ML methods. If significant differences are identified, clinical trial results must be submitted for evaluation [29]. In February 2024, the MFDS introduced new regulations allowing lower-risk medical software to be submitted without a clinical trial. Additionally, clinical trials for AI-based medical devices can now be conducted at non-MFDS-approved locations, further streamlining the approval process [35]. Real-world evidence, that is, clinical evidence derived from real-world data such as electronic health records and patient registries to evaluate the safety, effectiveness, and outcomes of medical products, has gained wider acceptance in the review of AI/ML-based medical devices, particularly following the July 2023 revision of the Guidance on Review and Approval for Real World Evidence [36].
As with EU and US regulations, detailed documentation is required for submission in the Republic of Korea. Manufacturers of AI medical devices must provide comprehensive technical documentation, including specifications for the cloud computing environment, cybersecurity measures, and algorithm operation principles. Submissions must also include validation and verification reports outlining test data, methods, and standards, as well as cybersecurity risk management plans to address potential vulnerabilities. Clinical validation is mandatory and can be conducted through either prospective or retrospective studies to ensure the device’s safety and efficacy [37].
Unlike the guidelines in the EU and the US, the Republic of Korea does not classify AI/ML technologies in the medical field as “medical devices” per se, owing to the criteria outlined in the Medical Devices Act issued by the MFDS. Instead, these technologies are categorized as software, which carries different legal implications. The software classification is divided into 5 categories: (1) software that supports administrative tasks in healthcare organizations; (2) software related to sports, leisure, and wellness; (3) software intended for education or research; (4) software for managing health records; and (5) software that assists healthcare providers in organizing or tracking patient data and accessing medical information. The fifth category, which includes clinical decision support and diagnostic software, is the most relevant to AI in healthcare and is classified as an AI/ML-enabled medical device. According to the guidelines, clinical decisions cannot be made solely on the basis of the software’s recommendations; clinician involvement is mandatory to ensure patient safety and care. The classification of software by Korean regulatory authorities can be compared to that outlined in the EUAIA [29,37].
An example of a device developed and approved in the Republic of Korea is SwiftMR, a deep learning software created by AIRS Medical to enhance image quality while shortening the time required for MRI acquisitions [38]. Instead of the lengthy process traditionally needed to capture MRI images, this software uses deep learning to enhance images obtained in a shorter imaging time [38,39]. To achieve this, the deep learning program was trained on 130,000 MRI scans from the Republic of Korea [38]. SwiftMR is particularly useful for spinal MRI imaging and has already seen application in this area. The device operates as a fully trained program and does not adapt dynamically based on real-world input data. While detailed information about the approval process for this device in the Republic of Korea is not available, it is understood that the submission process was similar to that of the FDA, due to a partnership between Korea’s MFDS and the FDA [35,36].
Differences and similarities can be observed across the 3 regions (Table 1). Overall, the FDA’s and the MFDS’s approaches to AI implementation in healthcare are less stringent and more flexible compared with Europe’s. This may lead to faster development and adoption of AI systems in the US and the Republic of Korea. Additionally, the absence of a broad legislative framework allows for greater adaptability as the field of AI evolves [8].
PRACTICAL IMPLICATIONS FOR DEVELOPERS OF AI/ML-ENABLED MEDICAL DEVICES
In this section, we will examine how the regulations currently in place impact the research and development of AI/ML-enabled devices in spine surgery and how these regulations should be addressed in the early stages of development.
An ideal AI training dataset may include spine MRIs, x-rays, and perioperative data. Under the EUAIA, datasets must represent the European population, while datasets in the US and Korea must reflect diversity in age, sex, and ethnicity. A diagnostic tool for spine deformities, for example, trained exclusively on older individuals (e.g., those aged 60 years and above), would not be approved for use across all age groups. In such cases, either the target population for the device must be redefined, or a new, more inclusive training dataset should be created. The dataset must be of high quality, comprehensive, and available for inspection by regulators. Its size should be determined by the developers based on the model’s requirements; however, larger datasets generally yield better performance and reduce the risk of bias in the algorithm, making a well-constructed dataset fundamental to optimal model performance. Currently, none of the 3 regions has specific regulations regarding the size or quality of datasets; it is the responsibility of auditors to assess these aspects and ensure the dataset meets the necessary standards for the intended application. However, as mentioned earlier, approval in the EU requires that the dataset be composed of EU patients to ensure regional relevance and compliance with regulatory standards.
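As a practical starting point, demographic coverage can be audited directly from a dataset manifest before submission. The sketch below, in Python, assumes a hypothetical manifest file and column names; the 5% representativeness floor is an illustrative threshold, not a regulatory requirement.

```python
import pandas as pd

# Minimal sketch of a demographic audit; "dataset_manifest.csv" and its
# column names are hypothetical placeholders for a real study manifest.
manifest = pd.read_csv("dataset_manifest.csv")  # one row per imaging study

# Report the distribution of each demographic stratum regulators scrutinize.
for column in ["age_group", "sex", "ethnicity"]:
    print(manifest[column].value_counts(normalize=True).round(3))

# Flag strata below an illustrative 5% representativeness floor.
shares = manifest["age_group"].value_counts(normalize=True)
underrepresented = shares[shares < 0.05]
if not underrepresented.empty:
    print("Underrepresented age groups:", list(underrepresented.index))
```

Running such an audit early, before training begins, makes it possible to either enrich the dataset or narrow the declared target population while the change is still inexpensive.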
If the dataset requires labeling or segmentation, this should be performed by a qualified medical professional. The labeling and segmentation of radiological images should be done by a board-certified radiologist or surgeon to obtain the best outcome.
For the approval of AI/ML-enabled devices, developers must create a detailed technical document that thoroughly outlines the functionality and characteristics of the AI/ML-powered software. Comprehensive understanding and documentation during the development phase are critical to facilitate the regulatory approval process. Currently, some of the most commonly used model types in AI/ML-enabled devices for spine surgery include decision tree learning, support vector machines, artificial neural networks, convolutional neural networks, and generative adversarial networks [40]. Despite their effectiveness, the neural network-based models among these are still considered “black box” algorithms, meaning the internal processes during model training are not fully understood. This lack of transparency poses significant challenges in ensuring the quality, safety, and legal oversight required for comprehensive regulatory control. One approach to address this challenge is the implementation of a human-in-the-loop training process, which allows for partial human supervision and oversight of the training process. When this oversight is conducted by a medical professional, it ensures the quality and safety of the model, potentially facilitating regulatory approval. Human supervision can help identify errors, correct misclassifications, and ensure that the model is learning appropriately.
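In practice, human-in-the-loop oversight often takes the form of a review queue: low-confidence model outputs are routed to a clinician, and corrected labels flow back into the training pool. The following is a minimal sketch under that assumption; the threshold, function names, and data structures are hypothetical.

```python
# Minimal human-in-the-loop sketch: predictions whose confidence falls below
# a threshold are routed to an expert reviewer, and corrected labels are fed
# back into the training pool. All names and the threshold are illustrative.
REVIEW_THRESHOLD = 0.85

def triage_prediction(case_id, label, confidence, review_queue, accepted):
    """Accept confident predictions; queue uncertain ones for expert review."""
    if confidence < REVIEW_THRESHOLD:
        review_queue.append((case_id, label, confidence))
    else:
        accepted[case_id] = label

def apply_expert_review(review_queue, accepted, training_pool, expert_label_fn):
    """expert_label_fn stands in for the radiologist's or surgeon's decision."""
    for case_id, model_label, _ in review_queue:
        corrected = expert_label_fn(case_id)
        accepted[case_id] = corrected
        if corrected != model_label:
            # Misclassifications become new supervised examples for retraining.
            training_pool.append((case_id, corrected))
```

Logging which cases were escalated, and why, also produces an audit trail that can be cited directly in the technical documentation.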
Quality standards, including well-defined troubleshooting and management plans, must be extensive and precise to ensure the safe use of the product. Given ongoing challenges related to transparency and privacy in AI/ML, it is crucial to establish a robust plan for assessing the product’s quality. Regulatory authorities already enforce high standards for traditional medical devices, but these standards are even more stringent for AI and ML systems. For example, the submission of a QMS for AI/ML-enabled medical devices in the EU does not need to be entirely separate from the QMS for traditional medical devices. Both systems rely on International Organization for Standardization (ISO) 13485, which requires a Quality Manual, clearly defined roles and responsibilities, a medical device file, procedures for document control, design and development protocols, and validation [21,41]. Although additional information may be required for AI/ML devices, a separate QMS is generally not necessary. For submissions in other regions, compliance with ISO 13485 is also advised to ensure product safety and quality. To ensure the quality and safety of a device, we recommend the inclusion of a medical professional, such as a spine surgeon or a radiologist, depending on the device’s purpose, to oversee all data-related processes throughout development. This includes key stages such as dataset creation, data labeling (if required), and certain aspects of the training process to provide confirmation and validation. A simple completeness check over these QMS elements is sketched below.
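The sketch is a hedged illustration only: it encodes the ISO 13485 elements listed above as a checklist and reports which items remain outstanding. The keys and status flags are hypothetical and do not represent an official submission format.

```python
# Checklist of the ISO 13485 QMS elements named in the text; the status
# flags are illustrative placeholders, not an official submission format.
qms_elements = {
    "quality_manual": True,
    "roles_and_responsibilities": True,
    "medical_device_file": True,
    "document_control_procedures": False,   # e.g., still in draft
    "design_and_development_protocols": True,
    "validation_procedures": True,
}

missing = [name for name, complete in qms_elements.items() if not complete]
if missing:
    print("QMS elements still outstanding before submission:", missing)
else:
    print("All listed QMS elements documented.")
```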
To leverage the adaptive potential of AI/ML software, a comprehensive Predetermined Change Control Plan (PCCP) is recommended. This plan should anticipate potential adaptations to the software after real-world learning, addressing changes to the device’s indications and algorithm while maintaining safety and transparency. A clear risk mitigation plan is essential to ensure that safety and performance are not compromised by updates. Developers should also define the conditions under which new versions or modifications require regulatory re-submission [42]. For example, the FDA’s Predetermined Change Control Plans for Machine Learning-Enabled Medical Devices: Guiding Principles outlines specific planned modifications, protocols for implementation, and methods for assessing the impact of changes [43]. PCCPs are required across all major regulatory regions, though specifics may vary. The PCCP aims to facilitate the implementation of generative AI, although its clinical application remains in its early stages. A significant regulatory challenge in managing generative AI lies in the structure of the approval process. Even with a PCCP in place, premarket approval is still required, including validation on a small, predefined dataset. This poses a considerable hurdle due to the continuous learning capability and real-world adaptability of generative AI [44]. Projects in spine surgery that incorporate generative AI will likely face delays in regulatory approval until the regulatory landscape evolves to accommodate these unique characteristics. In the meantime, it is advisable to prepare a comprehensive PCCP with a detailed safety and quality plan. These elements are expected to play a central role in future regulatory requirements for generative AI in medical devices.
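One way to make the re-submission conditions concrete is to encode the PCCP’s performance envelope as an automated gate that every candidate update must pass before deployment. The sketch below assumes hypothetical thresholds and metric names; a real envelope would be defined in the premarket submission itself.

```python
# Hedged sketch of a pre-specified change-control gate: an update is deployed
# only if it stays within the performance envelope declared in the PCCP;
# otherwise it is flagged for regulatory re-submission. Thresholds are illustrative.
PCCP_ENVELOPE = {
    "sensitivity_min": 0.90,        # floor declared in the premarket submission
    "specificity_min": 0.88,
    "max_population_shift": 0.10,   # allowed drift in input demographics
}

def change_control_gate(update_metrics: dict, population_shift: float) -> str:
    """Return the disposition of a candidate model update."""
    within_envelope = (
        update_metrics["sensitivity"] >= PCCP_ENVELOPE["sensitivity_min"]
        and update_metrics["specificity"] >= PCCP_ENVELOPE["specificity_min"]
        and population_shift <= PCCP_ENVELOPE["max_population_shift"]
    )
    return "deploy_under_pccp" if within_envelope else "requires_resubmission"

# Example: an update whose specificity dips below the declared floor.
print(change_control_gate({"sensitivity": 0.92, "specificity": 0.85}, 0.04))
```

Automating the gate makes the PCCP auditable: every deployment decision can be traced back to the envelope declared at approval time.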
To obtain approval from regulatory bodies, it is essential to provide solid and relevant data to support the model’s performance. This requires not only an ideal and diverse training dataset but also the evaluation of the model using statistical metrics that are widely recognized in the scientific literature. These metrics ensure the quality, reliability, and robustness of the device. For instance, in the segmentation of medical images, such as spine computed tomography (CT) scans and MRIs, commonly used metrics include the Dice Similarity Coefficient, Intersection-over-Union, specificity, and sensitivity. Together, these metrics provide a comprehensive assessment of the segmentation model’s accuracy and consistency [45]. In contrast, for postoperative outcome predictions, sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve are the preferred evaluation metrics. These measures provide a more holistic understanding of the model’s ability to distinguish between positive and negative outcomes, ensuring its clinical utility and reliability [46].
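For reference, these metrics reduce to a few lines of code. The sketch below computes Dice, Intersection-over-Union, sensitivity, and specificity from binary masks, and an area under the receiver operating characteristic curve from predicted probabilities; the toy arrays stand in for real segmentation outputs and outcome predictions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def dice(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice Similarity Coefficient for binary segmentation masks."""
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum())

def iou(pred: np.ndarray, truth: np.ndarray) -> float:
    """Intersection-over-Union for binary segmentation masks."""
    intersection = np.logical_and(pred, truth).sum()
    union = np.logical_or(pred, truth).sum()
    return intersection / union

def sensitivity_specificity(pred: np.ndarray, truth: np.ndarray) -> tuple:
    """Sensitivity (recall on positives) and specificity (recall on negatives)."""
    tp = np.logical_and(pred == 1, truth == 1).sum()
    tn = np.logical_and(pred == 0, truth == 0).sum()
    fp = np.logical_and(pred == 1, truth == 0).sum()
    fn = np.logical_and(pred == 0, truth == 1).sum()
    return tp / (tp + fn), tn / (tn + fp)

# Toy masks standing in for a spine CT/MRI segmentation output.
truth = np.array([1, 1, 0, 0, 1, 0])
pred = np.array([1, 0, 0, 0, 1, 1])
print(dice(pred, truth), iou(pred, truth), sensitivity_specificity(pred, truth))

# For outcome prediction, AUC over predicted probabilities (illustrative values).
print(roc_auc_score([0, 0, 1, 1], [0.2, 0.4, 0.6, 0.9]))
```

Reporting the same metric definitions used in the peer-reviewed literature allows auditors to compare submitted performance figures directly against published benchmarks.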
The above-mentioned recommendations should be continuously aligned with the latest regulatory guidelines and legislative developments. Given the rapidly evolving regulatory landscape for AI/ML-enabled medical devices, it is essential for developers to actively monitor updates to ensure compliance with the most current requirements.
CONCLUSION
Regulatory approval of AI/ML devices for spine surgery is challenging, with distinct approaches adopted by the EU, US, and Republic of Korea. The EU’s AI Act establishes an overarching framework, while the US and Republic of Korea focus on updated guidance documents to regulate AI/ML-enabled medical devices.
Key factors for successful regulatory approval include the creation of an ideal, diverse, and high-quality training dataset, the inclusion of qualified medical professionals in data labeling and training oversight, and the adoption of rigorous QMS. The inherent “black box” nature of AI models introduces transparency and accountability challenges, but these can be mitigated through human-in-the-loop training and detailed technical documentation. Furthermore, adaptive and generative AI systems pose unique regulatory hurdles due to their continuous learning capacity, necessitating the development of a comprehensive PCCP to address postmarket changes.
Developers must track regulatory updates, maintain documentation, and use robust evaluation metrics. By adopting these strategies early in the development process, developers can streamline regulatory submissions and ensure the safety, effectiveness, and clinical utility of AI/ML-enabled devices in spine surgery.
Notes
Conflict of Interest
The authors have nothing to disclose.
Funding/Support
This study received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author Contribution
Conceptualization: MB, VES; Writing – original draft: MB, VES; Writing – review & editing: SJR, AET, SV, NM, DB, LR, CS.