Artificial Intelligence Detection of Cervical Spine Fractures Using Convolutional Neural Network Models
Abstract
Objective
To develop and evaluate a technique using convolutional neural networks (CNNs) for the computer-assisted diagnosis of cervical spine fractures from radiographic x-ray images. By leveraging deep learning techniques, the study might potentially lead to improved patient outcomes and clinical decision-making.
Methods
This study obtained 500 lateral cervical spine x-ray images from standard open-source dataset repositories to develop a classification model using CNNs. All the images contained diagnostic information and comprised normal cervical spine images (n=250) and cervical spine fracture images (n=250). The model classified whether or not the patient had a cervical spine fracture. Seventy percent of the images were used for model training and 30% for testing. The graphical user interface-based programming of Konstanz Information Miner (KNIME) enabled class label annotation, data preprocessing, CNNs model training, and performance evaluation.
Results
The model showed strong performance in detecting cervical spine fractures across the evaluated metrics. Sensitivity (recall) was 0.886 for fractures and 0.957 for normal cases, indicating its proficiency in identifying true positives. Precision values of 0.954 for fractures and 0.893 for normal cases highlight the model’s ability to minimize false positives. With specificity values of 0.957 for fractures and 0.886 for normal cases, the model effectively identifies true negatives. The overall accuracy was 92.14%, and discrimination was further assessed by the area under the receiver operating characteristic curve.
Conclusion
We successfully used deep learning models for computer-assisted diagnosis of cervical spine fractures from radiographic x-ray images. This approach can assist the radiologist in screening, detecting, and diagnosing cervical spine fractures.
INTRODUCTION
Cervical spine fractures represent a significant subset of traumatic injuries with potentially devastating consequences; their reported incidence in general Western populations ranges from 4 to 17 per 100,000 person-years [1,2]. The cervical spine, comprising the first 7 vertebrae (C1–7), plays a critical role in supporting the weight of the head and facilitating movement [3]. Fractures in this region can result from various mechanisms, including motor vehicle accidents, falls, sports-related injuries, and acts of violence [4,5]. Epidemiological studies have highlighted their prevalence, with certain demographics, such as young adults and the elderly, being particularly vulnerable [6,7]. Current developments in medicine have strengthened evidence-based methods for the diagnosis and treatment of cervical spine fractures by deepening our understanding of these injuries. To confirm the diagnosis and measure the extent of injury, standard diagnostic techniques frequently comprise a clinical examination for initial assessment, followed by imaging studies such as computed tomography, magnetic resonance imaging, and x-rays. Although classic imaging modalities remain vital, medical diagnostics are being transformed by emerging technologies such as artificial intelligence (AI) [8-10].
In recent years, AI algorithms, particularly convolutional neural networks (CNNs), have shown promise in the detection of cervical spine fractures on radiographic images [11,12]. These AI systems analyze vast amounts of imaging data to accurately identify fracture patterns, aiding radiologists and clinicians in timely diagnosis and treatment planning [13]. CNNs models can improve diagnostic accuracy and shorten interpretation time by using machine learning. Clinical guidelines and professional society consensus statements make it possible to standardize care and improve outcomes. Additionally, AI technology in diagnostic workflows may increase efficiency and improve patient care and outcomes [14,15].
The objective of this study is to develop and evaluate a technique using CNNs for the computer-assisted diagnosis of cervical spine fractures from radiographic x-ray images. By leveraging deep learning techniques, the study might potentially lead to improved patient outcomes and clinical decision-making.
MATERIALS AND METHODS
This was an experimental study that retrieved images from an online open-access dataset [16]. This study was conducted in accordance with the Declaration of Helsinki and with approval from the Ethics Committee and Institutional Review Board (IRB No. HREC-UP-HSST 1.1/027/67). Cervical x-ray images were obtained from adult patients, with a focus on the lateral view of the cervical spine. The inclusion criteria consisted of cervical fracture cases confirmed by 3 musculoskeletal radiologists. Exclusion criteria included images from patients who had undergone prior cervical surgery, as well as those showing tumors, infection, inflammation, and congenital cervical disorders.
1. Data Collection and Annotation
In this study, 500 lateral cervical spine x-ray images were obtained from the ‘ChestPelvisCSpineScans’ dataset available on Kaggle (accessed on April 30, 2024) [16]. These images were selected to ensure a balanced representation of both normal cases and cases with cervical spine fractures. Prior to model training, each image underwent meticulous annotation to assign class labels indicating whether it depicted a normal cervical spine or exhibited signs of a fracture. Annotation was performed by expert radiologists to ensure accuracy and consistency in labeling.
2. Data Augmentation
To mitigate the risk of overfitting and enhance the model’s ability to generalize to unseen data, data augmentation techniques were applied to the training dataset. Augmentation involved generating synthetic variations of the original images through transformations such as rotation, translation, scaling, and flipping. These augmented images were then incorporated into the training dataset, effectively increasing its size and diversity. By exposing the model to a wider range of variations, data augmentation helped improve the robustness and resilience of the trained CNNs model.
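The kinds of transformations described above can be illustrated with a minimal, library-free sketch. It is not the study's pipeline; the helper names `flip_horizontal`, `translate`, and `augment`, the zero fill value, and the toy 3×3 image are illustrative assumptions only.

```python
# Minimal augmentation sketch on a grayscale image stored as a list of rows.
# Helper names, the fill value, and the toy image are illustrative assumptions.


def flip_horizontal(image):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in image]


def translate(image, dx, dy, fill=0):
    """Shift the image by dx columns and dy rows, padding exposed pixels with `fill`."""
    height, width = len(image), len(image[0])
    shifted = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            new_x, new_y = x + dx, y + dy
            if 0 <= new_x < width and 0 <= new_y < height:
                shifted[new_y][new_x] = image[y][x]
    return shifted


def augment(image):
    """Return the original image plus two synthetic variants."""
    return [image, flip_horizontal(image), translate(image, dx=1, dy=0)]


toy_image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
variants = augment(toy_image)
```

Each original image here yields three training samples; in practice the transformation parameters would typically be drawn at random for every epoch rather than fixed.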
3. Data Preprocessing
Before training the CNNs model, the input images underwent preprocessing steps to enhance their suitability for model training. Data preprocessing involved resizing images to 256×256 pixels and normalizing pixel values to a range of 0 to 1. Data augmentation techniques, implemented using the ImageDataGenerator from the Keras library, included rotation, translation, scaling, and flipping to artificially increase the number of training samples, thereby improving the model’s robustness and generalization performance.
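The resizing and normalization steps can be sketched without any imaging library. This is an illustrative stand-in rather than the Keras-based implementation used in the study; the function names, the nearest-neighbour interpolation, and the assumption of 8-bit input pixels (0 to 255) are not taken from the article.

```python
# Preprocessing sketch: nearest-neighbour resize, then scale pixels to [0, 1].
# Function names, interpolation method, and the 8-bit input range are assumptions.


def resize_nearest(image, new_height, new_width):
    """Resize by picking, for each target pixel, the nearest source pixel."""
    height, width = len(image), len(image[0])
    return [
        [image[(r * height) // new_height][(c * width) // new_width]
         for c in range(new_width)]
        for r in range(new_height)
    ]


def normalize(image, max_value=255):
    """Map integer pixel values in [0, max_value] to floats in [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


def preprocess(image, size=256):
    """Resize to size x size and normalize, mirroring the two steps in the text."""
    return normalize(resize_nearest(image, size, size))


toy_image = [
    [0, 255],
    [128, 64],
]
prepared = preprocess(toy_image)  # 256 x 256 grid of floats in [0, 1]
```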
4. CNNs Model Architecture
The CNNs architecture consisted of several convolutional layers followed by max-pooling layers for feature extraction and dimensionality reduction. Batch normalization and dropout layers were incorporated to prevent overfitting and improve model generalization. The final layers of the network comprised fully connected layers and softmax activation for classification into fracture and normal classes. Hyperparameters were optimized through grid search. The choice of CNNs architecture was based on previous research and experimentation to optimize performance for the task of cervical spine fracture detection.
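As a small illustration of the final classification step, the sketch below shows how a softmax layer turns two raw output scores into class probabilities. The class order, the function names, and the example scores are hypothetical; the article does not report the network's raw outputs.

```python
import math

CLASS_NAMES = ("fracture", "normal")  # class order is an assumption


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - highest) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def classify(logits):
    """Return the most probable class name together with all probabilities."""
    probabilities = softmax(logits)
    best_index = probabilities.index(max(probabilities))
    return CLASS_NAMES[best_index], probabilities


label, probabilities = classify([2.0, 0.5])  # hypothetical raw scores
```

With only two classes, choosing the larger softmax probability is equivalent to applying a 0.5 decision threshold to the fracture probability.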
5. Training and Testing
The dataset was randomly split into training and testing sets, with 70% of the data used for training and the remaining 30% for testing. During training, the model iteratively adjusted its parameters to minimize the loss function, optimizing performance on the training data. The testing set was used to evaluate the model’s performance on unseen data, providing an objective assessment of its generalization ability. Training was performed using stochastic gradient descent optimization with momentum, and the model’s performance was monitored using validation data to prevent overfitting.
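A 70/30 random split of a balanced dataset might be sketched as follows. The article states only that the split was random; splitting within each class (stratification), the fixed seed, and the function name `stratified_split` are assumptions made here so that both subsets keep the 1:1 class ratio.

```python
import random


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_indices, test_indices), preserving the class ratio."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train_indices, test_indices = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_indices.extend(indices[:cut])
        test_indices.extend(indices[cut:])
    return train_indices, test_indices


# 250 normal and 250 fracture labels, matching the dataset description.
labels = ["normal"] * 250 + ["fracture"] * 250
train_indices, test_indices = stratified_split(labels)
```

Under these assumptions, 500 labels divide into 350 training and 150 testing indices with no overlap between the two sets.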
6. Software and Tools
The development and implementation of the CNNs model were facilitated by the graphical user interface-based programming environment of Konstanz Information Miner (KNIME Analytics Platform, version 5.2.4) [17] (Fig. 1). KNIME provided a user-friendly interface for data preprocessing, model development, and performance evaluation, enabling efficient workflow design and execution. Python scripting within KNIME allowed for seamless integration of deep learning libraries such as TensorFlow or PyTorch for model training and evaluation. The use of KNIME simplified the development process and enabled reproducibility of results.
7. Statistical Analysis
Descriptive statistics were employed to summarize the dataset’s characteristics, encompassing measures such as mean, median, standard deviation, minimum, and maximum values. These statistics provided insights into the distribution and variability of the cervical spine x-ray images and their associated labels (normal or fracture). To evaluate the CNNs model’s efficacy in detecting cervical spine fractures, various performance metrics were utilized, including sensitivity (recall), precision, specificity, accuracy, area under the receiver operating characteristic curve (AUC), F1 score, Matthews correlation coefficient, and Cohen kappa. Performance metrics of different models were compared using appropriate statistical tests, such as Student t-tests, chosen according to the nature of the data and the test assumptions. This analysis allowed for determining whether observed differences were statistically significant or arose from random variation. Formal hypothesis tests were conducted to assess the significance of disparities in performance metrics.
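For readers less familiar with the two agreement-type metrics listed above, the following sketch computes the Matthews correlation coefficient and Cohen kappa from the four cells of a binary confusion matrix using their standard definitions. The function names and the example counts are hypothetical and are not the study's data.

```python
import math


def matthews_corrcoef(tp, fp, fn, tn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # conventional value when a row or column is empty
    return (tp * tn - fp * fn) / denominator


def cohen_kappa(tp, fp, fn, tn):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    total = tp + fp + fn + tn
    observed = (tp + tn) / total
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total**2
    if expected == 1:
        return 0.0  # degenerate case: only one class present
    return (observed - expected) / (1 - expected)


# Hypothetical counts for illustration only (not taken from the article).
mcc = matthews_corrcoef(tp=40, fp=5, fn=10, tn=45)
kappa = cohen_kappa(tp=40, fp=5, fn=10, tn=45)
```

Both metrics equal 1 for perfect classification and are close to 0 when predictions are no better than chance, which makes them useful complements to accuracy.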
8. Performance Evaluation
The workflow for training, testing, and performance evaluation of the neural network model is shown in Fig. 2. The performance of the CNNs model was evaluated using a range of metrics, including sensitivity (recall), precision, specificity, accuracy, and AUC. Sensitivity measures the proportion of true positives correctly identified by the model, while precision measures the proportion of predicted positives that are true positives. Specificity quantifies the model’s ability to correctly identify true negatives, while accuracy represents the overall proportion of correct predictions. AUC provides a summary of the model’s performance across different decision thresholds, with higher values indicating better discrimination between classes. Additionally, metrics such as F1 score, Matthews correlation coefficient (MCC), and Cohen kappa were calculated to provide a comprehensive assessment of model performance. The accuracy of tests was interpreted using a scale based on the AUC metric, which divides AUC values into 6 categories ranging from excellent to worthless. AUC values between 0.9 and 1.0 are considered excellent, while those between 0.8 and 0.9 are good. The fair, poor, and fail categories cover the successively lower AUC ranges between 0.8 and 0.5, and anything below 0.5 is deemed worthless. This scale offers a concise framework for assessing the reliability and utility of tests, facilitating informed decision-making in various fields, particularly in machine learning and medical diagnostics.
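One way to see what the AUC summarizes is to compute it as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below uses this pairwise definition; the function name `roc_auc` and the example labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("Both classes are required to compute AUC.")
    wins = 0.0
    for positive_score in positives:
        for negative_score in negatives:
            if positive_score > negative_score:
                wins += 1.0
            elif positive_score == negative_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = fracture) and predicted fracture probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

Because the pairwise form compares only rankings, the AUC does not depend on any single decision threshold, which is why it complements threshold-based metrics such as sensitivity and specificity.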
RESULTS
The performance of the CNNs model for cervical spine fracture detection is reported in Table 1. This study aimed to develop and evaluate a model for detecting cervical spine fractures using a combination of machine learning techniques, and Table 1 presents a comprehensive analysis of the model’s performance metrics for both cervical spine fracture cases and normal cases. The recall or sensitivity of our model refers to its ability to correctly identify positive cases out of all actual positive cases. For cervical spine fractures, the recall was found to be 88.6%, indicating that the model successfully detected 88.6% of all actual cervical spine fracture cases. Similarly, for normal cases, the recall was 95.7%, demonstrating the model’s effectiveness in accurately identifying cases without fractures. These high recall values suggest that our model exhibits a strong ability to capture true positives across both classes. Precision measures the proportion of correctly identified positive cases out of all cases predicted as positive by the model. In our study, the precision for cervical spine fractures was calculated at 95.4%, indicating that the majority of cases predicted as positive by the model were indeed true positives. For normal cases, the precision was 89.3%, suggesting a slightly lower but still substantial level of precision in identifying cases without fractures. These precision values highlight the model’s ability to minimize false positives while maximizing true positive identifications. Specificity represents the proportion of correctly identified negative cases out of all actual negative cases. Our model achieved a specificity of 95.7% for cervical spine fractures and 88.6% for normal cases, indicating its effectiveness in correctly identifying negative cases and thus avoiding false positives. These high specificity values are crucial for ensuring the accuracy of the model in distinguishing between positive and negative cases, thereby reducing the risk of misdiagnosis.
The F-measure, also known as the F1 score, is the harmonic mean of precision and recall and provides a balanced assessment of a model’s performance. In our study, the F-measure for cervical spine fractures was 91.9%, while for normal cases, it was 92.4%. These values indicate a strong balance between precision and recall, suggesting that our model achieves high accuracy in classifying both cervical spine fracture and normal cases. Overall accuracy of our model was determined to be 92.14%, with an error rate of 7.86%. These results demonstrate the model’s ability to accurately classify cervical spine fracture cases and normal cases with a high degree of accuracy. The relatively low error rate further validates the effectiveness of our model in making correct predictions. The MCC is a measure of the quality of binary classifications, considering true and false positives and negatives. In our study, the MCC for cervical spine fractures was calculated at 0.796, indicating a moderate level of correlation between the predicted and actual classifications. For normal cases, the MCC was 0.927, suggesting a strong correlation between predicted and actual outcomes. These MCC values reinforce the reliability and robustness of our model in classifying cervical spine fracture cases and normal cases. The AUC provides a measure of a model’s discrimination ability. Our model achieved an AUC of 0.921 for cervical spine fractures and 0.942 for normal cases, indicating excellent discrimination ability in distinguishing between positive and negative cases.
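The per-class recall, precision, specificity, and F-measure, together with the overall accuracy, all follow from a single 2×2 confusion matrix. The sketch below illustrates the arithmetic. The four counts are hypothetical: the text does not state the confusion matrix, and these values were chosen only because they reproduce the rounded percentages quoted above.

```python
# HYPOTHETICAL confusion-matrix counts (fracture = positive class).
# They are not reported in the article; they are one set of values that
# reproduces the rounded per-class metrics quoted in the text.
TP, FN = 62, 8   # fracture images predicted as fracture / as normal
TN, FP = 67, 3   # normal images predicted as normal / as fracture


def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return safe_ratio(2 * precision * recall, precision + recall)


fracture = {
    "recall": safe_ratio(TP, TP + FN),
    "precision": safe_ratio(TP, TP + FP),
    "specificity": safe_ratio(TN, TN + FP),
}
# For the normal class the roles of positive and negative are swapped.
normal = {
    "recall": safe_ratio(TN, TN + FP),
    "precision": safe_ratio(TN, TN + FN),
    "specificity": safe_ratio(TP, TP + FN),
}
fracture["f1"] = f1(fracture["precision"], fracture["recall"])
normal["f1"] = f1(normal["precision"], normal["recall"])

accuracy = safe_ratio(TP + TN, TP + TN + FP + FN)
error_rate = 1 - accuracy
```

The swap of roles between the two classes explains why the recall of one class always equals the specificity of the other in a binary problem.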
The summary of model performance is reported in Table 2. It summarizes the key performance metrics of our model for detecting cervical spine fractures and normal cases. The overall accuracy for cervical spine fracture detection was 95.56%, with an error rate of 8.15%. Cohen kappa coefficient, which measures interrater agreement, was calculated at 0.843, indicating substantial agreement between predicted and actual classifications. The consistency of the MCC at 0.796 across different metrics reflects the reliability and stability of our model’s performance. The AUC values of 0.921 for cervical spine fractures and 0.942 for normal cases further confirm the model’s strong discrimination ability (Fig. 3). This model demonstrates high accuracy, precision, recall, and specificity in detecting cervical spine fractures, while also exhibiting robust performance across various evaluation metrics. These findings underscore the potential of our model as a valuable tool for assisting healthcare professionals in the accurate and timely diagnosis of cervical spine fractures. The final application and prediction of cervical spine fractures are shown in Fig. 4.
DISCUSSION
The successful application of CNNs through KNIME in detecting cervical spine fractures represents a significant advancement in x-ray image analysis and diagnosis. The model achieved a sensitivity (recall) of 0.957 for normal images and 0.886 for cervical fracture images, indicating its high ability to identify true positives. The specificity was 0.957 for cervical fracture images and 0.886 for normal images, reflecting its capability to minimize false positives. These values compare favorably with a previous study [18], highlighting the effectiveness of the model. However, further validation with more diverse datasets is necessary to confirm these findings. The results of this study underscore the potential of AI techniques to augment radiologists’ capabilities and improve patient care in orthopedic practice. One of the key strengths of CNNs lies in their ability to automatically learn features from images, making them well-suited for analyzing radiographic x-ray images [19]. By training on a dataset of cervical spine x-rays, our CNNs model effectively learned to distinguish between normal images and those indicative of fractures. The high sensitivity, precision, and specificity values obtained in our study demonstrate the model’s robust performance in correctly classifying cases, thereby minimizing false positives and false negatives. The development of a computer-assisted diagnosis technique for cervical spine fractures has several clinical implications. First and foremost, such a technique has the potential to significantly reduce the time and effort required for fracture detection and diagnosis. Radiologists often face heavy workloads, and automating repetitive tasks like fracture detection can free up their time to focus on more complex cases and patient care.
Introducing AI into medical diagnostics raises ethical considerations regarding patient privacy, consent for data use, and the potential for algorithmic bias. Ensuring transparency in AI model development, rigorous validation against diverse populations, and ongoing monitoring for bias are essential to maintain trust in these technologies [20,21]. Furthermore, education and training for healthcare professionals on AI integration can foster acceptance and effective use of these tools in clinical practice, promoting a balance between technological advancement and ethical responsibility in patient care. Implementing AI in fracture detection may lead to initial costs related to technology acquisition, training, and infrastructure integration. However, over time, these investments can potentially reduce healthcare costs by improving diagnostic efficiency, reducing unnecessary procedures, and optimizing resource allocation. Scalability remains a critical factor, as AI systems must be adaptable across various healthcare settings and accessible to healthcare providers globally [20,22]. Collaborative efforts among industry stakeholders, policymakers, and healthcare institutions are essential to ensure affordability, scalability, and sustainability of AI-based fracture detection solutions. By fostering innovation and strategic investment, healthcare systems can leverage AI to achieve cost-effective improvements in patient care and clinical outcomes [20]. The integration of AI-based fracture detection systems into clinical practice necessitates robust regulatory oversight and adherence to legal standards. Clear guidelines are essential for data privacy, security, and the responsible use of AI in healthcare settings [21]. Regulatory bodies must collaborate with healthcare professionals, AI developers, and policymakers to establish standards that ensure the safety, effectiveness, and ethical deployment of these technologies. 
Legal frameworks should address liability issues, algorithm transparency, and patient consent requirements to safeguard both practitioners and patients [20,21]. By fostering a supportive regulatory environment, healthcare systems can harness the full potential of AI while mitigating risks and maintaining patient trust and safety.
AI-based approaches in fracture detection can improve diagnostic accuracy and consistency. Human interpretation of radiographic images is inherently subjective and can be influenced by factors such as fatigue and experience level. In contrast, AI models can analyze images objectively and consistently, leading to more reliable diagnoses [21,23]. Another important consideration is the potential to improve patient outcomes through earlier detection and intervention. Cervical spine fractures can have serious consequences if not promptly diagnosed and treated [24]. By assisting radiologists in detecting fractures more quickly and accurately, AI-based approaches can facilitate timely interventions, leading to better patient outcomes and potentially reducing the risk of complications [25]. However, it is essential to acknowledge the limitations and challenges associated with the clinical implementation of AI-based fracture detection systems. The potential lack of diversity in the dataset, both in terms of fracture types and patient demographics, may limit the generalizability of the CNN model’s performance. Additionally, the relatively small dataset size and potential data imbalances within the fracture class could affect the model’s robustness and reliability. Variability in image annotation and the model’s black-box nature further challenge its interpretability and acceptance in clinical practice. Moreover, the evaluation of model performance using data from standard open-source repositories may not fully represent the complexities of real-world clinical settings, highlighting the need for external validation on diverse clinical datasets. Finally, practical challenges associated with clinical implementation, such as regulatory approval and integration with existing systems, must be addressed for the successful adoption of AI-based fracture detection systems.
CONCLUSION
The CNNs-based computer-assisted diagnosis technique for cervical spine fractures represents a significant step forward in leveraging AI for spinal practice. By harnessing the power of deep learning, we have demonstrated the potential to enhance fracture detection and diagnosis, ultimately improving patient care and outcomes. Continued research and collaboration between clinicians, data scientists, and technologists will be essential to realize the full potential of AI in orthopedic spine and neurosurgical imaging.
Notes
Conflict of Interest
The authors have nothing to disclose.
Funding/Support
This study received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author Contribution
Conceptualization: WL, IH, WC, KDR; Data curation: WL, WC, PS; Formal analysis: WL, IH; Methodology: WL; Project administration: WL; Visualization: WL, IH, WC, PS, KDR; Writing – original draft: WL; Writing – review & editing: WL, IH.