Article
Peer-Review Record

Detection of Elbow OCD in the Ultrasound Image by Artificial Intelligence Using YOLOv8

by Atsuyuki Inui 1,*, Yutaka Mifune 1, Hanako Nishimoto 1, Shintaro Mukohara 1, Sumire Fukuda 2, Tatsuo Kato 1, Takahiro Furukawa 1, Shuya Tanaka 1, Masaya Kusunose 1, Shunsaku Takigami 1, Yutaka Ehara 1 and Ryosuke Kuroda 1
Reviewer 1:
Reviewer 2:
Reviewer 3:
Submission received: 2 May 2023 / Revised: 23 June 2023 / Accepted: 26 June 2023 / Published: 28 June 2023
(This article belongs to the Special Issue Artificial Intelligence (AI) in Healthcare)

Round 1

Reviewer 1 Report

1. What are the differences between YOLOv8m and YOLOv8n?

2. All the results in the experiments are close to 100%. Does this mean the YOLOv8 model perfectly solves the problem of detecting elbow OCD in ultrasound images?

3. What kind of results would be obtained if other models were used?

4. What is the creativity of the paper? It seems that people can solve the problem just by using a YOLOv8 model.

5. The introduction of the YOLO model should not be in the discussion section of the paper but in the introduction section.

Some typos remain, such as "(Figure. 2 and 3)" at line 94 on page 4, and some stray commas are left in the caption of Fig. 3, etc.

Author Response

Thank you for reviewing our paper. The reviewers' comments are valid, and we believe that responding to them will improve the quality of our paper. Below is a point-by-point response to the reviewer's comments.

Reviewer 1

1. What are the differences between YOLOv8m and YOLOv8n?

The models differ in their number of parameters and in calculation speed. We added this information to the Methods section and to Table 4.
‘The YOLOv8n model, which has the fewest parameters (parameter size 3.0M) among the pre-trained models in YOLOv8, and the YOLOv8m model, which has a moderate number of parameters (parameter size 25.8), were selected as object detection models. These models were compared to the YOLOv5, a previous-generation model presented by the same group. As YOLOv5 models, the YOLOv5n model (parameter size 1.8M), and the YOLOv8m model (parameter size 20.8) were used.’ (L186-192)
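For clarity, a minimal sketch of how the compared detection models could be trained with the Ultralytics API is given below. The dataset file name, epoch count, and image size are illustrative assumptions, not the authors' exact settings.

```python
# Minimal sketch (assumed settings): training the YOLOv8n and YOLOv8m detectors
# with the Ultralytics API. "elbow_ocd.yaml" is a placeholder dataset config and
# the hyperparameters are illustrative; the YOLOv5 models were trained separately
# via the ultralytics/yolov5 repository.
from ultralytics import YOLO

for weights in ("yolov8n.pt", "yolov8m.pt"):
    model = YOLO(weights)                   # load pre-trained weights
    model.train(data="elbow_ocd.yaml",      # dataset definition (placeholder)
                epochs=100, imgsz=640)      # assumed training settings
```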

2. All the results in the experiments are close to 100%. Does this mean the YOLOv8 model perfectly solves the problem of detecting elbow OCD in ultrasound images?

The trained models showed nearly 100% accuracy on the present datasets. However, overfitting of the model can still occur. We added a statement on this point to the limitations.

‘the videos were captured by multiple experts, but the labeling was conducted by a single orthopedic surgeon who is skilled in AI research. This may have caused overfitting of the AI model. Therefore, labeling of the images by multiple experts may be necessary to improve the generalization performance of the AI model. In addition, the accuracy of the actual medical checkups has not been verified, and future examinations are needed to verify the accuracy of the data.’ (L381-386)

3. What kind of results would be obtained if other models were used?

We compared the results from the YOLOv5 models as well as the data from a previous report.

‘Several YOLO models were trained to detect the standard view of the elbow joint and OCD lesions were detected in the object detection task, the yolov8n model showed mAP(50) of 0.991 and mAP(50-95) of 0.784, while the yolov8m model showed mAP(50) of 0.991 and mAP(50-95) of 0.784. In the YOLOv5 model, mAP(50) and mAP(50-95) were 0.998 and 0.666 for the YOLOv5n model, respectively, and 0.993 and 0.714 for the YOLOv5m model, respectively. The mAP values for the YOLOv8 models were higher than those for the YOLOv5 models’ (L286-292)
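As a rough illustration of how mAP(50) and mAP(50-95) can be read from a trained Ultralytics detector, a validation call like the one sketched below returns both metrics; the weights path shown is a typical default location and is assumed here.

```python
# Sketch only: reading mAP(50) and mAP(50-95) after validation with Ultralytics.
# The weights path is a typical default location, used here as an assumption.
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")
metrics = model.val()                      # evaluates on the validation split
print("mAP(50):   ", metrics.box.map50)    # mAP at IoU threshold 0.5
print("mAP(50-95):", metrics.box.map)      # mAP averaged over IoU 0.5-0.95
```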

‘In a prior investigation, we performed a binary classification task using the image with OCD or images without OCD [21]. Images were captured from only two views, anterior long and posterior long axis. Three DL models were compared, and the accuracy was 0.818 in ResNet50, 0.841 in MobileNet_v2, and 0.872 in EfficientNet. In the present study, the accuracy of binary classification was 0.998 in YOLOv8n-cls model which is higher compared to the previous report.’ (L308-313)
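For reference, binary classification with the YOLOv8n-cls model can be set up roughly as in the sketch below; the folder layout, settings, and file names are assumptions for illustration only.

```python
# Sketch (assumed layout): binary classification of normal vs. OCD images with YOLOv8n-cls.
# "elbow_cls" is a placeholder folder containing train/ and val/ subfolders, each with
# one directory per class; the settings are illustrative, not the authors' exact values.
from ultralytics import YOLO

clf = YOLO("yolov8n-cls.pt")                    # pre-trained classification weights
clf.train(data="elbow_cls", epochs=100, imgsz=224)

result = clf("sample_elbow_image.png")[0]       # single-image inference (placeholder path)
top1 = int(result.probs.top1)
print(result.names[top1], float(result.probs.top1conf))  # predicted class and confidence
```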

 

4. What is the creativity of the paper? It seems that people can solve the problem just by using a YOLOv8 model.

YOLOv8 is the latest model in the YOLO series and offers fast object detection. However, overfitting to the dataset is always a concern, and we have not yet tested the model in actual medical checkup situations. We added sentences on these points as a limitation.

‘Secondly, the videos were captured by multiple experts, but the labeling was conducted by a single orthopedic surgeon who is skilled in AI research. This may have caused overfitting of the AI model. Therefore, labeling of the images by multiple experts may be necessary to improve the generalization performance of the AI model. In addition, the accuracy of the actual medical checkups has not been verified, and future examinations are needed to verify the accuracy of the data.’ (L380-386)

5. The introduction of the YOLO model should not be in the discussion section of the paper but in the introduction section.

We moved the description of the YOLO model to the introduction section in accordance with the comments.

Reviewer 2 Report

In this paper, the authors applied the YOLOv8 method for image classification and object detection tasks. I have comments for the authors to address.

1. It seems that the authors directly apply a well-established model to their own datasets. The novelty seems to be limited. The motivation and novelty of the method should be clarified.

2. The authors only evaluate the performance of the YOLOv8 method. I'm wondering if the authors could consider some reference methods (e.g., conventional or DL-based) and compare the performance of the proposed method to those of the reference methods. This experiment may enhance the importance of the paper.

 

Author Response

Thank you for reviewing our paper. The reviewers' comments are valid, and we believe that responding to them will improve the quality of our paper. Below is a point-by-point response to the reviewer's comments.

Reviewer 2

In this paper, the authors applied the YOLOv8 method for image classification and object detection tasks. I have comments for the authors to address.

 

1. It seems that the authors directly apply a well-established model to their own datasets. The novelty seems to be limited. The motivation and novelty of the method should be clarified.

The novelty of the present study is that the model can detect four ‘standard views’ of the elbow joint as well as OCD lesions. In a clinical situation, it is difficult for inexperienced examiners to obtain images accurate enough for reliable evaluation. Detection of the ‘standard view’ by AI serves as a form of explainable AI and can help inexperienced examiners.

‘we used a "two-shot" detection model that detects the standard view of elbow joints and OCD lesions. This method prevents false positives and can be easily reproduced by inexperienced examiners.’ (L371-373)
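One way to read this "two-shot" approach is as a gating rule: an OCD detection is reported only when the same frame also contains a detected standard view. The sketch below is our illustrative interpretation; the class indices, weights path, and confidence threshold are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a "two-shot" gating rule: report OCD candidates only in frames
# where a standard view of the elbow joint is also detected. Class IDs, weights path,
# and confidence threshold are assumed for illustration.
from ultralytics import YOLO

STANDARD_VIEW_CLASSES = {0, 1, 2, 3}   # assumed IDs for the four standard views
OCD_CLASS = 4                          # assumed ID for the OCD lesion class

model = YOLO("best.pt")                # placeholder for trained detection weights
result = model("elbow_frame.png", conf=0.5)[0]

classes = [int(c) for c in result.boxes.cls]
has_standard_view = any(c in STANDARD_VIEW_CLASSES for c in classes)
ocd_boxes = [box for box, c in zip(result.boxes.xyxy, classes) if c == OCD_CLASS]

# Suppressing lesion reports from off-view frames is one way such a scheme could
# reduce false positives for inexperienced examiners.
if has_standard_view and ocd_boxes:
    print(f"OCD lesion candidate(s) within a standard view: {len(ocd_boxes)}")
```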

 

2. The authors only evaluate the performance of the YOLOv8 method. I'm wondering if the authors could consider some reference methods (e.g., conventional or DL-based) and compare the performance of the proposed method to those of the reference methods. This experiment may enhance the importance of the paper.

We added the results from the YOLOv5 object detection models and the data from our previous report, which performed image classification.

‘Several YOLO models were trained to detect the standard view of the elbow joint and OCD lesions were detected in the object detection task, the yolov8n model showed mAP(50) of 0.991 and mAP(50-95) of 0.784, while the yolov8m model showed mAP(50) of 0.991 and mAP(50-95) of 0.784. In the YOLOv5 model, mAP(50) and mAP(50-95) were 0.998 and 0.666 for the YOLOv5n model, respectively, and 0.993 and 0.714 for the YOLOv5m model, respectively. The mAP values for the YOLOv8 models were higher than those for the YOLOv5 models’ (L286-292)

‘In a prior investigation, we performed a binary classification task using the image with OCD or images without OCD [21]. Images were captured from only two views, anterior long and posterior long axis. Three DL models were compared, and the accuracy was 0.818 in ResNet50, 0.841 in MobileNet_v2, and 0.872 in EfficientNet. In the present study, the accuracy of binary classification was 0.998 in YOLOv8n-cls model which is higher compared to the previous report.’ (L308-313)

Reviewer 3 Report

The paper presents a study that evaluates the diagnostic accuracy of YOLOv8, a deep learning-based artificial intelligence model, for ultrasound (US) images of elbow osteochondritis dissecans (OCD) lesions or normal elbow joint images. The authors argue that early detection of OCD using ultrasound is crucial for successful conservative treatment, and they aim to determine whether YOLOv8 can accurately distinguish between normal and OCD-affected elbow joints.

 

The study utilized a dataset of 2,430 images, which were subjected to image classification and object detection using the YOLOv8 model. The model's performance was evaluated based on various metrics derived from the confusion matrix. The results demonstrated high accuracy, recall, precision, and F-measure values for the binary classification of normal and OCD lesions. The mean average precision (mAP) was also calculated to assess the agreement between the detected bounding boxes and the true labels.
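For readers less familiar with these metrics, the minimal sketch below shows how accuracy, precision, recall, and F-measure follow from a binary confusion matrix; the counts used are placeholders, not values from the paper.

```python
# Minimal sketch: deriving the reported metrics from a binary confusion matrix.
# The example counts are placeholders, not data from the paper.
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)          # fraction of predicted OCD that is truly OCD
    recall = tp / (tp + fn)             # fraction of true OCD cases that are found
    f_measure = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

print(binary_metrics(tp=95, fp=2, fn=3, tn=100))  # illustrative counts only
```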

 

Based on the abstract alone, it appears that the YOLOv8 model achieved excellent diagnostic accuracy for both image classification and object detection tasks. The reported metrics, such as accuracy, recall, precision, F-measure, and mAP, indicate strong performance in distinguishing between normal and OCD-affected elbow joints. These findings suggest that YOLOv8 has the potential to be utilized for mass screening during medical check-ups for baseball elbow.

 

The study presents promising results regarding the diagnostic accuracy of YOLOv8 for ultrasound images of elbow osteochondritis dissecans. The paper indicates strong performance in both image classification and object detection tasks.

 

 

There are several major negative points that need to be addressed in the full paper.

 

Firstly, the paper indicates that there is insufficient discussion of related work. A comprehensive review of existing literature and related studies is essential for contextualizing the research and understanding how the proposed approach compares to previous methods. It is crucial to provide a thorough discussion of the strengths and limitations of other techniques used for OCD detection in ultrasound images of the elbow. This would help readers understand the novelty and contribution of the YOLOv8 model in the field.

 

Furthermore, the paper does not provide sufficient information regarding the preprocessing steps applied to the ultrasound images. Preprocessing plays a crucial role in image analysis tasks and can significantly impact the performance of deep learning models. The paper should elaborate on the specific preprocessing techniques used, such as image normalization, noise reduction, or image enhancement, and discuss their potential influence on the model's accuracy. This information is essential for replicability and understanding the robustness of the proposed approach.

 

Lastly, it is mentioned in the paper that the YOLOv8 model was evaluated, but there is no mention of comparing its performance with other architecture or models for object detection. A comprehensive evaluation would involve benchmarking the YOLOv8 model against other commonly used architectures in the field of object detection to assess its superiority or competitiveness. This comparative analysis would provide a more comprehensive understanding of the strengths and weaknesses of YOLOv8 for ultrasound-based OCD detection.

 

In conclusion, while the abstract highlights some positive aspects of the paper, such as high diagnostic accuracy achieved by the YOLOv8 model for OCD detection, several major negative points need to be addressed in the full paper. These include the lack of discussion on related work, the need for more detailed result analysis and interpretation, and the absence of information regarding image preprocessing steps. Additionally, it is crucial to compare the performance of YOLOv8 with other architectures or models to assess its effectiveness in comparison. Addressing these points would enhance the paper's scientific rigor and provide a more comprehensive evaluation of the proposed approach.

The quality of English language is generally good. The sentences are well-structured, and the ideas are conveyed clearly. The introduction effectively provides an overview of the research topic, the significance of OCD in youth baseball players, and the role of imaging techniques in its detection. The author effectively discusses the limitations of current methods and introduces the potential of AI and DL techniques to address these limitations. Additionally, the introduction includes appropriate technical terms and references to support the statements made. However, there are a few minor areas where the language could be improved for better readability and clarity. These include minor grammatical errors, sentence restructuring, and the use of more precise terminology in certain instances. 

Author Response

Thank you for reviewing our paper. The reviewers' comments are valid, and we believe that responding to them will improve the quality of our paper. Below is a point-by-point response to the reviewer's comments.
We also corrected grammatical errors throughout the manuscript.

Reviewer 3 

There are several major negative points that need to be addressed in the full paper.

 

Firstly, the paper indicates that there is insufficient discussion of related work. A comprehensive review of existing literature and related studies is essential for contextualizing the research and understanding how the proposed approach compares to previous methods. It is crucial to provide a thorough discussion of the strengths and limitations of other techniques used for OCD detection in ultrasound images of the elbow. This would help readers understand the novelty and contribution of the YOLOv8 model in the field.

We added a review of related studies to the introduction and discussion sections to emphasize the problems of elbow OCD screening and treatment.

‘Baseball pitching can produce excessive stress on the anterior part of the capitellum, where most OCD lesions in throwing athletes are found. Mechanical conditions may play a role in elbow OCD and bone bruises may be a precursor to an OCD lesion [3].’ L29-32

‘According to the systematic review from Sayani et al. in 2021, nonoperative treatment was similar in outcomes to surgical treatment for low-grade lesions, whereas surgical treatment was superior for higher-grade lesions. There was no significant difference in the magnitude of improvement or overall scores according to the type of surgery for stable or unstable lesions [4]’ L37-41

‘In 2018, Yoshizuka et al. reported the high accuracy of US imaging for OCD diagnosis [6]. The study compared the diagnostic accuracies of US and MRI with intraoperative OCD fragment stability findings. They found that US was a useful tool for evaluating fragment instability in OCD and achieved superior accuracy compared with MRI criteria (96% vs. 73%). US screening for OCD is essential for early detection and successful conservative treatment. Group examinations such as the "Medical Checkup for Baseball Elbow" have been conducted nationwide for the early detection of OCD. In 2016, Iwame et al. reported that about 30% of youth baseball players had episodes of elbow pain and 4% of young baseball players had an abnormal finding on initial ultrasonography screening [7]. In 2022, Ikeda et al. reported a study of a car-mounted mobile MRI for on-field screening of OCD in young baseball players. Mobile MRI had higher sensitivity than ultrasonography and could detect OCD from early stages to healing [8]. However, not all baseball teams have access to mobile MRI, and screening by US is currently practical.’ (L49-63)

‘In 2017, Otoshi et al. reported the frequency of elbow OCD by baseball position. Among total of 4249 participants, the overall prevalence of capitellar OCD diagnosed by US imaging was 2.2% (93 participants). As for playing positions, catchers had the highest prevalence of OCD (3.4%) followed by pitchers (2.5%). The prevalence for infielders and outfielders was 2.2% and 1.8%, respectively. There was no significant difference in the incidence of OCD by position [20].’ L267-272

 

Furthermore, the paper does not provide sufficient information regarding the preprocessing steps applied to the ultrasound images. Preprocessing plays a crucial role in image analysis tasks and can significantly impact the performance of deep learning models. The paper should elaborate on the specific preprocessing techniques used, such as image normalization, noise reduction, or image enhancement, and discuss their potential influence on the model's accuracy. This information is essential for replicability and understanding the robustness of the proposed approach.

We added a description of the image preprocessing and augmentation methods.

‘Movies were recorded at 30 frames per second while the 15 or 18 MHz linear US probe (Arietta prologue, FUJIFILM, Tokyo, Japan) was placed in the anterior or posterior center of the elbow joint surface to delineate the standard view. To increase image variation, the probe was slowly tilted and slid against the surface of the elbow joint while capturing movies.’ (L127-131)

‘The images were resized to 640 × 640 and augmented using Albumentations (version 1.0.3), which is a Python library for image augmentation. The parameters for image augmentation were the following: Blur (probability of applying the transform: p=0.01), MedianBlur (p=0.01), ToGray (p=0.01), Contrast Limited Adaptive Histogram Equalization (p=0.01), RandomBrightnessContrast (p=0.01), RandomGamma (p=0.01), ImageCompression (p=0.01).’ (L135-140)
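The listed transforms map directly onto the Albumentations API; a minimal sketch of such a pipeline, assuming the stated probabilities and a 640 × 640 resize, is shown below (file paths are placeholders, and the exact way the library was wired into the training pipeline is not specified here).

```python
# Sketch of the described augmentation pipeline using Albumentations (assumed usage).
# Each transform is applied with probability p=0.01, as stated in the revised text.
import cv2
import albumentations as A

augment = A.Compose([
    A.Resize(640, 640),
    A.Blur(p=0.01),
    A.MedianBlur(p=0.01),
    A.ToGray(p=0.01),
    A.CLAHE(p=0.01),                       # Contrast Limited Adaptive Histogram Equalization
    A.RandomBrightnessContrast(p=0.01),
    A.RandomGamma(p=0.01),
    A.ImageCompression(p=0.01),
])

image = cv2.imread("elbow_us_frame.png")   # placeholder image path
augmented = augment(image=image)["image"]  # augmented 640 x 640 image
```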

Lastly, it is mentioned in the paper that the YOLOv8 model was evaluated, but there is no mention of comparing its performance with other architecture or models for object detection. A comprehensive evaluation would involve benchmarking the YOLOv8 model against other commonly used architectures in the field of object detection to assess its superiority or competitiveness. This comparative analysis would provide a more comprehensive understanding of the strengths and weaknesses of YOLOv8 for ultrasound-based OCD detection.

We added the results from the YOLOv5 object detection models as well as the data from our previous report, which performed image classification.

‘Several YOLO models were trained to detect the standard view of the elbow joint and OCD lesions were detected in the object detection task, the yolov8n model showed mAP(50) of 0.991 and mAP(50-95) of 0.784, while the yolov8m model showed mAP(50) of 0.991 and mAP(50-95) of 0.784. In the YOLOv5 model, mAP(50) and mAP(50-95) were 0.998 and 0.666 for the YOLOv5n model, respectively, and 0.993 and 0.714 for the YOLOv5m model, respectively. The mAP values for the YOLOv8 models were higher than those for the YOLOv5 models’ (L286-292)

‘In a prior investigation, we performed a binary classification task using the image with OCD or images without OCD [21]. Images were captured from only two views, anterior long and posterior long axis. Three DL models were compared, and the accuracy was 0.818 in ResNet50, 0.841 in MobileNet_v2, and 0.872 in EfficientNet. In the present study, the accuracy of binary classification was 0.998 in YOLOv8n-cls model which is higher compared to the previous report.’ (L308-313)

Round 2

Reviewer 1 Report

1. If all models can reach above 99% accuracy in detection, it seems that this is not a difficult problem in this application and can be easily solved.

2. If this is true, the creativity of the paper is low.

3. In Section 2.2.1, should "(parameter size 25.8)" be "(parameter size 25.8M)"? Similarly for "(parameter size 20.8)".

4. There are two sections numbered 2.2.1 in the paper!

5. An additional data acquisition is needed as a test dataset to check whether the model can still achieve near-100% accuracy.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 2 Report

The authors have addressed all my comments. I do not have any additional comments.

 

Author Response

Thank you for reviewing the manuscript and your comments.
