Article
Peer-Review Record

A Deep Learning Ensemble Method to Visual Acuity Measurement Using Fundus Images

by Jin Hyun Kim 1, Eunah Jo 1, Seungjae Ryu 1, Sohee Nam 1, Somin Song 1, Yong Seop Han 2,*, Tae Seen Kang 2, Woongsup Lee 1,*, Seongjin Lee 1, Kyong Hoon Kim 3, Hyunju Choi 4 and Seunghwan Lee 4
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 24 January 2022 / Revised: 28 February 2022 / Accepted: 11 March 2022 / Published: 21 March 2022
(This article belongs to the Special Issue Applications of Artificial Intelligence in Medicine Practice)

Round 1

Reviewer 1 Report

Thoroughly described research methods and analysis of the results.
The developed method can potentially be used in clinical practice by ophthalmologists.
The revised manuscript is acceptable for the readers.

Author Response

 

Reviewer 1:

Thoroughly described research methods and analysis of the results. The developed method can potentially be used in clinical practice by ophthalmologists.

Reviewer’s Comment 1:  The revised manuscript is acceptable for the readers.

Author’s response: We have resubmitted our paper, which has been revised throughout.

Author Response File: Author Response.pdf

Reviewer 2 Report

Recommendation: Publish after major revisions noted. 

 

Comments: 

This manuscript provides a vision measurement method using deep learning-based ensemble methodology using fundus photography. The authors need to address the following comments and revise the manuscript accordingly.

  1. This manuscript is in need of substantial editing, English language and style improvement.
  2. Application of certain preprocessing techniques can improve performance. Show the impact of rotating, shifting (translation), rescaling, and shearing.
  3. Demonstrate that the ensemble model outperformed the individual models. Emphasize sensitivity, specificity, accuracy, and AUC.
  4. This deep learning model produces a prediction without explaining the reasoning behind it. Please consider to provide rationale.
  5. Most of the ground truth image refers to the healthy part of the retina and only a small proportion of the pixels refer to lesions. Consider to address a segmentation problem.
  6. The variability in development data could also lead to significant changes in the dataset size required for acceptable performance. Consider to illustrate with examples.
  7. The density of the background pigmentation of the fundus oculi is different for different races. Consider to highlight.
  8. Discuss the current challenges related to the clinical validation and real time deployment of these models in clinical practice.

Author Response

Response to Reviewer’s comments

Title of paper: A Deep Learning Ensemble Method to Visual Acuity Measurement using Fundus Images

Reviewer 2:

 

Reviewer’s Comment 1: This manuscript is in need of substantial editing, English language and style improvement.               

Author’s response: We have revised all parts of our paper to improve the English and presentation.

 

Reviewer’s Comment 2: Application of certain preprocessing techniques can improve performance. Show the impact of rotating, shifting (translation), rescaling, and shearing.     

Author’s response: Image rotation is already used in our setting, but the remaining pre-processing methods either do not improve the classification accuracy of our approach or cannot be applied to our models.

Pre-processing that varies the shape or location of the image, such as shearing and shifting filters, does not appear to help accuracy. When CNNs are trained, these filters seem to impair the image features of the macula and optic nerve papilla, which ophthalmologists examine closely to assess the health of the eye. The graphs below show the result of applying the shearing filter to our data for CNN training: the training results with shearing are good, but the testing results are not satisfactory.

In our setting, rescaling is used in a limited way because it affects the training speed of the deep neural networks. Since we use transfer learning, images are rescaled to match the input size of the pretrained CNN model used for transfer learning. For the SVM, images are reduced to about 32 x 32.

We have added to Section 2.3 a description of the image filters that alter the shape and location of the images, as follows:


"Note that the pre-processing method provided in this section is randomly used to augment fundus images for Class1 and 2.

Indeed, other pre-processing methods, such as shearing and shifting, may be helpful to improve the performance of the VA classifier. The image processing methods, such as shearing and shifting, that adjust the shape of images and the position of image features do not work effectively to improve the classification accuracy of our trained machines. For example, the shearing filter is not effective enough to improve the classification accuracy of VA measurement in our experiments. It seems that when a CNN is trained, tweaking of the shape and shifting of the image location impairs the shape of macular and optic nerve papilla that the human doctor observes carefully to check the health of the eye. Rotation of images is applied to augment fundus images from the datasets of Class 1 and Class 2 which are much less than the other classes, and rescaling is limitedly applied fitting to our needs and purposes, such as transfer learning and SVM training. "
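For concreteness, the snippet below is a minimal, illustrative sketch (not the authors' actual pipeline) of a rotation-only augmentation of the kind described above, using torchvision; the ±10° range, the selection rate, and the helper name are assumptions for illustration.

```python
# Illustrative sketch only (not the authors' exact pipeline): rotation-only
# augmentation for the minority classes, as discussed in the response above.
import random
from PIL import Image
from torchvision import transforms

rotate_only = transforms.RandomRotation(degrees=10)   # rotate within -10 to +10 degrees

def augment_minority_class(image_paths, select_rate=0.5):
    """Randomly pick roughly select_rate of the Class 1/2 images and rotate them."""
    augmented = []
    for path in image_paths:
        if random.random() < select_rate:
            img = Image.open(path).convert("RGB")
            augmented.append(rotate_only(img))         # rotated copy added to the dataset
    return augmented
```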

 

Reviewer’s Comment 3: Demonstrate that the ensemble model outperformed the individual models. Emphasize sensitivity, specificity, accuracy, and AUC.                  

Author’s response: We have added a table comparing our ensemble method with the VGG-19 4-class VA classifier, which outperforms the other individual models (EfficientNet-B7 and SVM). To address this comment, we have added a new table (Table 7) together with additional comments on it. The description we added for Table 7 is as follows:

"Table 7 shows the comparison between the performance of VA classifiers based on our ensemble method and VGG-19 in terms of 4 aspects: the overall average accuracy, each class accuracy, sensitivity, and specificity.   The reason why VGG-19 is selected to compare against our ensemble method is that it shows the best performance of VA classification as shown in Table 7.  It shows that our ensemble method outperforms the VGG-19 VA classifier in the overall accuracy, but the VGG-19 VA classifier shows higher accuracy in VA- classification for Class-2 than our ensemble method. In aspects of sensitivity and specificity, they are not comparable because one of them does not outperform the other in all classes. "

 

Reviewer’s Comment 4:  This deep learning model produces a prediction without explaining the reasoning behind it. Please consider to provide rationale.                       

Author’s response: We agree that an explanation of the CNN-based classifications would be valuable. A heat map could be one way to explain our models' classifications. Due to the limited time available for this revision, we could not apply such a method, but we plan to do so in future work.
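One widely used heat-map technique for this purpose is Grad-CAM. The sketch below is purely illustrative and is not part of the submitted work; it assumes a Keras/TensorFlow model, and the layer name 'block5_conv4' assumes a VGG-19 backbone.

```python
# Sketch of a Grad-CAM heat map for a Keras CNN classifier (illustrative only).
import numpy as np
import tensorflow as tf

def grad_cam(model, image, class_index, last_conv_layer="block5_conv4"):
    grad_model = tf.keras.models.Model(
        model.inputs, [model.get_layer(last_conv_layer).output, model.output]
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)        # gradient of the class score
    weights = tf.reduce_mean(grads, axis=(1, 2))        # average over spatial positions
    cam = tf.reduce_sum(weights[:, None, None, :] * conv_out, axis=-1)
    cam = tf.nn.relu(cam)[0]                            # keep positive contributions only
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # normalised heat map in [0, 1]
```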

 

Reviewer’s Comment 5:  Most of the ground truth image refers to the healthy part of the retina and only a small proportion of the pixels refer to lesions. Consider to address a segmentation problem.                

Author’s response: This is a good idea. We leave it for future work, since the focus of this paper is measuring VA from fundus images.

 

Reviewer’s Comment 6:  The variability in development data could also lead to significant changes in the dataset size required for acceptable performance. Consider to illustrate with examples.              

Author’s response: In this paper, we had to build datasets for Classes 1 and 2, which are much smaller than the other classes. To overcome this shortcoming, we used data augmentation techniques that introduce variability using the filters illustrated in Figure 2 and Table 3. In addition, we have discussed why we cannot use pre-processing methods that modify the shape of fundus images or shift them from their original position.

Figure 2 illustrates the image pre-processing methods we applied to fundus images for data augmentation. Table 3 describes the features and characteristics of each pre-processing technique. In addition, we have explained why we cannot use pre-processing methods such as shearing and shifting, as follows:

"The image processing methods, such as shearing and shifting, that adjust the shape of images and the position of image features do not work effectively to improve the classification accuracy of our trained machines. For example, the shearing filter is not effective enough to improve the classification accuracy of VA measurement in our experiments. It seems that when a CNN is trained, tweaking of the shape and shifting of the image location impairs the shape of macular and optic nerve papilla that the human doctor observes carefully to check the health of the eye. Rotation of images is applied to augment fundus images from the datasets of Class 1 and Class 2 which are much less than the other classes, and rescaling is limitedly applied fitting to our needs and purposes, such as transfer learning and SVM training. "

 

Reviewer’s Comment 7:  The density of the background pigmentation of the fundus oculi is different for different races. Consider to highlight.          

Author’s response: Yes, you are right. However, we were limited to the fundus images and VA records we could access in our hospital, so this problem has to be left for future work. To highlight this point, we have added the following description to the conclusion section:

 

"In practice, we have more challenges to overcome. For example, the density of the background pigmentation of the fundus oculi is different for different races.  To overcome this problem, we need to obtain more data from different countries and races."

 

Reviewer’s Comment 8:  Discuss the current challenges related to the clinical validation and real time deployment of these models in clinical practice.             

Author’s response: First, the dependence of fundus characteristics on race is one of the challenges we must overcome. In addition, we have discussed further challenges to be addressed in future work.

“To put our approach into practical use, we have more challenges to overcome. For example, the density of the background pigmentation of the fundus oculi depends on race; we need to obtain more data from other countries and races to overcome this problem. In addition, the examinee's subjectivity in measuring visual acuity may degrade the quality of the collected data. Finally, the fundus image shows the functional status of the eye; thus, measuring visual acuity using only fundus images has limitations, since vision depends on both the function of the eye and the function of the brain.”

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper presents visual acuity measurement by deep learning methods. The application presented seems interesting. However, I have several comments as follows.

1. The authors claim that this is the first ever work presenting deep learning for VA measurement. Is the statement referring to using fundus images? Authors must revise the claim and properly go through the literature again. DOI: 10.1167/tvst.9.2.51

2. What are the authors’ scientific contributions? It is clear that the dataset is unbalanced, and the ensembled method is used for classification. Such hybrid and ensembled models are not innovative enough. Would you clarify your presented method?

3. Provide sufficient critical review indicating the shortcoming and define the main focus of the research direction. Why is the proposed approach suitable for solving the critical problem? Is it really applicable?

4. Revise the introduction. There is a huge lack of literature review. Properly define the problem statement. Related solutions. Define the deep learning model and explain why to select fundus images and only these deep learning approaches.

5. Is table 1 and table 2 necessary to include in this work? It seems out of context.

6. Properly cite all the figures and tables in the main text—for example, figure 2.

7. Line 165-168. It must be rephrased. The images were cropped and resized. What was the size of original images and resized images? Should be included in the studies consisting of machine learning algorithms. As it always defines the first layer of training the model.

8. Are there any negative samples used? The false-positive analysis is missing. Should be included, and the possible improvements must be defined.

9. How were the hyperparameters chosen for training? Were they through a grid selection? If they were through grid selection, the authors could improve table 8 by providing the range of suitable hyperparameters.

10. Numerous pieces of literature are available using deep learning applications for classification purposes. Authors can follow any recent article and improve the presentation of the paper for the proper understanding of the readers.

Summary: The study is overall interesting using DL for medical purposes in ophthalmology. However, the manuscript lacks the proper presentation. The presentation is a bit confusing and needs a thorough rephrasing. 

Author Response

Response to Reviewer’s comments

 

Title of paper: A Deep Learning Ensemble Method to Visual Acuity Measurement using Fundus Images

 

Reviewer 3:

 

Reviewer’s Comments 1: The authors claim that this is the first ever work presenting deep learning for VA measurement. Is the statement referring to using fundus images? Authors must revise the claim and properly go through the literature again. DOI: 10.1167/tvst.9.2.51

Author’s response: This paper presents a novel technique for VA measurement using fundus images and is novel relative to the work in DOI: 10.1167/tvst.9.2.51, in that our approach is unique in using fundus images for VA measurement. To clarify this point, we have added the following comment to the introduction section:

"To the best of our knowledge, this is the first paper on the VA measurement based on fundus images using machine learning."

 

Reviewer’s Comments 2: What are the authors’ scientific contributions? It is clear that the dataset is unbalanced, and the ensembled method is used for classification. Such hybrid and ensembled models are not innovative enough. Would you clarify your presented method?                    

Author’s response: This paper is unique in that fundus images are used to measure VA for the first time. As for the scientific contribution, we show the effectiveness of our ensemble approach in overcoming the imbalance in dataset size among the individual VA classes for fundus-based VA classification. We have revised the contribution description in the introduction section to clarify our scientific contribution. The revised text is as follows:

"In this paper, we would like to tackle the following two problems:

- How can we measure VA from a VA examinee who cannot communicate with the VA examiner or tries to present a wrong VA value?

- How can we achieve a more accurate classifier when a dataset is fairly biased to certain  classes in terms of the number of sample data?""

As for the contributions of this paper, we added the following statements to the introduction section:

"The contributions of this paper are:

  • to present a deep-learning-based VA measurement approach using fundus images,
  • to demonstrate the feasibility and effectiveness of an ensemble approach in overcoming the difficulty of obtaining datasets of balanced size, and
  • to present a VA measurement alternative for patients for whom it is difficult or impossible to communicate with the VA examiner.

To the best of our knowledge, this is the first paper on VA measurement based on fundus images using machine learning."

 

Reviewer’s Comment 3: Provide sufficient critical review indicating the shortcoming and define the main focus of the research direction.              

Author’s response: The introduction section presents four complex cases where VA is difficult or impossible to measure. As for the research direction, we have revised the introduction section with the following statements:

"This paper provides a vision measurement method based on a deep-learning ensemble methodology using fundus photography. In this paper, we tackle the following two problems:

- How can we measure VA from an examinee who cannot communicate with the VA examiner or who intentionally reports an incorrect VA value?

- How can we achieve a more accurate classifier when the dataset is heavily biased toward certain classes in terms of the number of samples?

Fundus photography involves photographing the rear of the eye, also known as the fundus. It is the type of image most widely used in examining more than 38 kinds of eye disease, such as age-related macular degeneration, neoplasm of the choroid, chorioretinal inflammation or scars, glaucoma, and retinal detachment and defects.

Fundus imaging has advanced to reduce preventable visual morbidity by allowing easy and timely fundus screening. In particular, the usability and portability of fundus screening have improved continuously over the last two decades. Recently, significant technological advances have transformed retinal photography; improvements in telecommunications and smartphones are two remarkable breakthroughs that have made ophthalmic screening in remote areas a realizable possibility. With the availability of fundus images, we address the first problem above: we estimate VA by capturing a fundus image from the examinee and applying a VA classifier based on deep learning. In this paper, the 11 visual acuity levels from 0.0 to 1.0 (in steps of 0.1) are grouped into 4 classes according to ophthalmologists' needs.

To tackle the second problem, we apply an ensemble approach consisting of three machine learning models. In the medical field, it is very difficult to obtain a dataset of balanced size, because data for normal cases greatly outnumber data for abnormal cases. Visual acuity has the same issue: in reality, cases with a lower VA level are much rarer than cases with a higher VA level. For this reason, it is difficult to apply a classical CNN model to such imbalanced VA-level datasets in the usual way. In our ensemble approach, three machine learning models and techniques are combined, each applied to the datasets on which it classifies best."
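The exact combination rule of the three models is described in the paper rather than in this response; purely as an illustration of the general idea of an ensemble over three classifiers (e.g., VGG-19, EfficientNet-B7, and an SVM), a minimal sketch might look like the following. The majority-vote-with-confidence-fallback rule shown here is an assumption, not necessarily the method used in the paper.

```python
# Illustrative only: one simple way to combine three classifiers' predictions.
# This is NOT necessarily the combination rule used in the paper.
import numpy as np
from collections import Counter

def ensemble_predict(prob_vgg, prob_effnet, prob_svm):
    """Each argument is an array of class probabilities with shape [n_classes]."""
    votes = [int(np.argmax(p)) for p in (prob_vgg, prob_effnet, prob_svm)]
    label, count = Counter(votes).most_common(1)[0]
    if count >= 2:                      # majority vote when at least two models agree
        return label
    # Otherwise fall back to the single most confident model.
    best = max((prob_vgg, prob_effnet, prob_svm), key=lambda p: float(np.max(p)))
    return int(np.argmax(best))
```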

 

Reviewer’s Comment 4: Revise the introduction. There is a huge lack of literature review. Properly define the problem statement. Related solutions. Define the deep learning model and explain why to select fundus images and only these deep learning approaches.                        

Author’s Response: The introduction has been thoroughly revised to include the problem statement and the rationale for using fundus images for VA measurement. In addition, we have explained why we use an ensemble approach consisting of three machine-learning models and techniques.

We have placed the related work in Section 2. Unfortunately, we have not been able to find any other approach that uses fundus images for VA measurement; for this reason, we could not include more literature closely relevant to our work.

 

Reviewer’s Comment 5: Is table 1 and table 2 necessary to include in this work? It seems out of context.                        

Author’s Response: Yes. We have removed the two tables and refined the description of how the patient data were selected from the hospital records for our purpose. At the same location in the paper, we have refined the description of how the patient data and the corresponding fundus images were obtained, as follows:

"In the first stage, we extract the vision acuity information from the medical charts of 79,798 patients with the keywords `VA (Vision Acuity)', `BCVA (Best Corrected VA),' and `CVA (Corrected VA)' and reshape, for our purpose, personal vision datasets of 60,021 visual acuity information, of which each has a matching fundus image."

  

Reviewer’s Comment 6: Properly cite all the figures and tables in the main text—for example, figure 2.                          

Author’s Response: All fundus images shown are from our own data and experiments; thus, we do not believe a citation is needed for them.

 

Reviewer’s Comment 7: Line 165-168. It must be rephrased. The images were cropped and resized. What was the size of original images and resized images? Should be included in the studies consisting of machine learning algorithms. As it always defines the first layer of training the model.                

Author’s Response: This part has also been thoroughly refined. We have revised and strengthened it as follows:

 " Table4 shows the  size of datasets of each VA levels for our machine learning. For Class 3 and 4, we don't augment nor pre-process the datasets of Class 3 and 4. The datasets of Class 1 and 2 are much less than those of Class 3 and 4. For the reason, we augment them in the following ways:  First , we randomly select around 2,500  from the dataset of Class 2 to make it balanced with the dataset of Class 1.:  First , we randomly select around  2,500  from the dataset of Class 2 to make it balanced with the dataset of Class 1.  Then, the two datasets of Class 1 and 2 are augmented in the following way:  The images of Class 1 and 2 at the rate of 45% to 50% are randomly selected and rotated  at -10o to 10 o.  Then, 25% to 30% images of the rotated images of Class 1 and Class 2 are applied for the filtering methods in Section 3.3.  Then, the datasets of Class 1 and 2 are augmented the following way: The images of Class 1 and 2  are selected from the original datasets randomly at the rate of 45\% to 50\% and rotated at -10 o to 10 o.  Then, the filtering methods in Section 3.3 are applied for 25% to 30% images from the rotated images of Class 1 and Class 2.

For all images of Class 1 to 4, each image is cropped so that the main part of the macula and optic nerve papilla remains wholly highlighted, as shown in Figure 3, by completely removing the black part of each image.  The original size of each image may not be identical to the others since they are captured in different fundus cameras.  Thus, all images are resized in the size of 300 x 300.

Fundus images in all classes may be resized again for each individual method when they are fed to the CNN and SVM models. For the CNN models, the fundus images are resized to 244 x 244 to fit the input size of the CNNs used for transfer learning. For the SVM model, fundus images are resized to 32 x 32."
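As an illustration of the crop-and-resize step described above (not the authors' exact code), a minimal OpenCV sketch might look like the following; the intensity threshold of 10 and the helper name are assumptions.

```python
# Crop the black border around the retina, then resize to a common size.
import cv2
import numpy as np

def crop_and_resize(path, target=(300, 300)):
    img = cv2.imread(path)                               # BGR fundus image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    mask = gray > 10                                     # non-black (retinal) pixels
    ys, xs = np.where(mask)
    cropped = img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return cv2.resize(cropped, target)                   # common 300 x 300 size

# The 300 x 300 images would then be resized again per model, e.g.
# cv2.resize(img, (244, 244)) for the transfer-learning CNNs and
# cv2.resize(img, (32, 32)) for the SVM, as stated in the response.
```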

Reviewer’s Comment 8: Are there any negative samples used? The false-positive analysis is missing. Should be included, and the possible improvements must be defined.                

Author’s Response: In our training, there do not appear to be any negative samples, because images from all classes are used together to train the CNN. If negative sampling can be applied to our approach, we leave it for future work.

  

Reviewer’s Comment 9: How were the hyperparameters chosen for training? Were they through a grid selection? If they were through grid selection, the authors could improve table 8 by providing the range of suitable hyperparameters.                      

Author’s Response: Basically, we used a trial-and-error method. Table 8 was produced after many experiments and selection of the best parameter values. To explain how the hyper-parameters were selected, we have added the following description, referring to Table 8:

 "Table 6 shows the parameters we use to train DNN of VGG19 and EfficientNet-B7. We use a try-and-error approach to select the hyper-parameters after many experiments."

 

Reviewer’s Comment 10: Numerous pieces of literature are available using deep learning applications for classification purposes. Authors can follow any recent article and improve the presentation of the paper for the proper understanding of the readers.                  

Author’s Response: Yes, there is a large body of literature on deep learning applications for classification. However, we wanted to keep the focus on 1) deep learning techniques, 2) VA measurement, and 3) fundus-based techniques, and we formulated the related work section to serve that purpose.

 

Reviewer Summary: The study is overall interesting using DL for medical purposes in ophthalmology. However, the manuscript lacks the proper presentation. The presentation is a bit confusing and needs a thorough rephrasing.                 

Author’s Response: We have revised all parts of our paper to improve the presentation and English. To make the paper easier to follow, we have added more description and comparison.

 

 

Author Response File: Author Response.pdf

Reviewer 4 Report

The authors propose an ensemble method to classify fundus images into 4 categories using different methods, both CNNs and traditional classifiers such as SVM. The paper describes the method properly and analyses the obtained results. However, the organization is not appropriate in my view. I find it more understandable to start with the introduction, then related works, next the problem and dataset description, and so on. Also, a section where the obtained results are compared with the existing ones that measure VA should be included (Discussion, after Results and before Conclusions).

 

Please, check typos: Fig. 6 (a) Accuracy, line 305 “EffectinetNet”, 

Author Response

Response to Reviewer’s comments

Title of paper: A Deep Learning Ensemble Method to Visual Acuity Measurement using Fundus Images

Reviewer 4:

The authors propose an ensemble method to classify fundus images into 4 categories using different methods, both CNNs and traditional classifiers such as SVM. The paper describes the method properly and analyses the obtained results.

  

Reviewer’s Comment 1: However, the organization is not appropriate in my view. I find it more understandable to start with the introduction, then related works, next the problem and dataset description, and so on.

Author’s Response: We have moved the related work section so that it follows the introduction. In the introduction section, we have added the problem statement and contributions.

 

Reviewer’s Comment 2: Also a section where the obtained results are compared with the existing ones that measure VA should be included (Discussion, after Results and before Conclusions)

Author’s Response: We have added Table 7 to compare our method against the VGG-19 VA classifier over the 4 VA classes.

 

Reviewer’s Comment 3: Please, check typos: Fig. 6 (a) Accuracy, line 305 “EffectinetNet”,

 Author’s Response:  We have fixed the issues you pointed out.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

Authors have responded to all the comments and improved the presentation of work.

I believe it is now suitable for publication.
