1. Introduction
According to WHO statistics, the number of adults with diabetes worldwide reached 463 million in 2019, and this number is expected to increase significantly, reaching 700 million by 2045. Diabetic retinopathy (DR) is one of the most serious complications of diabetes. The visual impairment it causes is irreversible, and the disease presents different pathological features at different stages, eventually damaging the eye and leading to blindness. Early screening and diagnosis of DR are therefore essential for the timely and effective treatment of diabetic patients.
Diabetic retinopathy is the most evident ocular manifestation of diabetes and is characterized by microaneurysms, exudates, new blood vessel formation, hemorrhage, etc. Generally, DR is divided into two stages: non-proliferative and proliferative retinopathy. The non-proliferative stage can be further classified as mild, moderate or severe. The mild stage is the early stage, with small bleeding spots or microaneurysms. In the moderate stage, which follows, yellowish-white punctate hard exudates may be observed. The severe stage is the last stage of non-proliferative retinopathy and is accompanied by white, cotton-wool-like soft exudates. In the second DR stage, proliferative retinopathy, retinal damage stimulates the proliferation of new blood vessels, which can cause massive bleeding in the retina and vitreous body, leading to severe vision loss or even complete blindness. Since the features of diabetic retinopathy are complex and diverse, identifying the features corresponding to each stage of the disease is challenging.
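The staging described above can be summarized as a simple grade table. The 0–4 numeric convention below follows the labeling commonly used in public DR datasets such as APTOS 2019; it is an illustrative assumption, not a definition taken from this paper.

```python
# Illustrative mapping of the DR stages described above to 0-4 grade
# labels (numeric convention assumed from datasets such as APTOS 2019).
DR_GRADES = {
    0: "No DR",
    1: "Mild non-proliferative (small bleeding spots, microaneurysms)",
    2: "Moderate non-proliferative (punctate hard exudates)",
    3: "Severe non-proliferative (cotton-wool soft exudates)",
    4: "Proliferative (new vessel proliferation, vitreous hemorrhage)",
}

def severity_name(grade):
    """Return the clinical description for a 0-4 DR grade."""
    return DR_GRADES[grade]
```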
At present, in clinical practice, doctors diagnose DR from fundus photographs, which remains a time-consuming manual process. The growing number of DR patients and the scarcity of senior ophthalmology experts make missed diagnoses, misdiagnoses and other problems more likely. Computer-aided diagnosis (CAD) avoids these problems of manual diagnosis, greatly reducing doctors' workload and the time they spend diagnosing diseases while providing high accuracy. Recently, deep learning has made great progress and valuable research and application contributions in the field of CAD. Convolutional Neural Networks (CNNs) are among the most effective models in computer vision owing to their excellent performance on image classification tasks. In recent years, scholars have proposed multiple CNN architectures, such as VGGNet [1], GoogLeNet [2] and ResNet [3]; however, as a routine operation, these works usually scale only one of the three dimensions (depth, width or resolution), which often leads to suboptimal accuracy and efficiency. EfficientNet [4] demonstrates better performance by uniformly scaling depth, width and resolution, and this algorithm provides a new research direction for the subsequent development of CNNs. The development of CNNs has brought revolutionary progress to medical applications such as retinal vessel segmentation [5], glaucoma screening [6] and cancer subtype classification [7]. These CNN architectures are well suited to medical classification tasks; nevertheless, medical data are often insufficient, which makes model training challenging. Transfer learning [8] addresses this by transferring knowledge from a pre-training dataset containing a huge number of images to a new dataset, ensuring that the network can still be trained well. It is an excellent tool for making a network more efficient and stable when data are insufficient.
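EfficientNet's uniform scaling of depth, width and resolution can be sketched as follows. The constants α = 1.2, β = 1.1, γ = 1.15 are those reported for the original EfficientNet family and are shown here only for illustration.

```python
def compound_scale(phi, alpha=1.2, beta=1.1, gamma=1.15):
    """EfficientNet-style compound scaling: depth, width and input
    resolution are scaled jointly by one coefficient phi.
    alpha, beta, gamma are the constants reported for the original
    EfficientNet family; they roughly satisfy
    alpha * beta**2 * gamma**2 ~= 2, so increasing phi by 1
    approximately doubles the FLOPs."""
    depth_mult = alpha ** phi        # scales the number of layers
    width_mult = beta ** phi         # scales the number of channels
    resolution_mult = gamma ** phi   # scales the input image size
    return depth_mult, width_mult, resolution_mult

# phi = 0 recovers the B0 baseline; larger phi yields B1, B2, ...
baseline = compound_scale(0)
```

Scaling all three dimensions with a single coefficient is what distinguishes this scheme from scaling depth, width or resolution in isolation.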
Deep learning has become a research hotspot in the medical field, and its development has effectively promoted the progress of DR research. Gargeya et al. [9] used ResNet and a decision tree classifier to distinguish between diseased and healthy images, reaching an AUC of up to 0.94 on the Messidor dataset. Chetoui et al. [10] used EfficientNet combined with transfer learning to detect referable diabetic retinopathy (RDR) and vision-threatening DR (VTDR), obtaining satisfying results of up to 0.98 AUC on both the APTOS 2019 and EyePACS datasets. Rao et al. [11] employed a pre-trained ResNet50 to achieve 96.59% accuracy on the APTOS 2019 dataset for the binary classification task, i.e., detecting No-DR versus DR. The above classification models perform well; however, they are binary classifiers and thus cannot classify specific disease severities in depth. Therefore, it is necessary to further study the grade classification of DR. Shanthi et al. [12], based on the Messidor dataset, designed a neural network to automatically classify images as normal or as stage 1, stage 2 or stage 3 DR, with an accuracy of up to 96% for each grade. This four-class scheme reflects the severity of DR better than binary classification. In recent years, five-class DR classification has attracted greater attention because it better reflects DR severity levels. Dondeti et al. [13], based on the APTOS 2019 dataset, combined the pre-trained NASNet model with t-SNE to extract deep features, achieving an accuracy of 77.90%. Bodapati et al. [14] proposed a composite deep neural network of Xception and VGG16 with a gated attention mechanism to automatically diagnose DR; the accuracy of this model on the APTOS 2019 dataset was 82.54%. Majumder et al. [15] combined the Xception model with a regression model to classify the five stages of DR, achieving accuracies of 82% and 86% on the EyePACS and APTOS 2019 datasets, respectively. Patel et al. [16] used a pre-trained MobileNetV2 to classify the DR of the APTOS 2019 dataset and obtained an accuracy of 91%. With the development of deep learning, the ability of networks to extract DR features has been enhanced, and accuracy has improved to some extent. However, for clinical diagnosis, the accuracy of DR severity classification still deserves further improvement.
From the above studies we can see that, firstly, deep learning methods have received increasing attention in the diagnosis of DR for their improved performance; secondly, grade classification across different severity levels of DR is easier for doctors to interpret; and thirdly, deep learning networks combined with transfer learning, especially the latest algorithms, can provide more accurate results. Therefore, a simple and efficient network is required for a more accurate and effective diagnosis of DR at different severities.
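The transfer-learning idea highlighted above can be sketched in a framework-free way: a pre-trained feature extractor is frozen and only a new classification head is trained on the small target dataset. The "pre-trained" extractor below is a random stand-in, and the data are synthetic; in practice the extractor would be an ImageNet-pretrained CNN backbone and the data would be fundus images.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen, pre-trained feature extractor (hypothetical
# placeholder for an ImageNet-pretrained CNN backbone).
W_pre = rng.normal(size=(64, 16))

def features(x):
    # Frozen: W_pre is never updated while the head is trained.
    return np.maximum(x @ W_pre, 0.0)

# Tiny synthetic binary dataset standing in for DR / No-DR images.
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(float)

# Train only the new logistic-regression head on the frozen features.
w, b = np.zeros(16), 0.0
lr = 0.01
for _ in range(500):
    f = features(X)
    p = 1.0 / (1.0 + np.exp(-(f @ w + b)))
    w -= lr * (f.T @ (p - y)) / len(y)
    b -= lr * np.mean(p - y)

p = 1.0 / (1.0 + np.exp(-(features(X) @ w + b)))
train_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

Because only the small head is optimized, far fewer labeled images are needed than when training the whole network from scratch, which is exactly why this technique suits small medical datasets.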
Based on the above analysis, we constructed a DR grade diagnostic model in this paper. The construction of this model is summarized as follows. Firstly, pre-processing steps are adopted to reduce redundant information and to make the APTOS 2019 dataset more suitable for the diagnostic model. Secondly, to extract the features of DR images more accurately and efficiently, we propose a new network named RA-EfficientNet as the model's feature extractor, combining EfficientNet with a residual attention block, i.e., the RA block. Finally, according to the classification task, we design two classifiers: a 2-grade classifier for identifying DR and a 5-grade classifier for diagnosing DR severity levels.
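The relation between the two classification tasks can be made concrete: the 2-grade task is obtained by collapsing the 5-grade severity labels into DR versus No-DR. The label convention below (0 = No DR, 1–4 = increasing severity) is assumed from the APTOS 2019 dataset.

```python
def to_binary_label(grade5):
    """Collapse a 5-grade severity label (0 = No DR, 1-4 = increasing
    severity) into the 2-grade DR / No-DR label."""
    if grade5 not in (0, 1, 2, 3, 4):
        raise ValueError("DR grade must be in 0..4")
    return 0 if grade5 == 0 else 1

labels_5 = [0, 2, 4, 1, 0]
labels_2 = [to_binary_label(g) for g in labels_5]  # -> [0, 1, 1, 1, 0]
```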
The rest of the paper is organized as follows: Section 2 introduces the workflow of our proposed DR diagnostic model, including data pre-processing, the structure of RA-EfficientNet and the classifiers for the two classification tasks. Section 3 presents the experimental process and evaluates the performance of the different networks. Finally, Section 4 summarizes the work of the paper.
4. Conclusions
In this paper, we focus on 2-grade and 5-grade classification, both of which can support doctors' diagnosis of DR; the 5-grade classification provides more detailed grading information. We conducted experimental studies on several current deep learning networks using transfer learning, including InceptionResNetV2, MobileNetV2, Xception and EfficientNet, among which EfficientNet achieved the best results owing to its balance of depth, width and resolution, with 97.95% and 91.00% accuracy for the 2-grade and 5-grade classification tasks, respectively. To achieve better results, we added an RA block to the pre-trained CNN networks, and all models with an RA block outperformed their original counterparts. With only a small increase in parameters, RA-EfficientNet reached 98.36% and 93.55% accuracy on the two tasks, respectively, which shows that our RA block better distinguishes the lesion features of DR images. The results demonstrate that, compared to existing DR classification methods, our diagnostic model provides the best performance and better evaluation values on the APTOS 2019 dataset. We also verified the robustness of the proposed RA-EfficientNet by training and evaluating it on the EyePACS dataset of fundus images, where it likewise achieved good performance.
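As a generic illustration of the residual-attention idea (a squeeze-and-excitation-style sketch, not the paper's exact RA block), a per-channel attention gate can be computed from globally pooled features and its output added back to the input through a residual connection. All shapes and weights below are illustrative assumptions.

```python
import numpy as np

def residual_attention_block(x, w1, w2):
    """Generic residual channel-attention block (illustrative sketch):
    a bottlenecked channel gate whose output is combined with the
    input feature map through a residual connection.
    x: feature map (H, W, C); w1: (C, C//r); w2: (C//r, C)."""
    squeeze = x.mean(axis=(0, 1))                 # global average pool -> (C,)
    hidden = np.maximum(squeeze @ w1, 0.0)        # bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2)))   # per-channel sigmoid gate
    return x + x * gate                           # residual connection

rng = np.random.default_rng(0)
C, r = 8, 2
x = rng.normal(size=(4, 4, C))
out = residual_attention_block(x, rng.normal(size=(C, C // r)),
                               rng.normal(size=(C // r, C)))
```

The residual connection means the block can only re-weight (never discard) the backbone features, which keeps the parameter overhead small relative to the backbone, consistent with the modest parameter increase reported above.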
The advantages of this paper are as follows: (1) two types of classification are realized in the model, proving its feasibility both for the diagnosis of DR and of its severity levels; (2) the newly designed RA-EfficientNet outperforms the comparison networks for DR classification, achieving higher accuracy while requiring less computation; (3) combined with transfer learning, which overcomes the problem of the small sample size of DR data, RA-EfficientNet obtains satisfactory results. In the near future, we will further optimize the method to improve the accuracy of DR detection and try to develop a more powerful DR diagnosis model to assist doctors in clinical examinations.