Article

Deep Fusion Feature Extraction for Caries Detection on Dental Panoramic Radiographs

Toan Huy Bui, Kazuhiko Hamamoto and May Phu Paing
1 Course of Science and Technology, Graduate School of Science and Technology, Tokai University, Tokyo 108-8619, Japan
2 School of Information and Telecommunication Engineering, Tokai University, Tokyo 108-8619, Japan
3 Faculty of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
* Authors to whom correspondence should be addressed.
Submission received: 31 December 2020 / Revised: 17 February 2021 / Accepted: 19 February 2021 / Published: 24 February 2021
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

Caries is one of the most well-known diseases and affects the oral health of billions of people around the world. Despite the importance and necessity of a well-designed detection method, studies on caries detection are still limited and restricted in performance. In this paper, we propose a computer-aided diagnosis (CAD) method to detect caries among normal patients using dental radiographs. The proposed method mainly consists of two processes: feature extraction and classification. In the feature extraction phase, the chosen 2D tooth image is used to extract deep activated features with a deep pre-trained model and geometric features computed from mathematical formulas. Both feature sets are then combined into a fusion feature, so that each set complements the other's shortcomings. The optimal fusion feature set is then fed into well-known classification models, namely support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), Naïve Bayes (NB), and random forest (RF), to determine the classification model that best fits the fusion feature set and yields the most prominent result. The results show 91.70%, 90.43%, and 92.67% for accuracy, sensitivity, and specificity, respectively. The proposed method outperforms the previous state-of-the-art, with none of the measured factors below 90%; therefore, the method can support dentists and is suitable for wide-scale implementation of caries detection in hospitals.

1. Introduction

Oral health plays a main role in people's overall health and quality of life throughout their lifetime, regardless of nationality, region, or religion. It encompasses being free of mouth and facial pain, oral and throat cancer, oral infections and sores, periodontal (gum) disease, tooth decay, tooth loss, and disorders that limit an individual's capacity for biting, chewing, speaking, and psychosocial wellbeing. The World Health Organization (WHO) estimated that around 3.5 billion people were affected by oral diseases in 2016, and the number continues to increase [1]. Caries, also known as tooth decay or oral cavities, is the most common disease affecting quality of life worldwide. Around 60%–90% of school children and almost 100% of adults have dental cavities. Caries is the breakdown of teeth due to acid produced by bacteria. Untreated caries appears in different forms and colors, such as yellow or black, results in oral pain, facial pain, and tooth loss, and is a major cause of noncommunicable disease. Treatment of oral diseases is usually expensive and not part of universal health coverage. Dental treatment accounts for about 5% of total health spending and roughly 20% of out-of-pocket health expenditure in many developed countries. The situation is worse in most developing countries, where people cannot afford oral health treatment services. Most caries conditions are treatable and preventable at an early stage, thereby reducing the dentist's effort and the patient's expenditure. Figure 1 shows an example of a healthy tooth and a tooth with cavities.
Detection of caries may consist of three phases: (1) segmenting (or isolating) the tooth under diagnosis from the other teeth; (2) a preliminary diagnosis to determine whether the tooth has decay; (3) a comprehensive diagnosis to plan treatment for the decaying tooth and to classify the stage of decay into four groups (C1–C4) based on the condition and damage of the tooth. Although a nurse could perform phase one, phases two and three require practical experience from a dentist. In this research, we aim to develop a method that performs the preliminary diagnosis in phase two to reduce the dentist's effort on non-caries patients.
Recently, with the development of medical imaging technology, computer-aided diagnosis (CAD) systems have come to play a major role in the early detection of several diseases such as cancer, diabetes, and even caries [3,4]. Caries can be detected using several different methods and techniques. Some researchers have proposed detection using photoacoustic images, specific wavelengths, or ultrasound images [5,6,7]. Other research has described an approach using RGB oral endoscope images [8,9]; however, most of these approaches cannot capture the detailed structure of the tooth, especially the tooth root, and therefore struggle to support caries diagnosis. Compared to oral endoscope imaging, dental radiographs provide greater image quality and reveal detailed structural deformities in the tooth [10]; therefore, dental radiography is the most widely used approach and is preferable for the detection of caries at an early stage.
Clinically, dental radiographs, which are used to identify tooth problems and evaluate oral health, are taken with a low level of X-ray radiation to capture images of the interior of teeth and gums. Radiographs are usually grayscale images and sometimes color images; however, color radiography requires significant investment, which is a barrier to entry for most hospitals, especially those in low-income countries; for this reason, we focused on grayscale radiographs. Unfortunately, there is no reliable public dataset that provides high-quality images, descriptions, and reliable ground truth. In this field, most data are shared only under strict conditions, such as requiring researchers to publish in a specific journal or to be a member of a particular group or event. Some researchers publish the private data used in their research, but such data usually suffer from problems with image quality, dataset size, lack of description and ground truth, and/or lack of long-term availability. In this research, the dataset and ground truth were provided by Dr. Kumon Makoto, director of the Shinjuku East Office, under a research contract with Tokai University. Dr. Kumon Makoto received The Academy of Clinical Dentistry Certified Physician qualification and was registered as a professional dentist under No. 148529 on 19 May 2003. With 18 years of experience as a dentist and responsibility for over 200 patients per month, he could reliably provide a truthful dataset. More importantly, all the patients who participated in the dataset collection were real patients of Dr. Makoto and under his treatment. Each caries tooth in the dataset was confirmed in the patient's medical history during treatment. For the reasons mentioned above, we believe that our dataset is trustworthy and can be used for research and publication purposes.

2. Related Works

In a dental examination using radiographs, caries can be recognized as a break in the tooth, parts missing from a tooth, or tooth loss. There are no obvious symptoms or criteria regarding shape, size, or intensity for tooth decay other than the dentist's diagnostic experience, which poses a huge challenge for computer-aided diagnosis systems based on image processing. Wei Li et al. [11] proposed a method to detect tooth decay using a support vector machine (SVM) and a backpropagation neural network (BPNN). The method uses two feature sets separately for feature extraction: autocorrelation coefficients and the gray-level co-occurrence matrix. A model of SVM and BPNN was then applied separately for classification. The results show that the SVM achieves around 79% accuracy on the testing set, whereas the BPNN achieves around 75%. This performance is insufficient and needs further improvement. In addition, the article does not describe the dataset, which may raise questions about the reliability of the research.
Yang Yu et al. [12] tried to enhance the backpropagation neural network layers and the feature extraction of the autocorrelation coefficient matrix. The method was tested on 80 private tooth images (55 images for training and 35 images for testing) and shows 94% accuracy; however, there is a great computational burden when the number of layers in the backpropagation neural network is increased. In addition, effective measures such as sensitivity (SEN), specificity (SPEC), precision (PRE), and F-measure are not reported. Furthermore, the rather small test set (35 images), used without cross-validation, is a weakness that cannot address the whole problem of tooth decay.
Shashikant Patil [13] proposed an intelligent system with dragonfly optimization. Multi-linear principal component analysis (MPCA) was applied to extract the feature set. The feature set was then fed into a neural network classifier trained using an optimization method, the adaptive dragonfly algorithm (ADA). The proposed MPCA-based non-linear programming with ADA (MNP-ADA) model was tested with 120 private tooth images divided into three test cases. Each test case consisted of 40 images: 28 images were used for training and 12 for testing. Other optimizers, such as fruit fly (FF) [14] and grey-wolf optimization (GWO) [15], and feature sets, such as linear discriminant analysis (LDA) [16], principal component analysis (PCA) [17], and independent component analysis (ICA) [18], were also included in the tests for comparison. The final average results show that the MNP-ADA model reaches 90% accuracy, 94.67% sensitivity, and 63.33% specificity. The low specificity indicates that non-caries patients are frequently misclassified as caries patients; therefore, the distinction between caries and non-caries patients is not efficient, and the performance needs to be improved. Because the result shows a high accuracy value despite a low specificity value, it may also raise questions about the balance of the data between caries and non-caries images. That study also reports other measures, such as precision and F1-score, which are discussed in more detail in the Results section.
Nowadays, deep learning has made great breakthroughs in the machine learning field [19]. The convolutional neural network (CNN) is the most well-known deep learning model and can be used for many purposes, such as detecting new unknown objects (transfer learning), fine-tuning weights, or feature extraction [20,21,22,23]; however, as far as we are aware, no previous study has applied deep learning to caries classification, especially on dental radiographs, which suggests a need for research in this area. In addition, a single CNN model may yield unsatisfactory performance and leave a large space of image information unexplored. Thus, the deep activated features need to be improved by combining them with features from other sources. Consequently, in this study, we propose a deep activated model that can best describe dental radiographs and improve the performance of the feature set by combining it with other mathematical features such as the mean, standard deviation (STD), and texture features. Each deep activated feature set is extracted carefully by testing the result of each candidate deep layer. The mathematical features are also tuned to obtain the minimal feature set while maintaining optimal performance. The combined feature set, called the "fusion feature" in this study, is later fed into different classification models to find the model that best fits the feature set and produces the best separation of the data. This study focused on two key objectives:
(i)
Stability, based on data large enough to describe the problem and cross-validation to measure performance across different situations;
(ii)
Performance, meaning better accuracy and improved specificity, since the balance between sensitivity and specificity is sometimes more important than accuracy alone. Other measures are also reported for comparison with previous studies.
The rest of the paper is organized as follows. Section 3 describes the dataset and the proposed method and explains how to implement our method step by step. Section 4 presents the results of each step described in Section 3; the results of previous studies are also given for comparison. Section 5 provides the discussion, summary, and conclusions.

3. Materials and Methods

This section describes the proposed method and gives information about our dataset. Since there is no specific well-known public dataset in this field, a carefully prepared dataset is important for evaluating the proposed method; thus, most researchers prefer to build their own datasets for experiments [11,12,13].

3.1. Radiographs Dataset

Teeth are diverse in size, shape, and structure, and the characteristics of tooth decay contribute even more to this diversity; therefore, the larger a dataset is, the better it can describe tooth decay. Our dataset was collected and labeled by a dentist from the Tokai hospital. The dataset was assessed for quality and ethics by Tokai University's committee for the right of use and publication; however, the dataset's images are panoramic oral radiographs of all teeth, whereas dental diagnosis and treatment should be made for each individual tooth. Consequently, we needed to manually segment each tooth into a sub-image consisting of the target tooth to be diagnosed and its label. The segmentation is simple and can be done by any dentist or nurse; therefore, we anticipate no considerable effect on this study (Figure 2). To simulate real cases, where the area determined for each tooth varies depending on who performs the segmentation, we did not fix the cropped area to any size but kept it flexible depending on the tooth's size, position, and surrounding space.
After the segmentation, the dataset comprised 533 image samples: 229 caries teeth and 304 non-caries teeth. Since the difference between the numbers of caries and non-caries images remains small (caries/non-caries is approximately 0.43/0.57), the dataset can be considered balanced. Each image is a two-dimensional grayscale image that consists of the target tooth and its surrounding areas, such as black empty space or parts of neighboring teeth. The images present the original condition of the teeth without any modification in color, size, or angle. The images vary in size, matching the segmentation process described above, and are later resized to the same input size for the feature extraction step.

3.2. Method

Caries detection mainly consists of two stages: feature extraction and classification. In the first stage, we experimented to find the deep activated features from pre-trained models that best describe the radiographs, namely the Alexnet [20], Googlenet [24], VGG16 [25], VGG19 [25], Resnet18 [26], Resnet50 [26], Resnet101 [26], and Xception [27] networks. The experiments were conducted on the deepest layers of each model. Then, mathematical features, such as the mean and STD, and texture features, such as Haralick's features [28], were extracted to enrich the feature information. Both feature sets are later combined into fusion features. In the second stage, the feature set is tested with classification models, namely support vector machine (SVM), Naïve Bayes (NB), k-nearest neighbor (KNN), decision tree (DT), and random forest (RF). The whole process, along with other sub-stages, is shown in Figure 3.

3.2.1. Feature Descriptors Using Pre-Trained Deep CNN Networks

A pre-trained CNN is used in this study as a feature descriptor to extract the deep activated features. The eight most well-known networks, Alexnet, Googlenet, VGG16, VGG19, Resnet18, Resnet50, Resnet101, and Xception, were applied to find the best pre-trained descriptor network. Table 1 describes each pre-trained model's specifications in detail, such as depth, number of parameters, size, and input size. The most commonly recommended layer for extraction is the last layer before the "prediction" layer, which carries the deepest representation; therefore, in our experiments we tested several layers before the "prediction" layer (excluding "drop" layers, because a dropout layer carries essentially the same information as the preceding layer). Each image needs to be resized to a specific size before being fed into a particular network. Technically, the networks process RGB images, whereas the radiographs are grayscale; therefore, we replicated the grayscale channel to fill the missing channels of the image. The tested layers and networks are reported in the Results section.
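The extraction in this study was implemented in MATLAB with its pre-trained networks; purely as an illustration of the same idea (replicating the grayscale channel, resizing to the network's input size, and reading activations from a late fully connected layer), a minimal Python sketch using torchvision could look as follows. The file name is hypothetical, and the layer shown corresponds to the last fully connected layer of VGG16 ("fc8" in MATLAB's layer naming), not necessarily the exact extraction point of every network in Table 3.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pre-trained VGG16 used as a fixed feature descriptor (no fine-tuning).
vgg16 = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),   # replicate the grayscale channel to 3 channels
    transforms.Resize((224, 224)),                 # VGG16 input size from Table 1
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def deep_features(image_path: str) -> torch.Tensor:
    """Activations of the last fully connected layer (1000-D, before the softmax/prediction layer)."""
    x = preprocess(Image.open(image_path)).unsqueeze(0)   # 1 x 3 x 224 x 224
    with torch.no_grad():
        feats = vgg16.features(x)                          # convolutional backbone
        feats = vgg16.avgpool(feats).flatten(1)            # 1 x 25088
        feats = vgg16.classifier(feats)                    # fully connected stack, output 1 x 1000
    return feats.squeeze(0)

# Example with a hypothetical segmented tooth image:
# vec = deep_features("tooth_0001.png")
```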

3.2.2. Feature Descriptors Using Geometric Features

Geometric features are a fundamental way to describe any kind of problem. Since these features are extracted using mathematical formulas, they are understandable and explainable. Despite the contribution of deep activated feature descriptors, geometric features can contain sufficient and relevant information that is noticeable to humans. Furthermore, deep activated features usually explore the data in a way that is impenetrable to humans, whereas geometric features are usually derived from experts' experience in the field; therefore, geometric features are necessary and irreplaceable for solving a complex problem.
In clinical practice, dentists manually determine the difference between caries and non-caries based on the damage to the tooth's structure. This damage can be characterized by differences in size, shape, contrast, margin, intensity, and so on. Based on these characteristics, features that describe the state of the tooth are extracted, such as the mean, Haralick's features [28], and gray-level co-occurrence matrix (GLCM) features [29,30]. Table 2 lists the names and formulas of the used features in detail. In the formulas, $I(x, y)$ denotes the pixel value at coordinate $(x, y)$ of the candidate image $N$, $p(i, j)$ denotes the $(i, j)$-th entry of the GLCM, $N_g$ denotes the number of distinct gray levels in the image, and $\mu$ and $\sigma$ denote the mean and standard deviation values.
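As an illustration of how such descriptors can be computed, a minimal sketch using scikit-image's GLCM utilities (version ≥ 0.19, where the functions are spelled graycomatrix/graycoprops) is shown below. It covers only a subset of the Table 2 features, and the quantization level and the single offset are assumptions rather than the exact configuration used in this study.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def geometric_features(img: np.ndarray, levels: int = 8) -> np.ndarray:
    """Compute a subset of the Table 2 descriptors for a grayscale tooth image (uint8)."""
    # Quantize to a small number of gray levels so the GLCM stays compact and stable.
    quantized = (img.astype(np.float64) / 256.0 * levels).astype(np.uint8)
    glcm = graycomatrix(quantized, distances=[1], angles=[0],
                        levels=levels, symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]                                             # normalized GLCM, shape (levels, levels)

    mean_intensity = float(img.mean())                               # F1: mean
    entropy = float(-np.sum(p[p > 0] * np.log(p[p > 0])))            # F2: entropy
    contrast = float(graycoprops(glcm, "contrast")[0, 0])            # F4: contrast
    correlation = float(graycoprops(glcm, "correlation")[0, 0])      # F5: correlation
    dissimilarity = float(graycoprops(glcm, "dissimilarity")[0, 0])  # F8: dissimilarity
    max_prob = float(p.max())                                        # F9: maximum probability

    return np.array([mean_intensity, entropy, contrast,
                     correlation, dissimilarity, max_prob])
```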

3.2.3. Fusion Features

The features extracted from the deep networks and the geometric features are combined in this step: the whole set of geometric features is concatenated to each deep activated feature set. The fusion feature is then fed into a classification model in the next step. In addition, to measure the efficiency of the geometric features and of the fusion feature over the deep activated features alone, we evaluated the performance by feeding each deep activated feature set and each fusion feature set into the classifier under the same conditions (Figure 4). The comparison between fusion and deep activated features is discussed in the Results section.
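Conceptually, the fusion step is a simple per-image concatenation of the two feature vectors. A minimal sketch (argument names are hypothetical placeholders) is:

```python
import numpy as np

def fuse(deep_feats: np.ndarray, geo_feats: np.ndarray) -> np.ndarray:
    """Concatenate deep activated and geometric features per image.

    deep_feats: (n_samples, n_deep), geo_feats: (n_samples, n_geo)
    returns:    (n_samples, n_deep + n_geo) fusion features fed to the classifier
    """
    return np.hstack([deep_feats, geo_feats])
```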

3.2.4. Classification

Each deep activated feature set is combined with the geometric features and then fed into the classification model. To test the efficiency of the fusion between deep activated and geometric features, the deep activated features are also tested separately and compared with the fusion features. Most tests were conducted using the well-known "optimal margin classifier," also known as the support vector machine (SVM) [31].
The SVM model aims to find the optimal hyperplane that best separates the data, in this case caries and non-caries. To moderate the number of training points, we apply the Gaussian radial basis function kernel in the classifier. For given training data $D = \{(x_i, y_i)\}_{i=1}^{N}$ with $y_i \in \{-1, 1\}$, the SVM classifier and the mapping function of the Gaussian kernel can be described as follows in Equations (1) and (2):
$\min_{\omega, b, \xi} \ \frac{1}{2}\|W\|^2 + C \sum_i \xi_i^2 \quad \text{subject to} \quad y_i \left( W^T X_i + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \ \forall i$  (1)
where C > 0 is the selected parameter and ξ is a set of slack variables.
$K(X, Y) = e^{-\frac{\|X - Y\|^2}{A}}$  (2)
where K is the kernel function and A is a constant.
Furthermore, to guarantee the best classification fit for the feature set, we also tested the best feature set with k-nearest neighbor (KNN) [32], decision tree (DT) [33], Naïve Bayes (NB) [34], and random forest (RF) [35] classifiers.
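The classification itself was run in MATLAB; as an illustration only, an equivalent set of classifiers could be set up in Python with scikit-learn as sketched below. The hyperparameters shown are library defaults, not the values tuned in this study.

```python
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# RBF-kernel SVM plus the four alternative classifiers.
# Note: scikit-learn parameterizes the Gaussian kernel as exp(-gamma * ||x - y||^2),
# i.e. gamma corresponds to 1/A in Equation (2).
classifiers = {
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, probability=True)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "DT":  DecisionTreeClassifier(random_state=0),
    "NB":  GaussianNB(),
    "RF":  RandomForestClassifier(n_estimators=100, random_state=0),
}

# Usage with fusion features X (n_samples x n_features) and labels y (1 = caries, 0 = non-caries):
# classifiers["SVM"].fit(X_train, y_train)
# y_pred = classifiers["SVM"].predict(X_test)
```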

4. Experimental Results

This section describes how we conducted the experiments and gives information on the experimental environment. The result of each step is explained in detail, and the best result is compared with the previous state-of-the-art.

4.1. Measures

Performance assessment of the proposed method in this study relied on three well-known measures: accuracy (ACC), sensitivity (SEN), and specificity (SPEC). In addition, we also present precision, or positive predictive value (PPV), negative predictive value (NPV), F1-score, area under the curve (AUC), and processing time to give a comprehensive view of the advantages of the proposed method and for reference by other researchers. The measures are calculated as follows in Equations (3)–(8):
$ACC = \frac{TP + TN}{TP + FP + TN + FN}$  (3)
$SEN = \frac{TP}{TP + FN}$  (4)
$SPEC = \frac{TN}{TN + FP}$  (5)
$PPV = \frac{TP}{TP + FP}$  (6)
$NPV = \frac{TN}{TN + FN}$  (7)
$F1\text{-}score = \frac{2\,TP}{2\,TP + FP + FN}$  (8)
where:
  • True positive (TP) denotes the number of caries images classified correctly as caries;
  • True negative (TN) denotes the number of non-caries images classified correctly as non-caries;
  • False positive (FP) denotes the number of non-caries images classified wrongly as caries;
  • False negative (FN) denotes the number of caries images classified wrongly as non-caries.
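Following these definitions, a small sketch of how the measures in Equations (3)–(8) can be computed from a confusion matrix (shown here with scikit-learn, purely as an illustration) is:

```python
from sklearn.metrics import confusion_matrix

def caries_metrics(y_true, y_pred):
    """ACC, SEN, SPEC, PPV, NPV, and F1-score for binary labels (1 = caries, 0 = non-caries)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "ACC":  (tp + tn) / (tp + fp + tn + fn),   # Equation (3)
        "SEN":  tp / (tp + fn),                    # Equation (4)
        "SPEC": tn / (tn + fp),                    # Equation (5)
        "PPV":  tp / (tp + fp),                    # Equation (6)
        "NPV":  tn / (tn + fn),                    # Equation (7)
        "F1":   2 * tp / (2 * tp + fp + fn),       # Equation (8)
    }

# Example: caries_metrics([1, 0, 1, 1], [1, 0, 0, 1]) -> {"ACC": 0.75, ...}
```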

4.2. Experiment and Result

In the first stage of the experiments, we determined the optimal layer in each deep pre-trained network that best represents the problem. Table 3 lists the feature sets extracted from each deep pre-trained network and the corresponding layer. The extracted feature sets were tested with a support vector machine to reach the final classification result. There is no reference for choosing the layer in each network; therefore, we tried several layers before the prediction layer. For some networks the best layer is the pooling layer, whereas for others it is an earlier layer. The highest performance is reached by the "fc8" layer of the VGG16 model, with an accuracy of 90.57%, sensitivity of 91.30%, and specificity of 90.00%. Furthermore, Resnet50, Resnet101, and Xception also show very promising results of around 88% accuracy. Notably, none of the deep activated features falls below 80% accuracy, indicating that deep activated features are effective.
To further enhance the performance, we combined each deep activated feature set with the geometric features and fed them into the SVM model (Table 4). The results show that the Xception-based fusion feature benefited the most from the combination. After the combination, Figure 5 shows that the fusion features of the Xception network become the most prominent features, improving the performance to 92.45%, 100%, and 86.67% for accuracy, sensitivity, and specificity, respectively. The largest difference is the improvement of sensitivity from 91.30% to 100%; the Xception fusion feature set therefore demonstrates the contribution of the geometric features in a proper combination with deep activated features. Although their performance does not match that of the Xception fusion features, Googlenet and Resnet18 also improve from 83.02% to 86.79% and from 84.91% to 88.68% accuracy, respectively. Notably, none of the fusion feature sets has lower accuracy than its respective deep activated features. In conclusion, the fusion features show an obvious advantage over the deep activated feature sets alone.
We randomly divided the data into training and testing sets for cross-validation to design and evaluate the caries detection method. K-fold cross-validation is a well-known, reliable technique for testing the robustness of a method. Its application demonstrates the proposed method's ability to cover the whole problem and adapt to unseen samples; the technique was also used to prevent overfitting of the method on our testing data.
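A minimal sketch of such a cross-validation loop is given below (five folds are assumed to match Table 5; the stratified split and the RBF-kernel SVM settings are illustrative assumptions rather than the exact MATLAB procedure used):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def cross_validated_accuracy(X: np.ndarray, y: np.ndarray, n_splits: int = 5) -> float:
    """Mean test accuracy of an RBF-kernel SVM over stratified k-fold splits."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    accs = []
    for train_idx, test_idx in skf.split(X, y):
        model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        model.fit(X[train_idx], y[train_idx])
        accs.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))
    return float(np.mean(accs))     # averaged over folds, as in the "Mean" column of Table 5
```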
We then used the most prominent feature set with different classification models to determine which model best fits the features. In this study, decision tree (DT), k-nearest neighbor (KNN), Naïve Bayes (NB), random forest (RF), and support vector machine classifiers were used (Table 5). In this step, we also applied k-fold cross-validation to prevent overfitting and to calculate the final average assessment. The support vector machine is clearly the most dominant model, with an accuracy of 91.70%, sensitivity of 90.43%, and specificity of 92.67%. As mentioned earlier in Section 3.1, the dataset is considered balanced, with only a small difference between the numbers of caries and non-caries samples; the precision (also known as positive predictive value) and recall (also known as sensitivity) pair therefore also show promising values of 91.51% and 90.43%, respectively. Finally, we generated the receiver operating characteristic (ROC) curves for each classifier in each fold of the experiment. The mean ROC curve of each classifier is interpolated in each graph from Figure 6a to Figure 6e, and the curves are compared in Figure 6f.
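The averaging behind the mean ROC curves in Figure 6 can be illustrated by interpolating each fold's curve onto a common grid of false-positive rates; a sketch (again an assumption-level illustration in Python, not the MATLAB code used) follows:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc

def mean_roc_curve(X: np.ndarray, y: np.ndarray, n_splits: int = 5):
    """Per-fold ROC curves interpolated onto a common FPR grid and averaged."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    mean_fpr = np.linspace(0.0, 1.0, 100)
    tprs = []
    for train_idx, test_idx in skf.split(X, y):
        model = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
        model.fit(X[train_idx], y[train_idx])
        scores = model.predict_proba(X[test_idx])[:, 1]        # probability of the caries class
        fpr, tpr, _ = roc_curve(y[test_idx], scores)
        interp_tpr = np.interp(mean_fpr, fpr, tpr)             # common grid for averaging
        interp_tpr[0] = 0.0
        tprs.append(interp_tpr)
    mean_tpr = np.mean(tprs, axis=0)
    mean_tpr[-1] = 1.0
    return mean_fpr, mean_tpr, auc(mean_fpr, mean_tpr)         # mean curve and its AUC
```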
For a comprehensive assessment, the execution time of the proposed caries detection method was also computed. The experiments were conducted in the MATLAB R2020a environment on Windows 10. The main process was performed on an Intel Core i7-9750HF CPU, supported by a GeForce RTX 2060 graphics card.
The time of each function in the process is considered carefully, since each is a factor in the complexity of the method. Table 6 shows that the whole process takes 13.79 s in total, and the most complex function, deep activated feature extraction, takes less than 10 s. The geometric feature calculation also completes smoothly in only 2.52 s. Based on these results, we consider that the proposed method performs considerably well and is capable of wide implementation, even on computers with low specifications. Based on the prediction and evaluation time, classifying a segmented tooth image as caries or non-caries takes only 0.28 s (less than 1 s). This is practical for dentists, even in a large hospital with a huge number of patients.
Lastly, the proposed method was compared with previous state-of-the-art techniques (Table 7). Because the methods were evaluated on different datasets, each dataset's size and complexity affects the performance. For a fair comparison, we detail each method and describe its differences and advantages/disadvantages. In addition, because some methods are not fully described but have been tested on other datasets in other papers, we provide a reference to the appropriate study together with a description. The comparison table shows that [11,12] have a disappointing performance, whereas [13] performs much better; however, considering the accuracy of 90.00%, sensitivity of 94.67%, and specificity of 63.33%, we can see an imbalance in the data as well as a low-performance result. Our proposed method achieves a specificity of 92.67%, a 29.34 percentage-point improvement over the other methods, while sensitivity remains above 90%. The 4.24 percentage-point decrease in sensitivity is a worthwhile trade-off for the improvement in specificity.

5. Conclusions

In this article, we presented a caries detection method using radiographic images. First, the radiographic images were manually labeled by dentists as either caries or non-caries. In the feature extraction process, tooth images were used to extract the deep activated features; the proper layer for extracting deep activated features from each deep pre-trained model was determined through experiments. Geometric features were also extracted and combined with the deep activated features to build fusion features. The optimal feature set was explored through a performance comparison between deep activated features and fusion features, and the set of geometric features was reduced to a minimum while retaining the optimal information. Next, we fed the fusion features into classification models such as support vector machine (SVM), decision tree (DT), k-nearest neighbor (KNN), Naïve Bayes (NB), and random forest (RF) to classify caries and non-caries images. Our proposed method achieved 91.70%, 90.43%, and 92.67% for accuracy, sensitivity, and specificity, respectively. We improved the accuracy by 1.7%, from 90% to 91.70%, and the specificity by 29.34%, from 63.33% to 92.67%; the sensitivity also remained good at 90.43% compared with previous state-of-the-art methods.
The proposed method makes two key contributions. The first is finding the best feature set, namely the combination of deep activated features and geometric features, and then fitting a proper classification model to describe the problem. The second is enhancing the performance by improving the specificity. The performance of a deep activated feature is not proportional to the complexity or size of the model: the VGG16 deep activated feature is better than Xception's, whereas the fusion result is the opposite. The choice of deep activated feature plays an important role; however, the choice of analytically calculated features contributes to the result equally. Finding which deep activated features are compatible with the analytically calculated features is more important than finding the best deep activated feature among all pre-trained models. While most research tries to build networks as deep as possible to improve learning performance, our results show that performance is sometimes unrelated to the network's depth. More importantly, the combination with calculated features may play a key role in improving performance and therefore cannot simply be exchanged for pre-trained model depth. The processing time, 13.79 s for the whole experiment and 0.28 s for prediction, demonstrates that the method can be widely implemented on a modest computer with trivial time consumption.
Nonetheless, despite the advantages over the previous state-of-the-art, a limitation of this study is that caries detection was performed on manually segmented teeth. In future work, we will extend our research into a fully automated system by performing automatic segmentation. We are also interested in extending our method to classify different caries stages using three-dimensional approaches. With that, our system will be an adjunct tool for both experienced and junior dentists.

Author Contributions

T.H.B. and K.H. conceived and designed this study. T.H.B. performed the experiments, simulations, and original draft preparation of the paper. M.P.P. helped in experimenting and evaluating the results. K.H. reviewed and edited the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board (or Ethics Committee) of Tokai University (protocol code 19212, approved on 6 March 2020).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

Restrictions apply to the availability of these data. The data were obtained from the Shinjuku East Dental Office (director: Makoto Kumon) and are available from the authors with the permission of Makoto Kumon or by sending a request to Makoto Kumon at: http://www.shinjukueast.com/doctor-staff/.

Acknowledgments

The authors express sincere gratitude to the Japan International Cooperation Agency (JICA) for financial support. The authors also thank Makoto Kumon, director of the Shinjuku East Dental Office, for providing the dataset used in this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Oral health. World Health Organization 2020. Available online: https://www.who.int/health-topics/oral-health/ (accessed on 1 October 2020).
  2. Cavities. From the MSD Manual Consumer Version (Known as the Merck Manual in the US and Canada and the MSD Manual in the rest of the world), edited by Robert Porter. Copyright (2021) by Merck Sharp & Dohme Corp., a Subsidiary of Merck & Co., Inc., Kenilworth, NJ. Available online: https://www.msdmanuals.com/en-jp/home/mouth-and-dental-disorders/tooth-disorders/cavities (accessed on 26 January 2021).
  3. Mosquera-Lopez, C.; Agaian, S.; Velez-Hoyos, A.; Thompson, I. Computer-Aided Prostate Cancer Diagnosis from Digitized Histopathology: A Review on Texture-Based Systems. IEEE Rev. Biomed. Eng. 2015, 8, 98–113. [Google Scholar] [CrossRef] [PubMed]
  4. Mansour, R.F. Evolutionary Computing Enriched Computer-Aided Diagnosis System for Diabetic Retinopathy: A Survey. IEEE Rev. Biomed. Eng. 2017, 10, 334–349. [Google Scholar] [CrossRef] [PubMed]
  5. Sampathkumar, A.; Hughes, D.A.; Kirk, K.J.; Otten, W.; Longbottom, C. All-optical photoacoustic imaging and detection of early-stage dental caries. In Proceedings of the 2014 IEEE International Ultrasonics Symposium, Chicago, IL, USA, 3–6 September 2014; pp. 1269–1272. [Google Scholar]
  6. Hughes, D.A.; Girkin, J.M.; Poland, S.; Longbottom, C.; Cochran, S. Focused ultrasound for early detection of tooth decay. In Proceedings of the 2009 IEEE International Ultrasonics Symposium, Rome, Italy, 20–23 September 2009; pp. 1–3. [Google Scholar]
  7. Usenik, P.; Bürmen, M.; Fidler, A.; Pernuš, F.; Likar, B. Near-infrared hyperspectral imaging of water evaporation dynamics for early detection of incipient caries. J. Dent. 2014, 42, 1242–1247. [Google Scholar] [CrossRef] [PubMed]
  8. Li, S.; Pang, Z.; Song, W.; Guo, Y.; You, W.; Hao, A.; Qin, H. Low-Shot Learning of Automatic Dental Plaque Segmentation Based on Local-to-Global Feature Fusion. In Proceedings of the 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, 3–7 April 2020; pp. 664–668. [Google Scholar]
  9. Maslak, E.; Khudanov, B.; Krivtsova, D.; Tsoy, T. Application of Information Technologies and Quantitative Light-Induced Fluorescence for the Assessment of Early Caries Treatment Outcomes. In Proceedings of the 2019 12th International Conference on Developments in eSystems Engineering (DeSE), Kazan, Russia, 7–10 October 2019; pp. 912–917. [Google Scholar]
  10. Angelino, K.; Edlund, D.A.; Shah, P. Near-Infrared Imaging for Detecting Caries and Structural Deformities in Teeth. IEEE J Transl. Eng. Health Med. 2017, 5, 2300107. [Google Scholar] [CrossRef] [PubMed]
  11. Li, W.; Kuang, W.; Li, Y.; Li, Y.; Ye, W. Clinical X-Ray Image Based Tooth Decay Diagnosis using SVM. In Proceedings of the 2007 International Conference on Machine Learning and Cybernetics, Hong Kong, China, 19–22 August 2007; pp. 1616–1619. [Google Scholar]
  12. Yu, Y.; Li, Y.; Li, Y.-J.; Wang, J.-M.; Lin, D.-H.; Ye, W.-P. Tooth Decay Diagnosis using Back Propagation Neural Network. In Proceedings of the 2006 IEEE International Conference on Machine Learning and Cybernetics, Dalian, China, 13–16 August 2006; pp. 3956–3959. [Google Scholar]
  13. Patil, S.; Kulkarni, V.; Bhise, A. Intelligent system with dragonfly optimisation for caries detection. IET Image Process. 2019, 13, 429–439. [Google Scholar] [CrossRef]
  14. Pan, W.-T. A new Fruit Fly Optimization Algorithm: Taking the financial distress model as an example. Knowl. Based Syst. 2012, 26, 69–74. [Google Scholar] [CrossRef]
  15. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61. [Google Scholar] [CrossRef] [Green Version]
  16. Loog, M.; Duin, R.P.W. Linear dimensionality reduction via a heteroscedastic extension of LDA: The Chernoff criterion. IEEE Trans. Pattern Anal. Mach. Intell. 2004, 26, 732–739. [Google Scholar] [PubMed]
  17. Lazcano, R.; Madroñal, D.; Salvador, R.; Desnos, K.; Pelcat, M.; Guerra, R.; Fabelo, H.; Ortega, S.; Lopez, S.; Callico, G.M.; et al. Porting a PCA-based hyperspectral image dimensionality reduction algorithm for brain cancer detection on a manycore architecture. J. Syst. Archit. 2017, 77, 101–111. [Google Scholar] [CrossRef]
  18. Montefusco-Siegmund, R.; Maldonado, P.E.; Devia, C. Effects of ocular artifact removal through ICA decomposition on EEG phase. In Proceedings of the 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER), San Diego, CA, USA, 6–8 November 2013; pp. 1374–1377. [Google Scholar]
  19. Jiao, Z.; Gao, X.; Wang, Y.; Li, J.; Xu, H. Deep Convolutional Neural Networks for mental load classification based on EEG data. Pattern Recognit. 2018, 76, 582–595. [Google Scholar] [CrossRef]
  20. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems—Volume 1, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
  21. Chen, L.-C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  22. Tiulpin, A.; Thevenot, J.; Rahtu, E.; Lehenkari, P.; Saarakkala, S. Automatic Knee Osteoarthritis Diagnosis from Plain Radiographs: A Deep Learning-Based Approach. Sci. Rep. 2018, 8, 1727. [Google Scholar] [CrossRef] [PubMed]
  23. Stuhlsatz, A.; Lippel, J.; Zielke, T. Feature Extraction with Deep Neural Networks by a Generalized Discriminant Analysis. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 596–608. [Google Scholar] [CrossRef] [PubMed]
  24. Szegedy, C.; Wei, L.; Yangqing, J.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  25. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  26. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  27. Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807. [Google Scholar]
  28. Haralick, R.M.; Shanmugam, K.; Dinstein, I. Textural Features for Image Classification. IEEE Trans. Syst. Man Cybern. 1973, SMC-3, 610–621. [Google Scholar] [CrossRef] [Green Version]
  29. Soh, L.; Tsatsoulis, C. Texture analysis of SAR sea ice imagery using gray level co-occurrence matrices. IEEE Trans. Geosci. Remote Sens. 1999, 37, 780–795. [Google Scholar] [CrossRef] [Green Version]
  30. Clausi, D.A. An analysis of co-occurrence texture statistics as a function of grey level quantization. Can. J. Remote Sens. 2002, 28, 45–62. [Google Scholar] [CrossRef]
  31. Vapnik, V. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995. [Google Scholar]
  32. Guo, G.; Wang, H.; Bell, D.; Bi, Y. KNN Model-Based Approach in Classification. In On the Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE; Springer: Berlin/Heidelberg, Germany, 2004. [Google Scholar]
  33. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees; CRC Press: New York, NY, USA, 1984. [Google Scholar]
  34. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning, Second Edition; Springer: New York, NY, USA, 2008. [Google Scholar]
  35. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Sample of healthy and caries tooth image [2]. (a) Structure of healthy tooth and (b) a tooth with decay.
Figure 2. Samples of oral and tooth image. (a) oral panoramic radiograph and (b) segmented tooth radiographs.
Figure 3. Diagram for caries prediction.
Figure 4. Diagram for experiment deep activated and fusion features.
Figure 5. Overlaid bar graph of the average accuracy of fusion features and deep activated features.
Figure 6. Comparison of the ROC curves for five classifiers. (a) Decision tree, (b) K-nearest neighbor, (c) Naïve Bayes, (d) Random Forest, (e) Support vector machine, and (f) Comparison of mean of receiver operating characteristic (ROC) curves for each classifier.
Table 1. Convolutional neural network (CNN) model specification.
Network Model | Depth | Size (MB) | Parameters (×10^6) | Input Size
Alexnet | 8 | 227 | 61.0 | 227 × 227 × 3
Googlenet | 22 | 27 | 7.0 | 224 × 224 × 3
VGG16 | 23 | 528 | 138.4 | 224 × 224 × 3
VGG19 | 26 | 549 | 143.7 | 224 × 224 × 3
Resnet18 | 18 | 45 | 11.5 | 224 × 224 × 3
Resnet50 | 50 | 98 | 25.6 | 224 × 224 × 3
Resnet101 | 101 | 171 | 44.7 | 224 × 224 × 3
Xception | 126 | 88 | 22.9 | 299 × 299 × 3
Table 2. Geometric features and formula.
Feature | Name | Formula
F1 | Mean | $\mu = \frac{1}{n} \sum_{(x,y) \in N} I(x,y)$
F2 | Entropy | $E = -\sum_i \sum_j p(i,j) \log p(i,j)$
F3 | Autocorrelation | $AutoCorr = \sum_i \sum_j (i \cdot j)\, p(i,j)$
F4 | Contrast | $Cont = \sum_i \sum_j (i - j)^2\, p(i,j)$
F5 | Correlation | $Corr = \sum_i \sum_j \frac{(i - \mu_x)(j - \mu_y)\, p(i,j)}{\sigma_x \sigma_y}$
F6 | Cluster prominence | $Prom = \sum_i \sum_j (i + j - \mu_x - \mu_y)^4\, p(i,j)$
F7 | Cluster shade | $Shade = \sum_i \sum_j (i + j - \mu_x - \mu_y)^3\, p(i,j)$
F8 | Dissimilarity | $Diss = \sum_i \sum_j |i - j| \cdot p(i,j)$
F9 | Maximum probability | $MaxProb = \max_{i,j} p(i,j)$
F10 | Sum of squares variance | $SumSqVar = \sum_i \sum_j (i - \mu)^2\, p(i,j)$
F11 | Sum average | $SumAvg = \sum_{i=2}^{2N_g} i \cdot p_{x+y}(i)$
F12 | Sum entropy | $SumEnt = -\sum_{i=2}^{2N_g} p_{x+y}(i) \log p_{x+y}(i)$
F13 | Sum variance | $SumVar = \sum_{i=2}^{2N_g} (i - SumEnt)^2 \cdot p_{x+y}(i)$
F14 | Difference entropy | $DiffEnt = -\sum_{i=0}^{N_g - 1} p_{x-y}(i) \log p_{x-y}(i)$
Table 3. Performance of deep activated features layer corresponding to networks.
Network | Alexnet | Googlenet | VGG16 | VGG19 | Resnet18 | Resnet50 | Resnet101 | Xception
Layer | fc8 | pool5-7x7_s1 | fc8 | fc8 | pool5 | avg_pool | pool5 | avg_pool
ACC | 0.8679 | 0.8302 | 0.9057 | 0.8113 | 0.8491 | 0.8868 | 0.8868 | 0.8868
SEN | 0.7826 | 0.8261 | 0.9130 | 0.7391 | 0.8261 | 0.8696 | 0.8261 | 0.9130
SPEC | 0.9333 | 0.8333 | 0.9000 | 0.8667 | 0.8667 | 0.9000 | 0.9333 | 0.8667
PPV | 0.9000 | 0.7919 | 0.8750 | 0.8095 | 0.8261 | 0.8696 | 0.9048 | 0.8400
NPV | 0.8485 | 0.8621 | 0.9310 | 0.8125 | 0.8667 | 0.9000 | 0.8750 | 0.9286
F1-score | 0.7200 | 0.6786 | 0.8077 | 0.6296 | 0.7037 | 0.7692 | 0.7600 | 0.7778
AUC | 0.9087 | 0.8333 | 0.9587 | 0.8674 | 0.9014 | 0.9565 | 0.9072 | 0.9464
The highest performance for each measured factor across networks is highlighted in bold.
Table 4. Performance of fusion features corresponding to networks.
Network | Alexnet | Googlenet | VGG16 | VGG19 | Resnet18 | Resnet50 | Resnet101 | Xception
ACC | 0.8679 | 0.8679 | 0.9057 | 0.8113 | 0.8868 | 0.8868 | 0.8868 | 0.9245
SEN | 0.7826 | 0.8696 | 0.9130 | 0.7826 | 0.8696 | 0.8696 | 0.8261 | 1.0000
SPEC | 0.9333 | 0.8667 | 0.9000 | 0.8333 | 0.9000 | 0.9000 | 0.9333 | 0.8667
PPV | 0.9000 | 0.8333 | 0.8750 | 0.7826 | 0.8696 | 0.8696 | 0.9048 | 0.8519
NPV | 0.8485 | 0.8966 | 0.9310 | 0.8333 | 0.9000 | 0.9000 | 0.8750 | 1.0000
F1-score | 0.7200 | 0.7407 | 0.8077 | 0.6429 | 0.7692 | 0.7692 | 0.7600 | 0.8519
AUC | 0.9087 | 0.8949 | 0.9594 | 0.8659 | 0.9123 | 0.9580 | 0.9087 | 0.9688
The highest performance for each measured factor across networks is highlighted in bold.
Table 5. Performance of fusion features based on classifiers.
Classifier | Measure | Fold-1 | Fold-2 | Fold-3 | Fold-4 | Fold-5 | Mean
Decision Tree | Accuracy | 0.6415 | 0.6038 | 0.7170 | 0.6038 | 0.6981 | 0.6528
Decision Tree | Sensitivity | 0.6522 | 0.7826 | 0.7391 | 0.6957 | 0.6087 | 0.6957
Decision Tree | Specificity | 0.6333 | 0.4667 | 0.7000 | 0.5333 | 0.7667 | 0.6200
Decision Tree | PPV | 0.5769 | 0.5294 | 0.6538 | 0.5333 | 0.6667 | 0.5920
Decision Tree | NPV | 0.7037 | 0.7368 | 0.7778 | 0.6957 | 0.7188 | 0.7265
Decision Tree | F1-score | 0.4412 | 0.4615 | 0.5313 | 0.4324 | 0.4667 | 0.4666
Decision Tree | AUC | 0.6696 | 0.6507 | 0.7717 | 0.6159 | 0.7043 | 0.6825
K-Nearest Neighbor | Accuracy | 0.8491 | 0.8302 | 0.7736 | 0.7547 | 0.7170 | 0.7849
K-Nearest Neighbor | Sensitivity | 0.6522 | 0.6957 | 0.6087 | 0.6087 | 0.6522 | 0.6435
K-Nearest Neighbor | Specificity | 1.0000 | 0.9333 | 0.9000 | 0.8667 | 0.7667 | 0.8933
K-Nearest Neighbor | PPV | 1.0000 | 0.8889 | 0.8235 | 0.7778 | 0.6818 | 0.8344
K-Nearest Neighbor | NPV | 0.7895 | 0.8000 | 0.7500 | 0.7429 | 0.7419 | 0.7649
K-Nearest Neighbor | F1-score | 0.6522 | 0.6400 | 0.5385 | 0.5185 | 0.5000 | 0.5698
K-Nearest Neighbor | AUC | 0.8261 | 0.8145 | 0.7543 | 0.7377 | 0.7094 | 0.7684
Naïve Bayes | Accuracy | 0.7358 | 0.7333 | 0.7170 | 0.7547 | 0.7547 | 0.7391
Naïve Bayes | Sensitivity | 0.6087 | 0.7308 | 0.6087 | 0.6522 | 0.6522 | 0.6505
Naïve Bayes | Specificity | 0.8333 | 0.7353 | 0.8000 | 0.8333 | 0.8333 | 0.8071
Naïve Bayes | PPV | 0.7368 | 0.6786 | 0.7000 | 0.7500 | 0.7500 | 0.7231
Naïve Bayes | NPV | 0.7353 | 0.7813 | 0.7273 | 0.7576 | 0.7576 | 0.7518
Naïve Bayes | F1-score | 0.5000 | 0.5429 | 0.4828 | 0.5357 | 0.5357 | 0.5194
Naïve Bayes | AUC | 0.8101 | 0.8066 | 0.8043 | 0.7674 | 0.8094 | 0.7996
Random Forest | Accuracy | 0.9057 | 0.8679 | 0.9245 | 0.7736 | 0.7925 | 0.8528
Random Forest | Sensitivity | 0.8696 | 0.9565 | 0.9565 | 0.7391 | 0.6522 | 0.8348
Random Forest | Specificity | 0.9333 | 0.8000 | 0.9000 | 0.8000 | 0.9000 | 0.8667
Random Forest | PPV | 0.9091 | 0.7857 | 0.8800 | 0.7391 | 0.8333 | 0.8295
Random Forest | NPV | 0.9032 | 0.9600 | 0.9643 | 0.8000 | 0.7714 | 0.8798
Random Forest | F1-score | 0.8000 | 0.7586 | 0.8462 | 0.5862 | 0.5769 | 0.7136
Random Forest | AUC | 0.9551 | 0.9261 | 0.9623 | 0.8087 | 0.8652 | 0.9035
Support Vector Machine | Accuracy | 0.9623 | 0.9245 | 0.8868 | 0.8868 | 0.9245 | 0.9170
Support Vector Machine | Sensitivity | 0.9565 | 0.8696 | 0.7391 | 0.9565 | 1.0000 | 0.9043
Support Vector Machine | Specificity | 0.9667 | 0.9667 | 1.0000 | 0.8333 | 0.8667 | 0.9267
Support Vector Machine | PPV | 0.9565 | 0.9524 | 1.0000 | 0.8148 | 0.8519 | 0.9151
Support Vector Machine | NPV | 0.9667 | 0.9063 | 0.8333 | 0.9615 | 1.0000 | 0.9336
Support Vector Machine | F1-score | 0.9167 | 0.8333 | 0.7391 | 0.7857 | 0.8519 | 0.8253
Support Vector Machine | AUC | 0.9971 | 0.9899 | 0.9681 | 0.9652 | 0.9688 | 0.9778
The highest mean accuracy among classifiers is highlighted in bold.
Table 6. Total execution time for each function of the proposed system.
Function Name | Time (s)
Load data | 0.37
Deep activated features extraction | 9.99
Geometric features extraction | 2.52
Fusion features combination | 0.01
Training classification model | 0.62
Prediction and evaluation | 0.28
Total | 13.79
Table 7. Performance comparison of the proposed method and with the previous methods.
References | Method | Samples | ACC% | SEN% | SPEC% | PPV% | NPV%
[11,13] | Autocorrelation and GLCM features; SVM classification | 120 | 53.33 | 59.33 | 6.67 | 73.67 | 6.67
[12,13] | Autocorrelation coefficient matrix; neural network classification | 120 | 73.33 | 77.67 | 53.33 | 90.33 | 53.33
[13] | Multi-linear principal component analysis; non-linear programming with adaptive dragonfly algorithm; neural network classification | 120 | 90.00 | 94.67 | 63.33 | 91.00 | 63.33
Proposed method | Deep activated features; geometric features; SVM classification | 533 | 91.70 | 90.43 | 92.67 | 91.51 | 93.36
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
