4.1. Feature Extraction Step
In this work, we chose texture-based image features, which can provide accurate clues for many types of diseases such as pneumonia (bacterial or viral) and COVID-19. Feature extraction methods aim to identify the most relevant information (features) that is representative of the various classes in the image. Feature extraction is a form of dimensionality reduction, which is essential to pathology detection because it reduces ambiguity and improves detection accuracy. Medical images are sometimes difficult to interpret; consequently, extracting important features helps to provide radiologists with a clear description of both normal and pathological areas in the image. Texture is one of the most important features used for biomedical image diagnosis. Textures represent visual patterns of interest, capture recurring patterns, and provide significant information about the spatial variation in pixel intensities. A texture may be smooth, fine, coarse, or grainy according to its tone and structure.
When statistical methods are used to extract texture, the resulting features are called statistical features. They may be based on first-order, second-order, or higher-order statistics of the gray-level values. Statistical methods are extensively used in the medical context, particularly in X-ray image analysis. For some common applications, notably image classification and data clustering, gray-level values alone may not lead to the desired results, whereas texture features may provide more effective results [41]. In this work, we focus on a specific texture descriptor called the “Haralick feature”, which has been applied successfully in several applications [42] and has been shown to give interesting results for discriminating X-ray images. The Haralick descriptor is based on second-order statistics and is mainly estimated from the well-known gray-level co-occurrence matrix (GLCM). The GLCM is a square N × N matrix, where N is the number of gray levels in the image, that describes patterns of gray-level repetition. It estimates the joint probability of the gray levels of two pixels: for a given distance and orientation, each element (i, j) counts how often a pixel with gray level i has a neighbor with gray level j. The texture features are defined on the basis of the GLCM’s elements, and many of them can be computed directly from the matrix. For instance, it is possible to compute the following features [43]: contrast (which measures large differences between neighboring pixels), correlation, energy, entropy, difference variance, difference entropy, normalized inverse difference, and the information measure of correlation. Thus, in our work, we favor this descriptor to discriminate and classify abnormalities.
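As an illustration of the construction described above, the following minimal sketch builds a normalized GLCM for a single pixel offset and derives three of the listed features from it. It should be read as an illustrative sketch rather than the implementation used in the experiments: the function names (`glcm`, `haralick_subset`), the symmetric counting, the horizontal offset of one pixel, and the toy 4 × 4 image with four gray levels are all assumptions made for this example. Energy is taken here as the sum of squared GLCM entries (the angular second moment); some libraries report its square root instead.

```python
import math


def glcm(image, levels, dx=1, dy=0, symmetric=True):
    """Normalized gray-level co-occurrence matrix for one offset (dx, dy).

    `image` is a list of rows of integer gray levels in [0, levels).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1
                if symmetric:
                    # Count the pair in both directions (assumption).
                    counts[j][i] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]


def haralick_subset(p):
    """Contrast, energy, and entropy of a normalized GLCM `p`."""
    n = len(p)
    cells = [(i, j) for i in range(n) for j in range(n)]
    contrast = sum((i - j) ** 2 * p[i][j] for i, j in cells)
    energy = sum(p[i][j] ** 2 for i, j in cells)
    entropy = -sum(p[i][j] * math.log(p[i][j]) for i, j in cells if p[i][j] > 0)
    return {"contrast": contrast, "energy": energy, "entropy": entropy}


# Hypothetical toy image with N = 4 gray levels.
toy = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
P = glcm(toy, levels=4)
features = haralick_subset(P)
```

In practice, an X-ray image would first be quantized to a manageable number of gray levels, and the features would typically be computed for several distances and orientations before being combined into one feature vector.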
4.2. Data Sets
We conducted our experiments on four data sets: three contain chest X-ray (CXR) images and the fourth contains CT scans. These data sets are publicly available and shared by well-recognized institutions such as the University of Montreal; they were obtained from the GitHub repository shared by Dr. Joseph Cohen [44]. The details of these data sets are given in Table 1. The first CXR-based data set (https://github.com/ieee8023/COVID-chestxray-dataset) [44] contains X-ray images together with metadata for every image, such as the patient ID, the location, and other annotations. The second data set, called “Augmented COVID-19”, was collected from two data sets available online, Kaggle (https://www.kaggle.com/paultimothymooney/chest-xray-pneumonia) and chest X-ray (CXR); the number of COVID-19 X-ray images was increased from 48 to 912 through image transformations such as rotation, flipping, translation, and scaling. The third data set, named “Kaggle”, contains lung images in two categories (pneumonia/normal) and is used to detect pneumonia. An illustrative sample from this data set is given in Figure 2. We also conducted experiments on the CT scan data set [45], which contains 470 scans in total: 275 are positive cases for COVID-19 and 195 are negative cases labeled as non-COVID-19. The data set was verified by senior radiologists who have performed diagnoses on many COVID-19 patients. Our goal here is to evaluate the proposed framework on complex CT scans. Figure 3a shows an example of a CT scan image for a patient with COVID-19.
4.3. Results Analysis
The implemented statistical models were deployed to distinguish between normal and COVID-19 patients using chest X-ray and CT images. We performed several image processing steps and, after processing the data, extracted a list of statistical parameters. Most studies have shown that the lungs are the primary organ affected by this disease. In our analysis, we therefore focused on extracting the lung area using image thresholding and segmentation, and we identified and isolated the left and right lungs in the chest X-ray images. To remove noise from the images, we applied a Gaussian filter. We also extracted additional features, including Gabor filter responses, GLCM features, image heat map features [46], lung abnormality, lung pixel intensity, and the affected lung region based on the heat map.
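The denoising and thresholding steps mentioned above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the pipeline used in the experiments: the text does not specify the thresholding rule, so Otsu's method is used here as a stand-in, and the kernel radius, the replicated borders, the helper names, and the toy image are hypothetical. Treating dark pixels as lung candidates relies on the lungs appearing darker than the surrounding tissue in a conventional chest radiograph; a real pipeline would need further segmentation to separate the left and right lungs from the background.

```python
import math


def gaussian_kernel(sigma, radius):
    """1-D Gaussian weights on [-radius, radius], normalized to sum to 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(image, kernel):
    """Convolve every row with `kernel`, replicating border pixels."""
    radius = len(kernel) // 2
    out = []
    for row in image:
        n = len(row)
        out.append([
            sum(w * row[min(max(c + k - radius, 0), n - 1)]
                for k, w in enumerate(kernel))
            for c in range(n)
        ])
    return out


def transpose(image):
    return [list(col) for col in zip(*image)]


def gaussian_filter(image, sigma=1.0, radius=2):
    """Separable Gaussian smoothing: rows first, then columns."""
    kernel = gaussian_kernel(sigma, radius)
    horizontal = smooth_rows(image, kernel)
    return transpose(smooth_rows(transpose(horizontal), kernel))


def otsu_threshold(image, levels=256):
    """Threshold maximizing the between-class variance (Otsu, an assumption)."""
    hist = [0] * levels
    for row in image:
        for v in row:
            hist[min(max(int(round(v)), 0), levels - 1)] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        between = w0 * w1 * (sum0 / w0 - (sum_all - sum0) / w1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Hypothetical toy image: dark region (20) on the left, bright (200) on the right.
toy = [[20, 20, 20, 20, 200, 200, 200, 200] for _ in range(4)]
smoothed = gaussian_filter(toy, sigma=1.0, radius=2)
t = otsu_threshold(smoothed)
# Dark pixels are kept as lung candidates (assumption).
mask = [[v <= t for v in row] for row in smoothed]
```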
To investigate performance, we ran the three learning approaches for the finite Gamma mixture model and evaluated them in terms of overall accuracy (Acc), detection rate (DR), and false positive rate (FPR). Table 2, Table 3, Table 4, and Table 5 show the results for the tested data sets when applying the different learning approaches, namely the Gamma mixture model with maximum likelihood (MM-ML), with Bayesian inference (MM-B), and with variational Bayes (MM-V). Apart from the batch learning approaches, we also included their online counterparts to investigate the ability of the model to learn as new data arrive. The online extensions of the ML-, Bayesian-, and variational-based approaches were based on the methodologies that we previously proposed in [40, 47, 48], respectively.
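For reference, the three evaluation measures can be computed from binary labels as in the sketch below. The definitions used here are the conventional ones and are stated as assumptions, since they are not spelled out in this section: Acc is the fraction of correctly classified images, DR is the true positive rate TP / (TP + FN), and FPR is FP / (FP + TN). The function name and the toy labels are hypothetical. In the unsupervised setting, each cluster would first have to be mapped to a class before these quantities can be computed.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Accuracy, detection rate, and false positive rate for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return {
        "Acc": (tp + tn) / len(pairs),
        # Undefined ratios are reported as NaN instead of raising an error.
        "DR": tp / (tp + fn) if tp + fn else float("nan"),
        "FPR": fp / (fp + tn) if fp + tn else float("nan"),
    }


# Hypothetical labels: 1 = COVID-19, 0 = normal.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
scores = detection_metrics(y_true, y_pred)
```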
Regarding the overall Acc for the four data sets (CXR-COVID, CXR-pneumonia, CXR-augmented-COVID, and CT COVID), both the Bayesian and variational learning approaches for the finite Gamma mixture model provide better results than the Gaussian mixture. For the CXR-COVID data set, the average accuracy in classifying CXR images as COVID-19 or normal with Gamma mixture models using different types of learning (i.e., online, Bayesian, and variational) is about 87%, compared with only 83% for Gaussian mixtures. For CXR-pneumonia, the average accuracy is 92% for Bayesian and variational learning of Gamma mixtures (MM-B and MM-V), which outperform the rest of the methods, notably Bayesian learning of Gaussian mixtures (the Acc of GMM-B is 88%). These results show that the proposed Gamma mixture model provides very encouraging results with both the batch and online learning approaches compared, for instance, to Gaussian-based models, especially considering the difficulty of unsupervised classification. We came to the same conclusions for the other data sets, and we noticed that the average accuracy increased with the data set size. This can be seen for the CXR-augmented-COVID and CXR-pneumonia data sets, which contain more images than the CXR-COVID and CT COVID data sets. For instance, on the CXR-augmented-COVID data set, our model MM-B is the best with Acc = 91.95% compared to, for example, 85.13% for GMM-ML and 86.77% for GMM-B. The results also show that the batch and online learning models give approximately comparable performance. By contrast, the Gaussian-based classification obtained the worst performance.
For the case of CT scans (Figure 3), Table 5 shows that both MM-B and MM-V achieve superior results compared to the other mixtures, with an average Acc equal to 82.88%. This superiority may be explained by the flexibility provided by the shape parameter of the Gamma mixture. It should be noted that the lung segmentation step is difficult: the presence of acute respiratory distress syndrome and the low contrast at the lung boundaries can induce errors when segmenting this region of interest. To improve this task, we plan to apply a more effective segmentation method, as in [49]. The number of images in this data set is also small. For all these reasons, the obtained results are lower than the previous ones.
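To make the role of the shape parameter more concrete, the sketch below evaluates a univariate Gamma density and the posterior probability (responsibility) of each component of a two-component Gamma mixture for a single positive feature value. It is a simplified illustration under stated assumptions and not the learning algorithm evaluated above: the shape/rate parameterization, the univariate setting, the function names, and all parameter values are hypothetical, and the parameters are taken as given instead of being estimated by maximum likelihood, Bayesian, or variational learning.

```python
import math


def gamma_log_pdf(x, shape, rate):
    """Log-density of a Gamma(shape, rate) distribution at x > 0."""
    return (shape * math.log(rate) - math.lgamma(shape)
            + (shape - 1.0) * math.log(x) - rate * x)


def responsibilities(x, weights, shapes, rates):
    """Posterior probability of each mixture component given observation x."""
    log_terms = [math.log(w) + gamma_log_pdf(x, a, b)
                 for w, a, b in zip(weights, shapes, rates)]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_terms)
    unnorm = [math.exp(t - m) for t in log_terms]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Hypothetical mixture: a skewed component near zero and a more
# symmetric component centered around shape / rate = 6.
weights = [0.5, 0.5]
shapes = [2.0, 9.0]
rates = [2.0, 1.5]

low = responsibilities(0.5, weights, shapes, rates)   # favors component 0
high = responsibilities(6.0, weights, shapes, rates)  # favors component 1
```

Changing the shape value moves a component from a strongly skewed density toward a nearly symmetric one, which is the flexibility that a Gaussian component with fixed symmetric shape does not offer for positive-valued features.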
In this work, it is important to confirm the merits of the variational formalism, which gives very encouraging results with less computational complexity. We also note the merit and efficiency of the online learning extension, which still provides strong results for all data sets. These findings encourage the choice of the online process, which is especially suited to online prediction and detection of several forms of infection (such as COVID-19). The online setting is also promising for improving the models when new data are collected, because it allows the model to be updated incrementally, saving time while maintaining performance. Therefore, such a choice may help in rapidly detecting COVID-19 infection in images.
We also notice that the infinite extension provides good performance. The results for the finite and infinite mixtures are very similar, which can be explained by the fact that we are not dealing with very large data sets. These results are very encouraging given that we approach the classification problem in an unsupervised manner. In fact, the flexibility of the Gamma mixture model and the robustness of texture-based features lead to more stable results; these features proved able to differentiate between images according to their texture characteristics. To improve these results, more textural features could be considered within the proposed statistical framework. Several comparative studies in the literature (see, for instance, ref. [50]) show that textures are among the most important and well-studied descriptors, especially for medical applications [23]. In this work, we did not implement the other feature-based techniques for modeling textures that have been published in the literature, as this is clearly beyond the scope of this article. Instead, we focused on a significant texture feature, the Haralick descriptor, which has been applied with success in the past.