1. Introduction
Thoracic illnesses are widespread and significant health issues that affect many individuals worldwide. For instance, pneumonia kills roughly 4 million people annually and infects around 450 million worldwide (seven percent of the total population). One of the most widely used forms of radiological examination for identifying thoracic disorders is chest radiography, often known as chest X-ray (CXR) [
1,
2]. Globally, countless chest radiographs are produced each year, and virtually all of them are examined visually by humans. This is expensive, time consuming, operator biased, and unable to take advantage of the large volumes of valuable data. It also requires a high level of skill and focus [
3]. A significant public health issue in many nations is the absence of qualified radiologists who can interpret chest radiographs. Therefore, it is essential to create an automated technique for thoracic illness detection on chest radiographs using computer-aided diagnosis (CAD). The ChestX-ray14 dataset was provided by Wang et al. [
1,
4,
5], who recognized its value and used it to assess automated techniques for diagnosing 14 thoracic illnesses using chest radiography.
Chest X-ray is currently the most widely used radiological test in hospitals. The intricate pathological anatomy of many diseases and their accompanying lesion regions make automated chest X-ray classification a difficult task [
6]. In hospitals, chest X-ray analysis is entirely at the discretion of a radiology professional who may diagnose the disease and the portion of the body afflicted by the lesion. To appropriately categorize a range of illnesses by analyzing chest X-ray images, computer-aided diagnosis is critical [
7,
8,
9]. This may be achieved with the help of a CAD system, which emerges from the laborious task of translating human expertise into the language of artificial intelligence [
10,
11,
12].
For decades, radiography has been an essential tool for detecting medical disorders and helping in therapy administration [
1,
10]. The growth of the medical field has led to the adoption of automatic categorization techniques based on machine learning paradigms; nevertheless, the data are meaningless in the absence of a professional diagnostician [
13]. Dealing with radiological X-ray data necessitates extensive experience and knowledge. Because of the gravity of the situation, an expert will probably need a significant amount of time to review the X-ray data. Even now, there is a chance that fatigued medical personnel may make an error that could have been prevented [
14]. Similarities between diseases, such as pneumonia, whose presentation overlaps with that of various other ailments, have exacerbated the problem [
13]. As a result, there is a demand for radiological X-ray data automation that can categorize diseases in ways that the human eye or expert knowledge cannot. According to the World Health Organization (WHO), more than 66% of the world's population lacks access to modern radiology diagnostics and specialist skills [
15]. Atelectasis, effusion, cardiomegaly, masses, infiltration, emphysema, pneumonia, consolidation, pneumothorax, fibrosis, nodules, pleural thickening, edema, and hernia are some of the fundamental thoracic illnesses that may be detected using a chest X-ray. Additional thoracic CXR research can be found in [
16,
17,
18]. Issues surrounding the detection and treatment of illness have become progressively more critical due to the COVID-19 pandemic. Researchers can now access CXR data for free on numerous digital platforms. These publicly accessible datasets contribute to bioinformatics and computer science by giving readers an overview of the findings described in the associated reports [
19]. Several pre-existing methodologies and procedures have enabled the utilization of these massive CXR collections [
3,
20,
21].
In this work, deep feature extraction techniques are utilized to achieve the best possible classification of 14 thoracic diseases. The major contributions of the paper are as follows:
To identify the state-of-the-art technique for thorax disease classification and localization;
To design and develop an architecture for the multi-label classification of thorax diseases and their localization, and to implement it through the proposed model;
To evaluate the proposed model and achieve higher accuracy (AUC-ROC) as compared to the state-of-the-art research.
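Since AUC-ROC is the evaluation metric used throughout, it is worth recalling how it is computed per class. The following minimal sketch (illustrative, not from the paper) uses the rank-based Mann-Whitney formulation: the AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counting one half.

```python
def auc_roc(labels, scores):
    """AUC-ROC for one binary class via the Mann-Whitney U statistic.

    labels: list of 0/1 ground-truth values; scores: model outputs.
    Returns NaN when one class is absent (AUC is undefined then).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties contribute half
    return wins / (len(pos) * len(neg))
```

In practice, per-class AUCs over the 14 diseases are averaged to report a single figure.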
The remainder of the paper is organized as follows.
Section 2 discusses the related works.
Section 3 describes the ChestX-ray14 dataset, its preprocessing, and related issues.
Section 4 explains the proposed Z-NET model details.
Section 5 describes preparation of a dataset for experiments and experimental settings.
Section 6 provides results and comparisons. Finally,
Section 7 provides a discussion of the proposed research, findings, and limitations.
Section 8 presents conclusions and future work.
2. Related Work
Currently, computer vision techniques based on deep learning are being used specifically for categorizing medical and natural images [
3,
22]. As a direct result of this endeavor’s success, many academics are presently using deep convolutional neural networks (DCNNs) to diagnose thoracic illnesses based on chest radiographs. However, because the vast majority of these DCNN models were designed to address a variety of problems, they typically have three shortcomings: (1) they frequently fail to take into account the characteristics of various types of thoracic diseases; (2) they frequently make the wrong diagnosis because they do not focus solely on aberrant areas; and (3) their diagnosis can be challenging to understand, which limits their utility in clinical practice.
Traditional clinical experiences, according to [
23], have demonstrated the value of lesion site attention for better diagnosis. The authors in [
23] developed disease location attention-guided network (LLAGnet) that focuses on lesion site features that are discriminative in chest X-rays for thoracic disease categorization using multiple labels. The authors in [
24] utilized a transfer learning technique for deep learning convolution layer training and worked on a small segment of data. The researchers in [
25] worked on a small dataset of a few hundred images for testing and training. They used a neutrosophic approach by applying various deep convolutional models for COVID-19 classification from different lung infections. They utilized a small amount of data, which caused overfitting. The paper [
26] utilized the heuristic optimization algorithm ROFA for the classification and segmentation of pneumonia by applying different thresholds. The drawback of their approach is that the small number of pixels used for analysis does not determine the correct location for segmentation, resulting in low segmentation accuracy. In the research of [
27], the authors utilized VGG-SegNet for pulmonary nodule classification and the Lung-PET-CT-Dx dataset for segmentation. However, the model does not handle its many parameters well and achieves low accuracy. The researchers in [
28] proposed the CheX-Net CNN model, one of the most current and sophisticated approaches. CheX-Net takes in chest X-ray images and generates a heatmap showing the locations of the regions most likely to be affected by the disease. In recognizing pneumonia in 420 X-ray images, CheX-Net outperformed four experienced radiologists on average. On the other hand, the paper’s [
28] proposed model is a DenseNet variant that has not undergone any significant alterations, with the purpose of learning representations with little to no supervision. The network weights were pre-trained on ImageNet images.
The research in [
20] recommended employing Consult-Net to develop relevant feature representations for the classification of lung illnesses. The Consult-Net project’s goal is to overcome the obstacles posed by a broad set of diseases and by the influence of irrelevant areas in chest X-ray classification. The study primarily focuses on classifying thoracic disorders shown in chest X-rays. The authors in [
20] presented a two-branch architecture known as Consult-Net for learning discriminative features to achieve two goals simultaneously, as Consult-Net is made up of two distinct parts. First, a feature selector constrained by an information bottleneck retrieves important disease-specific features based on their relevance. Second, a feature integrator based on spatial and channel encoding improves the latent semantic linkages in the feature space. Consult-Net integrates these unique characteristics to increase the accuracy of thoracic illness categorization in CXRs.
A DCNN (Thorax-Net) is proposed in [
14]. It aims to utilize chest radiography to diagnose 14 thoracic diseases. Thorax-Net features both an attention branch and a classification branch. The classification branch creates feature maps, while the attention branch capitalizes on the relationship between class labels and clinical findings. A diagnosis is formed by feeding a chest radiograph into the trained Thorax-Net, then averaging and binarizing the outputs of the two branches. When trained with internal data, Thorax-Net outperforms other deep models, with AUC ranging from 0.7876 to 0.896 in each trial.
Due to the absence of substantial annotations and abnormalities in pathology, CAD diagnosis of thoracic disorders is still complex. To address this CAD challenge, the paper [
29] proposed a model known as the triple-attention learning (A3 Net) system. By merging three independent attention modules into a single, coherent framework, the proposed model unifies scale-wise, channel-wise, and element-wise attention learning. The feature extraction backbone is a pre-trained DenseNet-121 network. The deep model is explicitly encouraged to focus more on the discriminative channels of the feature maps.
The initial stage in developing automated radiology classification is identifying relevant diagnostic features in X-rays. These features can assist in making a diagnosis. The problem is that these properties are highly non-linear, making them difficult to define and leaving room for subjectivity. To extract these features, the model must be trained to apply a sophisticated non-linear function that maps an image to a feature representation (f_img). Previous authors created a DCNN for extracting these complex non-linear features [
30].
Deep CNNs have a few disadvantages, the most significant being the difficulties connected with vanishing gradients and the massive number of parameters required [
30]. Training deep networks for medical applications is difficult due to a lack of medical datasets (this dataset contains just about 3999 patient records). Furthermore, it is known that a network is quite sensitive to initialization when trained from scratch. This was established in the paper [
31], which demonstrated that vanishing gradients occur when a deep neural network is not properly initialized, causing the training process to become unstable. To identify qualities that are complementary to one another, Chen et al. [
4] introduced DualCheXNet, a dual asymmetric feature learning network. However, the algorithms currently in use for categorizing CXR images do not take any knowledge-based information into account [
4]. Instead, the emphasis is on the development of useful representations using a range of deep models. Furthermore, as the network grows in size, issues with vanishing gradients may appear on individual CXR images.
Iterative attention-guided curriculum learning was used in [
32] to enhance localization performance under weak supervision and thoracic disease categorization. The results are likely affected by the attention-aware networks’ continued inability to identify goal regions accurately. This is due to a lack of expert-level supervision or instruction.
Similarly, their performance suffers greatly from imbalanced data; as the number of parameters increases, existing models cannot handle them well, which causes incorrect classifications. Moreover, concerns about overfitting and vanishing gradients have grown as model depth has increased [
20]. Furthermore, a single network with greater model depth is more likely to miss crucial distinguishing intermediate-layer features. These characteristics are frequently subtle, yet they are critical for identifying hard-to-classify anomalies in CXRs. These difficulties have evolved into bottlenecks, impeding the deep extension of the ImageNet [
4] model. Although a 2D LSTM uses more parameters than a 2D RNN, it is less efficient at runtime for large images due to exploding or vanishing gradients. It is also prone to overfitting, especially when training data are limited [
33].
The vanishing gradient issue arises in the training phase when the network is fine-tuned using pre-trained models. The lower layers of the network will likely not be well customized, since gradient magnitudes (back-propagated from the training error) decline swiftly. The conceptual appearance of the disease is detected in the top layers. The authors suggested training the network one layer at a time and building P-Net from scratch to avoid this issue [
33].
The difficulty of appropriately classifying images is exacerbated by the high degree of similarity between distinct classes and the scarcity of data for specific conditions. Because the conditions visually resemble one another, CXR images do not adequately reflect the entire spectrum. This is especially true for patients who have two or more disorders. When CNNs are trained with many parameters, this proximity may result in overfitting, even for categories with few samples. Among the more than 100,000 total images in the ChestX-ray14 collection, only 227 are positive for “Hernia” [
22].
Existing models have several flaws, such as vanishing gradients as network size increases, network parameter optimization [
4,
34], and overfitting when a patient has many illnesses (the model becomes confused when identifying the condition from small data). Another shortcoming of these methods is that they do not fully address the issue of class imbalance (some diseases may have far more images than others) [
19]. As a result, the present model cannot train to the same level of accuracy for each of the 14 diseases, resulting in inaccurate disease detection. In addition, existing models do not tackle the correlation between different diseases [
14,
20]. Different DCNN-based models are being utilized for the identification of thoracic diseases. There are issues related to model training, datasets, and proper labeling of the thoracic diseases. Increasing the relevant dataset sizes, obtaining more accurate labeling from professionals, and using a proper training dataset will help DCNN models perform better.
3. Dataset
The dataset, its preprocessing, and its class imbalance issues are discussed in this section.
3.1. ChestX-ray14
ChestX-ray14 is a hospital-scale dataset consisting of 112,120 frontal-view chest radiographs of 30,805 patients [
1]. These chest radiographs carry 14 disease image labels: atelectasis, cardiomegaly, consolidation, edema, effusion, emphysema, fibrosis, hernia, infiltration, mass, nodule, pleural thickening, pneumonia, and pneumothorax. The images were initially stored in PNG format and scaled to 1024 × 1024 pixels [
1]. Of these images, 60,412 contain no disease (they are normal instances), whereas the remaining 51,708 carry one or more disease labels retrieved from the linked radiology reports by natural language processing (NLP). These labels are expected to be more than 90% accurate. The data span 14 distinct thorax disorders. The dataset was formally split at the individual patient level into a training subset containing 70% of the patients, a testing subset containing 20%, and a validation subset containing 10%; the same patient appears in only one of the subsets. The testing set has 15,684 samples with one or more labels and 9912 unlabeled samples, whereas the training set contains 36,024 samples with one or more labels and 50,500 unlabeled samples [
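A patient-level split of this kind can be sketched as follows. This is a minimal illustrative sketch, not the authors' actual pipeline: the function name and the (image_path, patient_id) record layout are assumptions.

```python
import random

def patient_level_split(records, train=0.7, test=0.2, seed=42):
    """Split image records by patient so that no patient spans subsets.

    records: list of (image_path, patient_id) pairs (illustrative layout).
    Returns a dict of image-path lists for 'train', 'test', and 'val'.
    """
    patients = sorted({pid for _, pid in records})
    rng = random.Random(seed)
    rng.shuffle(patients)                         # deterministic shuffle
    n_train = int(len(patients) * train)
    n_test = int(len(patients) * test)
    train_ids = set(patients[:n_train])
    test_ids = set(patients[n_train:n_train + n_test])
    splits = {"train": [], "test": [], "val": []}
    for path, pid in records:
        if pid in train_ids:
            splits["train"].append(path)
        elif pid in test_ids:
            splits["test"].append(path)
        else:                                     # remaining ~10% -> validation
            splits["val"].append(path)
    return splits
```

Splitting at the patient level rather than the image level prevents leakage of a patient's radiographs between training and evaluation, which is exactly the guarantee the official split provides.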
1,
14].
3.2. Preprocessing
As compared to the ImageNet dataset used for classification purposes, the ChestX-ray14 dataset has very few spatial disease patterns in its 1024 × 1024 pixel images, which creates issues for deep learning model development and computing hardware. ChestX-ray14 consists of 112,120 X-rays from 30,805 patients covering 14 diseases: atelectasis, cardiomegaly, consolidation, edema, effusion, emphysema, fibrosis, hernia, infiltration, mass, nodule, pleural thickening, pneumonia, and pneumothorax. There are 60,412 images with no illness, labeled as normal instances, and 51,708 images with one or more disease labels; more than half of the images in the dataset are thus normal cases. Each class label is encoded as a 14-dimensional vector in which a “1” denotes the presence of the corresponding thoracic disease and a “0” its absence. The preprocessing of the chest X-ray images occurs in two steps. First, the input X-rays are resized from 1024 × 1024 to 224 × 224 pixels. Second, the weighted cross-entropy loss is applied to the data samples to reduce the class imbalance problem and mitigate overfitting, with the ReLU activation function used after each convolution layer.
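The 14-dimensional multi-hot label encoding described above can be sketched as follows. The alphabetical disease order and the '|'-separated finding string follow the common ChestX-ray14 metadata convention; the helper name itself is illustrative.

```python
# Canonical ChestX-ray14 disease labels, in alphabetical order.
DISEASES = ["Atelectasis", "Cardiomegaly", "Consolidation", "Edema",
            "Effusion", "Emphysema", "Fibrosis", "Hernia",
            "Infiltration", "Mass", "Nodule", "Pleural_Thickening",
            "Pneumonia", "Pneumothorax"]

def encode_labels(finding_str):
    """Map a '|'-separated finding string to a 14-dim multi-hot vector.

    A "1" marks the presence of the corresponding disease, a "0" its
    absence; "No Finding" (a normal instance) yields the all-zero vector.
    """
    findings = set(finding_str.split("|"))
    return [1 if d in findings else 0 for d in DISEASES]
```

For example, an image labeled "Cardiomegaly|Effusion" maps to a vector with exactly two ones, and a normal case maps to fourteen zeros.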
3.3. Class Imbalance
The ChestX-ray14 dataset is significantly imbalanced. Around 8000 X-ray images are available for training in the categories of atelectasis, cardiomegaly, and effusion, but only approximately 2000 images are available for edema, emphysema, and hernia. The most popular ways to deal with class imbalance are oversampling the training data, undersampling the training data, and penalizing the classification loss. Because there are far more zeroes than ones in the one-hot-like image labels, the model tends to overfit; including a positive/negative balancing factor in the multi-label classification loss layer mitigates this problem and improves the accuracy of the proposed model. As a result, we used the well-known weighted cross-entropy loss, which increases the model's penalty for misclassifying the minority class during training. Weighted cross-entropy loss was found to help reduce class imbalance by enhancing the per-class AUC and its average across the various diseases.
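The weighted cross-entropy loss can be sketched per class as follows, using the common formulation in which positives are weighted by (P + N) / P and negatives by (P + N) / N (P and N being the positive and negative counts in the batch). This is a plain-Python illustration of the balancing idea; the exact weighting and implementation used in this paper may differ.

```python
import math

def weighted_bce(y_true, y_pred, eps=1e-7):
    """Weighted binary cross-entropy for one disease class.

    y_true: 0/1 labels; y_pred: predicted probabilities in (0, 1).
    Rare positives receive a larger weight, so misclassifying the
    minority class is penalized more heavily.
    """
    P = sum(y_true)
    N = len(y_true) - P
    total = P + N
    w_pos = total / P if P else 0.0   # weight for positive samples
    w_neg = total / N if N else 0.0   # weight for negative samples
    loss = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)          # numerical stability
        loss += -(w_pos * y * math.log(p)
                  + w_neg * (1 - y) * math.log(1.0 - p))
    return loss / len(y_true)
```

In a full multi-label model this per-class loss is summed over the 14 disease outputs.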
7. Discussion
The fundamental contribution of this study is the development of the Z-Net model and its integration with global characteristics. The integration was carried out to identify areas of pathological abnormality and allow the categorization to focus on such illness zones. Given that the dataset contains 983 bounding boxes, it was critical to verify that heatmaps created from image-level class labels match the ground-truth bounding boxes. Learned heatmaps were examined for a normal case as well as eight cases with the following conditions, in order: atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, and pneumothorax. For each case, the heatmap illustrates areas of high activation in red and areas of low activation in blue, with the disease bounding boxes overlaid on the image. The learned heatmaps fit the bounding boxes fairly well, even when their sizes vary. Fortunately, these areas of elevated activation were found outside the heart and lungs. The ability of Z-Net to detect pathological abnormalities in the vast majority of chest radiographs explains the performance improvement that our proposed model achieves. Limitations of the study include the imbalance and size of the dataset, which may affect training and performance. In addition, because the model was supervised only by image-level class labels, clinical abnormality annotations were inadequate. Finally, labels that capture the relationships among the various thoracic diseases are still needed.
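As an illustration of how such heatmaps are commonly produced, the following sketches a CAM-style computation: a class-weighted sum over the final convolutional feature maps, rectified and normalized to [0, 1] before being upsampled and overlaid on the radiograph. The shapes and names are illustrative assumptions; Z-Net's actual localization branch may differ.

```python
import numpy as np

def class_activation_map(feature_maps, class_weights):
    """CAM-style heatmap for one class.

    feature_maps: array of shape (C, H, W) from the last conv layer.
    class_weights: array of shape (C,), the classifier weights for
    the target disease. Returns an (H, W) map normalized to [0, 1].
    """
    # Weighted sum over channels -> one (H, W) activation map.
    cam = np.tensordot(class_weights, feature_maps, axes=([0], [0]))
    cam = np.maximum(cam, 0.0)          # keep only positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()           # normalize for visualization
    return cam
```

High values in the resulting map correspond to the red regions described above, which can then be compared against the ground-truth bounding boxes.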