Article

Breast Cancer Detection with an Ensemble of Deep Learning Networks Using a Consensus-Adaptive Weighting Method

by Mohammad Dehghan Rouzi 1,†, Behzad Moshiri 1,2, Mohammad Khoshnevisan 3, Mohammad Ali Akhaee 1, Farhang Jaryani 4, Samaneh Salehi Nasab 5 and Myeounggon Lee 6,*,†
1 School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran 14174-66191, Iran
2 Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada
3 College of Science, Northeastern University, Boston, MA 02115, USA
4 Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
5 Department of Computer Engineering, Lorestan University, Khorramabad 68151-44316, Iran
6 College of Health Sciences, Dong-A University, Saha-gu, Busan 49315, Republic of Korea
* Author to whom correspondence should be addressed.
† Current address: Department of Surgery, Baylor College of Medicine, Houston, TX 77030, USA.
Submission received: 8 September 2023 / Revised: 20 October 2023 / Accepted: 24 October 2023 / Published: 13 November 2023
(This article belongs to the Special Issue Image Processing and Computer Vision: Algorithms and Applications)

Abstract

Breast cancer’s high mortality rate is often linked to late diagnosis, with mammograms as key but sometimes limited tools in early detection. To enhance diagnostic accuracy and speed, this study introduces a novel computer-aided detection (CAD) ensemble system. This system incorporates advanced deep learning networks (EfficientNet, Xception, MobileNetV2, InceptionV3, and ResNet50), integrated via our innovative consensus-adaptive weighting (CAW) method. This method permits the dynamic adjustment of multiple deep networks, bolstering the system’s detection capabilities. Our approach also addresses a major challenge in pixel-level data annotation of faster R-CNNs, highlighted in a prominent previous study. Evaluations on various datasets, including the cropped DDSM (Digital Database for Screening Mammography), DDSM, and INbreast, demonstrated the system’s superior performance. In particular, our CAD system showed marked improvement on the cropped DDSM dataset, enhancing detection rates by approximately 1.59% and achieving an accuracy of 95.48%. This innovative system represents a significant advancement in early breast cancer detection, offering the potential for more precise and timely diagnosis, ultimately fostering improved patient outcomes.

1. Introduction

Breast cancer remains the most frequently diagnosed cancer among women worldwide, leading to significant mortality. According to the World Health Organization (WHO), it accounted for 15% of all cancer-related deaths in women in 2018 [1]. The early detection of breast cancer through screening mammography can substantially reduce the mortality rate by 38–48% [2,3], and it has proven to be an effective tool for this purpose [4]. Consequently, numerous countries in the European Union are implementing screening programs to detect this disease at its early stages [5]. While imaging modalities like PET-CT (positron emission tomography–computed tomography) are also used, they are not recommended as primary diagnostic methods due to their high costs, relative invasiveness, and limitations in detecting small tumors [6,7]. This reaffirms the importance of mammograms in breast lesion classification.
Analyzing mammogram images is challenging due to the subtleties of early cancerous lesions and the variability in breast tissue density [8,9]. While essential, traditional screening by medical professionals has an error rate of around 30% [10,11,12], leading to potentially unnecessary biopsies and associated patient discomfort [13]. To mitigate these challenges, computer-aided detection (CAD) systems have been introduced to support radiologists [14]. When combined with human expertise, these systems can match the reliability of double human reading [15,16,17,18,19]. With the advent of artificial intelligence (AI), especially deep learning (DL) methods like convolutional neural networks (CNNs), the capability of CAD systems has been significantly enhanced [20,21,22,23,24,25,26,27,28]. Building on the transformative potential of AI in medical diagnostics, Bagheri et al. [29] exploited AI and ML techniques to tackle diagnostic challenges in chronic limb-threatening ischemia (CLTI), highlighting the utility of such methods for precise diagnoses, outcome predictions, and identifying treatment disparities. Furthermore, Park et al. [30] showcased another innovative application of machine learning, specifically the random forest approach, to remotely detect physical-aggressive episodes in children using wearable sensors, underscoring the expanding horizons of AI-driven medical insights, especially in behavioral contexts. With their capacity to derive hierarchical feature representations directly from data, DL models have shown promise in detecting intricate patterns in cancers, particularly breast cancer [31,32,33,34,35,36].
However, significant challenges remain in applying DL to mammogram interpretation. One major challenge is the size and complexity of mammogram images. Full-field digital mammography (FFDM) images are typically large, often around 4000 × 3000 pixels, with the area representing potential cancerous regions of interest (ROIs) being as small as 70 × 70 pixels [33]. This creates a proverbial “needle in a haystack” problem, where detecting small but clinically significant features is difficult. Region-based convolutional neural networks (R-CNN) offer one approach to addressing this issue by focusing on ROIs within the image [37,38,39]. However, they require pixel-level annotated data, which are labor-intensive and costly to generate. This limitation was highlighted in the study by Ribli et al. [33], which utilized the faster R-CNN [39] model for breast cancer detection but encountered challenges due to the small size of the pixel-level annotated dataset. Therefore, using region-based networks in medical images such as mammography is controversial.
Beyond the choice of network architecture, decision fusion across multiple classifiers offers another route to robustness. Majority voting has been widely used for this purpose in machine learning tasks; its simplicity makes it a common choice for ensemble models [40]. However, its binary (hard-vote) nature limits its effectiveness, particularly in complex medical imaging tasks where fuzzy, probability-based outputs can provide superior results.
With this research, we aim to contribute significantly to the evolution of CAD systems by introducing the CAW method as an ensemble of five deep learning methods. Our primary hypothesis posits that the performance of the CAW method will surpass that of any of the five individual networks. Furthermore, we hypothesize that our unique approach, which leverages fuzzy output instead of the traditional binary output, will enhance the system’s overall performance. This strategy not only addresses the challenges associated with pixel-level annotations but also bypasses the constraints of region-based networks. The ultimate goal of these advancements is to significantly enhance patient care and outcomes.

2. Materials and Methods

2.1. The Study Design

We proposed a fusion approach, leveraging the decision-making abilities of five different trained DL networks, namely EfficientNet [41], Xception [42], MobileNetV2 [43], InceptionV3 [44], and ResNet50 [45]. Each of these networks has shown excellent performance in image classification tasks, and they complement each other well in an ensemble setting [46,47,48,49]. This approach involved a consensus-adaptive weighting system based on the networks’ performance in detecting malignant mammograms (Figure 1). Initially, we trained the five base networks (serving as the backbone of the ensemble model) on 80% of the dataset with four cross-validation folds. Following this, we calculated weights based on the performance of the trained networks using CAW Version 1 (V1) and Version 2 (V2), as explained below. Finally, we estimated the ensemble model’s performance on the test dataset.
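To make this fusion step concrete, the following is a minimal sketch of the weighted decision fusion, assuming each trained backbone exposes a Keras-style predict method that returns a per-image malignancy probability; the function and variable names are illustrative and are not taken from the released code.

```python
import numpy as np

def caw_ensemble_predict(models, weights, images, threshold=0.5):
    """Fuse the fuzzy (probabilistic) outputs of the trained networks.

    models  : list of trained networks, each returning P(malignant) per image
    weights : CAW weights, one per network, summing to 1
    images  : batch of preprocessed mammogram images
    """
    # Stack each network's probability outputs: shape (n_models, n_images)
    probs = np.stack([m.predict(images).ravel() for m in models], axis=0)
    # Weighted soft vote across the networks
    fused = np.average(probs, axis=0, weights=weights)
    return (fused >= threshold).astype(int), fused
```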

2.2. The First Proposed CAW System

In medical diagnostic tasks such as breast cancer detection, maximizing recall is often more important than precision, as failing to detect a positive case could have dire consequences. Therefore, instead of the F1 score, which equally weights precision and recall, we proposed using the F2 score for our weighting system [50]. The F2 score places more emphasis on recall, aligning more closely with the priorities of image processing tasks. This method, named “consensus-adaptive weighting (CAW)”, is adaptive in nature, allowing for adjustment of weights when additional DL networks are incorporated. Although we included only five recent DL networks in our ensemble model, any number of networks could be incorporated at any time.
After training the five networks on 80% of the data, we employed CAW V1 (Equation (1)) as our ensemble model’s weighting system, and we used the F2 score (Equation (2)) to assign weights.
$$w_i = \frac{X_i}{\sum_{i=1}^{N} X_i} \tag{1}$$

$$F_2\text{-score} = \frac{\text{True positive}}{\text{True positive} + 0.2 \times \text{False positive} + 0.8 \times \text{False negative}} \tag{2}$$
In Equation (1), $w_i$ stands for the weight of model $i$ and $X_i$ stands for the F2 score of the trained network $i$. As we integrated more DL networks into our system, the weights $w_i$ were adaptively recomputed to reflect the contribution of each network in the ensemble.
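As an illustration, the sketch below implements Equation (2) and the CAW V1 weighting of Equation (1); the example call uses the cropped DDSM F2 scores from Table 1 and reproduces the corresponding V1 weights reported in Table 4 (the function names are ours, not the authors’).

```python
import numpy as np

def f2_score(tp, fp, fn):
    # Equation (2): F-measure with beta = 2, weighting recall above precision
    return tp / (tp + 0.2 * fp + 0.8 * fn)

def caw_v1_weights(f2_scores):
    # Equation (1): each network's F2 score normalized by the sum over networks
    x = np.asarray(f2_scores, dtype=float)
    return x / x.sum()

# Cropped DDSM F2 scores (%) from Table 1: EfficientNet, Xception,
# MobileNetV2, InceptionV3, ResNet50
print(np.round(caw_v1_weights([93.89, 92.03, 92.43, 91.01, 88.98]), 3))
# -> [0.205 0.201 0.202 0.199 0.194], matching the V1 column of Table 4
```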

2.3. The Second Proposed CAW System

Most of the time, the five DL networks’ performances are closely matched, making it challenging for CAW V1 to distinguish their weights effectively. As a solution, we proposed CAW V2 (Equation (3)), which resulted in a more significant differentiation in weights between networks. This method, shown to provide superior results in the results section, refines Equation (1).
$$w_i = \frac{X_i^{\,C}}{\sum_{i=1}^{N} X_i^{\,C}}, \qquad X_i = F_2\text{-score of the trained network } i \tag{3}$$
In Equation (3), ‘C’ represents the power ranging from 0.5 to 20 in increments of 0.1, determined empirically. To identify the optimal ‘C’, we conducted an exhaustive search across all cross-validation subsets of the dataset.
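A possible implementation of the CAW V2 weighting and of the exhaustive search over ‘C’ is sketched below; the ensemble_f2 argument, a callback that would score the weighted ensemble on the cross-validation subsets, is a hypothetical placeholder rather than part of the published code.

```python
import numpy as np

def caw_v2_weights(f2_scores, c):
    # Equation (3): raise each F2 score to the power C before normalizing
    x = np.asarray(f2_scores, dtype=float) ** c
    return x / x.sum()

def search_best_c(f2_scores, ensemble_f2, c_grid=np.arange(0.5, 20.0 + 1e-9, 0.1)):
    """Exhaustive search for C in [0.5, 20] with a step of 0.1.

    ensemble_f2(weights) -> F2 score of the weighted ensemble on the
    cross-validation subsets (hypothetical callback).
    """
    best_c, best_f2 = None, -np.inf
    for c in c_grid:
        f2 = ensemble_f2(caw_v2_weights(f2_scores, c))
        if f2 > best_f2:
            best_c, best_f2 = c, f2
    return best_c, best_f2
```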

2.4. Dataset and Experiment Setup

To demonstrate the efficacy of our proposed model, we conducted tests on three publicly available mammogram datasets: Cropped DDSM (Digital Database for Screening Mammography) [51], DDSM [52], and INbreast [53]. Each dataset has been extensively used in the literature, emphasizing their relevance and reliability. However, due to the labor-intensive and time-consuming nature of compiling and labeling medical datasets, we restricted our study to these three datasets.
The cropped DDSM dataset comprises 55,890 training images, 14% of which are labeled as positive for breast cancer. Each image within this dataset measures 299 × 299 pixels (Figure 2). Given the relatively low proportion of positive cases (14%), we noted class imbalance in this dataset. To avoid imbalanced class problems [54], we randomly selected 7290 images from the 48,600 images labeled as negative for breast cancer. As a result, both classes were represented equally within the dataset (14,580 images in total).
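Balancing the cropped DDSM classes amounts to a simple random undersampling of the negative class; a minimal sketch is given below (the variable names and the fixed random seed are illustrative).

```python
import numpy as np

def undersample_negatives(negative_paths, n_keep=7290, seed=42):
    # Keep a random subset of the 48,600 negative images so that both classes
    # contribute 7290 images each (14,580 images in total).
    rng = np.random.default_rng(seed)
    negative_paths = np.asarray(negative_paths)
    keep = rng.choice(len(negative_paths), size=n_keep, replace=False)
    return negative_paths[keep]
```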
The DDSM [52] dataset contains 7808 images, 35% marked as malignant (Figure 2). Given the lower overall number of cases in this dataset, we achieved class balance by randomly equalizing classes within each batch during the training process. Additionally, we resized the original 3000 × 4000 pixel images to 400 × 250 pixels as part of the training regimen.
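One way to realize the per-batch class equalization described above is a generator that draws an equal number of malignant and benign images for every batch; the sketch below assumes the images have already been resized to 400 × 250 pixels and uses illustrative names.

```python
import numpy as np

def balanced_batches(pos_images, neg_images, batch_size=32, seed=0):
    """Yield batches with an equal number of malignant and benign images.

    pos_images, neg_images: arrays of preprocessed 400 x 250 mammograms.
    """
    rng = np.random.default_rng(seed)
    half = batch_size // 2
    while True:
        p = pos_images[rng.choice(len(pos_images), half, replace=False)]
        n = neg_images[rng.choice(len(neg_images), half, replace=False)]
        x = np.concatenate([p, n])
        y = np.concatenate([np.ones(half), np.zeros(half)])
        order = rng.permutation(len(x))  # shuffle within the batch
        yield x[order], y[order]
```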
The INbreast dataset [53] contains 410 images, with 100 labeled as malignant and the remainder identified as benign. Following the resizing of all images to 400 × 250 pixels, we designated 20% of the dataset as test data, with the remaining 80% used for training. We then augmented both training and test data using a range of affine transformation techniques, such as rescaling, width shift, height shift, shearing, random zoom, horizontal flip, vertical flip, and random rotation. This augmentation increased our training dataset to 4000 images (inclusive of the original images), with an equal number of positive and negative cases. The same process was applied to the test data, resulting in 1000 augmented images that were equally divided between malignant and benign. Figure 3 presents the sample augmented images from the INbreast dataset.
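The listed affine augmentations can be expressed with Keras’s ImageDataGenerator, although the paper does not state which library was used; the parameter ranges below are illustrative assumptions, and the directory layout is hypothetical.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Affine augmentations corresponding to the transformations listed above;
# the parameter values are illustrative, not taken from the paper.
augmenter = ImageDataGenerator(
    rescale=1.0 / 255,       # rescaling
    width_shift_range=0.1,   # width shift
    height_shift_range=0.1,  # height shift
    shear_range=0.1,         # shearing
    zoom_range=0.1,          # random zoom
    horizontal_flip=True,    # horizontal flip
    vertical_flip=True,      # vertical flip
    rotation_range=20,       # random rotation (degrees)
)

# Hypothetical directory layout: inbreast/train/{malignant,benign}/
train_flow = augmenter.flow_from_directory(
    "inbreast/train", target_size=(400, 250),
    class_mode="binary", batch_size=32)
```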
For this study, we trained our models using the Adam [55] optimizer and binary cross-entropy loss function with a batch size of 32. We chose the Adam optimizer due to its adaptive learning rate capabilities and efficient handling of sparse gradients, making it particularly suitable for optimizing the deep neural networks employed in this study. The initial learning rate for the networks was set at 0.001. The models were trained via cross-validation with four folds.
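Assuming a TensorFlow/Keras and scikit-learn setup, which this section does not name explicitly, the training configuration could be expressed as follows; the helper name is ours.

```python
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import StratifiedKFold

def compile_for_training(model):
    # Adam optimizer with an initial learning rate of 0.001 and binary
    # cross-entropy loss, as described above; the batch size of 32 is
    # passed to model.fit() at training time.
    model.compile(optimizer=Adam(learning_rate=1e-3),
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Four-fold cross-validation on the 80% training split (sketch)
kfold = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
```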
In addition to the aforementioned strategies, we incorporated the transfer-learning technique into our model development process. Training a DL network typically requires a substantial, labeled dataset, a requirement that presents a significant challenge in medical imaging due to limited data availability. Additionally, the small size of available labeled datasets often leads to overfitting, a common issue in machine learning models. Transfer learning offers a solution to these challenges by leveraging the knowledge gained from one problem and applying it to a different but related problem. In this study, we employed transfer learning using weights derived from networks previously trained on the ImageNet dataset [56]. This approach allowed us to make effective use of our limited labeled datasets without compromising model performance due to overfitting. As for the technical aspect, all training processes were executed on an Nvidia (Nvidia Corporation, Santa Clara, CA, USA) GeForce GTX 1070 8 GB GPU. All code was written in Python version 3.10 (Python Software Foundation).
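A sketch of how one ensemble member could be instantiated with ImageNet-pretrained weights, again assuming Keras, is shown below; the global-average-pooling plus sigmoid classification head is our assumption, not a detail taken from the released code.

```python
from tensorflow.keras import layers, models, applications

# The five backbones used in this study, all available with ImageNet weights
BACKBONES = {
    "EfficientNetB3": applications.EfficientNetB3,
    "Xception": applications.Xception,
    "MobileNetV2": applications.MobileNetV2,
    "InceptionV3": applications.InceptionV3,
    "ResNet50": applications.ResNet50,
}

def build_backbone(name, input_shape=(299, 299, 3)):
    """Build one ensemble member initialized with ImageNet weights."""
    base = BACKBONES[name](weights="imagenet", include_top=False,
                           input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(base.output)
    out = layers.Dense(1, activation="sigmoid")(x)  # P(malignant)
    return models.Model(base.input, out)
```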

3. Results

The performance comparison of the proposed CAW V2 method against five prominent DL networks, the majority voting decision fusion method, and the CAW V1 method is summarized in Table 1.
Table 1 underscores that our proposed CAW V2 method outperforms all other models across all datasets regarding the F2 score. Specifically, our method achieved F2 scores of 95.48%, 82.35%, and 72.31% for the cropped DDSM, DDSM, and INbreast datasets, respectively. This represents an improvement of 1.59%, 1.32%, and 1.11% over the best-performing individual networks. Additionally, as the CAW V2 method is an extension of the CAW V1, it naturally exhibits superior performance. Figure 4 displays a comprehensive comparison of the individual performance of different models, along with the proposed ensemble method. The mean F2 scores for cropped DDSM, DDSM, and INbreast are determined to be 92.81, 76.08, and 68.21, respectively.
Notably, the proposed ensemble method outperforms all other evaluated methodologies in terms of the F2 score. These findings reinforce the superiority of the ensemble approach in achieving enhanced performance and validate its effectiveness in improving the accuracy of the aforementioned methods.
In Table 2, the calculated ‘C’ parameters of the proposed method are presented.
The cropped DDSM dataset is not a complex dataset for DL networks because its images are small crops in which the lesions are easy to locate. In contrast, the DDSM and INbreast datasets are more difficult to interpret: their images contain the whole mammogram, so finding lesions within such large images is challenging for DL networks. According to Table 2, the more complex and difficult the dataset, the higher the ‘C’ parameter. The results of our study suggest that the discrepancy between the networks’ performances across datasets can explain this phenomenon (Table 3).
Table 3 elucidates the performance variance among various DL networks across multiple datasets. Specifically, the disparity between the most efficient (EfficientNet with an F2 score of 93.89%) and the least efficient (ResNet50 with an F2 score of 88.98%) networks on the cropped DDSM dataset is notable. The ‘C’ parameter, utilized to optimize the F2 score, shows a direct correlation with these performance gaps. In this case, the performance difference of 4.9% corresponds to a ‘C’ parameter value of 2.6. However, in datasets with larger performance disparities, such as DDSM and INbreast, where the gap exceeds 10%, ‘C’ parameters greater than 3 are required.
Table 4 offers a comparison between the weights and performance improvements of our proposed CAW V1 and CAW V2.
Table 4 shows how the ‘C’ parameter in the CAW V2 formula affects the models’ weights. For example, in the cropped DDSM dataset, EfficientNetB3 performs better than the other networks (Table 1), so its weight increased from 0.205 to 0.213. In contrast, ResNet50’s weight was reduced from 0.194 to 0.185 because of its weaker performance on the cropped DDSM dataset. With the CAW V2 weights, the F2 score improved by 0.93% on cropped DDSM, 0.69% on DDSM, and 0.33% on INbreast. The open-source code is available at https://github.com/dehghanr/BreastCancer_AdaptiveWeighting (accessed on 23 October 2023).
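The effect of ‘C’ on the weights can be verified directly from the reported numbers: plugging the cropped DDSM F2 scores from Table 1 into Equations (1) and (3) with C = 2.6 (Table 2) reproduces the V1 and V2 weight columns of Table 4.

```python
import numpy as np

# Cropped DDSM F2 scores (%) from Table 1: EfficientNetB3, Xception,
# MobileNetV2, InceptionV3, ResNet50; C = 2.6 from Table 2.
f2 = np.array([93.89, 92.03, 92.43, 91.01, 88.98])
v1 = f2 / f2.sum()
v2 = f2**2.6 / (f2**2.6).sum()
print(np.round(v1, 3))  # [0.205 0.201 0.202 0.199 0.194] -> Table 4, V1 column
print(np.round(v2, 3))  # [0.213 0.202 0.204 0.196 0.185] -> Table 4, V2 column
```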

4. Discussion

The primary aim of our study was to enhance early breast cancer detection by introducing a CAD ensemble system utilizing multiple advanced deep learning networks. The integrated consensus-adaptive weighting (CAW) method led to significant improvements, reaching an accuracy of 95.48% on the cropped DDSM dataset. Notably, our approach also innovatively addressed pixel-level data annotation challenges in faster R-CNNs, emphasizing its potential to elevate mammogram interpretation and patient care. This research represents a significant stride toward improving breast cancer detection accuracy, potentially reducing unnecessary biopsies, and ultimately improving patient outcomes.
In Ribli et al.’s work [33], the primary challenge with using faster R-CNN for breast cancer detection was the need for pixel-level annotations in mammogram images. This requirement is both time-consuming and costly, especially in the medical realm, where acquiring such detailed annotations demands specialized expertise and considerable resources. Chen et al. [57] further underscored this challenge by highlighting that medical image datasets often comprise fewer images compared to typical computer vision datasets, and only a fraction benefit from expert annotations. Xu et al. emphasized a similar sentiment, noting the inherent difficulty and ambiguity associated with obtaining detailed annotations for medical images [58]. They advocated for the use of models less dependent on annotated datasets, particularly spotlighting the potential of unsupervised feature learning, which can extract meaningful features without requiring labeled data. Given the inherent complexity of mammogram images, where minuscule yet crucial regions of interest are embedded in high-resolution data, the task is further magnified. Our research provides a solution to this. By introducing an ensemble system, we effectively eliminate the need for these intricate pixel-level annotations. The CAW method, which combines five distinct deep learning approaches, promises to be a more cost-effective and efficient alternative to traditional region-based networks.
Two recent studies [33,59] utilized deep learning models for breast cancer classification using FFDM images. Both studies utilized ROI annotation for breast cancer classification. In contrast, our method does not use ROI annotation. Annotating medical datasets, including mammograms [32], is particularly challenging. Compared to the existing methodologies, our results showcased consistent superiority in performance, achieving F2 scores of 95.48%, 82.35%, and 72.31% across cropped DDSM, DDSM, and INbreast datasets, respectively. This improvement, especially when ROI annotation is sidestepped, establishes our method as a significant leap forward in breast cancer classification using FFDM images. Aggregating the predictive outcomes from multiple classifiers is a prevalent strategy in machine learning [60,61]. The primary benefit of this method is that it safeguards against a solitary model being adversely influenced by outliers, noise, or complex scenarios. While individual networks such as MobileNet and Xception have been integrated for mammography [62,63], our work is pioneering in leveraging their combined strengths with an advanced weighting strategy. Distinctively, our proposed consensus-adaptive weighting (CAW) system not only fuses the decisions but also adaptively weighs them using the F2 score, emphasizing recall—a critical factor in medical diagnostics.
In our study, the nature and size of the datasets presented distinct challenges for training our deep learning model. With the cropped DDSM and DDSM datasets, their larger size offered the flexibility of random selection to mitigate potential biases. This randomness in the selection, especially given the vast number of images in the cropped DDSM dataset, ensured a balanced representation for training.
On the other hand, the INbreast dataset, with its modest count of 410 images, posed a unique challenge. Given its smaller size, the necessity for augmentation became evident to ensure adequate data for training. This augmentation, achieved through various affine transformation techniques, expanded our dataset substantially. However, it is worth noting the inherent challenges that arise with such augmented data. While these transformations increase the dataset’s diversity, they introduce images that are not “natural” in their origin. Such alterations might differ from authentic mammograms in subtle ways, potentially impacting the model’s learning dynamics. The challenge then lies in ensuring the model remains robust and generalizable despite training on these transformed images.
Our study echoes the importance of mammograms in breast lesion classification and the integral role CAD systems play in assisting medical professionals in the interpretation of such complex images. We believe that our method could be successfully extended to a larger number of DL models, opening new avenues for future research and offering significant potential for improving the early detection of breast cancer and, by extension, patient outcomes.
Despite the advancements our study offers in the realm of breast cancer detection using deep learning, there are certain limitations to acknowledge. We evaluated our model on three datasets; however, real-world applications might present unforeseen challenges, such as variations in image quality, data acquisition techniques, and differences in populations, that our controlled study environments did not account for. Moreover, while our ensemble model incorporates five different DL networks, there are numerous other potential architectures that might further improve performance. Moving forward, addressing these limitations can pave the way for an even more robust and universally applicable CAD system for breast cancer detection.
This research opens up several avenues for future exploration. One particular area of interest lies in the exploration of different mammographic datasets. The DDSM, for instance, is comprised of scanned analog film mammograms, while the INbreast database consists of FFDMs that are captured directly in a digital format. The distinction in image quality between these two types of datasets can potentially affect the performance of DL models. Researchers are encouraged to consider data augmentation techniques or amalgamating these datasets to create larger, more balanced datasets, ultimately enhancing model performance and robustness. Also, due to the flexibility and adaptability of our proposed ensemble model, CAW holds significant potential for extension beyond the realm of mammogram image analysis and DL models. Its application need not be confined to the clinical sphere. In fact, the ensemble model is equally applicable to diverse types of machine learning models, extending from decision trees and support vector machines to boosting and bagging techniques. This makes it a versatile solution for numerous predictive tasks in a myriad of research areas. Furthermore, it is important to note that while our current application is focused on image data, the model is not limited to these data types. The methodology can be readily adapted for datasets in other forms, such as numerical, categorical, or textual data. This presents an exciting opportunity for researchers to explore the adaptability and performance of our ensemble model across a spectrum of machine learning models and data types, promising significant contributions to fields beyond medical imaging.
Additionally, refining hyperparameter optimization stands out as an essential frontier in advancing the performance of our model. To achieve this, we advocate for an extensive hyperparameter optimization tailored for each deep learning model. By systematically traversing the hyperparameter space, one can pinpoint optimal configurations, thereby enhancing each model’s performance. Using cutting-edge techniques, such as Bayesian optimization or genetic algorithms, could significantly streamline this search process. Not only do these techniques locate the best configurations, but they also conserve computational resources. Moreover, it is imperative to delve into the ramifications of various hyperparameters on the ensemble’s performance. By examining elements like learning rate, batch size, and weight initialization, researchers can gain insights into how different hyperparameters impact the ensemble, offering valuable cues for subsequent model implementations. It is worth noting that our research has undertaken a comprehensive exploration of the ‘C’ hyperparameter. The possibility of utilizing a machine learning model for training this specific parameter could be explored to elevate efficiency in future endeavors. Pursuing these avenues of research promises to usher in notable progress in breast cancer detection, facilitate comprehensive performance assessments, facilitate rigorous comparative analyses, and reveal potent optimization techniques, all in service of bolstering the efficacy of our proposed CAD system.

5. Conclusions

Our research contributes a novel CAD system to the field of breast cancer diagnostics. This system, a collective of multiple advanced DL networks, utilizes our novel CAW method. As a result, we have observed improved breast cancer detection rates, most notably in the cropped DDSM dataset, where our system achieved an accuracy of 95.48% and increased detection rates by approximately 1.59%. We have addressed the challenge of pixel-level data annotation in faster R-CNNs and demonstrated the benefits of efficiently integrating multiple DL networks.
Our findings underscore the crucial role of mammography in breast lesion classification and illustrate the growing necessity for CAD systems in medical image interpretation. This study lays the foundation for future research in this area, indicating the potential for employing even more diverse DL models in such an ensemble-based system. The essence of our study is the delivery of an ensemble-based CAD system that significantly improves the early detection of breast cancer. By integrating various DL networks and implementing innovative adaptive weighting methods, we anticipate a progression toward more precise, swift, and patient-friendly breast cancer diagnostics.

Author Contributions

Conceptualization, M.D.R., B.M., M.K. and M.A.A.; methodology, M.D.R.; validation, M.D.R., F.J. and S.S.N.; formal analysis, M.D.R.; data curation, M.D.R.; writing—original draft preparation, M.D.R.; writing—review and editing, M.D.R., F.J., S.S.N. and M.L.; visualization, M.D.R. and F.J.; supervision, B.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Here are the publicly available links to the datasets we utilized: cropped DDSM dataset download link: https://www.kaggle.com/skooch/ddsm-mammography (accessed on 5 February 2021); DDSM dataset download link: https://www.kaggle.com/datasets/cheddad/miniddsm2 (accessed on 18 March 2021); INbreast dataset download link: https://www.kaggle.com/datasets/ramanathansp20/inbreast-dataset?resource=download (accessed on 3 March 2021).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. World Health Organization. Latest global cancer data: Cancer burden rises to 18.1 million new cases and 9.6 million cancer deaths in 2018. In International Agency for Research on Cancer; World Health Organization: Geneva, Switzerland, 2018; pp. 1–4. [Google Scholar]
  2. Moss, S.; Nyström, L.; Jonsson, H.; Paci, E.; Lynge, E.; Njor, S.; Broeders, M. The impact of mammographic screening on breast cancer mortality in Europe: A review of trend studies. J. Med. Screen. 2012, 19 (Suppl. S1), 26–32. [Google Scholar] [CrossRef] [PubMed]
  3. Welch, H.G.; Prorok, P.C.; O’Malley, A.J.; Kramer, B.S. Breast-cancer tumor size, overdiagnosis, and mammography screening effectiveness. N. Engl. J. Med. 2016, 375, 1438–1447. [Google Scholar] [CrossRef] [PubMed]
  4. Smith, R.A.; Duffy, S.W.; Gabe, R.; Tabar, L.; Yen, A.M.; Chen, T.H. The randomized trials of breast cancer screening: What have we learned? Radiol. Clin. 2004, 42, 793–806. [Google Scholar] [CrossRef] [PubMed]
  5. Ponti, A.; Anttila, A.; Ronco, G.; Senore, C. Cancer Screening in the European Union (2017); Report on the Implementation of the Council Recommendation on Cancer Screening; World Health Organization: Geneva, Switzerland, 2017. [Google Scholar]
  6. Choi, Y.J.; Shin, Y.D.; Kang, Y.H.; Lee, M.S.; Lee, M.K.; Cho, B.S.; Kang, Y.J.; Park, J.S. The effects of preoperative 18F-FDG PET/CT in breast cancer patients in comparison to the conventional imaging study. J. Breast Cancer 2012, 15, 441–448. [Google Scholar] [CrossRef] [PubMed]
  7. Griffeth, L.K. Use of PET/CT Scanning in Cancer Patients: Technical and Practical Considerations; Baylor University Medical Center Proceedings; Taylor & Francis: Abingdon, UK, 2005; pp. 321–330. [Google Scholar]
  8. Boyd, N.F.; Guo, H.; Martin, L.J.; Sun, L.; Stone, J.; Fishell, E.; Jong, R.A.; Hislop, G.; Chiarelli, A.; Minkin, S. Mammographic density and the risk and detection of breast cancer. N. Engl. J. Med. 2007, 356, 227–236. [Google Scholar] [CrossRef] [PubMed]
  9. Carney, P.A.; Miglioretti, D.L.; Yankaskas, B.C.; Kerlikowske, K.; Rosenberg, R.; Rutter, C.M.; Geller, B.M.; Abraham, L.A.; Taplin, S.H.; Dignan, M. Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening mammography. Ann. Intern. Med. 2003, 138, 168–175. [Google Scholar] [CrossRef] [PubMed]
  10. Hoff, S.R.; Abrahamsen, A.-L.; Samset, J.H.; Vigeland, E.; Klepp, O.; Hofvind, S. Breast cancer: Missed interval and screening-detected cancer at full-field digital mammography and screen-film mammography—Results from a retrospective review. Radiology 2012, 264, 378–386. [Google Scholar] [CrossRef]
  11. Kerlikowske, K.; Carney, P.A.; Geller, B.; Mandelson, M.T.; Taplin, S.H.; Malvin, K.; Ernster, V.; Urban, N.; Cutter, G.; Rosenberg, R. Performance of screening mammography among women with and without a first-degree relative with breast cancer. Ann. Intern. Med. 2000, 133, 855–863. [Google Scholar] [CrossRef]
  12. Berlin, L. Radiologic errors, past, present and future. Diagnosis 2014, 1, 79–84. [Google Scholar] [CrossRef]
  13. Sickles, E.A. Periodic mammographic follow-up of probably benign lesions: Results in 3,184 consecutive cases. Radiology 1991, 179, 463–468. [Google Scholar] [CrossRef]
  14. Doi, K. Computer-aided diagnosis in medical imaging: Historical review, current status and future potential. Comput. Med. Imaging Graph. 2007, 31, 198–211. [Google Scholar] [CrossRef]
  15. Birdwell, R.L.; Ikeda, D.M.; O’Shaughnessy, K.F.; Sickles, E.A. Mammographic characteristics of 115 missed cancers later detected with screening mammography and the potential utility of computer-aided detection. Radiology 2001, 219, 192–202. [Google Scholar] [CrossRef] [PubMed]
  16. Warren Burhenne, L.J.; Wood, S.A.; D’Orsi, C.J.; Feig, S.A.; Kopans, D.B.; O’Shaughnessy, K.F.; Sickles, E.A.; Tabar, L.; Vyborny, C.J.; Castellino, R.A. Potential contribution of computer-aided detection to the sensitivity of screening mammography. Radiology 2000, 215, 554–562. [Google Scholar] [CrossRef] [PubMed]
  17. Morton, M.J.; Whaley, D.H.; Brandt, K.R.; Amrami, K.K. Screening mammograms: Interpretation with computer-aided detection—Prospective evaluation. Radiology 2006, 239, 375–383. [Google Scholar] [CrossRef] [PubMed]
  18. Freer, T.W.; Ulissey, M.J. Screening mammography with computer-aided detection: Prospective study of 12,860 patients in a community breast center. Radiology 2001, 220, 781–786. [Google Scholar] [CrossRef] [PubMed]
  19. Gilbert, F.J.; Astley, S.M.; Gillan, M.G.; Agbaje, O.F.; Wallis, M.G.; James, J.; Boggis, C.R.; Duffy, S.W. Single reading with computer-aided detection for screening mammography. N. Engl. J. Med. 2008, 359, 1675–1684. [Google Scholar] [CrossRef] [PubMed]
  20. Gonzalo, R.B.; Corsetti, B.; Goicoechea-Telleria, I.; Husseis, A.; Liu-Jimenez, J.; Sanchez-Reillo, R.; Eglitis, T.; Ellavarason, E.; Guest, R.; Lunerti, C. Attacking a Smartphone Biometric Fingerprint System: A Novice’s Approach. In Proceedings of the 2018 International Carnahan Conference on Security Technology (ICCST), Montreal, QC, Canada, 22–25 October 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–5. [Google Scholar]
  21. Zohrevandi, P.; Jaryani, F. Proposing an effective framework for hybrid clustering on heterogeneous data in distributed systems. Int. J. Adv. Comput. Sci. Inf. Technol. 2018, 7, 71–81. [Google Scholar]
  22. Faraji, M.; Behnam, H.; Norizadeh Cherloo, M.; Shojaeifard, M. Novel approach for automatic mid-diastole frame detection in 2D echocardiography sequences for performing planimetry of the mitral valve orifice. IET Image Process. 2020, 14, 2890–2900. [Google Scholar] [CrossRef]
  23. Noroozi, R.; Arif, Z.U.; Taghvaei, H.; Khalid, M.Y.; Sahbafar, H.; Hadi, A.; Sadeghianmaryan, A.; Chen, X. 3D and 4D Bioprinting Technologies: A Game Changer for the Biomedical Sector? Ann. Biomed. Eng. 2023, 51, 1683–1712. [Google Scholar] [CrossRef]
  24. Eskandari, V.; Sahbafar, H.; Zeinalizad, L.; Hadi, A. A review of applications of surface-enhanced raman spectroscopy laser for detection of biomaterials and a quick glance into its advances for COVID-19 investigations. ISSS J. Micro Smart Syst. 2022, 11, 363–382. [Google Scholar] [CrossRef]
  25. Ciompi, F.; de Hoop, B.; van Riel, S.J.; Chung, K.; Scholten, E.T.; Oudkerk, M.; de Jong, P.A.; Prokop, M.; van Ginneken, B. Automatic classification of pulmonary peri-fissural nodules in computed tomography using an ensemble of 2D views and a convolutional neural network out-of-the-box. Med. Image Anal. 2015, 26, 195–202. [Google Scholar] [CrossRef] [PubMed]
  26. Chen, H.; Ni, D.; Qin, J.; Li, S.; Yang, X.; Wang, T.; Heng, P.A. Standard plane localization in fetal ultrasound via domain transferred deep neural networks. IEEE J. Biomed. Health Inform. 2015, 19, 1627–1636. [Google Scholar] [CrossRef] [PubMed]
  27. Shin, H.-C.; Roth, H.R.; Gao, M.; Lu, L.; Xu, Z.; Nogues, I.; Yao, J.; Mollura, D.; Summers, R.M. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans. Med. Imaging 2016, 35, 1285–1298. [Google Scholar] [CrossRef] [PubMed]
  28. Mehraeen, E.; Dashti, M.; Ghasemzadeh, A.; Afsahi, A.M.; Shahidi, R.; Mirzapour, P.; Karimi, K.; Rouzi, M.D.; Bagheri, A.; Mohammadi, S. Virtual Reality in Medical Education during the COVID-19 Pandemic; A Systematic Review. JMIR Serious Games 2023, 10, e35000. [Google Scholar]
  29. Bagheri, A.B.; Rouzi, M.D.; Koohbanani, N.A.; Mahoor, M.H.; Finco, M.; Lee, M.; Najafi, B.; Chung, J. Potential Applications of Artificial Intelligence (AI) and Machine Learning (ML) on Diagnosis, Treatment, Outcome Prediction to Address Health Care Disparities of Chronic Limb-Threatening Ischemia (CLTI); Seminars in Vascular Surgery; Elsevier: Amsterdam, The Netherlands, 2023. [Google Scholar]
  30. Park, C.; Rouzi, M.D.; Atique, M.M.U.; Finco, M.; Mishra, R.K.; Barba-Villalobos, G.; Crossman, E.; Amushie, C.; Nguyen, J.; Calarge, C. Machine Learning-Based Aggression Detection in Children with ADHD Using Sensor-Based Physical Activity Monitoring. Sensors 2023, 23, 4949. [Google Scholar] [CrossRef] [PubMed]
  31. Jiao, Z.; Gao, X.; Wang, Y.; Li, J. A deep feature based framework for breast masses classification. Neurocomputing 2016, 197, 221–231. [Google Scholar] [CrossRef]
  32. Becker, A.S.; Marcon, M.; Ghafoor, S.; Wurnig, M.C.; Frauenfelder, T.; Boss, A. Deep learning in mammography: Diagnostic accuracy of a multipurpose image analysis software in the detection of breast cancer. Investig. Radiol. 2017, 52, 434–440. [Google Scholar] [CrossRef] [PubMed]
  33. Ribli, D.; Horváth, A.; Unger, Z.; Pollner, P.; Csabai, I. Detecting and classifying lesions in mammograms with deep learning. Sci. Rep. 2018, 8, 4165. [Google Scholar] [CrossRef]
  34. Ardila, D.; Kiraly, A.P.; Bharadwaj, S.; Choi, B.; Reicher, J.J.; Peng, L.; Tse, D.; Etemadi, M.; Ye, W.; Corrado, G. End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography. Nat. Med. 2019, 25, 954–961. [Google Scholar] [CrossRef]
  35. Coudray, N.; Ocampo, P.S.; Sakellaropoulos, T.; Narula, N.; Snuderl, M.; Fenyö, D.; Moreira, A.L.; Razavian, N.; Tsirigos, A. Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nat. Med. 2018, 24, 1559–1567. [Google Scholar] [CrossRef]
  36. Liu, S.; Zheng, H.; Feng, Y.; Li, W. Prostate Cancer Diagnosis Using Deep Learning with 3D Multiparametric MRI; Medical Imaging 2017: Computer-Aided Diagnosis; SPIE: Bellingham, WA, USA, 2017; pp. 581–584. [Google Scholar]
  37. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  38. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  39. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 91–99. [Google Scholar] [CrossRef] [PubMed]
  40. Liggins II, M.; Hall, D.; Llinas, J. Handbook of Multisensor Data Fusion: Theory and Practice; CRC Press: Boca Raton, FL, USA, 2017. [Google Scholar]
  41. Tan, M.; Le, Q. Efficientnet: Rethinking Model Scaling for Convolutional Neural Networks; International Conference on Machine Learning; PMLR: Long beach, CA, USA, 2019; pp. 6105–6114. [Google Scholar]
  42. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  43. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
  44. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
  45. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity mappings in deep residual networks. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part IV 14. Springer: Berlin/Heidelberg, Germany; pp. 630–645. [Google Scholar]
  46. Marques, G.; Agarwal, D.; De la Torre Díez, I. Automated medical diagnosis of COVID-19 through EfficientNet convolutional neural network. Appl. Soft Comput. 2020, 96, 106691. [Google Scholar] [CrossRef] [PubMed]
  47. Panthakkan, A.; Anzar, S.; Jamal, S.; Mansoor, W. Concatenated Xception-ResNet50—A novel hybrid approach for accurate skin cancer prediction. Comput. Biol. Med. 2022, 150, 106170. [Google Scholar] [CrossRef] [PubMed]
  48. Srinivasu, P.N.; SivaSai, J.G.; Ijaz, M.F.; Bhoi, A.K.; Kim, W.; Kang, J.J. Classification of skin disease using deep learning neural networks with MobileNet V2 and LSTM. Sensors 2021, 21, 2852. [Google Scholar] [CrossRef] [PubMed]
  49. Wang, C.; Chen, D.; Hao, L.; Liu, X.; Zeng, Y.; Chen, J.; Zhang, G. Pulmonary image classification based on inception-v3 transfer learning model. IEEE Access 2019, 7, 146533–146541. [Google Scholar] [CrossRef]
  50. Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv 2020, arXiv:2010.16061. [Google Scholar]
  51. DDSM Mammography. Available online: https://www.kaggle.com/datasets/skooch/ddsm-mammography (accessed on 5 February 2021).
  52. The Complete Mini-DDSM. Available online: https://www.kaggle.com/cheddad/miniddsm2 (accessed on 18 March 2021).
  53. Moreira, I.C.; Amaral, I.; Domingues, I.; Cardoso, A.; Cardoso, M.J.; Cardoso, J.S. Inbreast: Toward a full-field digital mammographic database. Acad. Radiol. 2012, 19, 236–248. [Google Scholar] [CrossRef] [PubMed]
  54. Japkowicz, N.; Stephen, S. The class imbalance problem: A systematic study. Intell. Data Anal. 2002, 6, 429–449. [Google Scholar] [CrossRef]
  55. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  56. Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; Fei-Fei, L. Imagenet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 248–255. [Google Scholar]
  57. Chen, X.; Wang, X.; Zhang, K.; Fung, K.-M.; Thai, T.C.; Moore, K.; Mannel, R.S.; Liu, H.; Zheng, B.; Qiu, Y. Recent advances and clinical applications of deep learning in medical image analysis. Med. Image Anal. 2022, 79, 102444. [Google Scholar] [CrossRef]
  58. Xu, Y.; Mo, T.; Feng, Q.; Zhong, P.; Lai, M.; Eric, I.; Chang, C. Deep learning of feature representation with multiple instance learning for medical image analysis. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; IEEE: Piscataway, NJ, USA, 2014; pp. 1626–1630. [Google Scholar]
  59. Shen, L.; Margolies, L.R.; Rothstein, J.H.; Fluder, E.; McBride, R.; Sieh, W. Deep learning to improve breast cancer detection on screening mammography. Sci. Rep. 2019, 9, 12495. [Google Scholar] [CrossRef]
  60. Rampun, A.; Zheng, L.; Malcolm, P.; Tiddeman, B.; Zwiggelaar, R. Computer-aided detection of prostate cancer in T2-weighted MRI within the peripheral zone. Phys. Med. Biol. 2016, 61, 4796. [Google Scholar] [CrossRef]
  61. Hamidinekoo, A.; Suhail, Z.; Qaiser, T.; Zwiggelaar, R. Investigating the effect of various augmentations on the input data fed to a convolutional neural network for the task of mammographic mass classification. In Proceedings of the Medical Image Understanding and Analysis: 21st Annual Conference, MIUA 2017, Edinburgh, UK, 11–13 July 2017; Proceedings 21. Springer: Berlin/Heidelberg, Germany, 2017; pp. 398–409. [Google Scholar]
  62. Abunasser, B.S.; AL-Hiealy, M.R.J.; Zaqout, I.S.; Abu-Naser, S.S. Breast cancer detection and classification using deep learning Xception algorithm. Int. J. Adv. Comput. Sci. Appl. 2022, 13, 223–228. [Google Scholar] [CrossRef]
  63. Ansar, W.; Shahid, A.R.; Raza, B.; Dar, A.H. Breast cancer detection and localization using mobilenet based transfer learning for mammograms. In Proceedings of the Intelligent Computing Systems: Third International Symposium, ISICS 2020, Sharjah, United Arab Emirates, 18–19 March 2020; Proceedings 3. Springer: Berlin/Heidelberg, Germany, 2020; pp. 11–21. [Google Scholar]
Figure 1. Flow chart of the CAD system. The grey area represents the proposed CAW model.
Figure 2. A sample mammogram image from the DDSM dataset (left) with a lesion in the middle, and the corresponding 299 × 299 cropped image of the same lesion from the cropped DDSM dataset (right).
Figure 3. Sample augmented images from the INbreast dataset: (A) displays a random horizontal shift and a random vertical flip. (B) shows a random rotation, while (C) shows both a random rotation and a random vertical shift. Lastly, (D) displays the effect of shearing.
Figure 4. Performance comparison of various DL models on breast cancer classification: evaluating F2 scores (%).
Table 1. Results of the networks and proposed weighting method for ensemble classifier.
| Model | Cropped DDSM F2 (%) | DDSM F2 (%) | INbreast F2 (%) |
|---|---|---|---|
| EfficientNet | 93.89 ± 0.09 | 81.03 ± 0.13 | 63.76 ± 0.11 |
| Xception | 92.03 ± 0.05 | 68.06 ± 0.12 | 66.75 ± 0.12 |
| MobileNetV2 | 92.43 ± 0.16 | 69.56 ± 0.28 | 60.58 ± 0.29 |
| InceptionV3 | 91.01 ± 0.17 | 76.74 ± 0.35 | 71.20 ± 0.15 |
| ResNet50 | 88.98 ± 0.16 | 67.93 ± 0.24 | 67.42 ± 0.38 |
| Majority Vote | 94.12 ± 0.12 | 81.32 ± 0.17 | 71.68 ± 0.16 |
| Initial proposed method: CAW V1 | 94.55 ± 0.10 | 81.66 ± 0.18 | 71.98 ± 0.15 |
| Final proposed method: CAW V2 | 95.48 ± 0.08 | 82.35 ± 0.17 | 72.31 ± 0.16 |
Mean ± std. CAW: consensus-adaptive weighting method. DDSM: Digital Database for Screening Mammography dataset. V1: Version 1. V2: Version 2.
Table 2. Final ‘C’ values of the proposed CAW V2 method according to the type of dataset.
| | Cropped DDSM | DDSM | INbreast |
|---|---|---|---|
| C (CAW V2 formula) | 2.6 | 3.4 | 3.1 |
CAW: Consensus-adaptive weighting method. DDSM: Digital Database for Screening Mammography dataset. V2: Version 2.
Table 3. Differences between the best and the worst DL networks’ performances in different datasets.
| | Cropped DDSM | DDSM | INbreast |
|---|---|---|---|
| Difference between best and worst performance (%) | 4.906 | 13.105 | 10.62 |
DDSM: Digital Database for Screening Mammography dataset.
Table 4. Comparison of CAW V1 and V2.
| Model | Cropped DDSM V1 | Cropped DDSM V2 | DDSM V1 | DDSM V2 | INbreast V1 | INbreast V2 |
|---|---|---|---|---|---|---|
| EfficientNetB3 | 0.205 | 0.213 | 0.223 | 0.283 | 0.193 | 0.178 |
| Xception | 0.201 | 0.202 | 0.187 | 0.156 | 0.202 | 0.191 |
| MobileNetV2 | 0.202 | 0.204 | 0.191 | 0.148 | 0.184 | 0.135 |
| InceptionV3 | 0.199 | 0.196 | 0.211 | 0.186 | 0.216 | 0.202 |
| ResNet50 | 0.194 | 0.185 | 0.187 | 0.151 | 0.204 | 0.214 |

F2 score improvement of CAW V2 over CAW V1 (%): 0.93 ± 0.18 (cropped DDSM), 0.69 ± 0.35 (DDSM), 0.33 ± 0.31 (INbreast).
V1: Version 1, V2: Version 2, CAW: consensus-adaptive weighting method.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
