Article

Robustness Analysis of DCE-MRI-Derived Radiomic Features in Breast Masses: Assessing Quantization Levels and Segmentation Agreement

by Carmelo Militello 1,*, Leonardo Rundo 2, Mariangela Dimarco 3,4, Alessia Orlando 3, Ildebrando D’Angelo 4,5, Vincenzo Conti 6 and Tommaso Vincenzo Bartolotta 3,5
1 Institute of Molecular Bioimaging and Physiology, Italian National Research Council (IBFM-CNR), 90015 Cefalù, PA, Italy
2 Department of Information and Electrical Engineering and Applied Mathematics (DIEM), University of Salerno, 84084 Fisciano, SA, Italy
3 Section of Radiology, Department of Biomedicine, Neuroscience and Advanced Diagnostics (BiND), University Hospital “Paolo Giaccone”, 90127 Palermo, PA, Italy
4 Breast Unit, Fondazione Istituto “G. Giglio”, 90015 Cefalù, PA, Italy
5 Department of Radiology, Fondazione Istituto “G. Giglio”, 90015 Cefalù, PA, Italy
6 Faculty of Engineering and Architecture, University of Enna KORE, 94100 Enna, EN, Italy
* Author to whom correspondence should be addressed.
Submission received: 26 April 2022 / Revised: 25 May 2022 / Accepted: 27 May 2022 / Published: 29 May 2022

Featured Application

The use of highly robust radiomic features is fundamental to reduce intrinsic dependencies and to provide reliable predictive models. This work presents a study on breast tumor DCE-MRI considering the radiomic feature robustness against the quantization settings and segmentation methods.

Abstract

Machine learning models based on radiomic features allow us to obtain biomarkers that are capable of modeling the disease and that are able to support the clinical routine. Recent studies have shown that it is fundamental that the computed features are robust and reproducible. Although several initiatives to standardize the definition and extraction process of biomarkers are ongoing, there is a lack of comprehensive guidelines. Therefore, no standardized procedures are available for ROI selection, feature extraction, and processing, with the risk of undermining the effective use of radiomic models in clinical routine. In this study, we aim to assess the impact that the different segmentation methods and the quantization level (defined by means of the number of bins used in the feature-extraction phase) may have on the robustness of the radiomic features. In particular, the robustness of texture features extracted by PyRadiomics and belonging to five categories (GLCM, GLRLM, GLSZM, GLDM, and NGTDM) was evaluated using the intra-class correlation coefficient (ICC) and mean differences between segmentation raters. In addition to the robustness of each single feature, an overall index for each feature category was quantified. The analysis showed that the level of quantization (i.e., the ‘binCount’ parameter) plays a key role in defining robust features: in our study, focused on a dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) dataset of 111 breast masses, sets with cardinality varying between 34 and 43 robust features were obtained with ‘binCount’ values equal to 256 and 32, respectively. Moreover, both manual segmentation methods demonstrated good reliability and agreement, while automated segmentation achieved lower ICC values. Considering the dependence on the quantization level, taking into account only the intersection subset among all the values of ‘binCount’ could be the best selection strategy. Among radiomic feature categories, GLCM, GLRLM, and GLDM showed the best overall robustness with varying segmentation methods.

1. Introduction

Radiomics techniques, aimed at analyzing a large amount of minable features extracted from medical images, have shown great potential in different clinical areas [1]. Such a large amount of quantitative radiomic features conveys more information than the qualitative patterns observed by the radiologists’ naked eye on the images. In particular, radiomic image-based signatures can be associated with clinical outcomes (i.e., biomarkers) [2,3], thereby improving clinical decision-making tasks [4]. Therefore, the development of robust biomarkers is an essential requirement which could accelerate their usefulness and, consequently, their incorporation into clinical practice [5], providing reliable diagnostic and prognostic biomarkers for precision medicine [6].
It is well known that radiomic features might be affected by several mathematical definitions, and the proliferation of toolboxes does not help this aspect. Thus, standardization initiatives have been carried out by the scientific community in order to deal with the lack of reproducibility and validation of radiomics studies. In particular, the Image Biomarker Standardization Initiative (IBSI) [7] attempts to provide guidelines about the biomarkers’ definition. Unfortunately, it is a very complicated pipeline, where every single step (e.g., image acquisition, reconstruction, segmentation, features extraction) must be tackled with caution to ensure reproducibility.
Nevertheless, there are still no comprehensive and clear guidelines to obtain radiomic features that are not only reproducible but also robust. For that reason, an accurate and careful analysis of the robustness of the radiomic features is mandatory to define robust and clinically relevant biomarkers. With the wider diffusion and use of radiomics in recent years, methodologies aimed at improving the reproducibility and robustness of these tools have become a research topic that the scientific community has addressed from different points of view and in several clinical application domains.
There are many ‘sources of variability’ that should be considered: many works in the literature have analyzed and assessed the impact on the robustness of radiomic features related to intrinsic factors, such as (i) imaging protocol [8,9] and (ii) magnetic field strength [10], or due to extrinsic factors, such as (iii) pre-processing and enhancement techniques [11,12], (iv) perturbation on the region of interest (ROI) due to the inter-operator dependence [13,14], and segmentation methods [15,16], but also to (v) post-processing techniques, which certainly modify the original data (e.g., harmonization procedures) [17].
The main goal of this work is to analyze the dependence of robustness on the quantization level used during feature extraction and the reliability of the segmentation of breast masses on dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) by relying upon two segmentation methods (both automated and manual) obtained by two raters. The proposed analysis exploits hand-crafted features and traditional machine learning approaches, which are still popular in radiomics studies today, aimed at establishing predictive models that are reliable for clinical applications.
The main contributions of this study are:
  • an evaluation of the robustness of radiomic features, in terms of ICC, as a function of quantization level in PyRadiomics;
  • an assessment of segmentation by comparing three different raters (an automated approach and the manual segmentations of two human operators) to evaluate feature reproducibility;
  • a definition of a robustness scale for individual features as a function of ICC and mean standard deviation of the feature differences;
  • a definition and quantification of an overall robustness metric for each radiomic category (i.e., GLCM, GLRLM, GLSZM, GLDM, and NGTDM).
The remainder of this work is organized as follows. Section 2 describes the conducted study aimed at dealing with the robustness evaluation. Section 3 illustrates the experimental results in terms of dependence on the quantization level, as well as the segmentation assessment. Finally, the discussion and conclusions are provided in Section 4.

2. Materials and Methods

This section describes the characteristics of the DCE-MRI dataset used and the analysis carried out to assess the robustness of the features, relative to their dependence on the quantization level, as well as on their reproducibility concerning the segmentation method.
The radiomic feature robustness analysis and quantification method were implemented entirely in the MATLAB R2019b (64-bit version) environment (MathWorks, Natick, MA, USA). Radiomic features were extracted with PyRadiomics, an open-source Python package developed for the standardization of radiomic feature extraction [18]. In particular, we used PyRadiomics version 2.0 and Python 3.7.5.

2.1. Dataset Description

A total of 111 breast masses from DCE-MRI exams were considered in this study, for a total of 1231 MR slices. Table 1 provides some relevant characteristics concerning the MR imaging protocol.

2.2. Breast Tumor Segmentation

ROIs containing breast masses were segmented using two different segmentation methods:
  • Automated delineation. This is a computer-assisted method based on the spatial fuzzy c-means (sFCM) clustering algorithm [19,20]. The sFCM algorithm, compared to the traditional FCM, takes into account the spatial relationship among neighboring pixels, making it less sensitive to noise and other imaging artifacts. This approach has been previously implemented and validated in [21,22];
  • Manual free-hand delineation. In order to quantify the inter-operator dependence of the robustness of the features, two delineations were performed by two radiologists with more than 5 years of experience with breast MRI, in consensus with a consultant breast radiologist (with more than 30 years of experience with breast imaging).
Segmentations—in terms of original MR slices containing the breast mass and the corresponding masks—were converted from DICOM to the Neuroimaging Informatics Technology Initiative (NIfTI) format [7], to be used successively as input to PyRadiomics [18] for feature extraction. Both automated and manual free-hand delineations were performed by using MATLAB-coded custom segmentation tools. The size of the tumor is included among the exclusion criteria [21]: the masses with maximum diameters lower than 5 mm were excluded. Considering that the images have a pixel spacing varying between 0.6875 and 0.7422 mm, this means that in the limiting case, the ROI has at least 7–8 voxels (along one dimension), which is sufficient to calculate the features. Figure 1 shows two segmentation results obtained through two different segmentation methods.

2.3. Radiomic Feature Extraction

The DCE-MRI images analyzed represent a homogeneous dataset in terms of spatial resolution along the (x, y) plane and slice thickness along the z axis. For this reason, the extraction of the features was performed without any resampling to avoid interpolation artifacts. Radiomic features were extracted from the 3D ROIs delineated in the previous step. Five texture-feature categories were extracted and considered in this study:
  • Gray-level co-occurrence matrix (GLCM) [23,24]: spatial relationship between pixels in a specific direction, highlighting the properties of uniformity, homogeneity, randomness, and linear dependencies;
  • Gray-level run length matrix (GLRLM) [25]: texture in a specific direction, where fine texture has shorter runs while coarse texture has longer runs with different intensity values;
  • Gray-level size zone matrix (GLSZM) [26]: regional intensity variations or the distribution of homogeneous regions;
  • Gray-level dependence matrix (GLDM) [27]: quantifies gray-level dependencies;
  • Neighboring gray tone difference matrix (NGTDM) [28]: spatial relationship among three or more pixels, closely approaching the human perception of the image.
Conversely, shape-based and first-order features were not considered in the study because they are independent of the quantization level.
It is worth noting that texture features, such as the GLCM features (also known as Haralick’s features [23,24]), are calculated from the co-occurrence matrix that is based only on the gray-level values. Therefore, the ROI shape does not play any role, because co-occurrence matrices consider only the pairs of gray-level values composed of two adjacent or neighboring voxels [29].
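As an illustration of how a co-occurrence matrix depends only on pairs of quantized gray levels inside the ROI, the following minimal sketch builds a symmetric GLCM for a single 2D offset. It is not the PyRadiomics implementation: the equal-width binning in `quantize`, the single offset, and the function names are assumptions made for this example only.

```python
def quantize(values, bin_count):
    """Map intensities to gray levels 1..bin_count using equal-width bins (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1 for _ in values]
    width = (hi - lo) / bin_count
    # The maximum intensity would land one bin too high, so clamp it to bin_count.
    return [min(int((v - lo) / width) + 1, bin_count) for v in values]


def glcm_2d(image, mask, bin_count, offset=(0, 1)):
    """Symmetric co-occurrence counts for one offset, restricted to voxels inside the mask."""
    rows, cols = len(image), len(image[0])
    roi = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    levels = quantize([image[r][c] for r, c in roi], bin_count)
    level_at = dict(zip(roi, levels))
    glcm = [[0] * bin_count for _ in range(bin_count)]
    dr, dc = offset
    for (r, c), i in level_at.items():
        j = level_at.get((r + dr, c + dc))  # None when the neighbor lies outside the ROI
        if j is not None:
            glcm[i - 1][j - 1] += 1
            glcm[j - 1][i - 1] += 1  # count the pair in both directions
    return glcm


image = [[0, 1, 2, 3], [0, 1, 2, 3]]
mask = [[1, 1, 1, 1], [1, 1, 1, 1]]
coarse = glcm_2d(image, mask, bin_count=2)
fine = glcm_2d(image, mask, bin_count=4)
```

On this toy image, the matrix with 2 bins has no empty cells, whereas the matrix with 4 bins is mostly zeros: the same pairs of voxels are spread over more gray-level combinations as the number of bins grows.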

2.4. Statistical Analysis

Radiomic features were extracted considering different quantization levels (i.e., number of bins equal to 8, 16, 32, 64, 128, and 256). The ICC analysis takes into account the extracted features and allows us to determine which are more robust as the number of bins varies. ICC analysis was applied to establish the minimum number of bins that maximizes the set of robust features. In this study, we considered the two-way random-effects (or mixed-effects) model, consistency, single rater/measurement ICC, named ICC(3,1) [30]. Let $k$ be the number of raters/measurements; ICC(3,1) is defined in Equation (1):
$$\mathrm{ICC}(3,1) = \frac{MS_R - MS_E}{MS_R + (k - 1)\,MS_E}, \tag{1}$$
where $MS_R$ and $MS_E$ are the mean squares for rows and for errors, respectively.
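The ICC(3,1) definition can be checked numerically with a short sketch that derives the two mean squares from a two-way ANOVA decomposition. The function name `icc_3_1` and the layout of `ratings` (one row per breast mass, one column per rater) are assumptions for this example; the study itself performed the analysis in MATLAB.

```python
def icc_3_1(ratings):
    """ICC(3,1): two-way model, consistency, single rater/measurement.

    ratings[i][j] holds the value of one feature for subject i (row) and rater j (column).
    """
    n = len(ratings)        # number of subjects (rows)
    k = len(ratings[0])     # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual after removing row and column effects

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# A constant offset between two raters does not reduce a consistency-type ICC.
shifted = [[1, 6], [2, 7], [4, 9]]
icc_shifted = icc_3_1(shifted)
```

Because the consistency form removes the rater (column) effect, a systematic bias between raters leaves the coefficient at 1; only disagreement in how the raters rank and space the subjects lowers it.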

2.4.1. Quantization-Level Analysis

The quantization of images, in terms of the rebinning of the gray levels prior to feature computation, has a two-fold goal: (i) noise reduction and (ii) avoidance of sparse matrices (possibly resulting in unsuitable and poorly robust features for predictive modeling). The Image Biomarker Standardization Initiative (IBSI) [7] explicitly suggests quantization to optimize and improve the development of radiomics models.
The ‘binCount’ parameter is used by PyRadiomics to determine the image quantization settings (i.e., the number of bins) in the radiomic feature-extraction phase. The optimal value for ‘binCount’ was chosen so as to maximize the number of robust features (in terms of ICC). This choice allowed us to carefully assess the quantization settings, thereby avoiding an arbitrary selection of the number of bins. In more detail, to evaluate the robustness as the quantization level changes, $\mathrm{ICC}_{quantLevel}$ was calculated for each feature, considering the set composed of all the segmentations (i.e., $Segm_{automated}$, $Segm_{man1}$, $Segm_{man2}$) and a specific number of bins, as indicated in Equation (2):
$$\mathrm{ICC}_{quantLevel} = \mathrm{ICC}(Segm_{automated}, Segm_{man1}, Segm_{man2}). \tag{2}$$
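The selection of the quantization level can be sketched as follows, assuming the per-feature ICC values have already been computed for each number of bins. The feature names and ICC values below are placeholders invented for illustration and are not results of this study; the cutoff is left as a parameter, and the helper names are hypothetical.

```python
def robust_features(icc_by_feature, cutoff):
    """Names of the features whose ICC reaches the cutoff."""
    return {name for name, icc in icc_by_feature.items() if icc >= cutoff}


def select_bin_count(icc_table, cutoff=0.9):
    """icc_table maps a number of bins to a {feature name: ICC} dictionary.

    Returns the smallest number of bins with the largest robust set, the robust
    set for every number of bins, and the features robust at all levels.
    """
    robust = {bins: robust_features(iccs, cutoff) for bins, iccs in icc_table.items()}
    # max() keeps the first maximum, so iterating in ascending order favors fewer bins.
    best = max(sorted(robust), key=lambda bins: len(robust[bins]))
    common = set.intersection(*robust.values())
    return best, robust, common


# Placeholder values only (hypothetical features and ICCs).
icc_table = {
    8: {"feature_a": 0.85, "feature_b": 0.93, "feature_c": 0.91},
    32: {"feature_a": 0.92, "feature_b": 0.95, "feature_c": 0.94},
    256: {"feature_a": 0.91, "feature_b": 0.88, "feature_c": 0.96},
}
best, robust, common = select_bin_count(icc_table)
```

The `common` set corresponds to the intersection subset among all the values of ‘binCount’, i.e., the features that remain robust regardless of the quantization level.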

2.4.2. Agreement of Segmentation Methods

Each breast mass was segmented via two different delineation approaches: (i) manual segmentation, performed by two distinct radiologists with 5 years of experience; (ii) an automated method based on spatial FCM clustering that was already tested and validated in [21]. The inter-observer agreement of the two segmentation methods (automated and manual) was quantified by calculating the ICC using a two-way random-effects model and the mean differences of the two observers. Regarding the manual segmentation, two distinct series of delineations were analyzed and independently performed by two different raters [31]. For this reason, three ICC series were computed, according to Equations (3)–(5):
$$\mathrm{ICC}_{automated-manual1} = \mathrm{ICC}(Segm_{automated}, Segm_{man1}), \tag{3}$$
$$\mathrm{ICC}_{automated-manual2} = \mathrm{ICC}(Segm_{automated}, Segm_{man2}), \tag{4}$$
$$\mathrm{ICC}_{manual1-manual2} = \mathrm{ICC}(Segm_{man1}, Segm_{man2}). \tag{5}$$
After determining the features showing excellent robustness, we aimed to identify the most relevant features for the analysis at hand by evaluating the dependence on the segmentation method. To this aim, we considered ‘binCount’ = 32, which guarantees the greatest number of features (43) with excellent robustness.
The Shapiro–Wilk test [32] was used to assess the normality of the 129 distributions of the differences (obtained considering 43 features and three segmentations) among the segmentation raters, at a significance level of 0.05. All distributions passed the normality test, obtaining p-values 0.0001.
To evaluate the robustness of the MRI radiomic features, a six-level scale (ranging from 0 to 5) was defined, based on a combination of the ICC and the standard deviation (SD) of the mean percentage differences between the three raters (i.e., auto, man1, man2), according to the conditions in Table 2. Percentage differences were evaluated according to Equation (6).
$$\mathrm{PercentageDifference}(Segm_i, Segm_j) = \frac{\mathrm{abs}(Segm_i - Segm_j)}{\mathrm{mean}(Segm_i, Segm_j)} \times 100, \qquad i, j \in \{auto, man1, man2\}. \tag{6}$$
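A minimal sketch of Equation (6) and of the spread of the resulting differences across masses is given below. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which SD estimator was applied.

```python
from statistics import mean, stdev


def percentage_difference(value_i, value_j):
    """Equation (6): absolute difference relative to the mean of the two values, in percent."""
    return abs(value_i - value_j) / mean([value_i, value_j]) * 100


def sd_of_percentage_differences(values_i, values_j):
    """Sample SD (assumed estimator) of the per-mass percentage differences of one feature."""
    diffs = [percentage_difference(a, b) for a, b in zip(values_i, values_j)]
    return stdev(diffs)


example = percentage_difference(90, 110)  # difference 20 over a mean of 100
```

The measure is symmetric in its two arguments, so each pair of segmentations needs to be evaluated only once; it is undefined when the two values average to zero.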

3. Results

3.1. Quantization Setting Dependence Results

Quantization setting dependence was evaluated by means of ICC, as indicated in Section 2.4.1. In order to consider only radiomic features with excellent robustness, the cutoff value was set to 0.9. Figure 2, Figure 3, Figure 4, Figure 5 and Figure 6 show all the features, divided by category (i.e., GLCM, GLRLM, GLSZM, GLDM, NGTDM), that obtained ICC ≥ 0.9 in at least one quantization level.
Table 3 summarizes the total of the robust features obtained for each quantization level.

3.2. Segmentation Method Dependence Results

The robustness of the MRI radiomic features was evaluated using a six-level scale (going from 0 to 5) based on a combination of the ICC and the SD of the mean percentage differences between the three raters (i.e., auto, man1, man2). Percentage differences were evaluated according to Equation (6). Each feature was evaluated considering the ICC and SD. The score, from 0 (very low robustness) to 5 (very high robustness), was assigned according to the conditions reported in Table 2.
The values obtained are illustrated, for each of the five feature categories, in the following figures (Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11).
Starting from the previously defined scale (see Table 2), an overall robustness value, defined according to Equation (7), was computed for each radiomic category Φ to summarize the robustness by feature category across the segmentation methods. This formula gives values between 0 (lower robustness) and 1 (higher robustness). Results are shown in Table 4. For each category of features, Table 5 instead displays the percentage of robust features (those that obtained a robustness index ≥ 3) as the segmentation methodology varies.
$$\mathrm{Robustness}(\Phi) = \frac{\sum_{i \in \Phi} S_{\phi_i}}{maxValue \times |\Phi|}, \tag{7}$$
where:
  • $\Phi$ is the radiomic category ($\Phi \in \{GLCM, GLRLM, GLSZM, GLDM, NGTDM\}$);
  • $S_{\phi_i}$ represents the score on the scale described above (see Table 2) for the robustness of each feature $i$;
  • $maxValue$ represents the maximum possible value (i.e., 5);
  • $|\cdot|$ represents the cardinality of a given radiomic category $\Phi$.
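Equation (7) amounts to normalizing the sum of the per-feature scores by the largest sum the category could reach. A minimal sketch, with a hypothetical function name and invented scores used only to show the arithmetic:

```python
def category_robustness(scores, max_value=5):
    """Equation (7): sum of per-feature scores divided by max_value times the category size."""
    if not scores:
        raise ValueError("a radiomic category must contain at least one feature")
    return sum(scores) / (max_value * len(scores))


# Invented scores for a four-feature category: (5 + 4 + 3 + 0) / (5 * 4) = 0.6
example_index = category_robustness([5, 4, 3, 0])
```

The index equals 1 only when every feature of the category obtains the maximum score, and 0 only when every feature obtains a score of 0.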

4. Discussion and Conclusions

The aim of this work was to acquire further insights into the dependence of radiomic features’ robustness on the quantization level used during feature extraction, and the reproducibility of features in relation to segmented ROIs of breast masses. In particular, first of all, in order to quantify the robustness and to provide a score—for each radiomic feature and for each category (i.e., GLCM, GLRLM, GLSZM, GLDM, NGTDM)—a correlation analysis based on the ICC was carried out to identify the features that are the least dependent on the level of quantization (i.e., the ‘binCount’ parameter). After determining the best value for ‘binCount’ and considering the subset containing only the robust features, an additional analysis was carried out in order to assess the reproducibility by comparing, relying upon the ICC, three different segmentations: two that were independently obtained through manual delineation by two radiologists, and one that was obtained using a validated automated segmentation approach [21,33] based on spatial FCM clustering. This second analysis led to the quantification of not only a robustness index for each feature, but also to the definition and quantification of an overall robustness index for each category of features.
Among the GLCM features, Contrast, DifferenceAverage, DifferenceVariance, Imc1, Imc2, and MaximumProbability seem to have poor robustness with respect to the segmentation method. Although their robustness index is good in the comparison of manual1 vs. manual2 (likely due to a higher agreement between the two manual segmentations), in the comparison with the automated method, the Imc1 index is lower. This denotes a high dependence on the ROI. The category has a medium-to-high overall robustness index (0.48–0.8). Among the GLRLM features, LongRunHighGrayLevelEmphasis obtains the worst score in the comparison between automated and manual segmentations, followed by GrayLevelNonUniformity, RunLengthNonUniformity, and RunVariance. On the other hand, the other features present a very good score in all comparisons, and the overall category score is high (0.6–0.8). Nearly all GLSZM features have a medium/low index, which gives the category a very low overall index, varying in the range 0.36–0.53 depending on the segmentation. Almost all GLDM features have a medium score, except for LargeDependenceHighGrayLevelEmphasis and SmallDependenceHighGrayLevelEmphasis, and the category shows an overall score in the range 0.42–0.67. Finally, all features belonging to the NGTDM category show very low robustness indexes and, consequently, the category obtains an overall score in the range 0.16–0.4.
Considering the dependence on the quantization level, taking into account only the intersection subset among all the values of ‘binCount’ could be the best selection strategy. Among radiomic categories, GLCM, GLRLM, and GLDM showed the best overall robustness against variations due to the segmentation method.
A fair comparison against analogous literature works is not possible, because each one does not necessarily refer to the same disease and, consequently, does not use the same data. As a matter of fact, rather than comparing results, we believe it is more appropriate to summarize the results obtained (see Table 6), in order to provide an overview of the literature and the parameters for analyzing the robustness of the characteristics.
The main focus of our work was on the development of a reliable system in terms of robust radiomic features according to the quantization settings and segmentation approaches. The adopted methods are not computationally expensive since traditional statistical analyses (mostly based on the ICC) are applied after the radiomic feature extraction. This assessment allowed us to propose a lightweight system based on classic machine learning techniques. Moreover, the type of analysis performed disregards variations in the data that could affect the stability of the system. In fact, the analysis performed on the features is computed from the segmented ROI and by quantifying the correlation (via ICC) as the quantization level and segmentation approach change.
Presently, clinical decision support systems (CDSSs) are becoming increasingly prevalent in clinical routines, and are able to assist the work of clinicians [34,35]. In the near future, radiomics will likely represent a tool that clinicians will rely on. An analysis and quantification approach, such as the one proposed here, inserted upstream in the analysis and modeling pipeline will enable even more reliable radiomics tools. Our study can represent an improvement over a traditional radiomic analysis pipeline that might be affected by parameter choices. In fact, the techniques used, inserted upstream of the modeling pipeline, allow for the definition of predictive models based only on features (or feature categories) that are robust against levels of quantization and segmentation methods. Considering that clinical scenarios are among the most critical, any tool that takes advantage of ICT must be carefully validated before being integrated into clinical practice.
Furthermore, as a future development, it would be very interesting to validate the insights gained from this study on a more extensive dataset in order to assess the repeatability of the analysis and its ability to generalize. A pre-analysis and feature calibration phase, such as the one proposed in this study, is essential to obtain an initial set of non-redundant and robust features. In fact, every radiomic study consists of several phases [3]: reducing the uncertainty on the input features makes it possible to improve the repeatability and robustness of the study.

Author Contributions

Conceptualization, C.M. and L.R.; methodology, C.M. and L.R.; software, C.M.; validation, C.M. and L.R.; formal analysis, C.M. and L.R.; investigation, C.M.; resources, M.D., A.O. and I.D.; data curation, M.D., A.O. and I.D.; writing—original draft preparation, C.M. and L.R.; writing—review and editing, M.D., A.O., V.C. and T.V.B.; visualization, C.M. and L.R.; supervision, V.C. and T.V.B.; project administration, T.V.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethics Committee of “Azienda Ospedaliera Universitaria Policlinico P.Giaccone” of Palermo, Italy (protocol code n.1/2020-15/01/2020).

Informed Consent Statement

Retrospective data collection was approved by the Ethics Committee. The requirement for evidence of informed consent was waived because of the retrospective nature of our study.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CK: Contralateral normal kidney
CT: Computed tomography
CTA: Computed tomography angiography
DCE-MRI: Dynamic contrast-enhanced magnetic resonance imaging
EVK: Enhancement variance kinetics
FCM: Fuzzy c-means
FO: First-order
GLCM: Gray-level co-occurrence matrix
GLDM: Gray-level dependence matrix
GLRLM: Gray-level run length matrix
GLSZM: Gray-level size zone matrix
HNSCC: Head and neck squamous cell carcinoma
IBSI: Image Biomarker Standardization Initiative
ICC: Intra-class correlation coefficient
IH: Intensity histogram
IVH: Intensity–volume histogram
KCA: Kinetic curve assessment
MR: Magnetic resonance
NGTDM: Neighboring gray tone difference matrix
NIfTI: Neuroimaging Informatics Technology Initiative
NSCLC: Non-small cell lung cancer
RCC: Renal cell carcinoma
ROI: Region of interest
sFCM: Spatial fuzzy c-means
T1w: T1 weighted
T2w: T2 weighted

References

  1. Gillies, R.; Kinahan, P.; Hricak, H. Radiomics: Images Are More than Pictures, They Are Data. Radiology 2016, 278, 563–577.
  2. Rundo, L.; Militello, C.; Vitabile, S.; Russo, G.; Sala, E.; Gilardi, M.C. A survey on nature-inspired medical image analysis: A step further in biomedical data integration. Fundam. Inform. 2020, 171, 345–365.
  3. Militello, C.; Rundo, L.; Dimarco, M.; Orlando, A.; Woitek, R.; D’Angelo, I.; Russo, G.; Bartolotta, T.V. 3D DCE-MRI Radiomic Analysis for Malignant Lesion Prediction in Breast Cancer Patients. Acad. Radiol. 2022, 29, 830–840.
  4. Rundo, L.; Pirrone, R.; Vitabile, S.; Sala, E.; Gambino, O. Recent advances of HCI in decision-making tasks for optimized clinical workflows and precision medicine. J. Biomed. Inform. 2020, 108, 103479.
  5. Lambin, P.; Leijenaar, R.; Deist, T.M.; Peerlings, J.; de Jong, E.; van Timmeren, J.; Sanduleanu, S.; Larue, R.; Even, A.; Jochems, A.; et al. Radiomics: The bridge between medical imaging and personalized medicine. Nat. Rev. Clin. Oncol. 2017, 14, 749–762.
  6. Fornacon-Wood, I.; Mistry, H.; Ackermann, C.J.; Blackhall, F.; McPartlin, A.; Faivre-Finn, C.; Price, G.J.; O’Connor, J. Reliability and prognostic value of radiomic features are highly dependent on choice of feature extraction platform. Eur. Radiol. 2020, 30, 6241–6250.
  7. Zwanenburg, A.; Vallières, M.; Abdalah, M.A.; Aerts, H.J.W.L.; Andrearczyk, V.; Apte, A.; Ashrafinia, S.; Bakas, S.; Beukinga, R.J.; Boellaard, R.; et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology 2020, 295, 328–338.
  8. Shafiq-Ul-Hassan, M.; Zhang, G.; Latifi, K.; Ullah, G.; Hunt, D.; Balagurunathan, Y.; Abdalah, M.; Schabath, M.; Goldgof, D.; Mackin, D.; et al. Intrinsic dependencies of CT radiomic features on voxel size and number of gray levels. Med. Phys. 2017, 44, 1050–1062.
  9. Escudero Sanchez, L.; Rundo, L.; Gill, A.; Hoare, M.; Mendes Serrao, E.; Sala, E. Robustness of radiomic features in CT images with different slice thickness, comparing liver tumour and muscle. Sci. Rep. 2021, 11, 8262.
  10. Whitney, H.; Drukker, K.; Edwards, A.; Papaioannou, J.; Medved, M.; Karczmar, G.; Giger, M. Robustness of radiomic features of benign breast lesions and hormone receptor positive/HER2-negative cancers across DCE-MR magnet strengths. Magn. Reson. Imaging 2020, 82, 111–121.
  11. Scalco, E.; Belfatto, A.; Mastropietro, A.; Rancati, T.; Avuzzi, B.; Messina, A.; Valdagni, R.; Rizzo, G. T2w-MRI signal normalization affects radiomics features reproducibility. Med. Phys. 2020, 47, 1680–1691.
  12. de Farias, E.C.; di Noia, C.; Han, C.; Sala, E.; Castelli, M.; Rundo, L. Impact of GAN-based lesion-focused medical image super-resolution on the robustness of radiomic features. Sci. Rep. 2021, 11, 21361.
  13. Zwanenburg, A.; Leger, S.; Agolli, L.; Pilz, K.; Troost, E.; Richter, C.; Löck, S. Assessing robustness of radiomic features by image perturbation. Sci. Rep. 2019, 9, 614.
  14. Mottola, M.; Ursprung, S.; Rundo, L.; Sanchez, L.; Klatte, T.; Mendichovszky, I.; Stewart, G.; Sala, E.; Bevilacqua, A. Reproducibility of CT-based radiomic features against image resampling and perturbations for tumour and healthy kidney in renal cancer patients. Sci. Rep. 2021, 11, 11542.
  15. Tixier, F.; Um, H.; Young, R.; Veeraraghavan, H. Reliability of tumor segmentation in glioblastoma: Impact on the robustness of MRI-radiomic features. Med. Phys. 2019, 46, 3582–3591.
  16. Granzier, R.; Verbakel, N.; Ibrahim, A.; van Timmeren, J.; van Nijnatten, T.; Leijenaar, R.; Lobbes, M.; Smidt, M.; Woodruff, H. MRI-based radiomics in breast cancer: Feature robustness with respect to inter-observer segmentation variability. Sci. Rep. 2020, 10, 14163.
  17. Le, E.; Rundo, L.; Tarkin, J.M.; Evans, N.; Chowdhury, M.; Coughlin, P.; Pavey, H.; Wall, C.; Zaccagna, F.; Gallagher, F.; et al. Assessing robustness of carotid artery CT angiography radiomics in the identification of culprit lesions in cerebrovascular events. Sci. Rep. 2021, 11, 3499.
  18. van Griethuysen, J.J.M.; Fedorov, A.; Parmar, C.; Hosny, A.; Aucoin, N.; Narayan, V.; Beets-Tan, R.G.H.; Fillion-Robin, J.C.; Pieper, S.; Aerts, H.J.W.L. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res. 2017, 77, e104–e107.
  19. Chuang, K.S.; Tzeng, H.L.; Chen, S.; Wu, J.; Chen, T.J. Fuzzy c-means clustering with spatial information for image segmentation. Comput. Med. Imaging Graph. 2006, 30, 9–15.
  20. Li, B.N.; Chui, C.K.; Chang, S.; Ong, S.H. Integrating spatial fuzzy clustering with level set methods for automated medical image segmentation. Comput. Biol. Med. 2011, 41, 1–10.
  21. Militello, C.; Rundo, L.; Dimarco, M.; Orlando, A.; Conti, V.; Woitek, R.; D’Angelo, I.; Bartolotta, T.V.; Russo, G. Semi-automated and interactive segmentation of contrast-enhancing masses on breast DCE-MRI using spatial fuzzy clustering. Biomed. Signal Process. Control 2022, 71, 103113.
  22. Militello, C.; Ranieri, A.; Rundo, L.; D’Angelo, I.; Marinozzi, F.; Bartolotta, T.V.; Bini, F.; Russo, G. On Unsupervised Methods for Medical Image Segmentation: Investigating Classic Approaches in Breast Cancer DCE-MRI. Appl. Sci. 2021, 12, 162.
  23. Haralick, R.M.; Shanmugam, K.; Dinstein, I. Textural features for image classification. IEEE Trans. Syst. Man Cybern. 1973, SMC-3, 610–621.
  24. Haralick, R.M. Statistical and structural approaches to texture. Proc. IEEE 1979, 67, 786–804.
  25. Galloway, M.M. Texture analysis using gray level run lengths. Comput. Graph. Image Process. 1975, 4, 172–179.
  26. Thibault, G.; Angulo, J.; Meyer, F. Advanced Statistical Matrices for Texture Characterization: Application to Cell Classification. IEEE Trans. Biomed. Eng. 2014, 61, 630–637.
  27. Sun, C.; Wee, W.G. Neighboring gray level dependence matrix for texture classification. Comput. Vis. Graph. Image Process. 1983, 23, 341–352.
  28. Amadasun, M.; King, R. Textural features corresponding to textural properties. IEEE Trans. Syst. Man Cybern. 1989, 19, 1264–1274.
  29. Rundo, L.; Tangherloni, A.; Galimberti, S.; Cazzaniga, P.; Woitek, R.; Sala, E.; Nobile, M.S.; Mauri, G. HaraliCU: GPU-powered Haralick feature extraction on medical images exploiting the full dynamics of gray-scale levels. In Proceedings of the International Conference on Parallel Computing Technologies (PaCT), Almaty, Kazakhstan, 19–23 August 2019; Malyshkin, V., Ed.; Springer International Publishing: Cham, Switzerland, 2019. LNCS 11657. pp. 304–318.
  30. Shrout, P.E.; Fleiss, J.L. Intraclass correlations: Uses in assessing rater reliability. Psychol. Bull. 1979, 86, 420–428. [Google Scholar] [CrossRef]
  31. Koo, T.K.; Li, M.Y. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J. Chiropr. Med. 2016, 15, 155–163. [Google Scholar] [CrossRef] [Green Version]
  32. Shapiro, S.; Wilk, M. An Analysis of Variance Test for Normality (Complete Samples). Biometrika 1965, 52, 591–611. [Google Scholar] [CrossRef]
  33. Militello, C.; Vitabile, S.; Rundo, L.; Russo, G.; Midiri, M.; Gilardi, M.C. A fully automatic 2D segmentation method for uterine fibroid in MRgFUS treatment evaluation. Comput. Biol. Med. 2015, 62, 277–292. [Google Scholar] [CrossRef] [PubMed]
  34. Furqan Qadri, S.; Ai, D.; Hu, G.; Ahmad, M.; Huang, Y.; Wang, Y.; Yang, J. Automatic Deep Feature Learning via Patch-Based Deep Belief Network for Vertebrae Segmentation in CT Images. Appl. Sci. 2019, 9, 69. [Google Scholar] [CrossRef] [Green Version]
  35. Hirra, I.; Ahmad, M.; Hussain, A.; Ashraf, M.U.; Saeed, I.A.; Qadri, S.F.; Alghamdi, A.M.; Alfakeeh, A.S. Breast Cancer Classification From Histopathological Images Using Patch-Based Deep Learning Modeling. IEEE Access 2021, 9, 24273–24287. [Google Scholar] [CrossRef]
Figure 1. Examples of a benign (top) and a malignant (bottom) breast mass. For each example, the DCE-MRI image area is shown without (left) and with (right) delineation contours. The segmentation results of the automated delineation (red contour) are compared with the manual delineations performed by the first (green contour) and the second (blue contour) radiologist. All images are shown at 2.5× magnification.
Figure 2. Dependence of GLCM radiomic features on the quantization level, considering different values (i.e., 8, 16, 32, 64, 128, and 256) of the ‘binCount’ PyRadiomics parameter. The features ClusterProminence, ClusterShade, ClusterTendency, Correlation, JointAverage, and SumAverage were discarded because their ICC did not exceed the cutoff for any ‘binCount’ setting.
Figure 3. Dependence of GLRLM radiomic features on the quantization level, considering different values (i.e., 8, 16, 32, 64, 128, and 256) of the ‘binCount’ PyRadiomics parameter. The features SumSquares, GrayLevelNonUniformityNormalized, GrayLevelVariance, HighGrayLevelRunEmphasis, LongRunLowGrayLevelEmphasis, LowGrayLevelRunEmphasis, ShortRunHighGrayLevelEmphasis, and ShortRunLowGrayLevelEmphasis were discarded because their ICC did not exceed the cutoff for any ‘binCount’ setting.
Figure 4. Dependence of GLSZM radiomic features on the quantization level, considering different values (i.e., 8, 16, 32, 64, 128, and 256) of the ‘binCount’ PyRadiomics parameter. The features GrayLevelNonUniformityNormalized, GrayLevelVariance, HighGrayLevelZoneEmphasis, LargeAreaLowGrayLevelEmphasis, LowGrayLevelZoneEmphasis, SmallAreaHighGrayLevelEmphasis, and SmallAreaLowGrayLevelEmphasis were discarded because their ICC did not exceed the cutoff for any ‘binCount’ setting.
Figure 5. Dependence of GLDM radiomic features on the quantization level, considering different values (i.e., 8, 16, 32, 64, 128, and 256) of the ‘binCount’ PyRadiomics parameter. The features GrayLevelVariance, HighGrayLevelEmphasis, LargeDependenceLowGrayLevelEmphasis, LowGrayLevelEmphasis, and SmallDependenceLowGrayLevelEmphasis were discarded because their ICC did not exceed the cutoff for any ‘binCount’ setting.
Figure 6. Dependence of NGTDM radiomic features on the quantization level, considering different values (i.e., 8, 16, 32, 64, 128, and 256) of the ‘binCount’ PyRadiomics parameter.
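The quantization sweep summarized in Figures 2–6 could be reproduced along the following lines. This is only a minimal sketch, not the authors' pipeline: the function names, the lazy import, and the choice to enable all five texture classes in a single extractor are assumptions made for illustration, and additional PyRadiomics settings (for example, those needed for 2D slices) may be required depending on the input images.

```python
from typing import Dict, Iterable

# Quantization levels listed in the figure captions.
BIN_COUNTS = (8, 16, 32, 64, 128, 256)

# PyRadiomics class names for the five texture categories studied here.
TEXTURE_CLASSES = ("glcm", "glrlm", "glszm", "gldm", "ngtdm")


def extract_texture_features(image_path: str, mask_path: str,
                             bin_count: int) -> Dict[str, float]:
    """Extract texture features for one lesion at a fixed number of bins."""
    # Imported lazily so this module loads even without PyRadiomics installed.
    from radiomics import featureextractor

    extractor = featureextractor.RadiomicsFeatureExtractor(binCount=bin_count)
    extractor.disableAllFeatures()
    for name in TEXTURE_CLASSES:
        extractor.enableFeatureClassByName(name)
    result = extractor.execute(image_path, mask_path)
    # Keep only feature values; skip the 'diagnostics_*' metadata entries.
    return {key: float(value) for key, value in result.items()
            if not key.startswith("diagnostics_")}


def sweep_bin_counts(image_path: str, mask_path: str,
                     bin_counts: Iterable[int] = BIN_COUNTS
                     ) -> Dict[int, Dict[str, float]]:
    """Repeat the extraction once per quantization level."""
    return {b: extract_texture_features(image_path, mask_path, b)
            for b in bin_counts}
```

Running `sweep_bin_counts` once per segmentation (automated and each manual reader) would yield the per-feature values on which an ICC can then be computed for every ‘binCount’ setting.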
Figure 7. Robustness of the GLCM radiomic features obtained on DCE-MRI images using automated segmentation against two manual segmentations from two independent readers.
Figure 8. Robustness of the GLRLM radiomic features obtained on DCE-MRI images using automated segmentation against two manual segmentations from two independent readers.
Figure 9. Robustness of the GLSZM radiomic features obtained on DCE-MRI images using automated segmentation against two manual segmentations from two independent readers.
Figure 10. Robustness of the GLDM radiomic features obtained on DCE-MRI images using automated segmentation against two manual segmentations from two independent readers.
Figure 11. Robustness of the NGTDM radiomic features obtained on DCE-MRI images using automated segmentation against two manual segmentations.
Table 1. Some relevant characteristics concerning the DCE-MRI imaging protocol.

| Protocol Characteristic | Value |
|---|---|
| series | Ax VIBRANT mphase |
| MR scanner | GE Signa HDxt |
| magnetic field | 1.5 Tesla |
| repetition time | (37.72–56.92) ms |
| echo time | (17.64–26.80) ms |
| flip angle | 10° |
| matrix size | 512 × 512 pixels |
| slice thickness | (2–3) mm |
| spacing between slices | (1–1.5) mm |
| pixel spacing | (0.6875–0.7422) mm |
Table 2. Conditions to evaluate the robustness of each radiomic feature category.

| Score (Robustness) | ICC Condition | SD Condition |
|---|---|---|
| 5 (very high) | ≥90% | ≤10% |
| 4 (high) | ≥85% | ≤20% |
| 3 (medium) | ≥80% | ≤30% |
| 2 (limited) | ≥75% | ≤40% |
| 1 (low) | ≥70% | ≤100% |
| 0 (very low) | <70% | >100% |
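The scoring scheme in Table 2 can be expressed as a small lookup. The sketch below is illustrative only: the function name is hypothetical, ICC and SD are assumed to be given as fractions (0.90 for 90%), and it assumes that both conditions of a row must hold at the same time, with the highest satisfied row determining the score. The table itself does not state how cases where the two conditions disagree are resolved.

```python
def robustness_score(icc: float, sd: float) -> int:
    """Return the 0-5 robustness score for an ICC and an SD given as fractions."""
    # (score, minimum ICC, maximum SD), ordered from highest to lowest score.
    thresholds = [
        (5, 0.90, 0.10),
        (4, 0.85, 0.20),
        (3, 0.80, 0.30),
        (2, 0.75, 0.40),
        (1, 0.70, 1.00),
    ]
    for score, min_icc, max_sd in thresholds:
        # Assumption: a row applies only if BOTH conditions are satisfied.
        if icc >= min_icc and sd <= max_sd:
            return score
    return 0  # very low robustness


print(robustness_score(0.95, 0.05))
print(robustness_score(0.92, 0.25))
```

Under this reading, a feature with a very high ICC but a larger SD (second call) is capped by the SD condition rather than scored on the ICC alone.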
Table 3. Summary of robust features (ICC > 0.9) considering different quantization levels. Bold values represent the setting guaranteeing the maximum number of robust features.

| Quantization Level (‘binCount’) | Robust Features Number | Robust Features Percentage (%) |
|---|---|---|
| 8 | 42 | 85.7 |
| 16 | 38 | 77.6 |
| **32** | **43** | **87.8** |
| 64 | 39 | 79.6 |
| 128 | 37 | 75.5 |
| 256 | 34 | 69.4 |
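The percentages in Table 3 follow directly from the robust-feature counts and the total of 49 extracted features reported for this work in Table 6. A short check, with variable names chosen here purely for illustration:

```python
TOTAL_FEATURES = 49  # number of PyRadiomics texture features in this study

# Robust-feature counts per 'binCount' value, copied from Table 3.
robust_counts = {8: 42, 16: 38, 32: 43, 64: 39, 128: 37, 256: 34}

# Percentage of robust features, rounded to one decimal place.
percentages = {b: round(100 * n / TOTAL_FEATURES, 1)
               for b, n in robust_counts.items()}

# Setting that yields the largest number of robust features.
best_bin_count = max(robust_counts, key=robust_counts.get)

for b, pct in percentages.items():
    print(f"binCount={b}: {robust_counts[b]}/{TOTAL_FEATURES} = {pct}%")
print("best setting:", best_bin_count)
```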
Table 4. Overall feature category robustness across the three segmentation raters.

| Category | Auto vs. Man1 | Auto vs. Man2 | Man1 vs. Man2 |
|---|---|---|---|
| GLCM | 0.48 | 0.45 | 0.8 |
| GLRLM | 0.6 | 0.63 | 0.8 |
| GLSZM | 0.42 | 0.36 | 0.53 |
| GLDM | 0.42 | 0.42 | 0.67 |
| NGTDM | 0.36 | 0.16 | 0.4 |
Table 5. Percentage of features with robustness index ≥ 3.

| Category | Auto vs. Man1 | Auto vs. Man2 | Man1 vs. Man2 |
|---|---|---|---|
| GLCM | 50% | 50% | 100% |
| GLRLM | 50% | 50% | 100% |
| GLSZM | 33.3% | 0% | 44.4% |
| GLDM | 33.3% | 33.3% | 100% |
| NGTDM | 0% | 0% | 0% |
Table 6. Overview of the state-of-the-art results, along with the considered settings, to assess the robustness of radiomic features.

| Related Work | Imaging, Disease (# Samples) | Dependence | Extraction Tool (# Features) | Main Findings |
|---|---|---|---|---|
| Shafiq-Ul-Hassan et al. [8] | CT, phantom (8 by 8 CT scanners) | voxel size; gray levels | in-house tool (213) | In total, 150 features are reproducible across voxel sizes |
| Escudero Sanchez et al. [9] | CT, liver cancer (43) | slice thickness | PyRadiomics (107) | In total, 75–90% of features are highly robust |
| Whitney et al. [10] | MRI (DCE-MRI), breast cancer (612) | magnetic field strength | PyRadiomics (38) | In total, 5 features are robust across field strength |
| Scalco et al. [11] | MR (T2w), prostate cancer (14) | image signal normalization | PyRadiomics (91) | In total, 60% of features have poor reproducibility |
| De Farias et al. [12] | CT, various lesion types (10,000 slices) + validation on NSCLC (17,938 slices) | super-resolution | PyRadiomics (75) | In total, 10 texture features have excellent robustness |
| Zwanenburg et al. [13] | CT, NSCLC (31) + HNSCC (19) | image perturbation | N.A. (4032) | In total, 2310 (57.3%) NSCLC features are robust; 582 (14.4%) HNSCC features are robust; 454 (11.3%) features are robust in both cohorts |
| Mottola et al. [14] | CT, RCC (98) + CK (93) | image resampling and perturbation | in-house tool (32) | In total, 94.6% and 87.7% of features achieve the best reproducibility in RCC and CK, respectively |
| Tixier et al. [15] | MRI (FLAIR, T1w), glioblastoma (98) | segmentation method | PyRadiomics (108) | IH and GLCM features are the most robust; GLSZM features have a mixed robustness |
| Granzier et al. [16] | MR (T1w), breast cancer (102) | inter-observer segmentation variability | RadiomiX (1328) + PyRadiomics (833) | In total, 41.6% (552/1328) and 32.8% (273/833) of all RadiomiX and PyRadiomics features, respectively, are robust |
| Le et al. [17] | CTA, culprit lesions in carotid arteries (41) | inter-observer segmentation variability; image configurations | PyRadiomics (93) | In total, 55.9% (52/93) of features have excellent robustness; 33.3% (31/93) of features have moderate robustness; 10.8% (10/93) of features have poor robustness |
| Proposed Work | MRI (DCE-MRI), breast cancer (111) | quantization level; inter-observer segmentation variability | PyRadiomics (49) | In total, 87.8% (43/49) of features were robust (ICC > 0.9) with binCount = 32; GLCM and GLRLM features have high robustness; GLDM features have moderate robustness |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Militello, C.; Rundo, L.; Dimarco, M.; Orlando, A.; D’Angelo, I.; Conti, V.; Bartolotta, T.V. Robustness Analysis of DCE-MRI-Derived Radiomic Features in Breast Masses: Assessing Quantization Levels and Segmentation Agreement. Appl. Sci. 2022, 12, 5512. https://0-doi-org.brum.beds.ac.uk/10.3390/app12115512