A Machine Learning Approach to Predict the Probability of Brain Metastasis in Renal Cell Carcinoma Patients

Kim, Hyung Min; Jeong, Chang Wook; Kwak, Cheol; Song, Cheryn; Kang, Minyong; Seo, Seong Il; Kim, Jung Kwon; Lee, Hakmin; Chung, Jinsoo; Hwang, Eu Chang; Park, Jae Young; Choi, In Young; Hong, Sung-Hoo

doi:10.3390/app12126174


by Hyung Min Kim ¹,², Chang Wook Jeong ³, Cheol Kwak ³, Cheryn Song ⁴, Minyong Kang ⁵, Seong Il Seo ⁵, Jung Kwon Kim ⁶, Hakmin Lee ⁶, Jinsoo Chung ⁷, Eu Chang Hwang ⁸, Jae Young Park ⁹, In Young Choi ¹,²,* and Sung-Hoo Hong ¹⁰,*

¹ Department of Medical Informatics, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
² Department of Biomedicine & Health Sciences, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
³ Department of Urology, Seoul National University College of Medicine, Seoul National University Hospital, Seoul 03080, Korea
⁴ Department of Urology, Asan Medical Center, University of Ulsan College of Medicine, Seoul 05505, Korea
⁵ Department of Urology, Samsung Medical Center, Sungkyunkwan University School of Medicine, Seoul 06351, Korea
⁶ Department of Urology, Seoul National University College of Medicine, Seoul National University Bundang Hospital, Seongnam 13620, Korea
⁷ Department of Urology, National Cancer Center, Goyang 10408, Korea
⁸ Department of Urology, Chonnam National University Medical School, Gwangju 61469, Korea
⁹ Department of Urology, Korea University College of Medicine, Korea University Ansan Hospital, Ansan 15355, Korea
¹⁰ Department of Urology, Seoul St. Mary’s Hospital, College of Medicine, The Catholic University of Korea, Seoul 06591, Korea
* Authors to whom correspondence should be addressed.

Appl. Sci. 2022, 12(12), 6174; https://doi.org/10.3390/app12126174

Submission received: 31 March 2022 / Revised: 1 June 2022 / Accepted: 15 June 2022 / Published: 17 June 2022

(This article belongs to the Special Issue Digital Therapeutics, Digital Twin and Mixed/Augmented Reality in Healthcare)


Abstract

Patients with brain metastasis (BM) have a better prognosis when BM is detected early. However, current guidelines recommend brain imaging only when there are central nervous system symptoms or abnormal laboratory findings. Consequently, metastases in asymptomatic patients are discovered later. An algorithm that uses clinical data and machine learning (ML) to predict the probability of BM is therefore needed. Data from 3153 patients with renal cell carcinoma (RCC) were collected from the 11-institution Korean Renal Cancer Study group (KRoCS) database. To predict BM, clinical information on 1282 of these patients was extracted and used to compare the performance of six ML algorithms. The final model was selected based on the area under the receiver operating characteristic curve (AUROC). After hyperparameter optimization for each model, the adaptive boosting (AdaBoost) model outperformed the others, with an AUROC of 0.716. We thus developed an algorithm that predicts the probability of BM in patients with RCC. With this predictive model, detection delays could be avoided by performing computed tomography scans on asymptomatic patients at high predicted risk.

Keywords:

brain metastasis; machine learning; prediction; renal cell carcinoma

1. Introduction

Approximately 17,000 new cases of brain metastasis (BM) are diagnosed each year, and the incidence has risen steadily over the past two decades [1]. BM is common in patients with advanced renal cell carcinoma (RCC) and is prone to intracranial hemorrhage [2,3]. Metastases are found in 30–40% of RCC patients, and BM occurs in more than 10% of RCC patients [4].

BM usually has a poor prognosis despite localized treatment with neurosurgery or radiation therapy [5,6]. Studies of overall survival (OS) in patients with BM showed that OS was worse when metastases were present at multiple sites rather than in the brain alone, and that smaller lesions were associated with better OS [7,8]. Early identification of BM is therefore important for OS, because at an early stage the lesions are limited in number and size.

However, the current European Association of Urology and American Urological Association guidelines recommend brain imaging for patients with RCC only when they have central nervous system symptoms or abnormal laboratory findings [9,10]. As a result, brain images are not obtained in asymptomatic patients, and their metastases are discovered only later. Because BM that is not detected and treated early carries a poor prognosis, methods for early prediction are required.

With advances in computing, machine learning (ML) has been increasingly applied to medical data; for example, breast [11] and prostate cancer [12] have been predicted with high accuracy. However, no study has used ML to predict BM in patients with RCC, and previous studies of the risk factors for BM have relied only on statistical techniques such as survival analysis or logistic regression [7,13,14]. Moreover, although those studies report the odds ratio or hazard ratio of each risk factor, they do not allow an individual patient's probability of BM to be calculated from the measured value of each variable. By contrast, an ML model predicts the probability of BM directly from a patient's variable values, so it can serve as a diagnostic aid.

In this study, we examine whether the risk factors reported in previous studies show similar characteristics in Korean RCC patients, and we propose an ML model that predicts the probability of BM from a combination of these factors. Because our algorithm was trained on data from a multicenter cohort collected at tertiary hospitals in Korea, it reflects the characteristics of Korean RCC patients with less single-institution bias. Moreover, to the best of our knowledge, this is the first study to use ML to predict BM in RCC patients. With this algorithm, patients with BM could be identified early and treated appropriately, improving their prognosis. Additionally, using it to select high-risk patients for brain imaging would address the problem of delayed diagnosis caused by omitted brain imaging in asymptomatic patients.

2. Materials and Methods

2.1. Study Population and Variable Selection

The Korean Renal Cancer Study group (KRoCS) metastatic RCC (mRCC) database was retrospectively collected from 11 domestic hospitals between February 1994 and March 2018. Several studies have been conducted using the well-established KRoCS database [15,16]. The KRoCS mRCC database contains patient demographics, clinical data, pathology data, systemic therapy types, etc., and it is possible to observe metastasis through continuous follow-up. The study protocol was approved by the Institutional Review Board of the Catholic University of Korea, which waived the requirement for informed consent (IRB No. KC16MGGT0157).

Metastases are classified as synchronous (occurring within three months) or metachronous (occurring after three months). The two types must be distinguished because they differ in prognosis [17]. However, because the number of patients with synchronous BM was insufficient, we included only patients with metachronous metastases in our analysis. The patient selection process is illustrated in Figure 1.
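The synchronous/metachronous distinction is a simple interval threshold. A minimal Python sketch follows. The paper does not state the reference date for the three-month window, so two choices here are assumptions: the interval is measured from the initial RCC diagnosis, and three months is approximated as 90 days. The function name `classify_metastasis` and the example records are also hypothetical.

```python
from datetime import date


def classify_metastasis(reference_date: date, metastasis_date: date,
                        window_days: int = 90) -> str:
    """Return 'synchronous' or 'metachronous' for one metastasis.

    Assumptions (not stated in the paper): the interval is counted from the
    initial RCC diagnosis, and 'three months' is approximated as 90 days.
    """
    interval = (metastasis_date - reference_date).days
    if interval < 0:
        raise ValueError("metastasis_date precedes reference_date")
    return "synchronous" if interval <= window_days else "metachronous"


# Hypothetical records: (patient id, diagnosis date, first metastasis date)
records = [
    ("P1", date(2015, 3, 1), date(2015, 4, 15)),  # 45 days -> synchronous
    ("P2", date(2015, 3, 1), date(2016, 1, 10)),  # 315 days -> metachronous
    ("P3", date(2015, 3, 1), date(2017, 6, 1)),   # > 90 days -> metachronous
]

# Keep only patients whose metastases were metachronous.
included = [pid for pid, dx, met in records
            if classify_metastasis(dx, met) == "metachronous"]
print(included)
```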

In this study, we selected eight variables based on previous studies [2,7,13,14] and consultations with clinicians. The final variables were age, sex, smoking, Eastern Cooperative Oncology Group performance status (ECOG PS), pathological tumor stage (pT), Fuhrman nuclear grade, Heng risk group, and lung metastasis.

2.2. Data Split and ML Model Development

The patient selection process yielded 1282 data instances for analysis, of which approximately 10.7% (137) had BM. This proportion is consistent with previous studies in which approximately 10% of RCC patients had BM [3]. We used 80% of the data instances to train the model and the remaining 20% to evaluate it [18]. The dataset was partitioned by stratified sampling, so the proportions of patients with and without BM were the same in the training and test sets.
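A stratified 80/20 split of this kind can be sketched with scikit-learn's `train_test_split`, whose `stratify` argument preserves the class proportions in both subsets. Only the class counts (137 with BM, 1145 without) come from the text. The random features are placeholders for the eight clinical variables, and the `random_state` value is arbitrary.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 1282 patients, 137 with BM (label 1), 8 random features.
rng = np.random.default_rng(0)
y = np.array([1] * 137 + [0] * 1145)
X = rng.normal(size=(y.size, 8))

# 80/20 split; stratify=y keeps the BM proportion equal in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

for name, labels in [("train", y_train), ("test", y_test)]:
    n_neg, n_pos = np.bincount(labels)
    print(f"{name}: {n_neg} without BM, {n_pos} with BM, "
          f"BM rate {n_pos / labels.size:.3f}")
```

With these class counts the split gives 915 and 110 training instances and 230 and 27 test instances. These match the training-set counts reported before oversampling.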

The collected data showed a marked class imbalance of approximately 89:11 (1145 vs. 137 patients), which must be addressed before the model is trained. When an ML model is trained on a highly imbalanced dataset, most learners ignore the minority class and are biased toward the majority class [19]. We therefore applied the synthetic minority oversampling technique (SMOTE), a widely used method, to the training set to balance the classes [20]. Rather than duplicating existing instances, SMOTE synthesizes new minority-class instances. We chose the SMOTE variant for nominal and continuous features (SMOTE-NC) because our data contained both categorical and continuous variables [21]. Before SMOTE-NC was applied, the training set contained 915 instances without BM and 110 with BM. Afterward, both groups contained 915 instances, so the two classes carried equal weight during training and the algorithm was not biased toward the majority class.
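SMOTE-NC follows two rules:

- Continuous features are interpolated between a minority sample and one of its nearest minority neighbors.
- Each categorical feature takes the most frequent value among those k neighbors. Each category mismatch adds the median standard deviation of the continuous features to the distance.

The simplified NumPy sketch below illustrates these rules. The paper does not say which implementation was used; imbalanced-learn's `SMOTENC` class is one library implementation. The function name `smote_nc`, k = 5, treating age as the only continuous variable, and the random example data are assumptions for illustration.

```python
import numpy as np


def smote_nc(X_cont, X_cat, y, minority=1, k=5, seed=0):
    """Simplified SMOTE-NC: oversample the minority class to match the majority.

    X_cont: (n, p) float array of continuous features (here, e.g., age)
    X_cat:  (n, q) int array of categorical codes (e.g., sex, ECOG PS, pT)
    """
    rng = np.random.default_rng(seed)
    idx_min = np.flatnonzero(y == minority)
    n_new = int(np.sum(y != minority)) - idx_min.size
    if n_new <= 0 or idx_min.size < 2:
        return X_cont, X_cat, y
    C, Q = X_cont[idx_min], X_cat[idx_min]

    # Each categorical mismatch adds the squared median of the continuous
    # features' standard deviations to the squared Euclidean distance.
    med = np.median(C.std(axis=0))
    d2 = ((C[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
    d2 = d2 + med ** 2 * (Q[:, None, :] != Q[None, :, :]).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)            # a sample is not its own neighbor
    k = min(k, idx_min.size - 1)
    nn = np.argsort(d2, axis=1)[:, :k]      # k nearest minority neighbors

    new_cont = np.empty((n_new, C.shape[1]))
    new_cat = np.empty((n_new, Q.shape[1]), dtype=Q.dtype)
    for j in range(n_new):
        i = rng.integers(idx_min.size)      # random minority sample
        nb = nn[i, rng.integers(k)]         # one of its k neighbors
        # Continuous features: random point on the segment between the two.
        new_cont[j] = C[i] + rng.random() * (C[nb] - C[i])
        # Categorical features: most frequent value among the k neighbors.
        for c in range(Q.shape[1]):
            vals, counts = np.unique(Q[nn[i], c], return_counts=True)
            new_cat[j, c] = vals[np.argmax(counts)]

    return (np.vstack([X_cont, new_cont]),
            np.vstack([X_cat, new_cat]),
            np.concatenate([y, np.full(n_new, minority, dtype=y.dtype)]))


# Hypothetical training set with the reported class sizes (915 vs. 110).
rng = np.random.default_rng(1)
y = np.array([0] * 915 + [1] * 110)
age = rng.normal(55, 11, size=(y.size, 1))
cats = rng.integers(0, 3, size=(y.size, 7))
Xc, Xq, y_res = smote_nc(age, cats, y)
print(np.bincount(y_res))
```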

The ML models used were support vector machine (SVM) [22], logistic regression [23], k-nearest neighbors (KNN) [24], random forest [25], extreme gradient boosting (XGBoost) [26], and adaptive boosting (AdaBoost) [27]. Each model was evaluated on the test set in terms of accuracy, sensitivity, specificity, and the area under the receiver operating characteristic curve (AUROC), and the final model was selected by comparing these results across models. The overall ML development process is shown in Figure 2. The analysis was implemented in Python (version 3.7.9) using the open-source scikit-learn library.
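The comparison of the six algorithms can be outlined with scikit-learn. This sketch runs on synthetic data from `make_classification` and is not the authors' code. The hyperparameters are the optimized values marked in Table 2. Several choices are assumptions:

- feature standardization for the SVM, logistic regression, and KNN models;
- the use of predicted probabilities for AUROC;
- the omission of SMOTE-NC from this sketch.

XGBoost comes from the separate `xgboost` package and is added only if that package is installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the 8-variable cohort (~11% positives); not KRoCS data.
X, y = make_classification(n_samples=1282, n_features=8, n_informative=4,
                           weights=[0.89], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)

# Hyperparameters follow the optimized values marked in Table 2.
models = {
    "SVM (linear)": make_pipeline(
        StandardScaler(), SVC(kernel="linear", probability=True, random_state=0)),
    "Logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(penalty="l1", C=0.1, solver="liblinear")),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "Random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}
try:  # XGBoost is a separate package; include it only if installed.
    from xgboost import XGBClassifier
    models["XGBoost"] = XGBClassifier(n_estimators=50)
except ImportError:
    pass

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    results[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

for name, auc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} AUROC = {auc:.3f}")
```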

3. Results

3.1. Characteristics of Study Participants

We divided the RCC patients into groups with and without BM and compared their characteristics using the chi-square test for categorical variables and the t-test for continuous variables. Age, sex, ECOG PS, Heng risk group, and lung metastasis differed significantly between the groups (p < 0.05). The results are presented in Table 1.
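Two of these group comparisons can be reproduced from the summary values in Table 1 with SciPy. The sketch makes three assumptions, none of which the paper states:

- the ± values for age are standard deviations;
- a pooled-variance Student's t-test was used;
- the 2 × 2 chi-square test for sex used SciPy's default Yates continuity correction.

```python
from scipy.stats import chi2_contingency, ttest_ind_from_stats

# Sex (Table 1): rows = without BM / with BM, columns = male / female.
sex = [[902, 243],
       [93, 44]]
# SciPy applies Yates' continuity correction to 2x2 tables by default.
chi2, p_sex, dof, expected = chi2_contingency(sex)
print(f"Sex: chi2 = {chi2:.2f}, df = {dof}, p = {p_sex:.3f}")

# Age (Table 1): mean, SD, and group size; pooled-variance Student's t-test.
t_stat, p_age = ttest_ind_from_stats(56.8, 11.2, 1145, 53.0, 10.6, 137,
                                     equal_var=True)
print(f"Age: t = {t_stat:.2f}, p = {p_age:.1e}")
```

Under these assumptions the sex comparison gives p ≈ 0.005 and the age comparison gives p < 0.001. Both match the p-values reported in Table 1.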

Patients with BM had a mean age of 53.0 years; 32.1% were women and 11.7% were current smokers, compared with 56.8 years, 21.2%, and 9.4%, respectively, for patients without BM. The group with BM also had higher proportions of patients with ECOG PS ≥ 1 (38.7%), pT3–4 (56.9%), Fuhrman nuclear grade 4 (39.4%), intermediate (54.7%) and poor (8.8%) Heng risk, and lung metastasis (82.5%) than the group without BM.

3.2. Model Performance

Table 2 presents the results of the six ML models trained on the eight variables. The models were compared in terms of accuracy, specificity, sensitivity, and AUROC [28]. Because of the high class imbalance in the data, the final model was selected based on AUROC, which accounts for both sensitivity and specificity. A grid search was performed to determine the optimal hyperparameter values for each ML model [29]. For each model, Table 2 lists the hyperparameter values examined, marks the optimized values, and reports the corresponding performance.
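The four metrics can be computed from a confusion matrix as in the sketch below. The test-set composition (27 patients with BM and 230 without) is inferred from the stratified 80/20 split rather than reported. The counts of 20 true positives and 159 true negatives are hypothetical, chosen because they reproduce the AdaBoost rates in Table 2. When `roc_auc_score` receives hard 0/1 predictions instead of probabilities, the ROC curve has a single operating point, and the AUROC equals (sensitivity + specificity)/2. The reported AdaBoost values (0.741 and 0.691, mean 0.716) are numerically consistent with this, although the paper does not state which input was used.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score


def evaluate(y_true, y_score, threshold=0.5):
    """Accuracy, sensitivity, specificity, and AUROC for a binary classifier."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    y_pred = (y_score >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "AUROC": roc_auc_score(y_true, y_score),
        "Sensitivity": tp / (tp + fn),   # true-positive rate
        "Specificity": tn / (tn + fp),   # true-negative rate
        "Accuracy": (tp + tn) / y_true.size,
    }


# Hypothetical test set: 27 with BM, 230 without BM.
# 20 true positives and 159 true negatives reproduce the AdaBoost row of Table 2.
y_true = np.array([1] * 27 + [0] * 230)
y_hard = np.array([1] * 20 + [0] * 7 + [0] * 159 + [1] * 71)
for name, value in evaluate(y_true, y_hard).items():
    print(f"{name}: {value:.3f}")
```

Run as written, it prints an AUROC of 0.716, a sensitivity of 0.741, a specificity of 0.691, and an accuracy of 0.696.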

Figure 3 shows the receiver operating characteristic (ROC) curves of the six ML models; a larger area under the curve indicates better discrimination. AdaBoost achieved the highest AUROC (0.716), with a sensitivity of 0.741, a specificity of 0.691, and an accuracy of 0.696. The number of trees in AdaBoost was tuned by grid search. With 50 trees, the model had the best AUROC, specificity, and accuracy, and its sensitivity equaled the best value (Table 2). The second-best model was logistic regression, which achieved an AUROC of 0.698 with an L1 penalty and C = 0.1 (determined by grid search). Because AdaBoost achieved the highest AUROC of the six models, we used it as the final model. We also performed cross-validation to verify how well the models generalize. Both three-fold and five-fold cross-validation yielded slightly lower AUROC values for AdaBoost than the test-set evaluation, although AdaBoost remained the best-performing algorithm. Its five-fold mean AUROC of 0.696 was close to the test-set value of 0.716, suggesting that the final model was not overfitted to the test dataset. Table 3 shows the mean and standard deviation of the cross-validation AUROC values for each algorithm.
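Hyperparameter tuning and the cross-validation check can be sketched with scikit-learn's `GridSearchCV` and `cross_val_score`. The grids cover the values listed in Table 2. The paper does not describe these details, so the following are assumptions:

- tuning by cross-validated AUROC on training data;
- stratified shuffled folds;
- the sample standard deviation;
- the synthetic data.

In a full pipeline, SMOTE-NC would need to be applied inside each training fold only (for example, with imbalanced-learn's `Pipeline`) so that synthetic samples never leak into the validation folds.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Synthetic stand-in data (~11% positives); not the KRoCS cohort.
X, y = make_classification(n_samples=1025, n_features=8, n_informative=4,
                           weights=[0.89], random_state=1)
cv5 = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Grids cover the hyperparameter values listed in Table 2.
searches = {
    "AdaBoost": GridSearchCV(
        AdaBoostClassifier(random_state=0),
        {"n_estimators": [10, 50, 100]}, scoring="roc_auc", cv=cv5),
    "Logistic regression": GridSearchCV(
        LogisticRegression(solver="liblinear"),
        {"penalty": ["l1", "l2"], "C": [0.01, 0.1, 0.5, 1, 100]},
        scoring="roc_auc", cv=cv5),
}
for name, search in searches.items():
    search.fit(X, y)
    print(name, search.best_params_, f"CV AUROC = {search.best_score_:.3f}")

# 3- and 5-fold cross-validated AUROC, reported as mean (sample SD) as in Table 3.
cv_auc = {}
for n_splits in (3, 5):
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    cv_auc[n_splits] = cross_val_score(
        AdaBoostClassifier(n_estimators=50, random_state=0),
        X, y, scoring="roc_auc", cv=folds)
    scores = cv_auc[n_splits]
    print(f"AdaBoost {n_splits}-fold AUROC: "
          f"{scores.mean():.3f} ({scores.std(ddof=1):.3f})")
```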

4. Discussion

To our knowledge, this is the first study to use ML to predict BM in RCC patients. Such studies have been lacking because the prevalence of BM in RCC patients is low (approximately 10%) [4], making it difficult for a single institution to collect enough data for ML. To overcome this data shortage, we collected RCC data from multiple institutions in Korea and used SMOTE-NC to address the class imbalance. We developed an algorithm that uses ML to predict the probability of BM in RCC patients and compared the performance of six representative ML techniques. After hyperparameter tuning, AdaBoost performed best, with an AUROC of 0.716.

Tree-based models, such as AdaBoost with decision-tree base learners, can use the Gini index to identify the variables that have a significant effect on prediction [30]. In the AdaBoost model, the most influential variables for predicting BM were, in decreasing order, age, smoking, lung metastasis, pT, Heng risk group, Fuhrman nuclear grade, ECOG PS, and sex (Figure 4). A previous study found that the risk of BM in RCC patients was higher in those younger than 70 years than in those older than 70 [13]. In our data, the mean age of the BM group was lower than that of the group without BM. Moreover, 152 patients (13.3%) in the group without BM were older than 70 years, compared with seven (5.1%) in the group with BM, an age distribution similar to that of previous studies. For smoking and lung metastasis, the second and third most influential variables, a previous study found that tobacco abuse and coexisting lung metastasis were associated with a poor prognosis [7]. In our data, the proportions of current smokers and of patients with coexisting lung metastasis were both higher in the group with BM than in the group without BM (Table 1). A previous study [13] identified pT and Fuhrman nuclear grade as important indicators for developing BM in RCC patients: the risk increased significantly as pT progressed from stage 1 to stages 3 and 4, and as the grade increased from 1 to 4. In our data, 48.1% of patients without BM had pT3–4 tumors and 31.0% had Fuhrman nuclear grade 4, compared with 56.9% and 39.4%, respectively, of patients with BM. Previous studies also showed a poorer prognosis in higher-risk groups defined by the Heng risk model, also known as the International Metastatic RCC Database Consortium (IMDC) risk score [2]. Our data showed the same pattern. The favorable-risk category accounted for 46.2% of patients without BM but only 36.5% of patients with BM, whereas the poor-risk category accounted for 5.2% and 8.8%, respectively. Previous research on ECOG PS found that the risk of BM is elevated when ECOG PS is ≥1, and that patients with a value greater than 1 are at higher risk than those with a value of 0 [14]. In our data, ECOG PS 0 accounted for 79.5% of patients without BM but only 61.3% of patients with BM, and ECOG PS > 1 accounted for 3.2% and 6.6%, respectively. Previous studies related to sex found that women had a higher risk of BM than men. In our data as well, women accounted for 21.2% of patients without BM but 32.1% of patients with BM.
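scikit-learn exposes impurity-based importances for AdaBoost through the `feature_importances_` attribute. It averages the Gini importances of the boosted decision stumps, weighted by each stump's estimator weight. The sketch below uses synthetic data labeled with the study's eight variable names, so the resulting ranking is meaningless and will not match Figure 4. It only shows how such a ranking is obtained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

feature_names = ["age", "sex", "smoking", "ECOG PS", "pT",
                 "Fuhrman grade", "Heng risk group", "lung metastasis"]

# Synthetic stand-in data with the study's eight variable names (values are not real).
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.89], random_state=2)
model = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, y)

# Gini (impurity-based) importances, averaged over the boosted stumps
# and weighted by each stump's estimator weight.
importances = model.feature_importances_
for idx in np.argsort(importances)[::-1]:
    print(f"{feature_names[idx]:16s} {importances[idx]:.3f}")
```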

Our data reflected all the distributional characteristics of the risk factors demonstrated in previous studies. In addition, as a multicenter study, it reduces the bias that can arise when data from only one institution are used for validation. However, our study had several limitations. Through a domestic multicenter study, we attempted to include as many RCC patients, and therefore as many patients with BM, as possible. Even with data from 11 institutions, however, BM occurred in only about 10% of RCC patients, yielding 137 patients with BM. ML performance generally improves as the quantity of training data increases [31]. With the limited dataset we could assemble, the model reached an AUROC of approximately 0.72. In a rough classification of performance, AUROC can be interpreted as follows: 0.90–1.00 = excellent, 0.80–0.90 = good, 0.70–0.80 = fair, 0.60–0.70 = poor, and 0.50–0.60 = failure [32]. Our model therefore performed fairly with the current data. In future work, we will collect more data and aim to improve the model to the good or excellent range. Additionally, our study reflects the characteristics of Korean patients through a domestic multicenter design, but for the same reason its performance cannot be guaranteed in other countries. To apply the model to patients abroad, it must be updated and validated with samples from various countries.
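The qualitative AUROC bands cited above can be encoded as a small lookup. The bands in the text share endpoints (for example, 0.90 is both the top of "good" and the bottom of "excellent"), so two choices here are assumptions: treating each lower bound as inclusive, and the label for values below 0.50, which the scheme does not cover. The function name `grade_auroc` is hypothetical.

```python
def grade_auroc(auc: float) -> str:
    """Map an AUROC value to the rough qualitative bands cited in the text.

    Assumption: each band's lower bound is inclusive (0.90 -> 'excellent').
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUROC must lie in [0, 1]")
    bands = [(0.90, "excellent"), (0.80, "good"), (0.70, "fair"),
             (0.60, "poor"), (0.50, "failure")]
    for lower, label in bands:
        if auc >= lower:
            return label
    return "below 0.50 (not covered)"


print(grade_auroc(0.716))
```

For example, `grade_auroc(0.716)` returns "fair" for the AdaBoost model, and `grade_auroc(0.698)` returns "poor" for logistic regression.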

5. Conclusions

In this study, we developed a predictive model using an ML algorithm for the early screening of BM in RCC patients. Among the ML algorithms compared, the AdaBoost model had the highest predictive power, with an AUROC of 0.716. With this model, detection delays could be avoided by performing computed tomography scans on asymptomatic patients at high predicted risk. Additionally, because metastases detected early tend to be smaller and more limited in extent, the model could help improve patient prognosis through timely, appropriate treatment. As more information on patients with BM is collected through continuous follow-up, we will perform additional experiments to improve the model's performance.

Author Contributions

Conceptualization, S.-H.H. and I.Y.C.; methodology, H.M.K. and I.Y.C.; software, H.M.K.; validation, H.M.K., C.W.J., C.K., E.C.H. and S.-H.H.; formal analysis, H.M.K.; investigation, H.M.K., C.S., M.K., S.I.S. and I.Y.C.; resources, S.-H.H. and I.Y.C.; data curation, S.-H.H., J.K.K., H.L., J.Y.P. and J.C.; writing—original draft preparation, H.M.K.; writing—review and editing, H.M.K.; visualization, H.M.K.; supervision, I.Y.C. and S.-H.H.; project administration, S.-H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Korea Medical Device Development Fund of the Korean government (Ministry of Science and ICT, Ministry of Trade, Industry and Energy, Ministry of Health & Welfare, Republic of Korea, Ministry of Food and Drug Safety), grant number KMDF_PR_20200901_0096. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2020R1A2C2012284).

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of the Catholic University of Korea (IRB No. KC16MGGT0157).

Informed Consent Statement

Informed consent was waived by the Institutional Review Board of the Catholic University of Korea because the study was retrospective and personal information in the data was de-identified.

Data Availability Statement

Data sharing was not applicable to this study.

Acknowledgments

We thank the Korean Renal Cancer Study Group (KRoCS) for assisting us in the data analysis.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

1. Tabouret, E.; Chinot, O.; Metellus, P.; Tallet, A.; Viens, P.; Goncalves, A. Recent trends in epidemiology of brain metastases: An overview. Anticancer Res. 2012, 32, 4655–4662.
2. Suarez-Sarmiento, A.; Nguyen, K.A.; Syed, J.S.; Nolte, A.; Ghabili, K.; Cheng, M.; Liu, S.; Chiang, V.; Kluger, H.; Hurwitz, M.; et al. Brain metastasis from renal-cell carcinoma: An institutional study. Clin. Genitourin. Cancer 2019, 17, e1163–e1170.
3. Bitoh, S.; Hasegawa, H.; Ohtsuki, H.; Obashi, J.; Fujiwara, M.; Sakurai, M. Cerebral neoplasms initially presenting with massive intracerebral hemorrhage. Surg. Neurol. 1984, 22, 57–62.
4. Négrier, S.; Moriceau, G.; Attignon, V.; Haddad, V.; Pissaloux, D.; Guerin, N.; Carrie, C. Activity of Cabozantinib in radioresistant brain metastases from renal cell carcinoma: Two case reports. J. Med. Case Rep. 2018, 12, 351.
5. Gore, M.E.; Hariharan, S.; Porta, C.; Bracarda, S.; Hawkins, R.; Bjarnason, G.A.; Oudard, S.; Lee, S.H.; Carteni, G.; Nieto, A.; et al. Sunitinib in metastatic renal cell carcinoma patients with brain metastases. Cancer 2011, 117, 501–509.
6. Chandrasekar, T.; Klaassen, Z.; Goldberg, H.; Kulkarni, G.S.; Hamilton, R.J.; Fleshner, N.E. Metastatic renal cell carcinoma: Patterns and predictors of metastases—A contemporary population-based series. Urol. Oncol. 2017, 35, 661.e7–661.e14.
7. Hanzly, M.; Abbotoy, D.; Creighton, T.; Diorio, G.; Mehedint, D.; Murekeyisoni, C.; Attwood, K.; Kauffman, E.; Fabiano, A.J.; Schwaab, T. Early identification of asymptomatic brain metastases from renal cell carcinoma. Clin. Exp. Metastasis 2015, 32, 783–788.
8. Andrews, D.W.; Scott, C.B.; Sperduto, P.W.; Flanders, A.E.; Gaspar, L.E.; Schell, M.C.; Werner-Wasik, M.; Demas, W.; Ryu, J.; Bahary, J.P.; et al. Whole brain radiation therapy with or without stereotactic radiosurgery boost for patients with one to three brain metastases: Phase III results of the RTOG 9508 randomised trial. Lancet 2004, 363, 1665–1672.
9. Ljungberg, B.; Cowan, N.C.; Hanbury, D.C.; Hora, M.; Kuczyk, M.A.; Merseburger, A.S.; Patard, J.J.; Mulders, P.F.A.; Sinescu, I.C.; European Association of Urology Guideline Group. EAU guidelines on renal cell carcinoma: The 2010 update. Eur. Urol. 2010, 58, 398–406.
10. Donat, S.M.; Diaz, M.; Bishoff, J.T.; Coleman, J.A.; Dahm, P.; Derweesh, I.H.; Herrell, S.D.; Hilton, S.; Jonasch, E.; Lin, D.W.; et al. Follow-up for clinically localized renal neoplasms: AUA guideline. J. Urol. 2013, 190, 407–416.
11. Islam, M.M.; Haque, M.R.; Iqbal, H.; Hasan, M.M.; Hasan, M.; Kabir, M.N. Breast cancer prediction: A comparative study using machine learning techniques. SN Comput. Sci. 2020, 1, 1–14.
12. Barlow, H.; Mao, S.; Khushi, M. Predicting high-risk prostate cancer using machine learning methods. Data 2019, 4, 129.
13. Ke, Z.B.; Chen, S.H.; Chen, Y.H.; Wu, Y.P.; Lin, F.; Xue, X.Y.; Zheng, Q.S.; Xu, N.; Wei, Y. Risk factors for brain metastases in patients with renal cell carcinoma. BioMed Res. Int. 2020, 2020, 6836234.
14. Vogl, U.M.; Bojic, M.; Lamm, W.; Frischer, J.M.; Pichelmayer, O.; Kramer, G.; Haitel, A.; Kitz, K.; Harmankaya, K.; Zielinski, C.C.; et al. Extracerebral metastases determine the outcome of patients with brain metastases from renal cell carcinoma. BMC Cancer 2010, 10, 480.
15. Lee, C.H.; Chung, J.; Kwak, C.; Jeong, C.W.; Seo, S.I.; Kang, M.; Hong, S.H.; Song, C.; Park, J.Y.; Hwang, E.C.; et al. Targeted therapy response in early versus late recurrence of renal cell carcinoma after surgical treatment: A propensity score-matched study using the Korean Renal Cancer Study Group Database. Int. J. Urol. 2021, 28, 417–423.
16. Shin, T.J.; Song, C.; Jeong, C.W.; Kwak, C.; Seo, S.; Kang, M.; Chung, J.; Hong, S.H.; Hwang, E.C.; Park, J.Y.; et al. Metastatic renal cell carcinoma to the pancreas: Clinical features and treatment outcome. J. Surg. Oncol. 2021, 123, 204–213.
17. Ruste, V.; Sunyach, M.P.; Tanguy, R.; Jouanneau, E.; Schiffler, C.; Carbonnaux, M.; Moriceau, G.; Neidhardt, E.M.; Boyle, H.; Robin, S.; et al. Synchronous brain metastases as a poor prognosis factor in clear cell renal carcinoma: A strong argument for systematic brain screening. J. Neurooncol. 2021, 153, 133–141.
18. Vyshnav, M.T.; Sowmya, V.; Gopalakrishnan, E.A.; Sajith Variyar, V.V.; Menon, V.K.; Soman, K.P. Deep learning based approach for multiple myeloma detection. In Proceedings of the 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1–3 July 2020.
19. Johnson, J.M.; Khoshgoftaar, T.M. Survey on deep learning with class imbalance. J. Big Data 2019, 6, 1–54.
20. Chen, J.; Huang, H.; Cohn, A.G.; Zhang, D.; Zhou, M. Machine learning-based classification of rock discontinuity trace: SMOTE oversampling integrated with GBT ensemble learning. Int. J. Min. Sci. Technol. 2022, 32, 309–322.
21. Mukherjee, M.; Khushi, M. SMOTE-ENC: A novel SMOTE-based method to generate synthetic data for nominal and continuous features. Appl. Syst. Innov. 2021, 4, 18.
22. Utkin, L.V. An imprecise extension of SVM-based machine learning models. Neurocomputing 2019, 331, 18–32.
23. Rymarczyk, T.; Kozłowski, E.; Kłosowski, G.; Niderla, K. Logistic regression for machine learning in process tomography. Sensors 2019, 19, 3400.
24. Itoo, F.; Meenakshi, S.; Singh, S. Comparison and analysis of logistic regression, naïve Bayes and KNN machine learning algorithms for credit card fraud detection. Int. J. Inf. Technol. 2021, 13, 1503–1511.
25. Reis, I.; Baron, D.; Shahaf, S. Probabilistic random forest: A machine learning algorithm for noisy data sets. Astron. J. 2018, 157, 16.
26. Jia, Y.; Jin, S.; Savi, P.; Gao, Y.; Tang, J.; Chen, Y.; Li, W. GNSS-R soil moisture retrieval based on a XGboost machine learning aided method: Performance and validation. Remote Sens. 2019, 11, 1655.
27. Chen, S.; Shen, B.; Wang, X.; Yoo, S.J. A strong machine learning classifier and decision stumps based hybrid AdaBoost classification algorithm for cognitive radios. Sensors 2019, 19, 5077.
28. Bradley, A.P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 1997, 30, 1145–1159.
29. Chen, H.; Liu, Z.; Cai, K.; Xu, L.; Chen, A. Grid search parametric optimization for FT-NIR quantitative analysis of solid soluble content in strawberry samples. Vib. Spectrosc. 2018, 94, 7–15.
30. Sevinç, E. An empowered AdaBoost algorithm implementation: A COVID-19 dataset study. Comput. Ind. Eng. 2022, 165, 107912.
31. Vabalas, A.; Gowen, E.; Poliakoff, E.; Casson, A.J. Machine learning algorithm validation with a limited sample size. PLoS ONE 2019, 14, e0224365.
32. Safari, S.; Baratloo, A.; Elfil, M.; Negida, A. Part 5: Receiver operating curve and area under the curve. Emergency 2016, 4, 111–113.

Figure 1. Patient selection flowchart.

Figure 2. Machine learning (ML) modeling process.

Figure 3. Receiver operating characteristic (ROC) curves of the ML models.

Figure 4. Variable importance in the AdaBoost model.

Table 1. Comparison of characteristics between groups with and without brain metastasis (BM).

Variable	Without BM Group (n = 1145)	With BM Group (n = 137)	p-Value
Age	56.8 ± 11.2	53.0 ± 10.6	<0.001
Sex			0.005
Male	902 (78.8%)	93 (67.9%)
Female	243 (21.2%)	44 (32.1%)
Smoking			0.392
Non-smoker	785 (68.6%)	97 (70.8%)
Ex-smoker	252 (22.0%)	24 (17.5%)
Current-smoker	108 (9.4%)	16 (11.7%)
ECOG PS			<0.001
0	910 (79.5%)	84 (61.3%)
1	198 (17.3%)	44 (32.1%)
>1	37 (3.2%)	9 (6.6%)
Pathological tumor stage			0.063
1–2	594 (51.9%)	59 (43.1%)
3–4	551 (48.1%)	78 (56.9%)
Fuhrman nuclear grade			0.156
1	21 (1.8%)	4 (2.9%)
2	277 (24.2%)	30 (21.9%)
3	492 (43.0%)	49 (35.8%)
4	355 (31.0%)	54 (39.4%)
Heng risk group			0.042
Favorable	526 (46.2%)	50 (36.5%)
Intermediate	557 (48.6%)	75 (54.7%)
Poor	59 (5.2%)	12 (8.8%)
Lung metastasis			0.001
No	370 (32.3%)	24 (17.5%)
Yes	775 (67.7%)	113 (82.5%)

Table 2. Performance of the machine learning (ML) algorithms.

Model (Hyperparameters)	AUROC	Sensitivity	Specificity	Accuracy
Kernel SVM (kernel)
(linear) ¹	0.652	0.704	0.600	0.611
(rbf)	0.557	0.370	0.743	0.704
Logistic regression (penalty, C)
(L1, 0.1) ¹	0.698	0.778	0.617	0.624
(L1, 1)	0.663	0.704	0.622	0.630
(L1, 100)	0.658	0.704	0.613	0.623
(L2, 0.1)	0.654	0.704	0.604	0.615
(L2, 0.5)	0.648	0.667	0.630	0.634
(L2, 0.01)	0.637	0.704	0.570	0.584
KNN (neighbors)
(3)	0.527	0.260	0.800	0.739
(5) ¹	0.551	0.333	0.770	0.724
(10)	0.548	0.370	0.726	0.689
Random forest (number of trees)
(10)	0.538	0.259	0.817	0.759
(50) ¹	0.604	0.407	0.800	0.759
(100)	0.585	0.370	0.800	0.755
XGBoost (number of trees)
(10)	0.519	0.519	0.630	0.619
(50) ¹	0.585	0.444	0.726	0.696
(100)	0.512	0.259	0.765	0.712
AdaBoost (number of trees)
(10)	0.707	0.741	0.673	0.681
(50) ¹	0.716	0.741	0.691	0.696
(100)	0.684	0.704	0.665	0.669

¹ Indicates optimized hyperparameter values after parameter tuning.

Table 3. Cross-validation AUROC values of ML algorithms.

Model	3-Fold Mean (Standard Deviation)	5-Fold Mean (Standard Deviation)
Kernel SVM	0.533 (0.036)	0.504 (0.044)
Logistic regression	0.640 (0.057)	0.654 (0.076)
KNN	0.525 (0.044)	0.546 (0.024)
Random forest	0.597 (0.055)	0.599 (0.060)
XGBoost	0.636 (0.045)	0.647 (0.077)
AdaBoost	0.678 (0.065)	0.696 (0.087)


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
