Article

Predicting Type 2 Diabetes Using Logistic Regression and Machine Learning Approaches

by Ram D. Joshi 1,† and Chandra K. Dhakal 2,*,†
1 Department of Economics, Texas Tech University, Lubbock, TX 79409, USA
2 Department of Agricultural and Applied Economics, University of Georgia, Athens, GA 30602, USA
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Academic Editor: Pedro Femia-Marzo
Int. J. Environ. Res. Public Health 2021, 18(14), 7346; https://0-doi-org.brum.beds.ac.uk/10.3390/ijerph18147346
Received: 27 May 2021 / Revised: 2 July 2021 / Accepted: 5 July 2021 / Published: 9 July 2021
(This article belongs to the Special Issue Statistical Methods with Applications in Human Health and Disease)
Abstract: Diabetes mellitus is one of the most common human diseases worldwide and may cause several health-related complications. It is responsible for considerable morbidity, mortality, and economic loss. Timely diagnosis and prediction of this disease could give patients an opportunity to adopt appropriate preventive and treatment strategies. To improve the understanding of risk factors, we predict type 2 diabetes for Pima Indian women using a logistic regression model and a decision tree, a machine learning algorithm. Our analysis finds five main predictors of type 2 diabetes: glucose, pregnancy, body mass index (BMI), diabetes pedigree function, and age. We further explore a classification tree to complement and validate our analysis. The six-fold classification tree indicates that glucose, BMI, and age are important factors, while the ten-node tree identifies glucose, BMI, pregnancy, diabetes pedigree function, and age as the significant predictors. Our preferred specification yields a prediction accuracy of 78.26% and a cross-validation error rate of 21.74%. We argue that our model can be applied to make a reasonable prediction of type 2 diabetes, and could potentially be used to complement existing preventive measures to curb the incidence of diabetes and reduce associated costs.
Keywords: decision tree; diabetes risk factors; machine learning; prediction accuracy
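
The abstract reports a prediction accuracy and a cross-validation error rate that sum to 100%, which is what one would expect if the error rate is simply one minus the cross-validated accuracy. The following is a minimal sketch of that relationship, not the authors' code: it fits a logistic regression by gradient descent and scores it with k-fold cross-validation. The synthetic data, the feature names `glucose` and `bmi`, the coefficients, the learning rate, and the choice of `k=10` are all assumptions made for illustration, and the real Pima Indian data are not used here.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit weights (bias first) by batch gradient descent on the log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = p - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w, xi):
    # Classify as positive when the predicted probability is at least 0.5.
    p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
    return 1 if p >= 0.5 else 0


def cross_val_accuracy(X, y, k=10, seed=0):
    """Share of held-out observations classified correctly across k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(w, X[i]) == y[i] for i in fold)
    return correct / len(X)


def make_synthetic(n=200, seed=1):
    # Hypothetical standardized predictors; NOT the Pima Indian data.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        glucose, bmi = rng.gauss(0, 1), rng.gauss(0, 1)
        p = sigmoid(1.5 * glucose + 0.8 * bmi - 0.5)
        X.append([glucose, bmi])
        y.append(1 if rng.random() < p else 0)
    return X, y


X, y = make_synthetic()
accuracy = cross_val_accuracy(X, y, k=10)
error_rate = 1.0 - accuracy  # same relation as 78.26% and 21.74%
print(f"accuracy={accuracy:.4f}  error_rate={error_rate:.4f}")
```

The numbers printed by this sketch depend entirely on the synthetic data and should not be compared with the figures reported in the abstract.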

MDPI and ACS Style

Joshi, R.D.; Dhakal, C.K. Predicting Type 2 Diabetes Using Logistic Regression and Machine Learning Approaches. Int. J. Environ. Res. Public Health 2021, 18, 7346. https://0-doi-org.brum.beds.ac.uk/10.3390/ijerph18147346

AMA Style

Joshi RD, Dhakal CK. Predicting Type 2 Diabetes Using Logistic Regression and Machine Learning Approaches. International Journal of Environmental Research and Public Health. 2021; 18(14):7346. https://0-doi-org.brum.beds.ac.uk/10.3390/ijerph18147346

Chicago/Turabian Style

Joshi, Ram D., and Chandra K. Dhakal. 2021. "Predicting Type 2 Diabetes Using Logistic Regression and Machine Learning Approaches" International Journal of Environmental Research and Public Health 18, no. 14: 7346. https://0-doi-org.brum.beds.ac.uk/10.3390/ijerph18147346
