Machine Learning Based Prediction of Diabetes

Date

2024

Publisher

University of Nottingham

Abstract

Diabetes mellitus, commonly known as diabetes, is a chronic metabolic disease. It belongs to a broader category of chronic conditions that includes cardiovascular disease, acute kidney infection, eye problems, and foot ulcers. Currently, 537 million people worldwide are living with diabetes, a figure expected to rise to 643 million by 2030. Given the limited availability of medical professionals, there is a growing need for automated tools that assist decision-making for various diseases using prevalence datasets. This dissertation implements deterministic models (decision trees, random forests, support vector machines, and neural networks) and probabilistic models (logistic regression, Naïve Bayes, Gaussian Naïve Bayes, and nonparametric Naïve Bayes) for binary diabetes classification. Seven input features are used to build the classifiers: age, gender, BMI, blood glucose level, HbA1c level, hypertension (yes/no), and heart disease (yes/no). The response variable is binary (diabetes or no diabetes). The dataset comprises 100,000 patients and eight variables (the seven inputs plus the response), with a significant class imbalance: 91.5% of patients do not have diabetes. On the imbalanced data, the decision tree outperformed all other models, achieving the highest balanced accuracy of 98.48%, with a sensitivity of 100% and a specificity of 96.95%. On the balanced data, the random forest outperformed every model except logistic regression, with a balanced accuracy of 92.42%, a sensitivity of 92%, and a specificity of 92.85%. These models could be refined further by including additional relevant variables and applying advanced deep-learning models.
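The abstract reports balanced accuracy alongside sensitivity and specificity. The short Python sketch below illustrates how these quantities relate, assuming the standard definition of balanced accuracy as the mean of sensitivity and specificity. The function names are hypothetical and are not taken from the dissertation; the only data used are the percentages quoted in the abstract.

```python
def rates_from_counts(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of diabetic patients correctly flagged
    specificity = tn / (tn + fp)  # share of non-diabetic patients correctly cleared
    return sensitivity, specificity


def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity and specificity (assumed standard definition)."""
    return (sensitivity + specificity) / 2.0


# Percentages quoted in the abstract.
decision_tree = balanced_accuracy(100.0, 96.95)  # about 98.48
random_forest = balanced_accuracy(92.0, 92.85)   # about 92.42

# A classifier that always predicts "no diabetes" has sensitivity 0% and
# specificity 100%, so its balanced accuracy is 50% even though its plain
# accuracy would be 91.5% on this dataset.
baseline = balanced_accuracy(0.0, 100.0)

print(decision_tree, random_forest, baseline)
```

The baseline case suggests why balanced accuracy is a more informative summary than plain accuracy when 91.5% of patients belong to one class.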

Keywords

Diabetes Prediction, Machine Learning Models, Deterministic Models, Probabilistic Models, Decision Trees, Random Forests, Support Vector Machines (SVM), Naïve Bayes, Artificial Neural Networks (ANN), Logistic Regression, Classification Algorithms, Synthetic Minority Over-sampling Technique (SMOTE), Diabetes Mellitus

Copyright owned by the Saudi Digital Library (SDL) © 2025