Machine Learning Based Predication of Diabetes

dc.contributor.advisor: Wilkinson, Richard
dc.contributor.author: Almhmadi, Anas
dc.date.accessioned: 2024-12-08T10:11:27Z
dc.date.issued: 2024
dc.description.abstract: Diabetes mellitus, commonly known as diabetes, is a chronic disease of the human metabolic system. It belongs to a broader category of chronic conditions that includes cardiovascular disease, acute kidney infection, eye problems, and foot ulcers. Currently, 537 million people worldwide are living with diabetes, a figure expected to rise to 643 million by 2030. Given the limited availability of medical professionals, there is an increasing need for automated tools that support decision-making for various diseases using prevalence datasets. This dissertation implements both deterministic models (decision trees, random forests, support vector machines, and neural networks) and probabilistic models (logistic regression, Naïve Bayes, Gaussian Naïve Bayes, and nonparametric Naïve Bayes) for binary diabetes classification. The classification models are built on seven input features (age, gender, BMI, blood glucose level, HbA1c level, hypertension status, and heart disease status, the last two recorded as yes/no) and a binary response variable (diabetes). The dataset comprises 100,000 patients and eight variables (the seven inputs plus the response), with a significant class imbalance: 91.5% of patients do not have diabetes. On the imbalanced data, the decision tree outperformed all other models, achieving the highest balanced accuracy (98.48%) with a sensitivity of 100% and a specificity of 96.95%. On the balanced data, the random forest performed best among all models except logistic regression, with a balanced accuracy of 92.42%, a sensitivity of 92%, and a specificity of 92.85%. These models could be refined further by including additional relevant variables and applying advanced deep-learning models.
dc.format.extent: 79
dc.identifier.uri: https://hdl.handle.net/20.500.14154/74058
dc.language.iso: en
dc.publisher: University of Nottingham
dc.subject: Diabetes Prediction
dc.subject: Machine Learning Models
dc.subject: Deterministic Models
dc.subject: Probabilistic Models
dc.subject: Decision Trees
dc.subject: Random Forests
dc.subject: Support Vector Machines (SVM)
dc.subject: Naïve Bayes
dc.subject: Artificial Neural Networks (ANN)
dc.subject: Logistic Regression
dc.subject: Classification Algorithms
dc.subject: Synthetic Minority Over-sampling Technique (SMOTE)
dc.subject: Diabetes Mellitus
dc.title: Machine Learning Based Predication of Diabetes
dc.type: Thesis
sdl.degree.department: School of Mathematical Sciences
sdl.degree.discipline: Statistics
sdl.degree.grantor: University of Nottingham
sdl.degree.name: Master of Science
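
The balanced-accuracy figures quoted in the abstract can be checked from the reported sensitivity and specificity. The sketch below assumes the usual binary-classification definition (balanced accuracy is the mean of sensitivity and specificity), which the abstract does not state explicitly. The function names and the always-negative baseline are illustrative and are not taken from the dissertation. Under that definition, the decision tree figures give 98.475% and the random forest figures give 92.425%, consistent with the reported 98.48% and 92.42%.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    return (sensitivity + specificity) / 2.0


def majority_baseline(negative_share: float) -> dict:
    """Metrics of a hypothetical classifier that always predicts 'no diabetes'."""
    return {
        "accuracy": negative_share,  # every non-diabetic patient is classified correctly
        "sensitivity": 0.0,          # no diabetic patient is ever detected
        "specificity": 1.0,          # no non-diabetic patient is ever flagged
        "balanced_accuracy": balanced_accuracy(0.0, 1.0),
    }


# Sensitivity and specificity as reported in the abstract, written as fractions.
decision_tree = balanced_accuracy(1.00, 0.9695)  # imbalanced data
random_forest = balanced_accuracy(0.92, 0.9285)  # balanced data

# 91.5% of patients do not have diabetes, so an always-negative
# classifier already reaches 91.5% plain accuracy.
baseline = majority_baseline(0.915)

print(f"decision tree balanced accuracy: {decision_tree:.3%}")
print(f"random forest balanced accuracy: {random_forest:.3%}")
print(f"always-negative baseline: accuracy {baseline['accuracy']:.1%}, "
      f"balanced accuracy {baseline['balanced_accuracy']:.1%}")
```

The baseline shows why balanced accuracy is a more informative summary than plain accuracy on this dataset: a model that never predicts diabetes scores highly on accuracy but only 50% on balanced accuracy.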

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 2.45 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2025