Advanced Machine Learning Approaches for Comprehensive Cardiovascular Disease Risk Prediction Using Synthetic Data and Dynamic Feature Selection

dc.contributor.advisor: Yang, Po
dc.contributor.author: Alqulaity, Malak
dc.date.accessioned: 2026-02-24T11:22:11Z
dc.date.issued: 2025
dc.description.abstract: Cardiovascular diseases (CVD) are a leading cause of global mortality, highlighting the need for accurate and reliable risk prediction models. Traditional CVD risk assessment tools, such as Framingham, SCORE, and QRISK, have several limitations that affect their accuracy and applicability. These tools typically focus on a narrow set of major risk factors, potentially overlooking important non-traditional factors, resulting in a less comprehensive risk assessment. Additionally, they often rely on linear models, which may fail to capture complex, non-linear interactions within the data. This thesis addresses these limitations by developing a hybrid predictive framework that integrates advanced machine learning (ML) techniques to enhance the accuracy of Coronary Artery Calcium (CAC) score prediction and CVD risk assessment using both traditional and non-traditional risk factors. The research is structured around three key objectives: generating synthetic data, enhancing feature selection, and developing a hybrid approach. To address data limitations, a Tabular Generative Adversarial Network (GAN) was enhanced to generate high-quality synthetic data, effectively expanding the training dataset and improving model robustness. Feature selection was further refined through an adaptive SHAP-based method, which dynamically adjusts feature importance thresholds to capture both traditional and non-traditional CVD risk factors more accurately. Finally, a hybrid approach combining hyperparameter tuning algorithms (Genetic Algorithms, Particle Swarm Optimisation, and Bayesian Optimisation) with Gradient Boosting algorithms (XGBoost, LightGBM, and CatBoost) was implemented to maximise predictive accuracy. This two-stage model first predicts CAC scores and then uses these predictions, alongside additional risk factors, to assess the likelihood of CVD events.
Results demonstrate that the hybrid approach consistently enhances prediction accuracy across multiple metrics, with CatBoost in particular outperforming the other models in both CAC score prediction and CVD classification.
dc.format.extent: 224
dc.identifier.uri: https://hdl.handle.net/20.500.14154/78290
dc.language.iso: en
dc.publisher: Saudi Digital Library
dc.subject: Cardiovascular Disease
dc.subject: Machine Learning
dc.subject: Synthetic Data
dc.subject: Generative Adversarial Networks (GAN)
dc.subject: Feature Selection
dc.subject: Gradient Boosting
dc.title: Advanced Machine Learning Approaches for Comprehensive Cardiovascular Disease Risk Prediction Using Synthetic Data and Dynamic Feature Selection
dc.type: Thesis
sdl.degree.department: Department of Computer Science
sdl.degree.discipline: Computer Science
sdl.degree.grantor: University of Sheffield
sdl.degree.name: PhD
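
The abstract mentions an adaptive SHAP-based feature selection step that adjusts importance thresholds dynamically, but this record does not say how the threshold is chosen. The sketch below shows one possible reading only, not the thesis's method: instead of a fixed cut-off, keep the top-ranked features until they account for a chosen share of the total mean absolute SHAP value. The function name, the `coverage` parameter, the feature names, and the scores are all hypothetical and serve purely as illustration.

```python
from typing import Dict, List, Tuple


def select_by_cumulative_importance(
    mean_abs_shap: Dict[str, float], coverage: float = 0.9
) -> Tuple[List[str], float]:
    """Keep top-ranked features until they cover `coverage` of total importance.

    Hypothetical rule for illustration; the thesis's actual adaptive rule
    is not described in this record.
    Returns the selected feature names and the implied importance threshold
    (the score of the last feature kept).
    """
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(mean_abs_shap.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")

    # Rank features from most to least important.
    ranked = sorted(mean_abs_shap.items(), key=lambda kv: kv[1], reverse=True)

    selected: List[str] = []
    running = 0.0
    threshold = 0.0
    for name, score in ranked:
        selected.append(name)
        running += score
        threshold = score  # the threshold adapts to the score distribution
        if running / total >= coverage:
            break
    return selected, threshold


# Hypothetical mean |SHAP| values, not results from the thesis.
scores = {
    "age": 0.40,
    "ldl_cholesterol": 0.25,
    "systolic_bp": 0.20,
    "sleep_duration": 0.10,
    "noise_feature": 0.05,
}
selected, threshold = select_by_cumulative_importance(scores, coverage=0.9)
```

With a rule of this kind, the cut-off moves with the shape of the importance distribution, so a weaker but still informative feature can survive when a handful of dominant features would otherwise push it below a fixed threshold.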

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 6.2 MB
Format: Adobe Portable Document Format

Copyright owned by the Saudi Digital Library (SDL) © 2026