Advanced Machine Learning Approaches for Comprehensive Cardiovascular Disease Risk Prediction Using Synthetic Data and Dynamic Feature Selection

Alqulaity, Malak

dc.contributor.advisor	Yang, Po
dc.contributor.author	Alqulaity, Malak
dc.date.accessioned	2026-02-24T11:22:11Z
dc.date.issued	2025
dc.description.abstract	Cardiovascular diseases (CVD) are a leading cause of global mortality, highlighting the need for accurate and reliable risk prediction models. Traditional CVD risk assessment tools, such as Framingham, SCORE, and QRISK, have several limitations that affect their accuracy and applicability. These tools typically focus on a narrow set of major risk factors, potentially overlooking important non-traditional factors, resulting in a less comprehensive risk assessment. Additionally, they often rely on linear models, which may fail to capture complex, non-linear interactions within the data. This thesis addresses the limitations of traditional CVD risk assessment tools by developing a hybrid predictive framework that integrates advanced machine learning (ML) techniques to enhance the accuracy of Coronary Artery Calcium (CAC) score prediction and CVD risk assessment using both traditional and non-traditional risk factors. The research is structured around three key objectives: generating synthetic data, enhancing feature selection, and developing a hybrid approach. To address data limitations, a Tabular Generative Adversarial Network (GAN) was enhanced to generate high-quality synthetic data, effectively expanding the training dataset and improving model robustness. Feature selection was further refined through an adaptive SHAP-based method, which dynamically adjusts feature importance thresholds to capture both traditional and non-traditional CVD risk factors more accurately. Finally, a hybrid approach combining hyperparameter tuning algorithms (Genetic Algorithms, Particle Swarm Optimisation, and Bayesian Optimisation) with Gradient Boosting algorithms (XGBoost, LightGBM, and CatBoost) was implemented to maximise predictive accuracy. This two-stage model first predicts CAC scores and then uses these predictions, alongside additional risk factors, to assess the likelihood of CVD events. 
Results demonstrate that the hybrid approach consistently improves prediction accuracy across multiple metrics, with the CatBoost model performing best in both CAC score prediction and CVD classification.
dc.format.extent	224
dc.identifier.uri	https://hdl.handle.net/20.500.14154/78290
dc.language.iso	en
dc.publisher	Saudi Digital Library
dc.subject	Cardiovascular Disease
dc.subject	Machine Learning
dc.subject	Synthetic Data
dc.subject	Generative Adversarial Networks (GAN)
dc.subject	Feature Selection
dc.subject	Gradient Boosting
dc.title	Advanced Machine Learning Approaches for Comprehensive Cardiovascular Disease Risk Prediction Using Synthetic Data and Dynamic Feature Selection
dc.type	Thesis
sdl.degree.department	Department of Computer Science
sdl.degree.discipline	Computer Science
sdl.degree.grantor	University of Sheffield
sdl.degree.name	PhD
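
The two-stage design described in the abstract (predict a CAC score first, then feed that prediction, together with additional risk factors, into a CVD event classifier) can be sketched as follows. This is a minimal illustration of the chaining only, not the thesis's implementation: the class name `TwoStageRiskModel`, both toy model functions, their input features, and every coefficient are hypothetical stand-ins with no clinical meaning. In the thesis, each stage is a tuned gradient boosting model.

```python
from dataclasses import dataclass
from math import exp
from typing import Callable, Sequence

Row = Sequence[float]


@dataclass
class TwoStageRiskModel:
    """Chains a CAC-score regressor into a CVD-event classifier."""

    cac_regressor: Callable[[Row], float]   # stage 1: risk factors -> CAC score
    cvd_classifier: Callable[[Row], float]  # stage 2: [CAC, extras...] -> probability

    def predict_cac(self, stage1_features: Row) -> float:
        return self.cac_regressor(stage1_features)

    def predict_cvd_risk(self, stage1_features: Row, extra_features: Row) -> float:
        cac = self.predict_cac(stage1_features)
        # Stage 2 receives the predicted CAC score alongside the extra factors.
        return self.cvd_classifier([cac, *extra_features])


def toy_cac_regressor(x: Row) -> float:
    """Hypothetical stand-in for a trained regressor; coefficients are invented."""
    age, ldl = x
    return max(0.0, 4.0 * (age - 40.0) + 0.5 * (ldl - 100.0))


def toy_cvd_classifier(x: Row) -> float:
    """Hypothetical stand-in for a trained classifier; returns a value in (0, 1)."""
    cac, smoker = x
    z = -3.0 + 0.01 * cac + 1.0 * smoker
    return 1.0 / (1.0 + exp(-z))


model = TwoStageRiskModel(toy_cac_regressor, toy_cvd_classifier)
low_risk = model.predict_cvd_risk([40.0, 100.0], [0.0])
high_risk = model.predict_cvd_risk([60.0, 160.0], [1.0])
```

Keeping the two stages behind plain callables means either one could be swapped for a fitted XGBoost, LightGBM, or CatBoost model without changing the chaining logic.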

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 6.2 MB
Format: Adobe Portable Document Format

Collections

SACM - United Kingdom