Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

Restricted
Advanced Machine Learning Approaches for Comprehensive Cardiovascular Disease Risk Prediction Using Synthetic Data and Dynamic Feature Selection
(Saudi Digital Library, 2025) Alqulaity, Malak; Yang, Po
Cardiovascular diseases (CVD) are a leading cause of global mortality, highlighting the need for accurate and reliable risk prediction models. Traditional CVD risk assessment tools, such as Framingham, SCORE, and QRISK, have several limitations that affect their accuracy and applicability. These tools typically focus on a narrow set of major risk factors, potentially overlooking important non-traditional factors, resulting in a less comprehensive risk assessment. Additionally, they often rely on linear models, which may fail to capture complex, non-linear interactions within the data. This thesis addresses the limitations of traditional CVD risk assessment tools by developing a hybrid predictive framework that integrates advanced machine learning (ML) techniques to enhance the accuracy of Coronary Artery Calcium (CAC) score prediction and CVD risk assessment using both traditional and non-traditional risk factors. The research is structured around three key objectives: generating synthetic data, enhancing feature selection, and developing a hybrid approach. To address data limitations, a Tabular Generative Adversarial Network (GAN) was enhanced to generate high-quality synthetic data, effectively expanding the training dataset and improving model robustness. Feature selection was further refined through an adaptive SHAP-based method, which dynamically adjusts feature importance thresholds to capture both traditional and non-traditional CVD risk factors more accurately. Finally, a hybrid approach combining hyperparameter tuning algorithms (Genetic Algorithms, Particle Swarm Optimisation, and Bayesian Optimisation) with Gradient Boosting algorithms (XGBoost, LightGBM, and CatBoost) was implemented to maximise predictive accuracy. This two-stage model first predicts CAC scores and then uses these predictions, alongside additional risk factors, to assess the likelihood of CVD events. 
Results demonstrate that the hybrid approach consistently improves prediction accuracy across multiple metrics, with CatBoost performing best in both CAC score prediction and CVD classification.
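The abstract does not say how the adaptive SHAP threshold is computed. As a hypothetical illustration only, one simple adaptive rule keeps the smallest set of top-ranked features whose mean absolute SHAP values cover a chosen share of total importance, so the cut-off moves with the shape of the importance distribution instead of being a fixed top-k. The function name `select_by_cumulative_importance`, the 0.9 coverage value and the feature scores are assumptions, and the SHAP values themselves are taken as already computed.

```python
def select_by_cumulative_importance(importances, coverage=0.9):
    """Hypothetical adaptive cut-off: keep top-ranked features until their
    share of the total importance reaches `coverage`.

    `importances` maps feature name -> mean |SHAP value| (assumed precomputed).
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    total = sum(importances.values())
    if total <= 0:
        return []
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0.0
    for name, value in ranked:
        selected.append(name)
        running += value
        if running / total >= coverage:
            break
    return selected


# Illustrative scores only, not values reported in the thesis.
scores = {"age": 0.40, "ldl": 0.25, "smoking": 0.15, "crp": 0.12, "bmi": 0.08}
print(select_by_cumulative_importance(scores))  # ['age', 'ldl', 'smoking', 'crp']
```

Lowering `coverage` keeps fewer features; with a flatter importance distribution the same coverage keeps more.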
Unknown
Machine Learning Techniques for Financial Loan Default Prediction in UK: A Comparative Analysis of Decision Tree and Random Forest Models
(Saudi Digital Library, 2025) Alrakan, Fahad Abdulaziz; Alwzinani, Faris
This dissertation proposes a comprehensive approach to variable selection and model comparison for credit scoring, using a Lending Club dataset covering 2016–2018. The methodology combines an initial manual selection, based on data completeness and business logic, with automatic selection via RFECV (Recursive Feature Elimination with Cross-Validation) using a Random Forest. A permutation importance analysis and an ablation experiment (top 10 variables) complete the evaluation. The results show that RFECV retains all 21 selected variables as relevant, but that most of the predictive power is concentrated in a subset of about 15 variables. Comparing the models shows the clear superiority of Random Forest (AUC ≈ 0.713; PR-AUC ≈ 0.437) over Decision Tree (AUC ≈ 0.594; PR-AUC ≈ 0.319). Permutation importance analysis confirms business intuition: interest rate, credit sub-grade, and residential status appear to be the main explanatory factors, supplemented by financial indicators (debt ratio, loan amount, FICO score). The ablation experiment shows that these ten main variables are sufficient to preserve almost all of the Random Forest's performance (AUC = 0.708) while reducing training time by approximately 40%. These results highlight two major points: (i) Random Forest is robust and can effectively exploit a small core of variables, but its performance remains below the level expected of an industrial model (AUC > 0.80); (ii) the ranking of variables reveals both the relevance of the expected indicators and redundancy among certain correlated measures. The identified limitations are sensitivity to correlations, the restricted time span of the sample (2016–2018), and the computational cost of certain steps (RFECV).
In conclusion, this project validates the feasibility of a robust and parsimonious Random Forest model, while opening up avenues for improvement: boosting algorithms, threshold calibration based on economic considerations, temporal robustness tests, and pipeline optimization.
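Permutation importance, as used in this study, measures how much a fitted model's score drops when one feature's values are shuffled, breaking that feature's link to the target. The minimal pure-Python sketch below scores the drop with ROC AUC; `roc_auc`, `permutation_importance` and the toy data are hypothetical, and scikit-learn users would normally call `sklearn.inspection.permutation_importance` instead. The ablation step (refitting on the top 10 variables) is not shown.

```python
import random


def roc_auc(y_true, scores):
    """ROC AUC via the Mann-Whitney formulation: the fraction of
    (positive, negative) pairs where the positive scores higher; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean AUC drop when each column of X is shuffled in turn.

    `predict` stands in for an already-fitted model: it maps a list of rows
    to a list of scores. X is a list of equal-length rows (lists or tuples).
    """
    rng = random.Random(seed)
    baseline = roc_auc(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            drops.append(baseline - roc_auc(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


def model(rows):
    """Toy 'model': scores each row by column 0 only, so column 1 is irrelevant."""
    return [row[0] for row in rows]


X = [[0.1, 5], [0.2, 3], [0.3, 9], [0.7, 1], [0.8, 4], [0.9, 2]]
y = [0, 0, 0, 1, 1, 1]
baseline, imp = permutation_importance(model, X, y, n_repeats=20)
print(baseline, imp[1])  # 1.0 0.0
```

Because the toy model never reads column 1, shuffling it cannot change the AUC, so its importance is exactly zero.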
Unknown
Deep Learning Approaches for Multivariate Time Series: Advances in Feature Selection, Classification, and Forecasting
(New Mexico State University, 2024) Alshammari, Khaznah Raghyan; Tran, Son; Hamdi, Shah Muhammad
In this work, we present advances in machine learning-based prediction and feature selection for multivariate time series (MVTS) data. MVTS data, which consist of multiple interrelated time series, present significant challenges due to their high dimensionality, complex temporal dependencies, and inter-variable relationships. These challenges are critical in domains such as space weather prediction, environmental monitoring, healthcare, sensor networks, and finance. Our research addresses them by developing and implementing machine learning algorithms specifically designed for MVTS data, with methodologies focused on three key areas: feature selection, classification, and forecasting. Our contributions include deep learning models, such as Long Short-Term Memory (LSTM) networks and Transformer-based architectures, optimized to capture complex temporal and inter-parameter dependencies in MVTS data. We also propose a novel feature selection framework that gradually identifies the most relevant variables, enhancing model interpretability and predictive accuracy. Through extensive experimentation and validation, we demonstrate that our approaches outperform existing methods. The results highlight the practical applicability of our solutions, providing tools and insights for researchers and practitioners working with high-dimensional time series data. This work advances the state of the art in MVTS analysis with methodologies that address both theoretical and practical challenges in the field.
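The abstract does not describe how the MVTS data are fed to the LSTM and Transformer models. A common preparation step, shown here as an assumption rather than the thesis's procedure, slices the series into fixed-length windows of all variables, each paired with a future value of a target variable. `make_windows` and the toy series are hypothetical.

```python
def make_windows(series, window, horizon=1, target_index=0):
    """Slice a multivariate series into supervised (input, target) pairs.

    series: list of time steps, each a list of variable values.
    Each input holds `window` consecutive time steps of all variables; its
    target is variable `target_index` observed `horizon` steps after the
    window ends.
    """
    if window < 1 or horizon < 1:
        raise ValueError("window and horizon must be at least 1")
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        end = start + window
        X.append([list(step) for step in series[start:end]])
        y.append(series[end + horizon - 1][target_index])
    return X, y


# Five time steps of two variables (toy values).
series = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50]]
X, y = make_windows(series, window=3)
print(len(X), y)  # 2 [4, 5]
```

Each sample in `X` has the layout (window, number of variables), the per-sample input shape that sequence models typically expect.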
Unknown
Enhancing Stock Price Prediction Using Machine Learning Models: A Comparative Study of SVM, LSTM, and GRU
(University College London, 2024-08) AlMohamdy, Razan; Andrea, Ducci
This study evaluates how effectively three machine learning models predict the stock price of Saudi Aramco: Support Vector Machine (SVM), Long Short-Term Memory (LSTM) networks, and Gated Recurrent Units (GRU). Using historical stock price data and technical indicators, the models were assessed on their accuracy in both long-term and short-term predictions. The findings show that LSTM and GRU significantly outperform SVM, with LSTM better at capturing long-term dependencies and GRU offering a balance between accuracy and computational efficiency. Specifically, LSTM achieved a Root Mean Squared Error (RMSE) of 0.0516 and a Mean Absolute Error (MAE) of 0.0323, while GRU recorded an RMSE of 0.0539 and an MAE of 0.0234. In contrast, SVM had a much higher RMSE of 0.1712 and MAE of 0.1079, suggesting that it struggles with market volatility. The 30-day prediction analysis further highlighted the strengths of LSTM and GRU in short-term forecasting, with both models maintaining an R² above 0.993, while SVM lagged behind at 0.9332. The study also identified limitations, including the exclusion of external economic factors and the models' varying effectiveness across time horizons. These findings contribute to the growing field of financial forecasting and offer practical guidance to investors and analysts on model selection. Future research is recommended to incorporate broader economic indicators, explore cross-market validation, and improve the models' responsiveness to short-term market fluctuations.
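The three metrics reported in this abstract can be computed as in the minimal sketch below, which uses made-up numbers and is not the study's code. RMSE squares each error before averaging, so it penalises occasional large errors more heavily than MAE does; that is why one model can record a lower MAE but a higher RMSE than another, as GRU does relative to LSTM here.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R^2) for paired actual and predicted values."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    rmse = math.sqrt(ss_res / n)                   # square, average, take the root
    mae = sum(abs(e) for e in errors) / n          # average absolute error
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2


# Made-up values: one prediction is off by 1.0, the rest are exact.
print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))  # (0.5, 0.25, 0.8)
```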
Unknown
Feature Selection for High Dimensional Healthcare Data
(University of Surrey, 2024-01) Alayed, Abdulrahman; Kouchaki, Samaneh
In today’s digital landscape, researchers frequently face the complexity of handling high-dimensional datasets. Data mining and machine learning methods can struggle with very large datasets, leading to inefficiencies. Raw data with numerous features can harm machine learning algorithms by reducing accuracy, increasing overfitting, and adding complexity, primarily because redundant and irrelevant features hamper the learning process. Feature selection techniques can address these challenges: by choosing only relevant features, they let machine learning algorithms operate more efficiently, speeding up training, reducing model complexity, improving accuracy, and mitigating overfitting. The primary objective of this project is to create an automatic variable selection pipeline that chooses the best features using a range of feature selection techniques. The pipeline incorporates four categories of variable selection methods: filter, wrapper, embedded, and hybrid methods. These techniques are applied to the freely available MIMIC-III (Medical Information Mart for Intensive Care) dataset. This database suits the project's goals, as it is a centralized database containing details of patients admitted to the critical care units of a large hospital, and it is particularly useful for predicting the likelihood of death during the hospital stay after ICU admission. To this end, the project employs six classification techniques: Logistic Regression (LR), K-Nearest Neighbours (KNN), Decision Tree (DT), Random Forest (RF), Support Vector Machine (SVM), and Artificial Neural Network (ANN), and systematically evaluates and compares the models' performance using various assessment metrics.
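Filter methods, the first category listed in this abstract, score each feature independently of any classifier, whereas wrapper, embedded and hybrid methods involve a model during selection. The sketch below shows one such filter, ranking features by absolute Pearson correlation with a binary outcome such as in-hospital mortality. The choice of Pearson correlation, the function names and the toy columns are assumptions; the abstract does not name the specific filters used.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no information
    return cov / (sx * sy)


def filter_select(columns, target, k):
    """Filter method: score every feature on its own by |Pearson r| with the
    target, then keep the k highest-scoring feature names."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy columns (hypothetical names and values); target 1 = died in hospital.
columns = {
    "lactate": [1.0, 2.0, 3.0, 4.0],
    "constant": [5.0, 5.0, 5.0, 5.0],
    "alternating": [1.0, 0.0, 1.0, 0.0],
}
target = [0, 0, 1, 1]
top, scores = filter_select(columns, target, k=1)
print(top)  # ['lactate']
```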
Unknown
Supervised Machine Learning Assessment of Dementia Using Feature Selection Filter Methods
(Springer Nature, 2023-10-30) Rajab, Mohammed Dabash; Wang, Dennis
The prevalence of dementia is increasing globally, and the resources it requires are putting pressure on governments and private healthcare systems. Accurately diagnosing the cause of dementia, such as Alzheimer’s disease (AD), is difficult for clinicians because of the time and assessments required, such as neuropathological examination. The problem becomes more challenging when considering whether various brain lesions contribute to the pathological assessment of dementia, how these lesions relate to the various dementia conditions, how they interact, and how to quantify them. Therefore, systematically assessing neuropathological measures by their degree of association with dementia, especially AD, may lead to better diagnostic systems and treatment targets. One promising approach to these challenges is to develop data-driven solutions based on machine learning (ML), with feature evaluation and automatic subject classification as core functions. Recent research in medical diagnosis, including dementia research, shows that ML techniques combined with feature selection can identify critical features of Alzheimer-related pathologies and their association with the disease’s diagnosis and prognosis. Feature selection removes noisy features from dementia data to increase predictive performance and improve interpretability while reducing dimensionality and computational complexity. However, filter-based feature selection methods can produce dissimilar feature rankings and may be sensitive to correlations among the features. This thesis investigates dementia, focusing on AD neuropathological assessments, from a data-driven perspective in order to develop mechanisms that assist pathologists during these clinical assessments. The investigation comprises phases such as feature ranking, feature-feature correlation, and classification.
The work determines the impact of correlations among neuropathological features on feature ranking, for better biomarker identification. The investigation assesses two real dementia datasets, the Cognitive Function and Aging Studies (CFAS) and the Alzheimer’s Disease Neuroimaging Initiative (ADNI), using filter methods and classification techniques. The results showed that classification models built from the chosen neuropathological features of the CFAS and ADNI datasets performed strongly in sensitivity, accuracy, and other measures across different classification techniques. In the ADNI dataset, the significant neuropathological features contributing to AD included neocortical neuritic plaques, Braak stage, Thal phase, diffuse plaques, and cerebral amyloid angiopathy (CAA), all of which were highly correlated with AD’s diagnostic label. The CFAS results were consistent with those from ADNI. Moreover, among the filter methods considered, reliefF had the strongest correlation with feature-feature correlations in both the ADNI and CFAS datasets and was less sensitive to feature-feature correlations. However, no filter method clearly dominated in the ADNI results. More importantly, the results indicated limited consistency in feature rankings between ADNI and CFAS; reliefF showed the most agreement, while the Gain Ratio method was less consistent in ranking the features across both datasets. In summary, this thesis provides valuable insights into applying filter methods to neuropathology data to develop classification models for diagnosing dementia conditions. The study demonstrates the importance of considering feature-feature correlations when selecting influential features, and shows how the choice of filter method affects feature ranking and classification performance.
These findings suggest that the proposed approach could effectively minimise discrepancies in feature ranking and generate an impactful set of features for classification algorithms. The results have practical implications for pathologists in improving the understanding of AD pathology. The study also highlights the potential for future research to use diverse filter methods to identify more reliable biomarkers and improve the detection of dementia, particularly AD.
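This abstract compares how consistently filter methods rank features across ADNI and CFAS. One standard way to quantify agreement between two rankings is Spearman's rank correlation, ρ = 1 − 6Σd²/(n(n² − 1)), where d is the difference in a feature's rank positions. The sketch below assumes untied rankings; the function name and the feature orderings are made up for illustration and are not the thesis's results.

```python
def spearman_rank_agreement(ranking_a, ranking_b):
    """Spearman's rho between two untied rankings of the same features,
    each ordered from most to least important.

    Returns 1.0 for identical orderings and -1.0 for exactly reversed ones.
    """
    if len(set(ranking_a)) != len(ranking_a) or sorted(ranking_a) != sorted(ranking_b):
        raise ValueError("rankings must list the same distinct features")
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two features")
    position_b = {name: i for i, name in enumerate(ranking_b)}
    # Sum of squared differences in rank position for each feature.
    d_squared = sum((i - position_b[name]) ** 2 for i, name in enumerate(ranking_a))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Made-up rankings for illustration; not results from the thesis.
adni = ["neuritic_plaques", "braak_stage", "thal_phase", "diffuse_plaques", "caa"]
cfas = ["braak_stage", "neuritic_plaques", "thal_phase", "caa", "diffuse_plaques"]
print(spearman_rank_agreement(adni, cfas))  # 0.8
```

Two adjacent swaps give Σd² = 4, so ρ = 1 − 24/120 = 0.8; a value near 1 would indicate that a filter method ranks features similarly on both datasets.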
Unknown
Deep Discourse Analysis for Early Prediction of Multi-Type Dementia
(Saudi Digital Library, 2023-06-12) Alkenani, Ahmed Hassan A; Li, Yuefeng
Ageing populations are a worldwide phenomenon. Although dementia is not an inevitable consequence of biological ageing, it is strongly associated with increasing age and is therefore expected to pose enormous future challenges to public health systems and aged care providers. While dementia affects its patients first and foremost, it is also associated with poorer mental and physical health among caregivers. Dementia is characterized by gradual, irreversible impairment of the nerve cells that control cognitive, behavioural, and language processes, causing speech and language deterioration even in preclinical stages. Early prediction can significantly alleviate dementia symptoms and could even curtail cognitive decline in some cases. However, diagnosis is currently challenging because it usually begins with traditional clinical screening tests. Such tests are typically interpreted manually and may require further tests and physical examinations, making the process time-consuming, expensive, and invasive. Many researchers have therefore adopted speech and language analysis to facilitate and automate initial pre-screening. Although recent studies have proposed promising methods and models, there is still room for improvement, without which automated pre-screening remains impracticable. There is currently limited empirical literature on modelling the discourse ability (that is, spoken and written conversation and communication) of people at prodromal stages of different types of dementia. Specifically, few researchers have investigated the nature of lexical and syntactic structures in spontaneous discourse produced by patients with different dementia conditions for automated diagnostic modelling. In addition, most previous work has focused on modelling and improving the diagnosis of Alzheimer’s disease (AD), the most common dementia pathology, while neglecting other types of dementia.
Further, currently proposed models suffer from poor performance, a lack of generalizability, and low interpretability. This thesis therefore explores lexical and syntactic presentations in the written and spoken narratives of people with different dementia syndromes in order to develop high-performing diagnostic models using fusions of lexical and syntactic (i.e., lexicosyntactic) features as well as language models. Multiple novel diagnostic frameworks are proposed and developed based on the “wisdom of crowds” theory, in which different mathematical and statistical methods are investigated and integrated into ensemble approaches to optimize overall performance and improve the inferences of the diagnostic models. First, syntactic- and lexical-level components are explored and extracted from the only two distinct data sources available for this study: spoken narratives retrieved from the well-known DementiaBank dataset, and written narratives from a blog-based corpus collected as part of this research. Because the two sources differ substantially, each was analysed and processed independently for exploratory data analysis and feature extraction. A common problem in this context is ensuring that a suitable feature space is generated for machine learning modelling; we address it by proposing multiple ensemble-based feature selection pipelines that reveal optimal lexicosyntactic features. Second, we explore language vocabulary spaces (i.e., n-grams), given their proven ability to enhance modelling performance, with the overall aim of establishing two-level feature fusions that combine optimal lexicosyntactic features and vocabulary spaces.
These fusions are then used with single and ensemble learning algorithms to build individual diagnostic models for the dementia syndromes in question, including AD, Mild Cognitive Impairment (MCI), Possible AD (PoAD), Frontotemporal Dementia (FTD), Lewy Body Dementia (LBD), and Mixed Dementia (PwD). A comprehensive empirical study and a series of experiments were conducted for each proposed approach on these two real-world datasets to verify the frameworks. Evaluation with multiple classification metrics showed that the proposed frameworks are effective and outperform current state-of-the-art baselines. In summary, this research makes a substantial contribution to effective dementia classification, which is needed to develop automated initial pre-screening of multiple dementia syndromes through language analysis. The lexicosyntactic features presented and discussed across dementia syndromes may contribute substantially to our understanding of language processing in these pathologies. Given the current scarcity of related datasets, it is also hoped that the collected writing-based blog corpus will facilitate future analytical and diagnostic studies. Furthermore, because this study addresses problems commonly faced in this research area and frequently discussed in the academic literature, its outcomes could help develop better classification models, not only for dementia but also for other linguistic pathologies.
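The abstract does not list the individual lexicosyntactic features. As an illustrative assumption, two standard ingredients of this kind of language analysis are a lexical-richness measure such as the type-token ratio and counts of word n-grams (the vocabulary spaces described in this abstract). The simple regex tokenizer, the function names and the sample sentence are hypothetical; syntactic features would typically require a parser and are not shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens from a deliberately simple regex tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(tokens):
    """Distinct words divided by total words, a common lexical-richness measure."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def ngram_counts(tokens, n):
    """Count contiguous word n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# A made-up picture-description style sentence.
sample = "The boy is taking the cookie and the stool is falling"
tokens = tokenize(sample)
print(len(tokens), round(type_token_ratio(tokens), 3))  # 11 0.727
bigrams = ngram_counts(tokens, 2)
```

In a full pipeline, counts like these would be computed per transcript or blog post and concatenated with other features before feature selection and classification.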
