SACM - United Kingdom
Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9667
35 results
Search Results
Item Restricted
Feasibility of a Multi-Dimensional AI System for Gifted Student Identification in Saudi Education
(Saudi Digital Library, 2026) Alahdal, Ashwaq; Abuelmaatti, Aisha
Identifying gifted students poses a considerable challenge in educational research, especially in contexts that require extensive data collection. This study introduces a data-driven method for identifying gifted students, employing machine learning models that integrate both simulated and actual datasets. A simulated dataset was developed to reflect the traits of gifted Saudi students, based on genuine academic patterns and educational research, covering academic, creative, and cultural dimensions. Using a Random Forest classifier, students were classified as gifted based on indicators from various disciplines. The model attained 96% predictive accuracy on the dataset examined and 98% on the global Kaggle dataset. The findings revealed that academic and creative variables were the most significant predictors of giftedness. This research provides a practical framework for educational systems to identify gifted students in contexts where detailed data are limited, thereby enhancing equity and effectiveness in programs for gifted students.
Keywords: Artificial intelligence in education, gifted students, machine learning, Random Forest classifier, classification, educational data analysis.

Item Restricted
A DATA ANALYTICS FRAMEWORK TO SUPPORT DECISION MAKING IN RAILWAY INFRASTRUCTURE ASSET MANAGEMENT
(Saudi Digital Library, 2026) Alotaibi, Abdulaziz; Cardenas, Isidro
Railway infrastructure asset management increasingly depends on the large volumes of condition-monitoring data produced by modern inspection technologies. Although such data can give a detailed picture of track condition, it also raises challenges of interpretation, prioritisation, and decision making.
Current asset-management methods are usually based on threshold evaluation and disjointed analysis tools, which limits their ability to support proactive, data-driven maintenance practices. This study develops and assesses an integrated visual analytics system to aid decision making in the management of railway infrastructure assets. The framework integrates data pre-processing, analytical intelligence, machine learning, and interactive visual analytics to convert raw track geometry data into actionable decision-support products. The research followed a mixed-methods design comprising two large-scale case studies, on the UK and Saudi Arabian railway networks, together with expert validation. Track geometry data measured by Network Measurement Trains and Track Geometry Inspection Vehicles were analysed to demonstrate the relevance of the framework to different operational and environmental conditions. The UK case study represents a mature, regulation-based data environment, whereas the Saudi Arabian case study represents a developing network operating in harsh desert conditions. Findings indicate that the proposed framework improves the interpretability of complex condition data using integrated 2D, 3D, and GIS-based visual analytics. Combining unsupervised and supervised machine-learning techniques improved fault detection and classification performance and led to quantifiable reductions in false-positive alerts compared with baseline threshold-based methods. A comparative analysis shows that the framework can be adapted to differences in data maturity, regulatory environment, and operational issues.
The study contributes a transferable and validated visual analytics model that balances advanced data analytics with practical decision support in the management of railway infrastructure assets.

Item Restricted
Predicting Comorbidities Using Electronic Health Records: The Role of Genetics and Explainable Artificial Intelligence
(Saudi Digital Library, 2026) Alsaleh, Mohanad; Thygesen, Johan; Honghan, Wu; Andrew, McQuillin
Background: Comorbidity, the coexistence of multiple conditions in one individual, complicates care, reduces quality of life, and increases costs. This thesis examines whether medically actionable genes, as defined by the American College of Medical Genetics and Genomics (ACMG), are associated with additional comorbidities. Identifying such links, particularly when incidental pathogenic variants are discovered, could inform clinical action, guide patient management, and improve outcomes. Materials and methods: A systematic review evaluated machine learning (ML) models for comorbidity prediction, with emphasis on performance and explainability. A phenome-wide association study (PheWAS) using data from Genomics England (n = 78,121) examined associations between pathogenic variants in 81 ACMG genes and 301 comorbidities. Finally, SHAP, an explainable AI method, was applied to interpret genetic, clinical, and demographic drivers of comorbidity predictions. Results: The systematic review covered 22 studies describing 61 ML models. While 52 models showed good performance (accuracy 70–95% and AUC 0.70–0.89), only five incorporated explainability. The PheWAS identified 102 significant associations between 32 ACMG genes and 49 comorbidities, confirming known findings (TSC2 with acute kidney injury) and suggesting novel ones (TTR with intellectual disability). For ML prediction, XGBoost achieved the best performance (AUC = 0.93) and was used for SHAP analysis.
SHAP highlighted established contributions, such as TTN to cardiovascular disease, and novel findings, including RYR1 with neonatal sepsis. Age and sex also played important roles across multiple comorbidity predictions. Discussion and conclusion: These findings expand the understanding of the impact of pathogenic variants in ACMG genes, highlighting broader comorbidity associations and demonstrating the value of XAI for interpreting prediction drivers. Limitations, including small sample sizes and extreme data imbalance, contributed to poor model performance and led to the exclusion of some genes and diseases. Future work should validate the findings in larger, independent cohorts and address challenges related to imbalance.

Item Restricted
Optimising the Regeneration Process of Spent Lithium-Ion Battery Cathode Through a Performance Analysis Model
(Saudi Digital Library, 2026) Alyoubi, Mohammed; Abdelkader, Amor; Huang, Yi
The urgent global demand for sustainable energy storage materials has amplified interest in efficient recycling and regeneration methods for lithium-ion batteries (LIBs), particularly in response to the increasing volume of spent batteries generated by electric vehicles and portable electronics. This thesis investigates the potential of machine learning (ML) to optimise the regeneration process of spent LIB cathode materials, aiming to enhance performance recovery while reducing time, cost, and experimental workload. The research focuses on direct regeneration methods, which restore the electrochemical activity of cathode materials by repairing their crystal structure, rather than decomposing them into elemental components, making them highly promising for sustainable battery reuse. While ML has been applied to predict the performance of fresh LIBs, its application to regenerated cathode materials remains unexplored.
Unlike fresh batteries, regenerated materials may exhibit residual impurities that affect their electrochemical behaviour, highlighting the need for specialised data-driven approaches tailored to these conditions. The study developed and validated an ML framework that integrated experimental data and predictive modelling to enable the optimisation of regeneration processes of three widely used cathode chemistries: lithium cobalt oxide (LCO), lithium iron phosphate (LFP), and nickel-manganese-cobalt oxide (NMC). A total of eight ML algorithms were evaluated, including Classification and Regression Trees, Support Vector Machine, K-Nearest Neighbours, Random Forest, and Artificial Neural Networks (ANN), to model battery performance and optimise regeneration conditions. Each case study demonstrated how ML can predict the discharge capacity of regenerated materials based on key parameters of the direct regeneration method, including regeneration temperature, duration, and the ratio and amount of added lithium salt. Results show that ANN provides the highest prediction accuracy, with R² values exceeding 0.99 across all case studies. The ANN model was then employed to identify optimal regeneration conditions, with findings indicating that ML-guided approaches outperform traditional empirical methods in restoring battery performance. This thesis demonstrates the transformative potential of ML in the regeneration of spent LIB cathodes, presenting an accurate and sustainable approach to improving circularity in battery materials.

Item Restricted
Assessment of Thoracic Aortic Morphology from 3D CT Scan Images Using a Multi-Stage Machine Learning Model for Representation Learning, Clustering, and Disease Prediction.
(Saudi Digital Library, 2025) Alsolami, Ghadeer; Alexander, Smith; Mirsadraee, Saeed
Introduction: Aortic diseases (AD) are associated with significant morphological changes along the thoracic aorta (TA).
Thoracic aortic aneurysm and dissection are among the most common morphological manifestations of AD, representing two major life-threatening conditions that contribute to morbidity and mortality if left undetected and untreated. Morphological changes can arise from either inherited connective tissue disorders, such as Marfan syndrome (MFS), or natural aging, which can cause different morphological patterns that are not fully identified. Current risk assessment of AD relies on one-dimensional measurement of aortic size, which has been shown to be a poor risk predictor. Therefore, there is a need for a more accurate and advanced assessment tool to identify the variations in morphological changes in MFS and aging groups. Recently, a growing subfield of machine learning (ML), particularly self-supervised machine learning (SSML), has gained attention for its ability to extract clinically meaningful anatomical patterns from 3-dimensional (3D) medical imaging without requiring large, labelled datasets. Despite numerous studies on ML in medical imaging, the application of SSML in these patients remains limited due to the insufficient availability of large, population-based imaging datasets. Aims: (1) to generate accurate volume-rendered images of thoracic aortas from patients with MFS and age-related changes; (2) to develop an SSML model capable of extracting and learning morphological patterns in MFS and ageing; (3) to identify meaningful clusters of different aortic morphologies; and (4) to evaluate the added value of SSML pretraining by assessing the improvement in computer vision models for predicting MFS status from aortic volume-rendered images. Research questions: 1. Can a self-supervised machine learning model pre-trained on 3D thoracic aorta images capture the different morphological patterns of Marfan syndrome and aging? 2. Can a predictive model distinguish between Marfan and non-Marfan cases across the entire cohort?
Method: A total of 117 3D volume-rendered images from patients with MFS (pre- and post-operative), an aging group, and a control group were used for model training and evaluation. Hierarchical agglomerative clustering (HAC) was applied to SSML-derived embeddings to explore latent morphological subgroups, while a predictive model was developed to classify Marfan vs. non-Marfan cases, comparing performance with and without SSML pretraining. For categorical variables (e.g., gender, ethnicity), one-vs-all chi-squared tests were used. For continuous variables (e.g., age, aortic measurements), one-vs-all Welch’s t-tests were applied, and results were summarized as mean ± standard deviation (SD). Results: The SSML framework successfully captured discriminative morphological patterns of the TA, represented as embeddings projected with Uniform Manifold Approximation and Projection (UMAP). HAC revealed six clinically meaningful subclusters reflecting variations across the measured anatomical parameters, including sinus of Valsalva diameters, ascending aortic length, and overall thoracic morphology. Control-like cases were consistently grouped in clusters 6.1 and 6.2; MFS patients grouped into clusters 6.3, 6.5, and 6.6, whereas cluster 6.4 represented individuals with age-related changes. Predictive evaluation demonstrated that the SSML-pretrained model outperformed the baseline, achieving higher accuracy (90.6%), sensitivity (87.9%), specificity (93.2%), and area under the ROC curve (AUC 0.97 vs. 0.84). Conclusion: The SSML framework developed in this study demonstrated strong performance in detecting aortic pathologies and shows promise in supporting physicians with the early identification of patients with Marfan syndrome. By enabling more accurate recognition of characteristic morphological patterns, this approach could ultimately support diagnostic decisions and improve patient outcomes.
These methods represent excellent candidates for advancing state-of-the-art prediction models of Marfan syndrome based solely on imaging features.

Item Restricted
Scientific Portfolio Optimization: A Risk-Adjusted Approach to Asset Allocation
(Saudi Digital Library, 2025) Alawad, Naif; Neil, Phillips
This dissertation evaluates the robustness of traditional, risk-based, and machine learning (ML) portfolio optimization methods under realistic market conditions. Classical Mean–Variance Optimization (MVO) is elegant in theory but fragile in practice due to estimation error and instability in crises. Risk-based approaches such as Risk Parity (RP) and Hierarchical Risk Parity (HRP) provide more resilient alternatives by allocating on volatility and correlation structures instead of unstable return forecasts. ML-enhanced MVO (ML-MVO), which substitutes predicted for historical returns, remains of uncertain value. A modular Python artefact was developed to compare these strategies using rolling five-year windows, monthly rebalancing, and strict walk-forward validation, complemented by an interactive dashboard interface. Performance was assessed through risk-adjusted metrics (Sharpe, Sortino, maximum drawdown, volatility) across both normal and crisis regimes, including the Global Financial Crisis (2008–2009) and the COVID-19 shock (2020). Sensitivity analysis with realistic weight constraints was also conducted to test robustness under practical implementation settings. Results show HRP consistently achieved the most robust risk-adjusted outcomes, outperforming MVO and ML-MVO in both full-sample and stressed settings. RP and equal weighting remained competitive baselines, while ML-MVO underperformed despite moderate predictive accuracy. Overall, the findings suggest ML contributes more effectively to restructuring optimization processes, as in HRP, than to direct return forecasting.
The study also highlights inherent limitations of short-horizon ML forecasting and points to future research extending horizons, incorporating richer features, and exploring ML-enhanced risk estimation.

Item Restricted
Graph Neural Networks for Drug Screening
(Saudi Digital Library, 2025) Aqeeli, Noura Eissa; Panas, Daga
Drug discovery is a lengthy and costly process that often involves small, noisy, and imbalanced datasets. In our study, we investigate the use of graph neural networks (GNNs) for predicting molecular homeostatic activity in neuronal cells through transfer learning. We evaluate Graph Convolutional Networks (GCNs) and Message Passing Neural Networks (MPNNs) with transfer learning, comparing their performance to Random Forest and non-transfer GNN baselines. To guide the selection of source datasets for pre-training, we implement a molecular latent representation similarity framework across nine MoleculeNet datasets. Additionally, we fine-tune a foundational molecular model on our target dataset. We evaluate the models using five-fold cross-validation, using the Area Under the Receiver Operating Characteristic curve (AUC-ROC) and the Area Under the Precision-Recall curve (AUC-PR) as metrics. Our results indicate that transferring knowledge from high-similarity source datasets outperforms the baseline models. Moreover, source-to-target transfer is more effective than fine-tuning the foundation model; however, the foundation model exhibits superior generalisation capabilities. Finally, we employ a selected set of models to rank an unlabelled molecular dataset.
Our findings demonstrate that GNNs, combined with similarity-guided transfer learning, enhance performance in predicting bioactivity within low-data and imbalanced settings, highlighting the importance of carefully selecting source datasets to avoid negative transfer.

Item Restricted
Multi-Omics Approaches to Explore Vancomycin Treatment Mechanism in Patients with Primary Sclerosing Cholangitis (PSC) - Inflammatory Bowel Disease (IBD)
(Saudi Digital Library, 2025) AlOmar, Haneen; Acharjee, Animesh
Introduction: Primary sclerosing cholangitis (PSC) is a comorbid condition associated with inflammatory bowel disease (PSC-IBD) that lacks effective treatments beyond liver transplantation. Although oral vancomycin (OV) has shown therapeutic promise, disease activity often returns after treatment withdrawal. This study aims to investigate the mechanisms of OV in PSC-IBD patients, supporting the development of more durable and targeted therapies. Method: Paired multi-omics data from 15 patients before and after OV treatment were analysed. The datasets included RNA-Seq, metatranscriptomics, bile acid metabolites, and 16S rRNA. After preprocessing, feature selection was performed using LASSO, ElasticNet, and Boruta-RF. Selected features were analysed in two complementary ways: first, intersected features identified by all models were assessed for their predictive robustness and integrated into correlation network graphs; second, union features were subjected to pathway enrichment analysis to elucidate their biological significance. Results: The three models jointly selected 13, 2, 4, and 3 intersected features from RNA-Seq, metatranscriptomics, bile acid metabolites, and 16S rRNA, respectively. These features achieved predictive performance comparable or superior to that of the full datasets. For example, intersected features outperformed the full dataset in metatranscriptomics, where Boruta-RF achieved a higher AUC (0.936 vs. 0.896), demonstrating the robustness and efficiency of the selected features. Pathway enrichment analysis of union features in each omics layer revealed pathways related to mucosal healing, metabolism, and immune modulation. Correlation network graphs demonstrated OV-induced alterations across omics layers before and after treatment. Conclusion: Based on paired data from only 15 patients, this study provided a comprehensive multi-omics perspective on OV’s impact in PSC-IBD patients and identified robust biomarkers. We also uncovered novel host–microbiome interactions not previously reported, highlighting potential targets for future therapies. While findings are promising, they require validation in larger, independent cohorts.

Item Restricted
Predicting Osteoarthritis in Older Adults Using Literature-Based, Non-Invasive Risk Factors: A Cross-Sectional Analysis of ELSA Wave 9
(Saudi Digital Library, 2025) Fnais, Tesneem; Yang, Hui
Osteoarthritis (OA) is a prevalent joint disorder in older adults that is often diagnosed at a later stage, as clinical assessments typically rely on imaging and laboratory tests that are not readily accessible in all settings. This study aimed to develop and evaluate machine learning models that predict OA using non-invasive, self-reported features from Wave 9 of the English Longitudinal Study of Ageing (ELSA). A total of 4,723 participants aged 60 and above were included. An initial set of 32 features was selected based on existing literature and refined through a structured feature selection pipeline, resulting in a final set of 25 features, including joint pain and mobility limitations. Four supervised models (Logistic Regression, Random Forest, XGBoost, and CatBoost) were trained using a stratified train-test split and resampling to address class imbalance.
The upsampled logistic regression model achieved the highest sensitivity (0.769) and strong overall performance (AUC = 0.755), while CatBoost showed the highest specificity (0.759) and an AUC of 0.747. A reduced logistic regression model using only the top 15 features retained similar accuracy and AUC. These findings demonstrate that OA can be predicted without imaging or biomarkers. The resulting models, particularly the logistic regression model, offer promise as cost-effective screening tools to support early identification and guide decisions about further clinical assessment, making them well suited for primary care and digital health settings, especially where resources are limited.

Item Restricted
Exploring Nonlinear Associations and Interactions of Risk Factors for Breast Cancer Incidence Using Machine Learning Approaches
(Imperial College London, 2024-08) Alqarni, Lina; Heath, Alicia; Berrington, Amy
BACKGROUND: Breast cancer is influenced by a complex array of risk factors. This study aimed to identify nonlinear associations and interactions between various risk factors and breast cancer incidence using computationally efficient, interpretable methods. METHODS: Data from the Generations Study, a long-term prospective cohort of 104,423 women, were analysed. Risk factors evaluated included demographic, medical, reproductive, hormonal, and lifestyle variables. We compared the performance of traditional Cox proportional hazards models with tree-based methods, including Classification and Regression Trees (CART) and random forests, using the C-statistic. SHapley Additive exPlanations (SHAP) values were extracted to interpret random forest outputs, highlighting key risk factors and interactions. Stability selection was applied to enhance computational efficiency and identify the most stable and important variables.
RESULTS: The multivariable Cox model achieved the highest predictive accuracy with a C-index of 0.657, slightly outperforming the random forest model (C-index of 0.650). However, the random forest model revealed nonlinear associations and interactions not captured by the Cox model. Age, family history of breast cancer, and benign breast disease were among the most critical factors identified, with complex interactions noted between age, body mass index at entry, and family history with other risk factors such as hormone replacement therapy duration, oral contraceptive duration, and smoking pack-years. Stability selection effectively reduced the number of variables without compromising model performance. CONCLUSIONS: While linear models capture dominant associations, tree-based models like random forests offer additional insights into complex, nonlinear relationships among breast cancer risk factors, highlighting the potential for more personalised screening and prevention strategies.
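
Illustrative note on the C-index reported in the record above: the abstract compares Cox and random forest models by C-statistic but does not describe how it was computed. The sketch below shows one common definition, Harrell's concordance index for right-censored data, in plain Python. The function name, the simplified handling of tied event times, and the four-subject toy data are assumptions made for illustration; none of them come from the thesis.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    times       -- observed follow-up time per subject
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output, higher meaning higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # 'first' is the subject with the shorter observed time
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier subject was censored: pair is not comparable
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0  # higher risk failed earlier: correct ordering
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(toy_times, toy_events, toy_risk))  # 0.8
```

On this scale 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why values such as 0.657 and 0.650 indicate modest and very similar discrimination.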
