Saudi Cultural Missions Theses & Dissertations
Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10
60 results
Search Results
Item Restricted
A DATA ANALYTICS FRAMEWORK TO SUPPORT DECISION MAKING IN RAILWAY INFRASTRUCTURE ASSET MANAGEMENT
(Saudi Digital Library, 2026) Alotaibi, Abdulaziz; Cardenas, Isidro
Railway infrastructure asset management increasingly depends on the large volumes of condition-monitoring data produced by modern inspection technologies. Although such data can give a detailed picture of track condition, it also raises problems of interpretation, prioritisation, and decision making. Current asset-management methods are usually based on threshold evaluation and disjointed analysis tools, which limits their ability to support proactive, data-driven maintenance practices. This study develops and evaluates an integrated visual analytics framework to support decision making in railway infrastructure asset management. The framework integrates data pre-processing, analytical intelligence, machine learning, and interactive visual analytics to convert raw track geometry data into actionable decision-support products. The research followed a mixed-methods design comprising two large-scale case studies, based on the UK and Saudi Arabian railway networks, together with expert validation. Track geometry data measured by Network Measurement Trains and Track Geometry Inspection Vehicles were analysed to demonstrate the relevance of the framework to different operational and environmental conditions. The UK case study represents a mature, regulation-based data environment, whereas the Saudi Arabian case study represents a developing network operating in harsh desert conditions. The findings indicate that the proposed framework improves the interpretability of complex condition data through integrated 2D, 3D, and GIS-based visual analytics.
Machine-learning techniques combining unsupervised and supervised methods improved fault detection and classification performance and led to quantifiable reductions in false-positive alerts compared with baseline threshold-based methods. A comparative analysis shows that the framework can adapt to differences in data maturity, regulatory environment, and operational challenges. The study contributes a transferable, validated visual analytics model that balances advanced data analytics with practical decision support in railway infrastructure asset management.

Item Restricted
Predicting Comorbidities Using Electronic Health Records: The Role of Genetics and Explainable Artificial Intelligence
(Saudi Digital Library, 2026) Alsaleh, Mohanad; Thygesen, Johan; Honghan, Wu; Andrew, McQuillin
Background: Comorbidity, the coexistence of multiple conditions in one individual, complicates care, reduces quality of life, and increases costs. This thesis examines whether medically actionable genes, as defined by the American College of Medical Genetics and Genomics (ACMG), are associated with additional comorbidities. Identifying such links, particularly when incidental pathogenic variants are discovered, could inform clinical action, guide patient management, and improve outcomes. Materials and methods: A systematic review evaluated machine learning (ML) models for comorbidity prediction, with emphasis on performance and explainability. A phenome-wide association study (PheWAS) using data from Genomics England (n = 78,121) examined associations between pathogenic variants in 81 ACMG genes and 301 comorbidities. Finally, SHAP, an explainable AI method, was applied to interpret genetic, clinical, and demographic drivers of comorbidity predictions. Results: The systematic review covered 22 studies describing 61 ML models.
While 52 models showed good performance (accuracy 70–95% and AUC 0.70–0.89), only five incorporated explainability. The PheWAS identified 102 significant associations between 32 ACMG genes and 49 comorbidities, confirming known findings (TSC2 with acute kidney injury) and suggesting novel ones (TTR with intellectual disability). For ML prediction, XGBoost achieved the best performance (AUC = 0.93) and was used for SHAP analysis. SHAP highlighted established contributions, such as TTN to cardiovascular disease, and novel findings, including RYR1 with neonatal sepsis. Age and sex also played important roles across multiple comorbidity predictions. Discussion and conclusion: These findings expand the understanding of the impact of pathogenic variants in ACMG genes, highlighting broader comorbidity associations and demonstrating the value of XAI for interpreting prediction drivers. Limitations, including small sample sizes and extreme data imbalance, contributed to poor model performance and led to the exclusion of some genes and diseases. Future work should validate the findings in larger, independent cohorts and address challenges related to imbalance.

Item Restricted
Optimising the Regeneration Process of Spent Lithium-Ion Battery Cathode Through a Performance Analysis Model
(Saudi Digital Library, 2026) Alyoubi, Mohammed; Abdelkader, Amor; Huang, Yi
The urgent global demand for sustainable energy storage materials has amplified interest in efficient recycling and regeneration methods for lithium-ion batteries (LIBs), particularly in response to the increasing volume of spent batteries generated by electric vehicles and portable electronics. This thesis investigates the potential of machine learning (ML) to optimise the regeneration process of spent LIB cathode materials, aiming to enhance performance recovery while reducing time, cost, and experimental workload.
The research focuses on direct regeneration methods, which restore the electrochemical activity of cathode materials by repairing their crystal structure, rather than decomposing them into elemental components, making them highly promising for sustainable battery reuse. While ML has been applied to predict the performance of fresh LIBs, its application to regenerated cathode materials remains unexplored. Unlike fresh batteries, regenerated materials may exhibit residual impurities that affect their electrochemical behaviour, highlighting the need for specialised data-driven approaches tailored to these conditions. The study developed and validated an ML framework that integrated experimental data and predictive modelling to enable the optimisation of regeneration processes of three widely used cathode chemistries: lithium cobalt oxide (LCO), lithium iron phosphate (LFP), and nickel-manganese-cobalt oxide (NMC). A total of eight ML algorithms were evaluated, including Classification and Regression Trees, Support Vector Machine, K-Nearest Neighbours, Random Forest, and Artificial Neural Networks (ANN), to model battery performance and optimise regeneration conditions. Each case study demonstrated how ML can predict the discharge capacity of regenerated materials based on key parameters of the direct regeneration method, including regeneration temperature, duration, and the ratio and amount of added lithium salt. Results show that ANN provides the highest prediction accuracy, with R² values exceeding 0.99 across all case studies. The ANN model was then employed to identify optimal regeneration conditions, with findings indicating that ML-guided approaches outperform traditional empirical methods in restoring battery performance.
This thesis demonstrates the transformative potential of ML in the regeneration of spent LIB cathodes, presenting an accurate and sustainable approach to improving circularity in battery materials.

Item Restricted
DYNAMIC REFINEMENT OF SOCKPUPPET DETECTION MODELS WITH HUMAN-IN-THE-LOOP PROCESSES GUIDED BY MACHINE LEARNING ENGINEERING RULES
(Saudi Digital Library, 2025) Baamer, Rafeef Abdullah B; Boicu, Mihai
In recent years, people have increasingly relied on Online Social Networks (OSNs) for various aspects of their daily lives, including communication, information sharing, and entertainment. Although these platforms provide many benefits, their massive and continuous use has also caused negative behaviors and malicious activities. One of the most critical challenges is the growing presence of malicious accounts that undermine the trustworthiness and integrity of online interactions and communication. Such accounts include personal spammers, impersonators, and cyborgs. However, one of the most harmful and complex types is the sockpuppet account. Sockpuppet accounts refer to accounts created by an individual or a coordinated group for deceptive or manipulative purposes, such as spreading misinformation or promoting specific agendas. The term encompasses several subtypes, which are impersonation sockpuppets, fake-profile sockpuppets, promotional or antagonistic sockpuppets, misinformation sockpuppets, troll sockpuppets, and spam sockpuppets. These accounts negatively affect OSNs in multiple ways: they reduce the authenticity and integrity of online communication, degrade information quality by disseminating false or biased content, manipulate public opinion by supporting certain agendas or campaigns, and contribute to community disruption and toxicity through hate speech or coordinated harassment. While prior studies have achieved promising results in sockpuppet account detection, several limitations and research gaps remain.
First, most existing approaches focus on identifying a specific type of sockpuppet account, such as spammers or fake reviewers, which limits the generalizability of their models. Second, only a few studies have explored or implemented hybrid detection techniques, as most rely on a single methodological approach. Third, many models are tested on a single platform or dataset, which restricts their scalability and cross-platform applicability. Moreover, no prior research has proposed a detection model specifically designed for Arabic sockpuppet accounts. Finally, there has been limited involvement of human expertise and underutilization of Human-in-the-Loop (HITL) analysis in refining and validating detection outcomes. To address these limitations, this dissertation presents three major experiments conducted across the Wikipedia, Reddit, and X/Twitter platforms, targeting different categories of sockpuppets: general, troll, and spammer accounts. In these three experiments, various detection approaches were employed, including individual machine learning classifiers, ensemble voting, deep learning, and transformer-based (AraBERT) models, to detect and classify sockpuppet accounts across multiple platforms. These models were subsequently integrated into a Human-in-the-Loop analysis framework to enhance their performance through multiple refinement cycles, identifying and applying machine learning engineering (MLE) rules, e.g., mixed-initiative feature optimization, data improvement, and hyperparameter tuning of classifiers. The process involved iterative model tuning and evaluation, resulting in the formulation of MLE rules derived from both model insights and human feedback.
This research yields several contributions: first, it developed generalizable hybrid detection techniques that increased performance in sockpuppet account detection (as measured by accuracy, precision, recall, and F-score); second, it introduced a validation process for sockpuppet datasets that combines a transformer-based model for post labeling with Human-in-the-Loop analysis and review, which also resulted in the first labeled dataset of Arabic sockpuppet accounts, addressing a major linguistic and cultural gap in existing research; third, it established a systematic approach for identifying borderline cases that require human review and translating these insights into model-refinement and MLE rules to enhance overall detection performance and generalizability; finally, it developed a Human-in-the-Loop process for analysts for model development and dynamic refinement that was tested across multiple datasets representing diverse online platforms and different types of sockpuppet accounts.

Item Restricted
Assessment of Thoracic Aortic Morphology from 3D CT Scan Images Using a Multi-Stage Machine Learning Model for Representation Learning, Clustering, and Disease Prediction.
(Saudi Digital Library, 2025) Alsolami, Ghadeer; Alexander, Smith; Mirsadraee, Saeed
Introduction: Aortic diseases (AD) are associated with significant morphological changes along the thoracic aorta (TA). Thoracic aortic aneurysm and dissection are among the most common morphological manifestations of AD, representing two major life-threatening conditions that ultimately contribute to morbidity and mortality if left undetected and untreated. Morphological changes can arise from either inherited connective tissue disorders, such as Marfan syndrome (MFS), or natural aging, which can cause different morphological patterns that are not yet fully identified. Current risk assessment of AD relies on one-dimensional measurement of aortic size, which has been shown to be a poor risk predictor.
Therefore, there is a need for a more accurate and advanced assessment tool to identify the variations of the morphological changes in the MFS and aging groups. Recently, a growing subfield of machine learning (ML), particularly self-supervised machine learning (SSML), has gained attention for its ability to extract clinically meaningful anatomical patterns from 3-dimensional (3D) medical imaging without requiring large, labelled datasets. Despite numerous studies on ML in medical imaging, the application of SSML in these patients remains limited due to the insufficient availability of large, population-based imaging datasets. Aims: (1) to generate accurate volume-rendered images of thoracic aortas from patients with MFS and age-related changes; (2) to develop an SSML model capable of extracting and learning morphological patterns in MFS and ageing; (3) to identify meaningful clusters of different aortic morphologies; and (4) to evaluate the added value of SSML pretraining by assessing the improvement in computer vision models for predicting MFS status from aortic volume-rendered images. Research questions: (1) Can a self-supervised machine learning model pre-trained on 3D thoracic aorta images capture the different morphological patterns of Marfan syndrome and aging? (2) Can a predictive model distinguish between Marfan and non-Marfan cases across the entire cohort? Method: A total of 117 3D volume-rendered images from patients with MFS (pre- and post-operative), an aging group, and a control group were used for model training and evaluation. Hierarchical agglomerative clustering (HAC) was applied to SSL-derived embeddings to explore latent morphological subgroups, while a predictive model was developed to classify Marfan vs. non-Marfan cases, comparing performance with and without SSML pretraining. For categorical variables (e.g., gender, ethnicity), one-vs-all chi-squared tests were used.
For continuous variables (e.g., age, aortic measurements), one-vs-all Welch’s t-tests were applied, and results were summarized as mean ± standard deviation (SD). Results: The SSML framework successfully captured discriminative morphological patterns of the TA, represented as Uniform Manifold Approximation and Projection (UMAP) embeddings. HAC revealed six clinically meaningful subclusters reflecting variations across all measured anatomical parameters, including sinus of Valsalva diameter, ascending aortic length, and overall thoracic morphology. Control-like cases were consistently grouped in clusters 6.1 and 6.2, and MFS patients in clusters 6.3, 6.5, and 6.6, whereas cluster 6.4 represented individuals with age-related changes. Predictive evaluation demonstrated that the SSML-pretrained model outperformed the baseline, achieving higher accuracy (90.6%), sensitivity (87.9%), specificity (93.2%), and area under the ROC curve (AUC 0.97 vs. 0.84). Conclusion: The SSML framework developed in this study demonstrated strong performance in detecting aortic pathologies and shows promise in supporting physicians with the early identification of patients with Marfan syndrome. By enabling more accurate recognition of characteristic morphological patterns, this approach could ultimately support diagnostic decisions and improve patient outcomes. These methods represent excellent candidates for advancing state-of-the-art prediction models for Marfan syndrome based solely on imaging features.

Item Restricted
Machine Learning Systems for Unsupervised Time Series Anomaly Detection
(Saudi Digital Library, 2025) Alnegheimish, Sarah; Veeramachaneni, Kalyan
Modern assets – from launched satellites to electric vehicles – output dense, multivariate time series data that must be monitored for deviations from “normal” behavior. This monitoring task is referred to as time series anomaly detection.
The current state of the industry still depends on fixed or heuristic thresholds that often drown operators in false alarms, and can miss the subtle, context-dependent faults that matter most. This thesis addresses unsupervised time series anomaly detection as an end-to-end problem, asking how we can learn, evaluate, and deploy models that judiciously flag anomalies while remaining intuitive to the end user. This thesis provides contributions in the form of both algorithms and systems. First, it introduces three models that enlarge the design space of unsupervised time series anomaly detection: TadGAN, which leverages adversarial reconstruction; AER, which unifies predictive and reconstructive objectives in a single hybrid score; and MixedLSTM, which explicitly incorporates interdependencies to improve anomaly detection in multivariate time series. We propose two range-based evaluation metrics that quantify detection quality over temporal intervals. Second, it presents our system Orion, which abstracts anomaly detection pipelines as directed acyclic graphs of reusable primitives, providing user-friendly APIs and enabling interactive visual inspection. Building on this infrastructure, OrionBench performs periodic, fully reproducible benchmarks, producing leaderboards that align research innovations with the needs of end users. Third, the thesis explores a new paradigm – foundation models for unsupervised time series anomaly detection – by formulating SigLLM, which employs large language models and time series foundation models for zero-shot anomaly detection via prompting and forecasting. This paradigm indicates a promising path to developing scalable models for anomaly detection. 
Finally, beyond evaluating our systems on publicly available datasets, we provide extensive experiments on two industrial case studies that demonstrate improved detection accuracy and practical usability of our system.

Item Restricted
Machine Learning-based Detection Strategies for DDoS Attacks
(Saudi Digital Library, 2025) Alshmlan, Abdullah Salem A; Songfeng, Lu
With the rapid development of information technology, Distributed Denial-of-Service (DDoS) attacks have become a major threat to network security, posing severe challenges to the online services of enterprises and individuals. Traditional defense methods are often inefficient against complex, evolving attack patterns and fail to provide adequate detection and response. To address these limitations, this study focuses on developing and evaluating machine learning-based models for detecting DDoS attacks. A hybrid model combining lightweight Convolutional Neural Networks (CNNs) and Bidirectional Long Short-Term Memory (BiLSTM) networks is developed to leverage CNN’s spatial feature extraction and BiLSTM’s temporal dependency modeling. The CIDDS-001 dataset is used after rigorous preprocessing, including cleaning, feature selection, normalization, and sliding-window segmentation. Several architectures are trained and compared, including the proposed CNN-BiLSTM and an enhanced Self-Attention BiLSTM variant that dynamically emphasizes critical traffic patterns. Experimental evaluation using metrics such as accuracy, precision, recall, and F1-score demonstrates that the hybrid and attention-based models achieve superior performance and effectively reduce false alarm rates. Overall, the study provides a practical and adaptable approach for DDoS attack detection, enhancing the responsiveness and reliability of network defense systems.
Future work will focus on extending this framework to larger and more diverse datasets to further improve its generalization in real-world scenarios.

Item Restricted
Scientific Portfolio Optimization: A Risk-Adjusted Approach to Asset Allocation
(Saudi Digital Library, 2025) Alawad, Naif; Neil, Phillips
This dissertation evaluates the robustness of traditional, risk-based, and machine learning (ML) portfolio optimization methods under realistic market conditions. Classical Mean–Variance Optimization (MVO) is elegant in theory but fragile in practice due to estimation error and instability in crises. Risk-based approaches such as Risk Parity (RP) and Hierarchical Risk Parity (HRP) provide more resilient alternatives by allocating on volatility and correlation structures instead of unstable return forecasts. ML-enhanced MVO (ML-MVO), which substitutes predicted for historical returns, remains of uncertain value. A modular Python artefact was developed to compare these strategies using rolling five-year windows, monthly rebalancing, and strict walk-forward validation, complemented by an interactive dashboard interface. Performance was assessed through risk-adjusted metrics (Sharpe, Sortino, maximum drawdown, volatility) across both normal and crisis regimes, including the Global Financial Crisis (2008–2009) and the COVID-19 shock (2020). Sensitivity analysis with realistic weight constraints was also conducted to test robustness under practical implementation settings. Results show HRP consistently achieved the most robust risk-adjusted outcomes, outperforming MVO and ML-MVO in both full-sample and stressed settings. RP and equal weighting remained competitive baselines, while ML-MVO underperformed despite moderate predictive accuracy. Overall, the findings suggest ML contributes more effectively to restructuring optimization processes, as in HRP, than to direct return forecasting.
The study also highlights inherent limitations of short-horizon ML forecasting and points to future research extending horizons, incorporating richer features, and exploring ML-enhanced risk estimation.

Item Restricted
Graph Neural Networks for Drug Screening
(Saudi Digital Library, 2025) Aqeeli, Noura Eissa; Panas, Daga
Drug discovery is a lengthy and costly process that often involves small, noisy, and imbalanced datasets. In our study, we investigate the use of graph neural networks (GNNs) for predicting molecular homeostatic activity in neuronal cells through transfer learning. We evaluate Graph Convolutional Networks (GCNs) and Message Passing Neural Networks (MPNNs) with transfer learning, comparing their performance to Random Forest and non-transfer GNN baselines. To guide the selection of source datasets for pre-training, we implement a molecular latent representation similarity framework across nine MoleculeNet datasets. Additionally, we fine-tune a foundational molecular model on our target dataset. We evaluate the models using five-fold cross-validation, with the Area Under the Receiver Operating Characteristic curve (AUC-ROC) and the Area Under the Precision-Recall curve (AUC-PR) as metrics. Our results indicate that transferring knowledge from high-similarity source datasets outperforms the baseline models. Moreover, source-to-target transfer is more effective than fine-tuning the foundation model; however, the foundation model exhibits superior generalisation capabilities. Finally, we employ a selected set of models to rank an unlabelled molecular dataset.
Our findings demonstrate that GNNs, combined with similarity-guided transfer learning, enhance performance in predicting bioactivity within low-data and imbalanced settings, highlighting the importance of carefully selecting source datasets to avoid negative transfer.

Item Restricted
Multi-Omics Approaches to Explore Vancomycin Treatment Mechanism in Patients with Primary Sclerosing Cholangitis (PSC) - Inflammatory Bowel Disease (IBD)
(Saudi Digital Library, 2025) AlOmar, Haneen; Acharjee, Animesh
Introduction: Primary sclerosing cholangitis (PSC) is a comorbid condition associated with inflammatory bowel disease (PSC-IBD) that lacks effective treatments beyond liver transplantation. Although oral vancomycin (OV) has shown therapeutic promise, disease activity often returns after treatment withdrawal. This study aims to investigate the mechanisms of OV in PSC-IBD patients, supporting the development of more durable and targeted therapies. Method: Paired multi-omics data from 15 patients before and after OV treatment were analysed. The datasets included RNA-Seq, metatranscriptomics, bile acid metabolites, and 16S rRNA. After preprocessing, feature selection was performed using LASSO, ElasticNet, and Boruta-RF. Selected features were analysed in two complementary ways: first, intersected features (those identified by all models) were assessed for their predictive robustness and integrated into correlation network graphs; second, union features were subjected to pathway enrichment analysis to elucidate their biological significance. Results: The three models jointly selected 13, 2, 4, and 3 intersected features from RNA-Seq, metatranscriptomics, bile acid metabolites, and 16S rRNA, respectively. These features achieved predictive performance comparable or superior to that of the full datasets. For example, intersected features outperformed the full dataset in metatranscriptomics, where Boruta-RF achieved a higher AUC (0.936 vs. 0.896), demonstrating the robustness and efficiency of the selected features. Pathway enrichment analysis of union features in each omics layer revealed pathways related to mucosal healing, metabolism, and immune modulation. Correlation network graphs revealed OV-induced cross-omics alterations between pre- and post-treatment samples. Conclusion: Based on paired data from only 15 patients, this study provided a comprehensive multi-omics perspective on OV’s impact in PSC-IBD patients and identified robust biomarkers. We also uncovered novel host–microbiome interactions not previously reported, highlighting potential targets for future therapies. While findings are promising, they require validation in larger, independent cohorts.
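
The distinction drawn in the last abstract above, between features chosen by every selector (intersected) and features chosen by at least one (union), can be illustrated with a minimal Python sketch. It is not the thesis's code: the function name `intersect_and_union`, the selector keys, and the feature names are all hypothetical placeholders, and the lists stand in for whatever LASSO, ElasticNet, and Boruta-RF would actually return.

```python
def intersect_and_union(selections):
    """Return (intersected, union) feature sets across all selectors.

    `selections` maps a selector name to the features it picked.
    """
    sets = [set(features) for features in selections.values()]
    if not sets:
        return set(), set()
    intersected = set.intersection(*sets)  # picked by every selector
    union = set().union(*sets)             # picked by at least one selector
    return intersected, union


# Hypothetical selector outputs; placeholder names, not study results.
example_selections = {
    "lasso": ["feat_a", "feat_b", "feat_c"],
    "elastic_net": ["feat_a", "feat_b", "feat_d"],
    "boruta_rf": ["feat_a", "feat_b", "feat_e"],
}

intersected, union = intersect_and_union(example_selections)
print(sorted(intersected))
print(sorted(union))
```

In a workflow like the one described, the smaller intersected set would presumably feed the robustness checks and correlation networks, while the broader union set would go to pathway enrichment.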
