SACM - United Kingdom
Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9667
57 results
Search Results
Item Restricted: Machine Learning for Radiotherapy Treatment of Prostate Cancer (Saudi Digital Library, 2026) Alqarni, Maram; Teresa, Guerrero Urbano; Andrew, King
External beam radiotherapy (EBRT) and brachytherapy (BT) are both forms of radiation treatment used to destroy cancer cells in prostate cancer. EBRT delivers the radiation externally, whereas BT involves placing radioactive seeds inside the prostate. At Guy’s Cancer Centre, both modalities are used, and the choice between them depends on various factors. Each treatment modality relies on different imaging modalities for treatment planning, delivery, and follow-up. However, the two share some clinical tasks, such as defining the clinical target volume (CTV) and organs at risk (OARs) from imaging data. The research described in this thesis aims to promote the clinical translation of machine learning (ML) techniques to streamline workflows in EBRT and BT. The first piece of work focuses on an ML-based segmentation model for prostate MRI. One of the main obstacles to the clinical adoption of ML in MRI segmentation is the domain shift problem. The findings reveal, for the first time, the significant impact on model performance of using different acquisition and annotation protocols, even with the same scanner vendor and field strength. It is shown that training an ML model on data that covers the important sources of domain shift can produce a robust model that generalises well. The next piece of work investigates possible race bias in ML-based prostate MRI segmentation. Experiments on a controlled dataset of White and Black patients show that the performance gap between Black and White subjects depends on the level of (im)balance between the two groups in the training data. Again, training on demographically balanced data is shown to produce a fair and robust model.
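The abstract does not name the segmentation metric behind these performance comparisons. As an illustration only, the sketch below assumes the Dice similarity coefficient, a widely used overlap score. It computes Dice on binary masks stored as sets of voxel coordinates and summarises a between-group performance gap as the difference in mean Dice. The masks, group names, and scores are hypothetical.

```python
from statistics import mean


def dice(pred: set, truth: set) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for masks given as voxel sets."""
    if not pred and not truth:
        return 1.0  # both masks empty: treated as perfect agreement (a common convention)
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def group_gap(scores_by_group: dict) -> float:
    """Gap between the best and worst per-group mean Dice (a simple fairness summary)."""
    group_means = [mean(scores) for scores in scores_by_group.values()]
    return max(group_means) - min(group_means)


# Toy 2D masks; voxels are (row, col) tuples.
truth = {(r, c) for r in range(4) for c in range(4)}    # rows 0-3: 16 voxels
pred = {(r, c) for r in range(1, 5) for c in range(4)}  # rows 1-4: 16 voxels, 12 overlap
print(dice(pred, truth))  # 0.75

# Hypothetical per-subject Dice scores for two demographic groups.
print(group_gap({"group_a": [0.90, 0.88, 0.86], "group_b": [0.84, 0.80, 0.82]}))
```

In this framing, the balanced-training result described above would appear as a smaller `group_gap` for the model trained on balanced data.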
Both pieces of work lead to the same conclusion: model performance can be robust if the training data are sufficiently diverse, in terms of both image characteristics and patient demographics. Building on these analyses, the thesis next investigates the clinical utility of a diagnostic prostate MRI model trained on diverse data and externally validates it on in-house clinical data. The evaluation covers not only standard quantitative metrics but also inter-observer variability in manual segmentation and performance on downstream clinical tasks. Next, the thesis investigates the clinical utility of multi-organ ML-based segmentation models: one for planning MRI, called “FIMRAa-P”, and one for radiotherapy CT, called “PelvisMA-CT”. Five observers evaluate both models extensively, quantitatively and qualitatively. For each clinical structure, the thesis also investigates the agreement between the quantitative metrics and the qualitative clinical metrics, and finds generally poor agreement between the two. This agreement is also shown to depend on the structure being segmented and on the profession of the clinicians performing the evaluation. One of the main clinical translation outcomes of this thesis is the deployment of PelvisMA-CT by the Clinical Scientific Computing (CSC) group at GSTFT and its integration into a contouring application called GSTTAutoSeg. The model is currently in clinical use at Guy’s Cancer Centre, and the thesis presents the results of a monitoring and enhancement study based on this ongoing use. Overall, the thesis makes several key contributions, all aimed at promoting the clinical translation of ML in EBRT and BT.
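Agreement between a quantitative metric and clinicians' qualitative scores can be measured in several ways, and the abstract does not say which one was used. The sketch below assumes Spearman's rank correlation, which is the Pearson correlation of tie-averaged ranks. It pairs hypothetical per-case Dice scores with hypothetical clinician ratings on a 1 to 5 scale. A value near zero would indicate the kind of poor agreement reported above.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical per-case Dice scores and clinician acceptability ratings for one structure.
dice_scores = [0.91, 0.85, 0.78, 0.88, 0.70]
ratings = [5, 4, 4, 3, 2]
print(round(spearman(dice_scores, ratings), 3))
```

Because this kind of analysis is carried out per structure, the same function would be applied separately to each organ's scores.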
It is hoped that this work will accelerate the benefits of ML in radiotherapy treatment planning and delivery and ensure that all patients benefit from the introduction of thoroughly evaluated new technology.

Item Restricted: Predicting Carbon Credit Prices Using Advanced Machine Learning Techniques (Saudi Digital Library, 2026) Rayan, Najdi; Wang, Hai
Accurate forecasting of carbon credit prices supports risk management, investment decisions, and policy assessment in the context of climate action. Carbon prices in the EU Emissions Trading System (EU ETS) exhibit volatility, non-linearity, and non-stationarity, which reduce the effectiveness of traditional forecasting models. This dissertation proposes and evaluates a three-stage hybrid machine learning model for one-day-ahead forecasting of EU ETS carbon prices. The architecture follows a divide-and-conquer strategy. First, Wavelet Packet Decomposition (WPD) decomposes the carbon price signal into multiple frequency components. Second, a Gated Recurrent Unit (GRU) network models temporal dependencies and forecasts the trend component. Third, an Extreme Gradient Boosting (XGBoost) model predicts and corrects the GRU's residual errors, using the wavelet-derived detail components as input features. The model was trained and tested on data covering January 2018 to December 2024. The dataset includes EU ETS carbon prices, Brent crude oil prices, and electricity prices, but the forecasting model is univariate and uses only the carbon price series. On an unseen test set of 510 days, the model achieved a Mean Absolute Percentage Error (MAPE) of 1.66%, a Root Mean Squared Error (RMSE) of 4.86 EUR/ton, and a Mean Absolute Error (MAE) of 4.41 EUR/ton.
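The sketch below makes the first stage and the evaluation metrics concrete. It implements a wavelet packet decomposition and the three reported error metrics in plain Python. It assumes the Haar wavelet and a decomposition depth of 2, and the abstract states neither. Unlike a plain discrete wavelet transform, a packet decomposition splits the detail branches as well as the approximation. The GRU and XGBoost stages are not reproduced here, and the price series is made up.

```python
from math import sqrt


def haar_split(x):
    """One Haar analysis step: returns (approximation, detail), each half as long as x."""
    if len(x) % 2:
        raise ValueError("signal length must be even at every level")
    approx = [(x[i] + x[i + 1]) / sqrt(2) for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / sqrt(2) for i in range(0, len(x), 2)]
    return approx, detail


def haar_merge(approx, detail):
    """Inverse of haar_split."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / sqrt(2), (a - d) / sqrt(2)])
    return out


def wavelet_packet(x, level):
    """Full packet tree: both approximation and detail nodes are split at every level.

    Returns {path: coefficients}; paths such as 'aa', 'ad', 'da', 'dd' record the
    approximation/detail choice made at each level (natural, not frequency, order).
    """
    nodes = {"": list(x)}
    for _ in range(level):
        children = {}
        for path, coeffs in nodes.items():
            children[path + "a"], children[path + "d"] = haar_split(coeffs)
        nodes = children
    return nodes


def reconstruct(nodes):
    """Invert wavelet_packet by merging sibling nodes level by level."""
    while "" not in nodes:
        parents = {path[:-1] for path in nodes}
        nodes = {p: haar_merge(nodes[p + "a"], nodes[p + "d"]) for p in parents}
    return nodes[""]


def mae(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def rmse(y, y_hat):
    return sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mape(y, y_hat):
    """Mean absolute percentage error in percent; undefined if any true value is zero."""
    return 100 * sum(abs((a - b) / a) for a, b in zip(y, y_hat)) / len(y)


prices = [80.1, 81.3, 79.8, 82.4, 83.0, 82.1, 84.5, 85.2]  # made-up EUR/ton series
tree = wavelet_packet(prices, level=2)
print(sorted(tree))  # ['aa', 'ad', 'da', 'dd']
print(max(abs(p - q) for p, q in zip(prices, reconstruct(tree))))  # tiny round-off only
print(mae([100, 200], [110, 190]), rmse([100, 200], [110, 190]))  # 10.0 10.0
```

In the design described above, the lowest-frequency node would presumably play the role of the trend component passed to the GRU, and the remaining nodes would supply the detail features for the residual model. For other mother wavelets, PyWavelets provides a `WaveletPacket` class.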
The results indicate that combining signal decomposition, deep learning, and gradient boosting provides stable forecasting performance for EU ETS carbon prices under realistic evaluation conditions.

Item Restricted: Analysing Large-Scale Attacks in IoT Environments using ML/DL (Saudi Digital Library, 2025) Bokhari, Mohammed Ibrahim K; Neetesh, Sexena
The fine-grained classification of malicious network traffic is a significant and persistent challenge in cybersecurity, primarily because of the extreme class imbalance inherent in real-world network data. Conventional machine learning approaches, which apply a single model to the whole problem, have had limited success and often fail to identify rare but critical minority attack classes. This dissertation argues that this single-model paradigm is fundamentally flawed for the problem and proposes a hierarchical, multi-stage classification framework as a more robust alternative. The research investigates the problem comprehensively, using the 34-class CICIoT2023 dataset as a benchmark. The study was conducted across four experimental paths, comparing two ensemble methods (XGBoost and Random Forest) and two class-handling strategies: a "Grouped" approach that manually merges similar classes and an "Ungrouped" approach that tackles all 34 classes directly. Within this structure, we designed and implemented a 4-tier hierarchical framework that follows a divide-and-conquer strategy: an initial classifier handles majority traffic, and a class-level routing mechanism delegates ambiguous samples to specialised recovery tiers. Within these tiers, an adaptive resampling strategy applies aggressive SMOTE only where required. The empirical results provide a holistic validation of the proposed architecture.
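SMOTE creates new minority-class samples by interpolating between a minority sample and one of its k nearest minority-class neighbours. The sketch below implements that core step in plain Python. It wraps the step in a hypothetical "adaptive" rule: oversample only classes smaller than a chosen fraction of the largest class, and only up to that size. The abstract does not give the dissertation's actual criterion, tier structure, or routing logic. The data and class names are invented.

```python
import random
from collections import Counter
from math import dist


def smote(minority, n_new, k=5, rng=None):
    """Core SMOTE step: interpolate between a random minority sample and one of its
    k nearest minority-class neighbours."""
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(p, base))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1): position along the base -> neighbour segment
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, nn)))
    return synthetic


def adaptive_oversample(X, y, ratio=0.5, k=5, seed=0):
    """Hypothetical adaptive rule: only classes smaller than ratio * (largest class)
    are oversampled, and only up to that size; other classes are left untouched."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = int(ratio * max(counts.values()))
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        if 2 <= count < target:
            members = [x for x, lab in zip(X, y) if lab == label]
            new = smote(members, target - count, k=k, rng=rng)
            X_out.extend(new)
            y_out.extend([label] * len(new))
    return X_out, y_out


# Hypothetical 2-feature data: 20 majority samples and 3 samples of a rare attack class.
X = [(float(i), 0.0) for i in range(20)] + [(0.0, 5.0), (1.0, 6.0), (2.0, 5.0)]
y = ["benign"] * 20 + ["rare_attack"] * 3
X_res, y_res = adaptive_oversample(X, y, ratio=0.5)
print(Counter(y_res))  # Counter({'benign': 20, 'rare_attack': 10})
```

For real experiments, the imbalanced-learn package offers a tested `SMOTE` class in `imblearn.over_sampling`.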
The optimal configuration, an Ungrouped, XGBoost-led hierarchical framework, achieved a final accuracy of 0.9228 and a Macro-F1 score of 0.7948. This is a substantial improvement over all other experimental paths and conventional baselines. More significantly, the approach increased the F1-score of some under-represented minority classes by more than 800%. The analysis also revealed a key architectural principle: classifier performance is role-dependent, with different ensemble methods excelling in different roles within the hierarchy. This highlights the importance of managing the bias-variance trade-off at the system level. Finally, this work provides a rigorous, data-centric analysis that distinguishes model limitations from the inherent limitations of the dataset, identifying a "dataset-induced ceiling" on performance for 5 of the 34 classes. The primary contribution of this dissertation is therefore a methodologically robust and architecturally novel framework, validated through a comprehensive, multi-path experimental design. The principles of hierarchical decomposition and adaptive resource allocation are domain-agnostic and offer a promising direction for future research on extreme class imbalance.

Item Restricted: AI-Powered Multimodel Detection System for Cybersecurity Attacks: Design, Implementation, and Evaluation (Saudi Digital Library, 2025) Alhazmi, Marwan; Nguyen, Hoang
As cyber threats have become increasingly complex, so has the need for detection methods that can analyze different types of data. Historically, traditional intrusion detection systems (IDS) have analyzed only one form of data: either statistical features of network traffic or text-based alert logs. This limitation restricts the ability of IDSs to detect the many complexities of modern attacks.
This dissertation therefore proposes an AI-powered multimodel detection system that combines structured network data and unstructured alert text to improve intrusion detection performance. The methodology includes preprocessing and feature extraction on the CICIDS2017 dataset, machine learning algorithms for the structured data, and Natural Language Processing (NLP) algorithms for the text data. The system uses late fusion, in which the predictions from each modality are combined into a single prediction. Several classification algorithms were trained and tested, including Random Forest, Logistic Regression, and text classification models. The multimodel system significantly outperformed the single-modality systems in Accuracy, Precision, Recall, and F1-Score. Furthermore, the fusion strategy added context to detections and reduced false positives, a major challenge in Intrusion Detection System (IDS) research. This dissertation thus provides a practical, scalable, multimodel AI-based framework for detecting cybersecurity threats. It also demonstrates the effectiveness of combining structured and unstructured data sources and suggests directions for further advances in intelligent intrusion detection systems.

Item Restricted: Evaluating Static, Contextual, and End-to-End Embedding Techniques for Malware Detection on Dynamic API Call Data (Saudi Digital Library, 2026) Basfar, Mohammed Raed; Joey, Lam
The pace of malware development continues to challenge cybersecurity, as polymorphic and zero-day attacks overwhelm traditional signature- and heuristic-based techniques.
Natural language processing (NLP) offers a promising direction by treating dynamic API call sequences as linguistic data, so that embedding and sequence-learning methods can be applied to malware detection. This dissertation compares three typical embedding approaches (static, contextual, and end-to-end task-learned representations) within a shared experimental framework. Specifically, it pairs Word2Vec embeddings with a Convolutional Neural Network (CNN) and contextual BERT embeddings with a CNN. It also trains a Bidirectional Long Short-Term Memory (BiLSTM) network with a trainable embedding layer and a weighted loss function to address class imbalance. The experiments used a dynamic API call dataset of around 44,000 malware and 1,000 benign samples, each summarized by the first 100 API calls executed under sandboxed conditions. The Word2Vec + CNN pipeline achieved the highest overall accuracy and malware detection precision but the lowest benign recall. The BERT + CNN model gave more balanced per-class performance at the cost of additional computational overhead. The BiLSTM achieved the highest benign recall, identifying non-malicious activity most reliably, but had the lowest precision and by far the highest resource use. These findings expose trade-offs among detection accuracy, benign recall, and processing efficiency. They also highlight the need to match model selection to the resource constraints and priorities of the deployment context. The study contributes a systematic comparison of embedding approaches for malware detection and offers insight into performance versus efficiency trade-offs.
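The abstract gives the input format (the first 100 API calls per sample) and mentions a weighted loss, but it does not describe the encoding or the weighting. The sketch below assumes a common setup. API names are mapped to integer IDs so that sequences can feed a trainable embedding layer, with hypothetical reserved IDs for padding and unseen calls. Class weights follow the inverse-frequency "balanced" heuristic n_samples / (n_classes * count) and are applied to a binary cross-entropy loss. The API trace is illustrative.

```python
from collections import Counter
from math import log

PAD, UNK = 0, 1  # reserved token IDs (assumed convention)
SEQ_LEN = 100    # each sample is summarized by its first 100 API calls


def build_vocab(traces):
    """Assign an integer ID (from 2 upward) to each API name seen in the training traces."""
    vocab = {}
    for trace in traces:
        for call in trace:
            if call not in vocab:
                vocab[call] = len(vocab) + 2
    return vocab


def encode(trace, vocab, seq_len=SEQ_LEN):
    """Keep the first seq_len calls, map names to IDs (UNK if unseen), right-pad with PAD."""
    ids = [vocab.get(call, UNK) for call in trace[:seq_len]]
    return ids + [PAD] * (seq_len - len(ids))


def balanced_class_weights(labels):
    """Inverse-frequency weights n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    return {label: n / (n_classes * c) for label, c in counts.items()}


def weighted_bce(p_malware, label, weights, eps=1e-12):
    """Class-weighted binary cross-entropy for one sample (label 1 = malware, 0 = benign)."""
    loss = -(label * log(p_malware + eps) + (1 - label) * log(1 - p_malware + eps))
    return weights[label] * loss


vocab = build_vocab([["NtOpenFile", "NtReadFile", "NtClose"]])  # hypothetical training trace
print(encode(["NtOpenFile", "RegOpenKeyExW", "NtClose"], vocab)[:5])  # [2, 1, 4, 0, 0]

# Approximate class sizes from the abstract: about 44,000 malware and 1,000 benign samples.
weights = balanced_class_weights([1] * 44_000 + [0] * 1_000)
print(weights[0], round(weights[1], 3))  # 22.5 0.511
```

With these weights, a misclassified benign sample costs about 44 times as much as a misclassified malware sample. This counteracts the skew towards the majority class.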
Beyond its scientific contribution, the study demonstrates the broader potential of NLP-based approaches to support malware detection and to inform the design of responsive, resource-aware cybersecurity systems.

Item Restricted: Advances in Artificial Intelligence for Energy Forecasting and Performance Management in Buildings (Saudi Digital Library, 2026) Alkhatani, Nasser; Petri, Ioan
Accurate energy forecasting is essential for intelligent building management, supporting operational optimisation, strategic planning, and demand-side flexibility. However, existing forecasting methods often struggle to remain accurate across multiple time horizons and to generalise across different building types with limited data. This thesis addresses these challenges by developing a modular modelling framework that advances both multi-horizon forecasting and cross-building adaptability. The first contribution is a hybrid forecasting model (SVR → XGBoost → LSTM) designed to deliver stable prediction performance across four horizons: 24 hours, one week, one month, and one year. The hybrid design exploits the complementary strengths of its components (SVR for noise reduction, XGBoost for nonlinear feature learning, and LSTM for long-range temporal modelling). The result is better robustness and generalisation than single-model approaches achieve. The second contribution is a deep hybrid model (CNN → GRU → LSTM) within a transfer learning framework. The model is pretrained on multi-building datasets and fine-tuned with limited data from new buildings. This approach improves cross-domain adaptability while reducing training time and data requirements, demonstrating the practical value of transfer learning for scalable energy forecasting. A third contribution integrates statistical peak detection to identify high-demand events, enabling forecasting outputs to inform grid-interactive building operations.
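The abstract does not specify the statistical peak-detection rule. The sketch below assumes a minimal, commonly used choice: a time step is flagged when demand exceeds the mean plus k standard deviations of a trailing window. The window length, k, and the demand series are hypothetical.

```python
from statistics import mean, stdev


def detect_peaks(load, window=24, k=2.0):
    """Flag time steps whose load exceeds mean + k * stdev of the preceding `window` steps."""
    peaks = []
    for t in range(window, len(load)):
        history = load[t - window:t]
        threshold = mean(history) + k * stdev(history)
        if load[t] > threshold:
            peaks.append(t)
    return peaks


# Made-up hourly demand (kW): a small repeating pattern with one injected spike at hour 30.
load = [50 + (h % 3) for h in range(48)]
load[30] = 90
print(detect_peaks(load))  # [30]
```

Applied to forecasts rather than measurements, the same rule would flag predicted high-demand hours in advance. That is the role the abstract describes for grid-interactive operation.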
Rigorous evaluation, including multi-metric assessment, residual diagnostics, ablation testing, and statistical significance analysis, confirms the reliability and robustness of the proposed models. Overall, the thesis provides methodological and empirical advances that strengthen data-driven building energy management. The results show that carefully designed hybridisation and transfer learning can improve accuracy, stability, and generalisation, offering a scalable pathway toward more efficient and sustainable smart building operations.

Item Restricted: Advanced Machine Learning Approaches for Comprehensive Cardiovascular Disease Risk Prediction Using Synthetic Data and Dynamic Feature Selection (Saudi Digital Library, 2025) Alqulaity, Malak; Yang, Po
Cardiovascular diseases (CVD) are a leading cause of death worldwide, highlighting the need for accurate and reliable risk prediction models. Traditional CVD risk assessment tools, such as Framingham, SCORE, and QRISK, have several limitations that affect their accuracy and applicability. They typically focus on a narrow set of major risk factors and may overlook important non-traditional factors, resulting in a less comprehensive assessment. In addition, they often rely on linear models, which may fail to capture complex, non-linear interactions in the data. This thesis addresses these limitations by developing a hybrid predictive framework that integrates advanced machine learning (ML) techniques. The framework aims to improve the accuracy of Coronary Artery Calcium (CAC) score prediction and CVD risk assessment using both traditional and non-traditional risk factors. The research is structured around three objectives: generating synthetic data, improving feature selection, and developing a hybrid approach.
To address data limitations, a Tabular Generative Adversarial Network (GAN) was enhanced to generate high-quality synthetic data, expanding the training dataset and improving model robustness. Feature selection was refined through an adaptive SHAP-based method that dynamically adjusts feature-importance thresholds to capture both traditional and non-traditional CVD risk factors more accurately. Finally, a hybrid approach was implemented to maximise predictive accuracy. It combines hyperparameter tuning algorithms (Genetic Algorithms, Particle Swarm Optimisation, and Bayesian Optimisation) with gradient boosting algorithms (XGBoost, LightGBM, and CatBoost). This two-stage model first predicts CAC scores and then uses these predictions, alongside additional risk factors, to assess the likelihood of CVD events. The hybrid approach consistently improves prediction accuracy across multiple metrics, and CatBoost performs best in both CAC score prediction and CVD classification.

Item Restricted: Agarwood Quality Classification in the Middle East: A Mixed-Methods Study of Social, Sensory, and Data-Driven Insights (Saudi Digital Library, 2025) AlSalem, Fatmah; Bembibre, Cecilia
This dissertation investigates the classification of agarwood quality in the Middle East, focusing on the Gulf Cooperation Council (GCC) countries, where oud holds profound cultural, religious, and economic value. The market lacks a unified formal grading system, which leads to numerous discrepancies. Using a mixed-methods approach, the study first conducted a sensory panel to gain insight into consumer perception. The panel, composed of both Middle Eastern and non-Middle Eastern participants, revealed how quality perception varies among non-experts.
Next, cultural discourse extracted from social media was analyzed semantically and used to design a contextualized two-layer grading system. Finally, this framework was applied to an e-commerce dataset of oud products. An optimized Random Forest model using TF-IDF features classified quality grades from textual descriptions with 90.5% accuracy. This demonstrates that machine learning can effectively approximate sensory and cultural judgment from text data alone. The research concludes that digital platforms are repositories of cultural knowledge. It anticipates that such frameworks can provide transparent, standardized, and scalable agarwood classification, bringing together tradition and innovation for a fairer, more sustainable oud market in the region.

Item Restricted: Modelling and Optimisation of The Continuous Pharmaceutical Manufacturing Process: A New Data-Driven Approach For Right-First-Time Production (Saudi Digital Library, 2025) Deebes, Motaz; Mahfouf, Mahdi
The pharmaceutical industry, like most industries, is subject to stringent quality and regulatory requirements to ensure the manufacture of safe, high-quality medicinal products. Continuous manufacturing has emerged as a transformative approach with the potential to meet global demand for medicines through efficient, continuous processes. However, its adoption in tablet manufacturing remains constrained by the complex, multivariate behaviour of particulate processes. Moreover, the lack of comprehensive modelling frameworks further hinders understanding and control of these multistage processes. This thesis aims to develop and evaluate novel predictive modelling frameworks tailored to the continuous manufacturing of pharmaceutical tablets. It uses data collected from an industrial-scale pilot plant (Consigma-25) that encompasses five critical unit operations.
An integrated, sequential modelling framework was constructed using ensemble machine learning techniques, including gradient boosting machines and random forests, to predict key quality attributes across stages. Gaussian mixture models were incorporated to reduce uncertainty. To improve interpretability, a hybrid modelling approach combining artificial neural networks with an interval type-2 fuzzy inference system was developed. In addition, a novel integration of an Adaptive Neuro-Fuzzy Inference System with a Genetic Algorithm formed the basis of a model-informed optimisation strategy. This strategy identifies optimal process settings for controlling final product quality under “Right-First-Time” manufacturing. The results demonstrate that the proposed frameworks effectively captured the non-linear relationships between process parameters and quality outcomes, achieving R² values above 0.90 across the frameworks. This represents a 56% improvement in predictive capability compared with prior studies. The use of interpretable, uncertainty-aware methods ensured that model outputs remained useful for understanding the processes despite their complexity. The model-informed optimisation strategy was validated through practical application within the right-first-time manufacturing concept. These findings demonstrate the potential of the proposed frameworks to advance pharmaceutical tablet manufacturing by bridging the gap between scientific research and scalable industrial implementation.

Item Restricted: AI-Based Analysis of Magnetic Nanoparticle Relaxometry Curves for Structure-Specific Cancer Detection and Classification (Saudi Digital Library, 2025) AlHumam, Malack; Hovorka, Ondrej
Cancer remains one of the world’s leading causes of death, and successful treatment relies heavily on early and accurate diagnosis.
This thesis explores a minimally invasive diagnostic method that combines magnetorelaxometry (MRX) with artificial intelligence (AI). Magnetorelaxometry measures how magnetic nanoparticles relax after excitation by an external magnetic field. The resulting relaxation curves depend on anisotropy orientation and variation, particle number, and structure geometry. Among magnetic nanoparticles, superparamagnetic iron oxide nanoparticles (SPIONs) are particularly suited to biomedical applications because of their biocompatibility and tunable relaxation properties. However, these curves often overlap and are indistinguishable to the human eye, which makes traditional analysis challenging. The central research question of this thesis is whether AI can classify nanoparticle ensembles by structure and particle number from their relaxation curves, so that the curves can serve as unique markers for cancer detection and classification. To address this, five simulated datasets were generated, each containing multiple structures with different particle numbers under varying anisotropy conditions. After preprocessing, the data were analyzed with supervised, semi-supervised, and unsupervised models, supported by dimensionality reduction visualizations (PCA, t-SNE, UMAP). Supervised models performed best, with multiclass logistic regression reaching an accuracy of 0.89 on the dataset with aligned anisotropy and no variation. ZChains consistently emerged as the most distinguishable ensembles, relaxing roughly twice as long as YChains. PCA scatter plots confirmed their clearer separability in both geometry and particle number. In contrast, YChains frequently collapsed under z-axis anisotropy alignment, while Triangles and Rings were distinguishable only under controlled anisotropy variation. Arkus structures degraded rapidly as anisotropy variation increased.
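The thesis classifies ensembles from the whole shape of their relaxation curves. The sketch below is only meant to show what "relaxing roughly twice as long" means numerically. It assumes an idealised single-exponential decay M(t) = M0 exp(-t/τ) and recovers τ from a least-squares line fit to ln M(t). Real MRX signals from interacting nanoparticle ensembles generally deviate from a single exponential, and the τ values here are invented.

```python
from math import exp, log


def relaxation_curve(m0, tau, times):
    """Idealised single-exponential relaxation M(t) = m0 * exp(-t / tau)."""
    return [m0 * exp(-t / tau) for t in times]


def fit_tau(times, signal):
    """Estimate tau from the least-squares slope of ln M(t) versus t (slope = -1 / tau)."""
    logs = [log(m) for m in signal]  # requires a strictly positive signal
    n = len(times)
    t_mean, l_mean = sum(times) / n, sum(logs) / n
    num = sum((t - t_mean) * (l - l_mean) for t, l in zip(times, logs))
    den = sum((t - t_mean) ** 2 for t in times)
    return -den / num  # -1 / slope, with slope = num / den


times = [0.1 * i for i in range(50)]          # arbitrary time units
y_chain = relaxation_curve(1.0, 2.0, times)   # hypothetical tau for a YChain-like ensemble
z_chain = relaxation_curve(1.0, 4.0, times)   # hypothetical tau twice as long
print(round(fit_tau(times, z_chain) / fit_tau(times, y_chain), 6))  # 2.0
```

A classifier working on curve morphology does not need an explicit τ. A single summary feature like this one is a useful sanity check on why some ensembles separate more easily than others.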
Semi-supervised pseudo-labeling maintained a comparable accuracy of 0.817 with limited labels. Unsupervised KMeans clustering, although non-predictive, provided insights into ensemble overlap and natural similarity groupings. The main contribution of this work is the demonstration that AI can classify nanoparticle ensembles by relaxation curve morphology rather than by biomarker binding assays. This represents a shift from proof of detection toward structure-based classification, bridging magnetic physics and biomedical AI applications. Future directions include aligning anisotropy axes experimentally, exploring relaxation saturation for cancer staging, and applying the AI pipelines to real biological magnetorelaxometry data.
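Pseudo-labelling (self-training) enlarges a small labelled set by adopting a model's own confident predictions on unlabelled data. The abstract does not describe the base classifier or the confidence rule. The sketch below therefore assumes a nearest-centroid classifier and a hypothetical confidence score: 1 minus the ratio of the distances to the nearest and second-nearest class centroids. Features and values are made up; the class names simply reuse two structure names from the abstract.

```python
from math import dist


def centroids(X, y):
    """Mean feature vector for each class label."""
    groups = {}
    for x, label in zip(X, y):
        groups.setdefault(label, []).append(x)
    return {label: tuple(sum(col) / len(pts) for col in zip(*pts)) for label, pts in groups.items()}


def predict(cents, x):
    """Nearest-centroid label plus a confidence in [0, 1]: 1 - d_nearest / d_second.
    Assumes at least two classes."""
    ranked = sorted(cents, key=lambda label: dist(x, cents[label]))
    d1, d2 = dist(x, cents[ranked[0]]), dist(x, cents[ranked[1]])
    confidence = 1 - d1 / d2 if d2 > 0 else 0.0
    return ranked[0], confidence


def pseudo_label(X_lab, y_lab, X_unlab, threshold=0.5, max_rounds=10):
    """Self-training: repeatedly adopt confident predictions on unlabelled points as labels."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        cents = centroids(X_lab, y_lab)
        confident, remaining = [], []
        for x in pool:
            label, conf = predict(cents, x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing left that the model is sure about
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = remaining
    return X_lab, y_lab, pool


# Hypothetical 1-feature data with one labelled example per class.
X_lab, y_lab = [(1.0,), (5.0,)], ["YChain", "ZChain"]
X_unlab = [(1.2,), (0.8,), (4.7,), (5.3,), (3.0,)]
X_all, y_all, left = pseudo_label(X_lab, y_lab, X_unlab)
print(y_all)  # ['YChain', 'ZChain', 'YChain', 'YChain', 'ZChain', 'ZChain']
print(left)   # [(3.0,)]
```

The ambiguous point midway between the classes is never adopted. That is the intended behaviour of a confidence threshold, and it mirrors the overlap problems the abstract reports for some ensembles.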
