Saudi Cultural Missions Theses & Dissertations
Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10
Item Restricted: COPD-Aware Modelling of Heart Failure Hospital Admissions Using Routinely Collected Primary Care Prescription Data (Saudi Digital Library, 2025) Alghamdi, Taghreed Safar; Aishwaryaprajna

Heart failure (HF) is a leading cause of unplanned hospital admissions in the United Kingdom (UK), consuming 1–2% of the National Health Service (NHS) annual budget, with most costs arising from inpatient care. Many predictive models oversimplify medication histories, relying on static indicators instead of time-aware prescribing patterns. This study improves HF admission prediction using UK primary care data, focusing on monthly dosage trends for three therapeutic classes, angiotensin-converting enzyme inhibitors (ACEIs), beta-blockers, and angiotensin receptor blockers (ARBs), and on the influence of Chronic Obstructive Pulmonary Disease (COPD). Three linked datasets, covering patient demographics and comorbidities (patientinfo), prescription records (prescriptions), and chronic condition diagnoses (indexdates), were merged after cleaning and validation. Static attributes and temporal medication features were used to train Long Short-Term Memory (LSTM) networks, Random Forest, and Logistic Regression. Owing to the poor performance of the LSTM and Random Forest in a multi-class setting (admission count categories), the task was reframed as binary classification (admission vs. no admission), with class imbalance addressed using the Synthetic Minority Oversampling Technique (SMOTE). The final dataset included 963 patients and over 109,521 monthly prescription records. The best performance came from a standard Random Forest (without SMOTE), which retained clinical interpretability, identifying COPD status, total monthly medication dosage, and age at HF diagnosis as top predictors. COPD patients had a higher admission rate (59.1% vs. 41.8%). These findings show that granular, dosage-aware prescribing data can enhance HF admission prediction. Future work will explore hybrid classification-regression models, incorporate laboratory and lifestyle data, and validate externally to improve generalisability and support NHS decision-making.

Item Restricted: Combining Traditional and Machine Learning Approaches to Predict TCGA Colon Cancer Outcomes (Saudi Digital Library, 2025) Alotaibi, Reem; MONDAL, SUDIP

This study utilises standardised clinical data from the Cancer Genome Atlas Colon Adenocarcinoma (TCGA-COAD) cohort to perform a comparative survival analysis of colorectal cancer (CRC). Three modelling approaches were evaluated: the Cox Proportional Hazards (Cox PH) model, LASSO-penalised Cox regression, and the Gradient Boosted Survival Model (GBSM). Models were trained and evaluated using the concordance index (C-index) and time-dependent area under the curve (AUC) following comprehensive data preprocessing, including missing value imputation, outlier removal, and Kaplan–Meier-based variable stratification. LASSO-Cox improved model sparsity and feature selection (C-index = 0.80), while Cox PH demonstrated consistent identification of clinically established predictors with strong interpretability (C-index = 0.76). GBSM achieved the highest predictive performance (C-index = 0.87; AUC = 0.841) by effectively modelling complex non-linear relationships. Model interpretability was enhanced using SHAP values, which highlighted key prognostic factors, including tumour staging components (T4, N2, M1), as well as underexplored but clinically meaningful variables such as residual tumour status (R2), age at diagnosis, and ethnicity. These findings demonstrate the potential of interpretable machine learning models to improve survival prediction and feature discovery in colorectal cancer. The study highlights the importance of external validation and multimodal data integration to enhance generalisability and translational relevance in precision oncology.
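As a hedged illustration of the reframing step in the COPD-aware HF abstract above, the sketch below compares a plain Random Forest against one trained on SMOTE-balanced data. The feature matrix, class ratio, and hyperparameters are illustrative assumptions (the thesis dataset is not public), and the imbalanced-learn package is assumed to be installed.

```python
# Minimal sketch: plain Random Forest vs. SMOTE-balanced Random Forest on a
# synthetic stand-in for the 963-patient admission/no-admission task.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE  # assumes imbalanced-learn is installed

rng = np.random.default_rng(0)
X = rng.normal(size=(963, 12))            # stand-ins for dosage trends, age, COPD flag
y = (rng.random(963) < 0.3).astype(int)   # imbalanced admission labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Plain Random Forest (the configuration the abstract reports as best).
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)

# SMOTE applied to the training fold only, so synthetic samples never leak
# into the evaluation set.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
rf_smote = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_bal, y_bal)

for name, model in [("plain RF", rf), ("SMOTE RF", rf_smote)]:
    print(name)
    print(classification_report(y_te, model.predict(X_te)))
```

Comparing the two reports side by side mirrors the abstract's finding that the unbalanced forest can win once the task is binarised.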
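For the TCGA-COAD comparison above, a minimal Cox PH baseline with a C-index readout could look like the following. The lifelines library and its bundled Rossi dataset are stand-ins chosen for this sketch; the thesis's actual toolchain and covariates are not stated in the abstract.

```python
# Minimal Cox Proportional Hazards sketch with a concordance-index readout,
# using lifelines' bundled Rossi recidivism data as a stand-in cohort.
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()  # 'week' = time-to-event, 'arrest' = event indicator

cph = CoxPHFitter(penalizer=0.1)  # small ridge penalty; l1_ratio=1.0 would give LASSO-Cox
cph.fit(df, duration_col="week", event_col="arrest")

print(cph.concordance_index_)  # C-index, the headline metric in the abstract
cph.print_summary()            # per-covariate hazard ratios for interpretability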
Item Restricted: Risk Factor Analysis and Prediction of Chronic Kidney Disease Using Clinical Data from Indian Patients (Saudi Digital Library, 2025) Alkhunaizan, Sarah; Claudio, Fronterre

Chronic kidney disease (CKD) is a progressive condition that is frequently underdiagnosed because it is asymptomatic in its early stages, creating a need for reliable prediction tools to support earlier identification and intervention. This study aimed to (1) identify key clinical and demographic factors associated with CKD and (2) develop and compare predictive models built on routinely collected health data. Analysis was conducted using a real-world clinical dataset from Apollo Hospitals in Tamil Nadu, India, made publicly available via the UCI Machine Learning Repository (n = 397; 25 variables; binary outcome: CKD vs. non-CKD). To reduce data leakage and focus on disease prediction, direct diagnostic biomarkers (serum creatinine, blood urea, and urine albumin) were excluded. Missingness (10.5%) was assessed: Little's MCAR test rejected MCAR, and regression findings were consistent with a MAR mechanism. Four strategies were compared (complete-case analysis, deterministic, stochastic, and random forest imputation), with random forest imputation selected for subsequent analyses. Exploratory analyses described distributions and associations, and correlated predictors were removed to mitigate multicollinearity. Three models, LASSO logistic regression, a decision tree (CART), and XGBoost, were trained using a 70/30 train-test split with 10-fold cross-validation and evaluated on accuracy, sensitivity, specificity, ROC-AUC, and calibration. XGBoost achieved the best discrimination (accuracy 96.6%, AUC 0.991), while the decision tree demonstrated the strongest calibration. Across models, the most influential predictors consistently included red blood cell count, hypertension, diabetes mellitus, sodium, abnormal urinary red blood cells, and appetite. These findings support the utility of machine learning models, particularly XGBoost, for early CKD risk prediction using routine clinical data, while highlighting the importance of robust preprocessing and validation to improve clinical applicability.

Item Restricted: Machine Learning Techniques for Financial Loan Default Prediction in UK: A Comparative Analysis of Decision Tree and Random Forest Models (Saudi Digital Library, 2025) Alrakan, Fahad Abdulaziz; Alwzinani, Faris

This dissertation proposes a comprehensive approach to variable selection and model comparison for credit scoring, based on a Lending Club 2016–2018 dataset. The methodology combines an initial manual selection, based on completeness and business logic, with an automatic selection via RFECV (Recursive Feature Elimination with Cross-Validation) using a Random Forest. Finally, a permutation importance analysis and an ablation experiment (Top 10 variables) complete the evaluation. The results show that all 21 selected variables are considered relevant by RFECV, but that most of the predictive power is concentrated in a subset of about 15 variables. A comparison of the models highlights the clear superiority of Random Forest (AUC ≈ 0.713; PR-AUC ≈ 0.437) over Decision Tree (AUC ≈ 0.594; PR-AUC ≈ 0.319). Permutation importance analysis confirms business intuition: interest rate, credit sub-grade, and residential status appear to be the main explanatory factors, supplemented by financial indicators (debt ratio, loan amount, FICO score). The ablation experiment shows that the ten main variables are sufficient to preserve almost all of the Random Forest's performance (AUC = 0.708) while reducing training time by approximately 40%. These results highlight two major points: (i) Random Forest is robust and capable of effectively exploiting a small core of variables, but its performance remains below the standard expected of an industrial model (AUC > 0.80); (ii) the hierarchy of variables reveals both the relevance of expected indicators and the redundancy among certain correlated measures. The limitations identified concern sensitivity to correlations, the temporal restriction of the sample (2016–2018), and the computational cost of certain steps (RFECV). In conclusion, this project validates the feasibility of a robust and parsimonious model based on Random Forest, while opening up prospects for improvement: boosting algorithms, calibration of decision thresholds to economic stakes, temporal robustness tests, and pipeline optimisation.
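A sketch of the imputation comparison in the CKD study above follows. A random-forest-based iterative imputer is a common stand-in for missForest-style imputation; whether the thesis used this exact mechanism is an assumption, and the data below are synthetic.

```python
# Sketch: deterministic mean imputation vs. random-forest-based iterative
# imputation on a synthetic matrix with ~10.5% missingness, as in the abstract.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, IterativeImputer
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(397, 6))
X[rng.random(X.shape) < 0.105] = np.nan  # inject ~10.5% missing values

X_mean = SimpleImputer(strategy="mean").fit_transform(X)  # deterministic baseline
X_rf = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, random_state=1),
    max_iter=10,
    random_state=1,
).fit_transform(X)

print(np.isnan(X_rf).sum())  # 0: every gap filled before modelling
```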
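The loan-default pipeline above (RFECV around a Random Forest, then permutation importance to rank a Top 10 subset) can be sketched as below. The data are synthetic and the Lending Club feature names are omitted, so treat this as one plausible instantiation rather than the dissertation's code.

```python
# Sketch: RFECV feature selection with a Random Forest, scored on ROC-AUC,
# followed by permutation importance on the retained variables.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.inspection import permutation_importance
from sklearn.model_selection import StratifiedKFold, train_test_split

X, y = make_classification(n_samples=2000, n_features=21, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    step=1,
    cv=StratifiedKFold(5),
    scoring="roc_auc",  # AUC, the headline metric in the abstract
).fit(X_tr, y_tr)
print("features kept:", selector.n_features_)

# Permutation importance on held-out data ranks the retained variables,
# from which a 'Top 10' ablation subset can be drawn.
imp = permutation_importance(selector.estimator_, selector.transform(X_te),
                             y_te, n_repeats=10, random_state=0)
print(imp.importances_mean.argsort()[::-1][:10])
```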
Item Restricted: The Additional Regulatory Challenges Posed by AI In Financial Trading (Saudi Digital Library, 2025) Almutairi, Nasser; Alessio, Azzutti

Algorithmic trading has shifted from rule-based speed to adaptive autonomy, with deep learning and reinforcement learning agents that learn, re-parameterize, and redeploy in near real time, amplifying opacity, correlated behaviours, and flash-crash dynamics. Against this backdrop, the dissertation asks whether existing EU and US legal frameworks can keep pace with new generations of AI trading systems. It adopts a doctrinal and comparative method, reading MiFID II and MAR, the EU AI Act, the SEC and CFTC regimes, and global soft law (IOSCO, NIST) through an engineering lens of AI lifecycles and value chains to test functional adequacy. Chapter 1 maps the evolution from deterministic code to self-optimizing agents and locates the shrinking space for real-time human oversight. Chapter 2 reframes technical attributes as risk vectors, such as herding, feedback loops, and brittle liquidity, and illustrates the enforcement and stability implications. Chapter 3 exposes the human-centric assumptions (intent, explainability, "kill switches") embedded in current rules and the gaps they create for attribution, auditing, and cross-border supervision. Chapter 4 proposes a hybrid, lifecycle-based model of oversight that combines value-chain accountability, tiered AI-agent licensing, mandatory pre-deployment verification, explainability (XAI) requirements, cryptographically sealed audit trails, human-in-the-loop controls, continuous monitoring, and sandboxed co-regulation. The contribution is threefold: (1) a technology-aware risk typology linking engineering realities to market integrity outcomes; (2) a comparative map of EU and US regimes that surfaces avenues for regulatory arbitrage; and (3) a practicable governance toolkit that restores traceable accountability without stifling beneficial innovation. Overall, the thesis argues for moving from incremental, disclosure-centric tweaks to proactive, lifecycle governance that embeds accountability at design, deployment, and post-trade, aligning next-generation trading technology with the enduring goals of fair, orderly, and resilient markets.
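Of the mechanisms Chapter 4 proposes, the "cryptographically sealed audit trail" is the most directly implementable. The sketch below is purely illustrative of the underlying idea (a hash chain in which each record commits to its predecessor, so tampering is detectable) and is not the dissertation's specification; a real deployment would add digital signatures and trusted timestamping.

```python
# Illustrative hash-chained audit trail: each record's hash covers its own
# content plus the previous record's hash, so any alteration breaks the chain.
import hashlib
import json
import time

def seal(prev_hash: str, event: dict) -> dict:
    record = {"ts": time.time(), "event": event, "prev": prev_hash}
    record["hash"] = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return record

chain = [seal("0" * 64, {"action": "order_submitted", "qty": 100})]
chain.append(seal(chain[-1]["hash"], {"action": "order_cancelled"}))

def verify(chain) -> bool:
    prev = "0" * 64
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if rec["prev"] != prev or rec["hash"] != expected:
            return False
        prev = rec["hash"]
    return True

print(verify(chain))  # True until any record in the chain is altered
```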
Item Restricted: Semi-Supervised Approach For Automatic Head Gesture Classification (Saudi Digital Library, 2025) Alsharif, Wejdan; Hiroshi, Shimodaira

This study applies a semi-supervised method, specifically self-training, to automatic head gesture recognition using motion capture data. It explores and compares fully supervised deep learning models and self-training pipelines in terms of their performance and training approaches. The proposed approach achieved an accuracy of 52% and a macro F1 score of 44% under cross-validation. The results show that incorporating self-training into the learning process improves model performance: the generated pseudo-labelled data effectively supplement the original labelled dataset, enabling the model to learn from a larger and more diverse set of training examples.

Item Restricted: Deep Learning-Based White Blood Cell Classification Through a Free and Accessible Application (Saudi Digital Library, 2025) Alluwaim, Yaseer; Campbell, Neill

Background: Microscopy of peripheral blood smears (PBS) continues to play a fundamental role in hematology diagnostics, offering detailed morphological insights that complement automated blood counts. Examination of a stained blood film by a trained technician is among the most frequently performed tests in clinical hematology laboratories. Nevertheless, manual smear analysis is labor-intensive, time-consuming, and prone to considerable inter-observer variability. These challenges have spurred interest in automated, deep learning-based approaches to enhance efficiency and consistency in blood cell assessment.

Methods: We designed a convolutional neural network (CNN) with a ResNet-50 backbone, applying standard transfer-learning techniques for white blood cell (WBC) classification. The model was trained on a publicly available dataset of approximately 4,000 annotated peripheral smear images representing eight WBC types. The image processing workflow included automated nucleus detection, normalization, and extensive augmentation (rotation, scaling, etc.) to improve model generalization. Training was performed with the PyTorch Lightning framework for efficient development.

Application: The final model was integrated into a lightweight web application and deployed on Hugging Face Spaces, allowing accessible browser-based inference. The application provides an easy-to-use interface for uploading images, which are automatically cropped and analyzed in real time. This open, free tool is intended to provide immediate classification results and to serve laboratory technologists without requiring specialized hardware or software.

Results: Testing on an independent set showed that the ResNet-50 network reached 98.67% overall accuracy, with consistently high performance across all eight WBC categories. Precision, recall, and specificity closely matched the overall accuracy, indicating well-balanced classification. To assess real-world generalization, the model was also tested on an external heterogeneous dataset drawn from different sources, where it achieved 86.33% accuracy, reflecting strong performance outside its main training data. The confusion matrix showed negligible misclassifications, suggesting consistent discrimination between leukocyte types.

Conclusion: This study indicates that a lightweight AI tool can support peripheral smear analysis by offering rapid and consistent WBC identification via a web interface. Such a system may reduce laboratory workload and observer variability, particularly in resource-limited or remote settings where expert microscopists are scarce, and serve as a practical training aid for personnel learning cell morphology. Limitations include reliance on a single training dataset, which may not encompass all staining or imaging variations, and an evaluation performed offline. Future work will aim to expand dataset diversity, enable real-time integration with digital microscopes, and conduct clinical validation to broaden applicability and adoption.

Application link: https://huggingface.co/spaces/xDyas/wbc-classifier
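The self-training idea in the head-gesture study above can be sketched with scikit-learn's SelfTrainingClassifier. The thesis pairs self-training with deep models on motion-capture features; the base learner and synthetic data here are illustrative stand-ins.

```python
# Sketch: self-training with confident pseudo-labels supplementing a small
# labelled set. Unlabelled samples are marked with -1, per sklearn convention.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=6,
                           random_state=0)
y_semi = y.copy()
y_semi[np.random.default_rng(0).random(1000) < 0.8] = -1  # 80% unlabelled

model = SelfTrainingClassifier(
    RandomForestClassifier(random_state=0),
    threshold=0.9,  # only high-confidence predictions become pseudo-labels
).fit(X, y_semi)

print(model.score(X, y))
```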
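The transfer-learning setup in the WBC study's Methods (a ResNet-50 with its final layer replaced by an eight-class head) could look like the sketch below. The weight choice, freezing strategy, and input size are assumptions; the PyTorch Lightning training loop and augmentation pipeline are omitted.

```python
# Sketch: torchvision ResNet-50 adapted to eight WBC categories.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, 8)  # eight WBC types

# Optionally freeze the pretrained backbone and train only the new head first.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith("fc.")

x = torch.randn(4, 3, 224, 224)  # dummy batch of cropped smear images
print(model(x).shape)            # torch.Size([4, 8])
```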
Item Restricted: AI-Based Approaches for Respiratory Disease Detection Using Audio Signals and Imaging Data (Saudi Digital Library, 2025) Shati, Asmaa; Hassan, Ghulam Mubashar; Datta, Amitava

Respiratory diseases (RDs) remain major global health concerns, typically diagnosed through imaging and auscultation, with cough sounds also offering diagnostic cues. These methods, however, are often subjective and depend on expert interpretation. Advances in machine learning (ML) enable automated RD diagnosis, yet challenges such as limited data, high computational costs, and accessibility gaps persist, underscoring the need for innovative approaches. This thesis proposes a series of novel approaches for automated RD detection that use either cough audio or chest X-ray (CXR) images as input modalities, selected for their availability and affordability. These approaches integrate advanced techniques for segmentation, feature extraction, and subsequent classification, offering practical and cost-effective diagnostic solutions. Extensive evaluation on multiple open-source datasets demonstrates the effectiveness of the proposed approaches across diverse diagnostic contexts.

Item Restricted: Insider Threat Detection in a Hybrid IT Environment Using Unsupervised Anomaly Detection Techniques (Saudi Digital Library, 2025) Alharbi, Mohammed; Antonio, Gouglidis

This dissertation analyses insider threat detection in hybrid IT environments using unsupervised anomaly detection techniques. Insider threats, committed by trusted persons with legitimately granted access, are among the most challenging cybersecurity threats to mitigate because they resemble legitimate user behaviour and lack the labelled datasets needed to train supervised models. Hybrid infrastructures, which integrate on-premise and cloud resources, make detection harder still by producing large, heterogeneous, and fragmented logs. To address these challenges, the dissertation presents a detection system based on the Isolation Forest and Local Outlier Factor algorithms. Multi-source organisational data, including authentication, file, email, HTTP, device, and LDAP logs, were pre-processed and combined into enriched user profiles, with psychometric attributes added where available. The framework was assessed on the CERT Insider Threat Dataset v6.2, where the results indicated that both algorithms were effective in detecting anomalous behaviours: Isolation Forest excelled at detecting global outliers, whereas Local Outlier Factor was better at detecting subtle local outliers. The comparative analysis found the strengths of the two methods to be complementary, supporting their combined use to stratify users into high-, medium-, and low-risk groups. Although constrained by its use of synthetic data, the absence of a real-time implementation, and limited ecological validity, the study contributes to the development of anomaly-based detection methods and offers practical insight for organisations seeking to proactively curb insider threats.
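For the cough-audio branch of the respiratory-disease thesis above, a typical acoustic front-end extracts log-mel spectrogram features for a downstream classifier. The abstract does not specify the exact feature pipeline, so the librosa-based sketch below is one plausible instantiation, using a bundled example clip as a stand-in for a cough recording.

```python
# Sketch: log-mel spectrogram features from an audio clip via librosa.
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"), sr=16000)  # stand-in for a cough clip
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)

print(log_mel.shape)  # (64, n_frames): a 2D map ready for a CNN classifier
```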
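The two detectors compared in the insider-threat dissertation above are both available in scikit-learn. In the hedged sketch below, synthetic user-profile vectors stand in for the enriched profiles built from authentication, file, email, HTTP, device, and LDAP logs.

```python
# Sketch: Isolation Forest (global outliers) vs. Local Outlier Factor
# (local outliers) on synthetic per-user feature profiles.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
profiles = np.vstack([rng.normal(0, 1, size=(980, 8)),
                      rng.normal(4, 1, size=(20, 8))])  # 20 anomalous users

iso = IsolationForest(contamination=0.02, random_state=0).fit(profiles)
iso_flags = iso.predict(profiles)           # -1 marks a global outlier

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
lof_flags = lof.fit_predict(profiles)       # -1 marks a local outlier

# Agreement between the detectors supports the high/medium/low risk
# stratification the comparative analysis recommends.
both = (iso_flags == -1) & (lof_flags == -1)
print(both.sum(), "users flagged by both detectors")
```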
Item Restricted: Enhancing Gravitational-Wave Detection from Cosmic String Cusps in Real Noise Using Deep Learning (Saudi Digital Library, 2025) Taghreed, Bahlool; Patrick, Sutton

Cosmic strings are topological defects that may have formed in the early universe and could produce bursts of gravitational waves through cusp events. Detecting such signals is particularly challenging because of transient non-astrophysical artifacts, known as glitches, in gravitational-wave detector data. In this work, we develop a deep learning classifier designed to distinguish cosmic string cusp signals from common transient noise types, such as blips, using raw, whitened 1D time-series data extracted from real detector noise. Unlike previous approaches that rely on simulated or idealized noise environments, our method is trained and tested entirely on real noise, making it more applicable to real-world search pipelines. Using a dataset of 50,000 labeled 2-second samples, our model achieves a classification accuracy of 84.8%, a recall of 78.7%, and a false-positive rate of 9.1% on unseen data. This demonstrates the feasibility of cusp-glitch discrimination directly in the time domain, without requiring time-frequency representations or synthetic data, and contributes toward the robust detection of exotic astrophysical signals in realistic gravitational-wave conditions.
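A time-domain classifier of the kind the cosmic-string abstract describes can be as simple as a 1D CNN over whitened strain segments. The sketch below is illustrative, not the thesis architecture; the 4096 Hz sample rate (so 8192 samples per 2-second segment) is an assumption.

```python
# Sketch: minimal 1D CNN for cusp-vs-glitch classification on whitened
# 2-second time series (assumed 4096 Hz -> 8192 samples per segment).
import torch
import torch.nn as nn

class CuspClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=16, stride=4), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, 2),  # two classes: cusp vs. blip/noise
        )

    def forward(self, x):        # x: (batch, 1, n_samples)
        return self.net(x)

model = CuspClassifier()
x = torch.randn(8, 1, 8192)      # dummy batch of whitened strain segments
print(model(x).shape)            # torch.Size([8, 2])
```

Working directly on the raw series, as here, is what lets the method skip the time-frequency representations the abstract says it avoids.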
