Saudi Cultural Missions Theses & Dissertations
Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10
[Restricted] Insider Threat Detection in a Hybrid IT Environment Using Unsupervised Anomaly Detection Techniques (Saudi Digital Library, 2025) Alharbi, Mohammed; Antonio, Gouglidis
This dissertation analyses insider threat detection in hybrid IT environments using unsupervised anomaly detection techniques. Insider threats, including those committed by trusted persons with granted access, are considered among the most challenging cybersecurity threats to mitigate because they resemble legitimate user behaviour and lack labelled datasets for training supervised models. Hybrid infrastructures, which integrate on-premise and cloud resources, make detection harder still because they produce large, heterogeneous, and fragmented logs. To address these challenges, this dissertation presents a detection system that uses the Isolation Forest and Local Outlier Factor algorithms. Multi-source organisational data, such as authentication, file, email, HTTP, device, and LDAP logs, were pre-processed and consolidated into enriched user profiles, with psychometric attributes added where possible. The framework was evaluated on the CERT Insider Threat Dataset v6.2, where both algorithms proved effective at detecting anomalous behaviour: Isolation Forest was effective at detecting global outliers, whereas Local Outlier Factor was better at detecting subtle local outliers. The comparative analysis showed that the two methods have complementary strengths and should be used together when stratifying users into high-, medium-, and low-risk groups.
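The abstract does not say how the two detectors' outputs are combined into risk groups. What follows is therefore only a minimal sketch under stated assumptions, not the dissertation's method. Each user is a numeric feature tuple (for example, hypothetical daily counts of logons or file copies). Both detectors are written from scratch in plain Python. The score fusion (averaging percentile ranks) and the tier thresholds `high=0.9` and `medium=0.6` are hypothetical choices.

```python
import math
import random


def lof_scores(points, k=5):
    """Local Outlier Factor; simplified to exactly k neighbours (ties ignored)."""
    n = len(points)
    # k nearest neighbours of every point as (distance, index) pairs
    neigh = [sorted((math.dist(points[i], points[j]), j)
                    for j in range(n) if j != i)[:k] for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbour
    # local reachability density: inverse of the mean reachability distance
    lrd = [len(nb) / sum(max(k_dist[j], d) for d, j in nb) for nb in neigh]
    # LOF > 1 means a point is sparser than its neighbours
    return [sum(lrd[j] for _, j in nb) / (len(nb) * lrd[i])
            for i, nb in enumerate(neigh)]


def _c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def _build(sample, depth, limit, rng):
    """Grow one isolation tree with random feature/threshold splits."""
    if depth >= limit or len(sample) <= 1:
        return ("leaf", len(sample))
    dim = rng.randrange(len(sample[0]))
    lo = min(p[dim] for p in sample)
    hi = max(p[dim] for p in sample)
    if lo == hi:
        return ("leaf", len(sample))
    split = rng.uniform(lo, hi)
    left = [p for p in sample if p[dim] < split]
    right = [p for p in sample if p[dim] >= split]
    return ("node", dim, split,
            _build(left, depth + 1, limit, rng),
            _build(right, depth + 1, limit, rng))


def _path(x, node, depth=0):
    if node[0] == "leaf":
        return depth + _c(node[1])  # credit for the unexpanded subtree
    _, dim, split, left, right = node
    return _path(x, left if x[dim] < split else right, depth + 1)


def iforest_scores(points, n_trees=100, sample_size=256, seed=0):
    """Isolation Forest scores in (0, 1]; higher means easier to isolate."""
    rng = random.Random(seed)
    psi = min(sample_size, len(points))
    limit = math.ceil(math.log2(psi))
    trees = [_build(rng.sample(points, psi), 0, limit, rng)
             for _ in range(n_trees)]
    return [2.0 ** (-sum(_path(x, t) for t in trees) / n_trees / _c(psi))
            for x in points]


def _percentile_ranks(scores):
    order = sorted(range(len(scores)), key=scores.__getitem__)
    ranks = [0.0] * len(scores)
    for pos, i in enumerate(order):
        ranks[i] = pos / (len(scores) - 1)
    return ranks


def risk_tiers(points, high=0.9, medium=0.6):
    """Hypothetical fusion: average both detectors' percentile ranks."""
    combined = [(a + b) / 2 for a, b in
                zip(_percentile_ranks(iforest_scores(points)),
                    _percentile_ranks(lof_scores(points)))]
    return ["high" if s >= high else "medium" if s >= medium else "low"
            for s in combined]


# Hypothetical user profiles: a tight group of normal users plus one outlier
users = [(x * 0.1, y * 0.1) for x in range(5) for y in range(4)] + [(5.0, 5.0)]
print(risk_tiers(users)[-1])  # tier assigned to the far-away profile
```

The sketch averages ranks rather than raw scores because Isolation Forest scores lie in (0, 1] while LOF scores are unbounded above. With rank averaging, a user flagged strongly by only one of the two views can still reach the medium tier.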
Although the study still faces constraints regarding synthetic data, real-time implementation, and ecological validity, it contributes to the development of anomaly-based detection methods and offers practical insights for organisations seeking to curb insider threats proactively.

[Restricted] Enhancing Gravitational-Wave Detection from Cosmic String Cusps in Real Noise Using Deep Learning (Saudi Digital Library, 2025) Taghreed, Bahlool; Patrick, Sutton
Cosmic strings are topological defects that may have formed in the early universe and could produce bursts of gravitational waves through cusp events. Detecting such signals is particularly challenging because gravitational-wave detector data contain transient non-astrophysical artifacts known as glitches. In this work, we develop a deep learning-based classifier that distinguishes cosmic string cusp signals from common transient noise types, such as blips, using raw, whitened 1D time-series data extracted from real detector noise. Unlike previous approaches that rely on simulated or idealized noise environments, our method is trained and tested entirely on real noise, making it more applicable to real-world search pipelines. Using a dataset of 50,000 labeled 2-second samples, our model achieves a classification accuracy of 84.8%, a recall of 78.7%, and a false-positive rate of 9.1% on unseen data. This demonstrates the feasibility of cusp-glitch discrimination directly in the time domain, without requiring time-frequency representations or synthetic data, and contributes toward robust detection of exotic astrophysical signals in realistic gravitational-wave conditions.

[Restricted] Generalization of Machine-Learning in Clinical Randomized Controlled Trials: Evaluation and Development (Saudi Digital Library, 2025) ALMADHI, SHAYKHAH; Karwath, Andreas
In healthcare, machine learning (ML) shows significant promise for improving patient diagnostics, prognostics, and personalized care.
However, its real-world deployment is often constrained by inconsistent model performance on diverse and unseen patient data, a critical challenge known as generalization. Despite ongoing advances, existing methodologies have had only limited success in assessing and improving ML generalization, which raises uncertainty about clinical deployment. This dissertation addresses this gap by presenting a robust evaluation framework and a predictive tool that support more reliable healthcare AI. Logistic Regression and XGBoost models are applied to a dataset from nine double-blind, randomized, placebo-controlled trials investigating beta-blockers in heart failure. The study employs leave-one-trial-out, reverse leave-one-trial-out, and systematic evaluation to assess generalization comprehensively. The findings indicate that while generalization is often suboptimal, strategic selection of training cohorts markedly improves performance. Furthermore, a meta-learning framework developed in this work effectively predicts model degradation. This research provides insights into model generalizability across varied clinical datasets and introduces a practical pre-screening tool that facilitates a safer and more effective integration of ML into clinical practice and promotes fair patient outcomes.

[Restricted] A Cloud-Based AI System for Skill Gap Analysis and Training Path Recommendation in HR Departments (Saudi Digital Library, 2025) Alanazi, Abdullah Ramadan; AlYamani, Abdulghani
This dissertation presents the development of a cloud-based artificial intelligence (AI) system designed to automate skill gap analysis and provide personalised training recommendations in Human Resource (HR) departments. The system integrates employee profiles, job role requirements, and training histories to identify competency gaps using a decision tree algorithm.
The AI model achieved an accuracy of 0.86 and demonstrated strong interpretability and efficiency in recommending relevant training paths. Usability testing with HR professionals confirmed the system's practicality and reliability in supporting workforce development and data-driven training strategies. The research contributes to HR analytics by combining Human Capital Theory with Knowledge Discovery in Databases (KDD) to provide an explainable, scalable, and cloud-enabled HR decision-support framework.

[Restricted] Advancing narcolepsy diagnosis: Leveraging machine learning to identify novel neuro-biomarkers (Saudi Digital Library, 2024) Orkouby, Hadir; Bartsch, Ullrich
Narcolepsy is a rare neurological disorder with a well-identified pathophysiology that manifests as sudden onsets of sleep during wakefulness. Current diagnostic pathways for narcolepsy involve complex assessments of sleep neurophysiology, including polysomnography and the multiple sleep latency test (MSLT). These assessments are cumbersome and labour-intensive, and, given limited resources within the NHS, they have led to longer waiting times for diagnosis and treatment. This project used digital neuro-biomarkers and artificial intelligence (AI) to develop novel diagnostic markers for narcolepsy. Using an open-source dataset of labelled archival polysomnography (PSG) recordings, including electroencephalography (EEG), I created a data analysis and classification pipeline to support diagnostic decision-making in clinical settings. This pipeline combines comprehensive data preprocessing and feature extraction with XGBoost and Random Forest (RF) classification models. The feature extraction process included selected time-series analysis features, spectral frequency ratios, cross-frequency coupling, and moment-based statistical features of Intrinsic Mode Functions (IMFs) derived from empirical mode decomposition (EMD).
The RF classifier performed best, achieving an accuracy of 82.5%, a specificity of 82.5%, and a sensitivity of 92.86% when these feature sets were combined and averaged and sleep stage labels were incorporated during model training. These results underscore the potential of a novel approach using single-channel sleep EEG data from wearable devices. This method simplifies the lengthy and costly pathway to narcolepsy diagnosis and paves the way for new tools that diagnose sleep disorders automatically in non-clinical environments.

[Restricted] Predicting Delayed Flights for International Airports Using Artificial Intelligence Models & Techniques (Saudi Digital Library, 2025) Alsharif, Waleed; MHallah, Rym
Delayed flights are a pervasive challenge in the aviation industry, significantly affecting operational efficiency, passenger satisfaction, and economic costs. This thesis aims to develop predictive models that perform strongly and reliably, maintaining high accuracy on the tested dataset and showing potential for application in various real-world aviation scenarios. These models use advanced artificial intelligence and deep learning techniques to address the complexity of predicting delayed flights. The study evaluates Long Short-Term Memory (LSTM) networks, Convolutional Neural Networks (CNN), and their hybrid (LSTM-CNN), which combines temporal and spatial pattern analysis. It also evaluates Large Language Models (LLMs, specifically OpenAI's Babbage model), which excel at processing structured and unstructured text data. Additionally, the research introduces a unified machine learning framework that uses a Gradient Boosting Machine (GBM) for regression and a Light Gradient Boosting Machine (LGBM) for classification to estimate both flight delay durations and their underlying causes. The models were tested on high-dimensional datasets from John F. Kennedy International Airport (JFK) and on a synthetic dataset from King Abdulaziz International Airport (KAIA). Among the evaluated models, the hybrid LSTM-CNN model performed best, achieving 99.91% prediction accuracy with a prediction time of 2.18 seconds. It outperformed the GBM model (98.5% accuracy, 6.75 seconds) and LGBM (99.99% precision, 4.88 seconds). GBM also achieved a strong coefficient of determination (R² = 0.9086) in predicting delay durations, while LGBM exhibited exceptionally high precision (99.99%) in identifying delay causes. The results indicated that National Aviation System delays (correlation: 0.600), carrier-related delays (0.561), and late aircraft arrivals (0.519) were the most significant contributors, while weather factors played a moderate role. These findings underscore the accuracy and speed of LSTM-CNN, establishing it as the best-performing model for predicting delayed flights among those evaluated. The study highlights the potential for integrating LSTM-CNN into real-time airport management systems, enhancing operational efficiency and decision-making and paving the way for smarter, AI-driven air traffic systems.

[Restricted] Harnessing Machine Learning and Deep Learning for Analyzing Electrical Load Patterns to Identify Energy Loss (Saudi Digital Library, 2025) Alabbas, Mashhour Sadun Abdulkarim; Albatah, Mohammad
Energy requirements, consumption patterns, and the push for sustainability make energy management critically important in contemporary agriculture. This study aims to devise a holistic model for energy efficiency in agricultural contexts by integrating modern computer vision methods for field boundary extraction with anomaly detection techniques. To segment agricultural fields accurately from satellite imagery, high-resolution images are processed with the YOLOv8 object detection model.
The resulting field-feature datasets allow smart grid data to serve as the basis for anomaly detection using the Isolation Forest algorithm. The methodology follows a multi-stage pipeline: data collection, preprocessing, augmentation, model training, fine-tuning, and evaluation. To validate field boundary detection, precision, recall, and mAP (mean Average Precision) are computed and analyzed. Energy consumption data are then processed for anomaly detection, enabling the identification of irregular and potentially inefficient consumption patterns. The findings indicate that YOLOv8 achieves very high detection accuracy, with an mAP above 90%. Furthermore, the Isolation Forest algorithm achieved higher F1 scores than traditional approaches in detecting anomalies in energy consumption. This integrated method provides an automated and scalable solution for precision agriculture that allows users to monitor cultivation conditions and minimize energy consumption, thereby improving energy efficiency and overall decision-making. The study advances the convergence of artificial intelligence, remote sensing, and intelligent energy management systems, offering a basis for technological innovations that promote sustainability in agriculture.

[Restricted] Predicting Client Default Payments Using Machine Learning in Production Environment (Saudi Digital Library, 2025) Alanazi, Reem; Lavendini
This project investigates the application of machine learning techniques to predict client default payments in a credit card setting. Using a dataset of 30,000 Taiwanese clients, the study addresses the challenges of class imbalance, predictive accuracy, and fairness in credit risk assessment.
An XGBoost model was developed and enhanced through feature engineering, resampling techniques (SMOTE/ADASYN), and class weighting to improve recall for defaulters while maintaining overall accuracy. Interpretability was achieved using SHAP values, providing transparency into model decisions. To mitigate demographic disparities, particularly across education levels, a fairness-constrained Random Forest was integrated into a two-stage cascade framework, reducing false positives while preserving high recall. The final cascade model achieved 84% accuracy, with 93% recall for non-defaulters and 53% recall for defaulters, significantly outperforming baseline benchmarks. Fairness audits revealed that education-based disparities could be reduced with minimal performance trade-offs, while age-based fairness was largely maintained. The project demonstrates a practical, interpretable, and ethically aware pipeline for credit default prediction. It also covers deployment considerations and directions for future research in cost-sensitive learning, advanced fairness constraints, and real-time monitoring.

[Restricted] Paraphrase Generation and Identification at Paragraph-Level (Saudi Digital Library, 2025) Alsaqaabi, Arwa; Stewart, Craig; Akrida, Eleni; Cristea, Alexandra
The widespread availability of the Internet and the ease of accessing written content have contributed significantly to the rising incidence of plagiarism across various domains, including education. This behaviour directly undermines academic integrity, as reports of increased plagiarism in student work show. Notably, students tend to plagiarize entire paragraphs more often than individual sentences, which further complicates efforts to detect and prevent academic dishonesty. Advances in natural language processing (NLP) have also facilitated plagiarism, particularly through online paraphrasing tools and deep-learning language models designed to generate paraphrased text.
These developments underscore the need to develop and refine effective paraphrase identification (PI) methods. This thesis addresses one of the most challenging aspects of plagiarism detection (PD): identifying plagiarism at the paragraph level, with a particular emphasis on paraphrased paragraphs rather than individual sentences. By focusing on this level of granularity, the approach considers both intra-sentence and inter-sentence relationships, offering a more comprehensive solution for detecting sophisticated forms of plagiarism. To this end, the research examines the influence of text length on the performance of NLP machine learning (ML) and deep learning (DL) models. Furthermore, it introduces ALECS-SS (ALECS – Social Sciences), a large-scale dataset of paragraph-length paraphrases. It also develops three novel SALAC algorithms that restructure paragraph content while preserving its semantics. The methodology converts text into a graph in which each node corresponds to a sentence's semantic vector and each edge is weighted by the sentence order probability. A masking approach is then applied to the reconstructed paragraphs, modifying lexical elements while preserving the original semantic content. This step introduces variability into the dataset while maintaining its core meaning, effectively simulating paraphrased text. Human and automatic evaluations assess the reliability and quality of the paraphrases, and additional studies examine the adaptability of SALAC across multiple academic domains. Moreover, state-of-the-art large language models (LLMs) are analysed for their ability to differentiate between human-written and machine-paraphrased text.
This investigation uses multiple PI datasets in addition to the newly established paragraph-level paraphrase dataset (ALECS-SS). The findings demonstrate that text length significantly affects model performance, with limitations arising from dataset segmentation. The results also show that the SALAC algorithms maintain semantic integrity and coherence across different domains, highlighting their potential for domain-independent paraphrasing. The thesis also analysed the performance of state-of-the-art LLMs in detecting auto-paraphrased content and distinguishing it from human-written content at both the sentence and paragraph levels. The analysis showed that the models could reliably identify reworded content, from individual sentences up to entire paragraphs. Collectively, these findings contribute to educational applications and plagiarism detection by improving how paraphrased content is generated and recognized. They also advance NLP-driven paraphrasing techniques by providing strategies that preserve meaning and coherence in reworded material.

[Restricted] Sensing, Scheduling, and Learning for Resource-Constrained Edge Systems (Saudi Digital Library, 2025) Bukhari, Abdulrahman; Kim, Hyoseung
Recent advances in Internet of Things (IoT) technologies have sparked significant interest in developing learning-based sensing applications on embedded edge devices. These efforts, however, are challenged by the need to adapt to unforeseen conditions in open-world environments and by the practical limitations of low-cost sensors in the field. This dissertation presents the design, implementation, and evaluation of resource-constrained edge systems that address these challenges through time-series sensing, scheduling, and classification. First, we present OpenSense, an open-world time-series sensing framework that performs inference and incremental classification on an embedded edge device, eliminating reliance on powerful cloud servers.
To free time for on-device updates without missing events, and to reduce sensing and communication overhead, we introduce two dynamic sensor-scheduling techniques:
(i) a class-level period assignment scheduler that selects an appropriate sensing period for each inferred class;
(ii) a Q-learning-based scheduler that learns event patterns to choose the sensing interval at each classification moment.
Experimental results show that OpenSense incrementally adapts to unforeseen conditions and schedules effectively on a resource-constrained device. Second, to bridge the gap between theoretical potential and field practice for low-cost sensors, we present a comprehensive evaluation of a sensing and classification system for early stress and disease detection in avocado plants. The greenhouse deployment spans 72 plants in four treatment categories over six months. For leaves, spectral reflectance coupled with multivariate analysis and permutation testing yields statistically significant results and reliable inference. For soils, we develop a two-level hierarchical classification approach tailored to treatment characteristics that achieves 75–86% accuracy across avocado genotypes and outperforms conventional approaches by over 20%. Embedded evaluations on Raspberry Pi and Jetson report end-to-end latency, computation, memory usage, and power consumption, demonstrating practical feasibility. In summary, the contributions are a generalized framework for dynamic, open-world learning on edge devices and an application-specific system for robust classification in noisy field deployments. Together, these real-world deployments outline a practical framework for designing intelligent, cloud-independent edge systems from sensing to inference.
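The abstract gives no details of the Q-learning scheduler, so the sketch below is illustrative only and is not OpenSense's implementation. It shows tabular Q-learning choosing a sensing interval under several hypothetical assumptions:
- the state is the last inferred class (`"idle"` or `"active"`);
- the candidate intervals are 1, 5, and 10 seconds;
- the reward charges for every sample taken and, while an event is active, for every second the sensor sleeps;
- the event pattern is a toy one in which quiet and busy windows simply alternate.

```python
import random

INTERVALS = (1, 5, 10)       # hypothetical candidate sensing periods (seconds)
STATES = ("idle", "active")  # hypothetical state: last inferred class


def reward(state, interval, window=10, sample_cost=0.1, miss_cost=0.5):
    """Hypothetical reward: pay per sample taken in a window and, while an
    event is active, pay for each second the sensor is asleep."""
    energy = sample_cost * (window / interval)
    missed = miss_cost * (interval - 1) if state == "active" else 0.0
    return -(energy + missed)


def next_state(state):
    # Toy event pattern (assumption): quiet and busy windows alternate.
    return "active" if state == "idle" else "idle"


def train(steps=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in INTERVALS}
    state = "idle"
    for _ in range(steps):
        # epsilon-greedy choice of the next sensing interval
        if rng.random() < epsilon:
            action = rng.choice(INTERVALS)
        else:
            action = max(INTERVALS, key=lambda a: q[(state, a)])
        r = reward(state, action)
        nxt = next_state(state)
        # Q-learning update toward r + gamma * max_a' Q(s', a')
        best_next = max(q[(nxt, a)] for a in INTERVALS)
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        state = nxt
    return q


def policy(q):
    """Greedy sensing interval for each inferred class."""
    return {s: max(INTERVALS, key=lambda a: q[(s, a)]) for s in STATES}


q_table = train()
print(policy(q_table))  # {'idle': 10, 'active': 1}
```

In this toy environment the next state does not depend on the chosen interval. The learned policy therefore reduces to the interval with the best immediate reward: the longest period while idle and the shortest while an event is active. Learning genuine event patterns would require a richer state, for example the time since the last event.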
