Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

Search Results

Now showing 1 - 10 of 109
  • ItemRestricted
    Advancing narcolepsy diagnosis: Leveraging machine learning to identify novel neuro-biomarkers
    (Saudi Digital Library, 2024) Orkouby, Hadir; Bartsch, Ullrich
    Narcolepsy is a rare neurological disorder with a well-identified pathophysiology that manifests as a sudden onset of sleep during wake behaviour. The current diagnostic pathways for narcolepsy involve complex assessments of sleep neurophysiology, including polysomnography and the multiple sleep latency test (MSLT). These are cumbersome and work-intensive, and with limited resources within the NHS, this has led to increased waiting times for diagnosis and treatment of narcolepsy. This project harnessed the power of digital neuro-biomarkers and Artificial Intelligence (AI) to develop novel diagnostic markers for narcolepsy. Leveraging an open-source dataset of labelled archival polysomnography (PSG) recordings, including electroencephalography (EEG), I created a data analysis and classification pipeline to enhance diagnostic decision-making in clinical settings. This pipeline combines comprehensive data preprocessing and feature extraction with XGBoost and Random Forest (RF) classification models. The feature extraction process included selected time-series analysis features, spectral frequency ratios, cross-frequency coupling and moment-based statistical features of Intrinsic Mode Functions (IMFs) derived from empirical mode decomposition (EMD). The RF classifier emerged as the best model, achieving an accuracy of 82.5%, with a specificity of 82.5% and a sensitivity of 92.86%, by combining and averaging these feature sets and incorporating sleep stage labels during model training. These results underscore the potential of a novel approach using single-channel sleep EEG data from wearable devices. This innovative method simplifies the lengthy and costly pathway for narcolepsy diagnosis and also paves the way for developing new tools to diagnose sleep disorders automatically in non-clinical environments.
    7 0
  • ItemRestricted
    Predicting Delayed Flights for International Airports Using Artificial Intelligence Models & Techniques
    (Saudi Digital Library, 2025) Alsharif, Waleed; MHallah, Rym
    Delayed flights are a pervasive challenge in the aviation industry, significantly impacting operational efficiency, passenger satisfaction, and economic costs. This thesis aims to develop predictive models that demonstrate strong performance and reliability, capable of maintaining high accuracy within the tested dataset and showcasing potential for application in various real-world aviation scenarios. These models leverage advanced artificial intelligence and deep learning techniques to address the complexity of predicting delayed flights. The study evaluates the performance of Long Short-Term Memory (LSTM), Convolutional Neural Networks (CNN), and their hybrid model (LSTM-CNN), which combines temporal and spatial pattern analysis, alongside Large Language Models (LLM, specifically OpenAI's Babbage model), which excel in processing structured and unstructured text data. Additionally, the research introduces a unified machine learning framework utilizing Gradient Boosting Machine (GBM) for regression and Light Gradient Boosting Machine (LGBM) for classification, aimed at estimating both flight delay durations and their underlying causes. The models were tested on high-dimensional datasets from John F. Kennedy International Airport (JFK) and a synthetic dataset from King Abdulaziz International Airport (KAIA). Among the evaluated models, the hybrid LSTM-CNN model demonstrated the best performance, achieving 99.91% prediction accuracy with a prediction time of 2.18 seconds, outperforming the GBM model (98.5% accuracy, 6.75 seconds) and LGBM (99.99% precision, 4.88 seconds). Additionally, GBM achieved a strong correlation score (R² = 0.9086) in predicting delay durations, while LGBM exhibited exceptionally high precision (99.99%) in identifying delay causes. Results indicated that National Aviation System delays (correlation: 0.600), carrier-related delays (0.561), and late aircraft arrivals (0.519) were the most significant contributors, while weather factors played a moderate role. These findings underscore the exceptional accuracy and efficiency of LSTM-CNN, establishing it as the optimal model for predicting delayed flights due to its superior performance and speed. The study highlights the potential for integrating LSTM-CNN into real-time airport management systems, enhancing operational efficiency and decision-making while paving the way for smarter, AI-driven air traffic systems.
    9 0
  • ItemRestricted
    Harnessing Machine Learning and Deep Learning for Analyzing Electrical Load Patterns to Identify Energy Loss
    (Saudi Digital Library, 2025) Alabbas, Mashhour Sadun Abdulkarim; Albatah, Mohammad
    Meeting the challenges of energy requirements, consumption patterns, and the push for sustainability makes energy management in contemporary agriculture critically important. This study aims to devise a holistic model for energy efficiency in agricultural contexts by integrating modern computer vision methodologies for field boundary extraction together with anomaly detection techniques. To achieve the accurate segmentation of agricultural fields from satellite imagery, high-resolution imagery is processed using the YOLOv8 object detection model. The subsequently generated field feature datasets enable the smart grid data to serve as a basis for the anomaly detection process using the Isolation Forest algorithm. The methodology follows a multi-stage pipeline: data collection, preprocessing, augmentation, model training, fine-tuning, and evaluation. To validate accurate and reliable field boundary detection, the evaluation metrics precision, recall, and mean Average Precision (mAP) are computed and analyzed. Subsequently, energy consumption data are processed for anomaly detection, enabling the identification of irregular and potentially inefficient consumption patterns. The findings indicate that YOLOv8 has a very high detection accuracy with an mAP score over 90%. Furthermore, the Isolation Forest algorithm has shown improved F1 scores over traditional approaches in detecting anomalies in energy consumption. This integrated method provides an automated and scalable solution in precision agriculture which allows users to monitor cultivation conditions and minimize energy consumption, thereby enhancing the energy efficiency and the overall decision-making framework. The study advances the convergence of artificial intelligence, remote sensing, and intelligent energy management systems, offering a basis for developing technological innovations that promote sustainability in agriculture.
    24 0
  • ItemRestricted
    Predicting Client Default Payments Using Machine Learning in Production Environment
    (Saudi Digital Library, 2025) Alanazi, Reem; Lavendini
    This project investigates the application of machine learning techniques to predict client default payments in a credit card setting. Using a dataset of 30,000 Taiwanese clients, the study addresses the challenges of class imbalance, predictive accuracy, and fairness in credit risk assessment. An XGBoost model was developed and enhanced through feature engineering, resampling techniques (SMOTE/ADASYN), and class weighting to improve recall for defaulters while maintaining overall accuracy. Interpretability was achieved using SHAP values, providing transparency into model decisions. To mitigate demographic disparities, particularly across education levels, a fairness-constrained Random Forest was integrated into a two-stage cascade framework, reducing false positives while preserving high recall. The final cascade model achieved 84% accuracy, with 93% recall for non-defaulters and 53% recall for defaulters, significantly outperforming baseline benchmarks. Fairness audits revealed that education-based disparities could be reduced with minimal performance trade-offs, while age-based fairness was largely maintained. The project demonstrates a practical, interpretable, and ethically aware pipeline for credit default prediction, with deployment considerations and directions for future research in cost-sensitive learning, advanced fairness constraints, and real-time monitoring.
    23 0
  • ItemRestricted
    Paraphrase Generation and Identification at Paragraph-Level
    (Saudi Digital Library, 2025) Alsaqaabi, Arwa; Stewart, Craig; Akrida, Eleni; Cristea, Alexandra
    The widespread availability of the Internet and the ease of accessing written content have significantly contributed to the rising incidence of plagiarism across various domains, including education. This behaviour directly undermines academic integrity, as evidenced by reports highlighting increased plagiarism in student work. Notably, students tend to plagiarize entire paragraphs more often than individual sentences, further complicating efforts to detect and prevent academic dishonesty. Additionally, advancements in natural language processing (NLP) have further facilitated plagiarism, particularly by using online paraphrasing tools and deep-learning language models designed to generate paraphrased text. These developments underscore the critical need to develop and refine effective paraphrase identification (PI) methodologies. This thesis addresses one of the most challenging aspects of plagiarism detection (PD): identifying instances of plagiarism at the paragraph-level, with a particular emphasis on paraphrased paragraphs rather than individual sentences. By focusing on this level of granularity, the approach considers both intra-sentence and inter-sentence relationships, offering a more comprehensive solution to the detection of sophisticated forms of plagiarism. To achieve this aim, the research examines the influence of text length on the performance of NLP machine learning (ML) and deep learning (DL) models. Furthermore, it introduces ALECS-SS (ALECS – Social Sciences), a large-scale dataset of paragraph-length paraphrases, and develops three novel SALAC algorithms designed to preserve semantic integrity while restructuring paragraph content. These algorithms suggest a novel approach that modifies the structure of paragraphs while maintaining their semantics. 
The methodology involves converting text into a graph where each node corresponds to a sentence’s semantic vector, and each edge is weighted by a numerical value representing the sentence order probability. Subsequently, a masking approach is applied to the reconstructed paragraphs, modifying the lexical elements while preserving the original semantic content. This step introduces variability to the dataset while maintaining its core meaning, effectively simulating paraphrased text. Human and automatic evaluations assess the reliability and quality of paraphrases, and additional studies examine the adaptability of SALAC across multiple academic domains. Moreover, state-of-the-art large language models (LLMs) are analysed for their ability to differentiate between human-written and machine-paraphrased text. This investigation involves the use of multiple PI datasets in addition to the newly established paragraph-level paraphrase dataset (ALECS-SS). The findings demonstrate that text length significantly affects model performance, with limitations arising from dataset segmentation. Additionally, the results show that the SALAC algorithms effectively maintain semantic integrity and coherence across different domains, highlighting their potential for domain-independent paraphrasing. The thesis also analysed the state-of-the-art LLMs’ performance in detecting auto-paraphrased content and distinguishing it from human-written content at both the sentence and paragraph levels, showing that the models could reliably identify reworded content from individual sentences up to entire paragraphs. Collectively, these findings contribute to educational applications and plagiarism detection by improving how paraphrased content is generated and recognized, and they advance NLP-driven paraphrasing techniques by providing strategies that ensure that meaning and coherence are preserved in reworded material.
    17 0
  • ItemRestricted
    Sensing, Scheduling, and Learning for Resource-Constrained Edge Systems
    (Saudi Digital Library, 2025) Bukhari, Abdulrahman; Kim, Hyoseung
    Recent advances in Internet of Things (IoT) technologies have sparked significant interest in developing learning-based sensing applications on embedded edge devices. These efforts, however, are challenged by adapting to unforeseen conditions in open-world environments and by the practical limitations of low-cost sensors in the field. This dissertation presents the design, implementation, and evaluation of resource-constrained edge systems that address these challenges through time-series sensing, scheduling, and classification. First, we present OpenSense, an open-world time-series sensing framework for performing inference and incremental classification on an embedded edge device, eliminating reliance on powerful cloud servers. To create time for on-device updates without missing events and to reduce sensing and communication overhead, we introduce two dynamic sensor-scheduling techniques: (i) a class-level period assignment scheduler that selects an appropriate sensing period for each inferred class and (ii) a Q-learning–based scheduler that learns event patterns to choose the sensing interval at each classification moment. Experimental results show that OpenSense incrementally adapts to unforeseen conditions and schedules effectively on a resource-constrained device. Second, to bridge the gap between theoretical potential and field practice for low-cost sensors, we present a comprehensive evaluation of a sensing and classification system for early stress and disease detection in avocado plants. The greenhouse deployment spans 72 plants in four treatment categories over six months. For leaves, spectral reflectance coupled with multivariate analysis and permutation testing yields statistically significant results and reliable inference. For soils, we develop a two-level hierarchical classification approach tailored to treatment characteristics that achieves 75–86% accuracy across avocado genotypes and outperforms conventional approaches by over 20%. 
Embedded evaluations on Raspberry Pi and Jetson report end-to-end latency, computation, memory usage, and power consumption, demonstrating practical feasibility. In summary, the contributions are a generalized framework for dynamic, open-world learning on edge devices and an application-specific system for robust classification in noisy field deployments. These real-world deployments collectively outline a practical framework for designing intelligent, cloud-independent edge systems from sensing to inference.
    20 0
  • ItemRestricted
    Sensing, Scheduling, and Learning for Resource-Constrained Edge Systems
    (Saudi Digital Library, 2025) Bukhari, Abdulrahman Ismail Ibrahim; Kim, Hyoseung
    Recent advances in Internet of Things (IoT) technologies have sparked significant interest in developing learning-based sensing applications on embedded edge devices. These efforts, however, are challenged by adapting to unforeseen conditions in open-world environments and by the practical limitations of low-cost sensors in the field. This dissertation presents the design, implementation, and evaluation of resource-constrained edge systems that address these challenges through time-series sensing, scheduling, and classification. First, we present OpenSense, an open-world time-series sensing framework for performing inference and incremental classification on an embedded edge device, eliminating reliance on powerful cloud servers. To create time for on-device updates without missing events and to reduce sensing and communication overhead, we introduce two dynamic sensor-scheduling techniques: (i) a class-level period assignment scheduler that selects an appropriate sensing period for each inferred class and (ii) a Q-learning–based scheduler that learns event patterns to choose the sensing interval at each classification moment. Experimental results show that OpenSense incrementally adapts to unforeseen conditions and schedules effectively on a resource-constrained device. Second, to bridge the gap between theoretical potential and field practice for low-cost sensors, we present a comprehensive evaluation of a sensing and classification system for early stress and disease detection in avocado plants. The greenhouse deployment spans 72 plants in four treatment categories over six months. For leaves, spectral reflectance coupled with multivariate analysis and permutation testing yields statistically significant results and reliable inference. For soils, we develop a two-level hierarchical classification approach tailored to treatment characteristics that achieves 75–86% accuracy across avocado genotypes and outperforms conventional approaches by over 20%. 
Embedded evaluations on Raspberry Pi and Jetson report end-to-end latency, computation, memory usage, and power consumption, demonstrating practical feasibility. In summary, the contributions are a generalized framework for dynamic, open-world learning on edge devices and an application-specific system for robust classification in noisy field deployments. These real-world deployments collectively outline a practical framework for designing intelligent, cloud-independent edge systems from sensing to inference.
    30 0
  • ItemRestricted
    Deep Learning based Cancer Classification and Segmentation in Medical Images
    (Saudi Digital Library, 2025) Alharbi, Afaf; Zhang, Qianni
    Cancer has significantly threatened human life and health for many years. In the clinic, medical image analysis is the gold standard for predicting patient prognosis and treatment outcome. Generally, manually labelling tumour regions in hundreds of medical images is time-consuming and expensive for pathologists, radiologists and CT scan experts. Recently, advancements in hardware and computer vision have allowed deep-learning-based methods to become the mainstream approach to segmenting tumours automatically, significantly reducing the workload of healthcare professionals. However, many challenges remain in medical imaging, such as automated cancer categorisation, tumour area segmentation, and the reliance on large-scale labelled images. Therefore, this research studies these challenging tasks in medical images, proposing novel deep-learning paradigms that can support healthcare professionals in cancer diagnosis and treatment planning. Chapter 3 proposes an automated tissue classification framework, called Multiple Instance Learning (MIL), for whole slide histology images. To overcome the limitations of weak supervision in tissue classification, we incorporate the attention mechanism into the MIL framework. This integration allows us to effectively address the challenges associated with the inadequate labelling of training data and improve the accuracy and reliability of the tissue classification process. Chapter 4 proposes a novel approach for histopathology image classification with an MIL model that integrates an adaptive attention mechanism into an end-to-end deep CNN together with pre-trained transfer learning models (Trans-AMIL). The well-known transfer learning architectures VGGNet [14], DenseNet [15] and ResNet [16] are leveraged in our framework implementation. Experiments and in-depth analysis have been conducted on a public histopathology breast cancer dataset. The results show that our proposed Trans-AMIL approach with a VGG pre-trained model demonstrates excellent improvement over the state of the art. Chapter 5 proposes a self-supervised learning method for magnetic resonance imaging (MRI) tumour segmentation. A self-supervised cancer segmentation framework is proposed to reduce label dependency. An innovative Barlow Twins scheme combined with a Swin Transformer is developed to perform this self-supervised method on MRI brain images. Additionally, data augmentation is applied to improve the discriminability of tumour features. Experimental results show that the proposed method achieves better tumour segmentation performance than other popular self-supervised methods. Chapter 6 proposes an innovative Barlow Twins self-supervised technique combined with a regularised variational auto-encoder for the segmentation of MRI tumour images as well as CT scan images. A self-supervised cancer segmentation framework is proposed to reduce label dependency. An innovative Barlow Twins scheme is developed to represent tumour features based on unlabelled images. Additionally, data augmentation is applied to improve the discriminability of tumour features. Experimental results show that the proposed method achieves better tumour segmentation performance than other existing state-of-the-art methods. The thesis presents four approaches for classifying and segmenting cancer images from histology images, MRI images and CT scan images, covering unsupervised and weakly supervised methods. This research effectively classifies tumour regions in histopathology images based on histopathological annotations and well-designed modules. The research additionally segments MRI and CT images comprehensively. Our studies demonstrate label-efficient automatic methods across various types of medical image classification and segmentation. Experimental results show that our work achieves state-of-the-art performance on both classification and segmentation tasks on real-world datasets.
    16 0
  • ItemRestricted
    EXPERIMENTAL STUDY OF THE IMPORTANCE OF DATA FOR MACHINE LEARNING-BASED BREAST CANCER OUTCOME PREDICTION
    (Saudi Digital Library, 2024) Yamani, Wid; Wojtusiak, Janusz
    Wid Yamani, Ph.D., George Mason University, 2025. Dissertation Director: Dr. Janusz Wojtusiak. Researchers have used various large-scale datasets to develop and validate predictive models in breast cancer outcome prediction. However, a notable gap exists due to the lack of a systematic comparison among these datasets regarding predictive performance, feature availability, and suitability for different analytical objectives. While each dataset has unique strengths and limitations, no comprehensive studies evaluate how these differences impact model performance, particularly across diverse timeframes, survival, and recurrence outcomes. This gap limits researchers in making informed choices about the most appropriate dataset for specific research questions. Effective modeling and prediction of breast cancer outcomes (such as cancer survival and recurrence) rely on the dataset's quality, the pre-processing techniques used to clean and transform data, and the choice of predictive models. Therefore, selecting a suitable dataset and identifying relevant variables are as crucial as the choice of the model itself. This thesis addresses this gap by systematically comparing five prominent datasets for predicting breast cancer outcomes (SEER Research 8, SEER Research 17, SEER Research Plus, SEER-Medicare, and Medicare Claims data), focusing on breast cancer survival and recurrence. It evaluates the predictive performance of each dataset using supervised machine learning methods, including logistic regression, random forest, and gradient boosting. The models were tested on metrics such as AUC, accuracy, recall, and precision, with gradient boosting delivering the most accurate results. The findings indicate that SEER-Medicare, which integrates cancer registry data with three years of retrospective claims, outperformed the other datasets, achieving AUCs of 0.891 for 5-year survival and 0.942 for 10-year survival. This dataset's inclusion of comprehensive health information, including pre-existing conditions and other claims data, makes it particularly valuable for outcome prediction. However, a drawback of SEER-Medicare is that it primarily includes patients aged 65 and older, as it is based on Medicare data. This limitation reduces its suitability for predicting outcomes in younger breast cancer patients, a significant subgroup with distinct risk factors and treatment responses. SEER Research Plus ranked second, offering data on patient demographics, breast cancer characteristics, staging, outcomes, and treatment, with AUC values of 0.877, 0.901, and 0.937 for 5-year, 10-year, and 15-year survival, respectively. SEER Research 17 and SEER Research 8 include patient demographics, breast cancer characteristics, and staging information but lack treatment details. SEER Research 17, which covers a larger population with more variables, yielded AUC values of 0.870 for 5-year survival, 0.897 for 10-year survival, and 0.920 for 15-year survival. SEER Research 8, which covers a smaller population over a more extended period, yielded slightly lower AUC values of 0.857, 0.868, and 0.880 for 5-year, 10-year, and 15-year survival, respectively. Results indicate that including treatment and additional variables significantly enhances prediction accuracy, while the data size is less critical. This thesis is the first study to compare SEER datasets, and its comprehensive evaluation provides insights into how data characteristics influence breast cancer outcome modeling.
    15 0
  • ItemRestricted
    Stress Detection: Leveraging IoMT Data and Machine Learning for Enhanced Well-being
    (Saudi Digital Library, 2025) Alsharef, Moudy Sharaf; Alshareef, Moudy
    We focus on the detection of acute stress, characterized by short-term physiological changes such as changes in heart rate variability (HRV), breathing patterns, and other bodily functions, which are often measurable through wearable or contactless sensors. Accurate detection of acute stress is crucial in high-pressure environments, such as clinical settings, to reduce cognitive overload, prevent burnout, and minimize errors. Current research on stress detection faces multiple challenges. First, most proposed methods are not designed to identify stress in unseen subjects, limiting their generalizability and practical applicability. Second, due to the sensitive nature of stress-related physiological data and the risk of data leakage, insufficient attention has been paid to ensuring data privacy while preserving utility. Third, many existing studies rely on synthetically induced stress in controlled environments, overlooking real-world scenarios where stress can have severe consequences. Finally, nearly all research in this domain employs invasive IoMT sensors or wearable devices, which may not be practical or scalable for real-world applications. This thesis presents five key contributions in the field of stress detection using Internet of Medical Things (IoMT) sensors and machine learning. First, it introduces a deep learning model based on self-attention (Transformer), trained and evaluated using the WESAD dataset, a widely used benchmark collected from 15 participants under controlled stress tasks. The model achieved 96% accuracy in detecting stress and was validated using leave-one-subject-out (LOSO) cross-validation to demonstrate generalizability to unseen individuals. Second, to ensure data privacy, a differential privacy framework was integrated into the model. This approach adds noise during training to prevent sensitive data leakage and achieved 93% accuracy, confirming it is both private and effective. Third, the thesis introduces a new dataset called PARFAIT, collected from 30 healthcare workers during real hospital duties (ICU, ER, OR) using non-invasive HRV sensors and the Maslach Burnout Inventory (MBI) to label stress levels. This dataset supports real-world analysis of stress among physicians. Fourth, a cost-sensitive model is developed using XGBoost and the PARFAIT dataset, assigning higher penalties to stress misclassifications that could lead to medical errors. This model achieved 98% accuracy and reduced false negatives, making it suitable for clinical settings. Finally, a contactless radar-based system is presented to detect stress using ultra-wideband (UWB) radar, capturing HRV and breathing data. A deep learning model achieved 92.35% accuracy, offering a non-wearable, scalable alternative. Although the radar-based model achieved a slightly lower accuracy (92.35%) compared to the wearable-based model (96%), it provides several important advantages. It works without any physical contact, helps maintain user privacy, and can be more practical to deploy in clinical settings where wearable sensors may not be suitable. The small drop in accuracy is mainly due to the limitations of radar in measuring HRV precisely. However, by combining radar-based HRV with breathing features, the overall performance remains competitive.
    14 0
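
Several of the abstracts listed above report accuracy, sensitivity (recall), specificity, precision, and F1 scores. As a reading aid, the following minimal sketch shows how these figures are derived from confusion-matrix counts. The names `BinaryMetrics` and `binary_metrics` and the example counts are hypothetical and are not taken from any of the listed theses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMetrics:
    accuracy: float
    sensitivity: float  # recall for the positive class
    specificity: float  # recall for the negative class
    precision: float
    f1: float


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Derive common classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")

    def ratio(numerator: float, denominator: float) -> float:
        # Report 0.0 instead of failing when a class has no samples.
        return numerator / denominator if denominator else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return BinaryMetrics(
        accuracy=ratio(tp + tn, total),
        sensitivity=sensitivity,
        specificity=specificity,
        precision=precision,
        f1=f1,
    )


# Hypothetical counts, for illustration only.
example = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print(example)
```

Because accuracy averages over both classes, it can stay high while recall for a minority class is much lower, as in the credit-default item above (84% accuracy with 53% recall for defaulters).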

Copyright owned by the Saudi Digital Library (SDL) © 2025