SACM - United States of America

Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9668

Search Results

Now showing 1 - 10 of 42
  • ItemRestricted
    ENHANCING TRAFFIC SAFETY THROUGH AI-DRIVEN, PRIVACY-PRESERVING, AND SECURE IMPAIRED DRIVING DETECTION SYSTEMS
    (Saudi Digital Library, 2026) Alsulieman, Razan; Sherif, Ahmed
    Drunk driving remains a major threat to road safety worldwide, contributing significantly to traffic injuries and fatalities each year. Traditional detection approaches are largely reactive and vehicle-centric, relying on in-vehicle sensors, breathalyzers, or post-incident enforcement. These methods often depend on driver cooperation, intrusive hardware installations, or limited monitoring environments, restricting their scalability and effectiveness in large transportation systems. At the same time, modern cities increasingly deploy roadside cameras, surveillance networks, and drone-based monitoring systems, creating new opportunities for proactive intoxication detection at the infrastructure level. However, leveraging such external monitoring introduces challenges related to secure data collection, reliable AI-based analysis, privacy protection, and real-world deployment. This dissertation proposes a secure, privacy-preserving Artificial Intelligence framework for proactive drunk driving detection using out-of-vehicle surveillance data. The framework addresses three key aspects required for reliable infrastructure-level monitoring. First, a lightweight authentication scheme is developed to ensure secure data collection from distributed monitoring platforms such as drones and surveillance devices. The proposed design employs physically unclonable functions and symmetric cryptographic primitives to provide protection against impersonation, replay attacks, and device cloning while maintaining low computational overhead for resource-constrained environments. Second, AI-based intoxication detection models are developed using Machine Learning and Deep Learning techniques to analyze facial imagery captured under real-world surveillance conditions. Extensive experiments evaluate multiple models under varying noise and disruption scenarios to ensure robustness across both low- and high-resource computational environments. 
The framework also incorporates explainable AI methods to improve transparency and verify that model decisions rely on meaningful facial features. Finally, the framework integrates privacy-preserving learning mechanisms through federated learning, enabling distributed model training without transferring sensitive facial images to centralized servers. This approach protects user privacy while maintaining strong detection performance across distributed monitoring nodes. These contributions establish a secure, scalable, and privacy-aware infrastructure-level system for proactive intoxication detection, supporting intelligent transportation systems aimed at improving traffic safety.
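The federated learning step described above can be illustrated with a minimal sketch. The abstract does not say which aggregation rule the dissertation uses; the example below assumes plain federated averaging (FedAvg), in which each monitoring node sends only its model weights and its local sample count to the server. The function name `federated_average` and the flat-list weight representation are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def federated_average(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Sample-size-weighted average of client model weights (FedAvg, assumed).

    Only weight vectors and sample counts reach the server; the raw
    facial images never leave the monitoring nodes.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client, and at least one client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            # Each client contributes in proportion to its local data size.
            averaged[i] += w * size / total
    return averaged


# Two hypothetical nodes: one with 1 local sample, one with 3.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a real deployment the weight vectors would be the parameters of the detection model after a round of local training, and the averaged result would be sent back to every node for the next round.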
  • ItemEmbargo
    Understanding Ransomware and Enhancing Their Detection Using Machine Learning
    (Saudi Digital Library, 2026) Alzahrani, Saleh; Xiao, Yang
    Ransomware attacks have escalated significantly in recent years, causing substantial financial losses and operational disruptions to individuals, organizations, and critical infrastructure worldwide. According to the Chainalysis 2024 Crypto Crime Report, ransomware attacks have imposed increasing financial burdens on victims over recent years. The total value received by ransomware attackers reached $1.1 billion in 2023, a significant rise from $567 million in 2022 and from $220 million in 2019. This trend highlights the evolving threat posed by ransomware as attackers continue to refine their methods. Despite the proliferation of detection methods, contemporary ransomware continues to evade traditional security measures through increasingly sophisticated evasion techniques. This dissertation addresses critical gaps in ransomware detection research through an investigation that combines in-depth malware analysis, evolutionary tracking, systematic literature review, novel detection methodology, and dataset development. The research begins with a detailed examination of Conti ransomware, one of the most notorious Ransomware-as-a-Service operations that caused approximately $45 million in damages and significantly impacted healthcare systems. Through analysis of leaked source code and controlled environment testing, this study reveals advanced evasion mechanisms including API disguise techniques, anti-hook mechanisms, and multithreaded encryption for rapid file encryption. Building upon this foundation, the research tracks Conti's evolution from its beta version through multiple iterations, categorizing samples into seven distinct versions. This longitudinal analysis demonstrates that modern ransomware success stems from continuous development and delivery practices, with features such as API hashing and runtime API loading being progressively integrated over time. 
To contextualize these findings within the broader detection landscape, a survey of existing ransomware detection methods was conducted, examining both machine learning and non-machine learning approaches alongside available datasets. This survey identifies critical limitations in current research, specifically that non-machine learning methods fail to identify new samples from known variants, while machine learning approaches suffer from inadequate model design and the absence of comprehensive, standardized datasets. These deficiencies severely limit their effectiveness against emerging ransomware variants. Addressing these identified gaps, this dissertation introduces RansomFormer, a Transformer-based detection model that leverages cross-attention mechanisms to fuse Portable Executable byte data with Application Programming Interface information, including both static imports and dynamic sequence calls. Unlike existing single-feature approaches that ransomware developers can circumvent, RansomFormer's multi-modal architecture achieves exceptional accuracy of 99.25% on static datasets and 99.50% on combined static-dynamic datasets across more than 150 ransomware families. Furthermore, recognizing the fundamental need for comprehensive training data, this dissertation presents RanDS, a rigorously curated dataset comprising a large collection of ransomware samples spanning hundreds of families alongside a substantial set of benign samples, collected and verified over multiple years from an initial corpus of millions of malware files. RanDS includes several processed feature extraction datasets encompassing static raw strings, English strings, imported and exported APIs, demangled APIs, and dynamic behavioral activities, all made publicly available. 
This dissertation makes contributions to cybersecurity by providing deep insights into modern ransomware operations, demonstrating the importance of evolutionary analysis in understanding threat progression, and delivering both a detection methodology and a foundational dataset that address longstanding research limitations in the field.
  • ItemRestricted
    INTELLIGENT ROBOTICS WITH DIGITAL-TWIN ALIGNMENT: SEMANTIC NAVIGATION, MANIPULATION, PLANNING, AND HUMAN-TO-ROBOT ACTION TRANSFORMATION
    (Saudi Digital Library, 2025) Alanazi, Ahmed Hamdan; Lee, Yugyung
    This dissertation advances AI-empowered indoor robotics through four interconnected contributions that unify navigation, manipulation, semantic planning, and human-to-robot action transformation within a digital-twin-aligned framework. GRIP, a grid-aware semantic navigation module, integrates symbolic scene understanding with hybrid search-and-policy execution to achieve robust and context-aware ObjectNav. PathFormer, a transformer-based manipulation model structured around a 3D spatial-semantic grid, generates smooth, interpretable, and physically consistent trajectories that remain tightly aligned with digital-twin simulation. KG-Transformer, a knowledge-guided semantic planner, leverages a lightweight digital twin to calibrate execution, veto unsafe behaviors, and autonomously repair failing plans across diverse indoor environments. ActionFormer, an action-generation transformer, introduces a unified imitation-learning pipeline that integrates human-activity recognition, human-motion generation, and robot-motion generation. ActionFormer supports more than twenty complex human activities, producing robot-ready demonstrations that generalize across platforms and enable end-to-end imitation learning from video and landmark sequences. Collectively, these contributions establish a coherent foundation for AI-empowered robotics grounded in digital-twin intelligence. Across benchmarks and real-world deployments, GRIP yields up to 9.6% higher success rate and more than 2× gains in path efficiency (SPL, SAE). PathFormer produces digitally consistent manipulation trajectories validated through robust sim-to-real transfer. KG-Transformer achieves 99.6% executability, delivers a +4.6-point improvement on unseen-scene tasks, and eliminates safety violations in both simulated and multi-robot execution. 
ActionFormer attains state-of-the-art performance in human-activity recognition and high execution accuracy across more than 20 activities, generating realistic human-motion traces and corresponding robot-motion trajectories for embodied robotic demonstration. Together, these advances deliver a trustworthy, semantically aligned, and high-performance simulation-to-reality pipeline that significantly enhances the adaptability, reliability, and real-world readiness of autonomous indoor robotic systems.
  • ItemRestricted
    Sensing, Scheduling, and Learning for Resource-Constrained Edge Systems
    (Saudi Digital Library, 2025) Bukhari, Abdulrahman; Kim, Hyoseung
    Recent advances in Internet of Things (IoT) technologies have sparked significant interest in developing learning-based sensing applications on embedded edge devices. These efforts, however, are challenged by adapting to unforeseen conditions in open-world environments and by the practical limitations of low-cost sensors in the field. This dissertation presents the design, implementation, and evaluation of resource-constrained edge systems that address these challenges through time-series sensing, scheduling, and classification. First, we present OpenSense, an open-world time-series sensing framework for performing inference and incremental classification on an embedded edge device, eliminating reliance on powerful cloud servers. To create time for on-device updates without missing events and to reduce sensing and communication overhead, we introduce two dynamic sensor-scheduling techniques: (i) a class-level period assignment scheduler that selects an appropriate sensing period for each inferred class and (ii) a Q-learning–based scheduler that learns event patterns to choose the sensing interval at each classification moment. Experimental results show that OpenSense incrementally adapts to unforeseen conditions and schedules effectively on a resource-constrained device. Second, to bridge the gap between theoretical potential and field practice for low-cost sensors, we present a comprehensive evaluation of a sensing and classification system for early stress and disease detection in avocado plants. The greenhouse deployment spans 72 plants in four treatment categories over six months. For leaves, spectral reflectance coupled with multivariate analysis and permutation testing yields statistically significant results and reliable inference. For soils, we develop a two-level hierarchical classification approach tailored to treatment characteristics that achieves 75–86% accuracy across avocado genotypes and outperforms conventional approaches by over 20%. 
Embedded evaluations on Raspberry Pi and Jetson report end-to-end latency, computation, memory usage, and power consumption, demonstrating practical feasibility. In summary, the contributions are a generalized framework for dynamic, open-world learning on edge devices and an application-specific system for robust classification in noisy field deployments. These real-world deployments collectively outline a practical framework for designing intelligent, cloud-independent edge systems from sensing to inference.
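The Q-learning–based scheduler mentioned in this abstract can be sketched roughly as follows. The abstract states only that the scheduler learns event patterns to choose the sensing interval at each classification moment; the state definition (the inferred class), the candidate intervals in `INTERVALS_S`, the reward signal, and the class name `IntervalScheduler` are all assumptions made for illustration, not OpenSense's actual design.

```python
import random
from collections import defaultdict

# Hypothetical candidate sensing intervals in seconds (not from the dissertation).
INTERVALS_S = (1, 5, 30)


class IntervalScheduler:
    """Tabular Q-learning that maps an inferred class to a sensing interval."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)
        # One row of Q-values per state, one column per candidate interval.
        self.q = defaultdict(lambda: [0.0] * len(INTERVALS_S))

    def choose(self, state):
        """Return the index of the interval to use next (epsilon-greedy)."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(INTERVALS_S))
        values = self.q[state]
        return max(range(len(values)), key=values.__getitem__)

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update after observing the outcome."""
        best_next = max(self.q[next_state])
        td_target = reward + self.gamma * best_next
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])


# Assumed reward: positive when a long interval saved energy without missing an event.
scheduler = IntervalScheduler(epsilon=0.0)
scheduler.update("idle", 2, 1.0, "idle")
next_interval = INTERVALS_S[scheduler.choose("idle")]
```

A tabular policy of this kind needs only a few floating-point values per class, which is one reason it suits a resource-constrained device; how the reward trades missed events against sensing cost is the part a real system would have to define.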
  • ItemRestricted
    Sensing, Scheduling, and Learning for Resource-Constrained Edge Systems
    (Saudi Digital Library, 2025) Bukhari, Abdulrahman Ismail Ibrahim; Kim, Hyoseung
    Recent advances in Internet of Things (IoT) technologies have sparked significant interest in developing learning-based sensing applications on embedded edge devices. These efforts, however, are challenged by adapting to unforeseen conditions in open-world environments and by the practical limitations of low-cost sensors in the field. This dissertation presents the design, implementation, and evaluation of resource-constrained edge systems that address these challenges through time-series sensing, scheduling, and classification. First, we present OpenSense, an open-world time-series sensing framework for performing inference and incremental classification on an embedded edge device, eliminating reliance on powerful cloud servers. To create time for on-device updates without missing events and to reduce sensing and communication overhead, we introduce two dynamic sensor-scheduling techniques: (i) a class-level period assignment scheduler that selects an appropriate sensing period for each inferred class and (ii) a Q-learning–based scheduler that learns event patterns to choose the sensing interval at each classification moment. Experimental results show that OpenSense incrementally adapts to unforeseen conditions and schedules effectively on a resource-constrained device. Second, to bridge the gap between theoretical potential and field practice for low-cost sensors, we present a comprehensive evaluation of a sensing and classification system for early stress and disease detection in avocado plants. The greenhouse deployment spans 72 plants in four treatment categories over six months. For leaves, spectral reflectance coupled with multivariate analysis and permutation testing yields statistically significant results and reliable inference. For soils, we develop a two-level hierarchical classification approach tailored to treatment characteristics that achieves 75–86% accuracy across avocado genotypes and outperforms conventional approaches by over 20%. 
Embedded evaluations on Raspberry Pi and Jetson report end-to-end latency, computation, memory usage, and power consumption, demonstrating practical feasibility. In summary, the contributions are a generalized framework for dynamic, open-world learning on edge devices and an application-specific system for robust classification in noisy field deployments. These real-world deployments collectively outline a practical framework for designing intelligent, cloud-independent edge systems from sensing to inference.
  • ItemRestricted
    EXPERIMENTAL STUDY OF THE IMPORTANCE OF DATA FOR MACHINE LEARNING-BASED BREAST CANCER OUTCOME PREDICTION
    (Saudi Digital Library, 2024) Yamani, Wid; Wojtusiak, Janusz
    Wid Yamani, Ph.D., George Mason University, 2025. Dissertation Director: Dr. Janusz Wojtusiak. Researchers have used various large-scale datasets to develop and validate predictive models in breast cancer outcome prediction. However, a notable gap exists due to the lack of a systematic comparison among these datasets regarding predictive performance, feature availability, and suitability for different analytical objectives. While each dataset has unique strengths and limitations, no comprehensive studies evaluate how these differences impact model performance, particularly across diverse timeframes, survival, and recurrence outcomes. This gap limits researchers in making informed choices about the most appropriate dataset for specific research questions. Effective modeling and prediction of breast cancer outcomes (such as cancer survival and recurrence) rely on the dataset's quality, the pre-processing techniques used to clean and transform data, and the choice of predictive models. Therefore, selecting a suitable dataset and identifying relevant variables are as crucial as the choice of the model itself. This dissertation addresses this gap by systematically comparing five prominent datasets (SEER Research 8, SEER Research 17, SEER Research Plus, SEER-Medicare, and Medicare Claims data), focusing on breast cancer survival and recurrence. It evaluates the predictive performance of each dataset using supervised machine learning methods, including logistic regression, random forest, and gradient boosting. The models were tested on metrics such as AUC, accuracy, recall, and precision, with gradient boosting delivering the most accurate results. 
The findings indicate that SEER-Medicare, which integrates cancer registry data with three years of retrospective claims, outperformed the other datasets, achieving AUCs of 0.891 for 5-year survival and 0.942 for 10-year survival. This dataset's inclusion of comprehensive health information, including pre-existing conditions and other claims data, makes it particularly valuable for outcome prediction. However, a drawback of SEER-Medicare is that it primarily includes patients aged 65 and older, as it is based on Medicare data. This limitation reduces its suitability for predicting outcomes in younger breast cancer patients, a significant subgroup with distinct risk factors and treatment responses. SEER Research Plus ranked second, offering data on patient demographics, breast cancer characteristics, staging, outcomes, and treatment, with AUC values of 0.877, 0.901, and 0.937 for 5-year, 10-year, and 15-year survival, respectively. SEER Research 17 and SEER Research 8 include patient demographics, breast cancer characteristics, and staging information but lack treatment details. SEER Research 17, which covers a larger population with more variables, yielded AUC values of 0.870 for 5-year survival, 0.897 for 10-year survival, and 0.920 for 15-year survival. SEER Research 8, which covers a smaller population over a more extended period, yielded slightly lower AUC values of 0.857, 0.868, and 0.880 for 5-year, 10-year, and 15-year survival, respectively. Results indicate that including treatment and additional variables significantly enhances prediction accuracy, while data size is less critical. This thesis is the first study to compare these SEER datasets, offering a comprehensive evaluation and crucial insights into how data characteristics influence breast cancer outcome modeling.
  • ItemRestricted
    Cross Dataset Fairness Evaluation of Transformer Based Sentiment Models
    (Saudi Digital Library, 2025-05-10) Zuiran, Sara; Bhattacharyya, Siddhartha
    With the growing exploration of Natural Language Processing (NLP) systems in decision-making environments, it is essential to evaluate technical and ethical aspects of the dataset and the NLP model to improve fairness. To assess fairness, the thesis examines demographic imbalances in sentiment classification models by evaluating transformer-based models fine-tuned on the Stanford Sentiment Treebank version 2 dataset (SST-2) against the demographically annotated Comprehensive Assessment of Language Model dataset (CALM). This work identifies performance disparities in sentiment prediction across demographic groups by examining sensitive attributes such as gender and race. The study evaluates both the RoBERTa and MentalBERT transformer models using a complete set of fairness metrics consisting of Statistical Parity Difference (SPD), Equal Opportunity Difference (EOD), False Positive Rates (FPR), False Negative Rates (FNR), Jensen-Shannon Divergence (JSD), and Wasserstein Distance (WD). The analysis examines both group-vs-rest and pairwise subgroup comparisons, including gender and ethnicity. Results show that applying adversarial mitigation reduced fairness disparities across demographic subgroups, with the most notable improvements observed for non-binary and Asian users. The observed disparities emphasize the challenge of reducing performance gaps across demographic subgroups in sentiment classification tasks. The thesis introduces a practical framework for evaluating demographic disparities, extends fairness analysis, and assesses the impact of mitigation techniques in cross-dataset sentiment classification. This research proposes a framework that demonstrates a path toward creating inclusive NLP systems and establishes the groundwork for upcoming ethical Artificial Intelligence (AI) studies.
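Two of the fairness metrics named above, SPD and EOD, can be made concrete with a short sketch of the group-vs-rest comparison. It uses the common textbook definitions (difference in positive-prediction rate, and difference in true-positive rate, between a target group and everyone else); the thesis may use a different sign convention or reference group, and the function names and toy labels here are hypothetical.

```python
def _positive_rate(preds):
    """Fraction of predictions equal to 1; preds must be non-empty."""
    if not preds:
        raise ValueError("cannot compute a rate over an empty group")
    return sum(preds) / len(preds)


def statistical_parity_difference(y_pred, groups, target_group):
    """P(pred = 1 | target group) - P(pred = 1 | rest)."""
    in_group = [p for p, g in zip(y_pred, groups) if g == target_group]
    rest = [p for p, g in zip(y_pred, groups) if g != target_group]
    return _positive_rate(in_group) - _positive_rate(rest)


def equal_opportunity_difference(y_true, y_pred, groups, target_group):
    """True-positive rate of the target group minus that of the rest."""
    in_group = [p for t, p, g in zip(y_true, y_pred, groups)
                if g == target_group and t == 1]
    rest = [p for t, p, g in zip(y_true, y_pred, groups)
            if g != target_group and t == 1]
    return _positive_rate(in_group) - _positive_rate(rest)


# Toy binary sentiment labels (1 = positive) for two hypothetical groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

spd = statistical_parity_difference(y_pred, groups, "a")
eod = equal_opportunity_difference(y_true, y_pred, groups, "a")
```

A value of zero for either metric means the target group and the rest are treated alike on that criterion; in this toy data both come out negative, meaning group "a" receives positive predictions less often than the rest.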
  • ItemRestricted
    GRAPH-BASED APPROACH: BRIDGING INSIGHTS FROM STRUCTURED AND UNSTRUCTURED DATA
    (Temple University, 2025) Aljurbua, Rafaa; Obradovic, Zoran
    Graph-based methodologies provide powerful tools for uncovering intricate relationships and patterns in complex data, enabling the integration of structured and unstructured information for insightful decision-making across diverse domains. Our research focuses on constructing graphs from structured and unstructured data, demonstrating their applications in healthcare and power systems. In healthcare, we examine how social networks influence the attitudes of hemodialysis patients toward kidney transplantation. Using a network-based approach, we investigate how social networks within hemodialysis clinics affect patients' attitudes, contributing to a growing understanding of this dynamic. Our findings emphasize that social networks improve the performance of machine learning models, highlighting the importance of social interactions in clinical settings (Aljurbua et al., 2022). We further introduce Node2VecFuseClassifier, a graph-based model that combines patient interactions with patient characteristics. By comparing problem representations that focus on sociodemographics versus social interactions, we demonstrate that incorporating patient-to-patient and patient-to-staff interactions results in more accurate predictions. This multi-modal analysis, which merges patient experiences with staff expertise, underscores the role of social networks in influencing attitudes toward transplantation (Aljurbua et al., 2024b). In power systems, we explore the impact of severe weather events that lead to power outages, specifically focusing on predicting weather-induced outages three hours in advance at the county level in the Pacific Northwest of the United States. By utilizing a multi-model multiplex network that integrates data from multiple sources including weather, transmission lines, lightning, vegetation, and social media posts from two leading platforms (Twitter and Reddit), we show how multiplex networks offer valuable insights for predicting power outages. 
This integration of diverse data sources and network-based modeling emphasizes the importance of leveraging multiple perspectives to enhance the understanding and prediction of power disruptions (Aljurbua et al., 2023). We further present HMN-RTS, a hierarchical multiplex network that classifies disruption severity by temporal learning from integrated weather recordings and social media posts. The multiplex network layers of this framework gather information about power outages, weather, lightning, land cover, transmission lines, and social media comments. By incorporating multiplex network layers consisting of data collected over time and across regions, we demonstrate that HMN-RTS significantly improves the accuracy of predicting the duration of weather-related outages. This framework enables grid operators to make more reliable predictions up to 6 hours in advance, supporting early risk assessment and proactive mitigation (Aljurbua et al., 2024a, 2025a). Additionally, we introduce SMN-WVF, a spatiotemporal multiplex network designed to predict the duration of power outages in distribution grids. By integrating a network-based approach with multi-modal data across space and time, SMN-WVF offers a novel method for predicting disruption durations in distribution grids, enhancing decision-making and mitigation efforts while highlighting the critical role of network-based approaches in forecasting (Aljurbua et al., 2025b). Overall, our research showcases the potential of graph-based models in tackling complex challenges in both power systems and healthcare. By combining the network-based approach with multi-modal data, we present innovative solutions for predicting power outages and understanding patient attitudes.
  • ItemRestricted
    Quantifying and Profiling Echo Chambers on Social Media
    (Arizona State University, 2024) Alatawi, Faisal; Liu, Huan; Sen, Arunabha; Davulcu, Hasan; Shu, Kai
    Echo chambers on social media have become a critical focus in the study of online behavior and public discourse. These environments, characterized by the ideological homogeneity of users and limited exposure to opposing viewpoints, contribute to polarization, the spread of misinformation, and the entrenchment of biases. While significant research has been devoted to proving the existence of echo chambers, less attention has been given to understanding their internal dynamics. This dissertation addresses this gap by developing novel methodologies for quantifying and profiling echo chambers, with the goal of providing deeper insights into how these communities function and how they can be measured. The first core contribution of this work is the introduction of the Echo Chamber Score (ECS), a new metric for measuring the degree of ideological segregation in social media interaction networks. The ECS captures both the cohesion within communities and the separation between them, offering a more nuanced approach to assessing polarization. By using a self-supervised Graph Auto-Encoder (EchoGAE), the ECS bypasses the need for explicit ideological labeling, instead embedding users based on their interactions and linguistic patterns. The second contribution is a Heterogeneous Information Network (HIN)-based framework for profiling echo chambers. This framework integrates social and linguistic features, allowing for a comprehensive analysis of the relationships between users, topics, and language within echo chambers. By combining community detection, topic modeling, and language analysis, the profiling method reveals how discourse and group behavior reinforce ideological boundaries. Through the application of these methods to real-world social media datasets, this dissertation demonstrates their effectiveness in identifying polarized communities and profiling their internal discourse. 
The findings highlight how linguistic homophily and social identity theory shape echo chambers and contribute to polarization. Overall, this research advances the understanding of echo chambers by moving beyond detection to explore their structural and linguistic complexities, offering new tools for measuring and addressing polarization on social media platforms.
  • ItemRestricted
    Deep Learning Approaches for Multivariate Time Series: Advances in Feature Selection, Classification, and Forecasting
    (New Mexico State University, 2024) Alshammari, Khaznah Raghyan; Tran, Son; Hamdi, Shah Muhammad
    In this work, we present the latest developments and advancements in the machine learning-based prediction and feature selection of multivariate time series (MVTS) data. MVTS data, which involves multiple interrelated time series, presents significant challenges due to its high dimensionality, complex temporal dependencies, and inter-variable relationships. These challenges are critical in domains such as space weather prediction, environmental monitoring, healthcare, sensor networks, and finance. Our research addresses these challenges by developing and implementing advanced machine-learning algorithms specifically designed for MVTS data. We introduce innovative methodologies that focus on three key areas: feature selection, classification, and forecasting. Our contributions include the development of deep learning models, such as Long Short-Term Memory (LSTM) networks and Transformer-based architectures, which are optimized to capture and model complex temporal and inter-parameter dependencies in MVTS data. Additionally, we propose a novel feature selection framework that gradually identifies the most relevant variables, enhancing model interpretability and predictive accuracy. Through extensive experimentation and validation, we demonstrate the superior performance of our approaches compared to existing methods. The results highlight the practical applicability of our solutions, providing valuable tools and insights for researchers and practitioners working with high-dimensional time series data. This work advances the state of the art in MVTS analysis, offering robust methodologies that address both theoretical and practical challenges in this field.

Copyright owned by the Saudi Digital Library (SDL) © 2026