SACM - United States of America
Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9668
39 results
Search Results
Item Restricted
Sensing, Scheduling, and Learning for Resource-Constrained Edge Systems
(Saudi Digital Library, 2025) Bukhari, Abdulrahman Ismail Ibrahim; Kim, Hyoseung
Recent advances in Internet of Things (IoT) technologies have sparked significant interest in developing learning-based sensing applications on embedded edge devices. These efforts, however, are challenged by adapting to unforeseen conditions in open-world environments and by the practical limitations of low-cost sensors in the field. This dissertation presents the design, implementation, and evaluation of resource-constrained edge systems that address these challenges through time-series sensing, scheduling, and classification. First, we present OpenSense, an open-world time-series sensing framework for performing inference and incremental classification on an embedded edge device, eliminating reliance on powerful cloud servers. To create time for on-device updates without missing events and to reduce sensing and communication overhead, we introduce two dynamic sensor-scheduling techniques: (i) a class-level period assignment scheduler that selects an appropriate sensing period for each inferred class and (ii) a Q-learning–based scheduler that learns event patterns to choose the sensing interval at each classification moment. Experimental results show that OpenSense incrementally adapts to unforeseen conditions and schedules effectively on a resource-constrained device. Second, to bridge the gap between theoretical potential and field practice for low-cost sensors, we present a comprehensive evaluation of a sensing and classification system for early stress and disease detection in avocado plants. The greenhouse deployment spans 72 plants in four treatment categories over six months. For leaves, spectral reflectance coupled with multivariate analysis and permutation testing yields statistically significant results and reliable inference. For soils, we develop a two-level hierarchical classification approach tailored to treatment characteristics that achieves 75–86% accuracy across avocado genotypes and outperforms conventional approaches by over 20%. Embedded evaluations on Raspberry Pi and Jetson report end-to-end latency, computation, memory usage, and power consumption, demonstrating practical feasibility. In summary, the contributions are a generalized framework for dynamic, open-world learning on edge devices and an application-specific system for robust classification in noisy field deployments.
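The Q-learning-based scheduler mentioned in this abstract can be illustrated with a minimal tabular sketch. This is not the OpenSense implementation: the state (the most recently inferred class), the candidate intervals in `INTERVALS`, the reward signal, and the hyperparameters are all assumptions chosen for illustration.

```python
import random

# Hypothetical candidate sensing intervals, in seconds (not from the source).
INTERVALS = [1, 5, 10]


class IntervalScheduler:
    """Tabular Q-learning over (inferred class, sensing interval) pairs."""

    def __init__(self, n_classes, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        # One row of Q-values per class, one column per candidate interval.
        self.q = [[0.0] * len(INTERVALS) for _ in range(n_classes)]
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # exploration probability
        self.rng = random.Random(seed)

    def choose(self, state):
        """Return the index of the interval to use (epsilon-greedy)."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(INTERVALS))
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update toward reward + discounted best next value."""
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


# Usage: reward the longest interval once for class 0, then act greedily.
scheduler = IntervalScheduler(n_classes=2, epsilon=0.0)
scheduler.update(state=0, action=2, reward=1.0, next_state=1)
chosen_interval = INTERVALS[scheduler.choose(0)]
```

In a real deployment the reward would have to trade off missed events against sensing and communication cost; how that trade-off is defined is not stated in the abstract.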
These real-world deployments collectively outline a practical framework for designing intelligent, cloud-independent edge systems from sensing to inference.

Item Restricted
EXPERIMENTAL STUDY OF THE IMPORTANCE OF DATA FOR MACHINE LEARNING-BASED BREAST CANCER OUTCOME PREDICTION
(Saudi Digital Library, 2024) Yamani, Wid; Wojtusiak, Janusz
Wid Yamani, Ph.D. George Mason University, 2025. Dissertation Director: Dr. Janusz Wojtusiak. Researchers have used various large-scale datasets to develop and validate predictive models in breast cancer outcome prediction. However, a notable gap exists due to the lack of a systematic comparison among these datasets regarding predictive performance, feature availability, and suitability for different analytical objectives. While each dataset has unique strengths and limitations, no comprehensive studies evaluate how these differences impact model performance, particularly across diverse timeframes, survival, and recurrence outcomes. This gap limits researchers in making informed choices about the most appropriate dataset for specific research questions. Effective modeling and prediction of breast cancer outcomes (such as cancer survival and recurrence) rely on the dataset's quality, the pre-processing techniques used to clean and transform data, and the choice of predictive models. Therefore, selecting a suitable dataset and identifying relevant variables are as crucial as the choice of the model itself. This thesis addresses this gap by systematically comparing five prominent datasets for predicting breast cancer outcomes. The five datasets (SEER Research 8, SEER Research 17, SEER Research Plus, SEER-Medicare, and Medicare Claims data) are compared with a focus on breast cancer survival and recurrence.
It evaluates the predictive performance of each dataset using supervised machine learning methods, including logistic regression, random forest, and gradient boosting. The models were tested on metrics such as AUC, accuracy, recall, and precision, with gradient boosting delivering the most accurate results. The findings indicate that SEER-Medicare, which integrates cancer registry data with three years of retrospective claims, outperformed the other datasets, achieving AUCs of 0.891 for 5-year survival and 0.942 for 10-year survival. This dataset's inclusion of comprehensive health information, including pre-existing conditions and other claims data, makes it particularly valuable for outcome prediction. However, a drawback of SEER-Medicare is that it primarily includes patients aged 65 and older, as it is based on Medicare data. This limitation reduces its suitability for predicting outcomes in younger breast cancer patients, a significant subgroup with distinct risk factors and treatment responses. SEER Research Plus ranked second, offering data on patient demographics, breast cancer characteristics, staging, outcomes, and treatment, with AUC values of 0.877, 0.901, and 0.937 for 5-year, 10-year, and 15-year survival, respectively. SEER Research 17 and SEER Research 8 include patient demographics, breast cancer characteristics, and staging information but lack treatment details. SEER Research 17, which covers a larger population with more variables, yielded AUC values of 0.870 for 5-year survival, 0.897 for 10-year survival, and 0.920 for 15-year survival. SEER Research 8, which covers a smaller population over a more extended period, yielded slightly lower AUC values of 0.857, 0.868, and 0.880 for 5-year, 10-year, and 15-year survival, respectively. Results indicate that including treatment and additional variables significantly enhances prediction accuracy while the data size is less critical. 
This thesis is the first study to compare these SEER datasets, and its comprehensive evaluation provides crucial insights into how data characteristics influence breast cancer outcome modeling.

Item Restricted
Cross Dataset Fairness Evaluation of Transformer Based Sentiment Models
(Saudi Digital Library, 2025-05-10) Zuiran, Sara; Bhattacharyya, Siddhartha
With the growing exploration of Natural Language Processing (NLP) systems in decision-making environments, it is essential to evaluate technical and ethical aspects of the dataset and the NLP model to improve fairness. To assess fairness, the thesis examines demographic imbalances in sentiment classification models by evaluating transformer-based models fine-tuned on the Stanford Sentiment Treebank version 2 dataset (SST-2) against the demographically annotated Comprehensive Assessment of Language Model dataset (CALM). This work identifies performance disparities in sentiment prediction across demographic groups by examining sensitive attributes such as gender and race. The study evaluates both the RoBERTa and MentalBERT transformer models using a complete set of fairness metrics consisting of Statistical Parity Difference (SPD), Equal Opportunity Difference (EOD), False Positive Rates (FPR), False Negative Rates (FNR), Jensen-Shannon Divergence (JSD), and Wasserstein Distance (WD). The analysis examines both group-vs-rest and pairwise subgroup comparisons, including gender and ethnicity. Results show that applying adversarial mitigation reduced fairness disparities across demographic subgroups, with the most notable improvements observed for non-binary and Asian users. The observed disparities emphasize the challenge of reducing performance gaps across demographic subgroups in sentiment classification tasks.
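Two of the fairness metrics named in this abstract, SPD and EOD, have short standard definitions that a small sketch can make concrete. The sign convention (first group minus second group), the binary label encoding, and the toy data below are assumptions for illustration and are not taken from the thesis.

```python
def positive_rate(preds):
    """Fraction of predictions equal to 1 (raises ZeroDivisionError if empty)."""
    return sum(preds) / len(preds)


def statistical_parity_difference(y_pred, groups, group_a, group_b):
    """P(pred = 1 | group_a) - P(pred = 1 | group_b)."""
    a = [p for p, g in zip(y_pred, groups) if g == group_a]
    b = [p for p, g in zip(y_pred, groups) if g == group_b]
    return positive_rate(a) - positive_rate(b)


def equal_opportunity_difference(y_true, y_pred, groups, group_a, group_b):
    """True-positive rate of group_a minus true-positive rate of group_b."""
    def tpr(target):
        # Keep only predictions for truly positive examples in the target group.
        hits = [p for t, p, g in zip(y_true, y_pred, groups)
                if g == target and t == 1]
        return positive_rate(hits)
    return tpr(group_a) - tpr(group_b)


# Toy data: eight examples split evenly across two hypothetical groups.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

spd = statistical_parity_difference(y_pred, groups, "a", "b")
eod = equal_opportunity_difference(y_true, y_pred, groups, "a", "b")
```

A value of zero for either metric means the two groups are treated alike under that criterion; a group-vs-rest comparison can be obtained by relabeling every other group as a single "rest" group before calling the functions.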
The thesis introduces a practical framework for evaluating demographic disparities, extends fairness analysis, and assesses the impact of mitigation techniques in cross-dataset sentiment classification. This research proposes a framework that demonstrates a path toward creating inclusive NLP systems and establishes the groundwork for upcoming ethical Artificial Intelligence (AI) studies.

Item Restricted
GRAPH-BASED APPROACH: BRIDGING INSIGHTS FROM STRUCTURED AND UNSTRUCTURED DATA
(Temple University, 2025) Aljurbua, Rafaa; Obradovic, Zoran
Graph-based methodologies provide powerful tools for uncovering intricate relationships and patterns in complex data, enabling the integration of structured and unstructured information for insightful decision-making across diverse domains. Our research focuses on constructing graphs from structured and unstructured data, demonstrating their applications in healthcare and power systems. In healthcare, we examine how social networks influence the attitudes of hemodialysis patients toward kidney transplantation. Using a network-based approach, we investigate how social networks within hemodialysis clinics affect patients' attitudes, contributing to a growing understanding of this dynamic. Our findings emphasize that social networks improve the performance of machine learning models, highlighting the importance of social interactions in clinical settings (Aljurbua et al., 2022). We further introduce Node2VecFuseClassifier, a graph-based model that combines patient interactions with patient characteristics. By comparing problem representations that focus on sociodemographics versus social interactions, we demonstrate that incorporating patient-to-patient and patient-to-staff interactions results in more accurate predictions. This multi-modal analysis, which merges patient experiences with staff expertise, underscores the role of social networks in influencing attitudes toward transplantation (Aljurbua et al., 2024b).
In power systems, we explore the impact of severe weather events that lead to power outages, specifically focusing on predicting weather-induced outages three hours in advance at the county level in the Pacific Northwest of the United States. By utilizing a multi-model multiplex network that integrates data from multiple sources including weather, transmission lines, lightning, vegetation, and social media posts from two leading platforms (Twitter and Reddit), we show how multiplex networks offer valuable insights for predicting power outages. This integration of diverse data sources and network-based modeling emphasizes the importance of leveraging multiple perspectives to enhance the understanding and prediction of power disruptions (Aljurbua et al., 2023). We further present HMN-RTS, a hierarchical multiplex network that classifies disruption severity by temporal learning from integrated weather recordings and social media posts. The multiplex network layers of this framework gather information about power outages, weather, lightning, land cover, transmission lines, and social media comments. By incorporating multiplex network layers consisting of data collected over time and across regions, we demonstrate that HMN-RTS significantly improves the accuracy of predicting the duration of weather-related outages. This framework enables grid operators to make more reliable predictions up to 6 hours in advance, supporting early risk assessment and proactive mitigation (Aljurbua et al., 2024a, 2025a). Additionally, we introduce SMN-WVF, a spatiotemporal multiplex network designed to predict the duration of power outages in distribution grids. By integrating a network-based approach and multi-modal data across space and time, SMN-WVF offers a novel method for predicting disruption durations in distribution grids, enhancing decision-making and mitigation efforts while highlighting the critical role of network-based approaches in forecasting (Aljurbua et al., 2025b).
Overall, our research showcases the potential of graph-based models in tackling complex challenges in both power systems and healthcare. By combining the network-based approach with multi-modal data, we present innovative solutions for predicting power outages and understanding patient attitudes.

Item Restricted
Quantifying and Profiling Echo Chambers on Social Media
(Arizona State University, 2024) Alatawi, Faisal; Liu, Huan; Sen, Arunabha; Davulcu, Hasan; Shu, Kai
Echo chambers on social media have become a critical focus in the study of online behavior and public discourse. These environments, characterized by the ideological homogeneity of users and limited exposure to opposing viewpoints, contribute to polarization, the spread of misinformation, and the entrenchment of biases. While significant research has been devoted to proving the existence of echo chambers, less attention has been given to understanding their internal dynamics. This dissertation addresses this gap by developing novel methodologies for quantifying and profiling echo chambers, with the goal of providing deeper insights into how these communities function and how they can be measured. The first core contribution of this work is the introduction of the Echo Chamber Score (ECS), a new metric for measuring the degree of ideological segregation in social media interaction networks. The ECS captures both the cohesion within communities and the separation between them, offering a more nuanced approach to assessing polarization. By using a self-supervised Graph Auto-Encoder (EchoGAE), the ECS bypasses the need for explicit ideological labeling, instead embedding users based on their interactions and linguistic patterns. The second contribution is a Heterogeneous Information Network (HIN)-based framework for profiling echo chambers.
This framework integrates social and linguistic features, allowing for a comprehensive analysis of the relationships between users, topics, and language within echo chambers. By combining community detection, topic modeling, and language analysis, the profiling method reveals how discourse and group behavior reinforce ideological boundaries. Through the application of these methods to real-world social media datasets, this dissertation demonstrates their effectiveness in identifying polarized communities and profiling their internal discourse. The findings highlight how linguistic homophily and social identity theory shape echo chambers and contribute to polarization. Overall, this research advances the understanding of echo chambers by moving beyond detection to explore their structural and linguistic complexities, offering new tools for measuring and addressing polarization on social media platforms.

Item Restricted
Deep Learning Approaches for Multivariate Time Series: Advances in Feature Selection, Classification, and Forecasting
(New Mexico State University, 2024) Alshammari, Khaznah Raghyan; Tran, Son; Hamdi, Shah Muhammad
In this work, we present the latest developments and advancements in the machine learning-based prediction and feature selection of multivariate time series (MVTS) data. MVTS data, which involves multiple interrelated time series, presents significant challenges due to its high dimensionality, complex temporal dependencies, and inter-variable relationships. These challenges are critical in domains such as space weather prediction, environmental monitoring, healthcare, sensor networks, and finance. Our research addresses these challenges by developing and implementing advanced machine-learning algorithms specifically designed for MVTS data. We introduce innovative methodologies that focus on three key areas: feature selection, classification, and forecasting.
Our contributions include the development of deep learning models, such as Long Short-Term Memory (LSTM) networks and Transformer-based architectures, which are optimized to capture and model complex temporal and inter-parameter dependencies in MVTS data. Additionally, we propose a novel feature selection framework that gradually identifies the most relevant variables, enhancing model interpretability and predictive accuracy. Through extensive experimentation and validation, we demonstrate the superior performance of our approaches compared to existing methods. The results highlight the practical applicability of our solutions, providing valuable tools and insights for researchers and practitioners working with high-dimensional time series data. This work advances the state of the art in MVTS analysis, offering robust methodologies that address both theoretical and practical challenges in this field.

Item Restricted
Toward a Better Understanding of Accessibility Adoption: Developer Perceptions and Challenges
(University Of North Texas, 2024-12) Alghamdi, Asmaa Mansour; Stephanie, Ludi
The primary aim of this dissertation is to explore the challenges developers face in interpreting and implementing accessibility in web applications. We analyze developers’ discussions on web accessibility to gain a comprehensive understanding of the challenges, misconceptions, and best practices prevalent within the development community. As part of this analysis, we built a taxonomy of accessibility aspects discussed by developers on Stack Overflow, identifying recurring trends, common obstacles, and the types of disabilities associated with the features addressed by developers in their posts. This dissertation also evaluates the extent to which developers on online platforms engage with and deliberate upon accessibility issues, assessing their awareness and implementation of accessibility standards throughout the web application development process.
Given the volume and variety of these discussions, manual analysis alone would be insufficient to capture the full scope of accessibility challenges. Therefore, we employed supervised machine learning techniques to classify these posts based on their relevance to the principles of the WCAG 2.2 guidelines. By training our models on labeled data, we were able to automatically detect patterns and keywords that indicate specific accessibility issues, even when the language used by developers is not directly aligned with the official guidelines. The results emphasize developers’ struggles with complex accessibility issues, such as time-based media customization and screen reader configuration. The findings indicate that machine learning holds significant potential for enhancing compliance with accessibility standards, providing a pathway for more efficient and accurate adherence to these guidelines.

Item Restricted
Online conversations: A study of their toxicity
(University of Illinois Urbana-Champaign, 2024) Alkhabaz, Ridha; Sundaram, Hari
Social media platforms are essential spaces for modern human communication. There is a dire need to make these spaces as welcoming and engaging as possible for their participants. A potential threat to this need is the propagation of toxic content in online spaces. Hence, it becomes crucial for social media platforms to detect early signs of a toxic conversation. In this work, we tackle the problem of toxicity prediction by proposing a definition for conversational structures. This definition empowers us to provide a new framework for toxicity prediction. We examine more than 1.18 million threads (made by 4.4 million users) on X, formerly known as Twitter, to provide a few key insights about the current state of online conversations. Our results indicated that most of the X threads do not exhibit a conversational structure.
Also, our newly defined structures are distributed differently than online conversations were previously thought to be. Additionally, our definitions give a meaningful sign for models to start predicting the future toxicity of online conversations. We also showcase that message-passing graph neural networks outperform state-of-the-art gradient-boosting trees for toxicity prediction. Most importantly, we find that once we observe the first two terminating conversational structures, we can predict the future toxicity of online threads with ≈88% accuracy. We hope our findings will help social media platforms better curate content in their spaces and promote more conversations in online spaces.

Item Restricted
ECG CLASSIFICATION USING NEURAL NETWORK
(University of Bridgeport, 2018) Alhassani, Ahmad; Faezipour, Miad
An electrocardiogram (ECG) is a biomedical signal that provides very useful information about heart problems. This thesis aims to help heart-monitoring machines make more accurate and faster diagnoses so that more patients' lives might be saved. PhysioBank was the source of our dataset; it has many real examples of heart diseases that can be chosen for study. Five heart cases were used in this research: normal (N), atrial premature beat (PAC), premature ventricular contraction (PVC), left bundle branch block beat (LBBB), and right bundle branch block beat (RBBB). Classifying these five cases with high efficiency and accuracy using a neural network is our final goal. To achieve this goal, ECG signals must go through specific procedures. The first procedure is ECG signal preprocessing, which has three sub-steps: signal filtering, signal detrending, and signal smoothing. The second procedure is extracting features of the ECG signals. The third is classifying the ECG signals using a neural network.
Finally, the results of the neural network are saved for future use. Our system was implemented in MATLAB because it is very powerful software for signal processing and signal analysis. Our research concluded with several achievements and optimizations: discovering good techniques for filtering, finding a new way to extract features, building a single neural network to classify multiple heart diseases, and achieving an accuracy of 96.88%.
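Two of the preprocessing sub-steps named in this abstract, detrending and smoothing, can be sketched in a few lines. The thesis used MATLAB and does not state which methods it applied, so the least-squares linear detrend, the centered moving average, and the toy samples below are assumptions for illustration only.

```python
def detrend_linear(signal):
    """Subtract the least-squares straight line from the signal (baseline removal)."""
    n = len(signal)
    mean_x = (n - 1) / 2
    mean_y = sum(signal) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(signal))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return [y - (slope * i + intercept) for i, y in enumerate(signal)]


def moving_average(signal, window=3):
    """Centered moving average; the window shrinks at the edges of the signal."""
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


# Usage on a short, made-up sample sequence with an upward drift.
raw = [0.0, 1.2, 1.9, 3.1, 4.0, 5.2]
clean = moving_average(detrend_linear(raw), window=3)
```

Real ECG pipelines usually add a band-pass or notch filter before these steps to suppress baseline wander and power-line interference; that filtering stage is omitted here.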