SACM - United States of America
Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9668
38 results
Search Results
Item Restricted
Sensing, Scheduling, and Learning for Resource-Constrained Edge Systems
(Saudi Digital Library, 2025) Bukhari, Abdulrahman; Kim, Hyoseung
Recent advances in Internet of Things (IoT) technologies have sparked significant interest in developing learning-based sensing applications on embedded edge devices. These efforts, however, are challenged by adapting to unforeseen conditions in open-world environments and by the practical limitations of low-cost sensors in the field. This dissertation presents the design, implementation, and evaluation of resource-constrained edge systems that address these challenges through time-series sensing, scheduling, and classification. First, we present OpenSense, an open-world time-series sensing framework for performing inference and incremental classification on an embedded edge device, eliminating reliance on powerful cloud servers. To create time for on-device updates without missing events and to reduce sensing and communication overhead, we introduce two dynamic sensor-scheduling techniques: (i) a class-level period assignment scheduler that selects an appropriate sensing period for each inferred class and (ii) a Q-learning–based scheduler that learns event patterns to choose the sensing interval at each classification moment. Experimental results show that OpenSense incrementally adapts to unforeseen conditions and schedules effectively on a resource-constrained device. Second, to bridge the gap between theoretical potential and field practice for low-cost sensors, we present a comprehensive evaluation of a sensing and classification system for early stress and disease detection in avocado plants. The greenhouse deployment spans 72 plants in four treatment categories over six months. For leaves, spectral reflectance coupled with multivariate analysis and permutation testing yields statistically significant results and reliable inference.
For soils, we develop a two-level hierarchical classification approach tailored to treatment characteristics that achieves 75–86% accuracy across avocado genotypes and outperforms conventional approaches by over 20%. Embedded evaluations on Raspberry Pi and Jetson report end-to-end latency, computation, memory usage, and power consumption, demonstrating practical feasibility. In summary, the contributions are a generalized framework for dynamic, open-world learning on edge devices and an application-specific system for robust classification in noisy field deployments. These real-world deployments collectively outline a practical framework for designing intelligent, cloud-independent edge systems from sensing to inference.

Item Restricted
Sensing, Scheduling, and Learning for Resource-Constrained Edge Systems
(Saudi Digital Library, 2025) Bukhari, Abdulrahman Ismail Ibrahim; Kim, Hyoseung
Recent advances in Internet of Things (IoT) technologies have sparked significant interest in developing learning-based sensing applications on embedded edge devices. These efforts, however, are challenged by adapting to unforeseen conditions in open-world environments and by the practical limitations of low-cost sensors in the field. This dissertation presents the design, implementation, and evaluation of resource-constrained edge systems that address these challenges through time-series sensing, scheduling, and classification. First, we present OpenSense, an open-world time-series sensing framework for performing inference and incremental classification on an embedded edge device, eliminating reliance on powerful cloud servers.
To create time for on-device updates without missing events and to reduce sensing and communication overhead, we introduce two dynamic sensor-scheduling techniques: (i) a class-level period assignment scheduler that selects an appropriate sensing period for each inferred class and (ii) a Q-learning–based scheduler that learns event patterns to choose the sensing interval at each classification moment. Experimental results show that OpenSense incrementally adapts to unforeseen conditions and schedules effectively on a resource-constrained device. Second, to bridge the gap between theoretical potential and field practice for low-cost sensors, we present a comprehensive evaluation of a sensing and classification system for early stress and disease detection in avocado plants. The greenhouse deployment spans 72 plants in four treatment categories over six months. For leaves, spectral reflectance coupled with multivariate analysis and permutation testing yields statistically significant results and reliable inference. For soils, we develop a two-level hierarchical classification approach tailored to treatment characteristics that achieves 75–86% accuracy across avocado genotypes and outperforms conventional approaches by over 20%. Embedded evaluations on Raspberry Pi and Jetson report end-to-end latency, computation, memory usage, and power consumption, demonstrating practical feasibility. In summary, the contributions are a generalized framework for dynamic, open-world learning on edge devices and an application-specific system for robust classification in noisy field deployments.
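As a rough illustration of the Q-learning-based scheduling idea mentioned in this abstract, the sketch below shows a generic tabular Q-learning update that picks a sensing interval per inferred class. It is not the OpenSense implementation: the interval choices, the use of the inferred class as the state, the reward signal, and the names `INTERVALS`, `make_q_table`, `choose_interval`, and `q_update` are all assumptions made for this example.

```python
import random
from collections import defaultdict

# Hypothetical candidate sensing intervals in seconds (assumption, not from the source).
INTERVALS = [1, 5, 30]


def make_q_table():
    # One row of action values per state; unseen states start at 0.0.
    return defaultdict(lambda: [0.0] * len(INTERVALS))


def choose_interval(q, state, epsilon, rng):
    # Epsilon-greedy choice: explore with probability epsilon,
    # otherwise return the index of the highest-valued interval.
    if rng.random() < epsilon:
        return rng.randrange(len(INTERVALS))
    values = q[state]
    return max(range(len(INTERVALS)), key=lambda a: values[a])


def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    # Standard one-step Q-learning update:
    # Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
```

In such a sketch the state would be the class label inferred at the current classification moment, and the reward (assumed here) might favor long intervals when no event is missed and penalize missed events.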
These real-world deployments collectively outline a practical framework for designing intelligent, cloud-independent edge systems from sensing to inference.

Item Restricted
EXPERIMENTAL STUDY OF THE IMPORTANCE OF DATA FOR MACHINE LEARNING-BASED BREAST CANCER OUTCOME PREDICTION
(Saudi Digital Library, 2024) Yamani, Wid; Wojtusiak, Janusz
Wid Yamani, Ph.D. George Mason University, 2025. Dissertation Director: Dr. Janusz Wojtusiak.
Researchers have used various large-scale datasets to develop and validate predictive models in breast cancer outcome prediction. However, a notable gap exists due to the lack of a systematic comparison among these datasets regarding predictive performance, feature availability, and suitability for different analytical objectives. While each dataset has unique strengths and limitations, no comprehensive studies evaluate how these differences impact model performance, particularly across diverse timeframes, survival, and recurrence outcomes. This gap limits researchers in making informed choices about the most appropriate dataset for specific research questions. Effective modeling and prediction of breast cancer outcomes (such as cancer survival and recurrence) rely on the dataset's quality, the pre-processing techniques used to clean and transform data, and the choice of predictive models. Therefore, selecting a suitable dataset and identifying relevant variables are as crucial as the choice of the model itself. This dissertation addresses this gap by systematically comparing five prominent datasets (SEER Research 8, SEER Research 17, SEER Research Plus, SEER-Medicare, and Medicare Claims data), focusing on breast cancer survival and recurrence.
It evaluates the predictive performance of each dataset using supervised machine learning methods, including logistic regression, random forest, and gradient boosting. The models were tested on metrics such as AUC, accuracy, recall, and precision, with gradient boosting delivering the most accurate results. The findings indicate that SEER-Medicare, which integrates cancer registry data with three years of retrospective claims, outperformed the other datasets, achieving AUCs of 0.891 for 5-year survival and 0.942 for 10-year survival. This dataset's inclusion of comprehensive health information, including pre-existing conditions and other claims data, makes it particularly valuable for outcome prediction. However, a drawback of SEER-Medicare is that it primarily includes patients aged 65 and older, as it is based on Medicare data. This limitation reduces its suitability for predicting outcomes in younger breast cancer patients, a significant subgroup with distinct risk factors and treatment responses. SEER Research Plus ranked second, offering data on patient demographics, breast cancer characteristics, staging, outcomes, and treatment, with AUC values of 0.877, 0.901, and 0.937 for 5-year, 10-year, and 15-year survival, respectively. SEER Research 17 and SEER Research 8 include patient demographics, breast cancer characteristics, and staging information but lack treatment details. SEER Research 17, which covers a larger population with more variables, yielded AUC values of 0.870 for 5-year survival, 0.897 for 10-year survival, and 0.920 for 15-year survival. SEER Research 8, which covers a smaller population over a more extended period, yielded slightly lower AUC values of 0.857, 0.868, and 0.880 for 5-year, 10-year, and 15-year survival, respectively. Results indicate that including treatment and additional variables significantly enhances prediction accuracy while the data size is less critical. 
This thesis is the first study to compare these SEER datasets, and its comprehensive evaluation provides crucial insights into how data characteristics influence breast cancer outcome modeling.

Item Restricted
Cross Dataset Fairness Evaluation of Transformer Based Sentiment Models
(Saudi Digital Library, 2025-05-10) Zuiran, Sara; Bhattacharyya, Siddhartha
With the growing exploration of Natural Language Processing (NLP) systems in decision-making environments, it is essential to evaluate technical and ethical aspects of the dataset and the NLP model to improve fairness. To assess fairness, the thesis examines demographic imbalances in sentiment classification models by evaluating transformer-based models fine-tuned on the Stanford Sentiment Treebank version 2 dataset (SST-2) against the demographically annotated Comprehensive Assessment of Language Model dataset (CALM). This work identifies performance disparities in sentiment prediction across demographic groups by examining sensitive attributes such as gender and race. The study evaluates both the RoBERTa and MentalBERT transformer models using a complete set of fairness metrics consisting of Statistical Parity Difference (SPD), Equal Opportunity Difference (EOD), False Positive Rates (FPR), False Negative Rates (FNR), Jensen-Shannon Divergence (JSD), and Wasserstein Distance (WD). The analysis examines both group-vs-rest and pairwise subgroup comparisons, including gender and ethnicity. Results show that applying adversarial mitigation reduced fairness disparities across demographic subgroups, with the most notable improvements observed for non-binary and Asian users. The observed disparities emphasize the challenge of reducing performance gaps across demographic subgroups in sentiment classification tasks.
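Two of the fairness metrics named in this abstract, SPD and EOD, can be sketched in a group-vs-rest form as follows. This is a minimal illustration using common textbook definitions; the abstract does not state the exact definitions or sign conventions used in the thesis, and the function names and toy data are hypothetical.

```python
def positive_rate(preds):
    # Fraction of predictions equal to the positive class (1).
    return sum(preds) / len(preds)


def true_positive_rate(labels, preds):
    # Among truly positive examples, the fraction predicted positive.
    # Raises ZeroDivisionError if there are no positive labels.
    hits = [p for y, p in zip(labels, preds) if y == 1]
    return sum(hits) / len(hits)


def statistical_parity_difference(preds, groups, group):
    # SPD (group-vs-rest): P(pred = 1 | group) - P(pred = 1 | rest).
    inside = [p for p, g in zip(preds, groups) if g == group]
    rest = [p for p, g in zip(preds, groups) if g != group]
    return positive_rate(inside) - positive_rate(rest)


def equal_opportunity_difference(labels, preds, groups, group):
    # EOD (group-vs-rest): TPR(group) - TPR(rest).
    inside = [(y, p) for y, p, g in zip(labels, preds, groups) if g == group]
    rest = [(y, p) for y, p, g in zip(labels, preds, groups) if g != group]
    tpr_inside = true_positive_rate(*zip(*inside))
    tpr_rest = true_positive_rate(*zip(*rest))
    return tpr_inside - tpr_rest


# Toy data (hypothetical): 1 = positive sentiment, groups "a" and "b".
labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 0, 0, 0, 1, 1, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(statistical_parity_difference(preds, groups, "a"))
print(equal_opportunity_difference(labels, preds, groups, "a"))
```

A value near zero for either metric indicates similar treatment of the group and the rest under that definition; negative values here mean the group receives fewer positive predictions, or has a lower true positive rate, than the rest.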
The thesis introduces a practical framework for evaluating demographic disparities, extends fairness analysis, and assesses the impact of mitigation techniques in cross-dataset sentiment classification. This research proposes a framework that demonstrates a path toward creating inclusive NLP systems and establishes the groundwork for upcoming ethical Artificial Intelligence (AI) studies.

Item Restricted
GRAPH-BASED APPROACH: BRIDGING INSIGHTS FROM STRUCTURED AND UNSTRUCTURED DATA
(Temple University, 2025) Aljurbua, Rafaa; Obradovic, Zoran
Graph-based methodologies provide powerful tools for uncovering intricate relationships and patterns in complex data, enabling the integration of structured and unstructured information for insightful decision-making across diverse domains. Our research focuses on constructing graphs from structured and unstructured data, demonstrating their applications in healthcare and power systems. In healthcare, we examine how social networks influence the attitudes of hemodialysis patients toward kidney transplantation. Using a network-based approach, we investigate how social networks within hemodialysis clinics affect patients' attitudes, contributing to a growing understanding of this dynamic. Our findings emphasize that social networks improve the performance of machine learning models, highlighting the importance of social interactions in clinical settings (Aljurbua et al., 2022). We further introduce Node2VecFuseClassifier, a graph-based model that combines patient interactions with patient characteristics. By comparing problem representations that focus on sociodemographics versus social interactions, we demonstrate that incorporating patient-to-patient and patient-to-staff interactions results in more accurate predictions. This multi-modal analysis, which merges patient experiences with staff expertise, underscores the role of social networks in influencing attitudes toward transplantation (Aljurbua et al., 2024b).
In power systems, we explore the impact of severe weather events that lead to power outages, specifically focusing on predicting weather-induced outages three hours in advance at the county level in the Pacific Northwest of the United States. By utilizing a multi-model multiplex network that integrates data from multiple sources including weather, transmission lines, lightning, vegetation, and social media posts from two leading platforms (Twitter and Reddit), we show how multiplex networks offer valuable insights for predicting power outages. This integration of diverse data sources and network-based modeling emphasizes the importance of leveraging multiple perspectives to enhance the understanding and prediction of power disruptions (Aljurbua et al., 2023). We further present HMN-RTS, a hierarchical multiplex network that classifies disruption severity by temporal learning from integrated weather recordings and social media posts. The multiplex network layers of this framework gather information about power outages, weather, lightning, land cover, transmission lines, and social media comments. By incorporating multiplex network layers consisting of data collected over time and across regions, we demonstrate that HMN-RTS significantly improves the accuracy of predicting the duration of weather-related outages. This framework enables grid operators to make more reliable predictions up to 6 hours in advance, supporting early risk assessment and proactive mitigation (Aljurbua et al., 2024a, 2025a). Additionally, we introduce SMN-WVF, a spatiotemporal multiplex network designed to predict the duration of power outages in distribution grids. By integrating a network-based approach and multi-modal data across space and time, SMN-WVF offers a novel method for predicting disruption durations in distribution grids, enhancing decision-making and mitigation efforts while highlighting the critical role of network-based approaches in forecasting (Aljurbua et al., 2025b).
Overall, our research showcases the potential of graph-based models in tackling complex challenges in both power systems and healthcare. By combining the network-based approach with multi-modal data, we present innovative solutions for predicting power outages and understanding patient attitudes.

Item Restricted
Quantifying and Profiling Echo Chambers on Social Media
(Arizona State University, 2024) Alatawi, Faisal; Liu, Huan; Sen, Arunabha; Davulcu, Hasan; Shu, Kai
Echo chambers on social media have become a critical focus in the study of online behavior and public discourse. These environments, characterized by the ideological homogeneity of users and limited exposure to opposing viewpoints, contribute to polarization, the spread of misinformation, and the entrenchment of biases. While significant research has been devoted to proving the existence of echo chambers, less attention has been given to understanding their internal dynamics. This dissertation addresses this gap by developing novel methodologies for quantifying and profiling echo chambers, with the goal of providing deeper insights into how these communities function and how they can be measured. The first core contribution of this work is the introduction of the Echo Chamber Score (ECS), a new metric for measuring the degree of ideological segregation in social media interaction networks. The ECS captures both the cohesion within communities and the separation between them, offering a more nuanced approach to assessing polarization. By using a self-supervised Graph Auto-Encoder (EchoGAE), the ECS bypasses the need for explicit ideological labeling, instead embedding users based on their interactions and linguistic patterns. The second contribution is a Heterogeneous Information Network (HIN)-based framework for profiling echo chambers.
This framework integrates social and linguistic features, allowing for a comprehensive analysis of the relationships between users, topics, and language within echo chambers. By combining community detection, topic modeling, and language analysis, the profiling method reveals how discourse and group behavior reinforce ideological boundaries. Through the application of these methods to real-world social media datasets, this dissertation demonstrates their effectiveness in identifying polarized communities and profiling their internal discourse. The findings highlight how linguistic homophily and social identity theory shape echo chambers and contribute to polarization. Overall, this research advances the understanding of echo chambers by moving beyond detection to explore their structural and linguistic complexities, offering new tools for measuring and addressing polarization on social media platforms.

Item Restricted
Deep Learning Approaches for Multivariate Time Series: Advances in Feature Selection, Classification, and Forecasting
(New Mexico State University, 2024) Alshammari, Khaznah Raghyan; Tran, Son; Hamdi, Shah Muhammad
In this work, we present the latest developments and advancements in the machine learning-based prediction and feature selection of multivariate time series (MVTS) data. MVTS data, which involves multiple interrelated time series, presents significant challenges due to its high dimensionality, complex temporal dependencies, and inter-variable relationships. These challenges are critical in domains such as space weather prediction, environmental monitoring, healthcare, sensor networks, and finance. Our research addresses these challenges by developing and implementing advanced machine-learning algorithms specifically designed for MVTS data. We introduce innovative methodologies that focus on three key areas: feature selection, classification, and forecasting.
Our contributions include the development of deep learning models, such as Long Short-Term Memory (LSTM) networks and Transformer-based architectures, which are optimized to capture and model complex temporal and inter-parameter dependencies in MVTS data. Additionally, we propose a novel feature selection framework that gradually identifies the most relevant variables, enhancing model interpretability and predictive accuracy. Through extensive experimentation and validation, we demonstrate the superior performance of our approaches compared to existing methods. The results highlight the practical applicability of our solutions, providing valuable tools and insights for researchers and practitioners working with high-dimensional time series data. This work advances the state of the art in MVTS analysis, offering robust methodologies that address both theoretical and practical challenges in this field.

Item Restricted
Toward a Better Understanding of Accessibility Adoption: Developer Perceptions and Challenges
(University of North Texas, 2024-12) Alghamdi, Asmaa Mansour; Stephanie, Ludi
The primary aim of this dissertation is to explore the challenges developers face in interpreting and implementing accessibility in web applications. We analyze developers’ discussions on web accessibility to gain a comprehensive understanding of the challenges, misconceptions, and best practices prevalent within the development community. As part of this analysis, we built a taxonomy of accessibility aspects discussed by developers on Stack Overflow, identifying recurring trends, common obstacles, and the types of disabilities associated with the features addressed by developers in their posts. This dissertation also evaluates the extent to which developers on online platforms engage with and deliberate upon accessibility issues, assessing their awareness and implementation of accessibility standards throughout the web application development process.
Given the volume and variety of these discussions, manual analysis alone would be insufficient to capture the full scope of accessibility challenges. Therefore, we employed supervised machine learning techniques to classify these posts based on their relevance to the principles of the WCAG 2.2 guidelines. By training our models on labeled data, we were able to automatically detect patterns and keywords that indicate specific accessibility issues, even when the language used by developers is not directly aligned with the official guidelines. The results emphasize developers’ struggles with complex accessibility issues, such as time-based media customization and screen reader configuration. The findings indicate that machine learning holds significant potential for enhancing compliance with accessibility standards, providing a pathway for more efficient and accurate adherence to these guidelines.

Item Restricted
Online conversations: A study of their toxicity
(University of Illinois Urbana-Champaign, 2024) Alkhabaz, Ridha; Sundaram, Hari
Social media platforms are essential spaces for modern human communication. There is a dire need to make these spaces as welcoming and engaging as possible for their participants. A potential threat to this need is the propagation of toxic content in online spaces. Hence, it becomes crucial for social media platforms to detect early signs of a toxic conversation. In this work, we tackle the problem of toxicity prediction by proposing a definition for conversational structures. This definition empowers us to provide a new framework for toxicity prediction. We then examine more than 1.18 million threads, made by 4.4 million users, on X (formerly known as Twitter) to provide a few key insights about the current state of online conversations. Our results indicate that most X threads do not exhibit a conversational structure.
Also, our newly defined structures are distributed differently from what was previously thought about online conversations. Additionally, our definitions give a meaningful signal for models to start predicting the future toxicity of online conversations. We also showcase that message-passing graph neural networks outperform state-of-the-art gradient-boosting trees for toxicity prediction. Most importantly, we find that once we observe the first two terminating conversational structures, we can predict the future toxicity of online threads with ≈88% accuracy. We hope our findings will help social media platforms better curate content in their spaces and promote more conversations in online spaces.

Item Restricted
Network Alignment Using Topological And Node Embedding Features
(Purdue University, 2024-08) Almulhim, Aljohara; AlHasan, Mohammad
In today’s big data environment, the development of robust knowledge discovery solutions depends on the integration of data from various sources. For example, intelligence agencies fuse data from multiple sources to identify criminal activities; e-commerce platforms consolidate user activities on various platforms and devices to build better user profiles; scientists connect data from various modalities to develop new drugs and treatments. In all such activities, entities from different data sources need to be aligned: first, to ensure accurate analysis and, more importantly, to discover novel knowledge regarding these entities. If the data sources are networks, aligning entities from different sources leads to the task of network alignment, which is the focus of this thesis. The main objective of this task is to find an optimal one-to-one correspondence among nodes in two or more networks utilizing graph topology and node and edge attributes.
In existing works, diverse computational schemes have been adopted for solving the network alignment task; these schemes include finding eigen-decomposition of similarity matrices, solving quadratic assignment problems via sub-gradient optimization, and designing iterative greedy matching techniques. Contemporary works approach this problem using a deep learning framework by learning node representations to identify matches. Node matching’s key challenges include computational complexity and scalability. However, privacy concerns or unavailability often prevent the utilization of node attributes in real-world scenarios. In light of this, we aim to solve this problem by relying solely on the graph structure, without the need for prior knowledge, external attributes, or guidance from landmark nodes. Clearly, topology-based matching emerges as a hard problem when compared to other network matching tasks. In this thesis, I propose two original works to solve the topology-based network alignment task. The first work, Graphlet-based Alignment (Graphlet-Align), employs a topological approach to network alignment. Graphlet-Align represents each node with a local graphlet-count-based signature and uses it as a feature for deriving node-to-node similarity across a pair of networks. By using these similarity values in a bipartite matching algorithm, Graphlet-Align obtains a preliminary alignment. It then uses high-order information extending to the k-hop neighborhood of a node to further refine the alignment, achieving better accuracy. We validated Graphlet-Align’s efficacy by applying it to various large real-world networks, achieving accuracy improvements ranging from 20% to 72% over state-of-the-art methods on both duplicated and noisy graphs. Expanding on this paradigm that focuses solely on topology for solving graph alignment, in my second work, I develop a self-supervised learning framework known as Self-Supervised Topological Alignment (SST-Align).
SST-Align uses graphlet-based signatures to create self-supervised node alignment labels, and then uses those labels to generate node embedding vectors of both networks in a joint space from which the node alignment task can be effectively and accurately solved. It starts with an optimization process that applies average pooling on top of the extracted graphlet signature to construct an initial node assignment. Next, a self-supervised Siamese network architecture utilizes both the initial node assignment and graph convolutional networks to generate node embeddings through a contrastive loss. By applying kd-tree similarity to the two networks’ embeddings, we achieve the final node mapping. Extensive testing on real-world graph alignment datasets shows that our developed methodology has competitive results compared to seven existing competing models in terms of node mapping accuracy. Additionally, we conduct an ablation study to evaluate the two-stage accuracy, excluding the representation learning part and comparing the mapping accuracy accordingly. This thesis enhances the theoretical understanding of topological features in the analysis of graph data for the network alignment task, hence facilitating future advancements in the field.
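The general idea of aligning nodes from topology alone, as described in this abstract, can be illustrated with a deliberately simplified sketch. It is not Graphlet-Align or SST-Align: the signature here uses only three local counts (degree, open wedges centered at the node, and triangles), and a brute-force search over permutations stands in for a proper bipartite matching algorithm, which only works for very small graphs. The function names and the toy graphs are hypothetical.

```python
from itertools import combinations, permutations


def signatures(n, edges):
    # Build an adjacency map, then describe each node by
    # (degree, open wedges centered at the node, triangles through the node).
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    sigs = []
    for v in range(n):
        deg = len(adj[v])
        tri = sum(1 for a, b in combinations(sorted(adj[v]), 2) if b in adj[a])
        wedges = deg * (deg - 1) // 2 - tri
        sigs.append((deg, wedges, tri))
    return sigs


def align(sigs_a, sigs_b):
    # Find the one-to-one mapping that minimizes the total L1 distance
    # between signatures. Exhaustive search: feasible only for tiny graphs.
    n = len(sigs_a)

    def cost(perm):
        return sum(
            sum(abs(x - y) for x, y in zip(sigs_a[i], sigs_b[perm[i]]))
            for i in range(n)
        )

    best = min(permutations(range(n)), key=cost)
    return list(best), cost(best)


# Toy example: graph B is graph A with nodes relabeled by `relabel`.
edges_a = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
relabel = {0: 3, 1: 4, 2: 0, 3: 1, 4: 2}
edges_b = [(relabel[u], relabel[v]) for u, v in edges_a]

mapping, total_cost = align(signatures(5, edges_a), signatures(5, edges_b))
print(mapping, total_cost)
```

Nodes 0 and 1 of graph A are structurally interchangeable, so their local signatures are identical and either assignment is equally good. Ambiguities of this kind are one reason a refinement step based on a wider neighborhood, such as the k-hop refinement mentioned in the abstract, can be useful.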