SACM - United States of America

Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9668


Search Results

Now showing 1 - 10 of 32
  • ItemRestricted
    Deep Learning Approaches for Multivariate Time Series: Advances in Feature Selection, Classification, and Forecasting
    (New Mexico State University, 2024) Alshammari, Khaznah Raghyan; Tran, Son; Hamdi, Shah Muhammad
    In this work, we present the latest advances in machine-learning-based prediction and feature selection for multivariate time series (MVTS) data. MVTS data, which involves multiple interrelated time series, presents significant challenges due to its high dimensionality, complex temporal dependencies, and inter-variable relationships. These challenges are critical in domains such as space weather prediction, environmental monitoring, healthcare, sensor networks, and finance. Our research addresses these challenges by developing and implementing advanced machine-learning algorithms specifically designed for MVTS data. We introduce innovative methodologies that focus on three key areas: feature selection, classification, and forecasting. Our contributions include deep learning models, such as Long Short-Term Memory (LSTM) networks and Transformer-based architectures, optimized to capture and model complex temporal and inter-parameter dependencies in MVTS data. Additionally, we propose a novel feature selection framework that gradually identifies the most relevant variables, enhancing model interpretability and predictive accuracy. Through extensive experimentation and validation, we demonstrate the superior performance of our approaches compared to existing methods. The results highlight the practical applicability of our solutions, providing valuable tools and insights for researchers and practitioners working with high-dimensional time series data. This work advances the state of the art in MVTS analysis, offering robust methodologies that address both theoretical and practical challenges in this field.
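The "gradually identifies the most relevant variables" idea above can be illustrated, very loosely, with greedy forward selection. The sketch below is an invented toy (the correlation-based `score` and the `greedy_select` loop are illustrative assumptions, not the dissertation's framework):

```python
# Illustrative greedy forward feature selection for multivariate series
# (toy scorer; NOT the dissertation's actual framework).

def score(features, X, y):
    """Toy relevance score: |correlation| between the sum of the chosen
    feature columns and the target series y."""
    n = len(y)
    s = [sum(row[f] for f in features) for row in X]
    ms, my = sum(s) / n, sum(y) / n
    cov = sum((a - ms) * (b - my) for a, b in zip(s, y))
    sd_s = sum((a - ms) ** 2 for a in s) ** 0.5
    sd_y = sum((b - my) ** 2 for b in y) ** 0.5
    return abs(cov / (sd_s * sd_y)) if sd_s and sd_y else 0.0

def greedy_select(X, y, k):
    """Gradually add the variable that most improves the score."""
    selected, remaining = [], set(range(len(X[0])))
    while len(selected) < k and remaining:
        best = max(remaining, key=lambda f: score(selected + [f], X, y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Real MVTS pipelines would score candidate variables with a trained model rather than a plain correlation, but the incremental selection loop is the same shape.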
  • ItemRestricted
    Toward a Better Understanding of Accessibility Adoption: Developer Perceptions and Challenges
    (University Of North Texas, 2024-12) Alghamdi, Asmaa Mansour; Ludi, Stephanie
    The primary aim of this dissertation is to explore the challenges developers face in interpreting and implementing accessibility in web applications. We analyze developers’ discussions on web accessibility to gain a comprehensive understanding of the challenges, misconceptions, and best practices prevalent within the development community. As part of this analysis, we built a taxonomy of accessibility aspects discussed by developers on Stack Overflow, identifying recurring trends, common obstacles, and the types of disabilities associated with the features addressed by developers in their posts. This dissertation also evaluates the extent to which developers on online platforms engage with and deliberate upon accessibility issues, assessing their awareness and implementation of accessibility standards throughout the web application development process. Given the volume and variety of these discussions, manual analysis alone would be insufficient to capture the full scope of accessibility challenges. Therefore, we employed supervised machine learning techniques to classify these posts based on their relevance to different aspects of the WCAG 2.2 principles. By training our models on labeled data, we were able to automatically detect patterns and keywords that indicate specific accessibility issues, even when the language used by developers is not directly aligned with the official guidelines. The results emphasize developers’ struggles with complex accessibility issues, such as time-based media customization and screen reader configuration. The findings indicate that machine learning holds significant potential for enhancing compliance with accessibility standards, providing a pathway for more efficient and accurate adherence to these guidelines.
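Classifying posts by relevance to accessibility topics, as described above, is at heart supervised text classification. A minimal multinomial Naive Bayes sketch follows; the labels, tokens, and training set are hypothetical, and the dissertation's actual models and WCAG-based label scheme are certainly richer:

```python
# Minimal multinomial Naive Bayes for tagging developer posts
# (hypothetical labels and tokens, for illustration only).
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (tokens, label). Returns per-label prior and
    Laplace-smoothed token log-probabilities."""
    label_counts = Counter(lbl for _, lbl in docs)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, lbl in docs:
        token_counts[lbl].update(tokens)
        vocab.update(tokens)
    V = len(vocab)
    model = {}
    for lbl, n in label_counts.items():
        total = sum(token_counts[lbl].values())
        model[lbl] = (
            math.log(n / len(docs)),
            {t: math.log((token_counts[lbl][t] + 1) / (total + V)) for t in vocab},
            math.log(1 / (total + V)),  # fallback for unseen tokens
        )
    return model

def classify(model, tokens):
    def loglik(lbl):
        prior, probs, unk = model[lbl]
        return prior + sum(probs.get(t, unk) for t in tokens)
    return max(model, key=loglik)
```

The smoothing and unseen-token fallback are what let such a model match posts whose wording does not directly align with the official guideline vocabulary.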
  • ItemRestricted
    Online conversations: A study of their toxicity
    (University of Illinois Urbana-Champaign, 2024) Alkhabaz, Ridha; Sundaram, Hari
    Social media platforms are essential spaces for modern human communication. There is a dire need to make these spaces as welcoming and engaging as possible for their participants. A potential threat to this need is the propagation of toxic content in online spaces. Hence, it becomes crucial for social media platforms to detect early signs of a toxic conversation. In this work, we tackle the problem of toxicity prediction by proposing a definition of conversational structures. This definition empowers us to provide a new framework for toxicity prediction. Thus, we examine more than 1.18 million threads on X (formerly known as Twitter), made by 4.4 million users, to provide a few key insights about the current state of online conversations. Our results indicate that most X threads do not exhibit a conversational structure. Also, our newly defined structures are distributed differently than previously assumed for online conversations. Additionally, our definitions give models a meaningful signal for predicting the future toxicity of online conversations. We also show that message-passing graph neural networks outperform state-of-the-art gradient-boosting trees for toxicity prediction. Most importantly, we find that once we observe the first two terminating conversational structures, we can predict the future toxicity of online threads with ≈88% accuracy. We hope our findings will help social media platforms better curate content in their spaces and promote more conversations in online spaces.
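The message-passing idea behind the graph neural networks mentioned above can be shown in miniature. The sketch below uses scalar node features and plain mean aggregation with no learned weights; it illustrates only the aggregation step, not the trained models from the dissertation:

```python
# Toy message passing: each round, a node's feature becomes the mean of
# its own feature and its neighbors' features (no learned weights).

def message_pass(features, edges, rounds=2):
    adj = {v: [] for v in features}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    h = dict(features)
    for _ in range(rounds):
        h = {v: (h[v] + sum(h[u] for u in adj[v])) / (1 + len(adj[v]))
             for v in h}
    return h
```

In a real GNN each round would also apply a learned linear transform and nonlinearity, and the features would be vectors (e.g., reply-tree statistics), but the neighborhood-aggregation skeleton is the same.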
  • ItemRestricted
    Network Alignment Using Topological And Node Embedding Features
    (Purdue University, 2024-08) Almulhim, Aljohara; AlHasan, Mohammad
    In today’s big data environment, the development of robust knowledge discovery solutions depends on the integration of data from various sources. For example, intelligence agencies fuse data from multiple sources to identify criminal activities; e-commerce platforms consolidate user activities on various platforms and devices to build better user profiles; scientists connect data from various modalities to develop new drugs and treatments. In all such activities, entities from different data sources need to be aligned—first, to ensure accurate analysis and, more importantly, to discover novel knowledge regarding these entities. If the data sources are networks, aligning entities from different sources leads to the task of network alignment, which is the focus of this thesis. The main objective of this task is to find an optimal one-to-one correspondence among nodes in two or more networks utilizing graph topology and node/edge attributes. In existing works, diverse computational schemes have been adopted for solving the network alignment task; these schemes include finding eigen-decompositions of similarity matrices, solving quadratic assignment problems via sub-gradient optimization, and designing iterative greedy matching techniques. Contemporary works approach this problem using a deep learning framework, learning node representations to identify matches. Key challenges of node matching include computational complexity and scalability. Moreover, privacy concerns or unavailability often prevent the utilization of node attributes in real-world scenarios. In light of this, we aim to solve this problem by relying solely on the graph structure, without the need for prior knowledge, external attributes, or guidance from landmark nodes. Clearly, topology-based matching emerges as a harder problem when compared to other network matching tasks. In this thesis, I propose two original works to solve the topology-based network alignment task.
The first work, Graphlet-based Alignment (Graphlet-Align), employs a topological approach to network alignment. Graphlet-Align represents each node with a local graphlet-count-based signature and uses that as a feature for deriving node-to-node similarity across a pair of networks. By using these similarity values in a bipartite matching algorithm, Graphlet-Align obtains a preliminary alignment. It then uses higher-order information extending to the k-hop neighborhood of a node to further refine the alignment, achieving better accuracy. We validated Graphlet-Align’s efficacy by applying it to various large real-world networks, achieving accuracy improvements ranging from 20% to 72% over state-of-the-art methods on both duplicated and noisy graphs. Expanding on this paradigm that focuses solely on topology for solving graph alignment, in my second work, I develop a self-supervised learning framework known as Self-Supervised Topological Alignment (SST-Align). SST-Align uses graphlet-based signatures to create self-supervised node alignment labels, and then uses those labels to generate node embedding vectors for both networks in a joint space, from which the node alignment task can be solved effectively and accurately. It starts with an optimization process that applies average pooling on top of the extracted graphlet signatures to construct an initial node assignment. Next, a self-supervised Siamese network architecture utilizes both the initial node assignment and graph convolutional networks to generate node embeddings through a contrastive loss. By applying k-d-tree similarity search to the two networks’ embeddings, we achieve the final node mapping. Extensive testing on real-world graph alignment datasets shows that our methodology achieves competitive results compared to seven existing competing models in terms of node mapping accuracy.
Additionally, we conduct an ablation study to evaluate the accuracy of the two stages, excluding the representation learning component and comparing the mapping accuracy accordingly. This thesis enhances the theoretical understanding of topological features in the analysis of graph data for the network alignment task, thereby facilitating future advancements in the field.
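The flavor of signature-based, topology-only alignment described above can be sketched with a much cruder signature than graphlet counts: here each node is summarized by (degree, triangle count) and pairs are matched greedily by signature distance. Everything below is an illustrative simplification, not Graphlet-Align or SST-Align:

```python
# Crude topology-only alignment: per-node signature + greedy matching.
from itertools import combinations

def signature(adj, v):
    """Tiny topological signature: (degree, number of triangles at v)."""
    nbrs = adj[v]
    tri = sum(1 for a, b in combinations(sorted(nbrs), 2) if b in adj[a])
    return (len(nbrs), tri)

def align(adj1, adj2):
    """Greedy one-to-one matching by signature distance (a rough sketch)."""
    sig1 = {v: signature(adj1, v) for v in adj1}
    sig2 = {v: signature(adj2, v) for v in adj2}
    pairs = sorted(((abs(s1[0] - s2[0]) + abs(s1[1] - s2[1]), u, v)
                    for u, s1 in sig1.items() for v, s2 in sig2.items()),
                   key=lambda t: (t[0], str(t[1]), str(t[2])))
    used1, used2, mapping = set(), set(), {}
    for d, u, v in pairs:
        if u not in used1 and v not in used2:
            mapping[u] = v
            used1.add(u)
            used2.add(v)
    return mapping
```

Graphlet signatures extend this idea to counts of all small induced subgraphs touching a node, which disambiguates nodes that bare degree/triangle counts cannot.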
  • ItemRestricted
    EAVESDROPPING-DRIVEN PROFILING ATTACKS ON ENCRYPTED WIFI NETWORKS: UNVEILING VULNERABILITIES IN IOT DEVICE SECURITY
    (University of Central Florida, 2024-08-02) Alwhbi, Ibrahim; Zou, Changchun
    This dissertation investigates the privacy implications of WiFi communication in Internet-of-Things (IoT) environments, focusing on the threat posed by out-of-network observers. Recent research has shown that in-network observers can glean information about IoT devices, user identities, and activities. However, the potential for information inference by out-of-network observers, who do not have WiFi network access, has not been thoroughly examined. The first study provides a detailed summary dataset, utilizing Random Forest for data-summary classification. This study highlights the significant privacy threat to WiFi networks and IoT applications from out-of-network observers. Building on this investigation, the second study extends the research by utilizing a new set of time-series monitored WiFi data frames and advanced machine learning algorithms, specifically XGBoost, for time-series classification. This extension achieved accuracy of up to 94% in identifying IoT devices and their working status, demonstrating faster IoT device profiling while maintaining classification accuracy. Furthermore, the study underscores the ease with which outside intruders can harm IoT devices without joining a WiFi network, launching attacks quickly and leaving no detectable footprints. Additionally, the dissertation presents a comprehensive survey of recent advancements in machine-learning-driven encrypted traffic analysis and classification. Given the challenges posed by encryption for traditional packet and traffic inspection, understanding and classifying encrypted traffic are crucial. The survey provides insights into utilizing machine learning for encrypted network traffic analysis and classification, reviewing state-of-the-art techniques and methodologies.
This survey serves as a valuable resource for network administrators, cybersecurity professionals, and policy enforcement entities, offering insights into current practices and future directions in encrypted traffic analysis and classification.
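Profiling encrypted traffic, as described above, typically works on metadata that survives encryption, such as frame sizes and their variability over time. The feature set below is a hypothetical illustration of that kind of summary extraction, not the dissertation's actual features:

```python
# Hypothetical summary features over a sequence of (encrypted) frame sizes;
# classifiers like Random Forest or XGBoost would consume such vectors.

def frame_features(sizes):
    n = len(sizes)
    mean = sum(sizes) / n
    var = sum((s - mean) ** 2 for s in sizes) / n
    deltas = [abs(b - a) for a, b in zip(sizes, sizes[1:])]
    return {
        "mean": mean,
        "std": var ** 0.5,
        "max": max(sizes),
        "min": min(sizes),
        # mean absolute change between consecutive frames ("burstiness")
        "burstiness": (sum(deltas) / len(deltas)) if deltas else 0.0,
    }
```

An eavesdropper never needs the WiFi key for this: frame lengths and timing are visible to any radio in range, which is exactly the out-of-network threat the dissertation examines.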
  • ItemRestricted
    Exploring the Impact of Sentiment Analysis on Price Prediction
    (Lehigh University, 2024-07) Zahhar, Abdulkarim Ali Y.; Robinson, Daniel P.
    The integration of sentiment analysis into predictive models for financial markets, particularly Bitcoin, combines behavioral finance with quantitative analysis. This thesis investigates the extent to which sentiment data, derived from social media platforms such as X (formerly Twitter), can enhance the accuracy of Bitcoin price predictions. A key premise of the study is that public sentiment, as expressed on social media, affects Bitcoin’s market prices. The research uses linear regression models that combine Bitcoin’s opening prices with sentiment scores from social media to forecast closing prices. The analysis covers the period from January 2012 to December 2019. Sentiment scores were computed using the VADER and TextBlob lexicons. The empirical findings show that models incorporating sentiment scores enhance predictive accuracy. For example, incorporating daily average sentiment scores (v_avg and B_avg) into the models reduced the Mean Squared Error (MSE) from 81184 to 81129 and improved other metrics such as Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE), particularly at specific lag times such as 8 and 76 days. These results emphasize the potential benefits of sentiment analysis for improving financial forecasting models. However, the study also acknowledges limitations related to the scope of the data and the complexities of accurately measuring sentiment. Future research is encouraged to explore more sophisticated models and diverse data sources to further enhance and validate the integration of sentiment analysis in financial forecasting.
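The model family described above (closing price regressed on opening price plus a sentiment score) can be sketched with ordinary least squares. The numbers below are invented toy data, not the thesis dataset:

```python
# OLS sketch: close ≈ b0 + b1*open + b2*sentiment, on invented toy data.
import numpy as np

# Hypothetical rows of [open_price, daily_avg_sentiment]; target = close.
X = np.array([[100.0, 0.2], [102.0, -0.1], [101.0, 0.5], [103.0, 0.0]])
y = np.array([101.0, 101.5, 102.5, 103.0])

A = np.column_stack([np.ones(len(X)), X])        # prepend intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)     # least-squares fit
pred = A @ coef
mse = float(np.mean((y - pred) ** 2))
```

Comparing this MSE against a sentiment-free baseline (close on open alone), across a range of sentiment lags, is the shape of the experiment the abstract reports.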
  • ItemRestricted
    Towards Cost-Effective Noise-Resilient Machine Learning Solutions
    (University of Georgia, 2026-06-04) Gharawi, Abdulrahman Ahmed; Ramaswamy, Lakshmish
    Machine learning models have demonstrated exceptional performance in various applications as a result of the emergence of large labeled datasets. Although many datasets are available, acquiring high-quality labeled datasets is challenging, since it involves substantial human supervision or expert annotation, which is extremely labor-intensive and time-consuming. The problem is magnified by the considerable amount of label noise present in datasets from real-world scenarios, which significantly undermines the accuracy of machine learning models. Since noisy datasets degrade model performance, acquiring high-quality datasets without label noise becomes a critical problem. However, it is challenging to significantly decrease label noise in real-world datasets without hiring expensive expert annotators. Based on extensive testing and research, this dissertation examines the impact of different levels of label noise on the accuracy of machine learning models. It also investigates ways to cut labeling expenses without sacrificing required accuracy. Finally, to enhance the robustness of machine learning models and mitigate the pervasive issue of label noise, we present a novel, cost-effective approach called Self Enhanced Supervised Training (SEST).
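The kind of experiment described above (measure how accuracy degrades as label noise increases) can be reproduced in miniature. The classifier, data, and noise model below are toy assumptions, not SEST or the dissertation's setup:

```python
# Toy label-noise experiment: symmetric label flipping + nearest-centroid
# classification on 1-D data (illustrative only).
import random

def nearest_centroid_acc(X, labels, X_test, y_test):
    cents = {}
    for x, l in zip(X, labels):
        cents.setdefault(l, []).append(x)
    cents = {l: sum(v) / len(v) for l, v in cents.items()}  # class centroids
    preds = [min(cents, key=lambda l: abs(x - cents[l])) for x in X_test]
    return sum(p == t for p, t in zip(preds, y_test)) / len(y_test)

def flip_labels(labels, rate, classes, rng):
    """Symmetric noise: with probability `rate`, replace a label with a
    uniformly chosen different class."""
    return [rng.choice([c for c in classes if c != l])
            if rng.random() < rate else l
            for l in labels]
```

Sweeping `rate` from 0 upward and plotting accuracy is the basic shape of a label-noise robustness study.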
  • ItemRestricted
    Detecting Flaky Tests Without Rerunning Tests
    (George Mason University, 2024-07-26) Alshammari, Abdulrahman Turqi; Lam, Wing; Ammann, Paul
    A critical component of modern software development practices, particularly continuous integration (CI), is the halting of development activities in response to test failures, which require further investigation and debugging. As software changes, regression testing becomes vital to verify that new code does not affect existing functionality. However, this process is often delayed by the presence of flaky tests—those that yield inconsistent results on the same codebase, alternating between pass and fail. Test flakiness challenges the trust in testing outcomes and undermines the reliability of the CI process. The typical approach to identifying flaky tests has involved executing them multiple times; if a test yields both passing and failing results without any modifications to the codebase, it is flaky, as discussed by Luo et al. in their empirical study. This approach, while straightforward, can be resource-intensive and time-consuming, resulting in considerable overhead costs for development teams. Moreover, this technique might not consistently reveal flakiness in tests that exhibit varied behavior across different execution environments. Given these challenges, the research community has been actively seeking more efficient and reliable alternatives to the repetitive execution of tests for flakiness detection. These explorations aim to uncover methods that can accurately detect flaky tests without the need for multiple reruns, thereby reducing the time and resources required for testing. This dissertation addresses three principal dimensions of test flakiness. First, it presents a machine learning classifier designed to detect which tests are flaky, based on previously detected flaky tests. Second, the dissertation proposes three deduplication-based approaches to assist developers in determining whether a test failure is due to flakiness or not.
Third, it highlights the impact of test flakiness on other testing activities (particularly mutation testing) and discusses how to mitigate the effects of test flakiness on mutation testing. This dissertation explores the detection of test flakiness by conducting an empirical study on the limitations of rerunning tests as a method for identifying flaky tests, which results in a large dataset of flaky tests. This dataset is then utilized to develop FlakeFlagger, a machine learning classifier designed to automatically predict the likelihood of a test being flaky through static and dynamic analysis. The objective is to leverage FlakeFlagger to identify flaky tests without the need for reruns by detecting patterns and symptoms common among previously identified flaky tests. In addressing the challenge of detecting whether a failure is due to flakiness, this dissertation demonstrates how developers can better manage flaky tests within their test suites. The dissertation proposes three deduplication-based methods to help developers determine whether a specific failure is genuinely flaky or not. Furthermore, the dissertation discusses the effects of test flakiness on mutation testing, a critical activity for assessing the quality of test suites. It includes an extensive rerun experiment on the mutation analysis of the flaky tests identified earlier in the study, highlighting the significant impact of flaky tests on the validity of mutation testing.
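The rerun-based baseline that the dissertation critiques (per Luo et al.: a test is flaky if it both passes and fails on the same code) is simple enough to state directly. This is a generic sketch of that baseline, not the dissertation's tooling:

```python
# Rerun-based flakiness baseline: flag a test as flaky iff both a pass
# and a fail are observed across reruns on an unchanged codebase.

def is_flaky(test_fn, reruns=10):
    outcomes = {bool(test_fn()) for _ in range(reruns)}
    return outcomes == {True, False}
```

The cost problem is visible in the signature: every verdict needs `reruns` executions, and a test that fails only under rare conditions can still slip through, which is exactly why rerun-free predictors like FlakeFlagger are attractive.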
  • ItemRestricted
    A Deep Learning Framework for Blockage Mitigation in mmWave Wireless
    (Portland State University, 2024-05-28) Almutairi, Ahmed; Aryafar, Ehsan
    Millimeter-Wave (mmWave) communication is a key technology for enabling next-generation wireless systems. However, mmWave systems are highly susceptible to blockages, which can lead to a substantial decrease in signal strength at the receiver. Identifying blockages and mitigating them is thus a key challenge in achieving next-generation wireless technology goals, such as enhanced mobile broadband (eMBB) and Ultra-Reliable and Low-Latency Communication (URLLC). This thesis proposes several deep learning (DL) frameworks for mmWave wireless blockage detection, mitigation, and duration prediction. First, we propose a DL framework to address the problem of identifying whether the mmWave wireless channel between two devices (e.g., a base station and a client device) is Line-of-Sight (LoS) or non-Line-of-Sight (nLoS). Specifically, we show that existing beamforming training messages that are exchanged periodically between mmWave wireless devices can also be used in a DL model to solve the channel classification problem with no additional overhead. We extend this DL framework by developing a transfer learning model (t-LNCC) that is trained on simulated data and can successfully solve the channel classification problem on any commercial off-the-shelf (COTS) mmWave device, with or without real-world labeled data. The second part of the thesis leverages our channel classification mechanism from the first part and introduces new DL frameworks to mitigate the negative impacts of blockages. Previous research on blockage mitigation has introduced several model- and protocol-based blockage mitigation solutions that focus on one technique at a time, such as handoff to a different base station or beam adaptation to the same base station. We go beyond those techniques by proposing DL frameworks that address the overarching problem: which blockage mitigation method should be employed, and what is the optimal sub-selection within that method?
To do so, we developed two Gated Recurrent Unit (GRU) models that are trained using periodically exchanged messages in mmWave systems. Specifically, we first developed a GRU model that tackled the blockage mitigation problem in a wireless environment with single-antenna clients. Then, we proposed another GRU model to expand our investigation to more complex scenarios where both base stations and clients are equipped with multiple antennas and collaboratively mitigate blockages. These two models are trained on datasets gathered using a commercially available mmWave simulator. Both models achieve outstanding results in selecting the optimal blockage mitigation method, with accuracy higher than 93% and 91% for single-antenna and multiple-antenna clients, respectively. We also show that the proposed methods significantly increase the amount of transferred data compared to several other blockage mitigation policies.
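For reference, the GRU cell at the heart of the models above computes an update gate, a reset gate, and a candidate state at each time step. Below is the standard GRU recurrence (biases omitted for brevity) in NumPy; it is the textbook cell, not the thesis's trained models:

```python
# Standard GRU recurrence (biases omitted): update gate z, reset gate r,
# candidate state h~, then a convex blend of old and candidate state.
import numpy as np

def gru_step(x, h, W_z, U_z, W_r, U_r, W_h, U_h):
    sig = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sig(W_z @ x + U_z @ h)             # update gate
    r = sig(W_r @ x + U_r @ h)             # reset gate
    h_tilde = np.tanh(W_h @ x + U_h @ (r * h))  # candidate state
    return (1 - z) * h + z * h_tilde
```

In the thesis's setting, `x` would be features derived from the periodically exchanged beamforming messages, and the hidden state would feed a classifier over mitigation methods.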
  • ItemRestricted
    HIGH DIMENSIONAL TIME SERIES DATA MINING IN AUTOMATIC FIRE MONITORING AND AUTOMOTIVE QUALITY MANAGEMENT
    (Rutgers, The State University of New Jersey, 2024-05) Alhindi, Taha Jaweed O; Jeong, Myong K.
    Time series data is increasingly being generated in many domains around the world. Monitoring an event using multiple variables gathered over time forms a high-dimensional time series when the number of variables is high. High-dimensional time series now arise widely across many areas, so the need to develop more efficient and effective approaches to analyze and monitor high-dimensional time series data has become more critical. For instance, within the realm of fire disaster management, the advancement of fire detection systems has garnered research interest aimed at safeguarding human lives and property against devastating fire incidents. Nonetheless, the task of monitoring indoor fires presents complexities attributed to the distinct attributes of fire sensor signals (namely, high-dimensional time series), including the presence of time-based dependencies and varied signal patterns across different types of fires, such as those from flaming, heating, and smoldering sources. In the field of automobile quality management, minimizing internal vehicle noise is crucial for enhancing both customer satisfaction and the overall quality of the vehicle. Windshield wipers are significant contributors to such noise, and defective wipers can adversely impact the driving perception of passengers. Therefore, detecting wiper defects during production can lead to an improved driving experience, enhanced vehicle and road safety, and decreased driver distraction. Currently, the process for detecting noise from windshield wipers is manual, subjective, and requires considerable time. This dissertation presents several novel time series monitoring and anomaly detection approaches in two domains: 1) fire disaster management and 2) automotive quality management. The proposed approaches effectively address the limitations of traditional and existing systems and enhance human safety while reducing human and economic losses.
In the fire disaster management domain, we first propose two fire detection systems using dynamic time warping (DTW) distance measure. The first fire detection system is based on DTW and the nearest neighbor (NN) classifier (NN-DTW). The second fire detection system utilizes a support vector machine with DTW kernel function (SVM-DTWK) to improve classification accuracy by utilizing SVM capability to obtain nonlinear decision boundaries. Using the DTW distance measure, both fire detection systems retain the temporal dynamics in the sensor signals of different fire types. Additionally, the suggested systems dynamically identify the essential sensors for early fire detection through the newly developed k-out-of-P fire voting rule. This rule integrates decision-making processes from P multichannel sensor signals effectively. To validate the efficiency of these systems, a case study was conducted using a real-world fire dataset from the National Institute of Standards and Technology. Secondly, we introduce a real-time, wavelet-based fire detection algorithm that leverages the multi-resolution capability of wavelet transformation. This approach differs from traditional fire detection methods by capturing the temporal dynamics of chemical sensor signals for different fire scenarios, including flaming, heating, and smoldering fires. A novel feature selection method tailored to fire types is employed to identify optimal features for distinguishing between normal conditions and various fire situations. Subsequently, a real-time detection algorithm incorporating a multi-model framework is developed to efficiently apply these chosen features, creating multiple fire detection models adept at identifying different fire types without pre-existing knowledge. Testing with publicly available fire data indicates that our algorithm surpasses conventional methods in terms of early detection capabilities, maintaining a low rate of false alarms across all fire categories. 
Thirdly, we introduce an innovative fire detection system designed for monitoring a range of indoor fire types. Unlike traditional research, which tends to separate the development of fire sensing and detection algorithms, our system seamlessly integrates these phases. This integration allows for the effective real-time utilization of varied sensor signals to identify fire outbreaks at their inception. Our system collects data from multiple types of sensors, each sensitive to different fire-emitted components. This data then feeds into a similarity matching-based detection algorithm that identifies distinct pattern shapes within the sensor signals across various fire conditions, enabling early detection of fires with minimal false alarms. The efficacy of this system is demonstrated through the use of real sensor data and experimental results, underscoring the system’s ability to accurately detect fires at an early stage. Lastly, in the automotive quality management domain, we introduce an innovative automated system for detecting faults in windshield wipers. Initially, we apply a new binarization technique to transform spectrograms of the sound produced by windshield wipers, isolating noisy regions. Following this, we propose a novel matrix factorization technique, termed orthogonal binary singular value decomposition, to break down these binarized mel spectrograms into uncorrelated binary eigenimages. This process enables the extraction of significant features for identifying defective wipers. Utilizing the k-NN classifier, these features are then categorized into normal or faulty wipers. The system’s efficiency was validated using real-world datasets of windshield wiper reversal and squeal noises, demonstrating superior performance over existing methodologies. 
The proposed approaches excel in detecting complex temporal patterns in high-dimensional time series data, with wide applicability across healthcare, environmental monitoring, and manufacturing for tasks like vital signs monitoring, climate and pollution tracking, and machinery maintenance. Additionally, the OBSVD technique, producing binary, uncorrelated eigenimages for unique information capture, broadens its use to medical imaging for anomaly detection, security for facial recognition, quality control for defect detection, document processing, and environmental analysis via satellite imagery. This versatility highlights the research's significant potential across machine learning and signal processing, improving efficiency and accuracy in time series data analysis.
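The NN-DTW system described above rests on dynamic time warping, which aligns two series by allowing local stretching along the time axis. Below is the classic O(mn) DTW recurrence plus 1-nearest-neighbor classification; the labels in the usage are hypothetical examples, not the NIST fire data:

```python
# Classic dynamic time warping distance + 1-NN classification (NN-DTW).

def dtw(a, b):
    """D[i][j] = cost of best alignment of a[:i] with b[:j]."""
    INF = float("inf")
    D = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    D[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[len(a)][len(b)]

def nn_dtw(query, labeled):
    """labeled: list of (series, label); return label of DTW-nearest series."""
    return min(labeled, key=lambda sl: dtw(query, sl[0]))[1]
```

Because DTW tolerates the temporal stretching that different fire types induce in sensor signals, a plain 1-NN classifier on top of it is a strong baseline before moving to SVM-DTWK or wavelet features.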

Copyright owned by the Saudi Digital Library (SDL) © 2024