SACM - United States of America

Permanent URI for this collection: https://drepo.sdl.edu.sa/handle/20.500.14154/9668


Search Results

Now showing 1 - 6 of 6
  • Item (Restricted)
    Exploring the Security Landscape of AR/VR Applications: A Multi-Dimensional Perspective
    (University of Central Florida, 2025) Alghamdi, Abdulaziz; Mohaisen, David
    The rapid evolution of Augmented Reality (AR) and Virtual Reality (VR) technologies on mobile platforms has significantly impacted the digital landscape, raising concerns about security and privacy. As these technologies integrate into everyday life, understanding their security infrastructure and privacy policies is crucial to protect user data. To address this, our first study analyzes AR/VR applications from a security and performance perspective. Recognizing the lack of benchmark datasets for security research, we compiled a dataset of 408 AR/VR applications from the Google Play Store. The dataset includes control flow graphs, strings, functions, permissions, API calls, hexdump, and metadata, providing a valuable resource for improving application security. In the second study, we use BERT to analyze the privacy policies of AR/VR applications. A comparative analysis reveals that while AR/VR apps offer more comprehensive privacy policies than free content websites, they still lag behind premium websites. Additionally, we assess 20 U.S. state privacy regulations using the Coverage Quality Metric (CQM), identifying strengths, gaps, and enforcement measures. This study highlights the importance of critical privacy practices and key terms to enhance policy effectiveness and align industry standards with evolving regulations. Finally, our third study introduces a scalable approach to malware detection using machine learning models, specifically Random Forest (RF) and Graph Neural Networks (GNN). Utilizing two datasets, one of Android apps (including AR/VR apps) and one of Executable and Linkable Format (ELF) files, this research incorporates features such as API call groups and Android-specific features. The GNN model outperforms RF, demonstrating its ability to capture complex feature relationships and significantly improve malware detection accuracy.
This work contributes to enhancing AR/VR application security, improving privacy practices, and advancing malware detection techniques.
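The API-call grouping used as a model feature can be sketched as follows. The group names and the call-to-group mapping below are illustrative placeholders, not the dissertation's actual feature set:

```python
from collections import Counter

# Hypothetical mapping from API calls to coarse behavioral groups;
# the real grouping used in the study is not specified here.
API_GROUPS = {
    "sendTextMessage": "telephony",
    "getDeviceId": "device_info",
    "openConnection": "network",
    "exec": "process",
}

GROUP_ORDER = ["telephony", "device_info", "network", "process"]

def api_group_vector(api_calls):
    """Count API calls per group and return a fixed-order feature vector."""
    counts = Counter(API_GROUPS.get(call, "other") for call in api_calls)
    return [counts.get(g, 0) for g in GROUP_ORDER]

calls = ["openConnection", "getDeviceId", "openConnection", "unknownCall"]
print(api_group_vector(calls))  # prints [0, 1, 2, 0]
```

A vector like this can feed a Random Forest directly, or annotate nodes of a control-flow graph consumed by a GNN.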
  • Item (Restricted)
    Enhancing Opinion Mining in E-Commerce: The Role of Text Segmentation and K-Means Clustering in Transformer-Based Consumer Trust Analysis
    (Texas Tech University, 2025) Alkhalil, Bandar; Zhuang, Yu
    As the E-commerce market expands, customer reviews have become essential for companies aiming to understand consumer opinions. Building consumer trust is critical to the success of E-commerce businesses, especially given that 93% of consumers report that online reviews influence their purchasing choices. Trust in E-commerce is commonly understood as a consumer’s willingness to rely on an online seller based on expectations of reliability, security, and competence, and a range of factors shapes this trust when shopping online. Customer reviews are crucial for gauging consumer opinions and can help identify the factors influencing trust in online shopping. However, current research primarily focuses on using transformer models to classify reviews as positive, negative, or neutral, or to predict customer ratings from review content. This dissertation introduces a new approach that expands the capabilities of pre-trained transformer models, such as GPT, BART, and BERT, to extract trust factors from customer reviews, addressing a significant gap in the current literature. The research notably improves the model’s accuracy by utilizing text segmentation. Comparative analysis between segmented and unsegmented datasets, benchmarked against manually annotated reviews, demonstrates that text segmentation increases accuracy. Specifically, GPT-3.5 achieved an accuracy of 86.9%, representing a 15.5 percentage point improvement over unsegmented data. These findings validate segmentation as a critical technique for enhancing granularity and enabling models to identify nuanced trust factors effectively. To further validate the effectiveness of our approach, a second experiment was conducted using a different dataset to determine whether segmentation would yield comparable or even better accuracy.
In this experiment, text segmentation was applied before the initial factor extraction to enhance the identification of trust factors. However, the large number of extracted factors created new challenges, as many were redundant or represented similar concepts under different names, complicating large-scale analysis. To address this challenge, K-means clustering, combined with the elbow method, successfully standardized the 2,890 extracted factors and grouped them into nine key categories. This refined process further improved the GPT-3.5 model’s accuracy to 88.5%, demonstrating the scalability and robustness of the proposed methodology in handling large-scale review datasets. The findings highlight the centrality of text segmentation and underscore the crucial role of normalization techniques, particularly K-means clustering, in managing large-scale review datasets. By offering a scalable and adaptable framework, this dissertation provides actionable insights for improving E-commerce analytics. Furthermore, it lays the groundwork for broader applications, extending its suitability beyond E-commerce to other areas where manual labeling is challenging or resource-intensive.
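The clustering step can be sketched as below, assuming extracted factors have already been embedded as numeric vectors. The 2-D points and the plain k-means implementation are illustrative; the dissertation's setup applies the elbow method over 2,890 extracted factors:

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(xs) / len(xs) for xs in zip(*cluster))

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; returns (centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: dist2(p, centroids[c]))
            clusters[i].append(p)
        centroids = [mean(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    inertia = sum(min(dist2(p, c) for c in centroids) for p in points)
    return centroids, inertia

# Elbow method: inertia drops sharply until k matches the number of
# natural groups, then flattens; pick k at the bend.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
inertias = [kmeans(points, k)[1] for k in (1, 2, 3)]
```

With two obvious blobs in the toy data, the inertia curve bends at k = 2, which is the value the elbow heuristic would select.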
  • Item (Restricted)
    Disinformation Classification Using Transformer-Based Machine Learning
    (Howard University, 2024) Alshaqi, Mohammed Al; Rawat, Danda B.
    The proliferation of false information via social media has become an increasingly pressing problem. Digital means of communication and social media platforms facilitate the rapid spread of disinformation, which calls for the development of advanced techniques for identifying incorrect information. This dissertation endeavors to devise effective multimodal techniques for identifying fraudulent news, considering the noteworthy influence that deceptive stories have on society. The study proposes and evaluates multiple approaches, starting with a transformer-based model that uses word embeddings for accurate text classification. This model significantly outperforms baseline methods such as hybrid CNN and RNN, achieving higher accuracy. The dissertation also introduces a novel BERT-powered multimodal approach to fake news detection, combining textual data with extracted text from images to improve accuracy. By leveraging the strengths of the BERT-base-uncased model for text processing and integrating it with image text extraction via OCR, this approach calculates a confidence score indicating the likelihood of news being real or fake. Rigorous training and evaluation show significant improvements in performance compared to state-of-the-art methods. Furthermore, the study explores the complexities of multimodal fake news detection, integrating text, images, and videos into a unified framework. By employing BERT for textual analysis and CNN for visual data, the multimodal approach demonstrates superior performance over traditional models in handling multiple media formats. Comprehensive evaluations using datasets such as ISOT and MediaEval 2016 confirm the robustness and adaptability of these methods in combating the spread of fake news. This dissertation contributes valuable insights to fake news detection, highlighting the effectiveness of transformer-based models, emotion-aware classifiers, and multimodal frameworks.
The findings provide robust solutions for detecting misinformation across diverse platforms and data types, offering a path forward for future research in this critical area.
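The confidence-score idea, combining a score from the article text with one from OCR-extracted image text, can be sketched as below. The weights and threshold are illustrative assumptions, not values from the dissertation:

```python
def fused_confidence(text_score, ocr_score, w_text=0.7, w_ocr=0.3):
    """Weighted average of two per-modality 'real news' probabilities.
    Weights are placeholders; a real system would tune them on held-out data."""
    return w_text * text_score + w_ocr * ocr_score

def label(text_score, ocr_score, threshold=0.5):
    """Map the fused confidence score to a real/fake decision."""
    return "real" if fused_confidence(text_score, ocr_score) >= threshold else "fake"

print(label(0.9, 0.5))  # prints "real": fused score 0.78 clears the threshold
```

The same late-fusion pattern extends to a third modality (video) by adding a weighted term for its score.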
  • Item (Restricted)
    Synonym-based Adversarial Attacks in Arabic Text Classification Systems
    (Clarkson University, 2024-05-21) Alshahrani, Norah Falah S; Matthews, Jeanna
    Text classification systems have been proven vulnerable to adversarial text examples, modified versions of the original text examples that are often unnoticed by human eyes, yet can force text classification models to alter their classification. Most prior research quantifying the impact of adversarial text attacks has focused only on models trained in English. In this thesis, we introduce the first word-level study of adversarial attacks in Arabic. Specifically, we use a synonym (word-level) attack using a Masked Language Modeling (MLM) task with a BERT model in a black-box setting to assess the robustness of state-of-the-art text classification models to adversarial attacks in Arabic. To evaluate the grammatical and semantic similarities of the newly produced adversarial examples using our synonym BERT-based attack, we invite four human evaluators to assess and compare the produced adversarial examples with their original examples. We also study the transferability of these newly produced Arabic adversarial examples to various models and investigate the effectiveness of defense mechanisms against these adversarial examples on the BERT models. We find that fine-tuned BERT models were more susceptible to our synonym attacks than the other Deep Neural Networks (DNN) models like WordCNN and WordLSTM we trained. We also find that fine-tuned BERT models were more susceptible to transferred attacks. Lastly, we find that fine-tuned BERT models successfully regain at least 2% in accuracy after applying adversarial training as an initial defense mechanism. We share our code scripts and trained models on GitHub at https://github.com/NorahAlshahrani/bert_synonym_attack.
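The word-level attack loop can be sketched as follows. The real attack draws synonym candidates from a BERT masked-language-model and queries the victim model as a black box; the candidate table and toy classifier here are stand-ins:

```python
def synonym_attack(tokens, candidates, classify, orig_label):
    """Greedily substitute synonyms one position at a time until the
    victim classifier's label flips; return the adversarial example."""
    for i, tok in enumerate(tokens):
        for cand in candidates.get(tok, []):
            perturbed = tokens[:i] + [cand] + tokens[i + 1:]
            if classify(perturbed) != orig_label:
                return perturbed
    return None  # attack failed

# Toy victim model and synonym table, for illustration only.
victim = lambda toks: "pos" if "great" in toks else "neg"
synonyms = {"great": ["fine", "decent"]}
adv = synonym_attack(["this", "movie", "is", "great"], synonyms, victim, "pos")
print(adv)  # prints ['this', 'movie', 'is', 'fine']
```

In the black-box setting, `classify` is the only access the attacker has to the victim model, which is why query-efficient greedy search is used instead of gradient-based methods.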
  • Item (Restricted)
    Exploring Language Models and Question Answering in Biomedical and Arabic Domains
    (University of Delaware, 2024-05-10) Alrowili, Sultan; Shanker, K. Vijay
    Despite the success of the Transformer model and its variations (e.g., BERT, ALBERT, ELECTRA, T5) in addressing NLP tasks, similar success is not achieved when these models are applied to specific domains (e.g., biomedical) and limited-resource languages (e.g., Arabic). This research addresses challenges in applying Transformer models to specialized domains and to languages with limited language-processing resources. One reason for reduced performance in specialized domains may be the lack of quality contextual representations. We address this issue by adapting different types of language models and introducing five BioM-Transformer models for the biomedical domain and Funnel Transformer and T5 models for the Arabic language. For each of our models, we present experiments for studying the impact of design factors (e.g., corpora and vocabulary domain, model scale, architecture design) on performance and efficiency. Our evaluation of BioM-Transformer models shows that we obtain state-of-the-art results on several biomedical NLP tasks and achieved the top-performing models on the BLURB leaderboard. The evaluation of our small-scale Arabic Funnel and T5 models shows that we achieve comparable performance while utilizing less computation compared to the fine-tuning cost of existing Arabic models. Further, our base-scale Arabic language models extend state-of-the-art results on several Arabic NLP tasks while maintaining a comparable fine-tuning cost to existing base-scale models. Next, we focus on the question-answering task, specifically tackling issues in specialized domains and low-resource languages such as the limited size of question-answering datasets and the limited topic coverage within them. We employ several methods to address these issues in the biomedical domain, including the employment of models adapted to the domain and Task-to-Task Transfer Learning.
We evaluate the effectiveness of these methods at the BioASQ10 (2022) challenge, showing that we achieved the top-performing system on several batches of the BioASQ10 challenge. In Arabic, we address similar existing issues by introducing a novel approach to create question-answer-passage triplets, and propose a pipeline, Pair2Passage, to create large QA datasets. Using this method and the pipeline, we create the ArTrivia dataset, a new Arabic question-answering dataset comprising more than 10,000 high-quality question-answer-passage triplets. We present a quantitative and qualitative analysis of ArTrivia that shows the importance of often-overlooked components, such as answer normalization, in enhancing the quality of the question-answer dataset and future annotation. In addition, our evaluation shows the ability of ArTrivia to build a question-answering model that can address the out-of-distribution issue in existing Arabic QA datasets.
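Answer normalization of the kind the analysis highlights can be sketched as below. The English variant follows common SQuAD-style conventions, and the Arabic variant applies standard alef unification and diacritic stripping; both are illustrative, not the exact ArTrivia pipeline:

```python
import re
import string

def normalize_en(s):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in string.punctuation)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())

def normalize_ar(s):
    """Unify alef variants, strip diacritics (harakat) and tatweel."""
    s = re.sub("[إأآ]", "ا", s)
    s = re.sub("[\u064B-\u0652\u0640]", "", s)
    return " ".join(s.split())

print(normalize_en("The Eiffel Tower!"))  # prints "eiffel tower"
```

Normalizing gold answers this way prevents spurious mismatches during both evaluation and annotation review, which is why the analysis singles it out.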
  • Item (Restricted)
    Improving Vulnerability Description Using Natural Language Generation
    (Saudi Digital Library, 2023-10-25) Althebeiti, Hattan; Mohaisen, David
    Software plays an integral role in powering numerous everyday computing gadgets. As our reliance on software continues to grow, so does the prevalence of software vulnerabilities, with significant implications for organizations and users. As such, documenting vulnerabilities and tracking their development becomes crucial. Vulnerability databases address this issue by storing a record with various attributes for each discovered vulnerability. However, their contents suffer from several drawbacks, which we address in our work. In this dissertation, we investigate the weaknesses associated with vulnerability descriptions in public repositories and alleviate such weaknesses through Natural Language Processing (NLP) approaches. The first contribution examines vulnerability descriptions in those databases and approaches to improve them. We propose a new automated method leveraging external sources to enrich the scope and context of a vulnerability description. Moreover, we exploit fine-tuned pretrained language models for normalizing the resulting description. The second contribution investigates the need for uniform and normalized structure in vulnerability descriptions. We address this need by breaking the description of a vulnerability into multiple constituents and developing a multi-task model to create a new uniform and normalized summary that maintains the necessary attributes of the vulnerability using the extracted features while ensuring a consistent vulnerability description. Our method proved effective in generating new summaries with the same structure across a collection of various vulnerability descriptions and types. Our final contribution investigates the feasibility of assigning the Common Weakness Enumeration (CWE) attribute to a vulnerability based on its description. CWE offers a comprehensive framework that categorizes similar weaknesses into classes, representing the types of exploitation associated with such vulnerabilities.
Our approach utilizing pre-trained language models is shown to outperform Large Language Models (LLMs) on this task. Overall, this dissertation provides various technical approaches exploiting advances in NLP to improve publicly available vulnerability databases.
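A minimal sketch of assigning a CWE class from a description, using bag-of-words cosine similarity as a stand-in for the dissertation's pre-trained language models; the keyword profiles are illustrative:

```python
import math
from collections import Counter

# Illustrative keyword profiles for a few CWE classes, not a real taxonomy.
CWE_PROFILES = {
    "CWE-79": "cross site scripting script injection html web page",
    "CWE-89": "sql injection query database statement",
    "CWE-119": "buffer overflow memory bounds write read",
}

def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def predict_cwe(description):
    """Return the CWE class whose keyword profile best matches the text."""
    vec = Counter(description.lower().split())
    return max(CWE_PROFILES,
               key=lambda c: cosine(vec, Counter(CWE_PROFILES[c].split())))

print(predict_cwe("SQL injection in login query"))  # prints "CWE-89"
```

A fine-tuned language model replaces this lexical matching with contextual embeddings, which is what lets it beat both the baseline above and prompted LLMs.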

Copyright owned by the Saudi Digital Library (SDL) © 2025