Saudi Cultural Missions Theses & Dissertations

Permanent URI for this community: https://drepo.sdl.edu.sa/handle/20.500.14154/10

Search Results

Now showing 1 - 3 of 3
  • ItemRestricted
    AI GENERATED TEXT VS. HUMAN GENERATED TEXT
    (University of East Anglia, 2024-09) Hadi, Nedaa; Misri, Kazhan
    The ability to distinguish between AI-generated and human-generated texts is becoming increasingly critical as AI technologies advance. This dissertation explores the development and evaluation of various machine learning models to accurately classify text as either AI-generated or human-generated. The research aims to identify the most effective classification techniques and preprocessing methods to enhance model performance and generalization across different text datasets. A range of machine learning and deep learning models, including Support Vector Machine (SVM), Random Forest, Logistic Regression, Decision Tree, BERT, and LSTM, were employed to evaluate their effectiveness in distinguishing between the two types of texts. The study utilized a balanced and representative dataset through data sampling and augmentation techniques. Key preprocessing steps were implemented to refine the input data, and hyperparameter tuning was conducted to optimize model performance. The generalization capabilities of the models were further tested on additional datasets with varying text characteristics. The findings revealed that SVM and Random Forest models achieved the highest accuracy and reliability in classifying texts, demonstrating strong performance across multiple evaluation metrics. In contrast, deep learning models like BERT and LSTM were less effective under the given conditions, suggesting a need for more extensive datasets and computational resources to leverage their full potential. These results highlight the strengths and limitations of different approaches to text classification, providing a foundation for future research to enhance AI detection in diverse applications.
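The pipeline this abstract identifies as strongest (TF-IDF-style text features feeding a linear SVM) can be sketched briefly. This is a minimal illustration under assumed details, not the dissertation's code: the toy texts, labels, and the `classify` helper are all hypothetical, and a real study would use a large, balanced, augmented corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny hypothetical corpus standing in for the balanced dataset the
# abstract describes; labels mark each text as AI- or human-generated.
train_texts = [
    "As an AI language model, I can summarise the topic as follows.",
    "The results demonstrate a statistically significant improvement overall.",
    "honestly i just winged the essay the night before lol",
    "we argued about it over coffee and never settled the question",
]
train_labels = ["ai", "ai", "human", "human"]

# TF-IDF features (unigrams and bigrams) feeding a linear SVM.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_texts, train_labels)

def classify(text: str) -> str:
    """Return the predicted source label ('ai' or 'human') for a text."""
    return clf.predict([text])[0]
```

On data this small the model simply memorizes the training texts; the point is only the shape of the pipeline, in which preprocessing choices and hyperparameter tuning (the `TfidfVectorizer` and `LinearSVC` settings) drive the performance differences the study reports.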
  • ItemRestricted
    Adapting to Change: The Temporal Performance of Text Classifiers in the Context of Temporally Evolving Data
    (Queen Mary University of London, 2024-07-08) Alkhalifa, Rabab; Zubiaga, Arkaitz
    This thesis delves into the evolving landscape of NLP, particularly focusing on the temporal persistence of text classifiers amid the dynamic nature of language use. The primary objective is to understand how changes in language patterns over time impact the performance of text classification models and to develop methodologies for maintaining their effectiveness. The research begins by establishing a theoretical foundation for text classification and temporal data analysis, highlighting the challenges posed by the evolving use of language and its implications for NLP models. A detailed exploration of various datasets, including stance detection and sentiment analysis datasets, sets the stage for examining these dynamics. The characteristics of the datasets, such as linguistic variations and temporal vocabulary growth, are carefully examined to understand their influence on classifier performance. A series of experiments are conducted to evaluate the performance of text classifiers across different temporal scenarios. The findings reveal a general trend of performance degradation over time, emphasizing the need for classifiers that can adapt to linguistic changes. The experiments assess models' ability to estimate past and future performance based on their current efficacy and linguistic dataset characteristics, leading to valuable insights into the factors influencing model longevity. Innovative solutions are proposed to address the observed performance decline and adapt to temporal changes in language use over time. These include incorporating temporal information into word embeddings and comparing various methods across temporal gaps. The Incremental Temporal Alignment (ITA) method emerges as a significant contributor to enhancing classifier performance in same-period experiments, although it faces challenges in maintaining effectiveness over longer temporal gaps.
Furthermore, the exploration of machine learning and statistical methods highlights their potential to maintain classifier accuracy in the face of longitudinally evolving data. The thesis culminates in a shared task evaluation, where participant-submitted models are compared against baseline models to assess their classifiers' temporal persistence. This comparison provides a comprehensive understanding of the short-term, long-term, and overall persistence of the models, offering valuable insights to the field. The research identifies several future directions, including interdisciplinary approaches that integrate linguistics and sociology, tracking textual shifts on online platforms, extending the analysis to other classification tasks, and investigating the ethical implications of evolving language in NLP applications. This thesis contributes to the NLP field by highlighting the importance of evaluating text classifiers' temporal persistence and offering methodologies to enhance their sustainability in dynamically evolving language environments. The findings and proposed approaches pave the way for future research aimed at developing more robust, reliable, and temporally persistent text classification models.
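One dataset characteristic the abstract highlights, temporal vocabulary growth, can be quantified with a simple overlap measure between the vocabulary of a training period and that of a later test period. The sketch below is an assumed illustration (the two periods and their texts are invented); the thesis's own dataset analyses are more elaborate, but low overlap is exactly the kind of linguistic drift that degrades classifiers over time.

```python
def vocabulary(texts):
    """Set of lowercase word types appearing across a list of texts."""
    return {word for text in texts for word in text.lower().split()}

def jaccard_overlap(train_texts, test_texts):
    """Jaccard similarity between two periods' vocabularies (0..1)."""
    a, b = vocabulary(train_texts), vocabulary(test_texts)
    return len(a & b) / len(a | b)

# Hypothetical snapshots of the same domain at two points in time.
period_2019 = ["the vaccine debate continues", "voters back the new policy"]
period_2023 = ["the chatbot debate continues", "users back the new model"]

# Higher drift (lower overlap) predicts larger performance degradation
# for a classifier trained on the earlier period.
drift = 1.0 - jaccard_overlap(period_2019, period_2023)
```

A temporal evaluation in this spirit trains on one period and tests on successively later ones, plotting accuracy against the gap; methods like ITA then try to re-align representations across that gap.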
  • ItemRestricted
    Synonym-based Adversarial Attacks in Arabic Text Classification Systems
    (Clarkson University, 2024-05-21) Alshahrani, Norah Falah S; Matthews, Jeanna
    Text classification systems have been proven vulnerable to adversarial text examples: modified versions of original texts that often go unnoticed by human eyes yet can force text classification models to alter their predictions. To date, research quantifying the impact of adversarial text attacks has focused almost exclusively on models trained in English. In this thesis, we introduce the first word-level study of adversarial attacks in Arabic. Specifically, we use a synonym (word-level) attack based on a Masked Language Modeling (MLM) task with a BERT model in a black-box setting to assess the robustness of state-of-the-art Arabic text classification models to adversarial attacks. To evaluate the grammatical and semantic similarity of the adversarial examples produced by our synonym BERT-based attack, we invite four human evaluators to assess and compare them with their original counterparts. We also study the transferability of these newly produced Arabic adversarial examples to various models and investigate the effectiveness of defense mechanisms against them on the BERT models. We find that fine-tuned BERT models were more susceptible to our synonym attacks than the other deep neural network (DNN) models we trained, such as WordCNN and WordLSTM. We also find that fine-tuned BERT models were more susceptible to transferred attacks. Lastly, we find that fine-tuned BERT models regain at least 2% in accuracy after applying adversarial training as an initial defense mechanism. We share our code scripts and trained models on GitHub at https://github.com/NorahAlshahrani/bert_synonym_attack.
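The attack strategy this abstract describes can be sketched in a simplified, language-agnostic form. In the thesis, candidate substitutions come from an Arabic BERT masked-language-model and the victim is a real fine-tuned classifier queried as a black box; in the toy sketch below, a hand-written synonym table and an invented scoring function stand in for both, purely to illustrate the greedy word-level substitution loop.

```python
# Hypothetical candidate substitutions; the real attack generates these
# by masking each word and asking a BERT MLM for replacements.
SYNONYMS = {
    "excellent": ["fine", "decent"],
}

def toy_positive_score(words):
    """Stand-in black-box classifier: confidence the text is positive."""
    score = 0.4
    score += 0.4 * words.count("excellent")
    score += 0.2 * words.count("fine")
    return score

def synonym_attack(text, score_fn, threshold=0.5):
    """Greedily swap words for synonyms until the predicted label flips.

    Returns the (possibly modified) text and whether the attack succeeded,
    i.e. whether the score dropped below the decision threshold.
    """
    words = text.split()
    for i, word in enumerate(words):
        if score_fn(words) < threshold:  # label already flipped; stop early
            break
        candidates = SYNONYMS.get(word, [])
        if not candidates:
            continue
        # Keep whichever choice (original word included) hurts the score most.
        best = min([word] + candidates,
                   key=lambda c: score_fn(words[:i] + [c] + words[i + 1:]))
        words[i] = best
    return " ".join(words), score_fn(words) < threshold
```

Because the loop only queries `score_fn`, it matches the black-box setting: no gradients or model internals are needed, which is also what makes the produced examples natural to test for transferability across victim models.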

Copyright owned by the Saudi Digital Library (SDL) © 2025