EXPLORING THE TRANSFERABILITY OF ADVERSARIAL EXAMPLES IN NATURAL LANGUAGE PROCESSING
Date
2024-06-21
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Texas A&M University-Kingsville
Abstract
In recent years, there has been a growing concern about the vulnerability of machine learning models, particularly in the field of natural language processing (NLP). Many tasks in natural language processing, such as text classification, machine translation, and question answering, are at risk of adversarial attacks where maliciously crafted inputs can cause them to make incorrect predictions or classifications.
Adversarial examples crafted against one model can often fool another model. This transferability of adversarial examples has garnered significant attention, as it is a crucial property for facilitating black-box attacks.
In our research, we employed an array of widely used NLP models for sentiment analysis and text classification tasks. We first generated adversarial examples against a set of source models using five state-of-the-art attack methods. We then evaluated the transferability of these adversarial examples by testing their effectiveness against different target models, in order to identify the main factors impacting transferability, such as model architecture, dataset characteristics, and perturbation technique.
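The evaluation protocol described above can be sketched in miniature. The snippet below is a toy illustration only: the lexicon-based "models", the synonym table, and the greedy word-substitution attack are all hypothetical stand-ins for the real NLP models and attack methods used in the thesis. It shows the core loop: craft an adversarial example against a source model, then measure how often it also fools a different target model.

```python
# Toy sketch of black-box transferability evaluation.
# SOURCE_LEXICON / TARGET_LEXICON stand in for two distinct sentiment models;
# SYNONYMS and attack() stand in for a real word-substitution attack method.

SOURCE_LEXICON = {"good": 1, "great": 1, "bad": -1, "awful": -1}
TARGET_LEXICON = {"good": 1, "excellent": 1, "bad": -1, "terrible": -1, "awful": -1}

def predict(lexicon, text):
    """Trivial lexicon 'model': sum word scores, break ties toward positive."""
    score = sum(lexicon.get(w, 0) for w in text.split())
    return "pos" if score >= 0 else "neg"

# Hypothetical synonym table used by the toy attack.
SYNONYMS = {"bad": ["mediocre"], "awful": ["underwhelming"], "good": ["fine"]}

def attack(lexicon, text, label):
    """Greedy single-word substitution that flips the source model's prediction."""
    words = text.split()
    for i, w in enumerate(words):
        for syn in SYNONYMS.get(w, []):
            candidate = " ".join(words[:i] + [syn] + words[i + 1:])
            if predict(lexicon, candidate) != label:
                return candidate  # adversarial example found
    return None  # attack failed on this input

def transfer_rate(examples):
    """Fraction of source-model adversarial examples that also fool the target."""
    adversarial = []
    for text, label in examples:
        adv = attack(SOURCE_LEXICON, text, label)
        if adv is not None:
            adversarial.append((adv, label))
    if not adversarial:
        return 0.0
    fooled = sum(1 for adv, label in adversarial
                 if predict(TARGET_LEXICON, adv) != label)
    return fooled / len(adversarial)
```

In a real study, `predict` would wrap trained neural models and `attack` would be one of the five attack methods evaluated; the transfer-rate computation itself is unchanged.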
Moreover, we extended our investigation by delving into transferability-enhancing techniques. We assessed two transferability-enhancing methods and leveraged the power of Large Language Models (LLMs) to generate natural adversarial examples that show moderate transferability across different NLP architectures.
Through our research, we aim to provide insights into the transferability of adversarial examples in NLP and shed light on the factors that contribute to it. This knowledge can then be used to develop more robust and resilient NLP models that are less susceptible to adversarial attacks, ultimately enhancing the security and reliability of these systems in various applications.
Description
Keywords
Machine Learning, Natural Language Processing (NLP), Adversarial Attacks, Black-Box Attacks, Adversarial Examples Transferability, Large Language Models (LLM)