EXPLORING THE TRANSFERABILITY OF ADVERSARIAL EXAMPLES IN NATURAL LANGUAGE PROCESSING
Date
2024-06-21
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
Texas A&M University-Kingsville
Abstract
In recent years, there has been a growing concern about the vulnerability of machine learning models, particularly in the field of natural language processing (NLP). Many tasks in natural language processing, such as text classification, machine translation, and question answering, are at risk of adversarial attacks where maliciously crafted inputs can cause them to make incorrect predictions or classifications.
Adversarial examples crafted against one model can often fool another model. This transferability of adversarial examples has garnered significant attention, as it is a crucial property for facilitating black-box attacks.
In our research, we employed an array of widely used NLP models for sentiment analysis and text classification tasks. We first generated adversarial examples against a set of source models using five state-of-the-art attack methods. We then evaluated the transferability of these adversarial examples by testing their effectiveness against different target models, in order to identify the main factors impacting transferability, such as model architecture, dataset characteristics, and perturbation technique.
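The evaluation protocol described above can be sketched in miniature. The snippet below is a toy illustration only: the lexicon-based "models", the synonym table, and the greedy word-substitution attack are all hypothetical stand-ins for the real NLP models and attack methods used in the thesis. It shows the core loop: craft an adversarial example against a source model, then measure how often it also fools a different target model.

```python
# Toy sketch of black-box transferability evaluation.
# SOURCE_LEXICON / TARGET_LEXICON stand in for two distinct sentiment models;
# SYNONYMS and attack() stand in for a real word-substitution attack method.

SOURCE_LEXICON = {"good": 1, "great": 1, "bad": -1, "awful": -1}
TARGET_LEXICON = {"good": 1, "excellent": 1, "bad": -1, "terrible": -1, "awful": -1}

def predict(lexicon, text):
    """Trivial lexicon 'model': sum word scores, break ties toward positive."""
    score = sum(lexicon.get(w, 0) for w in text.split())
    return "pos" if score >= 0 else "neg"

# Hypothetical synonym table used by the toy attack.
SYNONYMS = {"bad": ["mediocre"], "awful": ["underwhelming"], "good": ["fine"]}

def attack(lexicon, text, label):
    """Greedy single-word substitution that flips the source model's prediction."""
    words = text.split()
    for i, w in enumerate(words):
        for syn in SYNONYMS.get(w, []):
            candidate = " ".join(words[:i] + [syn] + words[i + 1:])
            if predict(lexicon, candidate) != label:
                return candidate  # adversarial example found
    return None  # attack failed on this input

def transfer_rate(examples):
    """Fraction of source-model adversarial examples that also fool the target."""
    adversarial = []
    for text, label in examples:
        adv = attack(SOURCE_LEXICON, text, label)
        if adv is not None:
            adversarial.append((adv, label))
    if not adversarial:
        return 0.0
    fooled = sum(1 for adv, label in adversarial
                 if predict(TARGET_LEXICON, adv) != label)
    return fooled / len(adversarial)
```

In a real study, `predict` would wrap trained neural models and `attack` would be one of the five attack methods evaluated; the transfer-rate computation itself is unchanged.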
Moreover, we extended our investigation by delving into transferability-enhancing techniques. We assessed two transferability-enhancing methods and leveraged the power of Large Language Models (LLMs) to generate natural adversarial examples that show moderate transferability across different NLP architectures.
Through our research, we aim to provide insights into the transferability of adversarial examples in NLP and shed light on the factors that contribute to it. This knowledge can then be used to develop more robust and resilient NLP models that are less susceptible to adversarial attacks, ultimately enhancing the security and reliability of these systems in various applications.
Description
Keywords
Machine Learning, Natural Language Processing (NLP), Adversarial Attacks, Black-Box Attacks, Adversarial Examples Transferability, Large Language Models (LLM)