GPT-4 attempting to attack AI-text detectors

Alshehri, Nojoud

dc.contributor.advisor	Lin, Yuhao
dc.contributor.author	Alshehri, Nojoud
dc.date.accessioned	2024-07-17T07:06:44Z
dc.date.available	2024-07-17T07:06:44Z
dc.date.issued	2024-07-10
dc.description.abstract	Recent large language models (LLMs) generate machine-written content across a wide range of channels, including news, social media, and educational settings. Because AI-generated content is difficult to distinguish from human-written content, concerns about the potential misuse of LLMs have grown. In particular, the possibility that these models could be used to complete assignments and write essays poses a growing risk to academic integrity. Therefore, many detection tools have been developed to distinguish AI-generated text from human-written text. However, the robustness of these tools against attack strategies and adversarial perturbations has not been adequately validated, particularly in the context of student essay writing. In this work, we use the GPT-4 model to apply a series of perturbations to an essay originally generated by GPT-4, with the aim of confusing three AI detectors: GPTZero, DetectGPT, and ZeroGPT. The proposed attack technique produces an adversarial text sample, which is used to examine the effect of the perturbations on the detection accuracy of the AI detectors. The results demonstrate that using GPT-4 to rephrase text and apply perturbations at the sentence and word level can confuse the detection models and reduce their prediction probabilities. Moreover, after the series of perturbations is applied, the final essay retains reasonable writing quality and semantic similarity to the original GPT-generated essay. This project will provide insights for further improving the robustness of AI detectors and for future studies on AI-generated text classification.
dc.format.extent	19
dc.identifier.uri	https://hdl.handle.net/20.500.14154/72610
dc.language.iso	en
dc.publisher	University of Adelaide
dc.subject	LLM
dc.subject	AI-generated text
dc.subject	AI-text detectors
dc.title	GPT-4 attempting to attack AI-text detectors
dc.type	Thesis
sdl.degree.department	Engineering and Technology
sdl.degree.discipline	Computer Science
sdl.degree.grantor	University of Adelaide
sdl.degree.name	Master of Science
sdl.thesis.source	SACM - Australia
