GPT-4 attempting to attack AI-text detectors

dc.contributor.advisorLin, Yuhao
dc.contributor.authorAlshehri, Nojoud
dc.date.accessioned2024-07-17T07:06:44Z
dc.date.available2024-07-17T07:06:44Z
dc.date.issued2024-07-10
dc.description.abstractRecent large language models (LLMs) generate machine content across a wide range of channels, including news, social media, and educational frameworks. Because AI-generated content is difficult to distinguish from human-written content, concerns about the potential misuse of LLMs have grown. Academic integrity risks have become a particular concern, since these models can be used to complete assignments and write essays. Therefore, many detection tools have been developed to distinguish AI-generated from human-written text. However, the effectiveness of these tools against attack strategies and adversarial perturbations has not been adequately validated, specifically in the context of student essay writing. In this work, we use the GPT-4 model to apply a series of perturbations to an essay originally generated by GPT-4 in order to confuse three AI detectors: GPTZero, DetectGPT, and ZeroGPT. The proposed attack technique produces an adversarial sample, which is used to examine the effect on the detection accuracy of the AI detectors. The results demonstrate that using GPT-4 to rephrase the text and apply perturbations at the sentence and word level can confuse the detection models and reduce their prediction probabilities. Moreover, after the series of perturbations, the final essay maintains reasonable writing quality and semantic similarity to the original GPT-generated essay. This project will provide insights for further improving the robustness of AI detectors and for future studies of AI-generated text classification.
dc.format.extent19
dc.identifier.urihttps://hdl.handle.net/20.500.14154/72610
dc.language.isoen
dc.publisherUniversity of Adelaide
dc.subjectLLM
dc.subjectAI-generated text
dc.subjectAI-text detectors
dc.titleGPT-4 attempting to attack AI-text detectors
dc.typeThesis
sdl.degree.departmentEngineering and Technology
sdl.degree.disciplineComputer Science
sdl.degree.grantorUniversity of Adelaide
sdl.degree.nameMaster of Science
sdl.thesis.sourceSACM - Australia

Copyright owned by the Saudi Digital Library (SDL) © 2024