AI GENERATED TEXT VS. HUMAN GENERATED TEXT
Date
2024-09
Publisher
University of East Anglia
Abstract
The ability to distinguish between AI-generated and human-generated texts is becoming
increasingly critical as AI technologies advance. This dissertation explores the
development and evaluation of various machine learning models to accurately classify
text as either AI-generated or human-generated. The research aims to identify the
most effective classification techniques and preprocessing methods to enhance model
performance and generalization across different text datasets.
A range of machine learning and deep learning models, including Support Vector
Machine (SVM), Random Forest, Logistic Regression, Decision Tree, BERT, and
LSTM, were employed to evaluate their effectiveness in distinguishing between the two
types of texts. A balanced and representative dataset was constructed using data
sampling and augmentation techniques. Key preprocessing steps were implemented to
refine the input data, and hyperparameter tuning was conducted to optimize model
performance. The generalization capabilities of the models were further tested on
additional datasets with varying text characteristics.
The findings revealed that SVM and Random Forest models achieved the highest
accuracy and reliability in classifying texts, demonstrating strong performance across
multiple evaluation metrics. In contrast, deep learning models like BERT and LSTM
were less effective under the given conditions, suggesting a need for more extensive
datasets and computational resources to leverage their full potential. These results
highlight the strengths and limitations of different approaches to text classification,
providing a foundation for future research to enhance AI detection in diverse applications.
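
To make the comparison of classical models described above more concrete, the following is a minimal sketch rather than the dissertation's actual pipeline. It assumes scikit-learn, TF-IDF features, a linear SVM, and 2-fold cross-validated accuracy, none of which are specified in the abstract. The toy sentences, the labels, and the helper name compare_models are hypothetical placeholders for illustration only.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

# Toy placeholder data (hypothetical, NOT from the dissertation).
# Label 1 = AI-generated, label 0 = human-generated.
texts = [
    "As an overview, the following points summarize the key considerations.",
    "In conclusion, it is important to note that several factors contribute.",
    "Overall, this approach offers a balanced and comprehensive solution.",
    "Furthermore, it is essential to consider the broader implications.",
    "honestly no idea why the bus was late again, third time this week",
    "Mum rang at six, so dinner got cold while we argued about the dog.",
    "I scribbled the proof on a napkin and lost it somewhere in the pub.",
    "we fixed the leak with tape, which held for about a day, maybe less",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def compare_models(texts, labels, cv=2):
    """Return mean cross-validated accuracy for each classical model.

    Assumption: TF-IDF features and a linear SVM stand in for whatever
    features and kernel the study actually used.
    """
    models = {
        "SVM": LinearSVC(),
        "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "Decision Tree": DecisionTreeClassifier(random_state=0),
    }
    results = {}
    for name, model in models.items():
        # Vectorizer sits inside the pipeline so it is refit on each
        # training fold and never sees the held-out texts.
        pipeline = make_pipeline(TfidfVectorizer(lowercase=True), model)
        scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="accuracy")
        results[name] = float(scores.mean())
    return results


if __name__ == "__main__":
    for name, acc in compare_models(texts, labels).items():
        print(f"{name}: mean CV accuracy = {acc:.2f}")
```

Scores on eight toy sentences carry no meaning; the sketch only shows the shape of such a comparison. Keeping the vectorizer inside the pipeline avoids leaking vocabulary statistics from the held-out fold into training.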
Keywords
Artificial intelligence, AI, Data Mining, Text Classification, Human Text, AI Text