Detecting LLM Generated Phishing Emails Using Machine Learning: A Multi-Classification Approach And A Comprehensive Evaluation
No Thumbnail Available
Date
2024-09
Authors
Journal Title
Journal ISSN
Volume Title
Publisher
University of Birmingham
Abstract
Phishing is a significant cybersecurity threat that targets organisations as well as individuals. The aim of this project is to provide a comprehensive machine learning model that can accurately detect LLM generated phishing with high accuracy from a dataset of four different classes of emails: LLM phishing, LLM non-phishing, Human phishing and Human non-phishing.
This balanced and diverse dataset of 4000 emails acts as a real-world representation of the different types of emails that are sent daily that include different distinct features, allowing for an accurate feature differentiation from the classes of the dataset. The five machine learning algorithms that were used for this research are: Decision Tree, Support Vector Machine (SVM), Random Forest, Gradient Boost and K-Nearest Neighbours (KNN). These algorithms were tuned to evaluate the performance of the models after hyperparameter tuning. The highest accuracy achieved from the model before tuning was the SVM with an accuracy of 97.3%. The subsequent highly accurate models are Random Forest of 96.9%, KNN of 96.8% and Gradient Boosting of 96.7%. The model that achieved the lowest accuracy was Decision Tree, achieving an accuracy of 90.7%. Hyperparameter tuning was applied to models and the performance was re-evaluated to investigate if hyperparameter tuning enhanced the performance of the models. Other metrics such as precision, recall and F1-score were also measured.
The developed and trained models were then integrated with a web page developed using streamlit for a user-friendly interface for the classifications of the emails. Overall, this research aims to provide a framework for detecting LLM phishing emails. The results of this research signify that with the correct methodologies, we can enhance the detection of LLM generated phishing, contributing to robust defences against emerging cyber threats.
Description
Keywords
Cyber Crime, Large Language Model, Phishing, LLM Phishing, Model Training, Machine Learning, Phishing Vs Legitimate
Citation
Harvard