Detecting LLM Generated Phishing Emails Using Machine Learning: A Multi-Classification Approach And A Comprehensive Evaluation

Alharthi, Alanoud

Files

SACM-Dissertation.pdf (2.91 MB)

Date

2024-09

Authors

Alharthi, Alanoud

Publisher

University of Birmingham

Abstract

Phishing is a significant cybersecurity threat that targets both organisations and individuals. This project aims to build machine learning models that detect LLM-generated phishing emails with high accuracy, using a dataset of four classes of email: LLM phishing, LLM non-phishing, human phishing and human non-phishing. The balanced and diverse dataset of 4000 emails is intended to represent the different types of email sent daily; the classes have distinct features, which allows them to be differentiated accurately. Five machine learning algorithms were used: Decision Tree, Support Vector Machine (SVM), Random Forest, Gradient Boosting and K-Nearest Neighbours (KNN). Before tuning, SVM achieved the highest accuracy at 97.3%, followed by Random Forest at 96.9%, KNN at 96.8% and Gradient Boosting at 96.7%. Decision Tree achieved the lowest accuracy at 90.7%. Hyperparameter tuning was then applied to the models, and their performance was re-evaluated to investigate whether tuning improved it. Precision, recall and F1-score were also measured. The trained models were then integrated into a web page built with Streamlit, providing a user-friendly interface for classifying emails. Overall, this research aims to provide a framework for detecting LLM phishing emails. The results indicate that, with the correct methodologies, detection of LLM-generated phishing can be enhanced, contributing to robust defences against emerging cyber threats.
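The abstract reports accuracy, precision, recall and F1-score over four classes. As a minimal sketch of how such metrics can be computed for a multi-class task, the following uses only the Python standard library. The function name `per_class_metrics` is hypothetical, and the snippet is not taken from the dissertation; it only illustrates the standard metric definitions.

```python
from collections import Counter

# The four classes named in the abstract.
CLASSES = ["LLM phishing", "LLM non-phishing", "Human phishing", "Human non-phishing"]


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return overall accuracy and per-class precision, recall and F1.

    Hypothetical helper for illustration; not the author's implementation.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative

    report = {}
    for c in classes:
        predicted = tp[c] + fp[c]
        actual = tp[c] + fn[c]
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[c] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(y_true) if y_true else 0.0
    return accuracy, report
```

Calling `per_class_metrics(y_true, y_pred)` with two equal-length lists of class labels returns the overall accuracy and a dictionary of per-class scores; a class with no predictions or no true examples is scored as 0.0 here, which is one common convention rather than the only one.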

Keywords

Cyber Crime, Large Language Model, Phishing, LLM Phishing, Model Training, Machine Learning, Phishing Vs Legitimate

Citation

Harvard

URI

https://hdl.handle.net/20.500.14154/73880

Collections

SACM - United Kingdom
