Detecting LLM Generated Phishing Emails Using Machine Learning: A Multi-Classification Approach And A Comprehensive Evaluation

dc.contributor.advisor: Andriotis, Panagiotis
dc.contributor.author: Alharthi, Alanoud
dc.date.accessioned: 2024-11-28T12:32:06Z
dc.date.issued: 2024-09
dc.description.abstract: Phishing is a significant cybersecurity threat that targets organisations as well as individuals. This project aims to provide a machine learning model that detects LLM-generated phishing emails with high accuracy, using a dataset of four classes of email: LLM phishing, LLM non-phishing, Human phishing and Human non-phishing. The balanced and diverse dataset of 4000 emails is intended to represent the different types of email sent daily, whose distinct features allow the classes to be differentiated. Five machine learning algorithms were used: Decision Tree, Support Vector Machine (SVM), Random Forest, Gradient Boosting and K-Nearest Neighbours (KNN). Before hyperparameter tuning, SVM achieved the highest accuracy at 97.3%, followed by Random Forest at 96.9%, KNN at 96.8% and Gradient Boosting at 96.7%; Decision Tree achieved the lowest accuracy at 90.7%. Hyperparameter tuning was then applied and the models were re-evaluated to investigate whether tuning improved their performance. Precision, recall and F1-score were also measured. The trained models were then integrated into a web page built with Streamlit, providing a user-friendly interface for classifying emails. Overall, this research aims to provide a framework for detecting LLM phishing emails. The results indicate that, with the correct methodologies, the detection of LLM-generated phishing can be enhanced, contributing to robust defences against emerging cyber threats.
dc.format.extent: 77
dc.identifier.citation: Harvard
dc.identifier.uri: https://hdl.handle.net/20.500.14154/73880
dc.language.iso: en
dc.publisher: University of Birmingham
dc.subject: Cyber Crime
dc.subject: Large Language Model
dc.subject: Phishing
dc.subject: LLM Phishing
dc.subject: Model Training
dc.subject: Machine Learning
dc.subject: Phishing Vs Legitimate
dc.title: Detecting LLM Generated Phishing Emails Using Machine Learning: A Multi-Classification Approach And A Comprehensive Evaluation
dc.type: Thesis
sdl.degree.department: Computer Science
sdl.degree.discipline: Cyber Security
sdl.degree.grantor: University of Birmingham
sdl.degree.name: Master of Science
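
The abstract reports accuracy, precision, recall and F1-score over four email classes. As a rough illustration of how such metrics can be computed in a multi-class setting, the following minimal sketch uses only the Python standard library. The function name `per_class_metrics` and the toy labels are hypothetical, invented for this example; they are not taken from the thesis, and the thesis may have computed its metrics differently (for instance with a library and a different averaging scheme).

```python
from collections import Counter

# The four classes named in the abstract.
CLASSES = ["LLM phishing", "LLM non-phishing", "Human phishing", "Human non-phishing"]


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return overall accuracy plus one-vs-rest precision, recall and F1 per class."""
    pairs = Counter(zip(y_true, y_pred))  # (true label, predicted label) -> count
    accuracy = sum(pairs[(c, c)] for c in classes) / len(y_true)
    report = {}
    for c in classes:
        tp = pairs[(c, c)]                                   # correctly predicted as c
        fp = sum(pairs[(t, c)] for t in classes if t != c)   # predicted c, actually other
        fn = sum(pairs[(c, p)] for p in classes if p != c)   # actually c, predicted other
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[c] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, report


# Hypothetical toy labels, invented for illustration only (not thesis data).
y_true = ["LLM phishing", "LLM phishing", "LLM non-phishing", "Human phishing",
          "Human phishing", "Human non-phishing", "Human non-phishing", "LLM non-phishing"]
y_pred = ["LLM phishing", "Human phishing", "LLM non-phishing", "Human phishing",
          "Human phishing", "Human non-phishing", "LLM non-phishing", "LLM non-phishing"]

accuracy, report = per_class_metrics(y_true, y_pred)
print(f"accuracy: {accuracy:.2f}")
for name, scores in report.items():
    print(name, {k: round(v, 2) for k, v in scores.items()})
```

Reporting the scores per class, rather than accuracy alone, shows which of the four classes a model confuses with one another, which a single accuracy figure hides.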

Files

Original bundle

Name:
SACM-Dissertation.pdf
Size:
2.91 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
1.61 KB
Format:
Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2024