An Exploration of Word Embedding Models for Phishing Email Detection

Alghamdi, Rawan

Date

2023-09-21

Authors

Alghamdi, Rawan

Publisher

University of Southampton

Abstract

Phishing emails are dangerous cyberattacks that attackers use to steal information. Manual solutions such as blacklists can be used to detect phishing emails. However, the emergence of machine learning solutions has made phishing email detection faster and easier. This study explored and compared the performance of three deep learning models for detecting text-based phishing emails. The models used different word embedding techniques: Word2Vec, FastText, and GloVe. All three models used a Long Short-Term Memory (LSTM) classifier. Two publicly available datasets were merged to create a balanced dataset of phishing and legitimate emails using only the body text of the emails, excluding the header. The first dataset is the Fraudulent E-mail Corpus - Nigerian Letter or "419" Fraud, which contains phishing emails. The second dataset is the Enron Email Dataset, which contains legitimate emails. The Word2Vec-LSTM model achieved the best performance, with an F1 score of 98.62% and an accuracy of 98.62%. The FastText-LSTM model also performed well, although slightly below the Word2Vec-LSTM model, with an F1 score of 95.73% and an accuracy of 95.73%. The GloVe-LSTM model performed poorly, with an F1 score of 55.79% and an accuracy of 60.53%. We therefore conclude that using different embedding techniques with the same classifier can result in different performance when detecting and classifying phishing and legitimate emails.
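
The abstract describes two steps that can be illustrated without any model code: merging a phishing corpus and a legitimate corpus into a balanced dataset, and scoring predictions with accuracy and F1. The following is a minimal sketch under stated assumptions, not the author's implementation. The function names, the downsampling strategy, and the label convention (1 = phishing, 0 = legitimate) are all hypothetical. The abstract also does not say which F1 averaging was used; this sketch computes F1 for the phishing class only.

```python
import random


def build_balanced_dataset(phishing_bodies, legitimate_bodies, seed=0):
    """Merge two corpora of email body texts into one shuffled, balanced list.

    Assumption: balance is reached by downsampling the larger corpus.
    Labels are hypothetical: 1 = phishing, 0 = legitimate.
    """
    rng = random.Random(seed)
    n = min(len(phishing_bodies), len(legitimate_bodies))
    # Draw n examples from each class so both contribute equally.
    phishing = rng.sample(list(phishing_bodies), n)
    legitimate = rng.sample(list(legitimate_bodies), n)
    data = [(text, 1) for text in phishing] + [(text, 0) for text in legitimate]
    rng.shuffle(data)
    return data


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 for the positive class) for two label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs) if pairs else 0.0
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

Because F1 ignores true negatives while accuracy counts them, the two metrics can diverge even on a balanced dataset, as in the GloVe-LSTM figures above, where F1 is lower than accuracy.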

Keywords

data science, machine learning, AI, phishing emails, deep learning

URI

https://hdl.handle.net/20.500.14154/72288

Collections

SACM - United Kingdom
