An NLP-Driven Framework for Business Email Compromise Detection and Authorship Verification

Almutairi, Amirah

Files

Primary SACM-Dissertation.pdf (3.68 MB)

Date

2025

Authors

Almutairi, Amirah

Publisher

Saudi Digital Library

Abstract

Business Email Compromise (BEC) presents a critical cybersecurity threat, leveraging linguistic impersonation and social engineering rather than traditional malicious payloads. These attacks routinely evade conventional filters by mimicking legitimate communication styles and exploiting trusted identities. This thesis explores content-based detection strategies for BEC using a sequence of natural language processing (NLP) models. First, it proposes a transformer-based classifier to detect semantic indicators of deception in email body text. Second, it develops a Siamese authorship verification (AV) model that captures stylistic consistency, even under adversarial mimicry. These components are unified within a multi-task learning (MTL) framework that simultaneously optimizes for BEC detection and AV by sharing underlying representations while preserving task-specific objectives. To support empirical evaluation, a structured taxonomy of BEC fraud is introduced, and a synthetic email dataset is generated through prompt-guided language model fine-tuning and human validation. Experiments on combined real and synthetic corpora demonstrate that the MTL model achieves up to 97% F1-score in BEC detection and 93% in AV, outperforming a transfer learning baseline while reducing false positives and computational overhead. This work contributes a principled, modular, and extensible framework for enhancing email security through joint semantic and stylistic analysis, addressing gaps in current defenses against sophisticated impersonation attacks.
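
The abstract describes a joint objective that shares representations across BEC detection and AV while keeping task-specific losses. This record does not give the thesis's actual loss function, so the following is only a minimal sketch of one common way such an objective could be combined. Every name here (`joint_loss`, `alpha`, `scale`) and the choice of a sigmoid over scaled cosine similarity for the Siamese branch are assumptions made for illustration, not the author's method.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(prob, label, eps=1e-12):
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def joint_loss(bec_prob, bec_label, emb_a, emb_b, same_author,
               alpha=0.5, scale=5.0):
    """Hypothetical weighted sum of a BEC-detection loss and an AV loss.

    bec_prob    : predicted probability that the email is a BEC attempt
    bec_label   : 1 if the email is BEC, else 0
    emb_a/emb_b : embeddings of two emails from a shared (Siamese) encoder
    same_author : 1 if both emails have the same author, else 0
    alpha       : assumed task weight (not taken from the thesis)
    scale       : assumed sharpening factor for the similarity score
    """
    bec_term = binary_cross_entropy(bec_prob, bec_label)
    av_prob = sigmoid(scale * cosine_similarity(emb_a, emb_b))
    av_term = binary_cross_entropy(av_prob, same_author)
    return alpha * bec_term + (1.0 - alpha) * av_term
```

In this sketch, setting `alpha` to 1.0 reduces the objective to BEC detection alone, and setting it to 0.0 reduces it to authorship verification alone; intermediate values trade the two tasks off against each other.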

Keywords

Email security, Authorship verification, Stylometry, Natural language processing, Transformer models, BERT, DistilBERT, BiLSTM, Siamese networks, Multi-task learning, Synthetic datasets, Phishing detection, Impersonation attacks, Cybersecurity

URI

https://hdl.handle.net/20.500.14154/76619

Collections

SACM - United Kingdom
