An NLP-Driven Framework for Business Email Compromise Detection and Authorship Verification

dc.contributor.advisor: AlHashimy, Nawfal
dc.contributor.advisor: Kang, BooJoong
dc.contributor.author: Almutairi, Amirah
dc.date.accessioned: 2025-10-13T09:06:53Z
dc.date.issued: 2025
dc.description.abstract: Business Email Compromise (BEC) presents a critical cybersecurity threat, leveraging linguistic impersonation and social engineering rather than traditional malicious payloads. These attacks routinely evade conventional filters by mimicking legitimate communication styles and exploiting trusted identities. This thesis explores content-based detection strategies for BEC using a sequence of natural language processing (NLP) models. First, it proposes a transformer-based classifier to detect semantic indicators of deception in email body text. Second, it develops a Siamese authorship verification (AV) model that captures stylistic consistency, even under adversarial mimicry. These components are unified within a multi-task learning (MTL) framework that simultaneously optimizes for BEC detection and AV by sharing underlying representations while preserving task-specific objectives. To support empirical evaluation, a structured taxonomy of BEC fraud is introduced, and a synthetic email dataset is generated through prompt-guided language model fine-tuning and human validation. Experiments on combined real and synthetic corpora demonstrate that the MTL model achieves up to 97% F1-score in BEC detection and 93% in AV, outperforming a transfer learning baseline while reducing false positives and computational overhead. This work contributes a principled, modular, and extensible framework for enhancing email security through joint semantic and stylistic analysis, addressing gaps in current defenses against sophisticated impersonation attacks.
dc.format.extent: 127
dc.identifier.uri: https://hdl.handle.net/20.500.14154/76619
dc.language.iso: en
dc.publisher: Saudi Digital Library
dc.subject: Email security
dc.subject: Authorship verification
dc.subject: Stylometry
dc.subject: Natural language processing
dc.subject: Transformer models
dc.subject: BERT
dc.subject: DistilBERT
dc.subject: BiLSTM
dc.subject: Siamese networks
dc.subject: Multi-task learning
dc.subject: Synthetic datasets
dc.subject: Phishing detection
dc.subject: Impersonation attacks
dc.subject: Cybersecurity
dc.title: An NLP-Driven Framework for Business Email Compromise Detection and Authorship Verification
dc.type: Thesis
sdl.degree.department: School of Electronics and Computer Science
sdl.degree.discipline: Computer Science (Cyber Security)
sdl.degree.grantor: University of Southampton
sdl.degree.name: Doctor of Philosophy (PhD)

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 3.68 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2025