An NLP-Driven Framework for Business Email Compromise Detection and Authorship Verification
Date
2025
Authors
Publisher
Saudi Digital Library
Abstract
Business Email Compromise (BEC) presents a critical cybersecurity threat, leveraging linguistic impersonation and social engineering rather than traditional malicious payloads. These attacks routinely evade conventional filters by mimicking legitimate communication styles and exploiting trusted identities. This thesis explores content-based detection strategies for BEC using a sequence of natural language processing (NLP) models. First, it proposes a transformer-based classifier to detect semantic indicators of deception in email body text. Second, it develops a Siamese authorship verification (AV) model that captures stylistic consistency, even under adversarial mimicry. These components are unified within a multi-task learning (MTL) framework that simultaneously optimizes for BEC detection and AV by sharing underlying representations while preserving task-specific objectives. To support empirical evaluation, a structured taxonomy of BEC fraud is introduced, and a synthetic email dataset is generated through prompt-guided language model fine-tuning and human validation. Experiments on combined real and synthetic corpora demonstrate that the MTL model achieves up to 97% F1-score in BEC detection and 93% in AV, outperforming a transfer learning baseline while reducing false positives and computational overhead. This work contributes a principled, modular, and extensible framework for enhancing email security through joint semantic and stylistic analysis, addressing gaps in current defenses against sophisticated impersonation attacks.
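The abstract does not state how the BEC detection and AV objectives are combined during training. As an illustration only, a common multi-task formulation is a weighted sum of per-task losses computed on top of shared representations. The sketch below assumes binary cross-entropy for both tasks and a single mixing weight; the names `binary_cross_entropy`, `joint_loss`, and `bec_weight` are hypothetical and are not taken from the thesis.

```python
import math


def binary_cross_entropy(prob: float, label: int) -> float:
    """Binary cross-entropy for one predicted probability and a 0/1 label."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def joint_loss(bec_prob: float, bec_label: int,
               av_prob: float, av_label: int,
               bec_weight: float = 0.5) -> float:
    """Weighted sum of the two task losses.

    Assumption: the weighting scheme is illustrative; the thesis may
    combine its objectives differently.
    """
    bec = binary_cross_entropy(bec_prob, bec_label)  # BEC detection head
    av = binary_cross_entropy(av_prob, av_label)     # authorship verification head
    return bec_weight * bec + (1.0 - bec_weight) * av
```

In a setup like this, each task keeps its own loss term while gradients from both terms would update the shared encoder, which is one way to read "sharing underlying representations while preserving task-specific objectives".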
Keywords
Email security, Authorship verification, Stylometry, Natural language processing, Transformer models, BERT, DistilBERT, BiLSTM, Siamese networks, Multi-task learning, Synthetic datasets, Phishing detection, Impersonation attacks, Cybersecurity