Metadata-Centric Cybersecurity Classification: A Fair Benchmark of LLMs and Classical Models

Binothman, Elyas

dc.contributor.advisor	Chaudhry, Umair Bilal
dc.contributor.author	Binothman, Elyas
dc.date.accessioned	2025-11-18T15:39:11Z
dc.date.issued	2025
dc.description.abstract	Cybersecurity breach classification supports triage and risk response but is hindered by heterogeneous reporting, class imbalance, and limited semantic coverage in traditional pipelines. Prior work has relied on rule-based heuristics and classical models (SVM, Random Forest) with heavy feature engineering, while recent LLM studies rarely evaluate breach metadata under identical, fair splits; moreover, severity labels are often absent or not reproducibly constructed. We present a metadata-centric benchmark on the Privacy Rights Clearinghouse (PRC) chronology spanning two tasks: breach-type classification and severity tiering at three- and five-level granularity, with severity derived reproducibly from native fields using a Breach Level Index-style mapping. All models share one preprocessing recipe and a single stratified 80/20 train–test split. We compare parameter-efficient transformers (DistilBERT and T5 with LoRA) against tuned tabular baselines (Linear SVM, Random Forest, and a compact ANN). On breach type, DistilBERT achieves the strongest results (Accuracy 0.943; Macro-F1 0.840), surpassing the tabular baselines. For severity, a class-weighted ANN on TF-IDF and categorical features attains the highest Macro-F1 at both granularities, while T5 shows high accuracy but low Macro-F1, indicating majority-class bias. The study contributes a unified PRC schema with transparent severity construction, a fair head-to-head comparison under identical conditions, and an efficiency-oriented training recipe suitable for modest hardware.
dc.format.extent	12
dc.identifier.uri	https://hdl.handle.net/20.500.14154/77035
dc.language.iso	en
dc.publisher	Saudi Digital Library
dc.subject	cybersecurity
dc.subject	data breaches
dc.subject	metadata
dc.subject	breach type classification
dc.subject	severity classification
dc.subject	Privacy Rights Clearinghouse
dc.subject	parameter-efficient fine-tuning
dc.subject	LoRA
dc.subject	multi-model benchmarking
dc.subject	Artificial Intelligence
dc.subject	LLM
dc.subject	fine-tuning
dc.subject	Neural Networks
dc.title	Metadata-Centric Cybersecurity Classification: A Fair Benchmark of LLMs and Classical Models
dc.type	Thesis
sdl.degree.department	School of Electronic Engineering and Computer Science - Department of Computer Science
sdl.degree.discipline	Cybersecurity
sdl.degree.grantor	Queen Mary University of London
sdl.degree.name	Master of Science with Distinction in Artificial Intelligence
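
The abstract reports that T5 reaches high accuracy but low Macro-F1 on severity, which it reads as majority-class bias. The minimal sketch below illustrates why the two metrics can diverge this way. It assumes a toy, hypothetical three-tier label distribution (not PRC data) and a degenerate predictor that always outputs the most frequent class. The helper names `accuracy` and `macro_f1` are illustrative only and are not taken from the thesis code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Every class counts equally, regardless of how many examples it has.
    return sum(scores) / len(scores)


# Hypothetical, imbalanced three-tier severity labels (toy data, not PRC records).
y_true = ["low"] * 90 + ["medium"] * 7 + ["high"] * 3

# A degenerate "model" that always predicts the most frequent class.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")  # accuracy = 0.900
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")  # macro-F1 = 0.316
```

Macro-F1 averages per-class F1 without weighting by class frequency. The two minority tiers therefore each contribute an F1 of zero, so the score collapses even though accuracy stays at 0.9. This is the kind of majority-class bias that accuracy alone hides.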

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 1.96 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Collections

SACM - United Kingdom