Analysing Large-Scale Attacks in IoT Environments using ML/DL

Date

2025

Publisher

Saudi Digital Library

Abstract

The fine-grained classification of malicious network traffic is a persistent challenge in cybersecurity, primarily because of the extreme class imbalance inherent in real-world network data. Conventional machine learning approaches, which apply a single, unitary model to the problem, have had limited success and often fail to identify rare but critical minority attack classes. This dissertation argues that the single-model paradigm is fundamentally flawed for this problem space and proposes a hierarchical, multi-stage classification framework as a more robust alternative. The investigation uses the 34-class CICIoT2023 dataset as a benchmark. The study was conducted across four distinct experimental paths, comparing two ensemble methods (XGBoost and Random Forest) and two class-handling strategies (a "Grouped" approach that manually merges similar classes and an "Ungrouped" approach that tackles all 34 classes directly). Within this structure, we designed and implemented a four-tier hierarchical framework that employs a "divide and conquer" strategy, using an initial classifier to handle majority traffic and a class-level routing mechanism to delegate ambiguous samples to specialised recovery tiers. An adaptive resampling strategy was deployed within these tiers, concentrating aggressive SMOTE only where required. The empirical results support the proposed architecture. The optimal configuration, an Ungrouped, XGBoost-led hierarchical framework, achieved a final accuracy of 0.9228 and a Macro-F1 score of 0.7948, a substantial improvement over all other experimental paths and conventional baselines. More significantly, this approach increased the F1-score of some under-represented minority classes by more than 800%.
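
The abstract reports both accuracy and Macro-F1. As a minimal sketch of why the two can diverge under class imbalance, the following computes both metrics on toy labels. The label names and counts are invented for illustration and are not taken from the dissertation or from CICIoT2023; the helper functions are hypothetical and are not the author's evaluation code.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[t] += 1  # t was the true label and was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data (invented): 95 majority-class samples, 5 minority-class samples.
y_true = ["benign"] * 95 + ["rare_attack"] * 5
# A degenerate model that always predicts the majority class.
y_pred = ["benign"] * 100

acc = accuracy(y_true, y_pred)
mf1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} macro_f1={mf1:.4f}")
```

On these invented labels the always-majority model reaches an accuracy of 0.95 but a Macro-F1 of roughly 0.49, because the minority class contributes an F1 of zero to the unweighted mean. For the relative figure quoted above, an increase of more than 800% means the new F1-score is more than nine times the old one.
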
The analysis also revealed a key architectural principle: classifier performance is role-dependent, with different ensemble methods excelling in different roles within the hierarchy, highlighting the importance of managing the bias-variance trade-off at a systemic level. Finally, this work provides a rigorous, data-centric analysis that distinguishes between model limitations and the inherent limitations of the dataset, identifying a "dataset-induced ceiling" on performance for 5 of the 34 classes. The primary contribution of this dissertation is, therefore, a methodologically robust and architecturally novel framework, validated through a comprehensive, multi-path experimental design. The principles of hierarchical decomposition and adaptive resource allocation are domain-agnostic and offer a promising direction for future research into extreme imbalance problems.
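
The class-level routing idea described in the abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the dissertation's implementation: it shows only two tiers rather than four, it assumes that routing is keyed on the first tier's predicted label, and the classifiers are hand-written stand-ins for trained models. All names (`hierarchical_predict`, `first_tier`, `recon_specialist`, the class labels) are hypothetical.

```python
from typing import Callable, Dict, Hashable, List, Sequence

Sample = Sequence[float]
Classifier = Callable[[Sample], Hashable]


def hierarchical_predict(
    samples: Sequence[Sample],
    first_tier: Classifier,
    recovery_tiers: Dict[Hashable, Classifier],
) -> List[Hashable]:
    """Predict with first_tier, then delegate by predicted label.

    If the first-tier label has a specialist registered in recovery_tiers,
    the sample is re-classified by that specialist; otherwise the
    first-tier label is kept as the final prediction.
    """
    predictions = []
    for x in samples:
        label = first_tier(x)
        specialist = recovery_tiers.get(label)
        predictions.append(specialist(x) if specialist is not None else label)
    return predictions


# Stand-in for a trained first-tier model (assumption: simple thresholds).
def first_tier(x: Sample) -> str:
    if x[0] < 1.0:
        return "benign"
    if x[0] < 2.0:
        return "ddos"
    # Coarse label for classes the first tier cannot separate reliably.
    return "recon_group"


# Stand-in for a recovery tier trained only on the confusable classes.
def recon_specialist(x: Sample) -> str:
    return "port_scan" if x[1] < 0.5 else "os_scan"


samples = [(0.2, 0.0), (1.5, 0.0), (2.5, 0.1), (2.5, 0.9)]
preds = hierarchical_predict(samples, first_tier, {"recon_group": recon_specialist})
print(preds)
```

In a setup like this, any resampling would be applied only to the training data of the specialist tiers, which is one way to read the abstract's "adaptive resampling"; the sketch omits training entirely.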

Keywords

IoT Attack, Cybersecurity, Cyber Attack, Machine Learning, Deep Learning, Data Analysis
