Enhancing Insider Threat Detection Using Machine Learning: Addressing Data Imbalance with Logistic Regression, Decision Tree, and XGBoost
| dc.contributor.advisor | JAOTOMBO, Franck | |
| dc.contributor.author | Althobaiti, Huda | |
| dc.date.accessioned | 2026-03-17T14:39:20Z | |
| dc.date.issued | 2024 | |
| dc.description.abstract | This thesis investigates the application of machine learning techniques to enhance the detection of insider threats using the BETH dataset, characterized by its inherent imbalance. Insider threats involve malicious activities or errors by authorized individuals, posing significant risks to organizations. The research evaluates three machine learning models: Logistic Regression (LR), Decision Tree (DT), and XGBoost, chosen for their strengths in handling imbalanced data, interpretability, and performance. The study addresses class imbalance challenges, mirroring real-world cybersecurity scenarios where malicious activities are outnumbered by normal activities. Two approaches were tested: upsampling the minority class using SMOTE and downsampling the majority class. However, training models on the imbalanced dataset provided a more realistic performance evaluation. The models were evaluated using metrics sensitive to class distribution, such as the Area Under the Curve (AUC) for the Receiver Operating Characteristic (ROC) and the Precision-Recall Curve (PRC). These metrics offer a comprehensive view of model performance with imbalanced data. Findings indicate that XGBoost outperforms LR and DT in ROC AUC and PRC AUC, demonstrating superior detection of minority class instances. Feature importance analysis identified key indicators of insider threats, including user ID, process name, timestamp, process ID, and parent process ID, enhancing model interpretability and providing insights into insider threat behaviors. In conclusion, while LR and DT offer valuable baseline and interpretative capabilities, XGBoost is the most effective for detecting insider threats in imbalanced datasets. The research underscores the importance of addressing data imbalance and selecting appropriate evaluation metrics to improve the robustness of insider threat detection systems in real-world applications. | |
| dc.format.extent | 67 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14154/78459 | |
| dc.language.iso | en_US | |
| dc.publisher | Saudi Digital Library | |
| dc.subject | Insider Threat Detection | |
| dc.subject | Machine Learning in Cybersecurity | |
| dc.subject | Class Imbalance in Datasets | |
| dc.subject | XGBoost for Insider Threats | |
| dc.subject | Logistic Regression (LR) | |
| dc.subject | Decision Tree (DT) | |
| dc.subject | BETH Dataset | |
| dc.title | Enhancing Insider Threat Detection Using Machine Learning: Addressing Data Imbalance with Logistic Regression, Decision Tree, and XGBoost | |
| dc.type | Thesis | |
| sdl.degree.department | emlyon business school | |
| sdl.degree.discipline | Cybersecurity and defence management | |
| sdl.degree.grantor | emlyon business school | |
| sdl.degree.name | Master of Science |
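
The abstract names two evaluation metrics (ROC AUC and PRC AUC) and random downsampling of the majority class. The following is a minimal, standard-library sketch of how those quantities can be computed. It is not the thesis's implementation. The function names `roc_auc`, `average_precision`, and `undersample_indices` are hypothetical, the toy labels and scores are invented for illustration and are not BETH data, and average precision is used here as one common estimate of the area under the precision-recall curve. SMOTE upsampling is not shown.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC via the rank-sum identity: the probability that a random
    positive is scored above a random negative (ties count as 0.5)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive.
    Assumes no tied scores (ties would need threshold grouping)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank
    return total / sum(labels)


def undersample_indices(labels, seed=0):
    """Keep every positive (assumed minority) and an equally sized random
    sample of negatives; returns the sorted indices that were kept."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, k=min(len(pos), len(neg)))
    return sorted(pos + kept_neg)


# Toy data (illustrative only): 2 positives among 10 events.
labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.80, 0.90]

print(f"ROC AUC: {roc_auc(labels, scores):.4f}")                      # 0.9375
print(f"Average precision: {average_precision(labels, scores):.4f}")  # 0.8333
print("Balanced subset indices:", undersample_indices(labels))
```

In this toy case a single high-scoring negative lowers average precision more than it lowers ROC AUC, which illustrates why reporting both curves can be informative when positives are rare.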
