Enhancing Insider Threat Detection Using Machine Learning: Addressing Data Imbalance with Logistic Regression, Decision Tree, and XGBoost

dc.contributor.advisor: JAOTOMBO, Franck
dc.contributor.author: Althobaiti, Huda
dc.date.accessioned: 2026-03-17T14:39:20Z
dc.date.issued: 2024
dc.description.abstract: This thesis investigates the application of machine learning techniques to the detection of insider threats using the BETH dataset, which is inherently imbalanced. Insider threats involve malicious activities or errors by authorized individuals and pose significant risks to organizations. The research evaluates three machine learning models, Logistic Regression (LR), Decision Tree (DT), and XGBoost, chosen for their strengths in handling imbalanced data, interpretability, and performance. The study addresses the challenges of class imbalance, which mirror real-world cybersecurity scenarios where malicious activities are outnumbered by normal activities. Two resampling approaches were tested: upsampling the minority class using SMOTE and downsampling the majority class. However, training the models on the imbalanced dataset provided a more realistic performance evaluation. The models were evaluated using metrics sensitive to class distribution, such as the area under the Receiver Operating Characteristic curve (ROC AUC) and the area under the Precision-Recall Curve (PRC AUC). Together, these metrics offer a comprehensive view of model performance on imbalanced data. The findings indicate that XGBoost outperforms LR and DT in both ROC AUC and PRC AUC, demonstrating superior detection of minority class instances. Feature importance analysis identified key indicators of insider threats, including user ID, process name, timestamp, process ID, and parent process ID, which improves model interpretability and provides insight into insider threat behaviors. In conclusion, while LR and DT offer valuable baseline and interpretative capabilities, XGBoost is the most effective of the three for detecting insider threats in imbalanced datasets. The research underscores the importance of addressing data imbalance and selecting appropriate evaluation metrics to improve the robustness of insider threat detection systems in real-world applications.
dc.format.extent: 67
dc.identifier.uri: https://hdl.handle.net/20.500.14154/78459
dc.language.iso: en_US
dc.publisher: Saudi Digital Library
dc.subject: Insider Threat Detection
dc.subject: Machine Learning in Cybersecurity
dc.subject: Class Imbalance in Datasets
dc.subject: XGBoost for Insider Threats
dc.subject: Logistic Regression (LR)
dc.subject: Decision Tree (DT)
dc.subject: BETH Dataset
dc.title: Enhancing Insider Threat Detection Using Machine Learning: Addressing Data Imbalance with Logistic Regression, Decision Tree, and XGBoost
dc.type: Thesis
sdl.degree.department: emlyon business school
sdl.degree.discipline: Cybersecurity and defence management
sdl.degree.grantor: emlyon business school
sdl.degree.name: Master of science

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 1.18 MB
Format: Adobe Portable Document Format

Copyright owned by the Saudi Digital Library (SDL) © 2026