AI for Fraud Detection
| dc.contributor.advisor | Muneeb, Ahmad | |
| dc.contributor.author | Albaqami, Abdullah | |
| dc.date.accessioned | 2026-03-17T22:58:30Z | |
| dc.date.issued | 2026 | |
| dc.description.abstract | Financial fraud is growing rapidly in digital payment systems, exposing the shortcomings of fixed rule-based controls and of machine learning models that perform well in testing but fail in real operational environments. This study develops and evaluates a complete fraud detection pipeline designed to address three persistent challenges: severe class imbalance, model instability under shifting data distributions, and the need for transparent decision outputs required by regulators and financial institutions. The pipeline integrates systematic data preprocessing, an optimised LightGBM model, and SHAP-based interpretability using the IEEE-CIS dataset of 590,540 transactions. The methodology includes memory optimisation, structured missing-value treatment, outlier handling through winsorization, label encoding for high-cardinality categorical fields, temporal feature engineering, and correlation-based feature reduction. Optuna, a Bayesian optimisation framework, is used to tune the LightGBM hyperparameters with ROC-AUC as the objective function. Model performance is measured with ROC-AUC, PR-AUC, precision, recall, F1-score, and a confusion matrix, following best practice for imbalanced classification. SHAP analysis is used to produce both global and local explanations of model behaviour. The final model achieves strong discriminative performance, with a ROC-AUC of 0.9606 and a PR-AUC of 0.8042. The accuracy (0.7335) and recall (0.7491) indicate balanced detection, and the confusion matrix shows good fraud detection with a manageable number of false positives. SHAP analysis shows that count-based features, transaction amount, card identifiers, geographic features, and temporal patterns are the main predictive features, consistent with established fraud behaviours reported in the recent literature. 
The results demonstrate that the performance gain stems not only from the choice of model but also from the complementary effects of data engineering, hyperparameter optimisation, and interpretability. The study concludes that an end-to-end pipeline improves detection accuracy, increases transparency, and overcomes fundamental limitations identified in previous fraud detection studies. Limitations include the anonymisation of the data, the absence of drift analysis, and the possible loss of fraud indicators during preprocessing. | |
| dc.format.extent | 58 | |
| dc.identifier.citation | APA | |
| dc.identifier.uri | https://hdl.handle.net/20.500.14154/78488 | |
| dc.language.iso | en | |
| dc.publisher | Saudi Digital Library | |
| dc.subject | AI | |
| dc.subject | Fraud | |
| dc.subject | financial fraud detection | |
| dc.subject | LightGBM | |
| dc.subject | Optimization | |
| dc.subject | IEEE | |
| dc.title | AI for Fraud Detection | |
| dc.type | Thesis | |
| sdl.degree.department | Department of Computer Science | |
| sdl.degree.discipline | Data Science | |
| sdl.degree.grantor | Swansea University | |
| sdl.degree.name | Master |
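
The abstract lists winsorization as the outlier-handling step. As a rough illustration only, the sketch below clips values to an empirical quantile range using plain Python. The function name `winsorize`, the percentile cut-offs, the quantile convention, and the sample amounts are all assumptions made for this example; the record does not state the settings used in the thesis.

```python
def winsorize(values, lower_pct=0.01, upper_pct=0.99):
    """Clip values to the [lower_pct, upper_pct] empirical quantile range.

    Hypothetical helper for illustration; not the thesis implementation.
    """
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)
    # Simple index-based quantile lookup; other quantile conventions exist.
    low = ordered[int(lower_pct * (n - 1))]
    high = ordered[int(upper_pct * (n - 1))]
    # Values outside the range are pulled to the nearest bound, not dropped.
    return [min(max(v, low), high) for v in values]


# Hypothetical transaction amounts with one extreme value.
amounts = [12.5, 30.0, 18.0, 25.0, 22.0, 15.0, 27.5, 19.0, 21.0, 9500.0]
clipped = winsorize(amounts, lower_pct=0.1, upper_pct=0.9)
print(clipped)
```

Clipping keeps every row in the dataset, which matters when the rare class is already scarce. The trade-off, noted among the limitations in the abstract, is that extreme values may themselves carry fraud signal.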
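
The abstract reports precision, recall, and F1-score alongside a confusion matrix as appropriate measures under class imbalance. The sketch below shows why, using the standard definitions of these metrics. The function `classification_metrics` and all confusion-matrix counts are hypothetical values chosen for illustration and are not results from the thesis.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical test set: 1,000 transactions, 35 of them fraudulent.
# A model that labels everything as legitimate catches no fraud at all.
always_legit = classification_metrics(tp=0, fp=0, fn=35, tn=965)

# A hypothetical detector that catches 28 of the 35 frauds with 12 false alarms.
detector = classification_metrics(tp=28, fp=12, fn=7, tn=953)

for name, m in [("always_legit", always_legit), ("detector", detector)]:
    print(name, {k: round(v, 3) for k, v in m.items()})
```

In this made-up example the trivial model still scores high on accuracy while its recall is zero, which is why precision, recall, F1, and PR-AUC are more informative than accuracy when fraud is rare.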
