AI for Fraud Detection
Date
2026
Authors
Publisher
Saudi Digital Library
Abstract
Financial fraud is growing rapidly in digital payment systems, exposing the shortcomings of fixed
rule-based controls and of machine learning models that perform well in testing but fail in real
operational environments. This study develops and evaluates a complete fraud
detection pipeline designed to address three persistent challenges: severe class imbalance, model instability
under shifting data distributions, and the need for transparent decision outputs required by regulators and
financial institutions. The pipeline integrates systematic data preprocessing, an optimized LightGBM
model, and SHAP-based interpretability using the IEEE-CIS dataset of 590,540 transactions.
The methodology includes memory optimization, structured missing-value treatment, outlier handling
through winsorization, label encoding for high-cardinality categorical fields, temporal feature engineering,
and correlation-based feature reduction. Optuna, a Bayesian optimisation framework, is used to tune
LightGBM hyper-parameters with ROC-AUC as the objective function. Model performance is measured with
ROC-AUC, PR-AUC, precision, recall, F1-score, and a confusion matrix, following best practices for
imbalanced classification. SHAP analysis is used to produce both global and local explanations
of model behaviour.
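Two of the preprocessing steps named above, winsorization and correlation-based feature reduction, can be sketched as follows. This is a minimal illustration rather than the study's implementation: the quantile limits, the 0.95 correlation threshold, the helper names (`winsorize`, `pearson`, `drop_correlated`), and the toy data are all assumptions introduced here.

```python
import math

def winsorize(values, lower_q=0.01, upper_q=0.99):
    """Clip values to the empirical [lower_q, upper_q] quantile range (assumed limits)."""
    ordered = sorted(values)
    n = len(ordered)
    # Nearest-rank quantiles; an interpolating estimator would work as well.
    lo = ordered[max(0, math.ceil(lower_q * n) - 1)]
    hi = ordered[min(n - 1, math.ceil(upper_q * n) - 1)]
    return [min(max(v, lo), hi) for v in values]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is <= threshold."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: one extreme transaction amount and one redundant feature.
amounts = [10.0, 12.0, 11.0, 13.0, 9.0, 14.0, 10.5, 12.5, 11.5, 5000.0]
clipped = winsorize(amounts, lower_q=0.1, upper_q=0.9)

features = {
    "amount": clipped,
    "amount_x2": [2 * v for v in clipped],  # perfectly correlated with "amount"
    "hour": [1, 23, 5, 14, 9, 2, 18, 7, 21, 12],
}
selected = drop_correlated(features)
print(clipped)
print(selected)
```

In this toy run the extreme amount is pulled back to the upper quantile value and the redundant `amount_x2` column is dropped, while `hour` is retained. Which feature of a correlated pair survives depends on iteration order, so a real pipeline would likely rank features first.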
The final model achieves strong discriminative performance, with a ROC-AUC of 0.9606 and a PR-AUC
of 0.8042. The accuracy (0.7335) and recall (0.7491) indicate balanced detection, and the confusion matrix
shows good fraud detection with a manageable number of false positives. SHAP analysis shows that
count-based features, transaction amount, card identifiers, geographic features, and temporal patterns are
the main predictive features, consistent with established fraud behaviours reported in the recent
literature.
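As a reminder of how the threshold-based metrics relate to a confusion matrix, the sketch below derives precision, recall, F1-score, and accuracy from the four cell counts. The counts are hypothetical and chosen only for illustration; they are not the study's results, and the function name `classification_metrics` is an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive standard metrics from confusion-matrix counts (fraud = positive class)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Hypothetical counts: 100 fraud cases among 10,000 transactions.
model = classification_metrics(tp=75, fp=25, fn=25, tn=9875)

# A trivial baseline that labels every transaction as legitimate.
baseline = classification_metrics(tp=0, fp=0, fn=100, tn=9900)

print(model)
print(baseline)
```

In this made-up example the baseline reaches 99% accuracy while detecting no fraud at all, which is why precision, recall, and PR-AUC tend to be more informative than accuracy on heavily imbalanced data.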
The results demonstrate that the performance improvement stems not only from the choice of model but
also from the complementary effects of data engineering, hyper-parameter optimization, and
interpretability. The study concludes that an end-to-end pipeline improves detection accuracy,
increases transparency, and overcomes fundamental limitations found in previous fraud detection studies.
Limitations include dataset anonymisation, the lack of drift analysis, and the possible loss of fraud
indicators during preprocessing.
Keywords
AI, Fraud, financial fraud detection, LightGBM, Optimization, IEEE