Predicting Credit Card Default Risk Using Machine Learning: A Comparative Study of Tree-Based Models on Bank Customer Data

Date

2025

Publisher

Saudi Digital Library

Abstract

This study examined whether modern tree-based models predict credit-card default better than a standard logistic regression baseline while keeping decisions transparent, fair, and tied to profit. Rising delinquency and charge-offs increase the value of accurate, transparent tools. Prior studies rarely combined time-split validation, profit-based cut-offs, and fairness checks in one design, so four hypotheses were set: (1) boosted trees beat logistic regression out of time; (2) imbalance methods raise minority-class recall with only a small loss in AUC; (3) profit-tuned thresholds improve expected profit under risk limits; and (4) SHAP explanations make drivers and group outcomes easy to see. Two public datasets were used: the Taiwan file (~30k rows) and Home Credit (~307k rows, with joined tables). The data were cleaned with care to avoid leakage. Logistic regression, decision trees, Random Forest, XGBoost, LightGBM, and CatBoost were compared; under-sampling, SMOTE, and focal loss were tested as imbalance methods; forward-in-time splits were used for Home Credit; and thresholds were selected by expected profit with a guardrail on approved defaults. Boosted trees led on Home Credit with AUC near 0.75, while on Taiwan the logistic baseline stayed competitive. A profit cut-off around 0.21 increased expected profit by over 4.9 billion while staying within the guardrail. Imbalance methods gave modest recall gains. SHAP showed external scores, affordability, and timing as the top drivers, with small approval gaps by gender. The findings imply that trees are preferable for complex data, that profit and risk should guide thresholds, and that time-split testing matters. Recommendations include boosted trees with profit guardrails, simple fairness monitors, and forward-in-time validation; future work should use richer data and stress tests.
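
The profit-tuned cut-off with a guardrail described in the abstract can be illustrated with a minimal sketch. It is not the dissertation's implementation: the function names (`profit_at_threshold`, `pick_threshold`), the payoff values, the guardrail limit, and the toy scores are all hypothetical, and the sketch assumes that applicants whose predicted default probability falls below the cut-off are approved.

```python
# Illustrative only: names, payoffs, and data are assumptions, not taken
# from the study. An applicant is approved when the predicted default
# probability is strictly below the threshold.

def profit_at_threshold(probs, defaults, threshold, gain_good, loss_bad):
    """Return (profit, approved_default_rate) for one candidate threshold."""
    approved = [d for p, d in zip(probs, defaults) if p < threshold]
    if not approved:
        return 0, 0.0
    bad = sum(approved)            # approved applicants who defaulted
    good = len(approved) - bad     # approved applicants who repaid
    return gain_good * good - loss_bad * bad, bad / len(approved)


def pick_threshold(probs, defaults, gain_good, loss_bad, max_default_rate):
    """Scan a grid of thresholds and keep the most profitable one whose
    approved-default rate stays within the guardrail."""
    best = None
    for step in range(1, 100):
        t = step / 100
        profit, rate = profit_at_threshold(probs, defaults, t, gain_good, loss_bad)
        if rate > max_default_rate:
            continue               # guardrail violated, skip this threshold
        if best is None or profit > best[1]:
            best = (t, profit)
    return best


# Toy scores and outcomes (1 = defaulted), purely for demonstration.
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.60, 0.80]
defaults = [0, 0, 0, 1, 0, 0, 1, 1]
best = pick_threshold(probs, defaults, gain_good=100, loss_bad=500,
                      max_default_rate=0.25)
```

Here `best` holds the chosen threshold and its profit on the toy data. In practice the payoffs would come from the lender's own margin and loss estimates, and the threshold would be tuned on a validation period that precedes the test period.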

Keywords

Finance, XGBoost, LightGBM, Data Analysis

Citation

Alqarzaai, A.A. (2025). Predicting Credit Card Default Risk Using Machine Learning: A Comparative Study of Tree-Based Models on Bank Customer Data (MSc dissertation, Swansea University). Swansea University.

Copyright owned by the Saudi Digital Library (SDL) © 2026