Predicting Credit Card Default Risk Using Machine Learning: A Comparative Study of Tree-Based Models on Bank Customer Data

dc.contributor.advisor: Do, Anh
dc.contributor.author: Alqarzaai, Abdulmohsen Abdullah
dc.date.accessioned: 2026-01-13T08:31:15Z
dc.date.issued: 2025
dc.description.abstract: This study examined whether modern tree-based models can predict credit-card default better than the usual logistic model while keeping decisions clear, fair, and linked to profit. The background is rising delinquency and charge-offs, which increase the value of accurate and transparent tools. Prior studies rarely tested trees with time-split validation, profit-based cut-offs, and fairness checks in one design, so four hypotheses were set: boosted trees beat logistic regression out of time; imbalance methods raise minority recall with a small loss in AUC; profit-tuned thresholds improve expected profit under risk limits; and SHAP explanations make drivers and group outcomes easy to see. Two public datasets were used: the Taiwan file (~30k rows) and Home Credit (~307k rows with joined tables). The data were cleaned and leakage was avoided. Logistic regression, decision trees, Random Forest, XGBoost, LightGBM, and CatBoost were compared; under-sampling, SMOTE, and focal loss were tested as imbalance methods; forward-in-time splits were used for Home Credit; and thresholds were selected by expected profit with a guardrail on approved-default. Boosted trees led on Home Credit with AUC near 0.75, while on Taiwan the logistic baseline stayed competitive. A profit cut-off around 0.21 increased expected profit by over 4.9 billion while staying within the guardrail. Imbalance methods gave modest recall gains. SHAP showed external scores, affordability, and timing as top drivers, with small approval gaps by gender. The findings imply that trees are preferred for complex data, that profit and risk should guide thresholds, and that time-split testing matters. Recommendations include boosted trees with profit guardrails, simple fairness monitors, forward-in-time validation, and future work using richer data and stress tests.
dc.format.extent: 66
dc.identifier.citation: Alqarzaai, A.A. (2025). PREDICTING CREDIT CARD DEFAULT RISK USING MACHINE LEARNING: A COMPARATIVE STUDY OF TREE-BASED MODELS ON BANK CUSTOMER DATA (MSc dissertation, Swansea University). Swansea University
dc.identifier.uri: https://hdl.handle.net/20.500.14154/77859
dc.language.iso: en
dc.publisher: Saudi Digital Library
dc.subject: Finance
dc.subject: XGBoost
dc.subject: LightGBM
dc.subject: Data Analysis
dc.title: Predicting Credit Card Default Risk Using Machine Learning: A Comparative Study of Tree-Based Models on Bank Customer Data
dc.type: Thesis
sdl.degree.department: School of Management
sdl.degree.discipline: Data Analysis
sdl.degree.grantor: Swansea University
sdl.degree.name: Finance and Big Data Analytics
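
The profit-tuned cut-off described in the abstract can be illustrated with a minimal Python sketch. This is not the dissertation's code. The function name, the per-account gain and loss figures, the threshold grid, and the reading of the guardrail as a cap on the default rate among approved applicants are all assumptions made for illustration.

```python
import numpy as np


def pick_profit_threshold(p_default, y_default, gain_good, loss_bad,
                          max_bad_rate, grid=None):
    """Pick the approval cut-off with the highest total profit.

    Applicants with predicted default probability below the cut-off are
    approved. A cut-off is only eligible if the observed default rate
    among approved applicants stays at or below max_bad_rate (the
    assumed form of the guardrail).

    Returns (threshold, profit), or None if no cut-off is eligible.
    """
    p = np.asarray(p_default, dtype=float)
    y = np.asarray(y_default, dtype=int)
    if grid is None:
        # Hypothetical default grid: 0.01, 0.02, ..., 0.99.
        grid = np.linspace(0.01, 0.99, 99)

    best = None
    for t in grid:
        approved = p < t
        n_approved = int(approved.sum())
        if n_approved == 0:
            continue  # nothing approved, nothing to evaluate
        n_bad = int(y[approved].sum())
        if n_bad / n_approved > max_bad_rate:
            continue  # guardrail violated, skip this cut-off
        # Each good account earns gain_good; each default costs loss_bad.
        profit = gain_good * (n_approved - n_bad) - loss_bad * n_bad
        if best is None or profit > best[1]:
            best = (float(t), float(profit))
    return best


# Toy validation scores and outcomes (made up for the example).
scores = [0.05, 0.10, 0.20, 0.30, 0.60, 0.90]
defaults = [0, 0, 0, 1, 0, 1]
result = pick_profit_threshold(scores, defaults, gain_good=1.0,
                               loss_bad=5.0, max_bad_rate=0.25,
                               grid=[0.15, 0.25, 0.5, 0.7, 0.95])
print(result)
```

In practice the scores would come from a model evaluated on a later time window than the one it was trained on, in keeping with the forward-in-time validation the abstract recommends.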

Files

Original bundle

Name: SACM-Dissertation.pdf
Size: 1.16 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.61 KB
Format: Item-specific license agreed to upon submission

Copyright owned by the Saudi Digital Library (SDL) © 2026