Applying machine learning approaches to determine loan default events: A comparison with Basel Committee recommendations

Phạm Đoàn Xuân Duyên

Please use this identifier to cite or link to this item: https://digital.lib.ueh.edu.vn/handle/UEH/76236

Title:	Applying machine learning approaches to determine loan default events: A comparison with Basel Committee recommendations
Author(s):	Phạm Đoàn Xuân Duyên
Advisor(s):	Nguyễn Mạnh Tuấn
Abstract:	This research investigates the application of machine learning techniques to identify loan default events and compares the resulting default definition with the one provided by the Basel Committee. The primary goal is to establish a precise and adaptable definition of default that is customized for specific loan products, packages, or customer groups, thereby improving the effectiveness of credit risk assessment models. The study uses a dataset of credit card loans from a commercial bank, sourced from Kaggle, which includes customer information and repayment history. Statistical analysis methods, namely cohort analysis and the debt migration matrix, are employed to pinpoint the factors that influence default events. These methods yield a statistically derived definition of default, DPD90_MOB12, which considers loans that have been active for at least 12 months and are overdue by 90 days or more. This is compared with the Basel Committee's definition, DPD90, which classifies a loan as defaulted when it is overdue by 90 days or more, irrespective of the loan's age. The methodology includes data preprocessing with Weight of Evidence (WOE) binning, which transforms the input variables and evaluates their predictive power. Three machine learning models are used: logistic regression, random forest, and XGBoost. Model performance is evaluated with the Gini index and the area under the ROC curve (AUC), and stability is assessed with the Population Stability Index (PSI). The results indicate that models using the statistically derived default definition (DPD90_MOB12) generally perform better than those using the Basel Committee's definition (DPD90). Specifically, the XGBoost model achieves the highest Gini index on both the training and testing datasets.
The findings suggest that a flexible definition of default, tailored to the specific loan product or customer group, can lead to more accurate and effective credit scoring models. By providing a comparative analysis of different default definitions and their impact on credit scoring models, this research contributes to existing knowledge and offers practical advice for financial institutions. It emphasizes the need for financial institutions to move away from rigid adherence to the Basel Committee's definition and to consider more nuanced, statistically driven approaches tailored to their own contexts.
Issue Date:	2025
Publisher:	University of Economics Ho Chi Minh City
Series/Report no.:	UEH Young Researcher Award 2025 (Giải thưởng Nhà nghiên cứu trẻ UEH 2025)
URI:	https://digital.lib.ueh.edu.vn/handle/UEH/76236
Appears in Collections:	UEH Young Researchers (Nhà nghiên cứu trẻ UEH)
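
The quantities named in the abstract (the two default flags, WOE binning, the Gini index, and PSI) can be illustrated with a minimal Python sketch. This is not the author's code: the function names, the input layout, and the toy numbers used to exercise them are assumptions made for illustration. The DPD90_MOB12 flag is read literally from the abstract (at least 12 months on book and at least 90 days past due), WOE uses the common ln(%good / %bad) convention, and Gini is derived from AUC as 2 * AUC - 1; the thesis itself may define any of these differently.

```python
import math


def dpd90(days_past_due):
    """Basel-style flag: 90+ days past due, regardless of loan age."""
    return days_past_due >= 90


def dpd90_mob12(days_past_due, months_on_book):
    """Assumed reading of DPD90_MOB12: active 12+ months AND 90+ days past due."""
    return months_on_book >= 12 and days_past_due >= 90


def woe_iv(bins):
    """Weight of Evidence per bin and total Information Value.

    bins: list of (n_good, n_bad) counts per bin; every count is assumed
    to be non-zero (no smoothing is applied in this sketch).
    """
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    woes, iv = [], 0.0
    for g, b in bins:
        pct_good = g / total_good
        pct_bad = b / total_bad
        woe = math.log(pct_good / pct_bad)
        woes.append(woe)
        iv += (pct_good - pct_bad) * woe
    return woes, iv


def auc(scores, labels):
    """AUC as the share of (default, non-default) pairs ranked correctly.

    scores: predicted default risk; labels: 1 = default, 0 = non-default.
    Ties count as half a correct pair.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini index derived from AUC."""
    return 2 * auc(scores, labels) - 1


def psi(expected, actual):
    """Population Stability Index between two binned distributions.

    expected, actual: per-bin proportions (each summing to 1, all non-zero).
    """
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))
```

Under this reading, a loan that is 95 days past due after 6 months on book counts as a default under DPD90 but not under DPD90_MOB12, which is the kind of difference the comparison in the abstract turns on.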

Files in This Item:

File:	DetaiNCKHSV43768.pdf
Size:	2.5 MB
Format:	Adobe PDF
