Applying machine learning approaches to determine loan default events: A comparison with Basel Committee recommendations

Phạm Đoàn Xuân Duyên

Please use this identifier to cite or link to this item: https://digital.lib.ueh.edu.vn/handle/UEH/76236

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Nguyễn Mạnh Tuấn	en_US
dc.contributor.author	Phạm Đoàn Xuân Duyên	en_US
dc.contributor.other	Hà Thị Mỹ Quyên	en_US
dc.date.accessioned	2025-09-04T06:51:00Z	-
dc.date.available	2025-09-04T06:51:00Z	-
dc.date.issued	2025	-
dc.identifier.uri	https://digital.lib.ueh.edu.vn/handle/UEH/76236	-
dc.description.abstract	This research investigates the application of machine learning techniques to identify loan default events and compares the resulting default definition with the one recommended by the Basel Committee. The primary goal is to establish a precise, adaptable definition of default that is tailored to specific loan products, packages, or customer groups, thereby improving the effectiveness of credit risk assessment models. The study uses a dataset of credit card loans from a commercial bank, sourced from Kaggle, which contains customer information and repayment histories. Statistical methods, namely cohort analysis and the debt migration matrix, are used to identify the factors that drive default events. These methods yield a statistically derived definition of default, DPD90_MOB12, which covers loans that have been active for at least 12 months and are 90 or more days past due. This definition is compared with the Basel Committee's definition, DPD90, under which a loan is classified as defaulted once it is 90 or more days past due, regardless of its age. Data preprocessing uses the Weight of Evidence (WOE) technique to bin the data, which transforms the input variables and measures their predictive power. Three machine learning models are trained: logistic regression, random forest, and XGBoost. Model performance is evaluated with the Gini index and the AUC-ROC curve, and model stability is assessed with the Population Stability Index (PSI). The results indicate that models built on the statistically derived definition (DPD90_MOB12) generally outperform those built on the Basel Committee's definition (DPD90). In particular, XGBoost performs best on both the training and testing datasets, achieving the highest Gini index.
The findings suggest that a flexible definition of default, tailored to the specific loan product or customer group, can produce more accurate and effective credit scoring models. By comparing different default definitions and their impact on credit scoring models, this research contributes to the existing literature and offers practical guidance for financial institutions. It argues that financial institutions should move away from rigid adherence to the Basel Committee's definition and adopt more nuanced, statistically driven approaches tailored to their own contexts.	en_US
dc.format.medium	61 p.	en_US
dc.language.iso	en	en_US
dc.publisher	University of Economics Ho Chi Minh City	en_US
dc.relation.ispartofseries	Giải thưởng Nhà nghiên cứu trẻ UEH 2025 (UEH Young Researcher Award 2025)	en_US
dc.title	Applying machine learning approaches to determine loan default events: A comparison with Basel Committee recommendations	en_US
dc.type	Research Paper	en_US
ueh.speciality	Finance	en_US
ueh.award	Giải A (Prize A)	en_US
item.grantfulltext	reserved	-
item.languageiso639-1	en	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.cerifentitytype	Publications	-
item.fulltext	Full texts	-
item.openairetype	Research Paper	-
Appears in Collections:	Nhà nghiên cứu trẻ UEH (UEH Young Researchers)
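
The abstract contrasts two default flags: DPD90 (ever 90 or more days past due, whatever the loan's age) and DPD90_MOB12 (loans active for at least 12 months that are 90 or more days past due). A minimal sketch of both flags follows, assuming a hypothetical per-month record with `month_on_book` and `dpd` fields; it reads DPD90_MOB12 literally from the abstract and treats younger loans as excluded (`None`) rather than as non-defaults, which is an assumption and not the study's code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class MonthlyRecord:
    month_on_book: int  # hypothetical field: months since the loan was opened (first month = 1)
    dpd: int            # hypothetical field: days past due at month end


def dpd90_flag(history: Sequence[MonthlyRecord]) -> bool:
    """Basel-style DPD90: default if the loan was ever 90+ days past due, whatever its age."""
    return any(r.dpd >= 90 for r in history)


def dpd90_mob12_flag(history: Sequence[MonthlyRecord], min_mob: int = 12) -> Optional[bool]:
    """Literal reading of DPD90_MOB12 (assumed): only loans with at least `min_mob`
    months on book are considered; among them, default means 90+ days past due.
    Younger loans return None, i.e. they are left out of the modelling sample."""
    if not history or max(r.month_on_book for r in history) < min_mob:
        return None
    return dpd90_flag(history)


# A loan that reaches 95 DPD in month 6 but has only 7 months of history
young = [MonthlyRecord(m, 95 if m == 6 else 0) for m in range(1, 8)]
# A loan with 12 months of history that reaches 120 DPD in month 10
seasoned = [MonthlyRecord(m, 120 if m == 10 else 0) for m in range(1, 13)]

print(dpd90_flag(young), dpd90_mob12_flag(young))        # True None
print(dpd90_flag(seasoned), dpd90_mob12_flag(seasoned))  # True True
```

The young loan counts as a default under DPD90 but is not yet eligible under DPD90_MOB12, which is exactly where the two definitions produce different training labels.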
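
The abstract lists the debt migration matrix among the statistical tools used to locate the default threshold. Such a matrix tabulates how often loans roll from one delinquency bucket in month t to another in month t+1. The sketch below is illustrative only: the bucket edges at 30, 60 and 90 days and the input layout (a hypothetical mapping from loan id to month-end DPD values) are assumptions, not taken from the paper.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical delinquency buckets; the 30/60/90-day edges are an assumption.
BUCKETS = ["current", "1-29", "30-59", "60-89", "90+"]


def bucket(dpd: int) -> str:
    """Map days past due to a delinquency bucket."""
    if dpd <= 0:
        return "current"
    if dpd < 30:
        return "1-29"
    if dpd < 60:
        return "30-59"
    if dpd < 90:
        return "60-89"
    return "90+"


def migration_matrix(histories: Dict[str, List[int]]) -> Dict[str, Dict[str, float]]:
    """Row-normalised transition rates from the bucket at month t to the bucket at month t+1.

    `histories` maps a loan id to its month-end DPD values in time order.
    Rows with no observed transitions are left as all zeros.
    """
    counts: Counter = Counter()
    for dpds in histories.values():
        for prev, nxt in zip(dpds, dpds[1:]):
            counts[(bucket(prev), bucket(nxt))] += 1
    matrix = {}
    for src in BUCKETS:
        row_total = sum(counts[(src, dst)] for dst in BUCKETS)
        matrix[src] = {
            dst: counts[(src, dst)] / row_total if row_total else 0.0 for dst in BUCKETS
        }
    return matrix


m = migration_matrix({"loan_a": [0, 15, 45], "loan_b": [0, 0, 0]})
# Three transitions start in "current": two stay current, one rolls to "1-29".
print(m["current"])
```

Reading down the rows shows where roll rates into "90+" become nearly irreversible, which is the kind of evidence a statistically derived default threshold rests on.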
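
The abstract mentions Weight of Evidence (WOE) binning to transform inputs and measure their predictive power. A minimal sketch of WOE and the related Information Value (IV) for one already-binned variable follows. It assumes the convention WOE = ln(share of goods / share of bads); some texts invert the ratio, which only flips the sign. Empty bins are rejected rather than smoothed, which is a design choice for this illustration.

```python
import math
from typing import List, Sequence, Tuple


def woe_iv(bins: Sequence[Tuple[int, int]]) -> Tuple[List[float], float]:
    """Weight of Evidence per bin and total Information Value.

    `bins` holds (n_good, n_bad) counts for each bin of one variable.
    Convention assumed here: WOE = ln(good share / bad share), so a positive
    WOE marks a lower-risk bin; IV = sum((good share - bad share) * WOE).
    """
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    woes: List[float] = []
    iv = 0.0
    for g, b in bins:
        if g == 0 or b == 0:
            # WOE is undefined for an empty cell; merge it with a neighbouring bin first.
            raise ValueError("each bin needs at least one good and one bad")
        good_share = g / total_good
        bad_share = b / total_bad
        woe = math.log(good_share / bad_share)
        woes.append(woe)
        iv += (good_share - bad_share) * woe
    return woes, iv


woes, iv = woe_iv([(400, 100), (600, 100)])
print([round(w, 4) for w in woes], round(iv, 4))  # [-0.2231, 0.1823] 0.0405
```

Each raw value is then replaced by its bin's WOE before model fitting, and IV gives a single number for ranking variables by predictive power.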
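
The abstract reports model performance as the Gini index alongside AUC-ROC. In credit scoring these are linked by Gini = 2 × AUC − 1. The sketch below computes AUC in its pairwise (Mann-Whitney) form, assuming higher scores mean higher default risk and labels of 1 for defaults; it is quadratic in sample size and meant only to show the relationship. For real datasets, scikit-learn's `sklearn.metrics.roc_auc_score` returns the same AUC efficiently.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen default (label 1) receives a
    higher score than a randomly chosen non-default (label 0); ties count as one half.
    This pairwise form is O(n_pos * n_neg) and is for illustration only."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def gini(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Gini coefficient as used in credit scoring: Gini = 2 * AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0


print(gini([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # AUC = 3/4, so Gini = 0.5
```

A Gini of 0 corresponds to a random ranking and 1 to a perfect separation of defaults from non-defaults.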
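
The abstract assesses model stability with the Population Stability Index (PSI), which compares how a score or variable is distributed across the same bins in two samples, for example training versus testing. A minimal sketch follows, assuming both distributions are supplied as counts over identical bins. A widely used rule of thumb (an industry convention, not a finding of this study) reads PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift.

```python
import math
from typing import Sequence


def psi(expected_counts: Sequence[int], actual_counts: Sequence[int]) -> float:
    """Population Stability Index between a reference distribution (e.g. the
    training sample) and a comparison distribution (e.g. the test sample),
    both given as counts over the same bins:
        PSI = sum((a_i - e_i) * ln(a_i / e_i)), with e_i, a_i the bin shares.
    """
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same bins")
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        if e == 0 or a == 0:
            raise ValueError("empty bin: merge bins before computing PSI")
        e_share = e / e_total
        a_share = a / a_total
        value += (a_share - e_share) * math.log(a_share / e_share)
    return value


print(round(psi([50, 50], [40, 60]), 4))  # 0.0405
```

Identical distributions give a PSI of 0, and every term is non-negative, so the index only grows as the two samples drift apart.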

Files in This Item:

File	Description	Size	Format
DetaiNCKHSV43768.pdf		2.5 MB	Adobe PDF
