Please use this identifier to cite or link to this item: https://digital.lib.ueh.edu.vn/handle/UEH/76236
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Nguyễn Mạnh Tuấn | en_US
dc.contributor.author | Phạm Đoàn Xuân Duyên | en_US
dc.contributor.other | Hà Thị Mỹ Quyên | en_US
dc.date.accessioned | 2025-09-04T06:51:00Z | -
dc.date.available | 2025-09-04T06:51:00Z | -
dc.date.issued | 2025 | -
dc.identifier.uri | https://digital.lib.ueh.edu.vn/handle/UEH/76236 | -
dc.description.abstract | This research investigates the application of machine learning techniques to identify loan default events, comparing these methods to the default definitions provided by the Basel Committee. The primary goal is to establish a precise and adaptable definition of default that is customized for specific loan products, packages, or customer groups, thereby improving the effectiveness of credit risk assessment models. The study uses a dataset of credit card loans from a commercial bank, sourced from Kaggle, which includes customer information and repayment history. Statistical analysis methods, such as cohort analysis and the debt migration matrix, are employed to pinpoint the factors that influence default events. These methods yield a statistically derived definition of default, DPD90_MOB12, which considers loans that have been active for at least 12 months and are overdue by 90 days or more. This is compared to the Basel Committee's definition, DPD90, which classifies loans as defaults when they are overdue by 90 days or more, irrespective of the loan's age. The research methodology includes data preprocessing using the Weight of Evidence (WOE) technique for data binning, which transforms input variables and evaluates their predictive power. Three machine learning models are used: logistic regression, random forest, and XGBoost. Model performance is evaluated using the Gini index and AUC-ROC curves, and stability is assessed with the Population Stability Index (PSI). The results indicate that models using the statistically derived default definition (DPD90_MOB12) generally perform better than those using the Basel Committee's definition (DPD90). Specifically, the XGBoost model demonstrates the best performance across both training and testing datasets, achieving the highest Gini index. The findings suggest that a flexible definition of default, tailored to the specific loan product or customer group, can lead to more accurate and effective credit scoring models. By providing a comparative analysis of different default definitions and their impact on credit scoring models, this research contributes to existing knowledge and offers practical advice for financial institutions. The research emphasizes the need for financial institutions to move away from rigid adherence to the Basel Committee's definition and consider more nuanced, statistically driven approaches tailored to their unique contexts. | en_US
dc.format.medium | 61 p. | en_US
dc.language.iso | en | en_US
dc.publisher | University of Economics Ho Chi Minh City | en_US
dc.relation.ispartofseries | UEH Young Researcher Award 2025 (Giải thưởng Nhà nghiên cứu trẻ UEH 2025) | en_US
dc.title | Applying machine learning approaches to determine loan default events: A comparison with basel committee recommendations | en_US
dc.type | Research Paper | en_US
ueh.speciality | Finance (Tài chính) | en_US
ueh.award | Prize A (Giải A) | en_US
item.cerifentitytype | Publications | -
item.languageiso639-1 | en | -
item.grantfulltext | reserved | -
item.openairetype | Research Paper | -
item.fulltext | Full texts | -
item.openairecristype | http://purl.org/coar/resource_type/c_18cf | -
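
Illustrative note on the abstract: this record does not include the paper's code, so the following is only a minimal sketch of the quantities the abstract names, under stated assumptions. All function names are hypothetical. The DPD90_MOB12 flag follows one reading of the abstract (at least 12 months on book and 90 or more days past due). WOE uses the ln(%good / %bad) convention, which is one of two common sign conventions. Gini is taken as 2 * AUC - 1, the usual relationship in credit scoring. The paper's own binning and evaluation choices may differ.

```python
import math

def dpd90(days_past_due):
    """Basel-style flag: 90+ days past due, irrespective of loan age."""
    return days_past_due >= 90

def dpd90_mob12(days_past_due, months_on_book):
    """Assumed reading of DPD90_MOB12: 90+ days past due AND 12+ months on book."""
    return days_past_due >= 90 and months_on_book >= 12

def woe_iv(bins):
    """bins: list of (good_count, bad_count), one pair per bin.
    Assumes every bin has at least one good and one bad account.
    Returns (list of WOE values, information value)."""
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    woes, iv = [], 0.0
    for g, b in bins:
        pct_good = g / total_good
        pct_bad = b / total_bad
        w = math.log(pct_good / pct_bad)
        woes.append(w)
        iv += (pct_good - pct_bad) * w
    return woes, iv

def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def gini(scores, labels):
    """Gini index derived from AUC."""
    return 2 * auc(scores, labels) - 1

def psi(expected, actual):
    """Population Stability Index over matching bins.
    expected, actual: bin proportions (each summing to 1, all nonzero)."""
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))
```

For example, a loan that is 95 days past due after 6 months on book is flagged by `dpd90` but not by `dpd90_mob12`, which is the kind of difference between the two definitions that the abstract compares.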
Appears in Collections: UEH Young Researchers (Nhà nghiên cứu trẻ UEH)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.