Title: | Applying machine learning approaches to determine loan default events: A comparison with Basel Committee recommendations |
Author(s): | Phạm Đoàn Xuân Duyên |
Advisor(s): | Nguyễn Mạnh Tuấn |
Abstract: | This research investigates the application of machine learning techniques to identify loan default events, comparing these methods to the default definitions provided by the Basel Committee. The primary goal is to establish a precise and adaptable definition of default that is customized for specific loan products, packages, or customer groups, thereby improving the effectiveness of credit risk assessment models. The study uses a dataset of credit card loans from a commercial bank, sourced from Kaggle, which includes customer information and repayment history. Statistical analysis methods, such as cohort analysis and the debt migration matrix, are employed to pinpoint the factors that influence default events. These methods help in creating a statistically derived definition of default, known as DPD90_MOB12, which considers loans that have been active for at least 12 months and are overdue by 90 days or more. This is then compared to the Basel Committee's definition, DPD90, which classifies loans as defaults when they are overdue by 90 days or more, irrespective of the loan's age. The research methodology includes data preprocessing using the Weight of Evidence (WOE) technique for data binning, which transforms input variables and evaluates their predictive power. Three machine learning models are used: logistic regression, random forest, and XGBoost. The models' performance is evaluated using metrics such as the Gini index and AUC-ROC curves, and stability is assessed with the Population Stability Index (PSI). The results indicate that models using the statistically derived default definition (DPD90_MOB12) generally perform better than those using the Basel Committee's definition (DPD90). Specifically, the XGBoost model demonstrates the best performance across both training and testing datasets, achieving the highest Gini index. 
The findings suggest that a flexible definition of default, tailored to the specific loan product or customer group, can lead to more accurate and effective credit scoring models. By providing a comparative analysis of different default definitions and their impact on credit scoring models, this research contributes to existing knowledge and offers practical advice for financial institutions. The research emphasizes the need for financial institutions to move away from rigid adherence to the Basel Committee's definition and consider more nuanced, statistically driven approaches tailored to their unique contexts. |
Issue Date: | 2025 |
Publisher: | University of Economics Ho Chi Minh City |
Series/Report no.: | UEH Young Researcher Award 2025 (Giải thưởng Nhà nghiên cứu trẻ UEH 2025) |
URI: | https://digital.lib.ueh.edu.vn/handle/UEH/76236 |
Appears in Collections: | UEH Young Researchers (Nhà nghiên cứu trẻ UEH)
|
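
As a rough illustration of the quantities named in the abstract, the following Python sketch shows how the two default flags, the Gini index, and the PSI might be computed. It is not taken from the thesis: the function and parameter names (`max_dpd`, `months_on_book`, and so on) are hypothetical, and the DPD90_MOB12 rule is read literally from the abstract (at least 12 months on book and 90 or more days past due). The Gini and PSI formulas are the commonly used textbook forms, which the thesis may or may not follow exactly.

```python
import math


def is_default_dpd90(max_dpd):
    """Basel-style flag: 90 or more days past due, regardless of loan age."""
    return max_dpd >= 90


def is_default_dpd90_mob12(max_dpd, months_on_book):
    """Assumed reading of DPD90_MOB12: 90+ days past due AND 12+ months on book."""
    return max_dpd >= 90 and months_on_book >= 12


def gini_from_auc(auc):
    """Common credit-scoring convention: Gini = 2 * AUC - 1."""
    return 2.0 * auc - 1.0


def psi(expected_shares, actual_shares):
    """Population Stability Index over matching bins.

    Each argument is a list of bin proportions (summing to 1) with no zero
    entries; PSI = sum((actual - expected) * ln(actual / expected)).
    """
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected_shares, actual_shares)
    )
```

Under these assumptions, a loan that is 95 days past due after 6 months on book counts as a default under DPD90 but not under DPD90_MOB12, which is the kind of difference the comparison in the abstract turns on.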