Title: | Imbalanced data in classification: a case study of credit scoring |
Author(s): | Bui Thi Thien My |
Advisor(s): | Assoc. Prof. Dr. Le Xuan Truong; Dr. Ta Quoc Bao |
Keywords: | Imbalanced data |
Abstract: | In classification, imbalanced data occurs when the classes in the training data set differ greatly in size. This problem arises frequently in fields such as credit scoring and medical diagnosis. Imbalanced data makes predictive modeling for real-world applications challenging because most machine learning algorithms are designed for balanced data sets. As a result, handling imbalanced data has attracted considerable attention from researchers and practitioners. In this dissertation, we propose solutions for imbalanced classification and apply them to a credit scoring case study. The solutions are drawn from three papers published in scientific journals. The first paper presents an interpretable decision tree ensemble model for imbalanced credit scoring data sets. The second paper introduces a novel technique for handling imbalanced data, particularly when samples overlap or are noisy. The third paper proposes a modification of logistic regression that focuses on optimizing the F-measure, a popular metric in imbalanced classification. These classifiers were trained on a range of public and private data sets that are highly imbalanced and have overlapping classes. The main results show that the proposed methods outperform both traditional models and some recent ones. |
Issue Date: | 2024 |
Publisher: | University of Economics Ho Chi Minh City |
URI: | https://opac.ueh.edu.vn/record=b1036923~S1 https://digital.lib.ueh.edu.vn/handle/UEH/71169 |
Appears in Collections: | DISSERTATIONS |