Imbalanced data in classification: a case study of credit scoring

Bui Thi Thien My

Please use this identifier to cite or link to this item: https://digital.lib.ueh.edu.vn/handle/UEH/71169

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Assoc. Prof. Dr. Le Xuan Truong	en_US
dc.contributor.advisor	Dr. Ta Quoc Bao	en_US
dc.contributor.author	Bui Thi Thien My	en_US
dc.date.accessioned	2024-06-20T07:50:11Z	-
dc.date.available	2024-06-20T07:50:11Z	-
dc.date.issued	2024	-
dc.identifier.other	Barcode: 1000016996	-
dc.identifier.uri	https://opac.ueh.edu.vn/record=b1036923~S1	-
dc.identifier.uri	https://digital.lib.ueh.edu.vn/handle/UEH/71169	-
dc.description.abstract	In classification, imbalanced data occurs when the class sizes in the training data set differ greatly. This problem arises frequently in fields such as credit scoring and medical diagnosis. Imbalanced data makes predictive modeling for real-world applications challenging because most machine learning algorithms are designed for balanced data sets. Addressing imbalanced data has therefore attracted much attention from researchers and practitioners. In this dissertation, we propose solutions for imbalanced classification and apply them to a credit scoring case study. The solutions are derived from three papers published in scientific journals. The first paper presents an interpretable decision tree ensemble model for imbalanced credit scoring data sets. The second paper introduces a novel technique for addressing imbalanced data, particularly in the presence of overlapping and noisy samples. The final paper proposes a modification of logistic regression that focuses on optimizing the F-measure, a popular metric in imbalanced classification. These classifiers have been trained on a range of public and private data sets that are highly imbalanced and have overlapping classes. The primary results demonstrate that the proposed methods outperform both traditional and some recent models.	en_US
dc.format.medium	123 p.	en_US
dc.language.iso	English	en_US
dc.publisher	University of Economics Ho Chi Minh City	en_US
dc.subject	Imbalanced data	en_US
dc.title	Imbalanced data in classification: a case study of credit scoring	en_US
dc.type	Dissertations	en_US
ueh.speciality	Statistics = Thống kê	en_US
item.openairetype	Dissertations	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.cerifentitytype	Publications	-
item.fulltext	Full texts	-
item.grantfulltext	reserved	-
item.languageiso639-1	English	-
Appears in Collections:	DISSERTATIONS

Files in This Item:

File	Description	Size	Format
Bui Thi Thien My.pdf	-	2.5 MB	Adobe PDF
