Bank failure prediction; Boosting algorithms; Target variables; U.S. banks; Variable selection techniques; XGBoost
From a modeling point of view, our work provides a novel approach to using XGBoost for bank failure prediction and identifies the technical aspects that most improve predictive accuracy. Two of these aspects are crucial: assigning correct values to target variables and selecting predictors carefully (through ANOVA, correlation analysis, weight of evidence, and information value tests). We also highlight that bank failure can be predicted four to five quarters in advance when all predictive signals appear simultaneously. Hence, we strongly suggest using quarterly rather than yearly data. In addition to these practical implications, our work contributes to the existing literature. We confirm the findings of earlier studies that XGBoost has strong predictive power (Carmona, Climent, and Momparler 2018). Moreover, through an intensive comparison of predictive power, we provide evidence that XGBoost outperforms other models in the same boosting family, including gradient boosting and AdaBoost. These contributions might facilitate future work on bank failure prediction.
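To make the predictor-screening step concrete, the following is a minimal sketch of how weight of evidence (WoE) and information value (IV) could be computed for a single, already binned predictor. It is an illustration under stated assumptions, not the procedure used in this study: the function name `woe_iv`, the `smoothing` constant, the sign convention (log of the non-failure share over the failure share), and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def woe_iv(bins, targets, smoothing=0.5):
    """Return (per-bin WoE dict, total IV) for one binned predictor.

    bins      : bin label of each observation (e.g. "low", "high")
    targets   : 0/1 label of each observation (1 = failed, 0 = survived)
    smoothing : hypothetical additive constant that avoids log(0)
                when a bin contains no failures or no survivors
    """
    # label -> [count of non-failures, count of failures]
    counts = defaultdict(lambda: [0, 0])
    for b, y in zip(bins, targets):
        counts[b][y] += 1

    n_bins = len(counts)
    total_good = sum(c[0] for c in counts.values()) + smoothing * n_bins
    total_bad = sum(c[1] for c in counts.values()) + smoothing * n_bins

    woe, iv = {}, 0.0
    for b, (good, bad) in counts.items():
        p_good = (good + smoothing) / total_good  # share of non-failures in bin
        p_bad = (bad + smoothing) / total_bad     # share of failures in bin
        # Assumed convention: WoE = ln(share of non-failures / share of failures)
        woe[b] = math.log(p_good / p_bad)
        iv += (p_good - p_bad) * woe[b]
    return woe, iv


# Toy data (invented for illustration): a capital-ratio bin per bank-quarter.
toy_bins = ["low", "low", "low", "high", "high", "high"]
toy_targets = [1, 1, 0, 0, 0, 1]
toy_woe, toy_iv = woe_iv(toy_bins, toy_targets)
print(toy_woe, toy_iv)
```

Under this convention, a bin with a negative WoE contains proportionally more failures than the sample as a whole, and a predictor whose IV is close to zero carries little separating power. Any cutoff used to keep or drop a predictor would be a further modeling choice.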