Machine learning methods are increasingly used to predict company bankruptcy. Comparative studies of selected methods have shown that extreme gradient boosting achieves high prediction accuracy in this area. The method is resistant to outliers and relieves the researcher of the need to impute missing data. The aim of this study is to assess how removing outliers from data sets affects the accuracy of extreme gradient boosting in predicting company bankruptcy. The study's contribution is the application of extreme gradient boosting to bankruptcy prediction on data from which the outliers reported for companies that continue to operate as going concerns have been removed. The research was conducted using 64 financial ratios for companies operating in the industrial processing sector in Poland. The results indicate that the detection rate for bankrupt companies can be increased by removing from the data sets the outliers reported for companies that continue to operate as going concerns.
XGBoost, company bankruptcy, machine learning, outlier
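The class-specific filtering described in the abstract can be illustrated with a minimal sketch. The abstract does not say which outlier criterion was used, so the Tukey fence rule (values more than k interquartile ranges outside the quartiles), the function name `drop_going_concern_outliers`, the parameter `k`, and the label coding (0 for going concerns, 1 for bankrupt companies) are all assumptions made here for illustration, not the authors' procedure.

```python
import numpy as np

def drop_going_concern_outliers(X, y, k=1.5):
    """Drop going-concern rows (y == 0) that are outliers on any ratio.

    Assumed criterion: Tukey fences computed per feature from the
    going-concern class only. Bankrupt rows (y == 1) are always kept.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    going = y == 0

    # Quartiles per financial ratio, ignoring missing values (NaN)
    q1 = np.nanpercentile(X[going], 25, axis=0)
    q3 = np.nanpercentile(X[going], 75, axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr

    # Comparisons with NaN are False, so a missing value never flags a row
    with np.errstate(invalid="ignore"):
        outside = ((X < lower) | (X > upper)).any(axis=1)

    # Remove only rows that are both going concerns and outliers
    keep = ~(going & outside)
    return X[keep], y[keep]
```

The filtered arrays could then be passed to a gradient boosting classifier, for example `xgboost.XGBClassifier`. Missing values are left in place in this sketch, since the abstract notes that the method does not require them to be filled in.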