Heteroscedastic discriminant analysis combined with feature selection for credit scoring

Katarzyna Stąpor; Tomasz Smolarczyk; Piotr Fabian

doi:https://doi.org/10.59170/stattrans-2016-014


Katarzyna Stąpor, Tomasz Smolarczyk, Piotr Fabian: Institute of Computer Science, Silesian University of Technology, Gliwice, Poland.

Statistics in Transition new series, vol. 17, no. 2, 2016, pp. 265–280. Published online: 1 June 2016.


ABSTRACT

Credit granting is a fundamental question and one of the most complex tasks that every credit institution faces. Credit scoring databases are typically large and contain redundant and irrelevant features. An effective classification model gives managers an objective basis for decisions in place of intuition and experience. This study proposes an approach for building a credit scoring model that combines the heteroscedastic extension (Loog, Duin, 2002) of classical Fisher Linear Discriminant Analysis (Fisher, 1936; Krzyśko, 1990) with a feature selection algorithm that retains sufficient information for classification purposes. We tested five feature subset selection algorithms: two filters and three wrappers. To evaluate the accuracy of the proposed credit scoring model and to compare it with existing approaches, we used the German credit data set from the study (Chen, Li, 2010). The results suggest that the proposed hybrid approach is an effective and promising method for building credit scoring models.
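The distinction between filter and wrapper feature selection mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the five algorithms tested, so the univariate Fisher ratio used as the filter criterion, the nearest-centroid classifier used inside the wrapper (a stand-in for the heteroscedastic discriminant classifier), the function names, and the toy data are all assumptions made for illustration.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Univariate two-class Fisher ratio: (m0 - m1)^2 / (v0 + v1)."""
    g0 = [v for v, lab in zip(values, labels) if lab == 0]
    g1 = [v for v, lab in zip(values, labels) if lab == 1]
    spread = pvariance(g0) + pvariance(g1)
    return (mean(g0) - mean(g1)) ** 2 / spread if spread > 0 else 0.0

def filter_select(X, y, k):
    """Filter: rank features by a score computed without any classifier."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])

def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier on a feature subset."""
    centroids = {}
    for c in (0, 1):
        rows = [row for row, lab in zip(X, y) if lab == c]
        centroids[c] = [mean(r[j] for r in rows) for j in features]
    hits = 0
    for row, lab in zip(X, y):
        dist = {c: sum((row[j] - m) ** 2 for j, m in zip(features, centroids[c]))
                for c in centroids}
        hits += min(dist, key=dist.get) == lab
    return hits / len(y)

def wrapper_select(X, y, k):
    """Wrapper: greedy forward selection scored by classifier accuracy."""
    chosen = []
    while len(chosen) < k:
        remaining = [j for j in range(len(X[0])) if j not in chosen]
        best = max(remaining, key=lambda j: centroid_accuracy(X, y, chosen + [j]))
        chosen.append(best)
    return sorted(chosen)

# Invented toy data (not the German credit data set):
# feature 0 separates the classes, feature 1 is noise, feature 2 is weak.
X = [
    [1.0, 5.0, 0.2],
    [1.2, 3.0, 0.4],
    [0.8, 4.0, 0.1],
    [3.0, 5.0, 0.3],
    [3.2, 3.0, 0.5],
    [2.8, 4.0, 0.6],
]
y = [0, 0, 0, 1, 1, 1]

print(filter_select(X, y, 2))   # indices of the two top-ranked features
print(wrapper_select(X, y, 1))  # index of the single best feature for the classifier
```

A filter scores each feature once and is cheap, while a wrapper retrains the classifier for every candidate subset and is therefore costlier but tuned to the classifier that will actually be used. In practice the wrapper score would be estimated by cross-validation rather than on the training data as in this sketch.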

KEYWORDS

heteroscedastic discriminant analysis, feature subset selection, variable importance, credit scoring model.

REFERENCES

CHEN, F., LI, F., (2010). Combination of feature selection approaches with SVM in credit scoring, Expert Systems with Applications, Vol. 37, pp. 4902–4909.

COVER, T., THOMAS, J., (1991). Elements of information theory. John Wiley & Sons, New York, NY.

CROOK, J. N., EDELMAN, D. B., THOMAS, L. C., (2007). Recent developments in consumer credit risk assessment. European Journal of Operational Research 183 (3), pp. 1447–1465.

DASH, M., LIU, H., (1997). Feature selection for classification. Intelligent Data Analysis, 1, pp. 131–156.

DUDA, R., HART, P., STORK, D., (2001). Pattern Classification. John Wiley & Sons, New York, 2 ed.

FEO, T. A., RESENDE, M. G. C., (1995). Greedy randomized adaptive search procedures. J. Global Optim. 2, pp. 1–27.

FISHER, R., (1936). The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, pp. 179–188.

FUKUNAGA, K., (1990). Introduction to statistical pattern recognition. New York: Academic Press.

GOLDBERG, D., (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley Professional.

HALL, M., SMITH, L., (1997). Feature subset selection: a correlation based filter approach, in International Conference on Neural Information Processing and Intelligent Information Systems, Berlin.

KRZYŚKO, M., (1990). Discriminant analysis, WNT, Warszawa (in Polish).

KRZYŚKO, M., WOŁYŃSKI, W., (1996). Discriminant rules based on distances, Tatra Mountains Math. Publ. 7(1996), pp. 289–296.

LOOG, M., DUIN, R., (2002). Non-iterative heteroscedastic linear dimension reduction for two-class data: from Fisher to Chernoff. Proc. 4th Int. Workshop S+SSPR, pp. 508–517.

MATUSZCZYK, A., (2012). Credit scoring. Warszawa: CeDeWu Sp. z o.o.

MOSCATO, P., (2002). Memetic algorithms. In Pardalos, P.M., Resende, M. (eds.): Handbook of Applied Optimization. Oxford: Oxford University Press, pp. 157–167.

PACHECO, J., et al., (2006). Analysis of new variable selection methods in discriminant analysis, Computational Statistics & Data Analysis, Vol. 51, 3, pp. 1463–1478.

PUDIL, P., et al., (1994). Floating search methods in feature selection, Pattern Recognition Letters, Vol. 15, 11, pp. 1119–1125.

SOMOL, P., et al., (2005). Filter- versus Wrapper-based Feature Selection For Credit Scoring, International Journal of Intelligent Systems, Vol. 20 (10), pp. 985–999.

SPENCE, C., SAJDA, P., (1998). Role of feature selection in building pattern recognizers for computer-aided diagnosis, in Medical Imaging 1998: Image Processing, San Diego.

STĄPOR, K., (2011). Classification methods in computer vision. PWN, Warszawa (in Polish).

STĄPOR, K., (2015). Better alternatives for stepwise discriminant analysis. Acta Universitatis Lodziensis, Folia Oeconomica, Multivariate Statistical Analysis in Theory and Practice, nr 1(311), Lodziensis University Press, pp. 9–15.

THOMAS, L. C., (2000). A survey of credit and behavioural scoring: Forecasting financial risk of lending to consumers. International Journal of Forecasting 16 (2), pp. 149–172.

THOMAS, L. C., OLIVER, R. W., HAND, D. J., (2005). A survey of the issues in consumer credit modelling research. Journal of the Operational Research Society 56 (9), pp. 1006–1015.

ZHANG, D., ZHOU, X., LEUNG, S. C. H., ZHENG, J., (2010). Vertical bagging decision trees model for credit scoring. Expert Systems with Applications 37 (12), pp. 7838–7843.