Tomasz Górecki, Maciej Łuczak
ARTICLE


ABSTRACT

In practice, several classification methods are often available, and it is not possible to determine clearly which of them is optimal. We propose a combined method that consolidates information from multiple sources into a better classifier. Stacked regression (SR) is a method for forming linear combinations of different classifiers to improve classification accuracy. The Moore-Penrose (MP) pseudoinverse is a general way to find the solution to a system of linear equations. This paper presents the use of a generalization of the MP pseudoinverse of a matrix in SR. However, for data sets with a larger number of features the exact method is computationally too slow to achieve good results, so we propose a genetic approach to solve the problem. Experimental results on various real data sets demonstrate that the improvements are efficient and that this approach outperforms the classical SR method, providing a significant reduction in the mean classification error rate.
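As a hypothetical illustration (a minimal sketch, not the authors' exact algorithm), the core idea behind pseudoinverse-based stacked regression can be shown in Python with NumPy: the matrix of cross-validated base-classifier predictions is pseudoinverted to obtain the minimum-norm least-squares weights of the linear combination. The toy prediction matrix `P` and labels `y` below are invented for the example.

```python
import numpy as np

# P: hypothetical cross-validated scores of 3 base classifiers on 6 samples
P = np.array([
    [0.9, 0.8, 0.6],
    [0.2, 0.1, 0.4],
    [0.8, 0.7, 0.9],
    [0.1, 0.3, 0.2],
    [0.7, 0.9, 0.8],
    [0.3, 0.2, 0.1],
])
y = np.array([1, 0, 1, 0, 1, 0])  # true class labels

# Moore-Penrose pseudoinverse gives the minimum-norm least-squares
# solution of P w = y, even when P'P is singular.
w = np.linalg.pinv(P) @ y

combined = P @ w                       # combined ensemble score
pred = (combined >= 0.5).astype(int)   # threshold into class labels
print(pred)  # → [1 0 1 0 1 0]
```

The same weights could be obtained from `np.linalg.lstsq`; the pseudoinverse formulation is what generalizes naturally when the system is rank-deficient.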

KEYWORDS

stacked regression, genetic algorithm, Moore-Penrose pseudoinverse.

REFERENCES

BAUER, E., KOHAVI, R., (1999). An empirical comparison of voting classification algorithms: bagging, boosting, and variants. Machine Learning, 36, 105–139.

BEN-ISRAEL, A., GREVILLE, T.N.E., (2003). Generalized Inverses: Theory and Applications. Springer.

BERGMANN, G., HOMMEL, G., (1988). Improvements of general multiple test procedures for redundant systems of hypotheses. In Multiple Hypotheses Testing, P. Bauer, G. Hommel and E. Sonnemann (eds.), Springer, 110–115.

BREIMAN, L., (1996a). Stacked regression. Machine Learning, 24, 49–64.

BREIMAN, L., (1996b). Bagging predictors. Machine Learning, 24, 123–140.

BREIMAN, L., (2001). Random forests. Machine Learning, 45, 5–32.

BREIMAN, L., FRIEDMAN, J.H., OLSHEN, R.A., STONE, C.J., (1984). Classification and Regression Trees, Wadsworth, California.

CHO, S.B., KIM, J.H., (1995). Multiple network fusion using fuzzy logic. IEEE Transactions on Neural Networks, 6, 497–501.

DOUMPOS, M., ZOPOUNIDIS, C., (2007). Model combination for credit risk assessment: A stacked generalization approach. Annals of Operations Research, 151, 289–306.

DUIN, R., TAX, D., (2000). Experiments with classifier combining rules. Lecture Notes in Computer Science, 1857, 16–29.

DŽEROSKI, S., ŽENKO, B., (2004). Is combining classifiers with stacking better than selecting the best one? Machine Learning, 54, 255–273.

FRANK, A., ASUNCION, A., (2010). UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. http://archive.ics.uci.edu/ml

GARCIA, S., HERRERA, F., (2008). An extension on "statistical comparisons of classifiers over multiple data sets" for all pairwise comparisons. Journal of Machine Learning Research, 9, 2677–2694.

GLEIM, G., (1984). The profiling of professional football players. Clinics in Sports Medicine, 3(1), 185–197.

GÓRECKI, T., (2005). Effect of choice of dissimilarity measure on classification efficiency with nearest neighbor method. Discussiones Mathematicae Probability and Statistics, 25(2), 217–239.

GÓRECKI, T., ŁUCZAK, M., (2013). Linear discriminant analysis with a generalization of the Moore-Penrose pseudoinverse. International Journal of Applied Mathematics and Computer Science, 23(2), 463–471.

HUANG, Y.S., SUEN, C.Y., (1995). A method of combining multiple experts for the recognition of unconstrained handwritten numerals. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 90–93.

IMAN, R., DAVENPORT, J., (1980). Approximations of the critical region of the Friedman statistic. Communications in Statistics – Theory and Methods, 9(6), 571–595.

KUNCHEVA, L.I., (2004). Combining Pattern Classifiers: Methods and Algorithms. Wiley.

KUNCHEVA, L., WHITAKER, C., (2003). Measures of diversity in classifier ensembles. Machine Learning, 51, 181–207.

KYRCHEI, I., (2015). Cramer's rule for generalized inverse solutions. In Advances in Linear Algebra Research, I. Kyrchei (ed.), Nova Science Publishers, 79–132.

LEBLANC, M., TIBSHIRANI, R., (1996). Combining estimates in regression and classification. Journal of the American Statistical Association, 91, 1641–1650.

LEDEZMA, A., ALER, R., SANCHIS, A., BORRAJO, D., (2010). GA-stacking: evolutionary stacked generalization. Intelligent Data Analysis, 14, 89–119.

MARQUÉS, A., GARCÍA, V., SÁNCHEZ, J., (2012). Exploring the behavior of base classifiers in credit scoring ensembles. Expert Systems with Applications, 39(11), 10244–10250.

MICHALEWICZ, Z., FOGEL, D.B., (2004). How To Solve It: Modern Heuristics. Springer.

MOJIRSHEIBANI, M., (2002). A comparison study of some combined classifiers. Communications in Statistics – Simulation and Computation, 31(2), 245–260.

MORRISON, D.F., (1976). Multivariate Statistical Methods. McGraw-Hill.

NI, W., BROWN, S., MAN, R., (2009). Stacked partial least squares regression analysis for spectral calibration and prediction. Journal of Chemometrics, 23, 505–517.

OZAY, M., VURAL, F.T.Y., (2008). On the performance of stacked generalization classifiers. Lecture Notes in Computer Science, 5112, 445–454.

RAO, C.R., MITRA, S.K., (1971). Generalized Inverse of Matrices and its Applications. Wiley.

ROKACH, L., (2010). Ensemble-based classifiers. Artificial Intelligence Review, 33, 1–39.

ROONEY, N., PATTERSON, D., ANAND, S., TSYMBAL, A., (2004). Dynamic integration of regression models. Lecture Notes in Computer Science, 3077, 164–173.

ROONEY, N., PATTERSON, D., NUGENT, C., (2004). Reduced ensemble size stacking. 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI), 266–271.

SCHAPIRE, R.E., (1990). The strength of weak learnability. Machine Learning, 5, 197–227.

SEBER, G.A.F., (1984). Multivariate Observations. New York: Wiley.

SEHGAL, M.S.B., GONDAL, I., DOOLEY, L., (2005). Stacked regression ensemble for cancer class prediction. 3rd IEEE International Conference on Industrial Informatics (INDIN), 831–835.

SESMERO, M., LEDEZMA, A., SANCHIS, A., (2015). Generating ensembles of heterogeneous classifiers using stacked generalization. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 5(1), 21–34.

SHUNMUGAPRIYA, P., KANMANI, S., (2013). Optimization of stacking ensemble configurations through artificial bee colony algorithm. Swarm and Evolutionary Computation, 12, 24–32.

SIGLETOS, G., PALIOURAS, G., SPYROPOULOS, C.D., HATZOPOULOS, M., (2005). Combining information extraction systems using voting and stacked generalization. Journal of Machine Learning Research, 6, 1751–1782.

VAN DER HEIJDEN, F., DUIN, R.P.W., DE RIDDER, D., TAX, D.M.J., (2004). Classification, Parameter Estimation and State Estimation: An Engineering Approach Using Matlab. New York: Wiley.

WEBB, A., (2002). Statistical Pattern Recognition. New York: Wiley.

WERNECKE, K., (1992). A coupling procedure for discrimination of mixed data. Biometrics, 48, 497–506.

WOLPERT, D., (1992). Stacked generalization. Neural Networks, 5, 241–259.

XU, L., JIANG, J.H., ZHOU, Y.P., WU, H.L., SHEN, G.L., YU, R.Q., (2007). MCCV stacked regression for model combination and fast spectral interval selection in multivariate calibration. Chemometrics and Intelligent Laboratory Systems, 87(2), 226–230.

ZHANG, H., (2004). The optimality of Naive Bayes. In Proceedings of the 17th International FLAIRS Conference, Miami Beach, Florida, USA, 562–567.
