Tomasz Górecki https://orcid.org/0000-0002-9969-5257 , Mirosław Krzyśko https://orcid.org/0000-0001-0075-4432 , Waldemar Wołyński https://orcid.org/0000-0002-0777-9163
ARTICLE

ABSTRACT

A new variable selection method is considered in the setting of classification with multivariate functional data (Ramsay and Silverman, 2005). Variable selection is a dimensionality reduction method that replaces the whole vector process with a low-dimensional vector that still gives a comparable classification error. Various classifiers appropriate for functional data are used. The proposed variable selection method is based on the functional distance covariance (dCov) given by Székely and Rizzo (2009, 2012) and the Hilbert-Schmidt Independence Criterion (HSIC) given by Gretton et al. (2005). This method is a modification of the procedure given by Kong et al. (2015). The proposed methodology is illustrated with a real data example.
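To make the dependence measure concrete, the following is a minimal sketch of the sample distance covariance and distance correlation of Székely and Rizzo, used here to rank candidate variables by their dependence on the class labels. It is an illustration only, not the selection procedure proposed in the article: the function names (`dcov_sq`, `dcor`, `rank_variables`) are hypothetical, and it assumes each functional variable has already been reduced to a matrix of basis coefficients with one row per observation.

```python
import numpy as np


def _double_centered_distances(x):
    """Pairwise Euclidean distance matrix of the rows of x, double centered."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a vector as n observations of one feature
    d = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
    return (d
            - d.mean(axis=0, keepdims=True)
            - d.mean(axis=1, keepdims=True)
            + d.mean())


def dcov_sq(x, y):
    """Squared sample distance covariance (V-statistic form)."""
    a = _double_centered_distances(x)
    b = _double_centered_distances(y)
    return float((a * b).mean())


def dcor(x, y):
    """Sample distance correlation; defined as 0 when either sample is constant."""
    denom = np.sqrt(dcov_sq(x, x) * dcov_sq(y, y))
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov_sq(x, y), 0.0) / denom))


def rank_variables(blocks, labels):
    """Rank variables by distance correlation with the labels.

    blocks: list of (n, k_j) arrays, one per variable (assumed here to be
            basis coefficients of each functional variable).
    Returns the variable indices in decreasing order of dependence,
    together with the scores themselves.
    """
    scores = np.array([dcor(block, labels) for block in blocks])
    order = [int(i) for i in np.argsort(-scores)]
    return order, scores
```

A simple ranking like this scores each variable separately; a selection procedure would additionally have to decide how many variables to keep, for example by monitoring the classification error.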

KEYWORDS

multivariate functional data, variable selection, dCov, HSIC, classification

REFERENCES

ANDO, T., (2009). Penalized optimal scoring for the classification of multi-dimensional functional data, Statistical Methodology, 6, pp. 565–576.

BERRENDERO, J. R., CUEVAS, A., TORRECILLA, J. L., (2016). Variable selection in functional data classification: a maxima-hunting proposal, Statistica Sinica, 26 (2), pp. 619–638.

DELAIGLE, A., HALL, P., (2012). Methodology and theory for partial least squares applied to functional data, Annals of Statistics, 40, pp. 322–352.

FERRATY, F., VIEU, P., (2003). Curve discrimination. A nonparametric functional approach. Computational Statistics & Data Analysis, 44, pp. 161–173.

FERRATY, F., VIEU, P., (2009). Additive prediction and boosting for functional data. Computational Statistics & Data Analysis, 53 (4), pp. 1400–1413.

GÓRECKI, T., KRZYŚKO, M., WASZAK, Ł., WOŁYŃSKI, W., (2014). Methods of reducing dimension for functional data, Statistics in Transition new series, 15, pp. 231–242.

GÓRECKI, T., KRZYŚKO, M., WOŁYŃSKI, W., (2016). Multivariate functional regression analysis with application to classification problems, In: Analysis of Large and Complex Data, Studies in Classification, Data Analysis, and Knowledge Organization, Eds.: Wilhelm Adalbert F. X., Kestler Hans A., Springer International Publishing, pp. 173–183.

GRETTON, A., BOUSQUET, O., SMOLA, A., SCHÖLKOPF, B., (2005). Measuring statistical dependence with Hilbert-Schmidt norms. In: Algorithmic Learning Theory (S. Jain, H. U. Simon and E. Tomita, eds.), Lecture Notes in Computer Science, 3734, pp. 63–77, Springer, Berlin.

HASTIE, T. J., TIBSHIRANI, R. J., BUJA, A., (1995). Penalized discriminant analysis, Annals of Statistics, 23, pp. 73–102.

HORVÁTH, L., KOKOSZKA, P., (2012). Inference for Functional Data with Applications, Springer, New York.

JACQUES, J., PREDA, C., (2014). Model-based clustering for multivariate functional data, Computational Statistics & Data Analysis, 71, pp. 92–106.

KONG, J., WANG, S., WAHBA, G., (2015). Using distance covariance for improved variable selection with application to learning genetic risk models, Statistics in Medicine, 34, pp. 1708–1720.

KUHN, M., Contributions from Jed Wing, Steve Weston, Andre Williams, Chris Keefer, Allan Engelhardt, Tony Cooper, Zachary Mayer, Brenton Kenkel, the R Core Team, Michael Benesty, Reynald Lescarbeau, Andrew Ziem, Luca Scrucca, Yuan Tang, Can Candan and Tyler Hunt, (2018). caret: Classification and Regression Training, R package version 6.0-80, https://CRAN.R-project.org/package=caret.

R Core Team (2018). R: A language and environment for statistical computing, R Foundation for Statistical Computing, Vienna, Austria, https://www.R-project.org/.

RAMSAY, J. O., SILVERMAN, B.W., (2005). Functional Data Analysis, Springer, New York.

RAMSAY, J. O., WICKHAM, H., GRAVES, S., HOOKER, G., (2018). fda: Functional Data Analysis, R package version 2.4.8, https://CRAN.R-project.org/package=fda.

RIZZO, M. L., SZÉKELY, G. J., (2018). energy: E-Statistics: Multivariate Inference via the Energy of Data, R package version 1.7-5, https://CRAN.R-project.org/package=energy.

ROSSI, F., DELANNAY, N., CONAN-GUEZ, B., VERLEYSEN, M., (2005). Representation of functional data in neural networks, Neurocomputing, 64, pp. 183–210.

ROSSI, F., VILLA, N., (2006). Support vector machines for functional data classification, Neurocomputing, 69, pp. 730–742.

ROSSI, N., WANG, X., RAMSAY, J.O., (2002). Nonparametric item response function estimates with EM algorithm, Journal of Educational and Behavioral Statistics, 27, pp. 291–317.

SCHÖLKOPF, B., SMOLA, A. J., MÜLLER, K. R., (1998). Nonlinear component analysis as a kernel eigenvalue problem, Neural Computation, 10, pp. 1299–1319.

SZÉKELY, G. J., RIZZO, M. L., BAKIROV, N. K., (2007). Measuring and testing dependence by correlation of distances, The Annals of Statistics, 35 (6), pp. 2769–2794.

SZÉKELY, G. J., RIZZO, M. L., (2009). Brownian distance covariance, Annals of Applied Statistics, 3 (4), pp. 1236–1265.

SZÉKELY, G. J., RIZZO, M. L., (2012). On the uniqueness of distance covariance, Statistics & Probability Letters, 82 (12), pp. 2278–2282.

SZÉKELY, G. J., RIZZO, M. L., (2013). The distance correlation t-test of independence in high dimension, Journal of Multivariate Analysis, 117, pp. 193–213.

Copyright © 2019 Statistics Poland