Daniel Kosiorowski, Dominik Mielczarek, Jerzy Rydlewski, Małgorzata Snarska

ABSTRACT

In this paper we present a novel perspective on sparse high-dimensional data sets, i.e. data in which many coordinates of the observations are zero. By jointly using selected sparse methods recently proposed in multivariate statistics and a kernel density framework for discrete data, we outline a general approach to extracting useful information from large economic databases. As a framework for our considerations we take functional data analysis, which originates in the work of Ramsay and Silverman. In particular, we use functional principal component analysis within the 2D density estimation procedure proposed by Simonoff.

KEYWORDS

sparse data, sparse methods, robust methods, categorical data, big data.

REFERENCES

CROUX, C., FILZMOSER, P., FRITZ, H., (2012). Robust Sparse Principal Component Analysis, Technometrics.

DONG, J., SIMONOFF, J. S., (1994). The Construction and Properties of Boundary Kernels for Smoothing Sparse Multinomials. Journal of Computational and Graphical Statistics. Vol. 3, No. 1, 57–66.

HASTIE, T., TIBSHIRANI, R., FRIEDMAN, J., (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition, Springer.

JACOB, P., OLIVEIRA, P. E., (2011). Local smoothing with given marginals, Journal of Statistical Computation and Simulation, DOI: 10.1080/00949655.2011.561436

JOLLIFFE, I. T., TRENDAFILOV, N. T., UDDIN, M., (2003). A modified principal component technique based on the lasso, Journal of Computational and Graphical Statistics 12: 531–547.

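KRZYŚKO, M., GÓRECKI, T., DERĘGOWSKI, K., (2012). Jądrowa i Funkcjonalna Analiza Składowych Głównych [Kernel and Functional Principal Component Analysis], meeting of the Poznań branch of the Polish Statistical Association (PTS) (talk available on the website of the PTS Poznań branch, http://www.stat.gov.pl/pts/ )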

MIZERA, I., (2002). On Depth and Depth Points: a Calculus. The Annals of Statistics (30), 1681–1736.

RAMSAY, J. O., HOOKER, G., GRAVES, S., (2010). Functional Data Analysis with R and Matlab, Springer, New York.

SHANE, K. V., SIMONOFF, J. S., (2001). A robust approach to categorical data analysis, Journal of Computational and Graphical Statistics, Vol. 10, No. 1, 135–157.

SILVERSTEIN, J., BAI, Z., (1995). On the empirical distribution of eigenvalues of a class of large dimensional random matrices. Journal of Multivariate Analysis 54 (2), 175–192.

SIMONOFF, J. S., (1985). An improved goodness-of-fit statistic for sparse multinomials, Journal of the American Statistical Association, Vol. 80, No. 391, 671–677.

SIMONOFF, J. S., (1988). Detecting outlying cells in two-way contingency tables via backward-stepping, Technometrics, Vol. 30, No. 3, 339–345.

SIMONOFF, J. S., (1995). A simple, automatic and adaptive bivariate density estimator based on conditional densities, Statistics and Computing, Vol. 5, 245–252.

SIMONOFF, J. S., (1983). A penalty function approach to smoothing large sparse contingency tables. The Annals of Statistics. Vol. 11, No. 1, 208–218.

SIMONOFF, J. S., (1998). Three sides of smoothing: categorical data smoothing, nonparametric regression, and density estimation. International Statistical Review, Vol. 66, No. 2, 137–156.

SNARSKA, M., (2012). A random matrix approach to dynamic factors in macroeconomic data, Acta Phys. Pol A, 121 (2B), 110–120.

VOICULESCU, D. V., (1991). Limit laws for random matrices and free products, Invent. Math. 104, 201.

© 2019–2024 Copyright by Statistics Poland, some rights reserved. Creative Commons Attribution-ShareAlike 4.0 International Public License (CC BY-SA 4.0)