Advances on Permutation Multivariate Analysis of Variance for big data

Stefano Bonnini; Getnet Melak Assegie

doi:10.2478/stattrans-2022-0022

Stefano Bonnini, Department of Economics and Management, University of Ferrara, Italy (ORCID: https://orcid.org/0000-0002-7972-3046)
Getnet Melak Assegie, University of Parma, Italy (ORCID: https://orcid.org/0000-0001-7288-9636)

Statistics in Transition new series, vol. 23, no. 2 (2022), pp. 163–183. Published online: 14 June 2022. DOI: 10.2478/stattrans-2022-0022

ABSTRACT

In many applications of multivariate analysis of variance (MANOVA), the classic parametric solutions for testing hypotheses of equality of population means (multisample multivariate location problems) may not be suitable for various reasons. For such problems, the literature lacks a comparative study of the power behaviour of the most important combined permutation tests as the number of variables diverges. In particular, it is useful to know three things. The first is under which conditions each test is preferable in terms of power. The second is how the power of each test increases as the number of variables under the alternative hypothesis diverges. The third is how the power of each test behaves as a function of the proportion of true alternative hypotheses. The purpose of this paper is to fill this gap in the literature on combined permutation tests, in particular for big data with a large number of variables. A Monte Carlo simulation study was carried out to investigate the power behaviour of the tests, and an application to a real case study illustrates the utility of the method.
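To make the idea of a combined permutation test and its Monte Carlo power study more concrete, here is a minimal sketch. It assumes the nonparametric combination (NPC) scheme commonly used for such tests, with the following choices:

- a between-group sum of squares is used as the partial statistic for each variable;
- one random relabelling of the groups is shared by all variables, which preserves their dependence;
- the partial p-values are merged with Fisher, Tippett or Liptak combining functions.

The function names (`partial_stats`, `partial_pvalues`, `npc_test`, `mc_power`), the location-shift alternative and all default settings are hypothetical illustrations. They are not the authors' simulation design. The sketch uses only NumPy and Python's `statistics.NormalDist`.

```python
import numpy as np
from statistics import NormalDist


def partial_stats(X, groups, labels):
    """Between-group sum of squares, one value per variable (column of X).

    The total sum of squares does not change when group labels are permuted,
    so this statistic is permutationally equivalent to the one-way ANOVA F.
    """
    grand = X.mean(axis=0)
    t = np.zeros(X.shape[1])
    for g in labels:
        Xg = X[groups == g]
        t += len(Xg) * (Xg.mean(axis=0) - grand) ** 2
    return t


def partial_pvalues(T):
    """Permutation p-value of every row of T within its column.

    Row 0 holds the observed statistics and rows 1..B the permuted ones.
    Ties count as 'at least as large', and subtracting 0.5 keeps every
    value strictly inside (0, 1), as the combining functions require.
    """
    n = T.shape[0]
    lam = np.empty(T.shape)
    for j in range(T.shape[1]):
        col = np.sort(T[:, j])
        n_less = np.searchsorted(col, T[:, j], side="left")
        lam[:, j] = (n - n_less - 0.5) / n
    return lam


_inv_norm = np.vectorize(NormalDist().inv_cdf)

# Combining functions: larger values are evidence against the global null.
COMBINERS = {
    "fisher": lambda lam: -2.0 * np.log(lam).sum(axis=-1),
    "tippett": lambda lam: (1.0 - lam).max(axis=-1),
    "liptak": lambda lam: _inv_norm(1.0 - lam).sum(axis=-1),
}


def npc_test(X, groups, n_perm=999, combiner="fisher", rng=None):
    """Global permutation p-value for equal locations across all groups."""
    rng = np.random.default_rng(rng)
    labels = np.unique(groups)
    T = np.empty((n_perm + 1, X.shape[1]))
    T[0] = partial_stats(X, groups, labels)
    for b in range(1, n_perm + 1):
        # One relabelling shared by all variables keeps their dependence intact.
        T[b] = partial_stats(X, rng.permutation(groups), labels)
    psi = COMBINERS[combiner](partial_pvalues(T))
    # Share of combined statistics at least as large as the observed one.
    return float(np.mean(psi >= psi[0]))


def mc_power(n_per_group=10, k=3, p=20, prop_alt=0.5, delta=1.0, n_sim=200,
             n_perm=199, alpha=0.05, combiner="fisher", seed=0):
    """Monte Carlo rejection rate under a hypothetical location-shift model."""
    rng = np.random.default_rng(seed)
    groups = np.repeat(np.arange(k), n_per_group)
    n_alt = int(round(prop_alt * p))  # number of variables for which H1 is true
    rejections = 0
    for _ in range(n_sim):
        X = rng.standard_normal((k * n_per_group, p))
        # Group g is shifted by g * delta, only in the first n_alt variables.
        X[:, :n_alt] += delta * groups[:, None]
        pval = npc_test(X, groups, n_perm=n_perm, combiner=combiner, rng=rng)
        rejections += int(pval <= alpha)
    return rejections / n_sim
```

Calls such as `mc_power(p=100, prop_alt=0.1)` and `mc_power(p=100, prop_alt=0.5)` contrast power as the proportion of true alternatives changes. Comparing across `combiner` values contrasts the combining functions. Partial p-values take only B + 1 distinct values. As a result, Tippett's max-type statistic ties often when `n_perm` is small relative to the number of variables, so a larger `n_perm` is advisable for it.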

KEYWORDS

big data, MANOVA, permutation test, multivariate analysis