Normality tests for transformed large measured data: a comprehensive analysis

Abu Feyo Bantu; Andrzej Kozyra; Józef Wiora

doi:https://doi.org/10.59139/stattrans-2025-034


Abu Feyo Bantu, Department of Measurements and Control Systems, Silesian University of Technology, Gliwice, Poland. ORCID: https://orcid.org/0000-0001-6463-9864

Andrzej Kozyra, Department of Measurements and Control Systems, Silesian University of Technology, Gliwice, Poland. ORCID: https://orcid.org/0000-0003-2645-3537

Józef Wiora, Department of Measurements and Control Systems, Silesian University of Technology, Gliwice, Poland. ORCID: https://orcid.org/0000-0002-8450-8623

Statistics in Transition new series, vol. 26, no. 3, 2025, pages 195-208. Published online: 5 September 2025.

Citation: Bantu A.F., Kozyra A., Wiora J., 2025. Normality tests for transformed large measured data: a comprehensive analysis. Statistics in Transition new series, 26(3), pp. 195-208. https://doi.org/10.59139/stattrans-2025-034


ABSTRACT

In statistical analysis, evaluating the normality of large datasets is crucial for validating parametric tests, particularly in areas such as Global Navigation Satellite System (GNSS) measurements, where data often exhibit non-normal characteristics resulting from their variability and errors. This research aims to transform the measured GNSS data and to assess the effectiveness of transformation methods in achieving normality. Techniques like logarithmic, quantile and rank-based Inverse Normal Transformation (INT) were evaluated using visual methods (histograms, Q-Q plots), descriptive statistics (skewness, kurtosis) and statistical tests, including Kolmogorov-Smirnov (KS), Anderson-Darling (AD), Lilliefors (LF), D’Agostino K-squared (DA), Shapiro-Wilk (SW), Jarque-Bera (JB), Cramér-von Mises (CM), and Pearson Chi-square (Chi2) tests. The sensitivity of these tests to deviations from normality was assessed through the Receiver Operating Characteristic (ROC) analysis and the Area Under the Curve (AUC) values at a significance level of 0.1, using Monte Carlo (MC) simulations across the varying sample sizes. The results showed that untransformed latitude data consistently failed normality tests, while transformed data displayed normal characteristics. The rank-based INT showed superior effectiveness, influenced by the original distribution and characteristics of the dataset. The findings underscore the importance of tailored transformations in large-scale data applications, enhancing the accuracy and applicability of parametric statistical methods in geospatial and other industrial domains.
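
As a rough illustration of the workflow summarised in the abstract, the following minimal sketch applies a rank-based INT to a skewed sample and compares the Jarque-Bera result before and after, at the 0.1 significance level. It is not the authors' implementation. The function names `rank_based_int` and `jarque_bera`, the Blom offset c = 3/8, and the synthetic lognormal sample standing in for GNSS latitude data are all assumptions made for this example, and JB is only one of the eight tests listed.

```python
import math
import random
from statistics import NormalDist


def rank_based_int(values, c=3.0 / 8.0):
    """Rank-based inverse normal transformation.

    The offset c = 3/8 (Blom) is an assumed choice, not taken from the paper.
    Tied values receive the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value, skewness, kurtosis)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # Survival function of the chi-square distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value, skew, kurt


# Synthetic, right-skewed stand-in for measured data (assumption, not GNSS data).
random.seed(0)
raw = [random.lognormvariate(0.0, 1.0) for _ in range(5000)]
transformed = rank_based_int(raw)

ALPHA = 0.1  # significance level quoted in the abstract
jb_raw, p_raw, skew_raw, kurt_raw = jarque_bera(raw)
jb_int, p_int, skew_int, kurt_int = jarque_bera(transformed)

print("raw:         normality rejected =", p_raw < ALPHA)
print("transformed: normality rejected =", p_int < ALPHA)
```

Because the rank-based INT maps ranks directly onto normal quantiles, the transformed sample is normal by construction when there are few ties, so a passing test here says more about the transformation than about the underlying measurements.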

KEYWORDS

GNSS, ROC, normality test, statistical analysis.
