Halyna Holubova https://orcid.org/0000-0003-4847-5235

© Halyna Holubova. Article available under the CC BY-SA 4.0 licence

ABSTRACT

The dynamic development of the digitized society generates large-scale flows of information. Data therefore need to be compressed in a way that keeps their content complete and informative. The principal component method is well suited to this purpose: its main task is to reduce the dimension of a multidimensional space with minimal loss of information. The article describes the basic conceptual approaches to the definition of principal components and presents the methodological principles of selecting them. Of the many ways to select principal components, the simplest are to retain the first k components with the largest eigenvalues or to determine the percentage of the total variance explained by each component. Many statistical packages use the Kaiser method, which retains components with eigenvalues greater than one, for this purpose. However, this method fails to take into account that even random data (noise) can yield components with eigenvalues greater than one, so it may select redundant components. We conclude that the classical mechanisms should be used with caution when selecting the principal components. Parallel analysis uses multiple data simulations to overcome the problem of random errors. The method assumes that the components of real data must have greater eigenvalues than the parallel components derived from simulated data with the same sample size and design, variance and number of variables. Using several data sets as examples, the eigenvalues were compared by means of two methods: the Kaiser criterion and Horn’s parallel analysis. The study shows that parallel analysis produces more valid results with actual data sets. We believe that the main advantage of parallel analysis is its ability to model the process of selecting the required number of principal components by determining the point at which they cannot be distinguished from those generated by simulated noise.
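
The contrast between the two selection rules described in the abstract can be illustrated with a minimal sketch. It is not the article's own implementation: the function names, the synthetic data, the number of simulations and the use of the 95th percentile of simulated eigenvalues (the refinement proposed by Glorfeld, 1995, instead of the mean used in Horn's original formulation) are all assumptions made for illustration.

```python
import numpy as np


def correlation_eigenvalues(data):
    """Eigenvalues of the correlation matrix, sorted in descending order."""
    corr = np.corrcoef(data, rowvar=False)
    return np.sort(np.linalg.eigvalsh(corr))[::-1]


def kaiser_count(eigenvalues):
    """Kaiser criterion: number of components with eigenvalue greater than one."""
    return int(np.sum(eigenvalues > 1.0))


def parallel_analysis_count(data, n_sim=200, percentile=95, seed=0):
    """Parallel analysis sketch (hypothetical helper, not the article's code).

    Simulates uncorrelated normal data with the same sample size and number
    of variables, then retains the leading components whose observed
    eigenvalues exceed the chosen percentile of the simulated eigenvalues.
    """
    rng = np.random.default_rng(seed)
    n, p = data.shape
    observed = correlation_eigenvalues(data)
    simulated = np.empty((n_sim, p))
    for i in range(n_sim):
        simulated[i] = correlation_eigenvalues(rng.standard_normal((n, p)))
    threshold = np.percentile(simulated, percentile, axis=0)
    exceeds = observed > threshold
    # Stop at the first component that cannot be distinguished from noise.
    count = p if exceeds.all() else int(np.argmin(exceeds))
    return count, observed, threshold


# Synthetic example (assumed data): two latent factors drive eight variables.
rng = np.random.default_rng(42)
factors = rng.standard_normal((500, 2))
loadings = np.zeros((2, 8))
loadings[0, :4] = 0.8
loadings[1, 4:] = 0.8
structured = factors @ loadings + 0.6 * rng.standard_normal((500, 8))

# Pure noise: ten unrelated variables, no underlying structure.
noise = rng.standard_normal((100, 10))

for name, data in [("structured", structured), ("noise", noise)]:
    pa_count, observed, _ = parallel_analysis_count(data)
    print(name, "Kaiser:", kaiser_count(observed), "parallel analysis:", pa_count)
```

On pure noise the largest eigenvalue of a sample correlation matrix is at least one, and in practice strictly greater, so the Kaiser criterion retains at least one component even though no structure exists. Parallel analysis compares each eigenvalue with what noise of the same shape would produce.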

KEYWORDS

principal components, principal component analysis, factor analysis, Kaiser criterion, parallel analysis, simulation

REFERENCES

Cattell, R. B., (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, pp. 245–276.

Chinkulyak, N. M., Pogrebnyak, L. O., (2015). Statystychnyi analiz yak instrument derzhavnoho upravlinnia [Statistical analysis as a tool of public administration]. Derzhavne upravlinnia – Governance, 1, pp. 82–88, [in Ukrainian].

Çokluk, Ö., Koçak, D., (2016). Using Horn’s parallel analysis method in exploratory factor analysis for determining the number of factors. Educational Sciences: Theory & Practice, 16, pp. 537–551.

Decathlon Dataset, (2004). Retrieved from: https://malouche.github.io/data_in_class/decathlon_data.html.

Facebook Dataset, (2016). Machine learning repository. Retrieved from: https://archive.ics.uci.edu/ml/datasets/Facebook+metrics.

Gene Dataset. Machine learning repository. Retrieved from: https://archive.ics.uci.edu/ml/datasets.php?format=&task=&att=&area=&numAtt=&numIns=&type=&sort=nameUp&view=table.

Glorfeld, L. W., (1995). An improvement on Horn’s parallel analysis methodology for selecting the correct number of factors to retain. Educational and Psychological Measurement, 55, pp. 377–393.

Gorsuch, R. L., (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.

Harshman, R. A., Reddon, J. R., (1983). Determining the number of factors by comparing real with random data: A serious flaw and some possible corrections. Proceedings of the Classification Society of North America at Philadelphia, pp. 14–15.

Hayton, J., Allen, D., (2004). Factor Retention Decisions in Exploratory Factor Analysis: A Tutorial on Parallel Analysis. Retrieved from: https://www.researchgate.net/publication/235726204_Factor_Retention_Decisions_in_Exploratory_Factor_Analysis_A_Tutorial_on_Parallel_Analysis/link/5582a85008ae6cf036c1a886/download.

Holubova, H. V., (2013). Statystychnyi analiz osnovnykh faktoriv vplyvu na tranzyt vantazhiv v Ukraini [Statistical analysis of basis factor influence on in-transit freight in Ukraine by regression model]. Visnyk Kyivskoho natsionalnoho universytetu im. T. Shevchenka. Ekonomika – Bulletin of Taras Shevchenko National University of Kyiv. Economics, 134, pp. 12–16, [in Ukrainian].

Holubova, H. V., (2020). Pryntsypy vyboru holovnykh komponent: osoblyvosti prykladnoho modeliuvannia [Principles of choice of main components: features of applied modeling]. Novi dzherela ta metody poshyrennia danykh u statystytsi: materialy XVIII Mizhnarodnoi naukovo-praktychnoi konferentsii z nahody Dnia pratsivnykiv statystyky – New sources and methods of data dissemination in statistics: materials of the XVIII International scientific-practical conference on the occasion of the Day of Statistics. Kyiv, Informatsiino-analitychne ahenstvo [Information and Analytical Agency], pp. 155–160, [in Ukrainian].

Horn, J. L., (1965). A rationale and test for the number of factors in factor analysis. Psychometrika, 30, pp. 179–185.

Ierina, A. M., (2014). Komponentnyi analiz [Component analysis]. Statystychne modeliuvannia ta prohnozuvannia [Statistical modeling and forecasting], 348, pp. 287–313.

Kaiser, H. F., (1970). A second-generation Little Jiffy. Psychometrika, 35, pp. 401–415.

Korepanov, O. S., (2018). Metodolohiia indeksnoho analizu rivnia rozvytku informatsiinoho suspilstva [Methodology of index analysis of the level of development of the information society]. Statystyka Ukrainy – Statistics of Ukraine, 1, pp. 6–15, [in Ukrainian].

Lepeyko, T., Shcherbak, A., (2018). Determining factors to ensure the effective formation of the information process in the industrial enterprise management. Development Management, 16(4), pp. 88–97.

Osaulenko, O., Holubova, H. and Horobets, O., (2021). Implementing Big Data in the Public Administration. Stratehiia rozvytku Ukrainy: finansovo-ekonomichnyi ta humanitarnyi aspekty: materialy VIII Mizhnarodnoi naukovo-praktychnoi konferentsii – Strategy of development of Ukraine: financial and economic and humanitarian aspects: materials of the VIII International scientific-practical conference. Kyiv, Informatsiino-analitychne ahenstvo, pp. 219–222, [in English].

Rosen, V. P., Reutsky, M. O. and Demchik, Ya. M., (2018). Zastosuvannia metoda holovnykh component dlia identyfikatsii holovnykh faktoriv vplyvu na velychynu elektrospozhyvannia [Application of the principal components method to identify the main factors influencing the amount of electricity consumption]. Enerhetyka: ekonomika, tekhnolohii, ekolohiia: naukovyi zhurnal – Energy: economics, technology, ecology: a scientific journal, No. 3 (53), pp. 81–87, [in Ukrainian].

Silverstein, A. B., (1987). Note on the parallel analysis criterion for determining the number of common factors or principal components. Psychological Reports, 61, pp. 351–354.

Zwick, W. R., Velicer, W. F., (1986). Comparison of five rules for determining the number of components to retain. Psychological Bulletin, 99(3), pp. 432–442.

© 2019–2024 Copyright by Statistics Poland, some rights reserved. Creative Commons Attribution-ShareAlike 4.0 International Public License (CC BY-SA 4.0)