The main objective of the article is to analyze the topics in economics and management addressed in Polish publications from 2000 to 2024. The research identified the main topics and assessed their importance in each year covered by the analysis. The BERTopic model was chosen as the main research method. The paper presents both the theoretical basis of this method and the results of applying it to the Polish publication output indexed in the Scopus database: a description of the identified topics, the relationships between them, and the changes in the importance of each topic between 2000 and 2024. All calculations were performed using programs written in Python.
publication achievements, topic modelling, BERTopic method.