Optimal sample allocation in multivariate stratified sampling: a comparison of deterministic and stochastic optimization algorithms

Dalius Pumputis

doi:https://doi.org/10.59139/stattrans-2026-001

Dalius Pumputis, Vilnius Gediminas Technical University (VILNIUS TECH), Lithuania

ORCID: https://orcid.org/0000-0003-0954-0663

Statistics in Transition new series, vol. 27, no. 1, 2026, pages 1-20

Published online: 11 March 2026

Citation: Pumputis D., 2026. Optimal sample allocation in multivariate stratified sampling: a comparison of deterministic and stochastic optimization algorithms. Statistics in Transition new series, 27(1), pp. 1-20. https://doi.org/10.59139/stattrans-2026-001


ABSTRACT

This study addresses the problem of optimal sample allocation in multivariate stratified sampling, where survey accuracy and cost-efficiency are the key concerns. Two optimization formulations are examined: the first minimizes the total survey cost subject to constraints on the precision of the estimators of the population totals, while the second minimizes a weighted sum of the relative variances of these estimators under a fixed total survey budget. Classical and modern optimization approaches are reviewed and evaluated, including Integer Programming Algorithms (IPA), Bethel’s Algorithm (BA), Constrained Optimization by Linear Approximations (COBYLA), and three stochastic algorithms, namely the Generalized Simulated Annealing Algorithm (GSAA), Particle Swarm Optimization (PSOA) and the Biased Random-Key Genetic Algorithm (BRKGA). Using synthetic and real-world populations, numerical experiments demonstrate that IPA consistently achieves the global minimum and serves as the benchmark. While BA underperforms, BRKGA emerges as a competitive alternative, closely matching IPA in most scenarios. Results also highlight the impact of variable skewness on allocation efficiency, with real-world datasets proving more complex and therefore imposing higher sampling demands. The findings underscore the importance of adaptive, integer-feasible optimization methods for accurate and cost-effective survey design.
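The two formulations described above can be illustrated with a small sketch. It is not the IPA, BA or any other algorithm evaluated in the article: it simply enumerates every integer allocation for a tiny, invented population, which is only workable for a handful of strata. Several things here are assumptions rather than details taken from the article: simple random sampling without replacement within strata, a linear cost function, a minimum of two sampled units per stratum, and all names and numbers (`N`, `S2`, `T`, `c`, `CV_MAX`, the function names).

```python
from itertools import product
from math import sqrt

# Hypothetical toy population: 3 strata, 2 survey variables (all numbers invented).
N = [40, 30, 20]                                # stratum sizes N_h
S2 = [[4.0, 9.0], [16.0, 1.0], [25.0, 36.0]]    # S2[h][j]: variance of variable j in stratum h
T = [1500.0, 1200.0]                            # population totals t_j
c = [1.0, 2.0, 3.0]                             # assumed cost per sampled unit in stratum h
CV_MAX = [0.02, 0.03]                           # assumed precision targets (max CV per variable)
MIN_N = 2                                       # assumed minimum sample size per stratum


def variance_of_total(n, j):
    """Variance of the stratified estimator of total j (SRS without replacement assumed)."""
    return sum(N[h] ** 2 * (1.0 / n[h] - 1.0 / N[h]) * S2[h][j] for h in range(len(N)))


def cv(n, j):
    """Coefficient of variation of the estimator of total j."""
    return sqrt(variance_of_total(n, j)) / T[j]


def cost(n):
    """Linear survey cost (fixed overhead omitted, since it does not affect the optimum)."""
    return sum(c[h] * n[h] for h in range(len(N)))


def weighted_rel_variance(n, weights):
    """Weighted sum of relative variances V(t_j) / t_j^2."""
    return sum(w * variance_of_total(n, j) / T[j] ** 2 for j, w in enumerate(weights))


def all_allocations():
    """Every integer allocation with MIN_N <= n_h <= N_h."""
    return product(*(range(MIN_N, N_h + 1) for N_h in N))


def min_cost_allocation():
    """Formulation 1: cheapest allocation that meets every precision target."""
    feasible = (n for n in all_allocations()
                if all(cv(n, j) <= CV_MAX[j] for j in range(len(T))))
    return min(feasible, key=cost, default=None)


def min_variance_allocation(budget, weights):
    """Formulation 2: smallest weighted relative variance within the budget."""
    feasible = (n for n in all_allocations() if cost(n) <= budget)
    return min(feasible, key=lambda n: weighted_rel_variance(n, weights), default=None)


if __name__ == "__main__":
    n1 = min_cost_allocation()
    print("min-cost allocation:", n1, "cost:", cost(n1))
    n2 = min_variance_allocation(budget=60.0, weights=[0.5, 0.5])
    print("min-variance allocation:", n2, "cost:", cost(n2))
```

Exhaustive search returns an integer-feasible optimum by construction, which makes it a convenient reference point on toy data, but the number of candidate allocations grows multiplicatively with the number of strata, which is why dedicated integer programming or heuristic methods are needed for realistic problems.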

KEYWORDS

constrained optimization by linear approximations, integer programming, multivariate stratified sampling, optimal sample allocation, stochastic optimization
