Terrance D. Savitsky (https://orcid.org/0000-0003-1843-3106), Matthew R. Williams (https://orcid.org/0000-0001-8894-1240), Julie Gershunskaya (https://orcid.org/0000-0002-0096-186X), Vladislav Beresovsky (https://orcid.org/0009-0002-8375-5195)

© T. D. Savitsky, J. Gershunskaya, M. R. Williams, V. Beresovsky. Article available under the CC BY-SA 4.0 licence




Nonprobability (convenience) samples are increasingly sought as a way to increase the effective sample size, and thereby reduce the estimation variance, for one or more population variables of interest that are estimated from a randomized survey (reference) sample. Estimating a population quantity from a convenience sample typically results in bias, because the distribution of the variables of interest in the convenience sample differs from the population distribution. A recent set of approaches estimates inclusion probabilities for convenience sample units by specifying reference-sample-weighted pseudo likelihoods. As its main result, this paper introduces a novel approach that derives the propensity score for the observed sample as a function of the inclusion probabilities for the reference and convenience samples. Our approach allows a likelihood to be specified directly for the observed sample, rather than an approximate or pseudo likelihood. We construct a Bayesian hierarchical formulation that simultaneously estimates the sample propensity scores and the convenience sample inclusion probabilities. We use a Monte Carlo simulation study to compare our likelihood-based results with the pseudo-likelihood-based approaches considered in the literature.
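
To make the idea of a propensity score written as a function of the two sets of inclusion probabilities concrete, the following Python sketch may help. It is an illustration under stated assumptions, not a formula quoted from the paper: it assumes the observed sample is formed by stacking the reference and convenience samples (a unit selected into both appears twice) and that the two samples are selected independently. Under those assumptions, the probability that a stacked record comes from the convenience sample is pi_c / (pi_c + pi_r). The function names and the numeric values are hypothetical.

```python
import random


def stacked_propensity(pi_c: float, pi_r: float) -> float:
    """Probability that a record of the stacked sample is a convenience record.

    Assumes stacking of the two samples and independent selection.
    """
    return pi_c / (pi_c + pi_r)


def convenience_inclusion(pi_z: float, pi_r: float) -> float:
    """Invert the relation above: recover pi_c from pi_z and a known pi_r."""
    return pi_z * pi_r / (1.0 - pi_z)


def simulate_share(pi_c: float, pi_r: float,
                   n_units: int = 200_000, seed: int = 1) -> float:
    """Empirical share of convenience records in a simulated stacked sample."""
    rng = random.Random(seed)
    # Each population unit is selected into each sample independently.
    n_c = sum(rng.random() < pi_c for _ in range(n_units))
    n_r = sum(rng.random() < pi_r for _ in range(n_units))
    return n_c / (n_c + n_r)


# Illustrative (made-up) inclusion probabilities for a homogeneous population.
pi_c, pi_r = 0.30, 0.10
pi_z = stacked_propensity(pi_c, pi_r)
print("propensity from formula:", pi_z)
print("recovered pi_c:", convenience_inclusion(pi_z, pi_r))
print("simulated share:", simulate_share(pi_c, pi_r))
```

In this sketch the reference inclusion probability is treated as known, as it would be for a designed survey, so an estimate of the propensity score translates directly into an estimate of the convenience sample inclusion probability. The hierarchical Bayesian model described in the abstract estimates both quantities jointly, which this toy example does not attempt.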


Survey sampling, Nonprobability sampling, Data combining, Inclusion probabilities, Exact sample likelihood, Bayesian hierarchical modeling.



© 2019–2024 Copyright by Statistics Poland, some rights reserved. Creative Commons Attribution-ShareAlike 4.0 International Public License (CC BY-SA 4.0)