Forough Karlberg
ARTICLE

ABSTRACT

Skewed distributions with representative outliers pose a problem in many surveys. Various small area prediction approaches for skewed data based on transformation models have been proposed. However, in certain applications of those predictors, the fact that the survey data also contain a non-negligible number of zero-valued observations is sometimes dealt with rather crudely, for instance by arbitrarily adding a constant to each value (to allow zeroes to be considered as “positive observations, only smaller”, instead of acknowledging their qualitatively different nature). On the other hand, while a lognormal-logistic model has been proposed (to incorporate skewed distributions as well as zeroes), that model does not include any hierarchical aspects, and is therefore not explicitly adapted to small area prediction. In this paper, we consolidate the two approaches by extending one of the already established log-transformation mixed small area prediction models to incorporate a logistic component. This allows for the simultaneous, systematic treatment of domain effects, outliers and zero-valued observations in a single framework. We benchmark the resulting model-based predictors (against relevant alternatives) in applications to simulated data as well as empirical data from the Australian Agricultural and Grazing Industries Survey.
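For orientation, the general shape of a lognormal-logistic mixture with domain effects can be sketched in a few lines. This is only a minimal illustration under assumed notation, not the model specification or the predictors developed in the paper: the function names, the area effects `u_d` and `v_d`, and all parameter values are hypothetical. A unit is positive with a probability given by a logistic link, and positive values are lognormal, so the area mean is the probability of a positive value times the lognormal mean.

```python
import math
import random

def logistic(eta):
    """Inverse logit: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def two_part_mean(p_positive, mu_log, sigma2_log):
    """Mean of Y when Y = 0 with probability 1 - p_positive and
    log(Y) ~ Normal(mu_log, sigma2_log) otherwise."""
    return p_positive * math.exp(mu_log + 0.5 * sigma2_log)

def simulate_area(n, eta, mu_log, sigma2_log, rng):
    """Draw n unit values for one area from the two-part model."""
    p = logistic(eta)
    sd = math.sqrt(sigma2_log)
    return [math.exp(rng.gauss(mu_log, sd)) if rng.random() < p else 0.0
            for _ in range(n)]

rng = random.Random(12345)
# Hypothetical area effects: u_d shifts the log-scale mean,
# v_d shifts the logit of the probability of a positive value.
areas = {"A": (0.3, -0.5), "B": (-0.2, 0.4)}
beta0, gamma0, sigma2_e = 1.0, 0.8, 0.5  # illustrative values only

for name, (u_d, v_d) in areas.items():
    y = simulate_area(20000, gamma0 + v_d, beta0 + u_d, sigma2_e, rng)
    model_mean = two_part_mean(logistic(gamma0 + v_d), beta0 + u_d, sigma2_e)
    # Print the simulated area mean next to the model-implied mean.
    print(name, round(sum(y) / len(y), 3), round(model_mean, 3))
```

With large simulated samples, the two printed numbers for each area should be close, which illustrates why zeroes and skewness can be handled jointly rather than by adding a constant before taking logarithms.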

KEYWORDS

small area estimation, representative outliers, zero-valued observations, lognormal-logistic mixture model.

REFERENCES

BARNETT, V., LEWIS, T., (1994). Outliers in Statistical Data, 3rd ed. John Wiley & Sons.

BATTESE, G. E., HARTER, R. M., FULLER, W. A., (1988). An error component model for prediction of county crop areas using survey and satellite data, Journal of the American Statistical Association, Vol. 83, pp. 28–36.

BERG, E., CHANDRA, H., (2012). Small area prediction for a unit level lognormal model, Federal Committee on Statistical Methodology Research Conference.

CARROLL, R., RUPPERT, D., (1988). Transformation and Weighting in Regression, Chapman and Hall.

CHAMBERS, R. L., (1986). Outlier robust finite population estimation, Journal of the American Statistical Association, Vol. 81, pp. 1063–1069.

CHAMBERS, R. L., CHANDRA, H., SALVATI, N., TZAVIDIS, N., (2014). Outlier robust small area estimation, Journal of the Royal Statistical Society Series B: Statistical Methodology, Vol. 76, pp. 47–69.

CHAMBERS, R. L., DORFMAN, A. H., (2003). Transformed variables in survey sampling, Joint Statistical Meetings, Section on Survey Research Methods.

CHAMBERS, R. L., TZAVIDIS, N., (2006). M-quantile models for small area estimation, Biometrika, Vol. 93, pp. 255–268.

CHANDRA, H., CHAMBERS, R. L., (2005). Comparing EBLUP and C-EBLUP for Small Area Estimation, Statistics in Transition, Vol. 7, pp. 637–648.

CHANDRA, H., CHAMBERS, R. L., (2011). Small area estimation under transformation to linearity, Survey Methodology, Vol. 37, pp. 39–51.

CHEN, G., CHEN, J., (1996). A Transformation Method for Finite Population Sampling Calibrated with Empirical Likelihood, Survey Methodology, Vol. 22, pp. 139–146.

CHEN, J., CHEN, S.-Y., RAO, J. N. K., (2003). Empirical Likelihood Confidence Intervals for the Mean of a Population Containing Many Zero Values, The Canadian Journal of Statistics, Vol. 31, pp. 53–68.

FELLNER, W. H., (1986). Robust estimation of variance components, Technometrics, Vol. 28, pp. 51–60.

FULLER, W. A., (1991). Simple estimators for the mean of skewed populations, Statistica Sinica, Vol. 1, pp. 137–158.

HIDIROGLOU, M. A., SMITH, P. A., (2005). Developing Small Area Estimates for Business Surveys at the ONS, Statistics in Transition, Vol. 7, pp. 527–539.

HIDIROGLOU, M. A., SRINATH, K. P., (1981). Some estimators of a population total from simple random samples containing large units, Journal of the American Statistical Association, Vol. 76, pp. 690–695.

HUBER, P. J., (1981). Robust Statistics, John Wiley.

HUBERT, M., VAN DER VEEKEN, S., (2007). Outlier detection for skewed data, Journal of Chemometrics, Vol. 22, pp. 235–246.

HUGGETT, M., (1996). Wealth distribution in life-cycle economies, Journal of Monetary Economics, Vol. 38, pp. 469–494.

JIANG, J., LAHIRI, P., (2006). Estimation of Finite Population Domain Means: A Model-Assisted Empirical Best Prediction Approach, Journal of the American Statistical Association, Vol. 101, pp. 301–311.

KARLBERG, F., (2000a). Population Total Prediction Under a Lognormal Superpopulation Model, Metron, Vol. LVIII, pp. 53–80.

KARLBERG, F., (2000b). Survey Estimation for Highly Skewed Populations in the Presence of Zeroes, Journal of Official Statistics, Vol. 16, pp. 229–241.

KOKIC, P. N., (1998). On Winsorisation in Business Surveys, SSC Annual Meeting, Proceedings of the Survey Methods Section.

LAMBERT, D., (1992). Zero-Inflated Poisson Regression, With an Application to Defects in Manufacturing, Technometrics, Vol. 34, pp. 1–14.

LEE, H. L., (1995). Outliers in Business Surveys, In Business Survey Methods, edited by Cox, Binder, Chinnappa, Christianson, Colledge and Kott, Chapter 26. John Wiley.

LEHTONEN, R., SÄRNDAL, C. E., VEIJANEN, A., (2003). The effect of model choice in estimation for domains, including small domains, Survey Methodology, Vol. 29, pp. 33–44.

MINCER, J., (1970). The Distribution of Labor Incomes: A Survey With Special Reference to the Human Capital Approach, Journal of Economic Literature, Vol. 8, pp. 1–26.

MOLINA, I., (2009). Uncertainty under a multivariate nested-error regression model with logarithmic transformation, Journal of Multivariate Analysis, Vol. 100, pp. 963–980.

MOLINA, I., MARHUENDA, Y., (2013). Package ‘sae’, http://cran.r-project.org/web/packages/sae/sae.pdf

PFEFFERMANN, D., (2013). New Important Developments in Small Area Estimation, Statistical Science, Vol. 28, pp. 40–68.

PFEFFERMANN, D., TERRYN, B., MOURA, F. A. S., (2008). Small area estimation under a two-part random effects model with application to estimation of literacy in developing countries, Survey Methodology, Vol. 34, pp. 235–249.

RAO, J. N. K., (2003). Small Area Estimation, Wiley.

ROYALL, R. M., (1982). Finite populations (Sampling from), Entry in the Encyclopedia of Statistical Sciences.

ROYALL, R. M., CUMBERLAND, W. G., (1978). Variance estimation in finite population sampling, Journal of the American Statistical Association, Vol. 71, pp. 351–358.

RÖNNEGÅRD, L., SHEN, X., ALAM, M., (2010). hglm: A Package for Fitting Hierarchical Generalized Linear Models, The R Journal, Vol. 2, pp. 20–28, http://journal.r-project.org/archive/2010-2/RJournal_2010-2_Roennegaard~et~al.

SALVATI, N., CHANDRA, H., CHAMBERS, R. L., (2012). Model Based Direct Estimation of Small Area Distributions, Australian & New Zealand Journal of Statistics, Vol. 54, pp. 103–123.

SEARLS, D. T., (1966). An estimator which reduces large true observations, Journal of the American Statistical Association, Vol. 61, pp. 1200–1204.

SINHA, S. K., RAO, J. N. K., (2009). Robust small area estimation, Canadian Journal of Statistics, Vol. 37, pp. 381–399.

SHLOMO, N., PRIAM, R., (2013). Improving Estimation in Business Surveys, Chapter 4.2, pp. 52–70 in BLUE-ETS Deliverable D6.2: Best practice recommendations on variance estimation and small area estimation in business surveys, edited by R. Bernardini Papalia, C. Bruch, T. Enderle, S. Falorsi, A. Fasulo, E. Fernandez-Vazquez, M. Ferrante, J. P. Kolb, R. Münnich, S. Pacei, R. Priam, P. Righi, T. Schmid, N. Shlomo, F. Volk and T. Zimmermann.

SLUD, E., MAITI, T., (2006). Mean-squared error estimation in transformed Fay-Herriot models, Journal of the Royal Statistical Society: Series B (Statistical Methodology), Vol. 68, pp. 239–257.

THORBURN, D., (1993). The treatment of outliers in economic statistics, Proceedings of the International Conference on Establishment Surveys, Buffalo, New York.

WANG, J., FULLER, W. A., (2003). The Mean Squared Error of Small Area Predictors Constructed with Estimated Area Variances, Journal of the American Statistical Association, Vol. 98, pp. 716–723.

YOUNG, K. H., YOUNG, L. Y., (1975). Estimation of Regressions Involving Logarithmic Transformation of Zero Values in the Dependent Variable, The American Statistician, Vol. 29, pp. 118–120.

ZIMMERMANN, T., MÜNNICH, R., (2013). Coherent small area estimates for skewed business data, Proceedings of the 2013 European Establishment Statistics Workshop.

© 2019–2024 Copyright by Statistics Poland, some rights reserved. Creative Commons Attribution-ShareAlike 4.0 International Public License (CC BY-SA 4.0).