Andrzej Młodak https://orcid.org/0000-0002-6853-9163, Tomasz Józefowski https://orcid.org/0000-0001-9485-1946
ARTICLE

ABSTRACT

In this paper, we describe an attempt to develop an efficient disclosure control algorithm for microdata in a statistical portal used for releasing detailed statistical information at various levels of spatial aggregation. The proposed algorithm is based on perturbative methods, such as microaggregation with Gower’s distance for categorical variables and the addition of correlated noise for continuous variables, but it also offers several alternative options in this regard. Moreover, the algorithm can be used to assess the loss of information by measuring distribution disturbances (based on a complex distance that accounts for all measurement scales) and, for continuous variables, the impact of Statistical Disclosure Control (SDC) on the strength of correlations between variables. The algorithm was tested with the tools offered by the sdcMicro R package, using microdata about agricultural farms and farm animals collected in the 2020 Polish Agricultural Census. We present the results of the tests and discuss the main problems and challenges connected with the use of such tools.
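To make the distance idea concrete, the following is a minimal Python sketch of the textbook Gower distance between two records with mixed measurement scales. It is an illustration under stated assumptions, not the implementation used in the paper or in sdcMicro: the function name `gower_distance`, the `ranges` argument, and the example farm records are all hypothetical, and the paper's own distance may weight or treat variables differently.

```python
from typing import Optional, Sequence, Union

Value = Union[float, int, str]


def gower_distance(
    a: Sequence[Value],
    b: Sequence[Value],
    ranges: Sequence[Optional[float]],
) -> float:
    """Textbook Gower distance between two mixed-type records.

    ranges[k] holds the observed range (max - min) of variable k if it is
    numeric, or None if variable k is categorical. (Hypothetical interface.)
    """
    if not (len(a) == len(b) == len(ranges)) or len(a) == 0:
        raise ValueError("records and ranges must be non-empty and of equal length")

    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical variable: simple mismatch (0 if equal, 1 otherwise).
            total += 0.0 if x == y else 1.0
        elif r == 0:
            # Constant numeric variable: contributes no dissimilarity.
            total += 0.0
        else:
            # Numeric variable: absolute difference scaled by the range.
            total += abs(x - y) / r
    # Unweighted mean of the per-variable dissimilarities.
    return total / len(a)


# Hypothetical farm records: (area in ha, number of cattle, farm type).
farm_a = (10.0, 5, "dairy")
farm_b = (30.0, 5, "crop")
variable_ranges = (100.0, 50.0, None)  # assumed ranges; None marks a categorical variable

print(gower_distance(farm_a, farm_b, variable_ranges))
```

Each variable contributes a value in [0, 1], so the result also lies in [0, 1]. Microaggregation would then group records that are close under such a distance, and a similar per-variable comparison of original and perturbed records could serve as one ingredient of an information-loss measure.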

KEYWORDS

Statistical Disclosure Control, perturbative methods, disclosure risk, information loss, agricultural census.

© 2019–2025 Copyright by Statistics Poland, some rights reserved. Creative Commons Attribution-ShareAlike 4.0 International Public License (CC BY-SA 4.0)