Clustering based on poverty indicator data using K-Means cluster with Density-Based Spatial Clustering of Application with Noise

Sapriadi Rasyid; Siswanto Siswanto; Sitti Sahriman

doi:https://doi.org/10.59139/stattrans-2025-018

Sapriadi Rasyid, Department of Statistics, Hasanuddin University, Indonesia. ORCID: https://orcid.org/0000-0006-2972-7125

Siswanto Siswanto (corresponding author), Department of Statistics, Hasanuddin University, Indonesia. ORCID: https://orcid.org/0000-0003-1934-5343

Sitti Sahriman, Department of Statistics, Hasanuddin University, Indonesia. ORCID: https://orcid.org/0000-0002-9614-7132

Statistics in Transition new series, vol. 26, 2025, 2, pages: 113-128

Published online: 13 June 2025

https://doi.org/10.59139/stattrans-2025-018

Citation: Rasyid S., Siswanto S., Sahriman S., 2025. Clustering based on poverty indicator data using K-Means cluster with Density-Based Spatial Clustering of Application with Noise. Statistics in Transition new series, 26(2), pp. 113-128; https://doi.org/10.59139/stattrans-2025-018

ABSTRACT

The Indonesian government has implemented poverty alleviation programs, including assistance programs for the poor. Despite these efforts, the number of impoverished individuals in South Sulawesi continues to rise. To address this issue, a statistical method is needed to group regions by their poverty indicators, so that the groups can serve as a reference for providing assistance. Cluster analysis is suited to this task because it minimizes differences between objects within a cluster and maximizes differences between clusters. This study applies two methods, K-Means and Density-Based Spatial Clustering of Application with Noise (DBSCAN), and compares their effectiveness using the Silhouette Coefficient. The data comprise eight poverty indicators for South Sulawesi province in 2022. The K-Means method yielded two optimal clusters: cluster 1 comprised 23 regencies and cities, and cluster 2 contained only Makassar City. Further analysis of cluster 1 yielded eight new clusters with a Silhouette Coefficient of 0.507. In contrast, the DBSCAN method yielded one cluster that encompassed 23 regencies and cities, with Makassar City identified as noise. Further analysis of that cluster yielded one cluster with three noise points and a Silhouette Coefficient of 0.318. The study concludes that K-Means provides a higher Silhouette Coefficient and a more accurate representation of poverty clusters in South Sulawesi, which makes it a more effective tool for targeted poverty alleviation efforts.
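The comparison above rests on the Silhouette Coefficient. The following is a minimal sketch of how such a score can be computed, assuming the standard definition (mean of (b - a) / max(a, b) over all points, with Euclidean distance and a width of 0 for singleton clusters). It is not the authors' code; the function name `silhouette_coefficient` and the toy points are invented for illustration and are not the study's data.

```python
import math


def silhouette_coefficient(points, labels):
    """Mean silhouette width for a labelled set of points.

    Assumptions (not taken from the article): Euclidean distance,
    and a silhouette width of 0 for points in singleton clusters.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # singleton convention
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)


# Hypothetical toy data: two well-separated groups of three points.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]   # matches the visible grouping
mixed_labels = [0, 1, 0, 1, 0, 1]  # deliberately poor assignment

good_score = silhouette_coefficient(toy_points, good_labels)
mixed_score = silhouette_coefficient(toy_points, mixed_labels)
```

Values near 1 indicate compact, well-separated clusters, and values near 0 or below indicate overlapping or misassigned points, which is the sense in which 0.507 is read as better than 0.318. How DBSCAN noise points enter the score is a choice this sketch does not settle; here every labelled point is included.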

KEYWORDS

Cluster, DBSCAN, poverty, K-Means, Silhouette Coefficient
