Empirical evaluation of OCLUS and GenRandomClust algorithms of generating cluster structures

Jerzy Korzeniewski

doi:https://doi.org/10.59170/stattrans-2013-030

Jerzy Korzeniewski, University of Lodz, Poland
Statistics in Transition new series, vol. 14, 2013, 3, pages: 487–494
Published online: 2 September 2013
https://doi.org/10.59170/stattrans-2013-030

ABSTRACT

The OCLUS and genRandomClust algorithms are the newest proposals for generating multivariate cluster structures. Both methods can control cluster overlap, but they do so quite differently. The OCLUS method appears to have a much easier, more intuitive interpretation. To verify this opinion, a comparative assessment of the two algorithms was carried out. For each method, multiple cluster structures were generated, and each structure was grouped into the proper number of clusters using k-means. The groupings were assessed by means of a partition similarity index (the modified Rand index), with the classification resulting from the generation serving as the reference. The comparison criterion is the behaviour of the overlap parameters of the structures: the monotonicity of the relationship between the overlap parameters and the similarity index is assessed, as well as the variability of the similarity index for fixed values of the overlap parameters. Moreover, particular attention is given to checking whether there is a limiting value of the overlap parameter for the classical grouping procedures, and whether overlap control is uniform across all clusters.
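The evaluation protocol described in the abstract can be sketched in a few lines of Python. This is not OCLUS or genRandomClust. As a stand-in, the sketch draws two univariate normal clusters whose mean separation plays the role of an overlap parameter (an assumption made purely for illustration), groups them with a simple 2-means procedure, and scores the grouping against the generated classification. The "modified Rand index" is assumed here to be the Hubert-Arabie adjusted Rand index, and all function names and default values are hypothetical.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted Rand index between two partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    # Object pairs placed together in both partitions, and in each one alone.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0
    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (together_both - expected) / (maximum - expected)


def two_means_1d(points, iterations=50):
    """Lloyd-style k-means for k=2 on 1-D data, started from the min and max."""
    centres = [min(points), max(points)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to the nearer centre.
        labels = [0 if abs(x - centres[0]) <= abs(x - centres[1]) else 1
                  for x in points]
        # Update step: each centre becomes the mean of its members.
        for k in (0, 1):
            members = [x for x, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = sum(members) / len(members)
    return labels


def mean_ari_for_separation(separation, n_per_cluster=100, replications=20, seed=0):
    """Average index over several generated structures with a fixed separation."""
    rng = random.Random(seed)
    truth = [0] * n_per_cluster + [1] * n_per_cluster
    scores = []
    for _ in range(replications):
        # Toy generator (assumption): unit-variance normals centred at 0 and `separation`.
        points = [rng.gauss(0.0, 1.0) for _ in range(n_per_cluster)]
        points += [rng.gauss(separation, 1.0) for _ in range(n_per_cluster)]
        scores.append(adjusted_rand_index(truth, two_means_1d(points)))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Larger separation means less overlap in this toy setting.
    for separation in (0.5, 1.0, 2.0, 4.0, 8.0):
        print(separation, round(mean_ari_for_separation(separation), 3))
```

Running the loop over an increasing grid of separations gives a rough picture of the two properties the abstract names: whether the average index changes monotonically with the overlap setting, and how much it varies across replications at a fixed setting.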

KEYWORDS

cluster analysis, cluster structure generation, OCLUS algorithm, genRandomClust algorithm.

REFERENCES

ATLAS, R., OVERALL, J., (1994). Comparative Evaluation of Two Superior Stopping Rules for Hierarchical Cluster Analysis, Psychometrika, 59, 581–591.

BLASHFIELD, R. K., (1976). Mixture Model Tests of Cluster Analysis: Accuracy of Four Agglomerative Hierarchical Methods, Psychological Bulletin, 83, 377–388.

GOLD, E., HOFFMAN, P., (1976). Flange Detection Cluster Analysis, Multivariate Behavioral Research, 11, 217–235.

HUBERT, L., ARABIE, P., (1985). Comparing Partitions, Journal of Classification, 2.

KUIPER, F. K., FISHER, L., (1975). A Monte Carlo Comparison for Six Clustering Procedures, Biometrics, 31, 777–784.

MCINTYRE, R., BLASHFIELD, R., (1980). A Nearest-Centroid Technique for Evaluating the Minimum Variance Clustering Procedure, Multivariate Behavioral Research, 15, 225–238.

MILLIGAN, G., (1985). An Algorithm for Generating Artificial Test Clusters, Psychometrika, 50, 123–127.

PRICE, L., (1993). Identifying Cluster Overlap with NORMIX Population Membership Probabilities, Multivariate Behavioral Research, 28, 235–262.

QIU, W., JOE, H., (2006). Generation of random clusters with specified degree of separation, Journal of Classification, 23, 315–334.

STEINLEY, D., BRUSCO, M., (2007). Initializing k-means batch clustering: A critical evaluation of several techniques, Journal of Classification, 24, 99–121.

STEINLEY, D., HENSON, R., (2005). OCLUS: An Analytic Method for Generating Clusters with Known Overlap, Journal of Classification, 22, 221–250.

WALLER, N., UNDERHILL, J., KAISER, H., (1999). A Method for Generating Simulated Plasmodes and Artificial Test Clusters with User-Defined Shape, Size and Orientation, Multivariate Behavioral Research, 34, 123–142.