Abstract
In this work, we introduce a nonparametric clustering stopping rule algorithm based on the spatial median. Our proposed method aims to achieve a balance between the homogeneity within the clusters and the heterogeneity between clusters. The proposed algorithm maximises the ratio of the variation between clusters and the variation within clusters while adjusting for the number of clusters and number of observations. The proposed algorithm is robust against distributional assumptions and the presence of outliers. Simulations were used to validate the algorithm. We further evaluated the stability and the efficacy of the proposed algorithm using three real-world datasets. Moreover, we compared the performance of our model with 13 other traditional algorithms for determining the number of clusters. We found that the proposed algorithm outperformed 11 of the algorithms considered for comparison in terms of clustering number determination. The finding demonstrates that the proposed method provides a reliable alternative to determine the number of clusters for multivariate data.
| Original language | English |
|---|---|
| Number of pages | 19 |
| Journal | Journal of Applied Statistics |
| Early online date | 20 Sept 2025 |
| DOIs | |
| Publication status | E-pub ahead of print - 20 Sept 2025 |