Determining the Number of Clusters Using Multivariate Ranks

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Determining the number of clusters in multivariate data has become one of the most important problems across a wide range of scientific disciplines. The forward search algorithm is a graphical approach that helps with this task. The traditional forward search based on Mahalanobis distances was introduced by Hadi (1992) and Atkinson (1994), and Atkinson et al. (2004) used it as a clustering method. Like many other Mahalanobis distance-based methods, however, it cannot be correctly applied to asymmetric distributions or, more generally, to distributions that depart from the elliptical symmetry assumption. We propose a new forward search methodology based on spatial ranks, in which clusters are grown sequentially, one data point at a time, using spatial ranks computed with respect to the points already in the subsample. The algorithm starts from a randomly chosen initial subsample. We illustrate with simulated data that the proposed algorithm is robust to the choice of initial subsample and performs well for different multivariate mixture distributions. We also propose a modified algorithm based on the volume of central rank regions. Our numerical examples show that it produces the best results under elliptical symmetry.
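The forward search idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the plain (not affine-invariant) spatial rank, R(x) = (1/m) Σ (x − xᵢ)/‖x − xᵢ‖ over the m points of the current subsample, and it assumes that the next point added is the outside point with the smallest rank norm. The names `spatial_rank_norm`, `forward_search`, `m0` and `seed` are hypothetical and chosen only for illustration.

```python
import numpy as np


def spatial_rank_norm(x, sample):
    """Norm of the spatial rank of x with respect to the rows of `sample`.

    Values near 0 mean x is central to the sample; values near 1 mean
    x is outlying relative to it.
    """
    diffs = x - sample                        # shape (m, d)
    norms = np.linalg.norm(diffs, axis=1)
    keep = norms > 0                          # skip sample points equal to x
    if not np.any(keep):
        return 0.0
    units = diffs[keep] / norms[keep, None]   # unit vectors pointing at x
    return float(np.linalg.norm(units.sum(axis=0) / len(sample)))


def forward_search(data, m0=5, seed=0):
    """Grow a subsample one point at a time (illustrative assumption:
    always add the outside point with the smallest spatial rank norm)."""
    rng = np.random.default_rng(seed)
    n = len(data)
    # Randomly chosen initial subsample, as in the abstract.
    inside = [int(i) for i in rng.choice(n, size=m0, replace=False)]
    outside = [i for i in range(n) if i not in inside]
    trace = []                                # rank norm of each added point
    while outside:
        sub = data[inside]
        ranks = [spatial_rank_norm(data[i], sub) for i in outside]
        j = int(np.argmin(ranks))
        trace.append(ranks[j])
        inside.append(outside.pop(j))
    return inside, trace


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Two well-separated synthetic clusters in the plane.
    data = np.vstack([rng.normal(0.0, 1.0, size=(30, 2)),
                      rng.normal(8.0, 1.0, size=(30, 2))])
    order, trace = forward_search(data, m0=5, seed=0)
    print(len(order), len(trace))
```

Plotting `trace` against the step number gives a forward plot of the kind the graphical approach relies on; under the assumptions above, a sharp rise in the trace would suggest that the search has exhausted one cluster and started absorbing points from another.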
Original language: English
Title of host publication: Recent Advances in Robust Statistics
Subtitle of host publication: Theory and Applications
Editors: Claudio Agostinelli, Ayanendranath Basu, Peter Filzmoser, Diganta Mukherjee
Publisher: Springer
Chapter: 2
Pages: 17-33
ISBN (Electronic): 9788132236436
ISBN (Print): 9788132236412
Publication status: Published - 11 Nov 2016
Event: International Conference on Robust Statistics 2015 - Kolkata, India
Duration: 12 Jan 2015 to 16 Jan 2015

Conference

Conference: International Conference on Robust Statistics 2015
Abbreviated title: ICORS 2015
Country/Territory: India
City: Kolkata
Period: 12/01/15 to 16/01/15
