Random search with k-prototypes algorithm for clustering mixed datasets

Duc Pham, MM Suarez-Alvarez, YI Prostov

Research output: Contribution to journal › Article › peer-review

Abstract

A new algorithm for clustering datasets with mixed numerical and categorical values is presented. The algorithm, called RANKPRO (random search with k-prototypes algorithm), combines the advantages of the k-prototypes algorithm and a recently introduced population-based optimization algorithm, the bees algorithm (BA). The BA works with elite and good solutions and continues to search for other possible extrema while keeping the number of test points constant. However, improving promising solutions with the BA may be time-consuming because the improvement relies on random neighbourhood search. In contrast, applying the k-prototypes algorithm to a promising solution can be very effective, because it improves the solution at each iteration. RANKPRO balances two objectives: it explores the search space effectively through the random selection of new solutions, and it improves promising solutions quickly by applying the k-prototypes algorithm. The efficiency of the new algorithm is demonstrated by clustering several datasets. For the majority of the datasets considered, RANKPRO is shown to be more efficient than the k-prototypes algorithm when the latter needs more than 10 iterations on average to converge.
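The abstract describes RANKPRO only at a high level, so the Python sketch below illustrates the idea under stated assumptions rather than reproducing the authors' method. It assumes Huang's standard k-prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weight `gamma` times the number of categorical mismatches), treats a solution as a set of k prototypes, draws random "scout" solutions from the data, and each round refines the best `n_best` solutions with a few k-prototypes iterations while replacing the rest, so the number of tested solutions stays constant. The names `rankpro_sketch`, `n_scouts`, `n_best`, `steps` and `iterations` are hypothetical, and the bees algorithm's separate elite and good sites are collapsed into a single group.

```python
import random
from collections import Counter


def dissimilarity(x, proto, n_num, gamma):
    """Assumed Huang-style mixed measure: squared Euclidean distance on the
    first n_num (numeric) attributes plus gamma times the number of
    mismatches on the remaining (categorical) attributes."""
    num = sum((x[i] - proto[i]) ** 2 for i in range(n_num))
    cat = sum(1 for i in range(n_num, len(x)) if x[i] != proto[i])
    return num + gamma * cat


def assign(data, protos, n_num, gamma):
    """Assign each object to its nearest prototype; return labels and total cost."""
    labels = [
        min(range(len(protos)), key=lambda j: dissimilarity(x, protos[j], n_num, gamma))
        for x in data
    ]
    cost = sum(dissimilarity(x, protos[j], n_num, gamma) for x, j in zip(data, labels))
    return labels, cost


def update(data, labels, protos, n_num):
    """Move each prototype to the mean (numeric) and mode (categorical) of its cluster."""
    new = []
    for j, old in enumerate(protos):
        members = [x for x, lab in zip(data, labels) if lab == j]
        if not members:  # empty cluster: keep the old prototype
            new.append(tuple(old))
            continue
        means = [sum(x[i] for x in members) / len(members) for i in range(n_num)]
        modes = [Counter(x[i] for x in members).most_common(1)[0][0]
                 for i in range(n_num, len(old))]
        new.append(tuple(means + modes))
    return new


def kprototypes_steps(data, protos, n_num, gamma, steps):
    """Run up to `steps` k-prototypes iterations from `protos`, stopping early
    once the cost no longer decreases. Returns (prototypes, cost)."""
    labels, cost = assign(data, protos, n_num, gamma)
    for _ in range(steps):
        candidate = update(data, labels, protos, n_num)
        new_labels, new_cost = assign(data, candidate, n_num, gamma)
        if new_cost >= cost:
            break
        protos, labels, cost = candidate, new_labels, new_cost
    return protos, cost


def rankpro_sketch(data, k, n_num, gamma, n_scouts=10, n_best=3,
                   steps=2, iterations=20, seed=0):
    """Hypothetical RANKPRO-style search: random scouts explore the space, and
    the best solutions are refined by k-prototypes steps instead of random
    neighbourhood search. The population size stays constant."""
    rng = random.Random(seed)

    def scout():
        # A fresh random solution: k distinct objects used as prototypes (evaluated only).
        return kprototypes_steps(data, rng.sample(data, k), n_num, gamma, 0)

    population = [scout() for _ in range(n_scouts)]
    for _ in range(iterations):
        population.sort(key=lambda s: s[1])
        best = [kprototypes_steps(data, p, n_num, gamma, steps)
                for p, _ in population[:n_best]]
        population = best + [scout() for _ in range(n_scouts - n_best)]
    return min(population, key=lambda s: s[1])
```

For example, on a small list of `(x, y, category)` tuples, `rankpro_sketch(data, k=2, n_num=2, gamma=1.0)` returns the best prototype set found and its cost. Refining with `kprototypes_steps` rather than random perturbation mirrors the trade-off the abstract describes: a batch k-prototypes iteration cannot increase the cost, whereas a random neighbourhood move may fail to improve it.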
Original language: English
Pages (from-to): 2387-2403
Number of pages: 17
Journal: Royal Society of London. Proceedings A. Mathematical, Physical and Engineering Sciences
Volume: 467
Issue number: 2132
Early online date: 9 Mar 2011
Publication status: Published, 8 Aug 2011

Keywords

  • mixed datasets
  • random search
  • k-prototypes algorithm
  • clustering
