A new algorithm for clustering datasets with mixed numerical and categorical values is presented. The algorithm, called RANKPRO (random search with k-prototypes algorithm), combines the advantages of the bees algorithm (BA), a recently introduced population-based optimization algorithm, and the k-prototypes algorithm. The BA works with elite and good solutions, and continues to look for other possible extrema while keeping the number of testing points constant. However, the improvement of promising solutions by the BA may be time-consuming because it is based on random neighbourhood search. On the other hand, applying the k-prototypes algorithm to a promising solution may be very effective because it improves the solution at each iteration. The RANKPRO algorithm balances two objectives: it explores the search space effectively through random selection of new solutions, and it improves promising solutions quickly by applying the k-prototypes algorithm. The efficiency of the new algorithm is demonstrated by clustering several datasets. It is shown that, in the majority of the considered datasets, when the average number of iterations that the k-prototypes algorithm needs to converge is over 10, the RANKPRO algorithm is more efficient than the k-prototypes algorithm.
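The general idea described in the abstract can be illustrated with a minimal Python sketch. It is not the published RANKPRO procedure: the function names, the parameters (`gamma`, `n_solutions`, `n_refined`, `n_rounds`) and the ranking scheme are assumptions made for illustration only. The sketch assumes the usual k-prototypes dissimilarity (squared Euclidean distance on numerical attributes plus a weight `gamma` times the number of categorical mismatches), keeps a constant number of candidate solutions, refines the best-ranked ones with one k-prototypes iteration per round, and replaces the rest with randomly selected new solutions.

```python
import random
from collections import Counter

def dissimilarity(x, proto, n_num, gamma):
    """Squared Euclidean distance on the first n_num fields plus gamma * categorical mismatches."""
    num = sum((a - b) ** 2 for a, b in zip(x[:n_num], proto[:n_num]))
    cat = sum(a != b for a, b in zip(x[n_num:], proto[n_num:]))
    return num + gamma * cat

def cost(data, protos, n_num, gamma):
    """Total dissimilarity of every object to its nearest prototype."""
    return sum(min(dissimilarity(x, p, n_num, gamma) for p in protos) for x in data)

def k_prototypes_step(data, protos, n_num, gamma):
    """One k-prototypes iteration: assign objects, then recompute means and modes."""
    clusters = [[] for _ in protos]
    for x in data:
        j = min(range(len(protos)),
                key=lambda i: dissimilarity(x, protos[i], n_num, gamma))
        clusters[j].append(x)
    updated = []
    for members, old in zip(clusters, protos):
        if not members:
            updated.append(old)  # an empty cluster keeps its previous prototype
            continue
        means = [sum(m[d] for m in members) / len(members) for d in range(n_num)]
        modes = [Counter(m[d] for m in members).most_common(1)[0][0]
                 for d in range(n_num, len(old))]
        updated.append(tuple(means + modes))
    return updated

def hybrid_search(data, k, n_num, gamma=1.0, n_solutions=6, n_refined=2,
                  n_rounds=10, seed=0):
    """Illustrative hybrid: random exploration plus k-prototypes refinement (hypothetical scheme)."""
    rng = random.Random(seed)
    # Each candidate solution is a list of k prototypes drawn from the data.
    population = [rng.sample(data, k) for _ in range(n_solutions)]
    for _ in range(n_rounds):
        population.sort(key=lambda p: cost(data, p, n_num, gamma))
        # Promising solutions are improved by a k-prototypes iteration.
        refined = [k_prototypes_step(data, p, n_num, gamma)
                   for p in population[:n_refined]]
        # The remaining slots are refilled at random, so the population size stays constant.
        fresh = [rng.sample(data, k) for _ in range(n_solutions - n_refined)]
        population = refined + fresh
    return min(population, key=lambda p: cost(data, p, n_num, gamma))
```

Here each object is a tuple whose first `n_num` entries are numerical and whose remaining entries are categorical, for example `hybrid_search([(0.0, "a"), (5.0, "b"), (0.2, "a"), (5.1, "b")], k=2, n_num=1)`. Because a k-prototypes iteration never increases the cost and the best-ranked solution is always carried forward, the best cost in the population does not worsen from round to round.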
|Number of pages||17|
|Journal||Royal Society of London. Proceedings A. Mathematical, Physical and Engineering Sciences|
|Early online date||9 Mar 2011|
|Publication status||Published - 8 Aug 2011|
- mixed datasets
- random search
- k-prototypes algorithm