Abstract
The multiobjective realisation of the data clustering problem has shown great promise in recent years, yielding clear conceptual advantages over the more conventional, single-objective approach. Evolutionary algorithms have contributed substantially to the development of this increasingly active research area of multiobjective clustering. Nevertheless, the unprecedented volumes of data seen widely today pose significant challenges and highlight the need for more effective and scalable tools for exploratory data analysis. This paper proposes an improved version of the multiobjective clustering with automatic k-determination algorithm. Our new algorithm improves on its predecessor in several respects, but the key changes are the use of an efficient, specialised initialisation routine and two alternative reduced-length representations. These design components exploit information from the minimum spanning tree and redefine the problem in terms of the most relevant subset of its edges. Our study reveals that both the new initialisation routine and the new solution representations not only reduce the computational overhead, but also entail a significant reduction of the search space, thereby enhancing the convergence capabilities and overall effectiveness of the method. These results suggest that the new algorithm proposed here will offer significant advantages in the realm of ‘big data’ analytics and applications.
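As a rough illustration of how a clustering can be described through minimum spanning tree edges, the following minimal sketch builds the tree with Prim's algorithm and then cuts a chosen number of edges, so that a solution is determined by a small set of edges rather than by one decision per data point. It is not the algorithm proposed in the paper: the function names are hypothetical, and using edge length as the measure of edge relevance is an assumption made purely for illustration.

```python
import math


def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph.

    Returns a list of (weight, u, v) edges. Hypothetical helper for illustration.
    """
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection cost to the growing tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest point not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Update connection costs of the remaining points.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def clusters_from_mst(points, edges, n_removed):
    """Drop the n_removed longest MST edges and label the connected components.

    Assumption: edge length stands in for edge relevance.
    """
    kept = sorted(edges)[: len(edges) - n_removed]
    root = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(len(points))]


# Two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
edges = minimum_spanning_tree(points)
labels = clusters_from_mst(points, edges, n_removed=1)
print(labels)
```

In this toy setting, cutting one edge separates the two groups, and cutting none leaves a single cluster, so the number of clusters follows from how many edges are removed.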
Original language | English |
---|---|
Number of pages | 20 |
Journal | IEEE Transactions on Evolutionary Computation |
Volume | PP |
Issue number | 99 |
DOIs | |
Publication status | Published - 9 Aug 2017 |
Keywords
- evolutionary computation
- data analysis
- clustering methods
- data mining
- Pareto optimization