An improved and more scalable evolutionary approach to multiobjective clustering

Mario Garza-Fabre; Julia Handl; Joshua Knowles

doi:10.1109/TEVC.2017.2726341

Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

The multiobjective realisation of the data clustering problem has shown great promise in recent years, yielding clear conceptual advantages over the more conventional, single-objective approach. Evolutionary algorithms have largely contributed to the development of this increasingly active research area on multiobjective clustering. Nevertheless, the unprecedented volumes of data seen widely today pose significant challenges and highlight the need for more effective and scalable tools for exploratory data analysis. This paper proposes an improved version of the multiobjective clustering with automatic k-determination algorithm. Our new algorithm improves on its predecessor in several respects, but the key changes are related to the use of an efficient, specialised initialisation routine and two alternative reduced-length representations. These design components exploit information from the minimum spanning tree and redefine the problem in terms of the most relevant subset of its edges. Our study reveals that both the new initialisation routine and the new solution representations not only contribute to decreasing the computational overhead, but also entail a significant reduction of the search space, thereby enhancing the convergence capabilities and overall effectiveness of the method. These results suggest that the new algorithm proposed here will offer significant advantages in the realm of ‘big data’ analytics and applications.

Original language	English
Number of pages	20
Journal	IEEE Transactions on Evolutionary Computation
Volume	PP
Issue number	99
DOIs	https://doi.org/10.1109/TEVC.2017.2726341
Publication status	Published - 9 Aug 2017

Keywords

evolutionary computation
data analysis
clustering methods
data mining
pareto optimization

Access to Document

10.1109/TEVC.2017.2726341
Licence: Creative Commons: Attribution (CC BY)

Garza-Fabre_et_al_An_improved_and_more_scalable_IEEE_Transactions_on_Evolutionary_Computation_2017
Checked for eligibility: 06/07/2017. (c) 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works. Published in IEEE Transactions on Evolutionary Computation. DOI: 10.1109/TEVC.2017.2726341
Accepted author manuscript, 5.9 MB
Licence: Creative Commons: Attribution (CC BY)

http://ieeexplore.ieee.org/document/8004483/
Licence: Creative Commons: Attribution (CC BY)

Cite this

@article{03a7dbba56ed4b1f9709c977d0487e43,

title = "An improved and more scalable evolutionary approach to multiobjective clustering",

abstract = "The multiobjective realisation of the data clustering problem has shown great promise in recent years, yielding clear conceptual advantages over the more conventional, single-objective approach. Evolutionary algorithms have largely contributed to the development of this increasingly active research area on multiobjective clustering. Nevertheless, the unprecedented volumes of data seen widely today pose significant challenges and highlight the need for more effective and scalable tools for exploratory data analysis. This paper proposes an improved version of the multiobjective clustering with automatic k-determination algorithm. Our new algorithm improves on its predecessor in several respects, but the key changes are related to the use of an efficient, specialised initialisation routine and two alternative reduced-length representations. These design components exploit information from the minimum spanning tree and redefine the problem in terms of the most relevant subset of its edges. Our study reveals that both the new initialisation routine and the new solution representations not only contribute to decreasing the computational overhead, but also entail a significant reduction of the search space, thereby enhancing the convergence capabilities and overall effectiveness of the method. These results suggest that the new algorithm proposed here will offer significant advantages in the realm of {\textquoteleft}big data{\textquoteright} analytics and applications.",

keywords = "evolutionary computation, data analysis, clustering methods, data mining, pareto optimization",

author = "Mario Garza-Fabre and Julia Handl and Joshua Knowles",

year = "2017",

month = aug,

day = "9",

doi = "10.1109/TEVC.2017.2726341",

language = "English",

volume = "PP",

journal = "IEEE Transactions on Evolutionary Computation",

issn = "1089-778X",

publisher = "Institute of Electrical and Electronics Engineers (IEEE)",

number = "99",

}

TY - JOUR

T1 - An improved and more scalable evolutionary approach to multiobjective clustering

AU - Garza-Fabre, Mario

AU - Handl, Julia

AU - Knowles, Joshua

PY - 2017/8/9

Y1 - 2017/8/9

AB - The multiobjective realisation of the data clustering problem has shown great promise in recent years, yielding clear conceptual advantages over the more conventional, single-objective approach. Evolutionary algorithms have largely contributed to the development of this increasingly active research area on multiobjective clustering. Nevertheless, the unprecedented volumes of data seen widely today pose significant challenges and highlight the need for more effective and scalable tools for exploratory data analysis. This paper proposes an improved version of the multiobjective clustering with automatic k-determination algorithm. Our new algorithm improves on its predecessor in several respects, but the key changes are related to the use of an efficient, specialised initialisation routine and two alternative reduced-length representations. These design components exploit information from the minimum spanning tree and redefine the problem in terms of the most relevant subset of its edges. Our study reveals that both the new initialisation routine and the new solution representations not only contribute to decreasing the computational overhead, but also entail a significant reduction of the search space, thereby enhancing the convergence capabilities and overall effectiveness of the method. These results suggest that the new algorithm proposed here will offer significant advantages in the realm of ‘big data’ analytics and applications.

KW - evolutionary computation

KW - data analysis

KW - clustering methods

KW - data mining

KW - pareto optimization

U2 - 10.1109/TEVC.2017.2726341

DO - 10.1109/TEVC.2017.2726341

M3 - Article

SN - 1089-778X

VL - PP

JO - IEEE Transactions on Evolutionary Computation

JF - IEEE Transactions on Evolutionary Computation

IS - 99

ER -
