An Evolutionary Clustering Algorithm for Gene Expression Microarray Data Analysis

Research output: Contribution to journal › Article

Standard

An Evolutionary Clustering Algorithm for Gene Expression Microarray Data Analysis. / Ma, PCH; Chan, KCC; Yao, Xin; Chiu, DKY.

In: IEEE Transactions on Evolutionary Computation, Vol. 10, No. 3, 01.01.2006, p. 296-314.

BibTeX

@article{78567b31a0dd4cd78b589adaa374e50c,
title = "An Evolutionary Clustering Algorithm for Gene Expression Microarray Data Analysis",
abstract = "Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms have been used for this purpose. Based on experiments using simulated and real data, we also show that the performance of these algorithms can be further improved. For more effective clustering of gene expression microarray data, which is typically characterized by a lot of noise, we propose a novel evolutionary algorithm called evolutionary clustering (EvoCluster). EvoCluster encodes an entire cluster grouping in a chromosome so that each gene in the chromosome encodes one cluster. Based on such encoding scheme, it makes use of a set of reproduction operators to facilitate the exchange of grouping information between chromosomes. The fitness function that the EvoCluster adopts is able to differentiate between how relevant a feature value is in determining a particular cluster grouping. As such, instead of just local pairwise distances, it also takes into consideration how clusters are arranged globally. Unlike many popular clustering algorithms, EvoCluster does not require the number of clusters to be decided in advance. Also, patterns hidden in each cluster can be explicitly revealed and presented for easy interpretation even by casual users. For performance evaluation, we have tested EvoCluster using both simulated and real data. Experimental results show that it can be very effective and robust even in the presence of noise and missing values. Also, when correlating the gene expression microarray data with DNA sequences, we were able to uncover significant biological binding sites (both previously known and unknown) in each cluster discovered by EvoCluster.",
keywords = "gene expression microarray data analysis, DNA sequence analysis, evolutionary algorithms (EAs), bioinformatics, clustering",
author = "PCH Ma and KCC Chan and Xin Yao and DKY Chiu",
year = "2006",
month = jan,
day = "1",
doi = "10.1109/TEVC.2005.859371",
language = "English",
volume = "10",
pages = "296--314",
journal = "IEEE Transactions on Evolutionary Computation",
issn = "1089-778X",
publisher = "Institute of Electrical and Electronics Engineers (IEEE)",
number = "3",

}

RIS

TY  - JOUR
T1  - An Evolutionary Clustering Algorithm for Gene Expression Microarray Data Analysis
AU  - Ma, PCH
AU  - Chan, KCC
AU  - Yao, Xin
AU  - Chiu, DKY
PY  - 2006/1/1
Y1  - 2006/1/1
N2  - Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms have been used for this purpose. Based on experiments using simulated and real data, we also show that the performance of these algorithms can be further improved. For more effective clustering of gene expression microarray data, which is typically characterized by a lot of noise, we propose a novel evolutionary algorithm called evolutionary clustering (EvoCluster). EvoCluster encodes an entire cluster grouping in a chromosome so that each gene in the chromosome encodes one cluster. Based on such encoding scheme, it makes use of a set of reproduction operators to facilitate the exchange of grouping information between chromosomes. The fitness function that the EvoCluster adopts is able to differentiate between how relevant a feature value is in determining a particular cluster grouping. As such, instead of just local pairwise distances, it also takes into consideration how clusters are arranged globally. Unlike many popular clustering algorithms, EvoCluster does not require the number of clusters to be decided in advance. Also, patterns hidden in each cluster can be explicitly revealed and presented for easy interpretation even by casual users. For performance evaluation, we have tested EvoCluster using both simulated and real data. Experimental results show that it can be very effective and robust even in the presence of noise and missing values. Also, when correlating the gene expression microarray data with DNA sequences, we were able to uncover significant biological binding sites (both previously known and unknown) in each cluster discovered by EvoCluster.
AB  - Clustering is concerned with the discovery of interesting groupings of records in a database. Many algorithms have been developed to tackle clustering problems in a variety of application domains. In particular, some of them have been used in bioinformatics research to uncover inherent clusters in gene expression microarray data. In this paper, we show how some popular clustering algorithms have been used for this purpose. Based on experiments using simulated and real data, we also show that the performance of these algorithms can be further improved. For more effective clustering of gene expression microarray data, which is typically characterized by a lot of noise, we propose a novel evolutionary algorithm called evolutionary clustering (EvoCluster). EvoCluster encodes an entire cluster grouping in a chromosome so that each gene in the chromosome encodes one cluster. Based on such encoding scheme, it makes use of a set of reproduction operators to facilitate the exchange of grouping information between chromosomes. The fitness function that the EvoCluster adopts is able to differentiate between how relevant a feature value is in determining a particular cluster grouping. As such, instead of just local pairwise distances, it also takes into consideration how clusters are arranged globally. Unlike many popular clustering algorithms, EvoCluster does not require the number of clusters to be decided in advance. Also, patterns hidden in each cluster can be explicitly revealed and presented for easy interpretation even by casual users. For performance evaluation, we have tested EvoCluster using both simulated and real data. Experimental results show that it can be very effective and robust even in the presence of noise and missing values. Also, when correlating the gene expression microarray data with DNA sequences, we were able to uncover significant biological binding sites (both previously known and unknown) in each cluster discovered by EvoCluster.
KW  - gene expression microarray data analysis
KW  - DNA sequence analysis
KW  - evolutionary algorithms (EAs)
KW  - bioinformatics
KW  - clustering
UR  - http://www.scopus.com/inward/record.url?scp=33744745603&partnerID=8YFLogxK
U2  - 10.1109/TEVC.2005.859371
DO  - 10.1109/TEVC.2005.859371
M3  - Article
VL  - 10
SP  - 296
EP  - 314
JO  - IEEE Transactions on Evolutionary Computation
JF  - IEEE Transactions on Evolutionary Computation
SN  - 1089-778X
IS  - 3
ER  - 
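In the abstract, "gene" is used in the evolutionary-algorithm sense: one gene of a chromosome encodes one whole cluster, not one biological gene. The Python sketch that follows is a hypothetical illustration of such a grouping-based encoding, under stated assumptions: a chromosome is a list of sets of record indices, the names `random_chromosome`, `exchange_cluster` and `is_partition` are invented for this example, and the cluster-exchange operator is a generic one chosen for illustration. It does not reproduce EvoCluster's actual reproduction operators or fitness function, which the abstract does not specify.

```python
import random
from typing import List, Set

# Hypothetical encoding: one chromosome = one complete cluster grouping.
# Each "gene" is the set of record indices (e.g. microarray rows) in one cluster.
Chromosome = List[Set[int]]


def random_chromosome(n_records: int, max_clusters: int, rng: random.Random) -> Chromosome:
    """Assign each record to one of up to max_clusters groups at random."""
    groups: Chromosome = [set() for _ in range(max_clusters)]
    for record in range(n_records):
        groups[rng.randrange(max_clusters)].add(record)
    # Keep only non-empty groups, so the gene count is the actual cluster count.
    return [g for g in groups if g]


def exchange_cluster(receiver: Chromosome, donor: Chromosome, rng: random.Random) -> Chromosome:
    """Illustrative grouping crossover (an assumption, not EvoCluster's own
    operator): copy one whole cluster from donor into a copy of receiver."""
    donated = set(rng.choice(donor))
    # Remove the donated records from the receiver's clusters so that each
    # record still belongs to exactly one cluster, then drop emptied clusters.
    child = [g - donated for g in receiver]
    child = [g for g in child if g]
    child.append(donated)
    return child


def is_partition(chromosome: Chromosome, n_records: int) -> bool:
    """Check that clusters are non-empty, disjoint, and cover every record."""
    seen: Set[int] = set()
    for group in chromosome:
        if not group or seen & group:
            return False
        seen |= group
    return seen == set(range(n_records))


if __name__ == "__main__":
    rng = random.Random(0)
    parent_a = random_chromosome(10, 3, rng)
    parent_b = random_chromosome(10, 4, rng)
    child = exchange_cluster(parent_a, parent_b, rng)
    print(len(parent_a), len(parent_b), len(child))
    print(is_partition(child, 10))  # True
```

Because the child receives a donated cluster while the receiver's remaining clusters shrink or disappear, its number of clusters can differ from either parent's. This loosely mirrors the abstract's point that the number of clusters need not be fixed in advance, though how EvoCluster itself achieves that is not described here.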