TY - GEN
T1 - Bi-clustering gene expression data using co-similarity
AU - Hussain, Syed Fawad
PY - 2011
Y1 - 2011
N2 - We propose a new framework for bi-clustering gene expression data based on the notion of co-similarity between genes and samples. Our work builds on a co-similarity framework that iteratively learns the similarity between the rows of a matrix using the similarity between its columns, and vice versa. The underlying task, usually referred to as bi-clustering in bioinformatics, aims to find groups of features that exhibit similar behavior across subsets of samples. The algorithm has previously been shown to work well for document clustering on sparse matrix representations. We propose a variation of the method suited to data that is represented as a dense matrix and is non-homogeneous, as is the case with gene expression data. Our experiments on real-world cancer datasets show that, with the proposed variations, the method is well suited to finding bi-clusters with a high degree of homogeneity.
AB - We propose a new framework for bi-clustering gene expression data based on the notion of co-similarity between genes and samples. Our work builds on a co-similarity framework that iteratively learns the similarity between the rows of a matrix using the similarity between its columns, and vice versa. The underlying task, usually referred to as bi-clustering in bioinformatics, aims to find groups of features that exhibit similar behavior across subsets of samples. The algorithm has previously been shown to work well for document clustering on sparse matrix representations. We propose a variation of the method suited to data that is represented as a dense matrix and is non-homogeneous, as is the case with gene expression data. Our experiments on real-world cancer datasets show that, with the proposed variations, the method is well suited to finding bi-clusters with a high degree of homogeneity.
KW - Bi-clustering
KW - Co-similarity
KW - Gene Expression Analysis
UR - http://www.scopus.com/inward/record.url?scp=84255176339&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-25853-4_15
DO - 10.1007/978-3-642-25853-4_15
M3 - Conference contribution
AN - SCOPUS:84255176339
SN - 9783642258527
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 190
EP - 200
BT - Advanced Data Mining and Applications - 7th International Conference, ADMA 2011, Proceedings
T2 - 7th International Conference on Advanced Data Mining and Applications, ADMA 2011
Y2 - 17 December 2011 through 19 December 2011
ER -