An improved co-similarity measure for document clustering

Syed Fawad Hussain; Gilles Bisson; Clément Grimal

doi:10.1109/ICMLA.2010.35

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Citations (Scopus)

Abstract

Co-clustering has been defined as a way to simultaneously organize subsets of instances and subsets of features in order to improve the clustering of both. In previous work [1], we proposed an efficient co-similarity measure that simultaneously computes two similarity matrices, one between objects and one between features, each built on the basis of the other. Here we propose a generalization of this approach by introducing a notion of pseudo-norm and a pruning algorithm. Our experiments show that this new algorithm significantly improves the accuracy of the results when using either supervised or unsupervised feature selection data, and that it outperforms other algorithms on various corpora.
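The abstract describes two similarity matrices, one between documents and one between words, each rebuilt from the other. A minimal sketch of that mutual-reinforcement idea in plain Python follows, under loudly stated assumptions: the diagonal normalization s_ij / sqrt(s_ii * s_jj) and the threshold-based `prune` helper are hypothetical stand-ins, not the pseudo-norm or the pruning algorithm defined in the paper, and every function name here is illustrative only.

```python
"""Hypothetical co-similarity sketch; NOT the paper's pseudo-norm or pruning method."""
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def normalize(s: Matrix) -> Matrix:
    """Assumed normalization: s_ij / sqrt(s_ii * s_jj), which sets the diagonal to 1."""
    n = len(s)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            denom = (s[i][i] * s[j][j]) ** 0.5
            out[i][j] = s[i][j] / denom if denom > 0 else 0.0
    return out


def prune(s: Matrix, threshold: float) -> Matrix:
    """Hypothetical pruning: zero out off-diagonal similarities below `threshold`."""
    return [[v if i == j or v >= threshold else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(s)]


def co_similarity(m: Matrix, iterations: int = 3,
                  threshold: float = 0.0) -> Tuple[Matrix, Matrix]:
    """Alternately rebuild document similarity R from word similarity C and vice versa.

    `m` is a documents x words matrix. The update order (R first, then C from the
    freshly computed R) is a choice made for this sketch.
    """
    n_words = len(m[0])
    # Start with each word similar only to itself (identity matrix).
    c: Matrix = [[1.0 if i == j else 0.0 for j in range(n_words)] for i in range(n_words)]
    mt = transpose(m)
    r: Matrix = []
    for _ in range(iterations):
        # Documents are similar if they contain similar words: R = M C M^T.
        r = prune(normalize(matmul(matmul(m, c), mt)), threshold)
        # Words are similar if they occur in similar documents: C = M^T R M.
        c = prune(normalize(matmul(matmul(mt, r), m)), threshold)
    return r, c


if __name__ == "__main__":
    # Toy corpus: docs 0 and 2 share no word, but both share a word with doc 1.
    toy = [[1, 1, 0, 0, 0], [0, 1, 1, 0, 0], [0, 0, 1, 1, 0], [0, 0, 0, 0, 1]]
    r1, _ = co_similarity(toy, iterations=1)
    r2, _ = co_similarity(toy, iterations=2)
    print(r1[0][2], r2[0][2] > 0)
```

On the toy matrix, one iteration leaves documents 0 and 2 at similarity zero because they share no word, while a second iteration gives them a positive similarity through the words they share with document 1. This kind of higher-order link is what co-similarity measures aim to capture; the isolated document 3 stays at zero throughout.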

Original language	English
Title of host publication	Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010
Pages	190-197
Number of pages	8
DOIs	https://doi.org/10.1109/ICMLA.2010.35
Publication status	Published - 2010
Event	9th International Conference on Machine Learning and Applications, ICMLA 2010, Washington, DC, United States
Duration	12 Dec 2010 → 14 Dec 2010

Publication series

Name	Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010

Conference

Conference	9th International Conference on Machine Learning and Applications, ICMLA 2010
Country/Territory	United States
City	Washington, DC
Period	12/12/10 → 14/12/10

Keywords

Co-clustering
Similarity measure
Text mining

ASJC Scopus subject areas

Computer Science Applications
Human-Computer Interaction

Cite this

@inproceedings{3bd5b6becbf9496094866ee2c7187cfd,
  title = "An improved co-similarity measure for document clustering",
  abstract = "Co-clustering has been defined as a way to organize simultaneously subsets of instances and subsets of features in order to improve the clustering of both of them. In previous work [1], we proposed an efficient co-similarity measure allowing to simultaneously compute two similarity matrices between objects and features, each built on the basis of the other. Here we propose a generalization of this approach by introducing a notion of pseudo-norm and a pruning algorithm. Our experiments show that this new algorithm significantly improves the accuracy of the results when using either supervised or unsupervised feature selection data and that it outperforms other algorithms on various corpora.",
  keywords = "Co-clustering, Similarity measure, Text mining",
  author = "Hussain, {Syed Fawad} and Gilles Bisson and Cl{\'e}ment Grimal",
  year = "2010",
  doi = "10.1109/ICMLA.2010.35",
  language = "English",
  isbn = "9780769543000",
  series = "Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010",
  pages = "190--197",
  booktitle = "Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010",
  note = "9th International Conference on Machine Learning and Applications, ICMLA 2010 ; Conference date: 12-12-2010 Through 14-12-2010",
}

Hussain, SF, Bisson, G & Grimal, C 2010, An improved co-similarity measure for document clustering. in Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010, 5708832, Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010, pp. 190-197, 9th International Conference on Machine Learning and Applications, ICMLA 2010, Washington, DC, United States, 12/12/10. https://doi.org/10.1109/ICMLA.2010.35

An improved co-similarity measure for document clustering. / Hussain, Syed Fawad; Bisson, Gilles; Grimal, Clément.
Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010. 2010. p. 190-197, 5708832 (Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010).

TY  - GEN
T1  - An improved co-similarity measure for document clustering
AU  - Hussain, Syed Fawad
AU  - Bisson, Gilles
AU  - Grimal, Clément
PY  - 2010
Y1  - 2010
N2  - Co-clustering has been defined as a way to organize simultaneously subsets of instances and subsets of features in order to improve the clustering of both of them. In previous work [1], we proposed an efficient co-similarity measure allowing to simultaneously compute two similarity matrices between objects and features, each built on the basis of the other. Here we propose a generalization of this approach by introducing a notion of pseudo-norm and a pruning algorithm. Our experiments show that this new algorithm significantly improves the accuracy of the results when using either supervised or unsupervised feature selection data and that it outperforms other algorithms on various corpora.
AB  - Co-clustering has been defined as a way to organize simultaneously subsets of instances and subsets of features in order to improve the clustering of both of them. In previous work [1], we proposed an efficient co-similarity measure allowing to simultaneously compute two similarity matrices between objects and features, each built on the basis of the other. Here we propose a generalization of this approach by introducing a notion of pseudo-norm and a pruning algorithm. Our experiments show that this new algorithm significantly improves the accuracy of the results when using either supervised or unsupervised feature selection data and that it outperforms other algorithms on various corpora.
KW  - Co-clustering
KW  - Similarity measure
KW  - Text mining
UR  - http://www.scopus.com/inward/record.url?scp=79952369097&partnerID=8YFLogxK
U2  - 10.1109/ICMLA.2010.35
DO  - 10.1109/ICMLA.2010.35
M3  - Conference contribution
AN  - SCOPUS:79952369097
SN  - 9780769543000
T3  - Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010
SP  - 190
EP  - 197
BT  - Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010
T2  - 9th International Conference on Machine Learning and Applications, ICMLA 2010
Y2  - 12 December 2010 through 14 December 2010
ER  -
