An improved co-similarity measure for document clustering

Syed Fawad Hussain, Gilles Bisson, Clément Grimal

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Citations (Scopus)

Abstract

Co-clustering has been defined as a way to simultaneously organize subsets of instances and subsets of features in order to improve the clustering of both. In previous work [1], we proposed an efficient co-similarity measure that simultaneously computes two similarity matrices, one between objects and one between features, each built on the basis of the other. Here we propose a generalization of this approach by introducing a notion of pseudo-norm and a pruning algorithm. Our experiments show that this new algorithm significantly improves the accuracy of the results on data prepared with either supervised or unsupervised feature selection, and that it outperforms other algorithms on various corpora.
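
The idea of two similarity matrices, each built from the other, can be illustrated with a minimal sketch. This is not the authors' formulation: the update rule, the normalisation, the way the exponent k stands in for the pseudo-norm, and the quantile-based pruning are all assumptions made for illustration, as are the names co_similarity, _normalise, _prune, iterations, k and prune. The input M is assumed to be a non-negative document-by-term matrix.

```python
import numpy as np


def _normalise(S, N):
    # Divide element-wise, leaving 0 where the normaliser is 0 (empty rows/columns).
    out = np.divide(S, N, out=np.zeros_like(S), where=N > 0)
    np.fill_diagonal(out, 1.0)  # every item is fully similar to itself
    return out


def _prune(S, fraction):
    # Assumed pruning rule: zero off-diagonal values below the `fraction` quantile.
    off_diag = ~np.eye(S.shape[0], dtype=bool)
    if fraction <= 0 or not off_diag.any():
        return S
    threshold = np.quantile(S[off_diag], fraction)
    pruned = S.copy()
    pruned[off_diag & (pruned < threshold)] = 0.0
    return pruned


def co_similarity(M, iterations=4, k=1.0, prune=0.0):
    """Illustrative sketch: alternate row and column similarity updates.

    M          non-negative (documents x terms) matrix
    iterations number of alternating updates (hypothetical default)
    k          exponent standing in for the pseudo-norm (assumption)
    prune      quantile in [0, 1] of similarities to zero out (assumption)
    """
    M = np.asarray(M, dtype=float)
    Mk = M ** k  # element-wise power
    row_mass = Mk.sum(axis=1) ** (1.0 / k)
    col_mass = Mk.sum(axis=0) ** (1.0 / k)
    NR = np.outer(row_mass, row_mass)  # normaliser for document pairs
    NC = np.outer(col_mass, col_mass)  # normaliser for term pairs

    SR = np.eye(M.shape[0])  # document-document similarity
    SC = np.eye(M.shape[1])  # term-term similarity
    for _ in range(iterations):
        # Each matrix is recomputed from the other one's previous value.
        SR_next = _normalise((Mk @ SC @ Mk.T) ** (1.0 / k), NR)
        SC_next = _normalise((Mk.T @ SR @ Mk) ** (1.0 / k), NC)
        SR, SC = _prune(SR_next, prune), _prune(SC_next, prune)
    return SR, SC
```

In this sketch, two documents can end up similar without sharing any term, provided the terms they use have themselves become similar in an earlier iteration. The resulting document similarity matrix could then be passed to any standard clustering method.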

Original language: English
Title of host publication: Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010
Pages: 190-197
Number of pages: 8
Publication status: Published - 2010
Event: 9th International Conference on Machine Learning and Applications, ICMLA 2010 - Washington, DC, United States
Duration: 12 Dec 2010 → 14 Dec 2010

Publication series

Name: Proceedings - 9th International Conference on Machine Learning and Applications, ICMLA 2010

Conference

Conference: 9th International Conference on Machine Learning and Applications, ICMLA 2010
Country/Territory: United States
City: Washington, DC
Period: 12/12/10 → 14/12/10

Keywords

  • Co-clustering
  • Similarity measure
  • Text mining

ASJC Scopus subject areas

  • Computer Science Applications
  • Human-Computer Interaction
