A modified fuzzy clustering for documents retrieval: Application to document categorization

S. Nefti; M. Oussalah; Y. Rezgui

doi:10.1057/palgrave.jors.2602555

Electronic, Electrical and Systems Engineering

Research output: Contribution to journal › Article › peer-review

9 Citations (Scopus)

Abstract

The paper advocates the use of a new fuzzy-based clustering algorithm for document categorization. Each document/datum is represented as a fuzzy set, so the fuzzy clustering algorithm is additionally constrained in order to cluster fuzzy sets. A distance measure is then needed to detect the overlap between documents and the cluster prototype (category). For this purpose, we use one of the interclass probabilistic separability measures, the Bhattacharyya distance, which is incorporated into the general scheme of the fuzzy c-means algorithm to measure the overlap between fuzzy sets. This introduces fuzziness into document clustering, in the sense that a single document may belong to more than one category. This is in line with the multiple semantic interpretations conveyed by single words, which support membership in several classes. The performance of the algorithms is illustrated using a case study from the construction sector.
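The abstract describes the method only at a high level. The following is a minimal Python sketch, under stated assumptions, of fuzzy c-means with the Bhattacharyya distance $D_B(p, q) = -\ln \sum_t \sqrt{p_t q_t}$ used as the document-to-prototype dissimilarity; it is not the authors' algorithm. Assumptions not taken from the paper: each document's fuzzy set is a vector of term membership degrees rescaled to sum to 1 so that the Bhattacharyya coefficient applies; prototypes are updated as the membership-weighted mean of those distributions, which is a heuristic rather than an exact minimizer of the objective; memberships follow the standard FCM rule $u_{ik} = 1 / \sum_j (D_{ik}/D_{jk})^{1/(m-1)}$; and all function names are hypothetical. The Bhattacharyya distance does not satisfy the triangle inequality, so it acts as a dissimilarity rather than a metric in the strict sense.

```python
import numpy as np


def bhattacharyya_distance(p, q, eps=1e-12):
    """D_B(p, q) = -ln(sum_t sqrt(p_t * q_t)) for discrete distributions p, q."""
    bc = np.sum(np.sqrt(p * q), axis=-1)
    # Clip so disjoint supports give a large finite distance instead of inf.
    return -np.log(np.clip(bc, eps, 1.0))


def to_distribution(memberships):
    """Assumption: rescale each fuzzy set's membership vector to sum to 1.

    Every row must contain at least one non-zero membership degree.
    """
    memberships = np.asarray(memberships, dtype=float)
    return memberships / memberships.sum(axis=1, keepdims=True)


def fuzzy_c_means_bhattacharyya(memberships, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Hypothetical FCM variant using the Bhattacharyya distance.

    memberships: (n_docs, n_terms) array of term membership degrees in [0, 1].
    Returns (U, V): U is the (c, n_docs) partition matrix, V the (c, n_terms) prototypes.
    """
    X = to_distribution(memberships)
    rng = np.random.default_rng(seed)
    U = rng.random((c, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)  # each document's memberships sum to 1
    for _ in range(n_iter):
        W = U ** m
        # Heuristic prototype update: weighted mean of the document distributions,
        # which is itself a valid distribution (not an exact minimizer for D_B).
        V = (W @ X) / W.sum(axis=1, keepdims=True)
        # D[i, k] = Bhattacharyya distance between prototype i and document k;
        # the small offset avoids division by zero when a document equals a prototype.
        D = bhattacharyya_distance(V[:, None, :], X[None, :, :]) + 1e-12
        # Standard FCM membership update with D in place of the squared distance.
        ratio = (D[:, None, :] / D[None, :, :]) ** (1.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.max(np.abs(U_new - U)) < tol
        U = U_new
        if converged:
            break
    return U, V


# Toy example (hypothetical data): four documents over four terms, two topics.
docs = np.array([
    [0.9, 0.8, 0.0, 0.1],
    [0.8, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.8, 0.9],
])
U, V = fuzzy_c_means_bhattacharyya(docs, c=2)
print(np.round(U, 2))
```

Each column of `U` sums to 1, so a document can hold substantial membership in more than one category, which is the multiple-membership behaviour the abstract describes. Cluster indices depend on the random initialization, so only the grouping, not the label numbers, is meaningful.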

Original language	English
Pages (from-to)	384-394
Number of pages	11
Journal	Journal of the Operational Research Society
Volume	60
Issue number	3
DOIs	https://doi.org/10.1057/palgrave.jors.2602555
Publication status	Published - 1 Mar 2009

Cite this

@article{f81699602a0345439a3651276916976a,

title = "A modified fuzzy clustering for documents retrieval: Application to document categorization",

abstract = "The paper advocates the use of a new fuzzy-based clustering algorithm for document categorization. Each document/datum is represented as a fuzzy set, so the fuzzy clustering algorithm is additionally constrained in order to cluster fuzzy sets. A distance measure is then needed to detect the overlap between documents and the cluster prototype (category). For this purpose, we use one of the interclass probabilistic separability measures, the Bhattacharyya distance, which is incorporated into the general scheme of the fuzzy c-means algorithm to measure the overlap between fuzzy sets. This introduces fuzziness into document clustering, in the sense that a single document may belong to more than one category. This is in line with the multiple semantic interpretations conveyed by single words, which support membership in several classes. The performance of the algorithms is illustrated using a case study from the construction sector.",

author = "S. Nefti and M. Oussalah and Y. Rezgui",

year = "2009",

month = mar,

day = "1",

doi = "10.1057/palgrave.jors.2602555",

language = "English",

volume = "60",

pages = "384--394",

journal = "Journal of the Operational Research Society",

issn = "0160-5682",

publisher = "Palgrave Macmillan",

number = "3",

}

TY  - JOUR
T1  - A modified fuzzy clustering for documents retrieval: Application to document categorization
AU  - Nefti, S.
AU  - Oussalah, M.
AU  - Rezgui, Y.
PY  - 2009/3/1
Y1  - 2009/3/1
AB  - The paper advocates the use of a new fuzzy-based clustering algorithm for document categorization. Each document/datum is represented as a fuzzy set, so the fuzzy clustering algorithm is additionally constrained in order to cluster fuzzy sets. A distance measure is then needed to detect the overlap between documents and the cluster prototype (category). For this purpose, we use one of the interclass probabilistic separability measures, the Bhattacharyya distance, which is incorporated into the general scheme of the fuzzy c-means algorithm to measure the overlap between fuzzy sets. This introduces fuzziness into document clustering, in the sense that a single document may belong to more than one category. This is in line with the multiple semantic interpretations conveyed by single words, which support membership in several classes. The performance of the algorithms is illustrated using a case study from the construction sector.
UR  - http://www.scopus.com/inward/record.url?partnerID=yv4JPVwI&eid=2-s2.0-60749083330&md5=a53270215678a323a8151471e8cc4254
U2  - 10.1057/palgrave.jors.2602555
DO  - 10.1057/palgrave.jors.2602555
M3  - Article
AN  - SCOPUS:60749083330
SN  - 0160-5682
VL  - 60
SP  - 384
EP  - 394
JO  - Journal of the Operational Research Society
JF  - Journal of the Operational Research Society
IS  - 3
ER  -
