TY - JOUR
T1 - A modified fuzzy clustering for documents retrieval:
T2 - Application to document categorization
AU - Nefti, S.
AU - Oussalah, M.
AU - Rezgui, Y.
N1 - Copyright 2009 Elsevier B.V., All rights reserved.
PY - 2009/3/1
Y1 - 2009/3/1
N2 - The paper advocates the use of a new fuzzy-based clustering algorithm for document categorization. Each document/datum is represented as a fuzzy set, so the fuzzy clustering algorithm is additionally constrained in order to cluster fuzzy sets. One then needs a metric to measure the overlap between documents and the cluster prototype (category). For this purpose, we use one of the interclass probabilistic separability measures, known as the Bhattacharyya distance, which is incorporated into the general scheme of the fuzzy c-means algorithm to measure the overlap between fuzzy sets. This introduces fuzziness into document clustering in the sense that a single document may belong to more than one category. This is in line with the multiple semantic interpretations conveyed by single words, which support membership in several classes. The performance of the algorithms is illustrated using a case study from the construction sector.
AB - The paper advocates the use of a new fuzzy-based clustering algorithm for document categorization. Each document/datum is represented as a fuzzy set, so the fuzzy clustering algorithm is additionally constrained in order to cluster fuzzy sets. One then needs a metric to measure the overlap between documents and the cluster prototype (category). For this purpose, we use one of the interclass probabilistic separability measures, known as the Bhattacharyya distance, which is incorporated into the general scheme of the fuzzy c-means algorithm to measure the overlap between fuzzy sets. This introduces fuzziness into document clustering in the sense that a single document may belong to more than one category. This is in line with the multiple semantic interpretations conveyed by single words, which support membership in several classes. The performance of the algorithms is illustrated using a case study from the construction sector.
UR - http://www.scopus.com/inward/record.url?partnerID=yv4JPVwI&eid=2-s2.0-60749083330&md5=a53270215678a323a8151471e8cc4254
U2 - 10.1057/palgrave.jors.2602555
DO - 10.1057/palgrave.jors.2602555
M3 - Article
AN - SCOPUS:60749083330
SN - 0160-5682
VL - 60
SP - 384
EP - 394
JO - Operational Research Society. Journal
JF - Operational Research Society. Journal
IS - 3
ER -