A medoid-based weighting scheme for nearest-neighbor decision rule toward effective text categorization

Avideep Mukherjee; Tanmay Basu

doi:10.1007/s42452-020-2738-8

Cancer and Genomic Sciences

Research output: Contribution to journal › Article › peer-review

Abstract

The k-nearest-neighbor (kNN) decision rule is a simple and robust classifier for text categorization. The performance of the kNN decision rule depends heavily on the value of the neighborhood parameter k. The method categorizes a test document even when the numbers of neighbors belonging to two competing categories differ by only one. Hence, the choice of k is crucial, as different values of k can change the result of text categorization. Moreover, text categorization is a challenging task because text data are generally sparse and high dimensional. Note that assigning a document to a predefined category for an arbitrary value of k may not be accurate when there is no bound on the margin of majority voting. A method is thus proposed in the spirit of the nearest-neighbor decision rule, using a medoid-based weighting scheme to deal with these issues. In decision making, the method puts more weight on the training documents that lie close not only to the test document but also to the medoid of their corresponding category, unlike the standard nearest-neighbor algorithms, which stress the documents that are merely close to the test document. The aim of the proposed classifier is to improve the quality of decision making. The empirical results show that the proposed method performs better than several standard nearest-neighbor decision rules and the support vector machine classifier on various well-known text collections in terms of macro- and micro-averaged F-measure.
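The weighting idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the actual weighting formula, so everything specific here is an assumption made for illustration: the use of cosine similarity, the definition of the medoid as the class member with the highest total similarity to its class, and the weight taken as the product of the two similarities. The function names and the toy data are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def class_medoids(docs, labels):
    """Map each label to its medoid: the member most similar, in total, to its class."""
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)
    return {
        label: max(members, key=lambda d: sum(cosine(d, other) for other in members))
        for label, members in by_class.items()
    }


def medoid_weighted_knn(docs, labels, query, k=3):
    """Classify `query` by a weighted vote among its k most similar training documents."""
    medoids = class_medoids(docs, labels)
    ranked = sorted(range(len(docs)), key=lambda i: cosine(docs[i], query), reverse=True)
    scores = defaultdict(float)
    for i in ranked[:k]:
        # ASSUMED weight (not taken from the paper): similarity to the query
        # multiplied by similarity to the medoid of the neighbor's own category.
        weight = cosine(docs[i], query) * cosine(docs[i], medoids[labels[i]])
        scores[labels[i]] += weight
    return max(scores, key=scores.get)


# Hypothetical toy term-frequency vectors over a four-word vocabulary.
DOCS = [
    [3, 0, 1, 0], [2, 1, 0, 0], [3, 1, 1, 0],  # "sport"
    [0, 0, 1, 3], [0, 1, 0, 2], [0, 1, 1, 3],  # "tech"
]
LABELS = ["sport", "sport", "sport", "tech", "tech", "tech"]

print(medoid_weighted_knn(DOCS, LABELS, [2, 0, 1, 0], k=3))
```

Under this assumed weighting, a neighbor that is close to the test document but far from the medoid of its own category contributes less to the vote than it would under plain majority voting, which is the behavior the abstract describes.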

Original language	English
Article number	1009
Journal	SN Applied Sciences
Volume	2
Issue number	6
Early online date	4 May 2020
DOIs	https://doi.org/10.1007/s42452-020-2738-8
Publication status	Published - 1 Jun 2020

Access to Document

10.1007/s42452-020-2738-8 (Licence: Creative Commons: Attribution (CC BY))

Mukherjee_Basu_2020_A_medoid-based_weighting_scheme_SN_Applied_Sciences (Final published version, 1.4 MB; Licence: Creative Commons: Attribution (CC BY))

http://link.springer.com/10.1007/s42452-020-2738-8 (Licence: Creative Commons: Attribution (CC BY))

Cite this

@article{1227e80183aa47d6b03485ad32dea597,
title = "A medoid-based weighting scheme for nearest-neighbor decision rule toward effective text categorization",
abstract = "The k-nearest-neighbor (kNN) decision rule is a simple and robust classifier for text categorization. The performance of the kNN decision rule depends heavily on the value of the neighborhood parameter k. The method categorizes a test document even when the numbers of neighbors belonging to two competing categories differ by only one. Hence, the choice of k is crucial, as different values of k can change the result of text categorization. Moreover, text categorization is a challenging task because text data are generally sparse and high dimensional. Note that assigning a document to a predefined category for an arbitrary value of k may not be accurate when there is no bound on the margin of majority voting. A method is thus proposed in the spirit of the nearest-neighbor decision rule, using a medoid-based weighting scheme to deal with these issues. In decision making, the method puts more weight on the training documents that lie close not only to the test document but also to the medoid of their corresponding category, unlike the standard nearest-neighbor algorithms, which stress the documents that are merely close to the test document. The aim of the proposed classifier is to improve the quality of decision making. The empirical results show that the proposed method performs better than several standard nearest-neighbor decision rules and the support vector machine classifier on various well-known text collections in terms of macro- and micro-averaged F-measure.",
author = "Avideep Mukherjee and Tanmay Basu",
year = "2020",
month = jun,
day = "1",
doi = "10.1007/s42452-020-2738-8",
language = "English",
volume = "2",
journal = "SN Applied Sciences",
issn = "2523-3963",
publisher = "Springer",
number = "6",
}

TY - JOUR
T1 - A medoid-based weighting scheme for nearest-neighbor decision rule toward effective text categorization
AU - Mukherjee, Avideep
AU - Basu, Tanmay
PY - 2020/6/1
Y1 - 2020/6/1
N2 - The k-nearest-neighbor (kNN) decision rule is a simple and robust classifier for text categorization. The performance of the kNN decision rule depends heavily on the value of the neighborhood parameter k. The method categorizes a test document even when the numbers of neighbors belonging to two competing categories differ by only one. Hence, the choice of k is crucial, as different values of k can change the result of text categorization. Moreover, text categorization is a challenging task because text data are generally sparse and high dimensional. Note that assigning a document to a predefined category for an arbitrary value of k may not be accurate when there is no bound on the margin of majority voting. A method is thus proposed in the spirit of the nearest-neighbor decision rule, using a medoid-based weighting scheme to deal with these issues. In decision making, the method puts more weight on the training documents that lie close not only to the test document but also to the medoid of their corresponding category, unlike the standard nearest-neighbor algorithms, which stress the documents that are merely close to the test document. The aim of the proposed classifier is to improve the quality of decision making. The empirical results show that the proposed method performs better than several standard nearest-neighbor decision rules and the support vector machine classifier on various well-known text collections in terms of macro- and micro-averaged F-measure.
U2 - 10.1007/s42452-020-2738-8
DO - 10.1007/s42452-020-2738-8
M3 - Article
SN - 2523-3963
VL - 2
JO - SN Applied Sciences
JF - SN Applied Sciences
IS - 6
M1 - 1009
ER -
