A medoid-based weighting scheme for nearest-neighbor decision rule toward effective text categorization

Research output: Contribution to journalArticlepeer-review

Authors

Colleges, School and Institutes

Abstract

The k-nearest-neighbor (kNN) decision rule is a simple and robust classifier for text categorization. The performance of kNN decision rule depends heavily upon the value of the neighborhood parameter k. The method categorize a test document even if the difference between the number of members of two competing categories is one. Hence, choice of k is crucial as different values of k can change the result of text categorization. Moreover, text categorization is a challenging task as the text data are generally sparse and high dimensional. Note that, assigning a document to a predefined category for an arbitrary value of k may not be accurate when there is no bound on the margin of majority voting. A method is thus proposed in spirit of the nearest-neighbor decision rule using a medoid-based weighting scheme to deal with these issues. The method puts more weightage on the training documents that are not only lie close to the test document but also lie close to the medoid of its corresponding category in decision making, unlike the standard nearest-neighbor algorithms that stress on the documents that are just close to the test document. The aim of the proposed classifier is to enrich the quality of decision making. The empirical results show that the proposed method performs better than different standard nearest-neighbor decision rules and support vector machine classifier using various well-known text collections in terms of macro- and micro-averaged f-measure.

Details

Original languageEnglish
Article number1009
JournalSN Applied Sciences
Volume2
Issue number6
Early online date4 May 2020
Publication statusPublished - 1 Jun 2020