# Algorithms and hardness results for nearest neighbor problems in bicolored point sets

Research output: Chapter in Book/Report/Conference proceedingConference contribution

## External organisations

• Indian Statistical Institute, Kolkata
• Ben-Gurion University of the Negev
• University of Warwick

## Abstract

In the context of computational supervised learning, the primary objective is the classification of data. Especially, the goal is to provide the system with “training” data and design a method which uses the training data to classify new objects with the correct label. A standard scenario is that the examples are points from a metric space, and “nearby” points should have “similar” labels. In practice, it is desirable to reduce the size of the training set without compromising too much on the ability to correctly label new objects. Such subsets of the training data are called as edited sets. Wilfong [SOCG ’91] defined two types of edited subsets: consistent subsets (those which correctly label all objects from the training data) and selective subsets (those which correctly label all new objects the same way as the original training data). This leads to the following two optimization problems: k-MCS-(X) Given k sets of points P 1 , P 2 , …, P k in a metric space X, the goal is to choose subsets of points Pi′⊆P i for i= 1, 2, …, k such that ∀p∈P i its nearest neighbor among (Formula Presented) lies in P i ′ for each i∈ [k] while minimizing (Note that we also enforce the condition (Formula Presented) the quantity (Formula Presented) - k-MSS-(X): Given k sets of points P 1 , P 2 , …, P k in a metric space X, the goal is to choose subsets of points Pi′⊆P i for i= 1, 2, …, k such that ∀p∈Pi its nearest neighbor among (Formula Presented) lies in Pi′ for each i∈ [k] while minimizing (Note that we again enforce the condition |P i ′|≥1∀i∈[k].) the quantity (Formula Presented). While there have been several heuristics proposed for these two problems in the computer vision and machine learning community, the only theoretical results for these problems (to the best of our knowledge) are due to Wilfong [SOCG ’91] who showed that both 3-MCS-(ℝ 2 ) and 2-MSS-(ℝ 2 ) are NP-complete. We initiate the study of these two problems from a theoretical perspective, and obtain several algorithmic and hardness results. On the algorithmic side, we first design an O(n 2 ) time exact algorithm and O(nlog n) time 2-approximation for the 2-MCS-(R) problem, i.e., the points are located on the real line. Moreover, we show that the exact algorithm also extends to the case when the points are located on the circumference of a circle. Next, we design an O(r 2 ) time online algorithm for the 2-MCS-(R) problem such that r< n, where n is the set of points and r is an integer. Finally, we give a PTAS for the k-MSS-(ℝ 2 ) problem. On the hardness side, we show that both the 2-MCS and 2-MSS problems are NP-complete on graphs. Additionally, the problems are W[2]-hard parameterized by the size k of the solution. For points on the Euclidean plane, we show that the 2-MSS problem is contained in W[1]. Finally, we show a lower bound of Ω(√n) bits for the storage of any (randomized) algorithm which solves both 2-MCS-(ℝ) and 2-MSS-(ℝ).

## Details

Original language English LATIN 2018 -Theoretical Informatics 13th Latin American Symposium, Buenos Aires, Argentina, April 16-19, 2018, Proceedings Miguel A. Mosteiro, Michael A. Bender, Martin Farach-Colton Published - 13 Mar 2018 13th International Symposium on Latin American Theoretical Informatics, LATIN 2018 - Buenos Aires, ArgentinaDuration: 16 Apr 2018 → 19 Apr 2018

### Publication series

Name Lecture Notes in Computer Science Springer 10807 0302-9743 1611-3349

### Conference

Conference 13th International Symposium on Latin American Theoretical Informatics, LATIN 2018 Argentina Buenos Aires 16/04/18 → 19/04/18