Algorithms and hardness results for nearest neighbor problems in bicolored point sets

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Authors

Colleges, School and Institutes

External organisations

  • Indian Statistical Institute, Kolkata
  • Ben-Gurion University of the Negev
  • University of Warwick

Abstract

In the context of computational supervised learning, the primary objective is the classification of data. Especially, the goal is to provide the system with “training” data and design a method which uses the training data to classify new objects with the correct label. A standard scenario is that the examples are points from a metric space, and “nearby” points should have “similar” labels. In practice, it is desirable to reduce the size of the training set without compromising too much on the ability to correctly label new objects. Such subsets of the training data are called as edited sets. Wilfong [SOCG ’91] defined two types of edited subsets: consistent subsets (those which correctly label all objects from the training data) and selective subsets (those which correctly label all new objects the same way as the original training data). This leads to the following two optimization problems: k-MCS-(X) Given k sets of points P 1 , P 2 , …, P k in a metric space X, the goal is to choose subsets of points Pi′⊆P i for i= 1, 2, …, k such that ∀p∈P i its nearest neighbor among (Formula Presented) lies in P i ′ for each i∈ [k] while minimizing (Note that we also enforce the condition (Formula Presented) the quantity (Formula Presented) - k-MSS-(X): Given k sets of points P 1 , P 2 , …, P k in a metric space X, the goal is to choose subsets of points Pi′⊆P i for i= 1, 2, …, k such that ∀p∈Pi its nearest neighbor among (Formula Presented) lies in Pi′ for each i∈ [k] while minimizing (Note that we again enforce the condition |P i ′|≥1∀i∈[k].) the quantity (Formula Presented). While there have been several heuristics proposed for these two problems in the computer vision and machine learning community, the only theoretical results for these problems (to the best of our knowledge) are due to Wilfong [SOCG ’91] who showed that both 3-MCS-(ℝ 2 ) and 2-MSS-(ℝ 2 ) are NP-complete. We initiate the study of these two problems from a theoretical perspective, and obtain several algorithmic and hardness results. On the algorithmic side, we first design an O(n 2 ) time exact algorithm and O(nlog n) time 2-approximation for the 2-MCS-(R) problem, i.e., the points are located on the real line. Moreover, we show that the exact algorithm also extends to the case when the points are located on the circumference of a circle. Next, we design an O(r 2 ) time online algorithm for the 2-MCS-(R) problem such that r< n, where n is the set of points and r is an integer. Finally, we give a PTAS for the k-MSS-(ℝ 2 ) problem. On the hardness side, we show that both the 2-MCS and 2-MSS problems are NP-complete on graphs. Additionally, the problems are W[2]-hard parameterized by the size k of the solution. For points on the Euclidean plane, we show that the 2-MSS problem is contained in W[1]. Finally, we show a lower bound of Ω(√n) bits for the storage of any (randomized) algorithm which solves both 2-MCS-(ℝ) and 2-MSS-(ℝ).

Details

Original languageEnglish
Title of host publicationLATIN 2018 -Theoretical Informatics
Subtitle of host publication13th Latin American Symposium, Buenos Aires, Argentina, April 16-19, 2018, Proceedings
EditorsMiguel A. Mosteiro, Michael A. Bender, Martin Farach-Colton
Publication statusPublished - 13 Mar 2018
Event13th International Symposium on Latin American Theoretical Informatics, LATIN 2018 - Buenos Aires, Argentina
Duration: 16 Apr 201819 Apr 2018

Publication series

NameLecture Notes in Computer Science
PublisherSpringer
Volume10807
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference13th International Symposium on Latin American Theoretical Informatics, LATIN 2018
Country/TerritoryArgentina
CityBuenos Aires
Period16/04/1819/04/18