A New Look at Nearest Neighbours: Identifying Benign Input Geometries via Random Projections

Ata Kaban

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

It is well known that in general, the nearest neighbour rule (NN) has sample complexity that is exponential in the input space dimension d when only smoothness is assumed on the label posterior function. Here we consider NN on randomly projected data, and we show that, if the input domain has a small "metric size", then the sample complexity becomes exponential in the metric entropy integral of the set of normalised chords of the input domain. This metric entropy integral measures the complexity of the input domain, and can be much smaller than d – for instance in cases when the data lies in a linear or a smooth nonlinear subspace of the ambient space, or when it has a sparse representation. We then show that the guarantees we obtain for the compressive NN also hold for the dataspace NN in bounded domains; thus the random projection takes the role of an analytic tool to identify benign structures under which NN learning is possible from a small sample size. Numerical simulations on data designed to have intrinsically low complexity confirm our theoretical findings, and display a striking agreement in the empirical performances of compressive NN and dataspace NN. This suggests that high dimensional data sets that have a low complexity underlying structure are well suited for computationally cheap compressive NN learning.
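
As a rough illustration of the comparison described in the abstract, the sketch below runs a 1-nearest-neighbour rule both in the original data space and after a Gaussian random projection, on synthetic points lying in a low-dimensional linear subspace. It is not the paper's experimental setup: the ambient dimension `d`, subspace dimension `s`, projection dimension `k`, sample sizes, the Gaussian projection matrix and the labelling rule are all assumptions chosen for illustration, and the helper names are hypothetical.

```python
import random


def gaussian_matrix(rows, cols, rng, std=1.0):
    """Return a rows x cols matrix (list of rows) with i.i.d. N(0, std^2) entries."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


def project(matrix, x):
    """Multiply a matrix (list of rows) by the vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in matrix]


def nn_predict(train_X, train_y, x):
    """1-NN rule: return the label of the training point closest to x."""
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(p, x))
    nearest = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i]))
    return train_y[nearest]


def accuracy(train_X, train_y, test_X, test_y):
    """Fraction of test points whose 1-NN prediction matches the true label."""
    hits = sum(nn_predict(train_X, train_y, x) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def make_data(n, embed, s, rng):
    """Sample n points in an s-dimensional linear subspace of R^d.

    embed is a d x s matrix mapping latent coordinates into the ambient space.
    The label (sign of the first latent coordinate) is an illustrative choice.
    """
    X, y = [], []
    for _ in range(n):
        z = [rng.uniform(-1.0, 1.0) for _ in range(s)]
        X.append(project(embed, z))
        y.append(1 if z[0] > 0 else 0)
    return X, y


def run_demo(seed=0, d=50, s=3, k=10, n_train=300, n_test=100):
    """Return (dataspace NN accuracy, compressive NN accuracy) on synthetic data."""
    rng = random.Random(seed)
    embed = gaussian_matrix(d, s, rng)  # d x s embedding of the latent space
    train_X, train_y = make_data(n_train, embed, s, rng)
    test_X, test_y = make_data(n_test, embed, s, rng)

    # Dataspace NN: nearest neighbours computed in the ambient space R^d.
    acc_data = accuracy(train_X, train_y, test_X, test_y)

    # Compressive NN: project to R^k with a Gaussian matrix, then run NN there.
    R = gaussian_matrix(k, d, rng, std=k ** -0.5)  # k x d, entries N(0, 1/k)
    train_P = [project(R, x) for x in train_X]
    test_P = [project(R, x) for x in test_X]
    acc_comp = accuracy(train_P, train_y, test_P, test_y)
    return acc_data, acc_comp


if __name__ == "__main__":
    acc_data, acc_comp = run_demo()
    print("dataspace NN accuracy:  ", acc_data)
    print("compressive NN accuracy:", acc_comp)
```

The projection dimension `k` is deliberately larger than the subspace dimension `s` and much smaller than `d`, which is the regime the abstract describes as benign. Varying `s` and `k` in `run_demo` gives a quick feel for how the two classifiers compare.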

Original language	English
Title of host publication	ACML 2015 Proceedings
Publisher	JMLR
Pages	65-80
Number of pages	16
Volume	45
Publication status	Published - 25 Feb 2016
Event	7th Asian Conference on Machine Learning - Hong Kong, China Duration: 20 Nov 2015 → 22 Nov 2015

Publication series

Name	Proceedings of Machine Learning Research
Volume	45
ISSN (Print)	1938-7228

Conference

Conference	7th Asian Conference on Machine Learning
Country/Territory	China
City	Hong Kong
Period	20/11/15 → 22/11/15

Access to Document

ACML15-Nearest-Neighbour
Published as detailed above. Checked Feb 2016
Accepted author manuscript, 690 KB. Licence: None: All rights reserved

http://proceedings.mlr.press/v45/Kaban15b.pdf
Licence: None: All rights reserved

Cite this

@inproceedings{6c587de891a14815b94d7d6dacd2434d,

title = "A New Look at Nearest Neighbours: Identifying Benign Input Geometries via Random Projections",

abstract = "It is well known that in general, the nearest neighbour rule (NN) has sample complexity that is exponential in the input space dimension d when only smoothness is assumed on the label posterior function. Here we consider NN on randomly projected data, and we show that, if the input domain has a small ``metric size'', then the sample complexity becomes exponential in the metric entropy integral of the set of normalised chords of the input domain. This metric entropy integral measures the complexity of the input domain, and can be much smaller than d – for instance in cases when the data lies in a linear or a smooth nonlinear subspace of the ambient space, or when it has a sparse representation. We then show that the guarantees we obtain for the compressive NN also hold for the dataspace NN in bounded domains; thus the random projection takes the role of an analytic tool to identify benign structures under which NN learning is possible from a small sample size. Numerical simulations on data designed to have intrinsically low complexity confirm our theoretical findings, and display a striking agreement in the empirical performances of compressive NN and dataspace NN. This suggests that high dimensional data sets that have a low complexity underlying structure are well suited for computationally cheap compressive NN learning.",

author = "Ata Kaban",

year = "2016",

month = feb,

day = "25",

language = "English",

volume = "45",

series = "Proceedings of Machine Learning Research",

publisher = "JMLR",

pages = "65--80",

booktitle = "ACML 2015 Proceedings",

note = "7th Asian Conference on Machine Learning ; Conference date: 20-11-2015 Through 22-11-2015",

}

TY - GEN

T1 - A New Look at Nearest Neighbours: Identifying Benign Input Geometries via Random Projections

AU - Kaban, Ata

PY - 2016/2/25

Y1 - 2016/2/25

N2 - It is well known that in general, the nearest neighbour rule (NN) has sample complexity that is exponential in the input space dimension d when only smoothness is assumed on the label posterior function. Here we consider NN on randomly projected data, and we show that, if the input domain has a small "metric size", then the sample complexity becomes exponential in the metric entropy integral of the set of normalised chords of the input domain. This metric entropy integral measures the complexity of the input domain, and can be much smaller than d – for instance in cases when the data lies in a linear or a smooth nonlinear subspace of the ambient space, or when it has a sparse representation. We then show that the guarantees we obtain for the compressive NN also hold for the dataspace NN in bounded domains; thus the random projection takes the role of an analytic tool to identify benign structures under which NN learning is possible from a small sample size. Numerical simulations on data designed to have intrinsically low complexity confirm our theoretical findings, and display a striking agreement in the empirical performances of compressive NN and dataspace NN. This suggests that high dimensional data sets that have a low complexity underlying structure are well suited for computationally cheap compressive NN learning.

AB - It is well known that in general, the nearest neighbour rule (NN) has sample complexity that is exponential in the input space dimension d when only smoothness is assumed on the label posterior function. Here we consider NN on randomly projected data, and we show that, if the input domain has a small "metric size", then the sample complexity becomes exponential in the metric entropy integral of the set of normalised chords of the input domain. This metric entropy integral measures the complexity of the input domain, and can be much smaller than d – for instance in cases when the data lies in a linear or a smooth nonlinear subspace of the ambient space, or when it has a sparse representation. We then show that the guarantees we obtain for the compressive NN also hold for the dataspace NN in bounded domains; thus the random projection takes the role of an analytic tool to identify benign structures under which NN learning is possible from a small sample size. Numerical simulations on data designed to have intrinsically low complexity confirm our theoretical findings, and display a striking agreement in the empirical performances of compressive NN and dataspace NN. This suggests that high dimensional data sets that have a low complexity underlying structure are well suited for computationally cheap compressive NN learning.

M3 - Conference contribution

VL - 45

T3 - Proceedings of Machine Learning Research

SP - 65

EP - 80

BT - ACML 2015 Proceedings

PB - JMLR

T2 - 7th Asian Conference on Machine Learning

Y2 - 20 November 2015 through 22 November 2015

ER -
