Structure discovery in PAC-Learning by Random Projections

Ata Kaban*, Henry Reeve

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review


Abstract

High dimensional learning is data-hungry in general; however, many natural data sources and real-world learning problems possess some hidden low-complexity structure that permits effective learning from relatively small sample sizes. We are interested in the general question of how to discover and exploit such hidden benign traits when problem-specific prior knowledge is insufficient. In this work, we address this question through random projection’s ability to expose structure. We study both compressive learning and high dimensional learning from this angle by introducing the notions of compressive distortion and compressive complexity. We give user-friendly PAC bounds in the agnostic setting that are formulated in terms of these quantities, and we show that our bounds can be tight when these quantities are small. We then instantiate these quantities in several examples of particular learning problems, demonstrating their ability to discover interpretable structural characteristics that make high dimensional instances of these problems solvable to good approximation in a random linear subspace. In the examples considered, these turn out to resemble some familiar benign traits such as the margin, the margin distribution, the intrinsic dimension, the spectral decay of the data covariance, or the norms of parameters, while our general notions of compressive distortion and compressive complexity serve to unify these, and may be used to discover benign structural traits for other PAC-learnable problems.
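The core phenomenon the abstract appeals to, namely that a random linear subspace can retain the geometry of data with hidden low-complexity structure, can be illustrated with a minimal sketch (not taken from the paper; all names, dimensions, and the Gaussian projection choice are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_projection(X, k, rng):
    # Gaussian random projection, scaled so squared norms are
    # preserved in expectation (illustrative choice of projection).
    d = X.shape[1]
    R = rng.normal(size=(d, k)) / np.sqrt(k)
    return X @ R

# Synthetic data with hidden low-complexity structure:
# n points lying on a 5-dimensional subspace of a d = 1000 ambient space.
n, d, intrinsic = 30, 1000, 5
basis = rng.normal(size=(intrinsic, d))
X = rng.normal(size=(n, intrinsic)) @ basis

# Project into k = 50 dimensions -- far below d, above the intrinsic dimension.
k = 50
Xp = random_projection(X, k, rng)

# Pairwise distances before and after projection.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
Dp = np.linalg.norm(Xp[:, None, :] - Xp[None, :, :], axis=2)

# Off-diagonal distance ratios stay near 1: the geometry of the
# low-complexity data survives in the random linear subspace.
mask = ~np.eye(n, dtype=bool)
ratios = Dp[mask] / D[mask]
print(ratios.min(), ratios.max())
```

With a fixed seed the printed ratios concentrate around 1, which is the distortion-control behaviour that the paper's notions of compressive distortion and compressive complexity quantify in far greater generality.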
Original language: English
Journal: Machine Learning
Early online date: 26 Mar 2024
DOIs
Publication status: E-pub ahead of print, 26 Mar 2024

Bibliographical note

Funding:
This work was funded by EPSRC Fellowship EP/P004245/1.

Keywords

  • Random Projection
  • Generalization
  • Structure Discovery
