Sharper generalization bounds for learning with gradient-dominated objective functions

Yunwen Lei; Yiming Ying

Sharper generalization bounds for learning with gradient-dominated objective functions

Yunwen Lei, Yiming Ying

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Stochastic optimization has become the workhorse behind many successful machine learning applications, which motivates a lot of theoretical analysis to understand its empirical behavior. As a comparison, there is far less work to study the generalization behavior especially in a non-convex learning setting. In this paper, we study the generalization behavior of stochastic optimization by leveraging the algorithmic stability for learning with β-gradient-dominated objective functions. We develop generalization bounds of the order O(1/nβ)) plus the convergence rate of the optimization algorithm, where n is the sample size. Our stability analysis significantly improves the existing non-convex analysis by removing the bounded gradient assumption and implying better generalization bounds. We achieve this improvement by exploiting the smoothness of loss functions instead of the Lipschitz condition in Charles & Papailiopoulos (2018). We apply our general results to various stochastic optimization algorithms, which show clearly how the variance-reduction techniques improve not only training but also generalization. Furthermore, our discussion explains how interpolation helps generalization for highly expressive models.

Original language	English
Title of host publication	International Conference on Learning Representations
Subtitle of host publication	ICLR 2021
Publisher	OpenReview.net
Pages	1-23
Number of pages	23
Publication status	Published - 16 Jan 2021
Event	The Ninth International Conference on Learning Representations - Virtual Duration: 3 May 2021 → 7 May 2021

Conference

Conference	The Ninth International Conference on Learning Representations
Abbreviated title	ICLR 2021
Period	3/05/21 → 7/05/21

Keywords

generalization bounds
non-convex learning

Access to Document

https://openreview.net/forum?id=r28GdiQF7vM

Cite this

@inproceedings{bc6a97d3d4d84d29a25ca49f442e3772,

title = "Sharper generalization bounds for learning with gradient-dominated objective functions",

abstract = "Stochastic optimization has become the workhorse behind many successful machine learning applications, which motivates a lot of theoretical analysis to understand its empirical behavior. As a comparison, there is far less work to study the generalization behavior especially in a non-convex learning setting. In this paper, we study the generalization behavior of stochastic optimization by leveraging the algorithmic stability for learning with β-gradient-dominated objective functions. We develop generalization bounds of the order O(1/nβ)) plus the convergence rate of the optimization algorithm, where n is the sample size. Our stability analysis significantly improves the existing non-convex analysis by removing the bounded gradient assumption and implying better generalization bounds. We achieve this improvement by exploiting the smoothness of loss functions instead of the Lipschitz condition in Charles & Papailiopoulos (2018). We apply our general results to various stochastic optimization algorithms, which show clearly how the variance-reduction techniques improve not only training but also generalization. Furthermore, our discussion explains how interpolation helps generalization for highly expressive models.",

keywords = "generalization bounds, non-convex learning",

author = "Yunwen Lei and Yiming Ying",

year = "2021",

month = jan,

day = "16",

language = "English",

pages = "1--23",

booktitle = "International Conference on Learning Representations",

publisher = "OpenReview.net",

note = "The Ninth International Conference on Learning Representations, ICLR 2021 ; Conference date: 03-05-2021 Through 07-05-2021",

}

TY - GEN

T1 - Sharper generalization bounds for learning with gradient-dominated objective functions

AU - Lei, Yunwen

AU - Ying, Yiming

PY - 2021/1/16

Y1 - 2021/1/16

N2 - Stochastic optimization has become the workhorse behind many successful machine learning applications, which motivates a lot of theoretical analysis to understand its empirical behavior. As a comparison, there is far less work to study the generalization behavior especially in a non-convex learning setting. In this paper, we study the generalization behavior of stochastic optimization by leveraging the algorithmic stability for learning with β-gradient-dominated objective functions. We develop generalization bounds of the order O(1/nβ)) plus the convergence rate of the optimization algorithm, where n is the sample size. Our stability analysis significantly improves the existing non-convex analysis by removing the bounded gradient assumption and implying better generalization bounds. We achieve this improvement by exploiting the smoothness of loss functions instead of the Lipschitz condition in Charles & Papailiopoulos (2018). We apply our general results to various stochastic optimization algorithms, which show clearly how the variance-reduction techniques improve not only training but also generalization. Furthermore, our discussion explains how interpolation helps generalization for highly expressive models.

AB - Stochastic optimization has become the workhorse behind many successful machine learning applications, which motivates a lot of theoretical analysis to understand its empirical behavior. As a comparison, there is far less work to study the generalization behavior especially in a non-convex learning setting. In this paper, we study the generalization behavior of stochastic optimization by leveraging the algorithmic stability for learning with β-gradient-dominated objective functions. We develop generalization bounds of the order O(1/nβ)) plus the convergence rate of the optimization algorithm, where n is the sample size. Our stability analysis significantly improves the existing non-convex analysis by removing the bounded gradient assumption and implying better generalization bounds. We achieve this improvement by exploiting the smoothness of loss functions instead of the Lipschitz condition in Charles & Papailiopoulos (2018). We apply our general results to various stochastic optimization algorithms, which show clearly how the variance-reduction techniques improve not only training but also generalization. Furthermore, our discussion explains how interpolation helps generalization for highly expressive models.

KW - generalization bounds

KW - non-convex learning

UR - https://openreview.net/revisions?id=r28GdiQF7vM

M3 - Conference contribution

SP - 1

EP - 23

BT - International Conference on Learning Representations

PB - OpenReview.net

T2 - The Ninth International Conference on Learning Representations

Y2 - 3 May 2021 through 7 May 2021

ER -

Sharper generalization bounds for learning with gradient-dominated objective functions

Abstract

Conference

Keywords

Access to Document

Fingerprint

Cite this