Generalization performance of multi-pass stochastic gradient descent with convex loss functions

Yunwen Lei; Ting Hu; Ke Tang

Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

Stochastic gradient descent (SGD) has become the method of choice to tackle large-scale datasets due to its low computational cost and good practical performance. Learning rate analysis, either capacity-independent or capacity-dependent, provides a unifying viewpoint to study the computational and statistical properties of SGD, as well as the implicit regularization by tuning the number of passes. Existing capacity-independent learning rates require a nontrivial bounded subgradient assumption and a smoothness assumption to be optimal. Furthermore, existing capacity-dependent learning rates are only established for the specific least squares loss with a special structure. In this paper, we provide both optimal capacity-independent and capacity-dependent learning rates for SGD with general convex loss functions. Our results require neither bounded subgradient assumptions nor smoothness assumptions, and are stated with high probability. We achieve this improvement by a refined estimate on the norm of SGD iterates based on a careful martingale analysis and concentration inequalities on empirical processes.
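The abstract concerns multi-pass SGD on a general convex, possibly non-smooth loss, with the number of passes acting as an implicit regularizer. Below is a minimal Python/NumPy sketch of that procedure. Several choices are assumptions made for illustration and are not taken from the paper: the hinge loss as the convex loss, a constant step size, training indices drawn uniformly with replacement, and the running average of iterates as the output. The function names `hinge_subgradient` and `multipass_sgd` are hypothetical.

```python
import numpy as np


def hinge_subgradient(w, x, y):
    """A subgradient of the hinge loss max(0, 1 - y * <w, x>) with respect to w.

    Assumption: the hinge loss stands in for a general convex, non-smooth loss.
    """
    margin = y * np.dot(w, x)
    return -y * x if margin < 1.0 else np.zeros_like(w)


def multipass_sgd(X, y, num_passes, step_size, seed=0):
    """Sketch of multi-pass SGD: T = num_passes * n subgradient steps.

    Assumptions: constant step size, indices drawn uniformly with
    replacement from the n training examples, averaged iterate returned.
    Returns (last iterate, average of all iterates).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    w_avg = np.zeros(d)
    T = num_passes * n  # total number of SGD steps
    for t in range(T):
        j = rng.integers(n)  # index drawn uniformly from {0, ..., n-1}
        w = w - step_size * hinge_subgradient(w, X[j], y[j])
        w_avg += (w - w_avg) / (t + 1)  # incremental mean of w_1, ..., w_{t+1}
    return w, w_avg


# Usage on synthetic, linearly separable data (illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sign(X @ np.ones(5))
w_last, w_avg = multipass_sgd(X, y, num_passes=5, step_size=0.1)
train_acc = np.mean(np.sign(X @ w_avg) == y)
```

In this sketch `num_passes` is the tuning knob: more passes fit the training sample more closely, while stopping earlier restrains the iterates, which is the sense in which the number of passes acts as implicit regularization.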

Original language	English
Article number	25
Journal	Journal of Machine Learning Research
Volume	22
Publication status	Published - 31 Jan 2021

Keywords

Generalization bound
Learning theory
Stochastic gradient descent

ASJC Scopus subject areas

Software
Control and Systems Engineering
Statistics and Probability
Artificial Intelligence

Access to Document

LeiY2021Generalization (final published version, 479 KB). Licence: Creative Commons: Attribution (CC BY)

https://jmlr.org/papers/v22/19-716.html (Licence: Creative Commons: Attribution (CC BY))

Cite this

@article{7f4a8051a5434268bcdbfb0b4141fe3e,
  title     = "Generalization performance of multi-pass stochastic gradient descent with convex loss functions",
  abstract  = "Stochastic gradient descent (SGD) has become the method of choice to tackle large-scale datasets due to its low computational cost and good practical performance. Learning rate analysis, either capacity-independent or capacity-dependent, provides a unifying viewpoint to study the computational and statistical properties of SGD, as well as the implicit regularization by tuning the number of passes. Existing capacity-independent learning rates require a nontrivial bounded subgradient assumption and a smoothness assumption to be optimal. Furthermore, existing capacity-dependent learning rates are only established for the specific least squares loss with a special structure. In this paper, we provide both optimal capacity-independent and capacity-dependent learning rates for SGD with general convex loss functions. Our results require neither bounded subgradient assumptions nor smoothness assumptions, and are stated with high probability. We achieve this improvement by a refined estimate on the norm of SGD iterates based on a careful martingale analysis and concentration inequalities on empirical processes.",
  keywords  = "Generalization bound, Learning theory, Stochastic gradient descent",
  author    = "Yunwen Lei and Ting Hu and Ke Tang",
  year      = "2021",
  month     = jan,
  day       = "31",
  language  = "English",
  volume    = "22",
  number    = "25",
  journal   = "Journal of Machine Learning Research",
  issn      = "1532-4435",
  publisher = "Journal of Machine Learning Research",
  url       = "https://jmlr.org/papers/v22/19-716.html",
}

TY  - JOUR
T1  - Generalization performance of multi-pass stochastic gradient descent with convex loss functions
AU  - Lei, Yunwen
AU  - Hu, Ting
AU  - Tang, Ke
PY  - 2021/1/31
Y1  - 2021/1/31
N2  - Stochastic gradient descent (SGD) has become the method of choice to tackle large-scale datasets due to its low computational cost and good practical performance. Learning rate analysis, either capacity-independent or capacity-dependent, provides a unifying viewpoint to study the computational and statistical properties of SGD, as well as the implicit regularization by tuning the number of passes. Existing capacity-independent learning rates require a nontrivial bounded subgradient assumption and a smoothness assumption to be optimal. Furthermore, existing capacity-dependent learning rates are only established for the specific least squares loss with a special structure. In this paper, we provide both optimal capacity-independent and capacity-dependent learning rates for SGD with general convex loss functions. Our results require neither bounded subgradient assumptions nor smoothness assumptions, and are stated with high probability. We achieve this improvement by a refined estimate on the norm of SGD iterates based on a careful martingale analysis and concentration inequalities on empirical processes.
AB  - Stochastic gradient descent (SGD) has become the method of choice to tackle large-scale datasets due to its low computational cost and good practical performance. Learning rate analysis, either capacity-independent or capacity-dependent, provides a unifying viewpoint to study the computational and statistical properties of SGD, as well as the implicit regularization by tuning the number of passes. Existing capacity-independent learning rates require a nontrivial bounded subgradient assumption and a smoothness assumption to be optimal. Furthermore, existing capacity-dependent learning rates are only established for the specific least squares loss with a special structure. In this paper, we provide both optimal capacity-independent and capacity-dependent learning rates for SGD with general convex loss functions. Our results require neither bounded subgradient assumptions nor smoothness assumptions, and are stated with high probability. We achieve this improvement by a refined estimate on the norm of SGD iterates based on a careful martingale analysis and concentration inequalities on empirical processes.
KW  - Generalization bound
KW  - Learning theory
KW  - Stochastic gradient descent
UR  - http://www.scopus.com/inward/record.url?scp=85105877079&partnerID=8YFLogxK
M3  - Article
SN  - 1532-4435
VL  - 22
JO  - Journal of Machine Learning Research
JF  - Journal of Machine Learning Research
M1  - 25
ER  - 
