Learning rates for stochastic gradient descent with nonconvex objectives

Yunwen Lei; Ke Tang

doi:10.1109/TPAMI.2021.3068154

Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

Stochastic gradient descent (SGD) has become the method of choice for training highly complex and nonconvex models since it can not only recover good solutions to minimize training errors but also generalize well. In the literature, computational and statistical properties are studied separately to understand the behavior of SGD. However, there is a lack of studies that jointly consider the computational and statistical properties in a nonconvex learning setting. In this paper, we develop novel learning rates of SGD for nonconvex learning by presenting high-probability bounds for both computational and statistical errors. We show that the complexity of SGD iterates grows in a controllable manner with respect to the iteration number, which sheds light on how an implicit regularization can be achieved by tuning the number of passes to balance the computational and statistical errors. As a byproduct, we also slightly refine the existing studies on the uniform convergence of gradients by showing its connection to Rademacher chaos complexities.

Original language	English
Pages (from-to)	4505-4511
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	43
Issue number	12
Early online date	23 Mar 2021
DOIs	https://doi.org/10.1109/TPAMI.2021.3068154
Publication status	E-pub ahead of print - 23 Mar 2021

Keywords

Learning Rates
Stochastic Gradient Descent
Early Stopping
Nonconvex Optimization

Access to Document

10.1109/TPAMI.2021.3068154
Licence: Other

LeiY2021Learning
Y. Lei and K. Tang, "Learning Rates for Stochastic Gradient Descent with Nonconvex Objectives," in IEEE Transactions on Pattern Analysis and Machine Intelligence, doi: 10.1109/TPAMI.2021.3068154. © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Accepted author manuscript, 503 KB
Licence: Other

Cite this

@article{f8f97193f4b3404685174107a8800cae,

title = "Learning rates for stochastic gradient descent with nonconvex objectives",

abstract = "Stochastic gradient descent (SGD) has become the method of choice for training highly complex and nonconvex models since it can not only recover good solutions to minimize training errors but also generalize well. Computational and statistical properties are separately studied to understand the behavior of SGD in the literature. However, there is a lacking study to jointly consider the computational and statistical properties in a nonconvex learning setting. In this paper, we develop novel learning rates of SGD for nonconvex learning by presenting high-probability bounds for both computational and statistical errors. We show that the complexity of SGD iterates grows in a controllable manner with respect to the iteration number, which sheds insights on how an implicit regularization can be achieved by tuning the number of passes to balance the computational and statistical errors. As a byproduct, we also slightly refine the existing studies on the uniform convergence of gradients by showing its connection to Rademacher chaos complexities.",

keywords = "Learning Rates, Stochastic Gradient Descent, Early Stopping, Nonconvex Optimization",

author = "Yunwen Lei and Ke Tang",

year = "2021",

month = mar,

day = "23",

doi = "10.1109/TPAMI.2021.3068154",

language = "English",

volume = "43",

pages = "4505--4511",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "Institute of Electrical and Electronics Engineers (IEEE)",

number = "12",

}

TY - JOUR

T1 - Learning rates for stochastic gradient descent with nonconvex objectives

AU - Lei, Yunwen

AU - Tang, Ke

PY - 2021/3/23

Y1 - 2021/3/23

N2 - Stochastic gradient descent (SGD) has become the method of choice for training highly complex and nonconvex models since it can not only recover good solutions to minimize training errors but also generalize well. Computational and statistical properties are separately studied to understand the behavior of SGD in the literature. However, there is a lacking study to jointly consider the computational and statistical properties in a nonconvex learning setting. In this paper, we develop novel learning rates of SGD for nonconvex learning by presenting high-probability bounds for both computational and statistical errors. We show that the complexity of SGD iterates grows in a controllable manner with respect to the iteration number, which sheds insights on how an implicit regularization can be achieved by tuning the number of passes to balance the computational and statistical errors. As a byproduct, we also slightly refine the existing studies on the uniform convergence of gradients by showing its connection to Rademacher chaos complexities.

AB - Stochastic gradient descent (SGD) has become the method of choice for training highly complex and nonconvex models since it can not only recover good solutions to minimize training errors but also generalize well. Computational and statistical properties are separately studied to understand the behavior of SGD in the literature. However, there is a lacking study to jointly consider the computational and statistical properties in a nonconvex learning setting. In this paper, we develop novel learning rates of SGD for nonconvex learning by presenting high-probability bounds for both computational and statistical errors. We show that the complexity of SGD iterates grows in a controllable manner with respect to the iteration number, which sheds insights on how an implicit regularization can be achieved by tuning the number of passes to balance the computational and statistical errors. As a byproduct, we also slightly refine the existing studies on the uniform convergence of gradients by showing its connection to Rademacher chaos complexities.

KW - Learning Rates

KW - Stochastic Gradient Descent

KW - Early Stopping

KW - Nonconvex Optimization

UR - http://www.scopus.com/inward/record.url?scp=85103253583&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2021.3068154

DO - 10.1109/TPAMI.2021.3068154

M3 - Article

SN - 0162-8828

VL - 43

SP - 4505

EP - 4511

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 12

ER -
