Generalization guarantee of SGD for pairwise learning

Yunwen Lei; Mingrui Liu; Yiming Ying

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Recently, there has been growing interest in studying pairwise learning since it includes many important machine learning tasks as specific examples, e.g., metric learning, AUC maximization and ranking. While stochastic gradient descent (SGD) is an efficient method, its generalization behavior for pairwise learning remains understudied. In this paper, we present a systematic study on the generalization analysis of SGD for pairwise learning to understand the balance between generalization and optimization. We develop a novel high-probability generalization bound for uniformly-stable algorithms to incorporate the variance information for better generalization, based on which we establish the first nonsmooth learning algorithm to achieve almost optimal high-probability and dimension-independent generalization bounds in linear time. We consider both convex and nonconvex pairwise learning problems. Our stability analysis for convex problems shows how interpolation can help generalization. We establish a uniform convergence of gradients, and apply it to derive the first generalization bounds on population gradients for nonconvex problems. Finally, we develop better generalization bounds for gradient-dominated problems.
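
For readers unfamiliar with the setting, the following is a minimal sketch of SGD applied to a pairwise loss. It is an illustration only, not the authors' code or the exact algorithm analyzed in the paper. The loss (a pairwise hinge loss for ranking, which is convex and nonsmooth), the step size `eta / sqrt(t)`, the toy data, and all function names are assumptions chosen for this example.

```python
import random


def pairwise_hinge_grad(w, x_i, y_i, x_j, y_j):
    """Subgradient of the pairwise hinge loss max(0, 1 - s * <w, x_i - x_j>).

    s = +1 if y_i > y_j and -1 if y_i < y_j; equal labels give a zero gradient.
    (Illustrative loss choice, not taken from the paper.)
    """
    if y_i == y_j:
        return [0.0] * len(w)
    s = 1.0 if y_i > y_j else -1.0
    diff = [a - b for a, b in zip(x_i, x_j)]
    margin = s * sum(wk * dk for wk, dk in zip(w, diff))
    if margin >= 1.0:
        return [0.0] * len(w)
    return [-s * dk for dk in diff]


def pairwise_sgd(xs, ys, steps=200, eta=0.1, seed=0):
    """Run SGD where each step uses one randomly drawn pair of examples."""
    rng = random.Random(seed)
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    for t in range(1, steps + 1):
        i, j = rng.sample(range(n), 2)  # two distinct indices
        g = pairwise_hinge_grad(w, xs[i], ys[i], xs[j], ys[j])
        step = eta / t ** 0.5  # assumed decaying step size
        w = [wk - step * gk for wk, gk in zip(w, g)]
    return w


def pairwise_accuracy(w, xs, ys):
    """Fraction of (higher-label, lower-label) pairs that w ranks correctly."""
    def score(x):
        return sum(wk * xk for wk, xk in zip(w, x))

    correct = total = 0
    for i in range(len(xs)):
        for j in range(len(xs)):
            if ys[i] > ys[j]:
                total += 1
                correct += score(xs[i]) > score(xs[j])
    return correct / total


if __name__ == "__main__":
    # Toy data: the first feature separates the two classes.
    xs = [[2.0, 0.1], [1.5, -0.2], [-1.0, 0.3], [-2.0, 0.0]]
    ys = [1, 1, 0, 0]
    w = pairwise_sgd(xs, ys)
    print("weights:", w)
    print("pairwise accuracy:", pairwise_accuracy(w, xs, ys))
```

Each update touches a single pair, so the cost per step does not grow with the number of pairs in the training set. The pairwise accuracy computed here is the empirical ranking accuracy on the training pairs only and says nothing about the generalization bounds the paper studies.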

Original language	English
Title of host publication	Advances in Neural Information Processing Systems 34 (NeurIPS 2021)
Editors	M. Ranzato, A. Beygelzimer, P.S. Liang, J.W. Vaughan, Y. Dauphin
Publisher	NeurIPS
ISBN (Print)	9781713845393
Publication status	Published - 1 Dec 2021
Event	Thirty-fifth Conference on Neural Information Processing Systems - Virtual. Duration: 6 Dec 2021 → 14 Dec 2021

Publication series

Name	Advances in neural information processing systems
Volume	34
ISSN (Print)	1049-5258

Conference

Conference	Thirty-fifth Conference on Neural Information Processing Systems
Abbreviated title	NeurIPS 2021
Period	6/12/21 → 14/12/21

Access to Document

LeiY2021Generalization: Accepted author manuscript, 393 KB. Licence: None: All rights reserved

https://proceedings.neurips.cc/paper/2021/hash/b1301141feffabac455e1f90a7de2054-Abstract.html

Cite this

Lei, Y., Liu, M., & Ying, Y. (2021). Generalization guarantee of SGD for pairwise learning. In M. Ranzato, A. Beygelzimer, P. S. Liang, J. W. Vaughan, & Y. Dauphin (Eds.), Advances in Neural Information Processing Systems 34 (NeurIPS 2021) (Advances in neural information processing systems; Vol. 34). NeurIPS. https://proceedings.neurips.cc/paper/2021/hash/b1301141feffabac455e1f90a7de2054-Abstract.html

@inproceedings{e4aefe9ccac8495ab751684c51365dad,

title = "Generalization guarantee of SGD for pairwise learning",

abstract = "Recently, there is a growing interest in studying pairwise learning since it includes many important machine learning tasks as specific examples, e.g., metric learning, AUC maximization and ranking. While stochastic gradient descent (SGD) is an efficient method, there is a lacking study on its generalization behavior for pairwise learning. In this paper, we present a systematic study on the generalization analysis of SGD for pairwise learning to understand the balance between generalization and optimization. We develop a novel high-probability generalization bound for uniformly-stable algorithms to incorporate the variance information for better generalization, based on which we establish the first nonsmooth learning algorithm to achieve almost optimal high-probability and dimension-independent generalization bounds in linear time. We consider both convex and nonconvex pairwise learning problems. Our stability analysis for convex problems shows how the interpolation can help generalization. We establish a uniform convergence of gradients, and apply it to derive the first generalization bounds on population gradients for nonconvex problems. Finally, we develop better generalization bounds for gradient-dominated problems.",

author = "Yunwen Lei and Mingrui Liu and Yiming Ying",

year = "2021",

month = dec,

day = "1",

language = "English",

isbn = "9781713845393",

series = "Advances in neural information processing systems",

publisher = "NeurIPS",

editor = "M. Ranzato and A. Beygelzimer and P.S. Liang and J.W. Vaughan and Y. Dauphin",

booktitle = "Advances in Neural Information Processing Systems 34 (NeurIPS 2021)",

note = "Thirty-fifth Conference on Neural Information Processing Systems, NeurIPS 2021 ; Conference date: 06-12-2021 Through 14-12-2021",

}

Lei, Y, Liu, M & Ying, Y 2021, Generalization guarantee of SGD for pairwise learning. in M Ranzato, A Beygelzimer, PS Liang, JW Vaughan & Y Dauphin (eds), Advances in Neural Information Processing Systems 34 (NeurIPS 2021). Advances in neural information processing systems, vol. 34, NeurIPS, Thirty-fifth Conference on Neural Information Processing Systems, 6/12/21. <https://proceedings.neurips.cc/paper/2021/hash/b1301141feffabac455e1f90a7de2054-Abstract.html>

Generalization guarantee of SGD for pairwise learning. / Lei, Yunwen; Liu, Mingrui; Ying, Yiming.
Advances in Neural Information Processing Systems 34 (NeurIPS 2021). ed. / M. Ranzato; A. Beygelzimer; P.S. Liang; J.W. Vaughan; Y. Dauphin. NeurIPS, 2021. (Advances in neural information processing systems; Vol. 34).

TY - GEN

T1 - Generalization guarantee of SGD for pairwise learning

AU - Lei, Yunwen

AU - Liu, Mingrui

AU - Ying, Yiming

PY - 2021/12/1

Y1 - 2021/12/1

AB - Recently, there is a growing interest in studying pairwise learning since it includes many important machine learning tasks as specific examples, e.g., metric learning, AUC maximization and ranking. While stochastic gradient descent (SGD) is an efficient method, there is a lacking study on its generalization behavior for pairwise learning. In this paper, we present a systematic study on the generalization analysis of SGD for pairwise learning to understand the balance between generalization and optimization. We develop a novel high-probability generalization bound for uniformly-stable algorithms to incorporate the variance information for better generalization, based on which we establish the first nonsmooth learning algorithm to achieve almost optimal high-probability and dimension-independent generalization bounds in linear time. We consider both convex and nonconvex pairwise learning problems. Our stability analysis for convex problems shows how the interpolation can help generalization. We establish a uniform convergence of gradients, and apply it to derive the first generalization bounds on population gradients for nonconvex problems. Finally, we develop better generalization bounds for gradient-dominated problems.

M3 - Conference contribution

SN - 9781713845393

T3 - Advances in neural information processing systems

BT - Advances in Neural Information Processing Systems 34 (NeurIPS 2021)

A2 - Ranzato, M.

A2 - Beygelzimer, A.

A2 - Liang, P.S.

A2 - Vaughan, J.W.

A2 - Dauphin, Y.

PB - NeurIPS

T2 - Thirty-fifth Conference on Neural Information Processing Systems

Y2 - 6 December 2021 through 14 December 2021

ER -
