Optimisation and Learning with Randomly Compressed Gradient Updates

Zhanliang Huang; Yunwen Lei; Ata Kaban

doi:10.1162/neco_a_01588

Zhanliang Huang^*, Yunwen Lei, Ata Kaban

^*Corresponding author for this work

Computer Science

Research output: Contribution to journal › Article › peer-review

Abstract

Gradient descent methods are simple and efficient optimisation algorithms with widespread applications. To handle high-dimensional problems, we study compressed stochastic gradient descent (CompSGD) with low-dimensional gradient updates. We provide a detailed analysis in terms of both optimisation rates and generalisation rates. To this end, we develop uniform stability bounds for CompSGD for both smooth and non-smooth problems, based on which we develop almost optimal population risk bounds. Then, we extend our analysis to two variants of SGD – batch and mini-batch gradient descent. Furthermore, we show these variants achieve almost optimal rates compared to their high-dimensional gradient setting. Thus, our results provide a way to reduce the dimension of gradient updates without affecting the convergence rate in the generalisation analysis. Moreover, we show that the same result also holds in the differentially private setting, which allows us to reduce the dimension of added noise at “almost free” cost.
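
To make the idea of a low-dimensional gradient update concrete, the following is a minimal sketch only, not the authors' algorithm as specified in the paper. It assumes a least-squares loss, a fresh Gaussian random projection at every step, and a step that maps the compressed gradient back to the parameter space; the function name comp_sgd and the parameters k, steps, and lr are hypothetical choices made for illustration.

```python
import numpy as np


def comp_sgd(X, y, k, steps=2000, lr=0.005, seed=0):
    """Illustrative compressed SGD for least squares.

    Assumed setup for illustration; the abstract does not specify
    the exact update rule used in the paper.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        i = rng.integers(n)                 # sample one training example
        g = (X[i] @ w - y[i]) * X[i]        # d-dimensional stochastic gradient
        # Fresh k x d Gaussian projection, scaled so that E[A.T @ A] = I.
        A = rng.standard_normal((k, d)) / np.sqrt(k)
        g_low = A @ g                       # k-dimensional compressed gradient
        w -= lr * (A.T @ g_low)             # map back to R^d and take the step
    return w
```

Under these assumptions only the k-dimensional vector g_low carries gradient information per step, which is the quantity one would transmit or perturb with noise; the projection is unbiased in expectation but inflates the gradient variance, so a smaller step size than for plain SGD is used here.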

Original language	English
Number of pages	53
Journal	Neural Computation
Early online date	15 May 2023
DOIs	https://doi.org/10.1162/neco_a_01588
Publication status	E-pub ahead of print - 15 May 2023

Keywords

Gradient descent
Random projection
Generalisation bounds
Differential privacy

Access to Document

10.1162/neco_a_01588

HuangZ2023Optimisation
This document is the Author Accepted Manuscript version of a published work, Zhanliang Huang, Yunwen Lei, Ata Kabán; Optimization and Learning With Randomly Compressed Gradient Updates. Neural Comput 2023; doi: https://doi.org/10.1162/neco_a_01588, which appears in its final form in Neural Computation, copyright © 2023 Massachusetts Institute of Technology. The final Version of Record can be found at: https://doi.org/10.1162/neco_a_01588
Accepted author manuscript, 461 KB
Licence: None: All rights reserved

Cite this

@article{a240701fc9f84943820a45e3f36de2f8,
  title = "Optimisation and Learning with Randomly Compressed Gradient Updates",
  abstract = "Gradient descent methods are simple and efficient optimisation algorithms with widespread applications. To handle high-dimensional problems, we study compressed stochastic gradient descent (CompSGD) with low-dimensional gradient updates. We provide a detailed analysis in terms of both optimisation rates and generalisation rates. To this end, we develop uniform stability bounds for CompSGD for both smooth and non-smooth problems, based on which we develop almost optimal population risk bounds. Then, we extend our analysis to two variants of SGD – batch and mini-batch gradient descent. Furthermore, we show these variants achieve almost optimal rates compared to their high-dimensional gradient setting. Thus, our results provide a way to reduce the dimension of gradient updates without affecting the convergence rate in the generalisation analysis. Moreover, we show that the same result also holds in the differentially private setting, which allows us to reduce the dimension of added noise at “almost free” cost.",
  keywords = "Gradient descent, Random projection, Generalisation bounds, Differential privacy",
  author = "Zhanliang Huang and Yunwen Lei and Ata Kaban",
  year = "2023",
  month = may,
  day = "15",
  doi = "10.1162/neco_a_01588",
  language = "English",
  journal = "Neural Computation",
  issn = "0899-7667",
  publisher = "Massachusetts Institute of Technology Press",
}

TY  - JOUR
T1  - Optimisation and Learning with Randomly Compressed Gradient Updates
AU  - Huang, Zhanliang
AU  - Lei, Yunwen
AU  - Kaban, Ata
PY  - 2023/5/15
Y1  - 2023/5/15
AB  - Gradient descent methods are simple and efficient optimisation algorithms with widespread applications. To handle high-dimensional problems, we study compressed stochastic gradient descent (CompSGD) with low-dimensional gradient updates. We provide a detailed analysis in terms of both optimisation rates and generalisation rates. To this end, we develop uniform stability bounds for CompSGD for both smooth and non-smooth problems, based on which we develop almost optimal population risk bounds. Then, we extend our analysis to two variants of SGD – batch and mini-batch gradient descent. Furthermore, we show these variants achieve almost optimal rates compared to their high-dimensional gradient setting. Thus, our results provide a way to reduce the dimension of gradient updates without affecting the convergence rate in the generalisation analysis. Moreover, we show that the same result also holds in the differentially private setting, which allows us to reduce the dimension of added noise at “almost free” cost.
KW  - Gradient descent
KW  - Random projection
KW  - Generalisation bounds
KW  - Differential privacy
U2  - 10.1162/neco_a_01588
DO  - 10.1162/neco_a_01588
M3  - Article
SN  - 0899-7667
JO  - Neural Computation
JF  - Neural Computation
ER  -
