Research Collection School Of Computing and Information Systems

PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

7-2021

Abstract

In this paper, we propose a novel stochastic gradient estimator, the ProbAbilistic Gradient Estimator (PAGE), for nonconvex optimization. PAGE is easy to implement because it is designed via a small adjustment to vanilla SGD: in each iteration, with probability $p_t$ PAGE uses the vanilla minibatch SGD update, and with probability $1-p_t$ it reuses the previous gradient with a small adjustment at a much lower computational cost. We give a simple formula for the optimal choice of $p_t$. Moreover, we prove the first tight lower bound $\Omega(n+\frac{\sqrt{n}}{\epsilon^2})$ for nonconvex finite-sum problems, which also leads to a tight lower bound $\Omega(b+\frac{\sqrt{b}}{\epsilon^2})$ for nonconvex online problems, where $b:= \min\{\frac{\sigma^2}{\epsilon^2}, n\}$. We then show that PAGE attains the optimal convergence results $O(n+\frac{\sqrt{n}}{\epsilon^2})$ (finite-sum) and $O(b+\frac{\sqrt{b}}{\epsilon^2})$ (online), matching our lower bounds for both nonconvex finite-sum and online problems. In addition, we show that for nonconvex functions satisfying the Polyak-Łojasiewicz (PL) condition, PAGE automatically switches to a faster linear convergence rate $O(\cdot\log \frac{1}{\epsilon})$. Finally, we conduct several deep learning experiments (e.g., LeNet, VGG, ResNet) on real datasets in PyTorch, showing that PAGE not only converges much faster than SGD in training but also achieves higher test accuracy, validating the optimal theoretical results and confirming the practical superiority of PAGE.
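The update rule summarized in the abstract can be illustrated with a minimal NumPy sketch for the finite-sum setting $f(x) = \frac{1}{n}\sum_i f_i(x)$. It assumes fixed minibatch sizes `b` (fresh gradient) and `b_prime` (cheap correction) and sets the switching probability to $p = b'/(b+b')$, which we understand to be the choice the paper recommends. The names `page_probability`, `page_sgd`, and the `grad_fn` callback are hypothetical and are not the authors' code; for simplicity the sketch returns the last iterate rather than a randomly selected one, as nonconvex analyses often do.

```python
import numpy as np


def page_probability(b, b_prime):
    """Switching probability p = b' / (b + b') (assumed from the paper)."""
    return b_prime / (b + b_prime)


def page_sgd(grad_fn, x0, n, eta, b, b_prime, num_iters, rng=None):
    """Minimal sketch of SGD driven by the PAGE gradient estimator.

    grad_fn(x, idx) is a hypothetical callback returning the averaged
    minibatch gradient (1/len(idx)) * sum_{i in idx} grad f_i(x) of the
    finite sum f(x) = (1/n) * sum_i f_i(x).
    """
    rng = np.random.default_rng() if rng is None else rng
    p = page_probability(b, b_prime)
    x = np.asarray(x0, dtype=float)
    # Initial estimate: a plain minibatch gradient of size b.
    g = grad_fn(x, rng.choice(n, size=b, replace=False))
    for _ in range(num_iters):
        x_new = x - eta * g
        if rng.random() < p:
            # With probability p: fresh minibatch SGD gradient (cost b).
            g = grad_fn(x_new, rng.choice(n, size=b, replace=False))
        else:
            # With probability 1 - p: reuse the previous estimate and add a
            # gradient-difference correction on a small batch (cost 2 * b').
            idx = rng.choice(n, size=b_prime, replace=False)
            g = g + grad_fn(x_new, idx) - grad_fn(x, idx)
        x = x_new
    return x
```

A convenient sanity check: with `b = b_prime = n`, every estimate equals the exact full gradient, so the sketch reduces to plain gradient descent. The savings claimed for PAGE come from choosing `b_prime` much smaller than `b`, so that most iterations cost only a small correction batch.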

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering; Intelligent Systems and Optimization

Publication

Proceedings of the 38th International Conference on Machine Learning (ICML 2021), Virtual Conference, July 18-24

First Page

1

Last Page

25

Publisher

Proceedings of Machine Learning Research

City or Country

Virtual Conference

Citation

LI, Zhize; BAO, Hongyan; ZHANG, Xiangliang; and RICHTARIK, Peter. PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization. (2021). Proceedings of the 38th International Conference on Machine Learning (ICML 2021), Virtual Conference, July 18-24. 1-25.
Available at: https://ink.library.smu.edu.sg/sis_research/8683

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://proceedings.mlr.press/v139/li21a.html


Included in

Databases and Information Systems Commons
