Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
12-2021
Abstract
To train networks, the lookahead algorithm [1] updates its fast weights k times via an inner-loop optimizer before updating its slow weights once by using the latest fast weights. Any optimizer, e.g. SGD, can serve as the inner-loop optimizer, and the derived lookahead generally enjoys remarkable test performance improvement over the vanilla optimizer. However, a theoretical understanding of this test performance improvement remains absent. To address this issue, we theoretically justify the advantages of lookahead in terms of the excess risk error, which measures the test performance. Specifically, we prove that lookahead using SGD as its inner-loop optimizer can better balance the optimization error and generalization error, achieving smaller excess risk error than vanilla SGD on (strongly) convex problems and on nonconvex problems satisfying the Polyak-Łojasiewicz condition, which has been observed/proved to hold in neural networks. Moreover, we show that the stagewise optimization strategy [2], which decays the learning rate several times during training, can also benefit lookahead by improving its optimization and generalization errors on strongly convex problems. Finally, we propose a stagewise locally-regularized lookahead (SLRLA) algorithm which minimizes the sum of the vanilla objective and a local regularizer at each stage and provably enjoys optimization and generalization improvement over the conventional (stagewise) lookahead. Experimental results on CIFAR10/100 and ImageNet testify to its advantages. Code is available at https://github.com/sail-sg/SLRLA-optimizer.
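The fast-weight/slow-weight scheme described above can be illustrated with a minimal sketch: the fast weights start from the slow weights, take k SGD steps, and the slow weights then move a fraction alpha toward the latest fast weights. The hyperparameter names (k, alpha, lr) and the toy quadratic objective below are illustrative assumptions, not the paper's experimental setup or the released SLRLA code.

```python
# Minimal sketch of lookahead with SGD as the inner-loop optimizer.
import numpy as np

def sgd_step(w, grad_fn, lr):
    """One vanilla SGD step on the fast weights."""
    return w - lr * grad_fn(w)

def lookahead_sgd(w0, grad_fn, lr=0.1, k=5, alpha=0.5, outer_steps=100):
    slow = w0.copy()
    for _ in range(outer_steps):
        # Inner loop: update fast weights k times, starting from the slow weights.
        fast = slow.copy()
        for _ in range(k):
            fast = sgd_step(fast, grad_fn, lr)
        # Outer step: move slow weights toward the latest fast weights.
        slow = slow + alpha * (fast - slow)
    return slow

if __name__ == "__main__":
    # Toy strongly convex problem: f(w) = 0.5 * ||w||^2, so grad f(w) = w.
    grad_fn = lambda w: w
    w_final = lookahead_sgd(np.ones(3), grad_fn)
    print(w_final)  # approaches the minimizer at the origin
```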
Discipline
Theory and Algorithms
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia, December 6-14
First Page
1
Last Page
15
Publisher
NeurIPS
City or Country
Virtual Conference
Citation
ZHOU, Pan; YAN, Hanshu; YUAN, Xiaotong; FENG, Jiashi; and YAN, Shuicheng.
Towards understanding why Lookahead generalizes better than SGD and beyond. (2021). Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia, December 6-14. 1-15.
Available at: https://ink.library.smu.edu.sg/sis_research/8987
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://proceedings.neurips.cc/paper/2021/hash/e53a0a2978c28872a4505bdb51db06dc-Abstract.html