Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

7-2018

Abstract

This work aims to explain the remarkable success of deep convolutional neural networks (CNNs) by theoretically analyzing their generalization performance and establishing optimization guarantees for gradient descent based training algorithms. Specifically, for a CNN model consisting of $l$ convolutional layers and one fully connected layer, we prove that its generalization error is bounded by $\mathcal{O}(\sqrt{\theta \tilde{\varrho}/n})$, where $\theta$ denotes the degrees of freedom of the network parameters and $\tilde{\varrho} = \mathcal{O}(\log(\prod_{i=1}^{l} b_i (k_i - s_i + 1)/p) + \log(b_{l+1}))$ encapsulates architecture parameters including the kernel size $k_i$, stride $s_i$, pooling size $p$ and parameter magnitude $b_i$. To the best of our knowledge, this is the first generalization bound that depends only on $\mathcal{O}(\log(\prod_{i=1}^{l+1} b_i))$, tighter than existing ones that all involve an exponential term like $\mathcal{O}(\prod_{i=1}^{l+1} b_i)$. Besides, we prove that for an arbitrary gradient descent algorithm, the approximate stationary point computed by minimizing the empirical risk is also an approximate stationary point of the population risk. This explains well why gradient descent training algorithms usually perform sufficiently well in practice. Furthermore, we prove the one-to-one correspondence and convergence guarantees for the non-degenerate stationary points between the empirical and population risks. This implies that the computed local minimum of the empirical risk is also close to a local minimum of the population risk, thus ensuring the good generalization performance of CNNs.
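The difference between a bound that depends on the logarithm of the product of the parameter magnitudes and one that depends on the product itself can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes every layer shares the same hypothetical magnitude bound `b_value`, drops all constants hidden by the O-notation, and uses helper names invented for this illustration.

```python
import math


def magnitude_product(b):
    """Product of the per-layer magnitude bounds b_1, ..., b_{l+1}."""
    return math.prod(b)


def log_magnitude_product(b):
    """Logarithm of the same product, computed as a sum of logs."""
    return sum(math.log(b_i) for b_i in b)


def compare(b_value, depths):
    """For each depth l, return (l, product, log of product).

    Assumption for illustration only: all l convolutional layers and the
    single fully connected layer share the same magnitude bound b_value.
    """
    rows = []
    for l in depths:
        b = [b_value] * (l + 1)  # l conv layers + 1 fully connected layer
        rows.append((l, magnitude_product(b), log_magnitude_product(b)))
    return rows


if __name__ == "__main__":
    for l, prod, log_prod in compare(2.0, [1, 5, 10, 20]):
        print(f"l={l:2d}  prod={prod:.3e}  log(prod)={log_prod:.3f}")
```

Under these assumptions the product grows exponentially with the depth, while its logarithm grows only linearly, which is the sense in which the logarithmic dependence is tighter.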

Discipline

Theory and Algorithms

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, July 10-15, 2018

First Page

1

Last Page

38

Publisher

Proceedings of Machine Learning Research

City or Country

Stockholm, Sweden

Citation

ZHOU, Pan and FENG, Jiashi. Understanding generalization and optimization performance of deep CNNs. (2018). Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, July 10-15, 2018. 1-38.
Available at: https://ink.library.smu.edu.sg/sis_research/9010

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://publons.com/wos-op/publon/52135107/

Included in

Theory and Algorithms Commons

Research Collection School Of Computing and Information Systems

Understanding generalization and optimization performance of deep CNNs
