Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
5-2018
Abstract
This work provides a comprehensive landscape analysis of the empirical risk in deep neural networks (DNNs), covering the convergence behavior of its gradient, its stationary points, and the empirical risk itself to their corresponding population counterparts, which reveals how various network parameters determine convergence performance. In particular, for an l-layer linear neural network with d_i neurons in the i-th layer, we prove that the gradient of its empirical risk uniformly converges to that of its population risk at the rate of O(r^{2l} √(l · max_i d_i · s · log(d/l) / n)). Here d is the total weight dimension, s is the number of nonzero entries of all the weights, n is the sample size, and the magnitude of the weights per layer is upper bounded by r. Moreover, we prove a one-to-one correspondence between the non-degenerate stationary points of the empirical and population risks and provide a convergence guarantee for each pair. We also establish the uniform convergence of the empirical risk to its population counterpart and further derive stability and generalization bounds for the empirical risk. In addition, we analyze these properties for deep nonlinear neural networks with sigmoid activation functions, proving similar results for the convergence behavior of their empirical risk gradients, non-degenerate stationary points, and the empirical risk itself. To the best of our knowledge, this work is the first to theoretically characterize the uniform convergence of the gradient and stationary points of the empirical risk of DNN models, which benefits the theoretical understanding of how the network depth l, the layer width d_i, the network size d, the weight sparsity s, and the parameter magnitude r determine the neural network landscape.
Discipline
OS and Networks | Theory and Algorithms
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, Canada, April 30 - May 3
First Page
1
Last Page
60
Publisher
ICLR
City or Country
Vancouver, Canada
Citation
ZHOU, Pan and FENG, Jiashi.
Empirical risk landscape analysis for understanding deep neural networks. (2018). Proceedings of the 6th International Conference on Learning Representations, ICLR 2018, Vancouver, Canada, April 30 - May 3. 1-60.
Available at: https://ink.library.smu.edu.sg/sis_research/9023
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://openreview.net/forum?id=B1QgVti6Z