Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

12-2025

Abstract

It has recently been observed in much of the literature that neural networks exhibit a bottleneck rank property: at larger depths, the activations and weights of neural networks trained with gradient-based methods tend to be approximately low rank. In fact, the rank of the activations of each layer converges to a fixed value referred to as the “bottleneck rank”, which is the minimum rank required to represent the training data. This perspective is in line with the observation that regularizing linear networks (without activations) with weight decay is equivalent to minimizing the Schatten-p quasi-norm of the network. In this paper we investigate the implications of this phenomenon for generalization. More specifically, we prove generalization bounds for neural networks which exploit the approximate low-rank structure of the weight matrices when it is present. The final results rely on the Schatten-p quasi-norms of the weight matrices: for small p, the bounds exhibit a sample complexity Õ(WrL²), where W and L are the width and depth of the neural network respectively and r is the rank of the weight matrices. As p increases, the bound behaves more like a norm-based bound instead.
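
For context, the Schatten-p quasi-norm of a matrix is the ℓ_p (quasi-)norm of its singular values; for p < 1 it heavily penalizes the number of non-negligible singular values and therefore favours approximately low-rank matrices. The snippet below is a minimal illustrative sketch, not taken from the paper: it uses NumPy to compute this quantity for a hypothetical, approximately rank-2 weight matrix.

```python
import numpy as np

def schatten_quasi_norm(W: np.ndarray, p: float) -> float:
    """Schatten-p quasi-norm of W: (sum_i sigma_i^p)^(1/p) over its singular values."""
    sigma = np.linalg.svd(W, compute_uv=False)
    return float(np.sum(sigma ** p) ** (1.0 / p))

# Toy example: a rank-2 signal plus small noise, i.e. an approximately low-rank weight matrix.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 64)) \
    + 1e-3 * rng.standard_normal((64, 64))

for p in (0.25, 0.5, 1.0, 2.0):
    print(f"p = {p:.2f}: Schatten-p quasi-norm = {schatten_quasi_norm(W, p):.3f}")
```

As p decreases toward 0, the p-th power of this quantity, Σ_i σ_i^p, approaches a count of the non-zero singular values, which is consistent with the abstract's statement that for small p the bounds scale with the rank r rather than with a scale-sensitive norm.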

Discipline

Artificial Intelligence and Robotics

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS 2025), San Diego, CA, December 2-7

First Page

1

Last Page

63

Publisher

PMLR

City or Country

United States of America

Additional URL

https://openreview.net/forum?id=n3M8h9mqDm
