Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

12-2023

Abstract

In diffusion models, UNet is the most popular network backbone, since its long skip connections (LSCs), which connect distant network blocks, can aggregate long-distance information and alleviate vanishing gradients. Unfortunately, UNet often suffers from unstable training in diffusion models, which can be alleviated by scaling its LSC coefficients down. However, a theoretical understanding of the instability of UNet in diffusion models, and of the performance improvement brought by LSC scaling, is still lacking. To address this issue, we theoretically show that the coefficients of LSCs in UNet have a large effect on the stability of forward and backward propagation and on the robustness of UNet. Specifically, the hidden features and gradients of UNet at any layer can oscillate, and their oscillation ranges are large, which explains the instability of UNet training. Moreover, UNet is provably sensitive to perturbed inputs and can predict an output far from the desired output, yielding an oscillatory loss and thus an oscillatory gradient. We also observe the theoretical benefits of scaling the LSC coefficients of UNet for the stability of hidden features and gradients, as well as for robustness. Finally, inspired by our theory, we propose an effective coefficient scaling framework, ScaleLong, which scales the coefficients of the LSCs in UNet and further improves the training stability of UNet. Experimental results on four well-known datasets show that our methods are superior at stabilizing training and yield about 1.5× training acceleration on different diffusion models with UNet or UViT backbones.
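The abstract describes scaling the LSC coefficients of a UNet but gives no formula. Purely as an illustration of what "scaling a long skip connection" means mechanically, the sketch below builds a toy UNet on plain Python lists. It assumes additive fusion of skip features and a hypothetical geometric schedule in which the skip leaving encoder block i (0-indexed, shallowest first) is multiplied by `kappa**i`; the names `lsc_coefficients` and `toy_unet_forward`, the schedule, and the toy blocks are all invented here and are not the authors' ScaleLong code.

```python
import math
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]


def lsc_coefficients(num_skips: int, kappa: float) -> List[float]:
    """Hypothetical geometric schedule (an assumption, not the paper's rule):
    the skip leaving encoder block i (0-indexed, shallowest first) gets kappa**i."""
    if not 0.0 < kappa <= 1.0:
        raise ValueError("kappa must lie in (0, 1]")
    return [kappa ** i for i in range(num_skips)]


def scale(v: Vector, c: float) -> Vector:
    return [c * a for a in v]


def add(u: Vector, v: Vector) -> Vector:
    return [a + b for a, b in zip(u, v)]


def toy_unet_forward(x: Vector, encoders: List[Block], mid: Block,
                     decoders: List[Block], coeffs: List[float]) -> Vector:
    """Toy UNet: encoders run shallow-to-deep, decoders run deep-to-shallow.

    decoders[0] is the deepest decoder block. Each decoder block receives the
    running feature plus the matching encoder output multiplied by its LSC
    coefficient (additive fusion, assumed here for simplicity).
    """
    if not (len(encoders) == len(decoders) == len(coeffs)):
        raise ValueError("need one decoder block and one coefficient per encoder block")
    skips = []
    h = x
    for enc in encoders:
        h = enc(h)
        skips.append(h)  # source of one long skip connection
    h = mid(h)
    # Pair the deepest decoder with the deepest skip and that skip's coefficient.
    for dec, skip, c in zip(decoders, reversed(skips), reversed(coeffs)):
        h = dec(add(h, scale(skip, c)))
    return h


if __name__ == "__main__":
    encoders = [lambda v: [math.tanh(a + 0.5) for a in v],
                lambda v: [math.tanh(0.8 * a) for a in v]]
    mid = lambda v: [0.5 * a for a in v]
    decoders = [lambda v: [math.tanh(a) for a in v],
                lambda v: [a - 0.1 for a in v]]
    x = [0.2, -0.4, 1.0]
    for kappa in (1.0, 0.7):  # kappa = 1.0 leaves every skip unscaled
        coeffs = lsc_coefficients(len(encoders), kappa)
        print(kappa, toy_unet_forward(x, encoders, mid, decoders, coeffs))
```

Production UNets usually concatenate skip features rather than add them; in that case the same idea amounts to multiplying the skip tensor by its coefficient before concatenation. Setting every coefficient to 1 recovers the unscaled network.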

Discipline

OS and Networks

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the 37th Conference on Neural Information Processing Systems, New Orleans, United States, December 12-14

First Page

Last Page

Publisher

NeurIPS

City or Country

New Orleans

Citation

HUANG, Zhongzhan; ZHOU, Pan; YAN, Shuicheng; and LIN, Liang. ScaleLong: Towards more stable training of diffusion model via scaling network long skip connection. (2023). Proceedings of the 37th Conference on Neural Information Processing Systems, New Orleans, United States, December 12-14. 1-26.
Available at: https://ink.library.smu.edu.sg/sis_research/9025

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://openreview.net/forum?id=0N73P8pH2l

Included in

OS and Networks Commons

Research Collection School Of Computing and Information Systems

ScaleLong: Towards more stable training of diffusion model via scaling network long skip connection
