Publication Type

Journal Article

Version

publishedVersion

Publication Date

2-2025

Abstract

To efficiently train large-scale models, low-bit gradient communication compresses full-precision gradients on local GPU nodes into low-precision ones to improve gradient synchronization efficiency among GPU nodes. However, it often degrades training quality due to compression information loss. To address this, we propose the Low-bit Communication Adaptor (LoCo), which compensates gradients on local GPU nodes before compression, ensuring efficient synchronization without compromising training quality. Specifically, LoCo maintains a moving average of historical compensation errors to stably estimate the current compression error, and then uses this estimate to compensate the current gradient before compression, yielding less lossy compression. This mechanism makes LoCo compatible with general optimizers like Adam and sharding strategies like FSDP. Theoretical analysis shows that integrating LoCo into full-precision optimizers like Adam and SGD does not impair their convergence speed on nonconvex problems. Experimental results show that across large-scale model training frameworks like Megatron-LM and PyTorch’s FSDP, LoCo significantly improves communication efficiency, e.g., improving Adam’s training speed by 14% to 40% without performance degradation on large language models such as LLAMAs and MoEs.
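
The compensation-before-compression idea described above can be sketched in a few lines of PyTorch. This is a minimal illustration only, assuming a simple uniform symmetric quantizer as the low-bit compressor; the names, the decay factor beta, and the exact update rule are assumptions for illustration, not the paper's algorithm.

import torch

def quantize_low_bit(x, levels=15):
    # Illustrative stand-in for a low-bit compressor: uniform symmetric
    # quantization to `levels` values (here roughly 4-bit), then dequantize.
    scale = x.abs().max().clamp(min=1e-12) / (levels // 2)
    q = torch.clamp(torch.round(x / scale), -(levels // 2), levels // 2)
    return q * scale

class CompensatedCompressor:
    # Sketch of gradient compensation before low-bit compression: a moving
    # average of historical compensation errors estimates the current
    # compression error and is added back to the gradient before quantization.
    # `beta` is an assumed hyperparameter, not taken from the paper.
    def __init__(self, beta=0.9):
        self.beta = beta
        self.err_ema = None  # moving average of compensation errors

    def compress(self, grad):
        if self.err_ema is None:
            self.err_ema = torch.zeros_like(grad)
        compensated = grad + self.err_ema      # compensate before compression
        low_bit = quantize_low_bit(compensated)  # tensor that would be communicated
        err = compensated - low_bit            # this step's compression error
        self.err_ema = self.beta * self.err_ema + (1 - self.beta) * err
        return low_bit

In an actual distributed run, the low-bit tensor would be what is synchronized among GPU nodes before the optimizer step; here compress simply returns its dequantized value to keep the sketch self-contained.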

Keywords

Efficient Large-Scale Training, Large-Scale Optimization, Deep Learning Optimization

Discipline

Artificial Intelligence and Robotics

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

IEEE Transactions on Pattern Analysis and Machine Intelligence

Volume

47

Issue

16

First Page

4285

Last Page

4298

ISSN

0162-8828

Identifier

10.1109/TPAMI.2025.3544764

Publisher

Institute of Electrical and Electronics Engineers

Additional URL

https://doi.org/10.1109/TPAMI.2025.3544764
