Zeroth-order fine-tuning of LLMs in random subspaces
Publication Type
Conference Proceeding Article
Publication Date
10-2025
Abstract
Fine-tuning Large Language Models (LLMs) has proven effective for a variety of downstream tasks. However, as LLMs grow in size, the memory demands of backpropagation become increasingly prohibitive. Zeroth-order (ZO) optimization methods offer a memory-efficient alternative by using forward passes to estimate gradients, but the variance of these gradient estimates typically scales linearly with the model's parameter dimension, a significant issue for LLMs. In this paper, we propose random Subspace Zeroth-order (SubZero) optimization to address the challenges posed by LLMs' high dimensionality. We introduce a low-rank perturbation tailored to LLMs that significantly reduces memory consumption while improving training performance. Additionally, we prove that our gradient estimate closely approximates the backpropagation gradient, exhibits lower variance than traditional ZO methods, and ensures convergence when combined with SGD. Experimental results show that SubZero enhances fine-tuning performance and achieves faster convergence than standard ZO approaches such as MeZO across various language modeling tasks. Code is available at this https URL.
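To illustrate the idea described in the abstract, the sketch below shows a two-point zeroth-order gradient estimate in which the perturbation of a weight matrix is confined to a random low-rank subspace. This is a minimal NumPy toy on a quadratic loss, not the authors' released implementation; the function name `subspace_zo_grad`, the QR-based basis construction, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def subspace_zo_grad(loss_fn, W, rank=4, eps=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate for a weight matrix W (m x n),
    with the perturbation restricted to a random rank-`rank` subspace.
    Illustrative sketch only; not the paper's exact algorithm."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = W.shape
    # Random orthonormal bases for the left and right subspaces (one plausible choice).
    U, _ = np.linalg.qr(rng.standard_normal((m, rank)))
    V, _ = np.linalg.qr(rng.standard_normal((n, rank)))
    Z = rng.standard_normal((rank, rank))   # low-dimensional random perturbation
    P = U @ Z @ V.T                         # low-rank perturbation lifted to full space
    # Central finite difference of the loss along the perturbation direction.
    g = (loss_fn(W + eps * P) - loss_fn(W - eps * P)) / (2 * eps)
    return g * P                            # gradient estimate lying in the subspace

# Toy usage: zeroth-order SGD on a quadratic loss ||W - target||_F^2.
target = np.ones((8, 8))
W = np.zeros((8, 8))
loss = lambda M: float(np.sum((M - target) ** 2))
for _ in range(1000):
    W -= 0.05 * subspace_zo_grad(loss, W, rank=2)
print("final loss:", loss(W))   # should be much smaller than the initial loss of 64
```

Because each estimate only requires two forward evaluations of the loss and a low-rank perturbation, no backpropagation state needs to be stored, which is the memory advantage the abstract refers to.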
Discipline
Artificial Intelligence and Robotics
Research Areas
Intelligent Systems and Optimization
Publication
Proceedings of the 2025 International Conference on Computer Vision, ICCV, Honolulu, Hawaii, October 19-23
First Page
1
Last Page
27
Identifier
10.48550/arXiv.2410.08989
City or Country
Honolulu, HI, USA
Citation
YU, Ziming; ZHOU, Pan; WANG, Sike; LI, Jia; TIAN, Mi; and HUANG, Hua.
Zeroth-order fine-tuning of LLMs in random subspaces. (2025). Proceedings of the 2025 International Conference on Computer Vision, ICCV, Honolulu, Hawaii, October 19-23. 1-27.
Available at: https://ink.library.smu.edu.sg/sis_research/10519
Additional URL
https://doi.org/10.48550/arXiv.2410.08989