Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

8-2025

Abstract

Quantizing large language models (LLMs) is essential for reducing memory and computational costs in natural language processing. Existing methods combine quantization with parameter-efficient fine-tuning but often fail to meet practical performance requirements. This paper introduces MeMoTune, a novel fine-tuning framework for quantized LLMs. By employing a measure and moment approach within a low-rank approximation framework in probability measure space, MeMoTune optimizes the objective function for superior fine-tuning results. The update process is further refined through scaled gradient, enhancing convergence efficiency and noise robustness. Experiments on tasks like text generation, summarization, and understanding show MeMoTune significantly outperforms state-of-the-art methods, e.g. fine-tuning Llama2-13B on GSM8K improves accuracy by 5.5%, while fine-tuning DeBERTaV3-base on CoLA of GLUE increases Matthews correlation by 1.7%. The code is publicly available at: https://github.com/hddyyyb/MeMoTune.

Discipline

Artificial Intelligence and Robotics | Programming Languages and Compilers

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, July 27 - August 1

First Page

4036

Last Page

4050

Identifier

10.18653/v1/2025.findings-acl.208

Publisher

ACL

City or Country

Austria

Additional URL

https://doi.org/10.18653/v1/2025.findings-acl.208

Share

COinS