Publication Type
Journal Article
Version
publishedVersion
Publication Date
2-2026
Abstract
The widespread use of Large Language Models (LLMs) in software engineering has intensified the need for improved model and resource efficiency. In particular, for neural code generation, LLMs are used to translate a function/method signature and DocString into executable code. DocStrings, which capture user requirements for the code and are typically used as the prompt for LLMs, often contain redundant information. Recent advancements in prompt compression have shown promising results in Natural Language Processing (NLP), but their applicability to code generation remains uncertain. Our empirical study shows that state-of-the-art prompt compression methods achieve only about 10% reduction, as further reductions would cause significant performance degradation. In our study, we propose a novel compression method, ShortenDoc, dedicated to DocString compression for code generation. Our experiments on six code generation datasets, five open-source LLMs (1B to 10B parameters), and one closed-source LLM, GPT-4o, confirm that ShortenDoc achieves 25-40% compression while preserving the quality of generated code, outperforming other baseline methods at similar compression levels. This method improves efficiency and reduces token processing cost while maintaining the quality of the generated code, especially when calling third-party APIs.
Keywords
DocString Compression, Code Generation, Large Language Model
Discipline
Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
ACM Transactions on Software Engineering and Methodology
Volume
35
Issue
2
First Page
1
Last Page
31
ISSN
1049-331X
Identifier
10.1145/3735636
Publisher
Association for Computing Machinery (ACM)
Citation
YANG, Guang; ZHOU, Yu; CHENG, Wei; ZHANG, Xiangyu; CHEN, Xiang; ZHUO, Terry Yue; ZHOU, Xin; LIU, Ke; LO, David; and CHEN, Taolue.
Less is more: DocString compression in code generation. (2026). ACM Transactions on Software Engineering and Methodology. 35, (2), 1-31.
Available at: https://ink.library.smu.edu.sg/sis_research/11020
Copyright Owner and License
Authors-CC-BY
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3735636