Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
7-2025
Abstract
Due to the auto-regressive nature of current video large language models (Video-LLMs), the inference latency increases as the input sequence length grows, posing challenges for the efficient processing of video sequences, which are usually very long. We observe that during decoding, the attention scores of most tokens in Video-LLMs tend to be sparse and concentrated, with only certain tokens requiring comprehensive full attention. Based on this insight, we introduce Sparse-to-Dense (StD), a novel decoding strategy that integrates two distinct modules: one leveraging sparse top-K attention and the other employing dense full attention. These modules collaborate to losslessly accelerate Video-LLMs. The fast (sparse) model speculatively decodes multiple tokens, while the slow (dense) model verifies them in parallel. StD is a tuning-free, plug-and-play solution that achieves up to a 1.94× walltime speedup in video processing. It maintains model performance while enabling a seamless transition from a standard Video-LLM to a sparse Video-LLM with minimal code modifications.
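To make the draft-and-verify mechanism described above concrete, the sketch below shows a greedy speculative-decoding loop of the kind the abstract outlines: a fast sparse-attention model drafts a few tokens, and the dense model checks them in one parallel pass so the output matches dense-only decoding. The function names (speculative_decode, draft_next, verify_batch), the gamma draft length, and the exact acceptance rule are illustrative assumptions, not the authors' released implementation; the sparse top-K attention and dense full attention are hidden inside the two stand-in callables.

# Minimal sketch (an assumption, not the paper's code) of sparse-to-dense
# speculative decoding: draft with a cheap sparse model, verify with the dense one.
from typing import Callable, List

def speculative_decode(
    draft_next: Callable[[List[int]], int],                      # sparse (top-K attention) model: next greedy token
    verify_batch: Callable[[List[int], List[int]], List[int]],   # dense model: greedy token at every drafted position, plus one bonus
    prompt: List[int],
    max_new_tokens: int = 32,
    gamma: int = 4,                                               # tokens drafted per verification step
) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft: the cheap sparse model proposes gamma tokens autoregressively.
        draft: List[int] = []
        for _ in range(gamma):
            draft.append(draft_next(tokens + draft))
        # 2) Verify: one dense pass scores all drafted positions at once;
        #    dense_preds[i] is the dense greedy token given tokens + draft[:i].
        dense_preds = verify_batch(tokens, draft)                 # length gamma + 1
        # 3) Accept the longest prefix where draft and dense agree, then take the
        #    dense token at the first mismatch (or the bonus token), so the result
        #    is identical to decoding with the dense model alone.
        accepted = 0
        for d, v in zip(draft, dense_preds):
            if d != v:
                break
            accepted += 1
        tokens += draft[:accepted]
        tokens.append(dense_preds[accepted])
    return tokens[: len(prompt) + max_new_tokens]

# Toy usage with stand-in "models": the dense model continues an arithmetic
# sequence and the sparse draft is deliberately wrong at some steps, which
# exercises both the accept and the reject-and-correct paths.
if __name__ == "__main__":
    dense = lambda ctx: (ctx[-1] + 1) % 100
    draft = lambda ctx: dense(ctx) if len(ctx) % 7 else 0
    out = speculative_decode(
        draft_next=draft,
        verify_batch=lambda ctx, dr: [dense(ctx + dr[:i]) for i in range(len(dr) + 1)],
        prompt=[1, 2, 3],
    )
    print(out)  # identical to pure dense greedy decoding: [1, 2, 3, 4, 5, ...]

Because every emitted token is either a draft token the dense model agreed with or the dense model's own prediction, the accelerated loop reproduces dense greedy decoding exactly, which is what makes the speedup lossless.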
Discipline
Databases and Information Systems
Research Areas
Data Science and Engineering
Areas of Excellence
Digital transformation
Publication
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, July 27 - August 1
First Page
734
Last Page
742
Identifier
10.18653/v1/2025.acl-short.59
Publisher
ACL
City or Country
Vienna, Austria
Citation
ZHANG, Xuan; DU, Cunxiao; YU, Sicheng; WU, Jiawei; ZHANG, Fengzhuo; GAO, Wei; and LIU, Qian.
Sparse-to-dense: A free lunch for lossless acceleration of video understanding in LLMs. (2025). Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, July 27 - August 1. 734-742.
Available at: https://ink.library.smu.edu.sg/sis_research/10735
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.18653/v1/2025.acl-short.59