Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
10-2025
Abstract
3D visual grounding aims to identify and localize objects in a 3D space based on textual descriptions. However, existing methods struggle with disentangling targets from anchors in complex multi-anchor queries and resolving inconsistencies in spatial descriptions caused by perspective variations. To tackle these challenges, we propose ViewSRD, a framework that formulates 3D visual grounding as a structured multi-view decomposition process. First, the Simple Relation Decoupling (SRD) module restructures complex multi-anchor queries into a set of targeted single-anchor statements, generating a structured set of perspective-aware descriptions that clarify positional relationships. These decomposed representations serve as the foundation for the Multi-view Textual-Scene Interaction (Multi-TSI) module, which integrates textual and scene features across multiple viewpoints using shared, Cross-modal Consistent View Tokens (CCVTs) to preserve spatial correlations. Finally, a Textual-Scene Reasoning module synthesizes multi-view predictions into a unified and robust 3D visual grounding. Experiments on 3D visual grounding datasets show that ViewSRD significantly outperforms state-of-the-art methods, particularly in complex queries requiring precise spatial differentiation. Code is available at https://github.com/visualjason/ViewSRD.
Discipline
Graphics and Human Computer Interfaces | Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
2025 International Conference on Computer Vision ICCV: Honolulu, October 19-21: Proceedings
First Page
1
Last Page
11
Publisher
IEEE
City or Country
Piscataway
Citation
HUANG, Ronggang; YANG, Haoxin; CAI, Yan; XU, Xuemiao; ZHANG, Huaidong; and HE, Shengfeng.
ViewSRD: 3D visual grounding via structured multi-view decomposition. (2025). 2025 International Conference on Computer Vision ICCV: Honolulu, October 19-21: Proceedings. 1-11.
Available at: https://ink.library.smu.edu.sg/sis_research/10513
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://openaccess.thecvf.com/content/ICCV2025/papers/Huang_ViewSRD_3D_Visual_Grounding_via_Structured_Multi-View_Decomposition_ICCV_2025_paper.pdf