Publication Type

Journal Article

Version

publishedVersion

Publication Date

1-2026

Abstract

Vision-and-Language Navigation in continuous environments (VLN-CE) requires an embodied robot to navigate to a target destination by following a natural language instruction. Most existing methods rely on panoramic RGB-D cameras for 360° observation of the environment. However, these methods are difficult to deploy in real-world applications because of the high cost of panoramic RGB-D cameras. This paper studies a low-cost and practical VLN-CE setting, i.e., using a monocular camera with a limited field of view, which means "Look Less" in terms of visual observations and environment semantics. We propose a ThinkMatter framework for monocular VLN-CE, which motivates monocular robots to "Think More" by 1) generating novel views and 2) integrating instruction semantics. Specifically, we achieve the former with the proposed 3DGS-based panoramic generation, which renders novel views at each step from the collection of past observations. We achieve the latter with the proposed occupancy-instruction semantic enhancement, which integrates the spatial semantics of occupancy maps with the textual semantics of language instructions. These operations equip monocular robots with wider environment perception as well as transparent semantic connections to the instruction. Extensive experiments in both simulators and real-world environments demonstrate the effectiveness of ThinkMatter, providing a promising practice for real-world navigation.

Keywords

vision-and-language navigation, panoramic view synthesis, semantic map learning

Discipline

Artificial Intelligence and Robotics | Databases and Information Systems

Research Areas

Data Science and Engineering; Intelligent Systems and Optimization

Publication

IEEE Transactions on Image Processing

Volume

74

First Page

875

Last Page

903

ISSN

1057-7149

Identifier

10.1109/TIP.2026.3652003

Publisher

Institute of Electrical and Electronics Engineers

Additional URL

https://doi.org/10.1109/TIP.2026.3652003
