Publication Type
Journal Article
Version
publishedVersion
Publication Date
1-2026
Abstract
Vision-and-Language Navigation in continuous environments (VLN-CE) requires an embodied robot to navigate to a target destination by following natural language instructions. Most existing methods use panoramic RGB-D cameras for 360° observation of environments. However, these methods struggle in real-world applications because of the high cost of panoramic RGB-D cameras. This paper studies a low-cost and practical VLN-CE setting, i.e., using a monocular camera with a limited field of view, which means “Look Less” for visual observations and environment semantics. In this paper, we propose a ThinkMatter framework for monocular VLN-CE, in which we motivate monocular robots to “Think More” by 1) generating novel views and 2) integrating instruction semantics. Specifically, we achieve the former with the proposed 3DGS-based panoramic generation, which renders novel views at each step based on collected past observations. We achieve the latter with the proposed occupancy-instruction semantic enhancement, which integrates the spatial semantics of occupancy maps with the textual semantics of language instructions. These operations provide monocular robots with wider environmental perception as well as transparent semantic connections to the instruction. Extensive experiments in both simulators and real-world environments demonstrate the effectiveness of ThinkMatter, providing a promising practice for real-world navigation.
Keywords
vision-and-language navigation, panoramic view synthesis, semantic map learning
Discipline
Artificial Intelligence and Robotics | Databases and Information Systems
Research Areas
Data Science and Engineering; Intelligent Systems and Optimization
Publication
IEEE Transactions on Image Processing
Volume
74
First Page
875
Last Page
903
ISSN
1057-7149
Identifier
10.1109/TIP.2026.3652003
Publisher
Institute of Electrical and Electronics Engineers
Citation
DAI, Guangzhao; WANG, Shuo; ZHAO, Hao; ZHU, Bin; SUN, Qianru; and SHU, Xiangbo.
ThinkMatter: Panoramic-aware instructional semantics for monocular vision-and-language navigation. (2026). IEEE Transactions on Image Processing. 74, 875-903.
Available at: https://ink.library.smu.edu.sg/sis_research/10905
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/TIP.2026.3652003