Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

11-2023

Abstract

This work demonstrates the VGGlass system, which simultaneously interprets human instructions for a target acquisition task and determines the precise 3D positions of both the user and the target object. This is achieved by utilizing LiDARs mounted in the infrastructure and a smart glass device worn by the user. Key to our system is the combination of a LiDAR-based localization approach, termed LiLOC, and a multi-modal visual grounding approach, termed RealG(2)In-Lite. To demonstrate the system, we use Intel RealSense L515 cameras as the infrastructure LiDARs and a Microsoft HoloLens 2 as the user's smart glass device. VGGlass is able to: a) track the user in real-time in a global coordinate system, and b) locate target objects referred to by natural language and pointing gestures.
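To illustrate capability b) at a high level, the sketch below shows how a target detected in the smart glass's camera frame could be mapped into the global coordinate system once the user's pose has been estimated by the infrastructure LiDARs. This is a minimal, hypothetical example, not the authors' LiLOC or RealG(2)In-Lite implementation; the function name, pose, and target values are all assumptions made for illustration.

```python
import numpy as np

def to_global(p_device, R_wd, t_wd):
    """Transform a 3D point from the device (smart-glass) frame into
    the global (world) frame, given the device pose (R_wd, t_wd)."""
    return R_wd @ p_device + t_wd

# Hypothetical user pose, as a LiDAR-based localizer might estimate it:
R_wd = np.eye(3)                  # device axes aligned with world axes
t_wd = np.array([2.0, 0.5, 1.6])  # device position in the world, metres

# Hypothetical target, detected 1.2 m in front of the glass's camera.
p_device = np.array([0.0, 0.0, 1.2])

print(to_global(p_device, R_wd, t_wd))  # -> [2.  0.5 2.8]
```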

Keywords

Multi-modal Interaction, 3D Localization, Visual Grounding

Discipline

Computer Engineering | Graphics and Human Computer Interfaces | OS and Networks

Research Areas

Data Science and Engineering

Areas of Excellence

Digital transformation

Publication

SenSys '23: Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems, Istanbul, Türkiye, November 12-17

First Page

492

Last Page

493

ISBN

9798400704147

Identifier

https://doi.org/10.1145/3625687.3628407

Publisher

ACM

City or Country

New York

Copyright Owner and License

Authors

Creative Commons License

Creative Commons Attribution 4.0 International License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Additional URL

https://doi.org/10.1145/3625687.3628407
