Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
1-2023
Abstract
We present a multi-modal human instruction comprehension prototype for object acquisition tasks that involve verbal, visual, and pointing gesture cues. Our prototype comprises an AR smart glass for issuing the instructions and a Jetson TX2 pervasive device for executing the comprehension algorithms. With this setup, we enable on-device, computationally efficient comprehension of object acquisition tasks with an average latency in the range of 150-330 msec.
Keywords
Human-AI Collaboration, Multi-Modal Networks, Pervasive Systems, Referring Expression Comprehension, Visual Grounding
Discipline
Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
2023 15th International Conference on COMmunication Systems and NETworkS (COMSNETS): Bangalore, January 3-8: Proceedings
First Page
231
Last Page
233
ISBN
9781665477062
Identifier
10.1109/COMSNETS56262.2023.10041269
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
WEERAKOON, Mudiyanselage Dulanga Kaveesha; SUBBARAJU, Vigneshwaran; TRAN, Tuan; and MISRA, Archan.
Demonstrating multi-modal human instruction comprehension with AR smart glass. (2023). 2023 15th International Conference on COMmunication Systems and NETworkS (COMSNETS): Bangalore, January 3-8: Proceedings. 231-233.
Available at: https://ink.library.smu.edu.sg/sis_research/7797
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/COMSNETS56262.2023.10041269