Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

1-2023

Abstract

We present a prototype for multi-modal comprehension of human instructions in object acquisition tasks that combine verbal, visual, and pointing-gesture cues. The prototype pairs an AR smart-glass device, used to issue the instructions, with a Jetson TX2 pervasive device that executes the comprehension algorithms. This setup enables on-device, computationally efficient comprehension of object acquisition tasks, with an average latency in the range of 150-330 ms.

Keywords

Human-AI Collaboration, Multi-Modal Networks, Pervasive Systems, Referring Expression Comprehension, Visual Grounding

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

2023 15th International Conference on COMmunication Systems and NETworkS COMSNETS: Bangalore, January 3-8: Proceedings

First Page

231

Last Page

233

ISBN

9781665477062

Identifier

10.1109/COMSNETS56262.2023.10041269

Publisher

IEEE

City or Country

Piscataway, NJ

Additional URL

https://doi.org/10.1109/COMSNETS56262.2023.10041269
