Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
10-2020
Abstract
This work demonstrates the feasibility and benefits of using pointing gestures, a naturally generated additional input modality, to improve the multi-modal comprehension accuracy of human instructions to robotic agents for collaborative tasks. We present M2Gestic, a system that combines neural-based text parsing with a novel knowledge-graph traversal mechanism, over a multi-modal input of vision, natural language text and pointing. Via multiple studies related to a benchmark tabletop manipulation task, we show that (a) M2Gestic can achieve close-to-human performance in reasoning over unambiguous verbal instructions, and (b) incorporating pointing input (even with its inherent location uncertainty) in M2Gestic results in a significant (30%) accuracy improvement when verbal instructions are ambiguous.
Discipline
Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Publication
ICMI '20: Proceedings of the 2020 International Conference on Multimodal Interaction
First Page
251
Last Page
259
Identifier
10.1145/3382507.3418863
Publisher
ACM
City or Country
New York
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3382507.3418863
Included in
Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons