Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

5-2025

Abstract

Current voice agents wait for a user to complete their verbal instruction before responding, yet this is misaligned with how humans engage in everyday conversational interaction, where interlocutors use multimodal signaling (e.g., nodding, grunting, or looking at referred-to objects) to ensure conversational grounding. We designed an embodied VR agent that exhibits multimodal signaling behaviors in response to situated prompts, turning its head toward or visually highlighting the objects being discussed or referred to. We explore how people prompt this agent to design and manipulate objects in a VR scene. Through a Wizard of Oz study, we found that participants interacting with an agent that indicated its understanding of spatial and action references were able to prevent errors 30% of the time and were more satisfied and confident in the agent's abilities. These findings underscore the importance of designing multimodal signaling communication techniques for future embodied agents.

Keywords

situated prompting, multimodal signaling, common ground, human-ai collaboration

Discipline

Software Engineering

Research Areas

Software and Cyber-Physical Systems

Areas of Excellence

Digital transformation

Publication

CHI '25: Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, April 26 - May 1

First Page

1

Last Page

25

Identifier

10.1145/3706598.3713110

Publisher

ACM

City or Country

New York

Additional URL

https://doi.org/10.1145/3706598.3713110
