Research Collection School Of Computing and Information Systems

ImageInThat: Manipulating images to convey user instructions to robots

Karthik MAHADEVAN
Blaine LEWIS
Jiannan LI, Singapore Management University
Bilge MUTLU
Anthony TANG, Singapore Management University
Tovi GROSSMAN

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

3-2025

Abstract

Foundation models are rapidly improving the capability of robots to perform everyday tasks, such as meal preparation, autonomously. Yet robots will still need to be instructed by humans because of limits in model performance, the difficulty of capturing user preferences, and the need for user agency. Robots can be instructed using various methods: natural language conveys immediate instructions but can be abstract or ambiguous, whereas end-user programming supports longer-horizon tasks, but its interfaces face difficulties in capturing user intent. In this work, we propose using direct manipulation of images as an alternative paradigm to instruct robots, and introduce a specific instantiation called ImageInThat, which allows users to perform direct manipulation on images in a timeline-style interface to generate robot instructions. Through a user study, we demonstrate the efficacy of ImageInThat to instruct robots in kitchen manipulation tasks, comparing it to a text-based natural language instruction method. The results show that participants were faster with ImageInThat and preferred to use it over the text-based method. Supplementary material, including code, can be found at: https://image-in-that.github.io/.

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

HRI '25: Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, Melbourne, Australia, March 4-6

First Page

757

Last Page

766

Identifier

10.5555/3721488.3721582

Publisher

ACM

City or Country

New York

Citation

MAHADEVAN, Karthik; LEWIS, Blaine; LI, Jiannan; MUTLU, Bilge; TANG, Anthony; and GROSSMAN, Tovi. ImageInThat: Manipulating images to convey user instructions to robots. (2025). HRI '25: Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, Melbourne, Australia, March 4-6. 757-766.
Available at: https://ink.library.smu.edu.sg/sis_research/10133

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Additional URL

https://doi.org/10.5555/3721488.3721582

Included in

Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons
