Publication Type

Journal Article

Version

acceptedVersion

Publication Date

11-2024

Abstract

Intelligent virtual agents are used to accomplish complex multi-modal tasks such as human instruction comprehension in mixed-reality environments by increasingly adopting richer, energy-intensive sensors and processing pipelines. In such applications, the context for activating sensors and processing blocks required to accomplish a given task instance is usually manifested via multiple sensing modes. Based on this observation, we introduce a novel Commit-and-Switch (CAS) paradigm that simultaneously seeks to reduce both sensing and processing energy. In CAS, we first commit to a low-energy computational pipeline with a subset of available sensors. Then, the task context estimated by this pipeline is used to optionally switch to another energy-intensive DNN pipeline and activate additional sensors. We demonstrate how CAS's paradigm of interweaving DNN computation and sensor triggering can be instantiated principally by constructing multi-head DNN models and jointly optimizing the accuracy and sensing costs associated with different heads. We exemplify CAS via the development of the RealGIN-MH model for multi-modal target acquisition tasks, a core enabler of immersive human-agent interaction. RealGIN-MH achieves a 12.9x reduction in energy overheads, while outperforming baseline dynamic model optimization approaches.
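
The commit-then-switch control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the RealGIN-MH implementation: the `Pipeline` class, the `commit_and_switch` function, the confidence threshold used as a stand-in for the estimated task context, and the energy figures are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Pipeline:
    """Hypothetical processing pipeline: the sensors it needs, an
    illustrative energy cost, and an inference function that returns
    (prediction, confidence)."""
    name: str
    sensors: Tuple[str, ...]
    energy_cost: float
    infer: Callable[[Dict[str, float]], Tuple[str, float]]


def commit_and_switch(
    read_sensor: Callable[[str], float],
    light: Pipeline,
    heavy: Pipeline,
    threshold: float = 0.8,
) -> Tuple[str, float, str]:
    """Return (prediction, energy spent, name of the pipeline that answered)."""
    # Commit: activate only the sensors of the low-energy pipeline.
    readings = {s: read_sensor(s) for s in light.sensors}
    prediction, confidence = light.infer(readings)
    spent = light.energy_cost

    # Assumption: confidence stands in for the task context that decides
    # whether the low-energy result is sufficient.
    if confidence >= threshold:
        return prediction, spent, light.name

    # Switch: activate only the additional sensors, then run the
    # energy-intensive pipeline on the combined readings.
    for s in heavy.sensors:
        if s not in readings:
            readings[s] = read_sensor(s)
    prediction, _ = heavy.infer(readings)
    return prediction, spent + heavy.energy_cost, heavy.name
```

In this sketch the energy-intensive pipeline and its extra sensors are only paid for when the low-energy pipeline's estimate falls below the threshold, which is the source of the savings the paradigm targets.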

Keywords

Deep Learning for Visual Perception, Embedded Systems for Robotic and Automation, Human-Robot Collaboration, RGB-D Perception, Vision and Sensor-Based Control

Discipline

Artificial Intelligence and Robotics

Research Areas

Intelligent Systems and Optimization

Publication

IEEE Robotics and Automation Letters

Volume

9

Issue

11

First Page

10057

Last Page

10064

ISSN

2377-3766

Identifier

10.1109/LRA.2024.3469813

Publisher

Institute of Electrical and Electronics Engineers

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1109/LRA.2024.3469813
