Publication Type
Journal Article
Version
acceptedVersion
Publication Date
11-2024
Abstract
Intelligent virtual agents increasingly adopt richer, energy-intensive sensors and processing pipelines to accomplish complex multi-modal tasks, such as human instruction comprehension in mixed-reality environments. In such applications, the context for activating the sensors and processing blocks required to accomplish a given task instance is usually manifested via multiple sensing modes. Based on this observation, we introduce a novel Commit-and-Switch (CAS) paradigm that simultaneously seeks to reduce both sensing and processing energy. In CAS, we first commit to a low-energy computational pipeline with a subset of available sensors. Then, the task context estimated by this pipeline is used to optionally switch to another energy-intensive DNN pipeline and activate additional sensors. We demonstrate how CAS's paradigm of interweaving DNN computation and sensor triggering can be instantiated in a principled manner by constructing multi-head DNN models and jointly optimizing the accuracy and sensing costs associated with different heads. We exemplify CAS via the development of the RealGIN-MH model for multi-modal target acquisition tasks, a core enabler of immersive human-agent interaction. RealGIN-MH achieves a 12.9x reduction in energy overheads, while outperforming baseline dynamic model optimization approaches.
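To make the commit-and-switch control flow concrete, the following is a minimal, hypothetical sketch of how a multi-head model might realize it: a cheap "commit" head runs on an always-on sensor, and its confidence decides whether to trigger additional sensors and a costlier "switch" head. The module names, feature sizes, confidence threshold, and sensor hook are illustrative assumptions, not the authors' RealGIN-MH implementation.

```python
# Hedged sketch of Commit-and-Switch (CAS) inference with a two-head DNN.
# Assumptions: 128-d features from an always-on sensor, 32-d features from an
# optional energy-hungry sensor, and a softmax-confidence switching rule.
import torch
import torch.nn as nn


class CASModel(nn.Module):
    def __init__(self, num_classes: int = 10, conf_threshold: float = 0.8):
        super().__init__()
        self.conf_threshold = conf_threshold
        # Shared low-cost backbone over the always-on sensor modality.
        self.backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU())
        # Cheap "commit" head: low-energy prediction from the committed subset.
        self.commit_head = nn.Linear(64, num_classes)
        # Expensive "switch" head: also consumes features from extra sensors
        # (e.g. depth or gaze) that are only powered on when needed.
        self.switch_head = nn.Sequential(
            nn.Linear(64 + 32, 128), nn.ReLU(), nn.Linear(128, num_classes)
        )

    def forward(self, base_feat, acquire_extra_sensor):
        shared = self.backbone(base_feat)
        commit_logits = self.commit_head(shared)
        conf = torch.softmax(commit_logits, dim=-1).max(dim=-1).values
        if conf.item() >= self.conf_threshold:
            # Commit: the low-energy pipeline is confident enough; no extra sensing.
            return commit_logits
        # Switch: lazily activate the additional sensor and the heavier head.
        extra_feat = acquire_extra_sensor()
        return self.switch_head(torch.cat([shared, extra_feat], dim=-1))


# Example use: the callback stands in for powering up and reading an extra sensor.
model = CASModel()
logits = model(torch.randn(1, 128), acquire_extra_sensor=lambda: torch.randn(1, 32))
```

In this sketch, energy savings come from the fact that the extra sensor is read only when the commit head's estimated context indicates it is needed; the paper's contribution of jointly optimizing per-head accuracy and sensing cost during training is not shown here.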
Keywords
Deep Learning for Visual Perception, Embedded Systems for Robotics and Automation, Human-Robot Collaboration, RGB-D Perception, Vision and Sensor-Based Control
Discipline
Artificial Intelligence and Robotics
Research Areas
Intelligent Systems and Optimization
Publication
IEEE Robotics and Automation Letters
Volume
9
Issue
11
First Page
10057
Last Page
10064
ISSN
2377-3766
Identifier
10.1109/LRA.2024.3469813
Publisher
Institute of Electrical and Electronics Engineers
Citation
WEERAKOON, Dulanga; SUBBARAJU, Vigneshwaran; LIM, Joo Hwee; and MISRA, Archan.
CAS: Fusing DNN optimization & adaptive sensing for energy-efficient multi-modal inference. (2024). IEEE Robotics and Automation Letters. 9, (11), 10057-10064.
Available at: https://ink.library.smu.edu.sg/sis_research/9360
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/LRA.2024.3469813