Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
10-2022
Abstract
Supporting real-time referring expression comprehension (REC) on pervasive devices is an important capability for human-AI collaborative tasks. Model pruning techniques, applied to DNN models, can enable real-time execution even on resource-constrained devices. However, existing pruning strategies are designed principally for uni-modal applications, and suffer a significant loss of accuracy when applied to REC tasks that require fusion of textual and visual inputs. We thus present a multi-modal pruning model, LGMDP, which uses language as a pivot to dynamically and judiciously select the relevant computational blocks that need to be executed. LGMDP also introduces a new SoftSkip mechanism, whereby 'skipped' visual scales are not completely eliminated but approximated with minimal additional computation. Experimental evaluation, using 3 benchmark REC datasets and an embedded device implementation, shows that LGMDP can achieve 33% latency savings with an accuracy loss of only 0.5%-2%.
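The two ideas in the abstract, language-guided selection of computational blocks and soft approximation of skipped visual scales, can be illustrated with a toy sketch. This is not the authors' implementation; all function names, shapes, and the pooling-based approximation are hypothetical stand-ins for the actual LGMDP/SoftSkip design:

```python
import numpy as np

rng = np.random.default_rng(0)

def gate_scales(lang_emb, weights, threshold=0.5):
    """Hypothetical language-conditioned gate: score each visual scale
    from the language embedding and keep only scales above threshold."""
    scores = 1.0 / (1.0 + np.exp(-(lang_emb @ weights)))  # sigmoid per scale
    keep = scores > threshold
    if not keep.any():
        keep[np.argmax(scores)] = True  # always process at least one scale
    return keep, scores

def soft_skip(features, keep_mask):
    """Toy stand-in for SoftSkip: a skipped scale is not zeroed out but
    approximated cheaply -- here by broadcasting the mean activation of
    the nearest kept scale instead of running its full block."""
    kept = [i for i, k in enumerate(keep_mask) if k]
    out = []
    for i, feat in enumerate(features):
        if keep_mask[i]:
            out.append(feat)  # full computation retained for this scale
        else:
            nearest = min(kept, key=lambda j: abs(j - i))
            out.append(np.full_like(feat, features[nearest].mean()))
    return out

lang_emb = rng.normal(size=8)                           # hypothetical text encoding
weights = rng.normal(size=(8, 3))                       # one gate score per visual scale
features = [rng.normal(size=(4, 4)) for _ in range(3)]  # 3 visual scales

keep, scores = gate_scales(lang_emb, weights)
fused = soft_skip(features, keep)
```

The point of the sketch is the control flow: skipped scales still contribute a cheap approximate tensor downstream rather than vanishing entirely, which is the distinction the abstract draws between SoftSkip and hard elimination.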
Keywords
Human-Robot Interaction, Referring Expression Comprehension, Pruning, Computer Vision, Natural Language Processing
Discipline
Computer Engineering | Software Engineering
Research Areas
Software and Cyber-Physical Systems
Publication
MM '22: Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, October 10-14
First Page
3608
Last Page
3616
ISBN
9781450392037
Identifier
10.1145/3503161.3548432
Publisher
ACM
City or Country
New York
Citation
WEERAKOON, Dulanga; SUBBARAJU, Vigneshwaran; TRAN, Tuan; and MISRA, Archan.
SoftSkip: Empowering multi-modal dynamic pruning for single-stage referring comprehension. (2022). MM '22: Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, October 10-14. 3608-3616.
Available at: https://ink.library.smu.edu.sg/sis_research/7707
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3503161.3548432