Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
6-2024
Abstract
As pervasive devices are increasingly used for complex collaborative tasks such as cognitive assistants and interactive AR/VR companions, they are equipped with a myriad of sensors that facilitate natural interactions, such as voice commands. Spatio-Temporal Video Grounding (STVG), the task of identifying the target object in the field of view referred to by a language instruction, is a key capability needed for such systems. However, current STVG models tend to be resource-intensive, relying on multiple cross-attentional transformers applied to each video frame, so their runtime complexity increases linearly with video length. Furthermore, deploying these models on mobile devices while maintaining low latency poses additional challenges. Hence, this paper explores the latency and energy requirements of implementing STVG models on a pervasive device.
Keywords
Human-AI Collaboration, Spatio-Temporal Video Grounding
Discipline
Computer Engineering
Research Areas
Intelligent Systems and Optimization; Software and Cyber-Physical Systems
Areas of Excellence
Digital transformation
Publication
MOBISYS '24: Proceedings of the 22nd Annual International Conference on Mobile Systems, Minato-ku, Tokyo, Japan, June 3-7, 2024
First Page
648
Last Page
649
ISBN
9798400705816
Identifier
https://doi.org/10.1145/3643832.3661402
Publisher
ACM
City or Country
New York
Citation
WEERAKOON MUDIYANSELAGE, Dulanga Kaveesha; SUBBARAJU, Vigneshwaran; LIM, Joo Hwee; and MISRA, Archan.
Poster: Towards efficient spatio-temporal video grounding in pervasive mobile devices. (2024). MOBISYS '24: Proceedings of the 22nd Annual International Conference on Mobile Systems, Minato-ku, Tokyo, Japan, June 3-7, 2024. 648-649.
Available at: https://ink.library.smu.edu.sg/sis_research/9219
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Additional URL
https://doi.org/10.1145/3643832.3661402