Publication Type

Journal Article

Version

acceptedVersion

Publication Date

1-2024

Abstract

The main benefit of video hashing is fast retrieval, which stems from the high efficiency of binary computation. Current video hashing approaches therefore focus mainly on learning compact binary codes that represent video content accurately. However, they may overlook the efficiency of generating hash codes, i.e., the design of lightweight neural networks. This paper proposes an efficient unsupervised video hashing (EUVH) method that both computes compact hash codes and uses a lightweight deep model. Specifically, we present an MLP-based model in which the video tensor is split into several groups, and multiple axial contexts are explored to refine the groups separately and in parallel. The axial contexts are the dynamics aggregated at different axial scales, covering long-, middle-, and short-range dependencies. The group operation significantly reduces the computational cost of the MLP backbone. Moreover, to obtain compact video hash codes, three structural losses are utilized. As the experiments demonstrate, the three structures are highly complementary in approximating the real data structure. We conduct extensive experiments on three benchmark datasets for the unsupervised video hashing task and show that EUVH achieves a better trade-off between performance and computational cost than state-of-the-art methods.
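
The retrieval benefit mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's model: the sign-based binarization, the function names (`sign_binarize`, `hamming_distance`, `retrieve`), and the toy 4-bit feature vectors are all assumptions made for illustration. The sketch only shows why binary codes are cheap to compare: ranking reduces to an XOR and a bit count.

```python
# Illustrative sketch only; names and data are hypothetical, not from the paper.

def sign_binarize(features):
    """Pack real-valued features into an int, one bit per value (1 if value > 0)."""
    code = 0
    for value in features:
        code = (code << 1) | (1 if value > 0 else 0)
    return code

def hamming_distance(a, b):
    """Number of differing bits between two packed codes (XOR, then bit count)."""
    return bin(a ^ b).count("1")

def retrieve(query_code, database_codes, top_k=3):
    """Return indices of the top_k database codes closest to the query."""
    ranked = sorted(
        range(len(database_codes)),
        key=lambda i: hamming_distance(query_code, database_codes[i]),
    )
    return ranked[:top_k]

# Toy "video features" (assumed values) hashed to 4-bit codes.
database = [
    sign_binarize([0.9, -0.2, 0.4, -0.7]),   # 1010
    sign_binarize([-0.5, 0.3, -0.1, 0.8]),   # 0101
    sign_binarize([0.6, -0.4, 0.2, 0.1]),    # 1011
]
query = sign_binarize([0.8, -0.1, 0.5, -0.3])  # 1010

print(retrieve(query, database, top_k=2))  # [0, 2]
```

In practice, codes are typically 16 to 128 bits long and the comparison is vectorized, but the cost per comparison remains a handful of bitwise operations.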

Keywords

Codes, Computational modeling, Context modeling, Data Structure, Data structures, Deep Neural Network, Feature extraction, Hash functions, Large-scale retrieval, Transformers, Video hashing

Discipline

Graphics and Human Computer Interfaces | Numerical Analysis and Scientific Computing

Research Areas

Software and Cyber-Physical Systems

Publication

IEEE Transactions on Multimedia

First Page

Last Page

ISSN

1520-9210

Identifier

10.1109/TMM.2024.3368924

Publisher

Institute of Electrical and Electronics Engineers

Citation

DUAN, Jingru; HAO, Yanbin; ZHU, Bin; CHENG, Lechao; ZHOU, Pengyuan; and WANG, Xiang. Efficient unsupervised video hashing with contextual modeling and structural controlling. (2024). IEEE Transactions on Multimedia. 1-13.
Available at: https://ink.library.smu.edu.sg/sis_research/8723

Copyright Owner and License

Authors

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Additional URL

https://doi.org/10.1109/TMM.2024.3368924

Included in

Graphics and Human Computer Interfaces Commons, Numerical Analysis and Scientific Computing Commons

Research Collection School Of Computing and Information Systems

Efficient unsupervised video hashing with contextual modeling and structural controlling
