Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
11-2018
Abstract
Recent video captioning methods have made great progress through deep learning approaches based on convolutional neural networks (CNN) and recurrent neural networks (RNN). While some techniques use memory networks for sentence decoding, few works have leveraged the memory component to learn and generalize the temporal structure in video. In this paper, we propose a new method, namely Generalized Video Memory (GVM), which utilizes a memory model to enhance video description generation. Based on a class of self-organizing neural networks, the GVM model is able to learn new video features incrementally. The learned generalized memory is further exploited to decode the associated sentences using an RNN. We evaluate our method on the YouTube2Text data set using BLEU and METEOR scores as standard benchmarks. Our results are shown to be competitive with other state-of-the-art methods.
Keywords
Memory model, Video captioning, Deep learning, Adaptive Resonance Theory, LSTM, CNN
Discipline
Databases and Information Systems | Software Engineering
Research Areas
Data Science and Engineering
Publication
Multi-disciplinary International Conference on Artificial Intelligence: 12th International Conference: MIWAI 2018, Hanoi, Vietnam, November 18-20: Proceedings
Volume
11248 LNAI
First Page
187
Last Page
201
ISBN
9783030030131
Identifier
10.1007/978-3-030-03014-8_16
Publisher
Springer
City or Country
Cham
Citation
CHANG, Poo-Hee and TAN, Ah-hwee.
Learning generalized video memory for automatic video captioning. (2018). Multi-disciplinary International Conference on Artificial Intelligence: 12th International Conference: MIWAI 2018, Hanoi, Vietnam, November 18-20: Proceedings. 11248 LNAI, 187-201.
Available at: https://ink.library.smu.edu.sg/sis_research/6076
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1007/978-3-030-03014-8_16