Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

11-2018

Abstract

Recent video captioning methods have made great progress through deep learning approaches based on convolutional neural networks (CNN) and recurrent neural networks (RNN). While some techniques use memory networks for sentence decoding, few works have leveraged the memory component to learn and generalize the temporal structure in video. In this paper, we propose a new method, namely Generalized Video Memory (GVM), which utilizes a memory model to enhance video description generation. Based on a class of self-organizing neural networks, GVM’s model is able to learn new video features incrementally. The learned generalized memory is further exploited to decode the associated sentences using an RNN. We evaluate our method on the YouTube2Text data set, using BLEU and METEOR scores as the standard benchmark. Our results are competitive with other state-of-the-art methods.

Keywords

Memory model, Video captioning, Deep learning, Adaptive Resonance Theory, LSTM, CNN

Discipline

Databases and Information Systems | Software Engineering

Research Areas

Data Science and Engineering

Publication

Multi-disciplinary International Conference on Artificial Intelligence: 12th International Conference: MIWAI 2018, Hanoi, Vietnam, November 18-20: Proceedings

Volume

11248 LNAI

First Page

187

Last Page

201

ISBN

9783030030131

Identifier

10.1007/978-3-030-03014-8_16

Publisher

Springer

City or Country

Cham

Additional URL

https://doi.org/10.1007/978-3-030-03014-8_16
