Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
12-2011
Abstract
Automatically generating compact textual descriptions of complex video contents has wide applications. Building on recent advances in automatic audio-visual content recognition, this paper explores the technical feasibility of precisely recounting video contents, a challenging problem. Using cutting-edge automatic recognition techniques, we start by classifying a variety of visual and audio concepts in the videos. Based on the classification results, we apply simple rule-based methods to generate textual descriptions of the video contents. The results are evaluated through carefully designed user studies. We find that state-of-the-art visual and audio concept classification, although far from perfect, provides very useful clues about what is happening in the videos. Most users involved in the evaluation confirmed the informativeness of our machine-generated descriptions.
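The two-stage idea in the abstract (concept classification followed by rule-based text generation) can be illustrated with a minimal sketch. This is not the authors' method: the concept names, the modality table, the 0.5 threshold, and the sentence templates are all hypothetical, and the classifier scores are assumed to be given as a plain dictionary.

```python
from typing import Dict, List

# Hypothetical concept-to-modality table; the paper's actual concept
# vocabulary is not listed in this record.
CONCEPT_MODALITY: Dict[str, str] = {
    "crowd": "visual",
    "outdoor": "visual",
    "cheering": "audio",
    "music": "audio",
}


def select_concepts(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return concepts scoring at or above the threshold, highest score first."""
    kept = [concept for concept, score in scores.items() if score >= threshold]
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(kept, key=lambda concept: (-scores[concept], concept))


def describe(scores: Dict[str, float], threshold: float = 0.5) -> str:
    """Fill fixed templates (an assumed rule set) with the selected concepts."""
    selected = select_concepts(scores, threshold)
    seen = [c for c in selected if CONCEPT_MODALITY.get(c) == "visual"]
    heard = [c for c in selected if CONCEPT_MODALITY.get(c) == "audio"]

    sentences = []
    if seen:
        sentences.append("Visual concepts detected: " + ", ".join(seen) + ".")
    if heard:
        sentences.append("Audio concepts detected: " + ", ".join(heard) + ".")
    if not sentences:
        return "No concept was detected with sufficient confidence."
    return " ".join(sentences)


if __name__ == "__main__":
    # Made-up classifier scores for one video, for illustration only.
    example_scores = {"crowd": 0.9, "outdoor": 0.7, "cheering": 0.8, "music": 0.2}
    print(describe(example_scores))
```

In this sketch, low-confidence concepts are simply dropped before the templates are filled, so an imperfect classifier only affects the description when it is confidently wrong.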
Keywords
Audio-visual concept classification, Textual descriptions of video content
Discipline
Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Publication
Proceedings of the 19th ACM International Conference on Multimedia: ACM Multimedia 2011, MM'11, Scottsdale, Arizona, November 28 - December 1
First Page
655
Last Page
658
ISBN
9781450306164
Identifier
10.1145/2072298.2072411
Publisher
ACM
City or Country
Scottsdale, Arizona
Citation
TAN, Chun Chet; JIANG, Yu-Gang; and NGO, Chong-wah.
Towards textually describing complex video contents with audio-visual concept classifiers. (2011). Proceedings of the 19th ACM International Conference on Multimedia: ACM Multimedia 2011, MM'11, Scottsdale, Arizona, November 28 - December 1. 655-658.
Available at: https://ink.library.smu.edu.sg/sis_research/6489
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Included in
Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons