Research Collection School Of Computing and Information Systems

Towards textually describing complex video contents with audio-visual concept classifiers

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

12-2011

Abstract

Automatically generating compact textual descriptions of complex video content has wide applications. Building on recent advances in automatic audio-visual content recognition, this paper explores the technical feasibility of the challenging problem of precisely recounting video content. Using state-of-the-art automatic recognition techniques, we begin by classifying a variety of visual and audio concepts in videos. Based on the classification results, we then apply simple rule-based methods to generate textual descriptions of the video content. The results are evaluated through carefully designed user studies. We find that state-of-the-art visual and audio concept classification, although far from perfect, provides very useful clues about what is happening in the videos. Most users involved in the evaluation confirmed that the machine-generated descriptions were informative.
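The abstract says that textual descriptions are produced by applying simple rule-based methods to the outputs of visual and audio concept classifiers, but it does not state the rules. The following is only a minimal Python sketch of that general idea. It assumes hypothetical per-concept classifier scores in [0, 1], a hypothetical detection threshold of 0.5, and made-up sentence templates. None of the names, concepts, thresholds or templates come from the paper, and this is not the authors' method.

```python
from dataclasses import dataclass

# ASSUMPTION: concept names, modalities, the threshold and the sentence
# templates below are hypothetical illustrations, not the paper's rules.


@dataclass
class ConceptScore:
    name: str        # human-readable concept phrase, e.g. "singing"
    modality: str    # "visual" or "audio"
    score: float     # classifier confidence in [0, 1]


def detected(scores, modality, threshold):
    """Return names of one modality whose score passes the threshold, highest score first."""
    hits = [s for s in scores if s.modality == modality and s.score >= threshold]
    hits.sort(key=lambda s: s.score, reverse=True)
    return [s.name for s in hits]


def join_names(names):
    """Join phrases as 'a', 'a and b', or 'a, b and c'."""
    if len(names) <= 1:
        return "".join(names)
    return ", ".join(names[:-1]) + " and " + names[-1]


def describe(scores, threshold=0.5):
    """Fill simple sentence templates from the detected visual and audio concepts."""
    visual = detected(scores, "visual", threshold)
    audio = detected(scores, "audio", threshold)
    sentences = []
    if visual:
        sentences.append(f"The video shows {join_names(visual)}.")
    if audio:
        sentences.append(f"One can hear {join_names(audio)}.")
    if not sentences:
        sentences.append("No concept was detected with sufficient confidence.")
    return " ".join(sentences)


if __name__ == "__main__":
    example = [
        ConceptScore("a birthday cake", "visual", 0.82),
        ConceptScore("people indoors", "visual", 0.64),
        ConceptScore("a dog", "visual", 0.21),   # below threshold, dropped
        ConceptScore("singing", "audio", 0.77),
    ]
    print(describe(example))
```

With these made-up scores, the low-confidence "a dog" concept is filtered out and the remaining concepts fill one visual and one audio template. Thresholding followed by template filling is just one simple way to turn imperfect classifier outputs into readable text; a real system would need its own concept vocabulary and rules.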

Keywords

Audio-visual concept classification, Textual descriptions of video content

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the 19th ACM International Conference on Multimedia (ACM Multimedia 2011, MM'11), Scottsdale, Arizona, November 28 to December 1, 2011

First Page

655

Last Page

658

ISBN

9781450306164

Identifier

10.1145/2072298.2072411

Publisher

ACM

City or Country

Scottsdale, Arizona

Citation

TAN, Chun Chet; JIANG, Yu-Gang; and NGO, Chong-wah. Towards textually describing complex video contents with audio-visual concept classifiers. (2011). Proceedings of the 19th ACM International Conference on Multimedia (ACM Multimedia 2011, MM'11), Scottsdale, Arizona, November 28 to December 1. 655-658.
Available at: https://ink.library.smu.edu.sg/sis_research/6489

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Included in

Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons
