Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
7-2003
Abstract
The concept of a documentary scene was inferred from the audio-visual characteristics of certain documentary videos. It was observed that the visual component alone did not carry enough information to convey a semantic context for most portions of these videos, whereas joint observation of the visual and audio components conveyed a better semantic context. From our observations of the video data, we generated an audio score and a visual score. We then generated a weighted audio-visual score within an interval and adaptively expanded or shrank this interval until we found a local maximum of the score. The video is ultimately divided into a set of intervals that correspond to its documentary scenes. After obtaining the set of documentary scenes, we checked for redundant detections.
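The interval search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the audio and visual scores are computed or how they are weighted, so the sketch assumes per-shot scores are already available as plain lists, takes the interval score to be a weighted mean of the two, and uses simple one-step hill climbing on the interval end. The names `fused_score`, `grow_interval`, `segment`, `w_audio`, `init_len`, and `min_len` are all hypothetical, and the redundancy check is left out.

```python
from typing import List, Sequence, Tuple


def fused_score(audio: Sequence[float], visual: Sequence[float],
                start: int, end: int, w_audio: float = 0.5) -> float:
    """Weighted audio-visual score of the interval [start, end).

    Assumption: the interval score is a weighted mean of per-shot scores.
    """
    n = end - start
    audio_mean = sum(audio[start:end]) / n
    visual_mean = sum(visual[start:end]) / n
    return w_audio * audio_mean + (1.0 - w_audio) * visual_mean


def grow_interval(audio: Sequence[float], visual: Sequence[float],
                  start: int, init_len: int = 2, min_len: int = 1,
                  w_audio: float = 0.5) -> Tuple[int, float]:
    """Expand or shrink the interval end by one shot at a time until
    neither move strictly improves the fused score (a local maximum)."""
    total = len(audio)
    end = min(start + init_len, total)
    best = fused_score(audio, visual, start, end, w_audio)
    while True:
        candidates = []
        if end + 1 <= total:               # expanding stays inside the video
            candidates.append(end + 1)
        if end - 1 >= start + min_len:     # shrinking keeps at least min_len shots
            candidates.append(end - 1)
        if not candidates:
            break
        top_score, top_end = max(
            (fused_score(audio, visual, start, c, w_audio), c)
            for c in candidates
        )
        if top_score <= best:              # no strict improvement: stop here
            break
        best, end = top_score, top_end
    return end, best


def segment(audio: Sequence[float], visual: Sequence[float],
            init_len: int = 2, min_len: int = 1,
            w_audio: float = 0.5) -> List[Tuple[int, int]]:
    """Split the whole sequence into consecutive intervals (candidate scenes)."""
    scenes = []
    start = 0
    while start < len(audio):
        end, _ = grow_interval(audio, visual, start, init_len, min_len, w_audio)
        scenes.append((start, end))
        start = end
    return scenes
```

Calling `segment(audio_scores, visual_scores)` returns half-open `(start, end)` index pairs that cover the input without gaps. Because each accepted move must strictly raise the score, the search always terminates, though it can stop at a poor local maximum; a real system would likely need a more informative interval score than a plain mean.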
Keywords
Gaussian Mixture Model, Semantic Context, Visual Score, Scene Change, Shot Boundary
Discipline
Computer Sciences | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Publication
Image and Video Retrieval: 2nd International Conference, CIVR 2003, Urbana-Champaign, IL, July 24-25: Proceedings
Volume
2728
First Page
227
Last Page
237
ISBN
9783540451136
Identifier
10.1007/3-540-45113-7_23
Publisher
Springer
City or Country
Cham
Citation
VELIVELLI, Atulya; NGO, Chong-Wah; and HUANG, Thomas S.
Detection of documentary scene changes by audio-visual fusion. (2003). Image and Video Retrieval: 2nd International Conference, CIVR 2003, Urbana-Champaign, IL, July 24-25: Proceedings. 2728, 227-237.
Available at: https://ink.library.smu.edu.sg/sis_research/6532
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1007/3-540-45113-7_23