Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
7-2024
Abstract
Computational video aesthetic prediction refers to using models that automatically evaluate the features of videos to produce aesthetic scores. Current video aesthetic prediction models are built on bimodal frameworks. To address their limitations, we developed the Triple-Dimensional Multimodal Temporal Video Aesthetic neural network (TMTVA-net) model. The Long Short-Term Memory (LSTM) architecture forms the conceptual foundation of the design framework. In the multimodal transformer layer, we employed two distinct transformers, a multimodal transformer and a feature transformer, to learn modality-specific patterns and representational features adapted to each modality. The fusion layer has also been redesigned to compute both pairwise interactions and overall interactions among the features. This study contributes to the video aesthetic prediction literature by presenting a novel design framework that considers the synergistic effects of textual, audio, and video features.
Keywords
Computational Video Aesthetic, Multimodal Analysis, Neural Network, Design Science
Discipline
Databases and Information Systems | Graphics and Human Computer Interfaces
Research Areas
Data Science and Engineering
Publication
HCI International 2024: Late breaking papers: Washington, DC, June 29 - July 4
Volume
15380
First Page
68
Last Page
79
ISBN
9783031768217
Identifier
10.1007/978-3-031-76821-7_6
Publisher
Springer
City or Country
Cham
Citation
KANG, Zhangguang; NAH, Fiona Fui-hoon; and SIAU, Keng.
A computational aesthetic design science study on online video based on triple-dimensional multimodal analysis. (2024). HCI International 2024: Late breaking papers: Washington, DC, June 29 - July 4. 15380, 68-79.
Available at: https://ink.library.smu.edu.sg/sis_research/9962
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1007/978-3-031-76821-7_6
Included in
Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons