"A computational aesthetic design science study on online video based o" by Zhangguang KANG, Fiona Fui-hoon NAH et al.
 

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

7-2024

Abstract

Computational video aesthetic prediction refers to using models that automatically evaluate the features of videos to produce aesthetic scores. Current video aesthetic prediction models are designed around bimodal frameworks. To address their limitations, we developed the Triple-Dimensional Multimodal Temporal Video Aesthetic neural network (TMTVA-net). A Long Short-Term Memory (LSTM) network forms the conceptual foundation of the design framework. In the multimodal transformer layer, we employed two distinct transformers, a multimodal transformer and a feature transformer, enabling the model to learn modality-specific patterns and representational features adapted to each modality. The fusion layer has also been redesigned to compute both pairwise and overall interactions among the features. This study contributes to the video aesthetic prediction literature by presenting a novel design framework that captures the synergistic effects of textual, audio, and video features.
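The abstract describes the TMTVA-net design only at a high level. The sketch below illustrates one plausible reading of that description in PyTorch: per-modality LSTM encoders, a per-modality "feature transformer," a "multimodal transformer" over the three modality representations, and a fusion layer that combines pairwise and overall interactions. All layer sizes, module names, pooling choices, and the exact fusion wiring are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of the TMTVA-net design as described in the abstract.
# Architecture details (dimensions, wiring, fusion operators) are assumed.
import torch
import torch.nn as nn

MODALITIES = ("text", "audio", "video")

class TMTVANetSketch(nn.Module):
    """Triple-dimensional (text/audio/video) aesthetic score predictor."""

    def __init__(self, dim=128, heads=4):
        super().__init__()
        # Per-modality temporal encoders (the abstract names LSTM as the
        # conceptual foundation of the design framework).
        self.lstm = nn.ModuleDict(
            {m: nn.LSTM(dim, dim, batch_first=True) for m in MODALITIES}
        )
        # "Feature transformer": refines representations within each modality.
        self.feature_tf = nn.ModuleDict(
            {m: nn.TransformerEncoder(
                nn.TransformerEncoderLayer(dim, heads, batch_first=True),
                num_layers=1,
            ) for m in MODALITIES}
        )
        # "Multimodal transformer": attends across the three modalities.
        self.multimodal_tf = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, batch_first=True),
            num_layers=1,
        )
        # Fusion: three pairwise interactions plus the overall three-way
        # interaction, concatenated before the scoring head.
        self.score = nn.Sequential(
            nn.Linear(dim * 4, dim), nn.ReLU(), nn.Linear(dim, 1)
        )

    def forward(self, text, audio, video):
        # Each input: (batch, seq_len, dim). Encode temporally, refine,
        # then mean-pool over time to one vector per modality.
        feats = {}
        for name, x in zip(MODALITIES, (text, audio, video)):
            h, _ = self.lstm[name](x)
            feats[name] = self.feature_tf[name](h).mean(dim=1)  # (batch, dim)
        # Cross-modal attention over the three pooled modality tokens.
        stacked = torch.stack([feats[m] for m in MODALITIES], dim=1)
        t, a, v = self.multimodal_tf(stacked).unbind(dim=1)
        # Pairwise (element-wise products) and overall interactions.
        fused = torch.cat([t * a, t * v, a * v, t * a * v], dim=-1)
        return self.score(fused).squeeze(-1)  # one aesthetic score per clip

# Usage: random stand-in features for a batch of 2 clips, 16 timesteps each.
model = TMTVANetSketch()
scores = model(torch.randn(2, 16, 128),
               torch.randn(2, 16, 128),
               torch.randn(2, 16, 128))
print(scores.shape)  # torch.Size([2])
```

In this reading, element-wise products stand in for the "pairwise interactions and overall interactions" computed by the fusion layer; the paper may use a different interaction operator (e.g., bilinear pooling or concatenation).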

Keywords

Computational Video Aesthetics, Multimodal Analysis, Neural Network, Design Science

Discipline

Databases and Information Systems | Graphics and Human Computer Interfaces

Research Areas

Data Science and Engineering

Publication

HCI International 2024: Late Breaking Papers, Washington, DC, USA, June 29 - July 4, 2024

Volume

15380

First Page

68

Last Page

79

ISBN

978-3-031-76821-7

Identifier

10.1007/978-3-031-76821-7_6

Publisher

Springer

City or Country

Cham

Additional URL

https://doi.org/10.1007/978-3-031-76821-7_6
