Publication Type

Journal Article

Version

acceptedVersion

Publication Date

3-2022

Abstract

We focus on music generation conditioned on human emotions, specifically positive and negative emotions. There is no existing large-scale music dataset annotated with human emotion labels, so it is not straightforward to generate music conditioned on emotion labels. In this paper, we propose an annotation-free method to build a new dataset in which each sample is a triplet of lyric, melody and emotion label (without requiring any manual labelling). Specifically, we first train an automated emotion recognition model, using BERT (pre-trained on the GoEmotions dataset), on the Edmonds Dance dataset. We use this model to automatically 'label' the music with the emotions recognized from the lyrics. We then train an encoder-decoder based model on this dataset to generate emotional music, and call our overall method the Emotional Lyric and Melody Generator (ELMG). The ELMG framework consists of three modules: 1) an encoder-decoder model trained end-to-end to generate lyric and melody; 2) a music emotion classifier trained on the labeled data (our proposed dataset); and 3) a modified beam search algorithm that guides the music generation process by incorporating the music emotion classifier. We conduct objective and subjective evaluations on the generated music pieces, and the results show that ELMG is capable of generating tuneful lyrics and melodies with the specified human emotions.
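The third module described above, a beam search modified to incorporate an emotion classifier, can be illustrated with a minimal sketch. This is not the authors' implementation: the callables `lm_log_probs` (next-token log-probabilities from the lyric/melody decoder) and `emotion_score` (the classifier's log-confidence that a partial sequence matches the target emotion), as well as the blending weight `alpha`, are illustrative assumptions.

```python
from typing import Callable, Dict, List, Tuple


def guided_beam_search(
    lm_log_probs: Callable[[List[str]], Dict[str, float]],
    emotion_score: Callable[[List[str], str], float],
    target_emotion: str,
    beam_width: int = 5,
    max_len: int = 32,
    alpha: float = 0.5,
    eos: str = "<eos>",
) -> List[str]:
    """Beam search re-scored by an emotion classifier (hypothetical sketch)."""
    # Each beam entry is (accumulated score, token sequence).
    beams: List[Tuple[float, List[str]]] = [(0.0, ["<bos>"])]
    for _ in range(max_len):
        candidates: List[Tuple[float, List[str]]] = []
        for score, seq in beams:
            if seq[-1] == eos:
                # Keep finished hypotheses unchanged.
                candidates.append((score, seq))
                continue
            for tok, lp in lm_log_probs(seq).items():
                new_seq = seq + [tok]
                # Blend decoder likelihood with the classifier's confidence
                # that the extended sequence carries the target emotion;
                # this re-scoring is what "modifies" the standard search.
                guided = score + lp + alpha * emotion_score(new_seq, target_emotion)
                candidates.append((guided, new_seq))
        # Keep only the top-k candidates for the next step.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == eos for _, seq in beams):
            break
    return beams[0][1]
```

The sketch assumes a token-level decoder interface; the actual ELMG decoder operates jointly on lyric and melody tokens, but the guiding principle, adding a weighted classifier score to each candidate's log-likelihood before pruning, is the same.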

Keywords

Conditional Music Generation, Seq2Seq, Beam Search, Transformer

Discipline

Databases and Information Systems | Music

Research Areas

Data Science and Engineering

Publication

IEEE Transactions on Multimedia

First Page

1

Last Page

14

ISSN

1520-9210

Identifier

10.1109/TMM.2022.3163543

Publisher

Institute of Electrical and Electronics Engineers

Additional URL

http://doi.org/10.1109/TMM.2022.3163543
