Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

9-2025

Abstract

Large language models (LLMs) enable diverse forms of AI-assisted creation, yet they often struggle to bridge the preference-articulation gap: users may express incomplete or vague intentions, or lack the vocabulary to specify what they want, yielding outputs misaligned with their true preferences. To address this gap and facilitate music creation in a vibe-centric environment, we introduce VibeMus, a proactive agentic system built on open-source components. The system engages in multi-turn dialogue to progressively determine the music’s emotion, genre, lyrics, and other aspects before generation. Simulated evaluations show that proactive clarification improves alignment with users’ intended nuances. Our approach is training-free, leveraging an open-source music model, an open-source agentic framework, and publicly available LLM APIs. We release our code, showcase several demos, and provide additional details at https://github.com/tuteng0915/VibeMus.

Keywords

Music Generation, Lyric Generation, Proactive Agentic System, Sound and Music Computing

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

MMAsia ’25: Proceedings of the 7th ACM International Conference on Multimedia in Asia, Kuala Lumpur, Malaysia, December 9–12, 2025

First Page

1

Last Page

3

Identifier

10.1145/3743093.3771663

Publisher

ACM

City or Country

New York

Additional URL

https://doi.org/10.1145/3743093.3771663
