Research Collection School Of Computing and Information Systems

Integrating symbolic and waveform music into Large Language Models

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

1-2026

Abstract

Music, as a unique and integral element of human life, is characterized by its complex structures, intricate details, and the fusion of multimodal information. Recent studies advance music understanding by leveraging the knowledge and reasoning capabilities of Large Language Models (LLMs). However, they often lack compatibility and fail to fully utilize the complementary strengths of diverse representations (e.g., ABC, MIDI, waveform). To address these limitations, we propose a unified music-language model framework, named UniMuLM, transitioning from single-representation approaches to the integration of multiple music representations for LLMs. Unifying different music representation formats poses challenges such as patch integrity and boundary ambiguity that arise from temporal discrepancies across these representations. To address these issues, UniMuLM employs a unified encoder that hierarchically aligns representations across multiple granularities, using contrastive learning and cross-reconstruction training to support coherent integration. Fine-tuned in multiple stages on open-source datasets, UniMuLM demonstrates the potential to handle dual-representation inputs. Notably, it achieves performance competitive with specialized waveform-only models on music understanding tasks, while surpassing open-source baselines in downstream applications such as music knowledge answering and ABC melody completion.

Keywords

Multimodal Language Model, Music Language Model, Music Understanding, Sound and Music Computing

Discipline

Artificial Intelligence and Robotics | Databases and Information Systems | Music

Research Areas

Data Science and Engineering

Publication

Multimedia Modeling: 32nd International Conference on Multimedia Modeling, MMM 2026, Prague, Czech Republic, January 29-31, Proceedings

First Page

89

Last Page

103

ISBN

9789819569564

Identifier

10.1007/978-981-95-6957-1_7

Publisher

Springer

City or Country

Cham

Citation

TU, Teng; LIU, Xiaohao; MA, Yunshan; QI, Ji; and CHUA, Tat-Seng. Integrating symbolic and waveform music into Large Language Models. (2026). Multimedia Modeling: 32nd International Conference on Multimedia Modeling, MMM 2026, Prague, Czech Republic, January 29-31, Proceedings. 89-103.
Available at: https://ink.library.smu.edu.sg/sis_research/11026

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1007/978-981-95-6957-1_7

Included in

Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Music Commons