Publication Type

Journal Article

Version

publishedVersion

Publication Date

10-2025

Abstract

The power of visual language models is showcased in visual understanding tasks, where language-guided models achieve impressive flexibility and precision. In this paper, we extend this capability to the challenging domain of image matting by framing it as a soft grounding problem, enabling a single diffusion model to handle diverse objects, textures, and transparencies, all directed by descriptive text prompts. Our method teaches the diffusion model to ground alpha mattes by guiding it through a process of instance-level localization and transparency estimation. First, we introduce an intermediate objective that trains the model to accurately localize semantic components of the matte based on natural language cues, establishing a robust spatial foundation. Building on this, the model progressively refines its transparency estimation abilities, using the learned semantic structure as a prior to enhance the precision of alpha matte predictions. By treating spatial localization and transparency estimation as distinct learning objectives, our approach allows the model to fully leverage the semantic depth of diffusion models, removing the need for rigid visual priors. Extensive experiments highlight our model's adaptability, precision, and computational efficiency, setting a new benchmark for flexible, text-driven image matting solutions. The code is available at https://github.com/xty435768/TeachDiffusionMatting.
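
As a rough illustration of the two-objective idea described in the abstract, the following minimal PyTorch-style sketch alternates a single prediction head between a localization loss and a transparency loss. The function name, batch keys, and loss choices here are illustrative assumptions, not the authors' actual implementation; see the paper and repository for the real training scheme.

import torch
import torch.nn.functional as F

def train_step(model, batch, stage):
    # Hypothetical step: `model` is assumed to map an (image, text prompt)
    # pair to a dense per-pixel logit map of the same spatial size.
    pred = model(batch["image"], batch["prompt"])
    if stage == "localization":
        # Stage 1 (intermediate objective): learn where the described
        # instance is, against a binary mask target.
        loss = F.binary_cross_entropy_with_logits(pred, batch["mask"])
    else:
        # Stage 2: refine a continuous alpha matte in [0, 1], using the
        # spatial structure learned in stage 1 as a starting point.
        loss = F.l1_loss(torch.sigmoid(pred), batch["alpha"])
    loss.backward()
    return loss.item()

In the staged scheme the abstract describes, the localization objective would be trained first, so the learned semantic structure can act as a prior when the model moves on to transparency estimation.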

Discipline

Graphics and Human Computer Interfaces | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

Transactions on Machine Learning Research

First Page

1

Last Page

29

Publisher

JMLR

Additional URL

https://openreview.net/pdf?id=2gNy9Yeg8J
