Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2024

Abstract

In this paper, we delve into a novel aspect of learning novel diffusion conditions with datasets an order of magnitude smaller. The rationale behind our approach is the elimination of textual constraints during the few-shot learning process. To that end, we implement two optimization strategies. The first, prompt-free conditional learning, utilizes a prompt-free encoder derived from a pre-trained Stable Diffusion model. This strategy is designed to adapt new conditions to the diffusion process by minimizing the textual-visual correlation, thereby ensuring a more precise alignment between the generated content and the specified conditions. The second strategy entails condition-specific negative rectification, which addresses the inconsistencies typically brought about by classifier-free guidance in few-shot training contexts. Our extensive experiments across a variety of condition modalities demonstrate the effectiveness and efficiency of our framework, yielding results comparable to those obtained with datasets a thousand times larger.

Keywords

Prompt-free conditional learning, Conditional negative rectification, Training, Computer vision, Adaptation models, Codes, Text to image, Diffusion processes, Diffusion model, Image synthesis, Controllable image generation

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Research Areas

Data Science and Engineering; Intelligent Systems and Optimization

Publication

Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) : Seattle, WA, USA, June 16-22

First Page

7109

Last Page

7118

Identifier

10.1109/CVPR52733.2024.00679

Publisher

IEEE

City or Country

Seattle, USA

Additional URL

https://doi.org/10.1109/CVPR52733.2024.00679
