Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
6-2024
Abstract
In this paper, we delve into a novel aspect of learning novel diffusion conditions with datasets an order of magnitude smaller. The rationale behind our approach is the elimination of textual constraints during the few-shot learning process. To that end, we implement two optimization strategies. The first, prompt-free conditional learning, utilizes a prompt-free encoder derived from a pre-trained Stable Diffusion model. This strategy is designed to adapt new conditions to the diffusion process by minimizing the textual-visual correlation, thereby ensuring a more precise alignment between the generated content and the specified conditions. The second strategy entails condition-specific negative rectification, which addresses the inconsistencies typically brought about by Classifier-free guidance in few-shot training contexts. Our extensive experiments across a variety of condition modalities demonstrate the effectiveness and efficiency of our framework, yielding results comparable to those obtained with datasets a thousand times larger.
Keywords
Prompt-free conditional learning, Conditional negative rectification, Training, Computer vision, Adaptation models, Codes, Text to image, Diffusion processes, Diffusion model, Image synthesis, Controllable image generation
Discipline
Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Research Areas
Data Science and Engineering; Intelligent Systems and Optimization
Publication
Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) : Seattle, WA, USA, June 16-22
First Page
7109
Last Page
7118
Identifier
10.1109/CVPR52733.2024.00679
Publisher
IEEE
City or Country
Seattle, USA
Citation
YU, Yuyang; LIU, Bangzhen; ZHENG, Chenxi; XU, Xuemiao; ZHANG, Huaidong; and HE, Shengfeng.
Beyond textual constraints : Learning novel diffusion conditions with fewer examples. (2024). Proceedings of IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2024) : Seattle, WA, USA, June 16-22. 7109-7118.
Available at: https://ink.library.smu.edu.sg/sis_research/9774
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/CVPR52733.2024.00679
Included in
Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons