Publication Type

Journal Article

Version

acceptedVersion

Publication Date

1-2024

Abstract

Text-to-3D generation has attracted much attention from the computer vision community. Existing methods mainly optimize a neural field from scratch for each text prompt, relying on heavy and repetitive training cost which impedes their practical deployment. In this paper, we propose a novel framework for fast text-to-3D generation, dubbed Instant3D. Once trained, Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network. We achieve this remarkable speed by devising a new network that directly constructs a 3D triplane from a text prompt. The core innovation of our Instant3D lies in our exploration of strategies to effectively inject text conditions into the network. In particular, we propose to combine three key mechanisms: cross-attention, style injection, and token-to-plane transformation, which collectively ensure precise alignment of the output with the input text. Furthermore, we propose a simple yet effective activation function, the scaled-sigmoid, to replace the original sigmoid function, which speeds up the training convergence by more than ten times. Finally, to address the Janus (multi-head) problem in 3D generation, we propose an adaptive Perp-Neg algorithm that can dynamically adjust its concept negation scales according to the severity of the Janus problem during training, effectively reducing the multi-head effect. Extensive experiments on a wide variety of benchmark datasets demonstrate that the proposed algorithm performs favorably against the state-of-the-art methods both qualitatively and quantitatively, while achieving significantly better efficiency. The code, data, and models are available at https://ming1993li.github.io/Instant3DProj/.

Keywords

Large-scale generative models, Neural radiance fields, Text-to-3D generation

Discipline

Graphics and Human Computer Interfaces | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

International Journal of Computer Vision

First Page

Last Page

ISSN

0920-5691

Identifier

10.1007/s11263-024-02097-5

Publisher

Springer

Citation

LI, Ming; ZHOU, Pan; LIU, Jia-Wei; KEPPO, Jussi; LIN, Min; YAN, Shuicheng; and XU, Xiangyu. Instant3D: Instant Text-to-3D Generation. (2024). International Journal of Computer Vision. 1-23.
Available at: https://ink.library.smu.edu.sg/sis_research/8816

Copyright Owner and License

Authors CC-BY

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1007/s11263-024-02097-5

Download

Included in

Graphics and Human Computer Interfaces Commons, Software Engineering Commons

COinS

Research Collection School Of Computing and Information Systems

Instant3D: Instant Text-to-3D Generation

Publication Type

Version

Publication Date

Abstract

Keywords

Discipline

Research Areas

Publication

First Page

Last Page

ISSN

Identifier

Publisher

Citation

Copyright Owner and License

Creative Commons License

Additional URL

Included in

Search

Links

Browse

Links

Research Collection School Of Computing and Information Systems

Instant3D: Instant Text-to-3D Generation

Author

Publication Type

Version

Publication Date

Abstract

Keywords

Discipline

Research Areas

Publication

First Page

Last Page

ISSN

Identifier

Publisher

Citation

Copyright Owner and License

Creative Commons License

Additional URL

Included in

Share

Search

Links

Browse

Links