Research Collection School Of Computing and Information Systems

MVGamba : Unify 3D content generation as state space sequence modeling

Xuanyu YI
Zike WU
Qiuhong SHEN
Qingshan XU
Pan ZHOU, Singapore Management University
Joo-Hwee LIM
Shuicheng YAN
Xinchao WANG
Hanwang ZHANG

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

12-2024

Abstract

Recent 3D large reconstruction models (LRMs) can generate high-quality 3D content in sub-seconds by integrating multi-view diffusion models with scalable multi-view reconstructors. Current works further leverage 3D Gaussian Splatting as 3D representation for improved visual quality and rendering efficiency. However, we observe that existing Gaussian reconstruction models often suffer from multi-view inconsistency and blurred textures. We attribute this to the compromise of multi-view information propagation in favor of adopting powerful yet computationally intensive architectures (e.g., Transformers). To address this issue, we introduce MVGamba, a general and lightweight Gaussian reconstruction model featuring a multi-view Gaussian reconstructor based on the RNN-like State Space Model (SSM). Our Gaussian reconstructor propagates causal context containing multi-view information for cross-view self-refinement while generating a long sequence of Gaussians for fine-detail modeling with linear complexity. With off-the-shelf multi-view diffusion models integrated, MVGamba unifies 3D generation tasks from a single image, sparse images, or text prompts. Extensive experiments demonstrate that MVGamba outperforms state-of-the-art baselines in all 3D content generation scenarios with approximately only 0.1× of the model size.

Keywords

Large Reconstruction Models, LRMs, Gaussian reconstruction model, 3D content generation

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Research Areas

Data Science and Engineering; Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Conference on Neural Information Processing Systems, NeurIPS 2024 Datasets and Benchmarks

Identifier

doi.org/10.48550/arXiv.2406.06367

Publisher

Conference on Neural Information Processing Systems

City or Country

Vancouver

Citation

YI, Xuanyu; WU, Zike; SHEN, Qiuhong; XU, Qingshan; ZHOU, Pan; LIM, Joo-Hwee; YAN, Shuicheng; WANG, Xinchao; and ZHANG, Hanwang. MVGamba : Unify 3D content generation as state space sequence modeling. (2024). Conference on Neural Information Processing Systems, NeurIPS 2024 Datasets and Benchmarks.
Available at: https://ink.library.smu.edu.sg/sis_research/9491

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.48550/arXiv.2406.06367

Included in

Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons
