Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

10-2023

Abstract

We tackle the data scarcity challenge in few-shot point cloud recognition of 3D objects by using a joint prediction from a conventional 3D model and a well-pretrained 2D model. Surprisingly, such an ensemble, though seemingly trivial, has hardly been shown to be effective in recent 2D-3D models. We find that the crux is the less effective training on the “joint hard samples”, for which the 2D and 3D models make high-confidence predictions on different wrong labels, implying that the two models do not collaborate well. To this end, our proposed invariant training strategy, called INVJOINT, not only places more training emphasis on the hard samples, but also seeks invariance between the conflicting, ambiguous 2D and 3D predictions. INVJOINT can learn more collaborative 2D and 3D representations for a better ensemble. Extensive experiments on 3D shape classification with the widely adopted ModelNet10/40, ScanObjectNN and Toys4K, and on shape retrieval with ShapeNet-Core, validate the superiority of INVJOINT.
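
As a rough illustration of the two ideas named in the abstract, the sketch below averages the class probabilities of a 2D and a 3D model and flags "joint hard samples" as those where both models are confident, both are wrong, and they disagree. This is not the INVJOINT training procedure itself: the function names, the probability averaging, and the confidence threshold of 0.6 are assumptions made only for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last (class) axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def joint_predict(logits_2d, logits_3d):
    # Assumed ensemble rule: average the two models' class probabilities.
    return (softmax(logits_2d) + softmax(logits_3d)) / 2.0

def joint_hard_mask(logits_2d, logits_3d, labels, threshold=0.6):
    # Hypothetical criterion for a "joint hard sample":
    # both models are confident, both are wrong, and they pick different labels.
    p2, p3 = softmax(logits_2d), softmax(logits_3d)
    pred2, pred3 = p2.argmax(-1), p3.argmax(-1)
    confident = (p2.max(-1) >= threshold) & (p3.max(-1) >= threshold)
    wrong = (pred2 != labels) & (pred3 != labels)
    return confident & wrong & (pred2 != pred3)

# Toy logits for two samples over three classes (made-up values).
logits_2d = np.array([[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]])
logits_3d = np.array([[4.0, 0.0, 0.0], [0.0, 0.0, 5.0]])
labels = np.array([0, 0])

probs = joint_predict(logits_2d, logits_3d)
hard = joint_hard_mask(logits_2d, logits_3d, labels)
```

In this toy input the first sample is classified correctly by both models, while the second is confidently assigned to two different wrong classes, so only the second is flagged. A training scheme along the lines described in the abstract would then give such flagged samples more weight.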

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, October 2-6

First Page

14463

Last Page

14474

Publisher

IEEE

City or Country

Piscataway, NJ

Copyright Owner and License

Authors
