Research Collection School Of Computing and Information Systems

ObjectFusion: Multi-modal 3D object detection with object-centric fusion

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

10-2023

Abstract

Recent progress in multi-modal 3D object detection has featured BEV (Bird's-Eye-View) based fusion, which unifies LiDAR point clouds and camera images in a shared BEV space. Nevertheless, camera-to-BEV transformation is not trivial because the depth estimated for each pixel is inherently ambiguous, which results in spatial misalignment between the features of the two modalities. Moreover, such transformation inevitably distorts camera image features when they are projected into BEV space. In this paper, we propose a novel Object-centric Fusion (ObjectFusion) paradigm, which dispenses with camera-to-BEV transformation entirely during fusion and instead aligns object-centric features across modalities for 3D object detection. ObjectFusion first learns three kinds of modality-specific feature maps (i.e., voxel, BEV, and image features) from LiDAR point clouds, their BEV projections, and camera images. A set of 3D object proposals is then produced from the BEV features via a heatmap-based proposal generator. Next, the 3D object proposals are reprojected back to the voxel, BEV, and image spaces, and voxel and RoI pooling are used to generate spatially aligned object-centric features for each modality. The object-centric features of the three modalities are then fused at the object level, and the fused features are fed into the detection heads. Extensive experiments on the nuScenes dataset demonstrate the superiority of ObjectFusion, which achieves 69.8% mAP on the nuScenes validation set, improving on BEVFusion by 1.3%.
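
The reprojection and pooling steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it covers only the BEV and image branches (the voxel branch is omitted), uses axis-aligned boxes without yaw, replaces the paper's RoI pooling with plain mean pooling, and fuses by concatenation. All function names, the `lidar_to_cam` extrinsic matrix, the intrinsic matrix `K`, and the BEV grid parameters `x_min`, `y_min`, and `cell` are hypothetical, chosen only for illustration.

```python
import numpy as np

def box_corners(center, size):
    """Eight corners of an axis-aligned 3D box (yaw is omitted in this sketch)."""
    signs = np.array([[sx, sy, sz]
                      for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)],
                     dtype=float)
    return np.asarray(center, float) + 0.5 * signs * np.asarray(size, float)

def mean_pool(feat, x0, y0, x1, y1):
    """Average a (C, H, W) map over [x0, x1) x [y0, y1), clipped to the map.

    Stand-in for RoI pooling; always covers at least one cell.
    """
    _, h, w = feat.shape
    x0 = int(np.clip(np.floor(x0), 0, w - 1))
    y0 = int(np.clip(np.floor(y0), 0, h - 1))
    x1 = int(np.clip(np.ceil(x1), x0 + 1, w))
    y1 = int(np.clip(np.ceil(y1), y0 + 1, h))
    return feat[:, y0:y1, x0:x1].mean(axis=(1, 2))

def image_roi(corners_lidar, lidar_to_cam, K):
    """Project box corners into the image; return their 2D bounding rectangle.

    Assumes every corner lies in front of the camera (positive depth).
    """
    homo = np.c_[corners_lidar, np.ones(len(corners_lidar))]
    cam = (homo @ lidar_to_cam.T)[:, :3]
    uvw = cam @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    return uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()

def bev_roi(corners_lidar, x_min, y_min, cell):
    """Footprint of the box on a BEV grid whose cell (0, 0) starts at (x_min, y_min)."""
    cols = (corners_lidar[:, 0] - x_min) / cell
    rows = (corners_lidar[:, 1] - y_min) / cell
    return cols.min(), rows.min(), cols.max(), rows.max()

def fuse_object(center, size, image_feat, bev_feat,
                lidar_to_cam, K, x_min, y_min, cell):
    """Pool one proposal's features in each space and concatenate them."""
    corners = box_corners(center, size)
    f_img = mean_pool(image_feat, *image_roi(corners, lidar_to_cam, K))
    f_bev = mean_pool(bev_feat, *bev_roi(corners, x_min, y_min, cell))
    return np.concatenate([f_bev, f_img])

# Toy usage: one proposal 10 m ahead of the sensor, with random feature maps.
rng = np.random.default_rng(0)
image_feat = rng.normal(size=(4, 32, 32))   # (C_img, H, W) at image resolution
bev_feat = rng.normal(size=(3, 20, 20))     # (C_bev, H, W), 1 m cells
K = np.array([[10.0, 0.0, 16.0], [0.0, 10.0, 16.0], [0.0, 0.0, 1.0]])
# LiDAR axes (x forward, y left, z up) to camera axes (x right, y down, z forward).
lidar_to_cam = np.array([[0.0, -1.0, 0.0, 0.0],
                         [0.0, 0.0, -1.0, 0.0],
                         [1.0, 0.0, 0.0, 0.0],
                         [0.0, 0.0, 0.0, 1.0]])
fused = fuse_object((10.0, 0.0, 0.0), (2.0, 2.0, 2.0), image_feat, bev_feat,
                    lidar_to_cam, K, x_min=0.0, y_min=-10.0, cell=1.0)
print(fused.shape)
```

The point of the sketch is that the image features are never lifted into BEV space: each proposal is projected into the image, and features are pooled where they already live, so no per-pixel depth estimate is required.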

Keywords

3D object detection, Multi-modal, Fusion-based approach

Discipline

Artificial Intelligence and Robotics | Robotics

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023, October 4-6

First Page

18067

Last Page

18076

City or Country

Paris

Citation

CAI, Q.; PAN, Y.; YAO, T.; NGO, Chong-wah; and MEI, T. ObjectFusion: Multi-modal 3D object detection with object-centric fusion. (2023). Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2023, October 4-6. 18067-18076.
Available at: https://ink.library.smu.edu.sg/sis_research/8306

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

Artificial Intelligence and Robotics Commons, Robotics Commons
