Research Collection School Of Computing and Information Systems

Instance-level video depth in groups beyond occlusions

Yuan LIANG
Yang ZHOU, Singapore Management University
Ziming SUN
Tianyi XIANG
Guiqing LI
Shengfeng HE, Singapore Management University

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

12-2025

Abstract

Depth estimation in dynamic, multi-object scenes remains a major challenge, especially under severe occlusions. Existing monocular models, including foundation models, struggle with instance-wise depth consistency due to their reliance on global regression. We tackle this problem from two key aspects: data and methodology. First, we introduce the Group Instance Depth (GID) dataset, the first large-scale video depth dataset with instance-level annotations, featuring 101,500 frames from real-world activity scenes. GID bridges the gap between synthetic and real-world depth data by providing high-fidelity depth supervision for multi-object interactions. Second, we propose InstanceDepth, the first occlusion-aware depth estimation framework for multi-object environments. Our two-stage pipeline consists of (1) Holistic Depth Initialization, which assigns a coarse scene-level depth structure, and (2) Instance-Aware Depth Rectification, which refines instance-wise depth using object masks, shape priors, and spatial relationships. By enforcing geometric consistency across occlusions, our method sets a new state-of-the-art on the GID dataset and multiple benchmarks. Our code and dataset can be found at https://github.com/ViktorLiang/GID.
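
The two-stage idea in the abstract (a coarse scene-level depth map, then a per-instance refinement driven by object masks) can be illustrated with a toy sketch. This is not the authors' InstanceDepth implementation, which is available from the repository linked above. The function names, the `blend` parameter, and the median-based refinement rule are all hypothetical stand-ins chosen for illustration. The sketch assumes the coarse depth map already exists as a 2D list.

```python
from statistics import median


def instance_median_depths(depth, masks):
    """Median coarse depth per instance (hypothetical helper).

    depth: 2D list of floats (stand-in for a coarse, scene-level depth map).
    masks: dict mapping an instance name to a 2D list of bools, same shape.
    """
    result = {}
    for name, mask in masks.items():
        values = [
            d
            for depth_row, mask_row in zip(depth, mask)
            for d, inside in zip(depth_row, mask_row)
            if inside
        ]
        if values:  # skip instances whose mask is empty
            result[name] = median(values)
    return result


def rectify(depth, masks, blend=0.5):
    """Pull each masked pixel toward its instance's median depth.

    An assumed, simplified stand-in for mask-based refinement:
    blend=0 keeps the coarse depth, blend=1 flattens each instance.
    """
    medians = instance_median_depths(depth, masks)
    out = [row[:] for row in depth]  # copy so the input is not modified
    for name, mask in masks.items():
        if name not in medians:
            continue
        for y, mask_row in enumerate(mask):
            for x, inside in enumerate(mask_row):
                if inside:
                    out[y][x] = (1 - blend) * depth[y][x] + blend * medians[name]
    return out
```

For example, given a coarse map in which one pixel of an instance is an outlier, `rectify` moves that pixel toward the depth shared by the rest of the instance. This mimics, in a very reduced form, the goal of keeping depth consistent within each object.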

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the 2025 IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, Hawaii, October 19-23

First Page

7581

Last Page

7591

City or Country

USA

Citation

LIANG, Yuan; ZHOU, Yang; SUN, Ziming; XIANG, Tianyi; LI, Guiqing; and HE, Shengfeng. Instance-level video depth in groups beyond occlusions. (2025). Proceedings of the 2025 IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, Hawaii, October 19-23. 7581-7591.
Available at: https://ink.library.smu.edu.sg/sis_research/10798

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons
