Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
12-2025
Abstract
Depth estimation in dynamic, multi-object scenes remains a major challenge, especially under severe occlusions. Existing monocular models, including foundation models, struggle with instance-wise depth consistency due to their reliance on global regression. We tackle this problem from two key aspects: data and methodology. First, we introduce the Group Instance Depth (GID) dataset, the first large-scale video depth dataset with instance-level annotations, featuring 101,500 frames from real-world activity scenes. GID bridges the gap between synthetic and real-world depth data by providing high-fidelity depth supervision for multi-object interactions. Second, we propose InstanceDepth, the first occlusion-aware depth estimation framework for multi-object environments. Our two-stage pipeline consists of (1) Holistic Depth Initialization, which assigns a coarse scene-level depth structure, and (2) Instance-Aware Depth Rectification, which refines instance-wise depth using object masks, shape priors, and spatial relationships. By enforcing geometric consistency across occlusions, our method sets a new state-of-the-art on the GID dataset and multiple benchmarks. Our code and dataset can be found at https://github.com/ViktorLiang/GID.
Discipline
Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
Proceedings of the 2025 IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, Hawaii, October 19-23
First Page
7581
Last Page
7591
City or Country
USA
Citation
LIANG, Yuan; ZHOU, Yangfang; SUN, Ziming; XIANG, Tianyi; LI, Guiqing; and HE, Shengfeng.
Instance-level video depth in groups beyond occlusions. (2025). Proceedings of the 2025 IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, Hawaii, October 19-23. 7581-7591.
Available at: https://ink.library.smu.edu.sg/sis_research/10798
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Included in
Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons