Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
11-2022
Abstract
We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked, where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. In total, we publicly release 272K manual semantic masks of 257 object classes, 9.9M interpolated dense masks, and 67K hand-object relations, covering 36 hours of 179 untrimmed videos. Along with the annotations, we introduce three challenges in video object segmentation, interaction understanding and long-term reasoning. For data, code and leaderboards: http://epic-kitchens.github.io/VISOR
Discipline
Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks, Virtual Conference, 2022 November 28
First Page
1
Last Page
14
City or Country
New Orleans, USA
Citation
DARKHALIL, Ahmad; SHAN, Dandan; ZHU, Bin; MA, Jian; KAR, Amlan; HIGGINS, Richard; FOUHEY, David; FIDLER, Sanja; and DAMEN, Dima.
EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations. (2022). Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022) Track on Datasets and Benchmarks, Virtual Conference, 2022 November 28. 1-14.
Available at: https://ink.library.smu.edu.sg/sis_research/9013
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.