Research Collection School Of Computing and Information Systems

HD-EPIC: A highly-detailed egocentric video dataset

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2025

Abstract

We present a validation dataset of newly-collected kitchen-based egocentric videos, manually annotated with highly detailed and interconnected ground-truth labels covering recipe steps, fine-grained actions, ingredients with nutritional values, moving objects, and audio annotations. Importantly, all annotations are grounded in 3D through digital twinning of the scene, fixtures, and object locations, and primed with gaze. Footage is collected from unscripted recordings in diverse home environments, making HD-EPIC the first dataset collected in the wild but with detailed annotations matching those in controlled lab environments. We show the potential of our highly detailed annotations through a challenging VQA benchmark of 26K questions assessing the capability to recognise recipes, ingredients, nutrition, fine-grained actions, 3D perception, object motion, and gaze direction. The powerful long-context Gemini Pro achieves only 37.6% on this benchmark, showcasing its difficulty and highlighting shortcomings in current VLMs. We additionally assess action recognition, sound recognition, and long-term video-object segmentation on HD-EPIC. HD-EPIC comprises 41 hours of video in 9 kitchens with digital twins of 413 kitchen fixtures, capturing 69 recipes, 59K fine-grained actions, 51K audio events, 20K object movements, and 37K object masks lifted to 3D. On average, we have 263 annotations per minute of our unscripted videos.

Discipline

Databases and Information Systems | Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, June 10-17

First Page

23901

Last Page

23913

Identifier

10.1109/CVPR52734.2025.02226

Publisher

IEEE

City or Country

Piscataway, NJ

Citation

PERRETT, Toby; DARKHALIL, Ahmad; SINHA, Saptarshi; EMARA, Omar; POLLARD, Sam; PARIDA, Kranti Kumar; LIU, Kaiting; GATTI, Prajwal; BANSAL, Siddhant; FLANAGAN, Kevin; CHALK, Jacob; ZHU, Zhifan; GUERRIER, Rhodri; ABDELAZIM, Fahd; ZHU, Bin; MOLTISANTI, Davide; WRAY, Michael; DOUGHTY, Hazel; and DAMEN, Dima. HD-EPIC: A highly-detailed egocentric video dataset. (2025). Proceedings of the 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, June 10-17. 23901-23913.
Available at: https://ink.library.smu.edu.sg/sis_research/10368

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/CVPR52734.2025.02226

Included in

Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons