First Joint Egocentric Vision (EgoVis) Workshop

Held in Conjunction with CVPR 2024

17 June 2024 - Seattle, USA

This joint workshop aims to be the focal point for the egocentric computer vision community to meet and discuss progress in this fast-growing research area. It addresses egocentric vision comprehensively, covering key research challenges in video understanding, multi-modal data, interaction learning, self-supervised learning, and AR/VR, with applications to cognitive science and robotics.


Wearable cameras, smart glasses, and AR/VR headsets are gaining importance for research and commercial use. They feature various sensors like cameras, depth sensors, microphones, IMUs, and GPS. Advances in machine perception enable precise user localization (SLAM), eye tracking, and hand tracking. This data allows understanding user behavior, unlocking new interaction possibilities with augmented reality. Egocentric devices may soon automatically recognize user actions, surroundings, gestures, and social relationships. These devices have broad applications in assistive technology, education, fitness, entertainment, gaming, eldercare, robotics, and augmented reality, positively impacting society.

Previously, research in this field faced challenges due to limited datasets in a data-intensive environment. However, the community's recent efforts have addressed this issue by releasing numerous large-scale datasets covering various aspects of egocentric perception, including HoloAssist, Aria Digital Twin, Aria Synthetic Environments, Ego4D, Ego-Exo4D, and EPIC-KITCHENS.

The goal of this workshop is to provide a discussion forum for researchers working in this challenging and fast-growing area, and to unlock the potential of data-driven research with our datasets to advance the state of the art.


We welcome submissions to the challenges from March to May (see important dates) through the leaderboards linked below. Participants in the challenges are required to submit a technical report on their method. Reports should be 2-6 pages including references. Submissions should use the CVPR format and should be submitted through the CMT website.

HoloAssist Challenges

Challenge ID Challenge Name Challenge Lead Challenge Link
1 Action Recognition Mahdi Rad, Microsoft, Switzerland Link
2 Mistake Detection Ishani Chakraborty, Microsoft, US Link
3 Intervention Type Prediction Taein Kwon, ETH Zurich, Switzerland Link

Aria Digital Twin Challenges

Challenge ID Challenge Name Challenge Lead Challenge Link
1 Few-shot 3D Object Detection & Tracking Xiaqing Pan, Meta, US Link
2 3D Object Detection & Tracking Xiaqing Pan, Meta, US Link

Aria Synthetic Environments Challenges

Challenge ID Challenge Name Challenge Lead Challenge Link
1 Scene Reconstruction using structured language Vasileios Balntas, Meta, UK Link

Ego4D Challenges

Ego4D is a massive-scale, egocentric dataset and benchmark suite collected across 74 worldwide locations and 9 countries, with over 3,670 hours of daily-life activity video. Please find details below on our challenges:

Challenge ID Challenge Name Challenge Lead Challenge Link
1 Visual Queries 2D Santhosh Kumar Ramakrishnan, University of Texas, Austin, US Link
2 Visual Queries 3D Vincent Cartillier, Georgia Tech, US Link
3 Natural Language Queries Satwik Kottur, Meta, US Link
4 Moment Queries Chen Zhao & Merey Ramazanova, KAUST, SA Link
5 EgoTracks Hao Tang & Weiyao Wang, Meta, US Link
6 Goal Step Yale Song, Meta, US Link
7 Ego Schema Karttikeya Mangalam, Raiymbek Akshulakov, UC Berkeley, US Link
8 PNR temporal localization Yifei Huang, University of Tokyo, JP Link
9 Localization and Tracking Hao Jiang, Meta, US Link
10 Speech Transcription Leda Sari, Jachym Kolar & Vamsi Krishna Ithapu, Meta Reality Labs, US Link
11 Looking at me Eric Zhongcong Xu, National University of Singapore, Singapore Link
12 Short-term Anticipation Francesco Ragusa, University of Catania, IT Link
13 Long-term Anticipation Tushar Nagarajan, FAIR, US Link

Ego-Exo4D Challenges

Ego-Exo4D is a diverse, large-scale, multi-modal, multi-view video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair).

Challenge ID Challenge Name Challenge Lead Challenge Link
1 Ego-Pose Body Pablo Arbelaez & Maria Camila Escobar Palomeque, Universidad de los Andes, Colombia Link
2 Ego-Pose Hands Jianbo Shi, Shan Shu, University of Pennsylvania, US Link

EPIC-KITCHENS Challenges

Please check the EPIC-KITCHENS website for more information on the EPIC-KITCHENS challenges. Links to individual challenges are also reported below.

Challenge ID Challenge Name Challenge Lead Challenge Link
1 Action Recognition Jacob Chalk, University of Bristol, UK Link
2 Action Anticipation Antonino Furnari and Francesco Ragusa, University of Catania, IT Link
3 Action Detection Francesco Ragusa and Antonino Furnari, University of Catania, IT Link
4 Domain Adaptation for Action Recognition Toby Perrett, University of Bristol, UK Link
5 Multi-Instance Retrieval Michael Wray, University of Bristol, UK Link
6 Semi-Supervised Video-Object Segmentation Ahmad Dar Khalil, University of Bristol, UK Link
7 Hand-Object Segmentation Dandan Shan, University of Michigan, US Link
8 EPIC-SOUNDS Audio-Based Interaction Recognition Jacob Chalk, University of Bristol, UK Link
9 TREK-150 Object Tracking Matteo Dunnhofer, University of Udine, IT Link

Call for Abstracts

You are invited to submit extended abstracts to the first edition of the Joint Egocentric Vision Workshop, which will be held alongside CVPR 2024 in Seattle.

These abstracts represent existing or ongoing work and will not be published as part of any proceedings. We welcome all work in the egocentric domain; it is not necessary to use the Ego4D dataset in your work. We expect a submission may contain one or more of the following topics (this is a non-exhaustive list):


Extended abstracts should be 2-4 pages long, including figures, tables, and references. We invite submissions of ongoing or already published work, as well as reports on demonstrations and prototypes. The workshop gives authors an opportunity to present their work to the egocentric community and to provoke discussion and feedback. Accepted work will be presented either as an oral presentation (virtual or in-person) or as a poster. Review will be single-blind, so there is no need to anonymize your work; submissions should otherwise follow the CVPR submission format (information can be found here). Accepted abstracts will not be published as part of any proceedings, so they can be uploaded to arXiv or similar venues, and links will be provided on the workshop's webpage. Submissions will be managed through the CMT website.

Important Dates

Challenges Leaderboards Open Mar 2024
Challenges Leaderboards Close 30 May 2024
Challenges Technical Reports Deadline (on CMT) 5 June 2024 (23:59 PT)
Extended Abstract Deadline 10 May 2024 (23:59 PT)
Extended Abstract Notification to Authors 29 May 2024
Extended Abstracts ArXiv Deadline 12 June 2024
Workshop Date 17 June 2024


All dates are in Seattle local time (PT).
Workshop Location: Room TBD

A tentative programme is shown below.

Time Event
08:45-09:00 Welcome and Introductions
09:00-09:30 Invited Keynote 1: Takeo Kanade, Carnegie Mellon University, US
09:30-10:20 HoloAssist Challenges
10:20-11:20 Coffee Break and Poster Session
11:20-11:50 Invited Keynote 2: Diane Larlus, Naver Labs Europe and MIAI Grenoble, FR
11:50-12:40 EPIC-KITCHENS Challenges
12:40-13:40 Lunch Break
13:40-14:10 EgoVis 2022/2023 Distinguished Paper Awards
14:10-14:40 Invited Keynote 3: Michael C. Frank & Bria Long, Stanford University, US
14:40-15:30 Aria Digital Twin & Synthetic Environments Challenges
15:30-16:00 Coffee Break
16:00-16:30 Invited Keynote 4: Fernando De la Torre, Carnegie Mellon University, US
16:30-17:40 Ego4D Challenges
17:40-18:10 Invited Keynote 5: Jim Rehg, University of Illinois Urbana-Champaign, US
18:10-18:15 Conclusion

Invited Speakers

Takeo Kanade

Carnegie Mellon University, USA

Jim Rehg

University of Illinois Urbana-Champaign, USA

Diane Larlus

Naver Labs Europe and MIAI Grenoble, France

Fernando De la Torre

Carnegie Mellon University, USA

Michael C. Frank

Stanford University, USA

Bria Long

University of California, San Diego, USA

Workshop Organisers

Antonino Furnari

University of Catania

Angela Yao

National University of Singapore

Xin Wang

Microsoft Research

Tushar Nagarajan

FAIR, Meta

Huiyu Wang

FAIR, Meta

Jing Dong


Jakob Engel

FAIR, Meta

Siddhant Bansal

University of Bristol

Takuma Yagi

National Institute of Advanced Industrial Science and Technology

Co-organizing Advisors

Dima Damen

University of Bristol

Giovanni Maria Farinella

University of Catania

Kristen Grauman

UT Austin

Jitendra Malik

UC Berkeley

Richard Newcombe

Reality Labs Research

Marc Pollefeys

ETH Zurich

Yoichi Sato

University of Tokyo

David Crandall

Indiana University

Related Past Events

This workshop follows in the footsteps of the following previous events:

EPIC-Kitchens and Ego4D Past Workshops:

Human Body, Hands, and Activities from Egocentric and Multi-view Cameras Past Workshops:

Project Aria Past Tutorials: