Third Joint Egocentric Vision (EgoVis) Workshop

Held in Conjunction with CVPR 2026

3/4 June 2026 - Denver, CO, USA
Room: TBD

This joint workshop aims to be the focal point for the egocentric computer vision community to meet and discuss progress in this fast-growing research area. It addresses egocentric vision comprehensively, covering key research challenges in video understanding, multi-modal data, interaction learning, self-supervised learning, and AR/VR, with applications to cognitive science and robotics.

Overview

Wearable cameras, smart glasses, and AR/VR headsets are gaining importance for research and commercial use. They feature a variety of sensors, including cameras, depth sensors, microphones, IMUs, and GPS. Advances in machine perception enable precise user localization (SLAM), eye tracking, and hand tracking. This data makes it possible to understand user behavior, unlocking new possibilities for interaction with augmented reality. Egocentric devices may soon automatically recognize user actions, surroundings, gestures, and social relationships. Such devices have broad applications in assistive technology, education, fitness, entertainment, gaming, eldercare, robotics, and augmented reality, with the potential for positive societal impact.

Previously, research in this field was held back by the scarcity of large datasets in a data-intensive area. The community's recent efforts have addressed this issue through the release of numerous large-scale datasets covering various aspects of egocentric perception, including HoloAssist, Ego4D, Ego-Exo4D, EPIC-KITCHENS, HD-EPIC, EgoCross, and CASTLE.

The goal of this workshop is to provide an exciting discussion forum for researchers working in this challenging and fast-growing area, and to help unlock the potential of data-driven research with these datasets to further the state of the art.

Challenges

We welcome submissions to the challenges from February to May (see important dates) through the leaderboards linked below. Participants in the challenges are required to submit a technical report on their method; this is a requirement for the competition. Reports should be 2-6 pages, including references, should use the CVPR format, and should be submitted through the CMT website.

HoloAssist Challenges

HoloAssist is a large-scale egocentric human interaction dataset, where two people collaboratively complete physical manipulation tasks.

Mistake Detection
Lead: Taein Kwon, Meta, Switzerland & Mahdi Rad, Microsoft, Switzerland
Summary: Mistake detection is defined following the convention of Assembly101, but applied to the fine-grained actions in our benchmark. We take the features of the fine-grained action clips from the beginning of the coarse-grained action until the end of the current action clip, and the model predicts a label from {correct, mistake}.
Challenge Link

Ego4D Challenges

Ego4D is a massive-scale, egocentric dataset and benchmark suite collected across 74 worldwide locations and 9 countries, with over 3,670 hours of daily-life activity video. Please find details below on our challenges:

Ego4D Episodic Memory
Track: Natural Language Queries
Lead: Suyog Jain, Meta, US
Summary: Given an egocentric video V and a natural language query Q, the goal is to identify a response track r such that the answer to Q can be deduced from r.
Challenge Link (coming soon)
Ego4D Forecasting
Track: Short-term object interaction anticipation
Lead: Antonino Furnari, University of Catania, IT
Summary: This task aims to predict the next human-object interaction happening after a given timestamp. Given an input video, the goal is to anticipate 1) the spatial positions of the active objects, 2) the category of each detected next active object, 3) how each active object will be used (verb), and 4) when the interaction will begin.
Current SOTA:Paper 1; Paper 2
Previous Winner: Top-5 Overall mAP: 7.21
Challenge Link
Ego4D Episodic Memory
Track: Goal Step
Lead: Yale Song, Meta, US
Summary: Given an untrimmed egocentric video, identify the temporal action segment corresponding to a natural language description of the step. Specifically, predict the (start_time, end_time) for a given keystep description.
Current SOTA: Paper
Previous Winner: 35.18 r@1, IoU=0.3
Challenge Link (coming soon)
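As a rough illustration of the metric reported above (recall@1 at a temporal IoU threshold of 0.3), a minimal sketch follows; function and variable names are our own and not part of the official evaluation code.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start_time, end_time) segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_1(top1_preds, gts, iou_thresh=0.3):
    """Fraction of queries whose top-1 predicted segment overlaps the
    ground-truth segment with temporal IoU >= iou_thresh."""
    hits = sum(temporal_iou(p, g) >= iou_thresh for p, g in zip(top1_preds, gts))
    return hits / len(gts)
```

For example, a prediction of (0, 10) against a ground truth of (5, 15) has an IoU of 1/3 and counts as a hit at the 0.3 threshold.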

Ego-Exo4D Challenges

Ego-Exo4D is a diverse, large-scale, multi-modal, multi-view video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair).

EgoExo4D Pose Challenge
Track: Ego-Pose Body
Lead: Juanita Puentes Mozo, Los Andes
Summary: The EgoExo4D Body Pose Challenge aims to accurately estimate body pose using only first-person raw video and/or egocentric camera pose.
Current SOTA: EgoCast (MPJPE: 14.36)
Previous Winner: MPJPE: 15.32
Challenge Link (coming soon)
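The MPJPE figures above can be understood with a minimal sketch of the metric (the average per-joint Euclidean distance between predicted and ground-truth 3D joints); this is a generic illustration, not the official evaluation code.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per-Joint Position Error: the average Euclidean distance
    (in the same units as the inputs, e.g. cm) between predicted and
    ground-truth joints. pred and gt are (num_joints, 3) arrays."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())
```

Lower is better: a prediction offset from the ground truth by 5 cm at every joint yields an MPJPE of exactly 5.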
EgoExo4D Pose Challenge
Track: Ego-Pose Hands
Lead: Shan Shu, University of Pennsylvania, US
Summary:
Current SOTA:
Previous Winner:
Challenge Link (coming soon)

EPIC-Kitchens Challenges

Please check the EPIC-KITCHENS website for more information on the EPIC-KITCHENS challenges. Links to individual challenges are also reported below.

Action Recognition
Lead: Prajwal Gatti, University of Bristol, UK
Summary: Classify the action's verb and noun depicted in a trimmed video clip.
Current SOTA: Paper
Previous Winner: 48.1% - top 1 / 77.4% - top 5
Challenge Link
Action Detection
Lead: Francesco Ragusa, University of Catania, IT
Summary: The challenge requires detecting and recognising all action instances within an untrimmed video. The challenge will be carried out on the EPIC-KITCHENS-100 dataset.
Current SOTA: Results
Previous Winner: Action Avg. mAP 31.97
Challenge Link
Domain Adaptation Challenge for Action Recognition
Lead: Saptarshi Sinha, University of Bristol, UK
Summary: Given labelled videos from the source domain and unlabelled videos from the target domain, the goal is to classify actions in the target domain. An action is defined as a verb and noun depicted in a trimmed video clip.
Current SOTA: Paper
Previous Winner: 43.17 for action accuracy
Challenge Link
Multi-Instance Retrieval
Lead: Michael Wray, University of Bristol, UK
Summary: Perform cross-modal retrieval by searching between vision and text modalities.
Current SOTA: Paper
Previous Winner: Normalised Discounted Cumulative Gain (%) Avg. - 74.25
Challenge Link
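The nDCG figure above can be read against a minimal sketch of generic nDCG. Note this is an illustrative, simplified version: the actual benchmark derives graded relevance scores from caption similarity, which is not modelled here.

```python
import numpy as np

def dcg(relevances):
    """Discounted Cumulative Gain: each item's relevance is discounted
    by log2 of its (1-indexed) rank + 1."""
    rels = np.asarray(relevances, dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, len(rels) + 2))
    return float(np.sum(rels * discounts))

def ndcg(relevances):
    """DCG of the returned ranking normalised by the DCG of the ideal
    (relevance-sorted) ranking, so a perfect ranking scores 1.0."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A ranking that returns the most relevant items first scores 1.0; placing relevant items lower in the list reduces the score.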
Semi-Supervised Video-Object Segmentation
Lead: Ahmad Darkhalil, University of Bristol, UK
Summary: Given a sub-sequence of frames with M object masks in the first frame, the goal of this challenge is to segment these objects through the remaining frames. Other objects not present in the first frame of the sub-sequence are excluded from this benchmark.
Current SOTA: Webpage
Challenge Link
EPIC-SOUNDS Audio-Based Interaction Recognition
Lead: Jacob Chalk, University of Bristol, UK
Summary: Recognise interactions from the audio data of EPIC-Sounds (i.e., classify the audio).
Current SOTA: User: JMCarrot
Previous Winner: N/A
Challenge Link
EPIC-SOUNDS Audio-Based Interaction Detection
Lead: Jacob Chalk, University of Bristol, UK
Summary: Classify all audio-based interactions (recognition) from audio data of EPIC-Sounds and predict their start and end times for a given video.
Current SOTA: User: shuming
Previous Winner: N/A
Challenge Link
Action Anticipation
Lead: Antonino Furnari, University of Catania, IT
Summary: The challenge requires the anticipation of a future action from the observation of a preceding video segment. The challenge will be carried out on the EPIC-KITCHENS-100 dataset.
Current SOTA: N/A
Previous Winner: N/A
Challenge Link

HD-EPIC Challenge

Please check the HD-EPIC website for more information on the HD-EPIC challenges. Links to individual challenges are also reported below.

HD-EPIC Challenges - VQA
Lead: Prajwal Gatti, University of Bristol, UK
Summary: Given a question belonging to any one of the seven types defined in the HD-EPIC VQA benchmark, the goal is to predict the correct answer among the five listed choices.
Current SOTA: Gemini Pro
Previous Winner: N/A
Challenge Link

EgoCross Challenge

Please check the EgoCross website for more information on the EgoCross challenge.

Task Description: Given an egocentric video from a novel domain that differs significantly from commonly seen scenarios (e.g., industrial or surgical environments rather than daily-life settings), the goal is to select the correct answer from four options (A, B, C, D) for a given query question.

Source-Limited Track
Lead: Yuqian Fu, INSAIT, BG
Summary: Participants are restricted to the provided baseline model and the given small support set, which may be used to fine-tune or guide the model for better transfer to the target domain. This track is designed to ensure a fair comparison of different adaptation algorithms.
Current SOTA: SFT-Qwen3VL (Average accuracy across four novel domains: 0.4608)
Previous Winner: N/A
Challenge Link
Open-Source Track
Lead: Yuqian Fu, INSAIT, BG
Summary: There are no restrictions on base models; even commercial models are encouraged to evaluate their performance on our challenging out-of-domain targets. Additional data (as long as it is not manually constructed to align specifically with the target domain) may be used for training, together with our provided support set.
Current SOTA: SFT-Qwen3VL (Average accuracy across four novel domains: 0.4608)
Previous Winner: N/A
Challenge Link

CASTLE Challenge

Please check the CASTLE website for more information on the CASTLE challenge.

CASTLE Challenge - VQA
Lead: Luca Rossetto, Dublin City University, IE
Summary: Given the entire dataset of over 600 hours of content from 15 different perspectives, the goal is to select the correct answer to a given question out of four possible options.
Current SOTA: N/A
Previous Winner: N/A
Challenge Link

Call for Papers

You are invited to submit papers to the third edition of the joint egocentric vision workshop, which will be held alongside CVPR 2026 in Denver.

These papers represent original work and will be published as part of the proceedings alongside CVPR. We welcome all works that focus on the egocentric domain; it is not necessary to use the datasets featured in this workshop. We expect a submission may cover one or more of the following topics (this is a non-exhaustive list):

Presentation Guidelines

All accepted papers will be presented as posters. The guidelines for the posters are the same as at the main conference.

Submission Instructions

Call for Abstracts

You are invited to submit extended abstracts to the third edition of the joint egocentric vision workshop, which will be held alongside CVPR 2026 in Denver.

These abstracts represent existing or ongoing work and will not be published as part of any proceedings. We welcome all works that focus on the egocentric domain; it is not necessary to use the Ego4D dataset within your work. We expect a submission may cover one or more of the following topics (this is a non-exhaustive list):

Format

Extended abstracts should be 2-4 pages, including figures, tables, and references. We invite submissions of ongoing or already published work, as well as reports on demonstrations and prototypes. The joint egocentric vision workshop gives authors the opportunity to present their work to the egocentric community to provoke discussion and feedback. Accepted work will be presented as either an oral presentation (virtual or in-person) or a poster presentation. Reviewing will be single-blind, so there is no need to anonymize your work; otherwise, submissions should follow the format of CVPR submissions (information can be found here). Accepted abstracts will not be published as part of any proceedings, so they can be uploaded to arXiv etc.; links will be provided on the workshop's webpage. Submissions will be managed through the CMT website.

Important Dates

NOTE: All dates are in Pacific Time (PT).

Paper Deadline (on CMT) 27 Feb 2026
Paper Notifications to Authors 3 April 2026
Camera Ready Deadline (on CMT) 7 April 2026
Challenges Leaderboards Open Feb 2026
Challenges Leaderboards Close 13 May 2026
Challenges Technical Reports Deadline (on CMT) 20 May 2026
Notification to Challenge Winners 27 May 2026
Challenge Reports ArXiv Deadline 1 June 2026
Extended Abstract Deadline (on CMT) 27 April 2026
Extended Abstract Notification to Authors 18 May 2026
Extended Abstracts ArXiv Deadline 25 May 2026
Workshop Date TBD

Program

All times are local to Denver (Mountain Daylight Time, MDT).
Workshop Location: Room TBD

Time Event
08:45-09:00 Welcome and Introductions
09:00-09:30 Invited Keynote 1: Marc Pollefeys, ETH Zurich, Switzerland
09:30-10:00 Oral Presentations (Group 1)
10:00-10:45 Coffee Break and First Poster Session
10:45-11:15 Invited Keynote 2: Saurabh Gupta, University of Illinois, USA
11:15-12:15 Challenges and Winning Solutions
12:15-12:45 Invited Keynote 3: Jawahar C V, IIIT Hyderabad, India
12:45-13:30 Lunch Break
13:30-14:00 EgoVis Distinguished Papers Award
14:00-14:30 Invited Keynote 4: Lorenzo Torresani, Northeastern University, USA
14:30-15:00 Oral Presentations (Group 2)
15:00-15:30 Invited Keynote 5: Hazel Doughty, Leiden University, Netherlands
15:30-16:15 Coffee Break and Second Poster Session
16:15-16:45 Invited Keynote 6: Ziwei Liu, Nanyang Technological University, Singapore
16:45-17:15 Panel Discussion
17:15-17:30 Conclusion

Invited Speakers


CV Jawahar

IIIT Hyderabad, India


Lorenzo Torresani

Northeastern University, USA


Marc Pollefeys

ETH Zurich, Switzerland


Hazel Doughty

Leiden University, Netherlands


Saurabh Gupta

University of Illinois, USA


Ziwei Liu

Nanyang Technological University, Singapore

Workshop Organisers


Siddhant Bansal

University of Bristol


Masashi Hatano

Keio University


Chiara Plizzari

Bocconi University


Antonino Furnari

University of Catania


Tushar Nagarajan

FAIR, Meta

Co-organizing Advisors


Dima Damen

University of Bristol and Google DeepMind


Giovanni Maria Farinella

University of Catania


Kristen Grauman

UT Austin


Jitendra Malik

UC Berkeley


Richard Newcombe

Reality Labs Research


Marc Pollefeys

ETH Zurich


Yoichi Sato

University of Tokyo


David Crandall

Indiana University

Related Past Events

This workshop follows in the footsteps of the following previous events:


EPIC-Kitchens and Ego4D Past Workshops:


Human Body, Hands, and Activities from Egocentric and Multi-view Cameras Past Workshops:

Project Aria Past Tutorials:

Acknowledgements

The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.