Wearable cameras, smart glasses, and AR/VR headsets are gaining importance for research and commercial use. They feature various sensors such as cameras, depth sensors, microphones, IMUs, and GPS. Advances in machine perception enable precise user localization (SLAM), eye tracking, and hand tracking. These signals make it possible to understand user behavior, unlocking new ways of interacting with augmented reality. Egocentric devices may soon automatically recognize user actions, surroundings, gestures, and social relationships. These devices have broad applications in assistive technology, education, fitness, entertainment, gaming, eldercare, robotics, and augmented reality, with the potential to benefit society.
Previously, research in this field faced challenges due to limited datasets in a data-intensive environment. However, the community's recent efforts have addressed this issue by releasing numerous large-scale datasets covering various aspects of egocentric perception, including HoloAssist, Ego4D, Ego-Exo4D, EPIC-KITCHENS, HD-EPIC, EgoCross, and CASTLE.
The goal of this workshop is to provide an exciting discussion forum for researchers working in this challenging and fast-growing area, and to provide a means to unlock the potential of data-driven research with our datasets to further the state-of-the-art.
We welcome submissions to the challenges from February to May (see important dates) through the leaderboards linked below. Participants in the challenges must submit a technical report on their method; this is a requirement for the competition. Reports should be 2-6 pages including references. Submissions should use the CVPR format and should be submitted through the CMT website.
HoloAssist is a large-scale egocentric human interaction dataset, where two people collaboratively complete physical manipulation tasks.
Ego4D is a massive-scale, egocentric dataset and benchmark suite collected across 74 worldwide locations and 9 countries, with over 3,670 hours of daily-life activity video. Please find details below on our challenges:
Ego-Exo4D is a diverse, large-scale, multi-modal, multi-view video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair).
Please check the EPIC-KITCHENS website for more information on the EPIC-KITCHENS challenges. Links to individual challenges are also reported below.
Please check the HD-EPIC website for more information on the HD-EPIC challenges. Links to individual challenges are also reported below.
Please check the EgoCross website for more information on the EgoCross challenge.
Task Description: Given an egocentric video from a novel domain that differs significantly from commonly seen scenarios (e.g., industrial or surgical environments rather than daily-life settings), the goal is to select the correct answer from four options (A, B, C, D) for a given query question.
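A four-option multiple-choice task like this one is commonly scored by accuracy. The sketch below shows how such a score could be computed; it is only an illustration, assuming accuracy is the metric and that predictions and answers are keyed by a question ID. The function name, the ID scheme, and the toy data are hypothetical and are not taken from the EgoCross evaluation code, so please refer to the EgoCross website for the official protocol.

```python
from typing import Mapping

# The four answer options named in the task description.
VALID_OPTIONS = ("A", "B", "C", "D")


def multiple_choice_accuracy(
    predictions: Mapping[str, str],
    ground_truth: Mapping[str, str],
) -> float:
    """Return the fraction of questions answered correctly.

    Both mappings go from a (hypothetical) question ID to an option letter.
    Missing or invalid predictions are counted as wrong.
    """
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one question")
    correct = 0
    for question_id, answer in ground_truth.items():
        # Normalize the prediction so that " b " and "B" are treated alike.
        predicted = predictions.get(question_id, "").strip().upper()
        if predicted in VALID_OPTIONS and predicted == answer:
            correct += 1
    return correct / len(ground_truth)


# Toy example with made-up question IDs and answers.
truth = {"q1": "A", "q2": "C", "q3": "D", "q4": "B"}
preds = {"q1": "A", "q2": "b", "q3": "D"}  # q4 is left unanswered
print(multiple_choice_accuracy(preds, truth))  # 2 of 4 correct -> 0.5
```

Counting unanswered questions as wrong keeps the denominator fixed across submissions, which is one common design choice; a leaderboard may handle missing answers differently.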
Please check the CASTLE website for more information on the CASTLE challenge.
You are invited to submit papers to the third edition of the joint egocentric vision workshop, which will be held alongside CVPR 2026 in Denver.
These papers represent original work and will be published as part of the proceedings alongside CVPR. We welcome all work focused on the egocentric domain; it is not necessary to use the datasets featured in this workshop. We expect a submission may contain one or more of the following topics (this is a non-exhaustive list):
All accepted papers will be presented as posters.
You are invited to submit extended abstracts to the third edition of the joint egocentric vision workshop, which will be held alongside CVPR 2026 in Denver.
These abstracts represent existing or ongoing work and will not be published as part of any proceedings. We welcome all work focused on the egocentric domain; it is not necessary to use the Ego4D dataset. We expect a submission may contain one or more of the following topics (this is a non-exhaustive list):
Extended abstracts should be 2-4 pages long, including figures, tables, and references. We invite submissions of ongoing or already published work, as well as reports on demonstrations and prototypes. The joint egocentric vision workshop gives authors the opportunity to present their work to the egocentric community and to invite discussion and feedback. Accepted work will be presented either as an oral presentation (virtual or in-person) or as a poster. The review will be single-blind, so there is no need to anonymize your work; submissions should otherwise follow the CVPR submission format (information can be found here). Accepted abstracts will not be published as part of any proceedings, so they can be uploaded to ArXiv or similar, and links to them will be provided on the workshop’s webpage. Submissions will be managed through the CMT website.
NOTE: All dates are in Pacific Time (PT).
| Event | Date |
|---|---|
| Paper Deadline (on CMT) | 27 February 2026 |
| Paper Notifications to Authors | 3 April 2026 |
| Camera Ready Deadline (on CMT) | 7 April 2026 |
| Challenges Leaderboards Open | February 2026 |
| Challenges Leaderboards Close | 13 May 2026 |
| Challenges Technical Reports Deadline (on CMT) | 20 May 2026 |
| Notification to Challenge Winners | 27 May 2026 |
| Challenge Reports ArXiv Deadline | 1 June 2026 |
| Extended Abstract Deadline (on CMT) | 27 April 2026 |
| Extended Abstract Notification to Authors | 18 May 2026 |
| Extended Abstracts ArXiv Deadline | 25 May 2026 |
| Workshop Date | 3 June 2026 |
All times in the schedule are local to Denver (Mountain Daylight Time, MDT).
Workshop Location: Room 704/706
| Time | Event |
|---|---|
| 08:45-09:00 | Welcome and Introductions |
| 09:00-09:30 | Invited Keynote 1: Marc Pollefeys, ETH Zurich, Switzerland |
| 09:30-10:00 | Oral Presentations (Group 1) |
| 10:00-10:45 | Coffee Break and First Poster Session |
| 10:45-11:15 | Invited Keynote 2: Saurabh Gupta, University of Illinois, USA |
| 11:15-12:15 | Challenges and Winning Solutions |
| 12:15-12:45 | Invited Keynote 3: Jawahar C V, IIIT Hyderabad, India |
| 12:45-13:30 | Lunch Break |
| 13:30-14:00 | EgoVis Distinguished Papers Award |
| 14:00-14:30 | Invited Keynote 4: Lorenzo Torresani, Northeastern University, USA |
| 14:30-15:00 | Oral Presentations (Group 2) |
| 15:00-15:30 | Invited Keynote 5: Hazel Doughty, Leiden University, Netherlands |
| 15:30-16:15 | Coffee Break and Second Poster Session |
| 16:15-16:45 | Invited Keynote 6: Ziwei Liu, Nanyang Technological University, Singapore |
| 16:45-17:15 | Panel Discussion |
| 17:15-17:45 | Project Aria Updates |
| 17:45-17:50 | Conclusion |
Instructions on poster printing for the workshop: link.
All workshop posters are in ExHall A.
| Workshop Poster Session | Title | Authors | Paper Link | Project Page | CVPR Poster Session | CVPR Paper ID |
|---|---|---|---|---|---|---|
| First | Beyond Caption-Based Queries in Video Moment Retrieval | David Pujol-Perich; Albert Clapés; Dima Damen; Sergio Escalera; Michael Wray | link | link | TBD | TBD |
| First | E-3DPSM: A State Machine for Event-based Egocentric 3D Human Pose Estimation | Mayur Deshmukh; Hiroyasu Akada; Helge Rhodin; Christian Theobalt; Vladislav Golyanik | link | link | 5th June, 16:00–18:00 (Poster Session 2) | TBD |
| First | Ego-1K – A Large-Scale Multiview Video Dataset for Egocentric Vision | Jae Yong Lee; Daniel Scharstein; Akash Bapat; Hao Hu; Andrew Fu; Haoru Zhao; Paul Sammut; Xiang Li; Stephen Jeapes; Anik Gupta; Lior David; Saketh Madhuvarasu; Jay Joshi; Jason Wither | link | link | TBD | TBD |
| First | Ego2Web: A Web Agent Benchmark Grounded on Egocentric Videos | Shoubin Yu; Lei Shu; Antoine Yang; Yao Fu; Srinivas Sunkara; Maria Wang; Jindong Chen; Mohit Bansal; Boqing Gong | link | link | TBD | TBD |
| First | EgoFlow: Gradient-Guided Flow Matching for Egocentric 6DoF Object Motion Generation | Abhishek Saroha; Huajian Zeng; Xingxing Zuo; Daniel Cremers; Xi Wang | link | link | TBD | TBD |
| First | UniDex: A Robot Foundation Suite for Universal Dexterous Hand Control from Egocentric Human Videos | Gu Zhang; Qicheng Xu; Haozhe Zhang; Jianhan Ma; Long He; Yiming Bao; Zeyu Ping; Zhecheng Yuan; Chenhao Lu; Chengbo Yuan; Tianhai Liang; Xiaoyu Tian; Maanping Shao; Feihong Zhang; Mingyu Ding; Yang Gao; Hao Zhao; Hang Zhao; Huazhe Xu | link | link | TBD | TBD |
| First | Unique Lives, Shared World: Learning from Single-Life Videos | Tengda Han; Sayna Ebrahimi; Dilara Gokay; Li Yang Ku; Maks Ovsjanikov; Iva Babukova; Daniel Zoran; Viorica Patraucean; Joao Carreira; Andrew Zisserman; Dima Damen | link | TBD | Poster Session 4, ID 241. (Sat June 6th) | TBD |
| First | ViterbiPlanNet: Injecting Procedural Knowledge via Differentiable Viterbi for Planning in Instructional Videos | Luigi Seminara; Davide Moltisanti; Antonino Furnari | link | link | 7th June, 11:45–13:45, during Poster Session 5 | TBD |
| Second | EgoXtreme: A Dataset for Robust Object Pose Estimation in Egocentric Views under Extreme Conditions | Taegyoon Yoon; Yegyu Han; Seojin Ji; Jaewoo Park; Sojeong Kim; Taein Kwon; Hyung-Sin Kim | link | link | June 7, 15:30 – 17:30 (Poster Session 6) ExHall A (Order #418) | 44704 |
| Second | Mistake Attribution: Fine-Grained Mistake Understanding in Egocentric Videos | Yayuan Li; Aadit Jain; Filippos Bellos; Jason Corso | link | link | TBD | TBD |
| Second | SAVA-X: Ego-to-Exo Imitation Error Detection via Scene-Adaptive View Alignment and Bidirectional Cross View Fusion | Xiang Li; Heqian Qiu; Lanxiao Wang; Benliu Qiu; Fanman Meng; Linfeng Xu; Hongliang Li | link | TBD | June 6 afternoon | TBD |
| Second | Seeing Conversations: Communication Context Identification in Egocentric Video | Tobias Dorszewski; Jens Hjortkjær | TBD | TBD | day 3 | 38615 |
| Second | Seeing without Pixels: Perception from Camera Trajectories | Zihui Xue; Kristen Grauman; Dima Damen; Andrew Zisserman; Tengda Han | link | link | Poster Session 6, Sun Jun 7, 3:30 – 5:30 PM (#248) | TBD |
| Second | SkillSight: Efficient First-Person Skill Assessment with Gaze | Chi Hsuan Wu; Kumar Ashutosh; Kristen Grauman | link | link | Poster Session 6. Session order 253. | TBD |
| Second | Test-time Ego-Exo-centric Adaptation for Action Anticipation via Multi-Label Prototype Growing and Dual-Clue Consistency | Zhaofeng Shi; Heqian Qiu; Lanxiao Wang; Qingbo Wu; Fanman Meng; Lili Pan; Hongliang Li | link | TBD | TBD | TBD |
| Second | V2-SAM: Marrying SAM2 with Multi-Prompt Experts for Cross-View Object Correspondence | Jiancheng Pan; Runze Wang; Tianwen Qian; Mohammad Mahdi; Yanwei Fu; Xiangyang Xue; Xiaomeng Huang; Luc Van Gool; Danda Paudel; Yuqian Fu | link | link | 3 (Session Order 248) | 33962 |
This workshop follows in the footsteps of the following previous events:
EPIC-Kitchens and Ego4D Past Workshops:
Human Body, Hands, and Activities from Egocentric and Multi-view Cameras Past Workshops:
Project Aria Past Tutorials:
The Microsoft CMT service was used for managing the peer-reviewing process for this conference. This service was provided for free by Microsoft and they bore all expenses, including costs for Azure cloud services as well as for software development and support.