Wearable cameras, smart glasses, and AR/VR headsets are gaining importance in both research and commercial use. They feature a variety of sensors, such as cameras, depth sensors, microphones, IMUs, and GPS, and advances in machine perception enable precise user localization (SLAM), eye tracking, and hand tracking. This data makes it possible to understand user behavior and unlocks new ways of interacting with augmented reality. Egocentric devices may soon automatically recognize a user's actions, surroundings, gestures, and social relationships. These devices have broad applications in assistive technology, education, fitness, entertainment, gaming, eldercare, robotics, and augmented reality, with the potential for significant positive impact on society.
Previously, research in this field was hampered by the limited availability of datasets in a data-intensive domain. Recent community efforts have addressed this issue by releasing numerous large-scale datasets covering various aspects of egocentric perception, including HoloAssist, Aria Digital Twin, Aria Synthetic Environments, Ego4D, Ego-Exo4D, and EPIC-KITCHENS.
The goal of this workshop is to provide a lively discussion forum for researchers working in this challenging and fast-growing area, and to unlock the potential of data-driven research on these datasets to advance the state of the art.
We welcome submissions to the challenges from March to May (see important dates) through the leaderboards linked below. Participants in the challenges are required to submit a technical report on their method; this is a requirement for the competition. Reports should be 2-6 pages including references, should use the CVPR format, and should be submitted through the CMT website.
Please find details below on the HoloAssist challenges:
Challenge ID | Challenge Name | Challenge Lead | Challenge Link |
---|---|---|---|
1 | Action Recognition | Mahdi Rad, Microsoft, Switzerland | Link |
2 | Mistake Detection | Ishani Chakraborty, Microsoft, US | Link |
3 | Intervention Type Prediction | Taein Kwon, ETH Zurich, Switzerland | Link |
Please find details below on the Aria Digital Twin challenges:
Challenge ID | Challenge Name | Challenge Lead | Challenge Link |
---|---|---|---|
1 | Few-shot 3D Object Detection & Tracking | Xiaqing Pan, Meta, US | Link |
2 | 3D Object Detection & Tracking | Xiaqing Pan, Meta, US | Link |
Please find details below on the Aria Synthetic Environments challenge:
Challenge ID | Challenge Name | Challenge Lead | Challenge Link |
---|---|---|---|
1 | Scene Reconstruction using structured language | Vasileios Balntas, Meta, UK | Link |
Ego4D is a massive-scale egocentric dataset and benchmark suite collected across 74 worldwide locations in 9 countries, with over 3,670 hours of daily-life activity video. Please find details on our challenges below; a short annotation-loading sketch in Python follows the table.
Challenge ID | Challenge Name | Challenge Lead | Challenge Link |
---|---|---|---|
1 | Visual Queries 2D | Santhosh Kumar Ramakrishnan, University of Texas, Austin, US | Link |
2 | Visual Queries 3D | Vincent Cartillier, Georgia Tech, US | Link |
3 | Natural Language Queries | Satwik Kottur, Meta, US | Link |
4 | Moment Queries | Chen Zhao & Merey Ramazanova, KAUST, SA | Link |
5 | EgoTracks | Hao Tang & Weiyao Wang, Meta, US | Link |
6 | Goal Step | Yale Song, Meta, US | Link |
7 | Ego Schema | Karttikeya Mangalam, Raiymbek Akshulakov, UC Berkeley, US | Link |
8 | PNR temporal localization | Yifei Huang, University of Tokyo, JP | Link |
9 | Localization and Tracking | Hao Jiang, Meta, US | Link |
10 | Speech Transcription | Leda Sari, Jachym Kolar & Vamsi Krishna Ithapu, Meta Reality Labs, US | Link |
11 | Looking at me | Eric Zhongcong Xu, National University of Singapore, Singapore | Link |
12 | Short-term Anticipation | Francesco Ragusa, University of Catania, IT | Link |
13 | Long-term Anticipation | Tushar Nagarajan, FAIR, US | Link |
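For participants getting started with the data, here is a minimal sketch of loading and summarizing one benchmark annotation file. It is not an official starter script: the CLI invocation, local path, and JSON field names (shown for the Natural Language Queries benchmark) are assumptions based on the public Ego4D documentation, so verify them against your download.

```python
import json
from collections import Counter
from pathlib import Path

# Hypothetical local copy of the NLQ training annotations, assuming they
# were fetched with the official Ego4D CLI, e.g.:
#   ego4d --output_directory ~/ego4d_data --datasets annotations
path = Path("~/ego4d_data/v2/annotations/nlq_train.json").expanduser()
with path.open() as f:
    data = json.load(f)

# Field names follow the public NLQ schema ("videos" -> "clips");
# check them against your local files.
clips_per_video = Counter()
for video in data["videos"]:
    clips_per_video[video["video_uid"]] = len(video.get("clips", []))

print(f"{len(clips_per_video)} videos, "
      f"{sum(clips_per_video.values())} annotated clips")
```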
Ego-Exo4D is a diverse, large-scale, multimodal, multi-view video dataset and benchmark challenge. Ego-Exo4D centers around simultaneously captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair).
Challenge ID | Challenge Name | Challenge Lead | Challenge Link |
---|---|---|---|
1 | Ego-Pose Body | Pablo Arbelaez & Maria Camila Escobar Palomeque, Universidad de los Andes, Colombia | Link |
2 | Ego-Pose Hands | Jianbo Shi, Shan Shu, University of Pennsylvania, US | Link |
Please check the EPIC-KITCHENS website for more information on the EPIC-KITCHENS challenges. Links to the individual challenges are listed below, followed by a short annotation-parsing sketch.
Challenge ID | Challenge Name | Challenge Lead | Challenge Link |
---|---|---|---|
1 | Action Recognition | Jacob Chalk, University of Bristol, UK | Link |
2 | Action Anticipation | Antonino Furnari and Francesco Ragusa, University of Catania, IT | Link |
3 | Action Detection | Francesco Ragusa and Antonino Furnari, University of Catania, IT | Link |
4 | Domain Adaptation for Action Recognition | Toby Perrett, University of Bristol, UK | Link |
5 | Multi-Instance Retrieval | Michael Wray, University of Bristol, UK | Link |
6 | Semi-Supervised Video-Object Segmentation | Ahmad Dar Khalil, University of Bristol, UK | Link |
7 | Hand-Object Segmentation | Dandan Shan, University of Michigan, US | Link |
8 | EPIC-SOUNDS Audio-Based Interaction Recognition | Jacob Chalk, University of Bristol, UK | Link |
9 | TREK-150 Object Tracking | Matteo Dunnhofer, University of Udine, IT | Link |
10 | EPIC-SOUNDS Audio-Based Interaction Detection | Jacob Chalk, University of Bristol, UK | Link |
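As a starting point for the recognition and detection challenges, the sketch below reads the public EPIC-KITCHENS-100 annotations with pandas. The CSV filename and column names are assumptions based on the released annotation files (the epic-kitchens-100-annotations repository), so check them against your copy.

```python
import pandas as pd

# Hypothetical local copy of the public EPIC-KITCHENS-100 training
# annotations (EPIC_100_train.csv from the annotations repository).
train = pd.read_csv("EPIC_100_train.csv")

# Each row is one action segment: a (verb, noun) pair with timestamps.
print(train[["video_id", "start_timestamp", "stop_timestamp",
             "verb", "noun"]].head())

# Verb-class distribution, e.g. to sanity-check a recognition baseline.
print(train["verb_class"].value_counts().head(10))
```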
Benchmark | Challenge | Team Rank | Winner Names | Technical Report | Code |
---|---|---|---|---|---|
EPIC-KITCHENS | Action Recognition | 1 | Shuming Liu (KAUST)*; Lin Sui (Nanjing University); Chen-Lin Zhang (Moonshot AI); Fangzhou Mu (NVIDIA); Chen Zhao (KAUST); Bernard Ghanem (KAUST) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Action Recognition | 2 | Yingxin Xia (DeepGlint); Ninghua Yang (DeepGlint)*; Kaicheng Yang (DeepGlint); Xiang An (DeepGlint); Xiangzi Dai (DeepGlint); Weimo Deng (DeepGlint); Ziyong Feng (DeepGlint) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Action Recognition | 3 | Jilan Xu (Fudan University)*; Baoqi Pei (Zhejiang University); Yifei Huang (The University of Tokyo); Guo Chen (Nanjing University); Yicheng Liu (Nanjing University); Yuping He (Nanjing University); Kanghua Pan (Nanjing University); Yali Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Tong Lu (Nanjing University); Limin Wang (Nanjing University); Yu Qiao (Shanghai Artificial Intelligence Laboratory) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Action Detection | 1 | Shuming Liu (KAUST)*; Lin Sui (Nanjing University); Chen-Lin Zhang (Moonshot AI); Fangzhou Mu (NVIDIA); Chen Zhao (KAUST); Bernard Ghanem (KAUST) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Action Detection | 2 | Yingxin Xia (DeepGlint); Ninghua Yang (DeepGlint)*; Kaicheng Yang (DeepGlint); Xiang An (DeepGlint); Xiangzi Dai (DeepGlint); Weimo Deng (DeepGlint); Ziyong Feng (DeepGlint) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Action Detection | 3 | Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Unsupervised Domain Adaptation for Action Recognition | 1 | Jilan Xu (Fudan University)*; Baoqi Pei (Zhejiang University); Yifei Huang (The University of Tokyo); Guo Chen (Nanjing University); Yicheng Liu (Nanjing University); Yuping He (Nanjing University); Kanghua Pan (Nanjing University); Yali Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Tong Lu (Nanjing University); Limin Wang (Nanjing University); Yu Qiao (Shanghai Artificial Intelligence Laboratory) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Multi-Instance Retrieval | 1 | Xiaoqi Wang (The Hong Kong Polytechnic University); Yi Wang (The Hong Kong Polytechnic University); Lap-Pui Chau (The Hong Kong Polytechnic University)* | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Multi-Instance Retrieval | 2 | Jilan Xu (Fudan University)*; Baoqi Pei (Zhejiang University); Yifei Huang (The University of Tokyo); Guo Chen (Nanjing University); Yicheng Liu (Nanjing University); Yuping He (Nanjing University); Kanghua Pan (Nanjing University); Yali Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences); Tong Lu (Nanjing University); Limin Wang (Nanjing University); Yu Qiao (Shanghai Artificial Intelligence Laboratory) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Multi-Instance Retrieval | 3 | Jiamin Cao (Xidian University)*; Lingqi Wang (Xidian University); Jiayao Hao (Xidian University ); Shuyuan Yang (Xidian University); Licheng Jiao (Xidian University) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Video Object Segmentation | 1 | Qinliang Wang (Xidian University)*; Xuejian Gou (Xidian University); Zhongjian Huang (Xidian University); Lingling Li (Xidian University); Fang Liu (Xidian University) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Video Object Segmentation | 2 | Sen Jia (Xidian University)*; Xinyue Yu (Xidian University); Long Sun (Xidian University); Licheng Jiao (Xidian University); Shuyuan Yang (Xidian University) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Video Object Segmentation | 3 | Libo Yan (Xidian University)*; Shizhan Zhao (Xidian University); Zhang Yanzhao (Xidian University); Xu Liu (Xidian University); Puhua Chen (Xidian University) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Audio-Based Interaction Recognition | 1 | Lingqi Wang (Xidian University)*; Jiamin Cao (Xidian University); Xuejian Gou (Xidian University); Lingling Li (Xidian University); Fang Liu (Xidian University) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Audio-Based Interaction Recognition | 2 | Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Audio-Based Interaction Recognition | 3 | Shizhan Zhao (Xidian University)*; Libo Yan (Xidian University); Zhang Yanzhao (Xidian University); Licheng Jiao (Xidian University); Xu Liu (Xidian University); Yuwei Guo (Xidian University) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Audio-Based Interaction Detection | 1 | Shuming Liu (KAUST)*; Lin Sui (Nanjing University); Chen-Lin Zhang (Moonshot AI); Fangzhou Mu (NVIDIA); Chen Zhao (KAUST); Bernard Ghanem (KAUST) | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Audio-Based Interaction Detection | 2 | Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen | Coming Soon... | Coming Soon... |
EPIC-KITCHENS | Audio-Based Interaction Detection | 3 | Xuejian Gou (Xidian University)*; Qinliang Wang (Xidian University); Jiamin Cao (Xidian University); Lingling Li (Xidian University); Fang Liu (Xidian University) | Coming Soon... | Coming Soon... |
HoloAssist | Mistake Detection | 1 | Michele Mazzamuto (University of Catania); Antonino Furnari (University of Catania); Giovanni Maria Farinella (University of Catania) | Link | Link |
HoloAssist | Fine-Grained Action Recognition | 1 | Artem Merinov (Free University of Bozen-Bolzano); Oswald Lanz (Free University of Bozen-Bolzano) | Link | - |
Ego-Exo4D | Ego-Pose Hands | 1 | Feng Chen, Lenovo Research; Ling Ding, Lenovo Research; Kanokphan Lertniphonphan, Lenovo Research; Jian Li, Lenovo Research; Kaer Huang, Lenovo Research; Zhepeng Wang, Lenovo Research | Coming Soon... | Coming Soon... |
Ego-Exo4D | Ego-Pose Hands | 2 | Georgios Pavlakos, UT Austin; Dandan Shan, University of Michigan; Ilija Radosavovic, UC Berkeley; Angjoo Kanazawa, UC Berkeley; David Fouhey, New York University; Jitendra Malik, UC Berkeley | Coming Soon... | Coming Soon... |
Ego-Exo4D | Ego-Pose Hands | 3 | Baoqi Pei, Zhejiang University, Shanghai AI Laboratory; Yifei Huang, University of Tokyo, Shanghai AI Laboratory; Guo Chen, Nanjing University, Shanghai AI Laboratory; Jilan Xu, Fudan University, Shanghai AI Laboratory; Yicheng Liu, Nanjing University; Yuping He, Nanjing University; Kanghua Pan, Nanjing University; Tong Lu, Nanjing University; Yali Wang, Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory; Limin Wang, Nanjing University, Shanghai AI Laboratory; Yu Qiao, Shanghai AI Laboratory | Coming Soon... | Coming Soon... |
Ego-Exo4D | Ego-Pose Body | 1 | Jilan Xu, Fudan University, Shanghai AI Laboratory; Yifei Huang, University of Tokyo, Shanghai AI Laboratory; Guo Chen, Nanjing University, Shanghai AI Laboratory; Baoqi Pei, Zhejiang University, Shanghai AI Laboratory; Yicheng Liu, Nanjing University; Yuping He, Nanjing University; Kanghua Pan, Nanjing University; Tong Lu, Nanjing University; Yali Wang, Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory; Limin Wang, Nanjing University, Shanghai AI Laboratory; Yu Qiao, Shanghai AI Laboratory | Coming Soon... | Coming Soon... |
Ego-Exo4D | Ego-Pose Body | 2 | Brent Yi, UC Berkeley; Vickie Ye, UC Berkeley; Georgios Pavlakos, UT Austin; Lea Müller, UC Berkeley; Maya Zheng, UC Berkeley; Yi Ma, UC Berkeley; Jitendra Malik, UC Berkeley; Angjoo Kanazawa, UC Berkeley | Coming Soon... | Coming Soon... |
Ego-Exo4D | Ego-Pose Body | 3 | Congsheng Xu, Shanghai Jiaotong University | Coming Soon... | Coming Soon... |
Ego4D | Goal Step | 1 | Carlos Plou, Universidad de Zaragoza; Lorenzo Mur-Labadia, University of Zaragoza; Ruben Martinez-Cantin, University of Zaragoza; Ana Murillo, Universidad de Zaragoza | Coming Soon... | Coming Soon... |
Ego4D | Goal Step | 2 | Yuping He, Nanjing University; Guo Chen, Nanjing University, Shanghai AI Laboratory; Baoqi Pei, Zhejiang University, Shanghai AI Laboratory; Yicheng Liu, Nanjing University; Kanghua Pan, Nanjing University; Jilan Xu, Fudan University, Shanghai AI Laboratory; Yifei Huang, University of Tokyo, Shanghai AI Laboratory; Yali Wang, Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory; Tong Lu, Nanjing University; Limin Wang, Nanjing University, Shanghai AI Laboratory; Yu Qiao, Shanghai AI Laboratory | Coming Soon... | Coming Soon... |
Ego4D | Goal Step | 3 | Haoyu Zhang, Harbin Institute of Technology; Yuquan Xie, Harbin Institute of Technology; Yisen Feng, Harbin Institute of Technology; Zaijing Li, Harbin Institute of Technology; Meng Liu, Shandong Jianzhu University; Liqiang Nie, Harbin Institute of Technology | Coming Soon... | Coming Soon... |
Ego4D | Moments Queries | 1 | Shuming Liu, King Abdullah University of Science and Technology; Chen-Lin Zhang, Moonshot AI; Fangzhou Mu, NVIDIA; Bernard Ghanem, King Abdullah University of Science and Technology | Coming Soon... | Coming Soon... |
Ego4D | Moments Queries | 2 | Kanghua Pan, Nanjing University; Yuping He, Nanjing University; Guo Chen, Nanjing University, Shanghai AI Laboratory; Baoqi Pei, Zhejiang University, Shanghai AI Laboratory; Yicheng Liu, Nanjing University; Jilan Xu, Fudan University, Shanghai AI Laboratory; Yifei Huang, University of Tokyo, Shanghai AI Laboratory; Yali Wang, Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory; Tong Lu, Nanjing University; Limin Wang, Nanjing University, Shanghai AI Laboratory; Yu Qiao, Shanghai AI Laboratory | Coming Soon... | Coming Soon... |
Ego4D | Natural Language Queries | 1 | Yuping He, Nanjing University; Guo Chen, Nanjing University, Shanghai AI Laboratory; Baoqi Pei, Zhejiang University, Shanghai AI Laboratory; Yicheng Liu, Nanjing University; Kanghua Pan, Nanjing University; Jilan Xu, Fudan University, Shanghai AI Laboratory; Yifei Huang, University of Tokyo, Shanghai AI Laboratory; Yali Wang, Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory; Tong Lu, Nanjing University; Limin Wang, Nanjing University, Shanghai AI Laboratory; Yu Qiao, Shanghai AI Laboratory | Coming Soon... | Coming Soon... |
Ego4D | Natural Language Queries | 2 | Haoyu Zhang, Harbin Institute of Technology; Yuquan Xie, Harbin Institute of Technology; Yisen Feng, Harbin Institute of Technology; Zaijing Li, Harbin Institute of Technology; Meng Liu, Shandong Jianzhu University; Liqiang Nie, Harbin Institute of Technology | Coming Soon... | Coming Soon... |
Ego4D | Short-term Object Interaction Anticipation | 1 | Guo Chen, Nanjing University, Shanghai AI Laboratory; Yuping He, Nanjing University; Baoqi Pei, Zhejiang University, Shanghai AI Laboratory; Yicheng Liu, Nanjing University; Kanghua Pan, Nanjing University; Jilan Xu, Fudan University, Shanghai AI Laboratory; Yifei Huang, University of Tokyo, Shanghai AI Laboratory; Yali Wang, Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory; Tong Lu, Nanjing University; Limin Wang, Nanjing University, Shanghai AI Laboratory; Yu Qiao, Shanghai AI Laboratory | Coming Soon... | Coming Soon... |
Ego4D | Short-term Object Interaction Anticipation | 2 | Lorenzo Mur-Labadia, University of Zaragoza; Jose Guerrero, Universidad de Zaragoza; Ruben Martinez-Cantin, University of Zaragoza; Giovanni Maria Farinella, University of Catania | Coming Soon... | Coming Soon... |
Ego4D | Short-term Object Interaction Anticipation | 3 | Hyunjin Cho, Department of ECE, Seoul National University; Dong Un Kang, Department of ECE, Seoul National University; Se Young Chun, Department of ECE, Seoul National University | Coming Soon... | Coming Soon... |
Ego4D | Long-term Action Anticipation | 1 | Yicheng Liu, Nanjing University; Guo Chen, Nanjing University, Shanghai AI Laboratory; Yuping He, Nanjing University; Baoqi Pei, Zhejiang University, Shanghai AI Laboratory; Kanghua Pan, Nanjing University; Jilan Xu, Fudan University, Shanghai AI Laboratory; Yifei Huang, University of Tokyo, Shanghai AI Laboratory; Yali Wang, Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory; Tong Lu, Nanjing University; Limin Wang, Nanjing University, Shanghai AI Laboratory; Yu Qiao, Shanghai AI Laboratory | Coming Soon... | Coming Soon... |
Ego4D | Long-term Action Anticipation | 2 | Zeyun Zhong, Karlsruhe Institute of Technology; Manuel Martin, Fraunhofer IOSB; Frederik Diederichs, Fraunhofer IOSB; Jürgen Beyerer, Fraunhofer IOSB | Coming Soon... | Coming Soon... |
Ego4D | Ego Schema | 1 | Haoyu Zhang, Harbin Institute of Technology; Yuquan Xie, Harbin Institute of Technology; Yisen Feng, Harbin Institute of Technology; Zaijing Li, Harbin Institute of Technology; Meng Liu, Shandong Jianzhu University; Liqiang Nie, Harbin Institute of Technology | Coming Soon... | Coming Soon... |
Ego4D | Ego Schema | 2 | Noriyuki Kugo, Panasonic Connect; Tatsuya Ishibashi, Panasonic Connect; Kosuke Ono, Panasonic Connect; Yuji Sato, Panasonic Connect | Coming Soon... | Coming Soon... |
Ego4D | Ego Schema | 3 | Ying Wang, NYU; Yanlai Yang, NYU; Mengye Ren, NYU | Coming Soon... | Coming Soon... |
Ego4D | Looking at Me | 1 | Kanokphan Lertniphonphan, Lenovo Research; Jun Xie, Lenovo Research; Yaqing Meng, Chinese Academy of Sciences; Shijing Wang, Beijing Jiaotong University; Feng Chen, Lenovo Research; Zhepeng Wang, Lenovo Research | Coming Soon... | Coming Soon... |
Ego4D | Looking at Me | 2 | Xin Li, University of Science and Technology Beijing; Xu Han, University of Science and Technology Beijing; Bochao Zou, University of Science and Technology Beijing; Huimin Ma, University of Science and Technology Beijing | Coming Soon... | Coming Soon... |
Ego4D | Visual Queries 2D | 1 | Baoqi Pei, Zhejiang University, Shanghai AI Laboratory; Yifei Huang, University of Tokyo, Shanghai AI Laboratory; Guo Chen, Nanjing University, Shanghai AI Laboratory; Jilan Xu, Fudan University, Shanghai AI Laboratory; Yicheng Liu, Nanjing University; Yuping He, Nanjing University; Kanghua Pan, Nanjing University; Tong Lu, Nanjing University; Yali Wang, Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory; Limin Wang, Nanjing University, Shanghai AI Laboratory; Yu Qiao, Shanghai AI Laboratory | Coming Soon... | Coming Soon... |
Ego4D | Visual Queries 3D | 1 | Jinjie Mai, KAUST; Abdullah Hamdi, KAUST; Chen Zhao, KAUST; Silvio Giancola, KAUST; Bernard Ghanem, KAUST | Coming Soon... | Coming Soon... |
Ego4D | Visual Queries 3D | 2 | Jilan Xu, Fudan University, Shanghai AI Laboratory; Yifei Huang, University of Tokyo, Shanghai AI Laboratory; Guo Chen, Nanjing University, Shanghai AI Laboratory; Baoqi Pei, Zhejiang University, Shanghai AI Laboratory; Yicheng Liu, Nanjing University; Yuping He, Nanjing University; Kanghua Pan, Nanjing University; Tong Lu, Nanjing University; Yali Wang, Shenzhen Institute of Advanced Technology, Shanghai AI Laboratory; Limin Wang, Nanjing University; Yu Qiao, Shanghai AI Laboratory | Coming Soon... | Coming Soon... |
You are invited to submit extended abstracts to the first edition of the joint egocentric vision workshop, which will be held alongside CVPR 2024 in Seattle.
These abstracts represent existing or ongoing work and will not be published as part of any proceedings. We welcome all work within the egocentric domain; it is not necessary to use the Ego4D dataset in your work. We expect a submission may cover one or more of the following topics (a non-exhaustive list):
Extended abstracts are 2-4 pages in length, including figures, tables, and references. We invite submissions of ongoing or already published work, as well as reports on demonstrations and prototypes. The 1st joint egocentric vision workshop gives authors the opportunity to present their work to the egocentric community to provoke discussion and feedback. Accepted work will be presented either as an oral presentation (virtual or in-person) or as a poster presentation. The review will be single-blind, so there is no need to anonymize your work; otherwise, submissions should follow the format of CVPR submissions (information can be found here). Accepted abstracts will not be published as part of any proceedings, so they can be uploaded to arXiv etc., and links will be provided on the workshop's webpage. Submissions will be managed through the CMT website.
Event | Date |
---|---|
Challenges Leaderboards Open | Mar 2024 |
Challenges Leaderboards Close | 30 May 2024 |
Challenges Technical Reports Deadline (on CMT) | 5 June 2024 (23:59 PT) |
Extended Abstract Deadline | 10 May 2024 (23:59 PT) |
Extended Abstract Notification to Authors | 29 May 2024 |
Extended Abstracts ArXiv Deadline | 12 June 2024 |
Workshop Date | 17 June 2024 |
All dates are local to Seattle (Pacific Time).
Workshop Location: Room Summit 428
Time | Event |
---|---|
09:00-09:15 | Welcome and Introductions |
09:15-09:45 | Invited Keynote 1: Jim Rehg, University of Illinois Urbana-Champaign, US |
09:45-10:20 | HoloAssist Challenges |
10:20-11:20 | Coffee Break and Poster Session |
11:20-11:50 | Invited Keynote 2: Diane Larlus, Naver Labs Europe and MIAI Grenoble, FR |
11:50-12:40 | EPIC-KITCHENS Challenges |
12:40-13:40 | Lunch Break |
13:40-14:10 | EgoVis 2022/2023 Distinguished Paper Awards |
14:10-14:40 | Invited Keynote 3: Michael C. Frank & Bria Long, Stanford University, US |
14:40-15:30 | Project Aria Datasets & Challenges |
15:30-16:00 | Coffee Break |
16:00-16:30 | Invited Keynote 4: Fernando de La Torre, Carnegie Mellon University, US |
16:30-17:40 | Ego4D & Ego-Exo4D Challenges |
17:40-18:00 | Conclusion |
Note to authors: please hang your poster following the indicated poster numbers. Posters can be put up ONLY during the poster session (10:20-11:20).
All workshop posters are in ARCH building 4E
EgoVis Poster Number | Title | Authors | arXiv Link |
---|---|---|---|
192 | On the Application of Egocentric Computer Vision to Industrial Scenarios | Vivek Prabhakar Chavan (Fraunhofer Institute); Oliver Heimann (Fraunhofer IPK); Jörg Krüger (TU-Berlin) | link |
193 | Instance Tracking in 3D Scenes from Egocentric Videos | Yunhan Zhao (University of California, Irvine); Haoyu Ma (University of California, Irvine); Shu Kong (Texas A&M University); Charless Fowlkes (UC Irvine) | link |
194 | The Audio-Visual Conversational Graph: From an Egocentric-Exocentric Perspective | Wenqi Jia (Georgia Institute of Technology) | link |
195 | ALGO: Object-Grounded Visual Commonsense Reasoning for Open-World Egocentric Action Recognition | Sanjoy Kundu (Auburn University); Sathyanarayanan N Aakur (Auburn University); Shubham Trehan (Auburn University) | link |
196 | Object Aware Egocentric Online Action Detection | Joungbin An (Yonsei University); Yunsu Park (Yonsei University); Hyolim Kang (Yonsei University); Seon Joo Kim (Yonsei University) | link |
197 | From Observation to Abstractions: Efficient In-Context Learning from Human Feedback and Visual Demonstrations for VLM Agents | Gabriel Sarch (Carnegie Mellon University); Lawrence Jang (Carnegie Mellon University); Michael J Tarr (Carnegie Mellon University); William W Cohen (Google AI); Kenneth Marino (Google DeepMind); Katerina Fragkiadaki (Carnegie Mellon University) | link |
198 | Learning Mobile Manipulation Skills via Autonomous Exploration | Russell Mendonca (Carnegie Mellon University); Deepak Pathak (Carnegie Mellon University) | Coming soon... |
199 | RMem: Restricted Memory Banks Improve Video Object Segmentation | Junbao Zhou (UIUC); Ziqi Pang (UIUC); Yu-Xiong Wang (University of Illinois at Urbana-Champaign) | link |
200 | ENIGMA-51: Towards a Fine-Grained Understanding of Human Behavior in Industrial Scenarios | Francesco Ragusa (University of Catania); Rosario Leonardi (University of Catania); Michele Mazzamuto (University of Catania); Claudia Bonanno (Università degli Studi di Catania); Rosario Scavo (University of Catania); Antonino Furnari (University of Catania); Giovanni Maria Farinella (University of Catania) | link |
201 | Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection? | Rosario Leonardi (University of Catania); Antonino Furnari (University of Catania); Francesco Ragusa (University of Catania); Giovanni Maria Farinella (University of Catania) | link |
202 | Contrastive Language Video Time Pre-training | Hengyue Liu (UC Riverside); Kyle Min (Intel Labs); Hector A Valdez (Intel Corporation); Subarna Tripathi (Intel Labs) | link |
203 | Identification of Conversation Partners from Egocentric Video | Tobias Dorszewski (Technical University of Denmark); Søren Fuglsang (University Hospital of Copenhagen); Jens Hjortkjær (DTU) | link |
204 | Put Myself in Your Shoes: Lifting the Egocentric Perspective from Exocentric Videos | Mi Luo (University of Texas at Austin); Zihui Xue (The University of Texas at Austin); Alex Dimakis (UT Austin); Kristen Grauman (Facebook AI Research & UT Austin) | Coming soon... |
205 | HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model | Khoa HV Vo (University of Arkansas); Thinh Phan (University of Arkansas); Kashu Yamazaki (University of Arkansas); Minh Q Tran (University of Arkansas); Ngan Le (University of Arkansas) | link |
206 | Video Question Answering for People with Visual Impairments Using an Egocentric 360-Degree Camera | Inpyo Song (SungKyunKwan University); MinJun Joo (iislab); Joonhyung Kwon (Korea Aerospace University); Jangwon Lee (SungKyunKwan University) | link |
207 | X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization | Anna Kukleva (MPII); Fadime Sener (University of Bonn); Edoardo Remelli (Meta); Bugra Tekin (Meta); Eric Sauser (Meta); Bernt Schiele (MPI Informatics); Shugao Ma (Meta Reality Labs) | link |
208 | HandFormer: Utilizing 3D Hand Pose for Egocentric Action Recognition | Md Salman Shamil (National University of Singapore); Dibyadip Chatterjee (National University of Singapore); Fadime Sener (University of Bonn); Shugao Ma (Meta Reality Labs); Angela Yao (National University of Singapore) | link |
210 | Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos | Sagnik Majumder (University of Texas at Austin); Ziad Al-Halah (University of Utah); Kristen Grauman (University of Texas at Austin) | link |
EgoVis Poster Number | Title | Authors | arXiv Link | CVPR 2024 Presentation Details |
---|---|---|---|---|
178 | PREGO: online mistake detection in PRocedural EGOcentric videos | Alessandro Flaborea, Guido Maria D'Amely di Melendugno, Leonardo Plini, Luca Scofano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio Galasso | link | Thursday, 20 June, 17:15 to 18:45 |
179 | Summarize the Past to Predict the Future: Natural Language Descriptions of Context Boost Multimodal Object Interaction Anticipation | Razvan-George Pasca, Alexey Gavryushin, Muhammad Hamza, Yen-Ling Kuo, Kaichun Mo, Luc Van Gool, Otmar Hilliges, Xi Wang | link | Thursday, 20 June, 17:15 to 18:45 |
180 | EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams | Christen Millerdurai, Hiroyasu Akada, Jian Wang, Diogo Luvizon, Christian Theobalt, Vladislav Golyanik | link | Wednesday, 19 June, 10:30 to 12:00 |
181 | SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos | Changan Chen, Kumar Ashutosh, Rohit Girdhar, David Harwath, Kristen Grauman | link | Friday, 21 June, 17:00 to 18:30 |
182 | Action Scene Graphs for Long-Form Understanding of Egocentric Videos | Rodin Ivan, Antonino Furnari, Kyle Min, Subarna Tripathi, Giovanni Maria Farinella | link | Thursday, 20 June, 17:15 to 18:45 |
183 | EgoGen: An Egocentric Synthetic Data Generator (CVPR HIGHLIGHT) | Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys, Siyu Tang | link | Thursday, 20 June, 17:15 to 18:45 |
184 | Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting (CVPR HIGHLIGHT) | Taeho Kang, Youngki Lee | link | Wednesday, 19 June, 10:30 to 12:00 |
185 | Retrieval-Augmented Egocentric Video Captioning | Jilan Xu, Yifei Huang, Junlin Hou, Guo Chen, Yuejie Zhang, Rui Feng, Weidi Xie | link | Thursday, 20 June, 10:30 to 12:00 |
190 | 3D Human Pose Perception from Egocentric Stereo Videos (CVPR HIGHLIGHT) | Hiroyasu Akada, Jian Wang, Vladislav Golyanik, Christian Theobalt | link | Wednesday, 19 June, 10:30 to 12:00 |
187 | A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives | Simone Alberto Peirone, Francesca Pistilli, Antonio Alliegro, Giuseppe Averta | link | Thursday, 20 June, 17:15 to 18:45 |
188 | Egocentric Full Body Motion Capture with FisheyeViT and Diffusion-Based Motion Refinement | Jian Wang, Zhe Cao, Diogo Luvizon, Lingjie Liu, Kripasindhu Sarkar, Danhang Tang, Thabo Beeler, Christian Theobalt | link | Wednesday, 19 June, 10:30 to 12:00 |
189 | Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation | Ruicong Liu, Takehiko Ohkawa, Mingfang Zhang, Yoichi Sato | link | Wednesday, 19 June, 10:30 to 12:00 |
186 | EgoExoLearn: A Dataset for Bridging Asynchronous Ego- and Exo-centric View of Procedural Activities in Real World | Yifei Huang, Guo Chen, Jilan Xu, Mingfang Zhang, Lijin Yang, Baoqi Pei, Hongjie Zhang, Lu Dong, Yali Wang, Limin Wang, Yu Qiao | link | Friday, 21 June, 10:30 to 12:00 |
191 | EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models (CVPR HIGHLIGHT) | Sijie Cheng, Zhicheng Guo, Jingwen Wu, Kechen Fang, Peng Li, Huaping Liu, Yang Liu | link | Thursday, 20 June, 10:30 to 12:00 |
193 | Instance Tracking in 3D Scenes from Egocentric Videos | Yunhan Zhao (University of California, Irvine); Haoyu Ma (University of California, Irvine); Shu Kong (Texas A&M University); Charless Fowlkes (UC Irvine) | link | Friday, 21 June, 10:30 to 12:00 |
207 | X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization | Anna Kukleva (MPII); Fadime Sener (University of Bonn); Edoardo Remelli (Meta); Bugra Tekin (Meta); Eric Sauser (Meta); Bernt Schiele (MPI Informatics); Shugao Ma (Meta Reality Labs) | link | Friday, 21 June, 17:15 to 18:45 |
210 | Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos | Sagnik Majumder (University of Texas at Austin); Ziad Al-Halah (University of Utah); Kristen Grauman (University of Texas at Austin) | link | Friday, 21 June, 17:15 to 18:45 |
This workshop follows in the footsteps of the following previous events:
EPIC-Kitchens and Ego4D Past Workshops:
Human Body, Hands, and Activities from Egocentric and Multi-view Cameras Past Workshops:
Project Aria Past Tutorials: