MSR Thesis Talk: Himangi Mittal - Robotics Institute Carnegie Mellon University

MSR Thesis Defense

Himangi Mittal PhD Student Robotics Institute,
Carnegie Mellon University
Wednesday, April 19
11:00 am to 12:00 pm
GHC 6115
Title: Audio-Visual State-Aware Representation Learning from Interaction-Rich Data
Abstract
In robotics and augmented reality, the input to the agent is a long stream of video from the first-person or egocentric point of view. Recently, there have been significant efforts to capture humans from their first-person/egocentric view interacting with their own environment as they go about their daily activities. As a result, several large-scale egocentric, interaction-rich, multi-modal datasets have emerged. However, learning representations from such videos can be quite challenging.

First, given the uncurated nature of long, untrimmed, continuous videos, learning effective representations requires focusing on moments in time when interactions take place. Second, visual representations of daily activities should be sensitive to changes in the state of the object and the environment. However, current successful multi-modal learning frameworks encourage representations that are invariant to time and object states. We propose a self-supervised algorithm to learn representations from egocentric video data using multiple modalities of video and audio. To address the above challenges, we leverage audio signals to identify moments of likely interactions which are conducive to better learning. Motivated by the observation of a sharp audio signal associated with an interaction, we propose a novel self-supervised objective that learns from audible state changes caused by interactions. We validate these contributions extensively on two large-scale egocentric datasets, EPIC-Kitchens-100 and Ego4D, and show improvements on several downstream tasks, including action recognition, long-term action anticipation, object state change classification, and point-of-no-return temporal localization.

Committee:
Prof. Abhinav Gupta (advisor)
Prof. David Held
Prof. Shubham Tulsiani
Yufei Ye