3:00 pm to 4:00 pm
Newell-Simon Hall 3305
Abstract: Today’s machine perception systems rely heavily on supervision provided by humans, such as labels and natural language. I will talk about our efforts to make systems that, instead, learn from two ubiquitous sources of unlabeled data: visual motion and cross-modal sensory associations. I will begin by discussing our work on creating unified models for visual tracking. A range of video modeling tasks, from optical flow to object tracking, share the same fundamental challenge: establishing space-time correspondence. Yet the approaches that dominate each of these tasks differ significantly. We propose a method, called the contrastive random walk, that learns dense, long-range trajectories through self-supervision, and which can be used to solve a variety of motion estimation problems. Second, I will present my group’s work on creating multimodal models that learn from audio and touch. I will show that the contrastive random walk can, perhaps surprisingly, be applied to binaural sound localization, by posing the problem as an “audio tracking” task. Finally, I will describe our work on learning from tactile sensing data that has been collected “in the wild” by humans, and our work on capturing camera properties by learning the cross-modal correspondence between images and camera metadata.
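For readers curious about the core idea behind the contrastive random walk, the sketch below illustrates it under stated assumptions: per-frame node features (e.g., patch embeddings from a learned encoder) act as graph nodes, pairwise affinities define random-walk transition probabilities between adjacent frames, and a forward-and-back walk through the video is trained to return each node to itself (cycle consistency). The function names, shapes, and hyperparameters are illustrative, not the speaker's reference implementation.

```python
# Minimal sketch of a contrastive random walk over per-frame node features.
# Assumes features are already extracted; names and shapes are illustrative.
import torch
import torch.nn.functional as F

def transition(a, b, tau=0.07):
    """Row-stochastic transition matrix from the nodes of one frame to the next,
    computed from cosine similarity with temperature tau."""
    sim = torch.einsum("nd,md->nm", F.normalize(a, dim=-1), F.normalize(b, dim=-1))
    return F.softmax(sim / tau, dim=-1)

def cycle_consistency_loss(frames):
    """frames: list of [N, D] node-feature tensors for consecutive frames.
    Walk forward through the sequence and back again, then maximize the
    probability that each node returns to itself."""
    path = frames + frames[-2::-1]            # palindrome: t0, t1, ..., tk, ..., t1, t0
    walk = None
    for a, b in zip(path[:-1], path[1:]):
        step = transition(a, b)
        walk = step if walk is None else walk @ step
    return_prob = walk.diagonal()             # probability of returning to the start node
    return -(return_prob + 1e-8).log().mean()

# Example: three frames of 64 nodes with 128-dim features from a (hypothetical) encoder.
frames = [torch.randn(64, 128, requires_grad=True) for _ in range(3)]
loss = cycle_consistency_loss(frames)
loss.backward()
```

In the full method, the node features come from an encoder trained end-to-end by minimizing this loss, so correspondence emerges without labels; the “audio tracking” result mentioned in the abstract applies the same walk to nodes derived from binaural audio rather than video frames.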
Bio: Andrew Owens is an assistant professor at the University of Michigan in the Department of Electrical Engineering and Computer Science. Prior to that, he was a postdoctoral scholar at UC Berkeley. He received a Ph.D. in Electrical Engineering and Computer Science from MIT in 2016. He is a recipient of a Computer Vision and Pattern Recognition (CVPR) Best Paper Honorable Mention Award and a Microsoft Research Ph.D. Fellowship.
Homepage: https://www.andrewowens.com/
Sponsored in part by: Meta Reality Labs Pittsburgh