Carnegie Mellon University
5:00 pm to 6:00 pm
NSH 3305
Abstract:
Humans have the remarkable ability to create visual worlds far beyond what can be seen by the human eye: inferring the state of the unobserved, imagining the unknown, and envisioning diverse possibilities for what lies in the future. Machines lack this inquisitive ability despite the current revolution in machine learning and computer vision. We believe that acquiring such an ability requires scaling beyond the current paradigm of supervised, curated datasets, and instead requires interaction and intervention with dynamic environments.
In this work, we present algorithms that automatically create visual content (images, videos, and space-time visualizations of dynamic events in 3D) in an unsupervised manner while remaining user-controllable and interactive. These capabilities enable diverse applications such as image and video synthesis, 4D spatiotemporal reconstruction, and visual data manipulation. We also present approaches to factorize scene information and to infer 3D structure from a single 2D image. With these tools, we propose to vary weather conditions such as rain and clouds and to synthesize their influence on 4D visual content.
While these applications are compelling in their own right from a computer graphics and content-generation perspective, we also plan to explore their use in creating data for training and evaluating visual recognition algorithms, e.g., synthesizing videos of urban scenes under inclement weather to better train autonomous perception systems.
Thesis Committee Members:
Deva Ramanan, Co-chair
Yaser Sheikh, Co-chair
Martial Hebert
Alexei A. Efros, University of California, Berkeley
David A. Forsyth, University of Illinois at Urbana-Champaign