Carnegie Mellon University
3:30 pm to 4:30 pm
Abstract: Generalization, i.e., the ability to adapt to novel scenarios, is the hallmark of human intelligence. While we have systems that excel at cleaning floors, playing complex games, and occasionally beating humans, they are highly specialized: they perform only the tasks they are trained for and are miserable at generalization. One of the fundamental reasons is that, unlike humans, most of these artificial agents start tabula rasa, without any prior knowledge, and learn only towards a fixed goal. In this talk, I will present our initial efforts towards building a framework for learning general-purpose visual embodied intelligence. The framework brings together ideas from machine learning, computer vision, control theory, and developmental psychology to achieve end-to-end sensorimotor learning in embodied agents. I will present results from case studies of robots that achieve strong performance across several simulation benchmarks, manipulate deformable objects, navigate in office environments, display drastically diverse locomotion styles across unseen robot shapes, and perceive the real world in 3D from just a single 2D image.
Brief Bio: Deepak Pathak is a faculty member in the School of Computer Science at Carnegie Mellon University. He received his Ph.D. in Artificial Intelligence from UC Berkeley, and his research spans computer vision, machine learning, and robotics. He is a recipient of the Google Faculty Award, the Facebook Graduate Fellowship, the NVIDIA Fellowship, and the Snapchat Fellowship, and his research has been featured in popular press outlets, including The Wall Street Journal, The Economist, Quanta Magazine, Wired, and MIT Technology Review. Deepak received his Bachelor’s from IIT Kanpur with a Gold Medal in Computer Science. He founded VisageMap Inc., which was later acquired by FaceFirst Inc. For details: https://www.cs.cmu.edu/~dpathak/
Host: Chris Atkeson
Point of Contact: Stephanie Matvey (smatvey@andrew.cmu.edu)