Carnegie Mellon University
Abstract:
As robots become more common in our daily lives, they will need to interact with many different environments and countless types of objects. While we, as humans, can easily understand an object after seeing it only once, this task is not trivial for robots. Researchers have, for the most part, been left with two choices. The first is limiting their robots to interacting with a fixed set of known objects, fitting structured representations to fully capture the texture and geometry of those objects. In doing so, the robots are able to accurately detect, localize and perform complex interactions with those objects, but are unable to extend this understanding to new objects that they may encounter. The second option is to give up the notion of objects, learning dense representations that allow them to grasp arbitrary surfaces or densely track geometry or texture. But, in giving up the concept of objects, they also lose some ability to plan and predict complex interactions.
In this thesis we wish to explore different means of learning object and environmental representations. We characterize and model the inherent uncertainty of these representations, thus allowing robots to understand what they do and do not know about an object. We focus specifically on representations that can generalize to new objects not previously encountered by the robot. Using self-supervised learning, we allow our representations to adapt to new environments.
Building from our understanding of existing representations, we propose a self-supervised method for learning local structured object representations, in the form of oriented keypoints, as well as a means of understanding how their interactions evolve over time with respect to a downstream task. We will explore how these learned relationships can then be used by a residual policy to replicate the demonstrated task. We seek to further understand both the uncertainty in these representations and the effects this uncertainty has on task success. We hope that this will allow robots to more easily interact with novel objects and complete new objectives given only a limited number of demonstrations.
Thesis Committee Members:
David Held, Co-chair
Martial Hebert, Co-chair
Oliver Kroemer
Silvio Savarese, Stanford University