Babies learn about their world by pushing and poking objects, putting them in their mouths and throwing them. Carnegie Mellon University scientists are taking a similar approach to teach robots how to recognize and grasp objects around them.
Manipulation remains a major challenge for robots and has become a bottleneck for many applications. But researchers at CMU’s Robotics Institute have shown that by allowing robots to spend hundreds of hours poking, grabbing and otherwise physically interacting with a variety of objects, those robots can teach themselves how to pick up objects.
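In broad terms, that kind of self-teaching loop works as follows: the robot tries many grasps, each attempt labels itself as a success or failure from the gripper's own feedback, and a model learns to predict which grasps will work from what the camera sees. Below is a minimal, hypothetical sketch of such a loop in Python; the `robot` and `camera` objects and their methods are illustrative stand-ins, not the researchers' actual code.

```python
# Illustrative sketch only: "robot" and "camera" and their methods are
# hypothetical stand-ins, not a real CMU or Google API.
import random
import numpy as np
import torch
import torch.nn as nn

class GraspSuccessNet(nn.Module):
    """Predicts the probability that a grasp at a given wrist angle will
    succeed, from a cropped image patch around the target object."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(32 + 1, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, patch, angle):
        # Concatenate image features with the proposed grasp angle; output a logit.
        return self.head(torch.cat([self.features(patch), angle], dim=1))

def collect_random_grasps(robot, camera, n_attempts):
    """Self-supervised data collection: try random grasps; the gripper's own
    feedback (did it close on something?) provides the success label."""
    data = []
    for _ in range(n_attempts):
        patch = camera.crop_around_random_object()      # e.g. a (3, 64, 64) array
        angle = random.uniform(-np.pi / 2, np.pi / 2)   # random wrist rotation
        success = robot.attempt_grasp(angle)            # True/False from gripper sensing
        data.append((patch, angle, float(success)))
    return data

def train(model, data, epochs=10, lr=1e-3):
    """Fit the success predictor to the self-labeled grasp attempts."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        random.shuffle(data)
        for patch, angle, label in data:
            x = torch.as_tensor(patch, dtype=torch.float32).unsqueeze(0)
            a = torch.tensor([[angle]], dtype=torch.float32)
            y = torch.tensor([[label]], dtype=torch.float32)
            loss = loss_fn(model(x, a), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

At grasp time, the same network can score a set of candidate angles over an image patch and the robot can execute the highest-scoring one.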
In their latest findings, presented last fall at the European Conference on Computer Vision, they showed that robots gained a deeper visual understanding of objects when they were able to manipulate them.
The researchers, led by Abhinav Gupta, assistant professor of robotics, are now scaling up this approach, with help from a three-year, $1.5 million “focused research award” from Google.
“We will use dozens of different robots, including one- and two-armed robots and even drones, to learn about the world and actions that can be performed in the world,” Gupta said. “The cost of robots has come down significantly in recent years, enabling us to unleash lots of robots to collect an unprecedented amount of data on physical interactions.”
Gupta said the shortcomings of previous approaches to robot manipulation were apparent during the Defense Advanced Research Projects Agency’s Robotics Challenge in 2015. Some of the world’s most advanced robots, designed to respond to natural or man-made emergencies, had difficulty with tasks such as opening doors or unplugging and re-plugging an electrical cable.
“Our robots still cannot understand what they see and their action and manipulation capabilities pale in comparison to those of a two-year-old,” Gupta said.
For decades, visual perception and robotic control have been studied separately. Visual perception developed with little consideration of physical interaction, and most manipulation and planning frameworks can’t cope with perception failures. Gupta predicts that letting a robot explore perception and action simultaneously, as a baby does, can help overcome these failures.
“Psychological studies have shown that if people can’t affect what they see, their visual understanding of that scene is limited,” said Lerrel Pinto, a Ph.D. student in robotics in Gupta’s research group. “Interaction with the real world exposes a lot of visual dynamics.”
Robots are slow learners, however, requiring hundreds of hours of interaction to learn how to pick up objects. And because robots have previously been expensive and often unreliable, researchers relying on this data-driven approach have long suffered from “data starvation.”
Scaling up the learning process will help address this data shortage. Pinto said much of the work by the CMU group has been done using a two-armed Baxter robot with a simple, two-fingered manipulator. Using more and different robots, including those with more sophisticated hands, will enrich manipulation databases.
Meanwhile, the success of this research approach has inspired other research groups, both in academia and those working with Google’s own array of robots, to adopt it and help expand these manipulation databases even further.
“If you can get the data faster, you can try a lot more things — different software frameworks, different algorithms,” Pinto said. And once one robot learns something, it can be shared with all robots.
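That last point about sharing follows naturally from the kind of setup sketched earlier: what a robot "learns" is a set of network weights, so data gathered by many robots can be pooled into one training run and the resulting weights copied to every arm in the fleet. The snippet below is a hypothetical illustration, reusing the `GraspSuccessNet` and `train()` names from the earlier sketch; the file name and the `set_grasp_policy` hook are made up.

```python
# Hypothetical illustration of fleet-wide sharing; reuses GraspSuccessNet and
# train() from the earlier sketch. The file name and robot hook are invented.
import torch

def train_fleet_model(per_robot_datasets, path="grasp_model.pt"):
    """Pool self-labeled grasp data from many robots and train one shared model."""
    pooled = [sample for dataset in per_robot_datasets for sample in dataset]
    model = train(GraspSuccessNet(), pooled)
    torch.save(model.state_dict(), path)      # one set of weights for the whole fleet
    return model

def deploy_to_robot(robot, path="grasp_model.pt"):
    """Copy the shared weights onto an individual robot."""
    model = GraspSuccessNet()
    model.load_state_dict(torch.load(path))
    model.eval()
    robot.set_grasp_policy(model)             # hypothetical robot-side hook
```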
In addition to Gupta and Pinto, the research team for the Google-funded project includes Martial Hebert, director of the Robotics Institute; Deva Ramanan, associate professor of robotics; and Ruslan Salakhutdinov, associate professor of machine learning and director of artificial intelligence research at Apple. The Office of Naval Research and the National Science Foundation also sponsor this research.