Carnegie Mellon’s Winning Strategy Speeds Up Robotic Searches
PITTSBURGH—A robot travelling from point A to point B is more efficient if it understands that point A is the living room couch and point B is a refrigerator, even if it’s in an unfamiliar place. That’s the common-sense idea behind a “semantic” navigation system developed by Carnegie Mellon University and Facebook AI Research (FAIR).
That navigation system, called SemExp, last month won the Habitat ObjectNav Challenge during the virtual Computer Vision and Pattern Recognition conference, edging a team from Samsung Research China. It was the second consecutive first-place finish for the CMU team in the annual challenge.
SemExp, or Goal-Oriented Semantic Exploration, uses machine learning to train a robot to recognize objects — knowing the difference between a kitchen table and an end table, for instance — and to understand where in a home such objects are likely to be found. This enables the system to think strategically about how to search for something, said Devendra S. Chaplot, a Ph.D. student in CMU’s Machine Learning Department.
“Common sense says that if you’re looking for a refrigerator, you’d better go to the kitchen,” Chaplot said. Classical robotic navigation systems, by contrast, explore a space by building a map showing obstacles. The robot eventually gets to where it needs to go, but the route can be circuitous.
Previous attempts to use machine learning to train semantic navigation systems have been hampered because such systems tend to memorize objects and their locations in specific environments. Not only are these environments complex, but a system trained this way often has difficulty generalizing what it has learned to different environments.
Chaplot — working with FAIR’s Dhiraj Gandhi, along with Abhinav Gupta, associate professor in the Robotics Institute, and Ruslan Salakhutdinov, professor in the Machine Learning Department — sidestepped that problem by making SemExp a modular system.
The system uses its semantic insights to determine the best places to look for a specific object, Chaplot said. “Once you decide where to go, you can just use classical planning to get you there.”
This modular approach turns out to be efficient in several ways. The learning process can concentrate on relationships between objects and room layouts, rather than also learning route planning. The semantic reasoning determines the most efficient search strategy. Finally, classical navigation planning gets the robot where it needs to go as quickly as possible.
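The division of labor described above can be illustrated with a minimal Python sketch. It is not the SemExp implementation: the hand-written `ROOM_PRIOR` table stands in for the relationships SemExp learns from data, breadth-first search stands in for whatever classical planner the team used, and every name, number and map below is hypothetical.

```python
from collections import deque

# Hypothetical prior: how likely each target object is to be in each room type.
# SemExp learns such relationships; these numbers are invented for illustration.
ROOM_PRIOR = {
    "refrigerator": {"kitchen": 0.90, "living_room": 0.05, "bedroom": 0.05},
    "couch": {"kitchen": 0.05, "living_room": 0.85, "bedroom": 0.10},
}

# Hypothetical occupancy grid: 1 marks an obstacle, 0 marks free space.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
]

# Hypothetical room locations, given as (row, column) cells on the grid.
ROOM_CENTERS = {"living_room": (0, 0), "kitchen": (3, 4), "bedroom": (2, 2)}


def choose_goal(target, room_centers):
    """Semantic step: pick the room most likely to contain the target."""
    prior = ROOM_PRIOR[target]
    best_room = max(room_centers, key=lambda room: prior.get(room, 0.0))
    return best_room, room_centers[best_room]


def plan_path(grid, start, goal):
    """Classical step: breadth-first search on a 4-connected grid.

    Returns the list of cells from start to goal, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] == 0 and nxt not in parents:
                parents[nxt] = cell
                queue.append(nxt)
    return None


def navigate(target, start):
    """Decide where to look (semantic), then plan how to get there (classical)."""
    room, goal = choose_goal(target, ROOM_CENTERS)
    return room, plan_path(GRID, start, goal)
```

Calling `navigate("refrigerator", (0, 0))` selects the kitchen as the place to search and returns a path to it around the obstacles. Because the two steps are separate functions, the semantic part could be retrained or replaced without touching the planner, which is the kind of separation the modular design relies on.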
Semantic navigation will ultimately make it easier for people to interact with robots: a person could simply tell a robot to fetch an item from a particular place, or give it directions such as “go to the second door on the left.”