Efficient Multi-View Object Recognition and Full Pose Estimation
Abstract
We present an approach for efficiently recognizing all objects in a scene and estimating their full pose from multiple views. Our approach builds upon a state-of-the-art single-view algorithm that recognizes and registers learned metric 3D models using local descriptors. We extend it to multiple views using a novel multi-step optimization that processes each view individually and feeds consistent hypotheses back to the algorithm for global refinement. We demonstrate that our method produces results comparable to the theoretical optimum, a full multi-view generalized camera approach, while avoiding its combinatorial time complexity. We provide experimental results demonstrating pose accuracy, speed, and robustness to model error using a three-camera rig, as well as a physical implementation in which an autonomous robot uses the pose output to execute grasps in highly cluttered scenes.
Please see the accompanying video at http://www.youtube.com/watch?v=ZNHRH00UMvk
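The abstract describes processing each view individually and keeping only the hypotheses that are consistent across views. The Python sketch below illustrates one simple way such a consistency check could look; it is not the authors' implementation. It assumes each view has already produced pose hypotheses expressed in a common world frame, and the names (`Hypothesis`, `consistent`, `group_consistent`), the greedy grouping strategy, and the distance and angle tolerances are all hypothetical choices made for illustration. The global refinement step mentioned in the abstract is not shown.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """One single-view detection, already expressed in a shared world frame."""
    view: int     # index of the camera that produced the hypothesis
    label: str    # name of the recognized object model
    R: tuple      # 3x3 rotation matrix as nested tuples (row-major)
    t: tuple      # translation (x, y, z), assumed to be in metres


def rot_z(theta):
    """Rotation about the z axis by theta radians (helper for the example)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def rotation_angle(Ra, Rb):
    """Angle in radians of the relative rotation Ra^T Rb."""
    # trace(Ra^T Rb) equals the element-wise inner product of Ra and Rb.
    trace = sum(Ra[i][j] * Rb[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)


def consistent(a, b, max_dist=0.05, max_angle=math.radians(15)):
    """True if two hypotheses plausibly describe the same object instance.

    The tolerances are illustrative assumptions, not values from the paper.
    """
    return (a.label == b.label
            and math.dist(a.t, b.t) <= max_dist
            and rotation_angle(a.R, b.R) <= max_angle)


def group_consistent(hyps, min_views=2, **tolerances):
    """Greedily group hypotheses and keep groups seen from enough views."""
    groups = []
    for h in hyps:
        for g in groups:
            # Compare against the first member of each existing group.
            if consistent(g[0], h, **tolerances):
                g.append(h)
                break
        else:
            groups.append([h])
    # A group survives only if it is supported by at least min_views cameras.
    return [g for g in groups if len({h.view for h in g}) >= min_views]


hyps = [
    Hypothesis(0, "mug", rot_z(0.00), (0.50, 0.00, 0.10)),
    Hypothesis(1, "mug", rot_z(0.05), (0.51, 0.00, 0.10)),
    Hypothesis(2, "mug", rot_z(1.50), (0.90, 0.30, 0.10)),  # isolated outlier
    Hypothesis(1, "box", rot_z(0.00), (0.20, 0.40, 0.10)),  # seen in one view
]
supported = group_consistent(hyps)
```

In this example only the two agreeing "mug" hypotheses from views 0 and 1 survive; in a pipeline like the one the abstract outlines, such a group would then be passed on for joint refinement.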
BibTeX
@conference{Romea-2010-10439,
  author = {Alvaro Collet Romea and Siddhartha Srinivasa},
  title = {Efficient Multi-View Object Recognition and Full Pose Estimation},
  booktitle = {Proceedings of (ICRA) International Conference on Robotics and Automation},
  year = {2010},
  month = {May},
  pages = {2050--2055},
}