SilhoNet: An RGB Method for 6D Object Pose Estimation

Gideon Billings and M. Johnson-Roberson
Journal Article, IEEE Robotics and Automation Letters, Vol. 4, No. 4, pp. 3727–3734, October 2019

Abstract

Autonomous robot manipulation involves estimating the translation and orientation of the object to be manipulated as a 6-degree-of-freedom (6D) pose. Methods using RGB-D data have shown great success in solving this problem. However, cost constraints or the working environment may limit the use of RGB-D sensors. When limited to monocular camera data, the problem of object pose estimation is very challenging. In this letter, we introduce a novel method called SilhoNet that predicts 6D object pose from monocular images. We use a convolutional neural network pipeline that takes region-of-interest (ROI) proposals as input and simultaneously predicts an intermediate silhouette representation for each object, an associated occlusion mask, and a 3D translation vector. The 3D orientation is then regressed from the predicted silhouettes. We show that our method achieves better overall performance on the YCB-Video dataset than two networks for 6D pose estimation from monocular image input.
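The two-stage data flow described in the abstract can be sketched as follows. This is a minimal illustrative sketch only, not the authors' implementation: the function names, the silhouette resolution, and the stubbed predictions (simple thresholding in place of the actual CNN heads) are all assumptions made for illustration.

```python
import numpy as np

SIL_SIZE = 64  # assumed silhouette resolution (illustrative, not from the paper)

def stage1_predict(roi_features: np.ndarray):
    """Stage 1 (sketch): from ROI features, jointly predict a full
    silhouette, an occlusion mask, and a 3D translation vector.
    Stubbed with thresholding so the sketch runs end to end."""
    sil = (roi_features > roi_features.mean()).astype(np.float32)
    occ = (roi_features > np.percentile(roi_features, 90)).astype(np.float32)
    translation = np.array([roi_features.mean(), 0.0, 1.0], dtype=np.float32)
    return sil, occ, translation

def stage2_orientation(silhouette: np.ndarray) -> np.ndarray:
    """Stage 2 (sketch): regress a unit quaternion from the predicted
    silhouette (here a trivial stand-in for the orientation network)."""
    q = np.array([silhouette.sum(), 1.0, 0.0, 0.0], dtype=np.float64)
    return q / np.linalg.norm(q)

# Dummy ROI feature map standing in for the detector's cropped features.
roi = np.random.default_rng(0).random((SIL_SIZE, SIL_SIZE))
sil, occ, t = stage1_predict(roi)
quat = stage2_orientation(sil)
print(sil.shape, occ.shape, t.shape, quat.shape)
```

The point of the sketch is the decoupling: translation and the occlusion-aware silhouette come out of the first stage together, while orientation is regressed only from the intermediate silhouette rather than directly from RGB pixels.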

BibTeX

@article{Billings-2019-130139,
author = {Gideon Billings and M. Johnson-Roberson},
title = {SilhoNet: An RGB Method for 6D Object Pose Estimation},
journal = {IEEE Robotics and Automation Letters},
year = {2019},
month = {October},
volume = {4},
number = {4},
pages = {3727--3734},
}