High-Quality GPU-based Deliberative Perception for Object Pose Estimation with RGB data - Robotics Institute Carnegie Mellon University

Shanshan (Jessy) Xie
Master's Thesis, Tech. Report, CMU-RI-TR-21-11, Robotics Institute, Carnegie Mellon University, May 2021

Abstract

Estimating the pose of a known object is essential for a robot to interact with the real world; it is the first and most fundamental step before the robot can manipulate that object. The problem is particularly challenging when the environment is cluttered or the object itself is occluded. Changes in lighting and difficult object orientations pose further challenges for pose estimation algorithms. Most modern approaches require a large amount of training data with accurate ground-truth annotations in order to learn correspondences and output predictions. An alternative is a search-based algorithm that finds, among all candidate rendered poses, the pose that best explains the scene. Such an algorithm requires no prior knowledge or training beyond a model of the target object. PERCH (PErception Via SeaRCH) [21] is one example: it uses depth data and converges to a globally optimal solution by searching over a specified space.

In this work, we propose a color-only version of PERCH: a pose estimation algorithm that needs only an RGB image and the mesh model of the target object. It finds the best explanation for the observed scene by rendering images for all possible poses and evaluating them with a cost function designed to account for both an image similarity measure and the rarity of each feature in the scene. Experimental results on both a publicly available dataset and our synthetic dataset show that the algorithm achieves high accuracy, especially in heavily occluded scenes, without requiring any annotation or training.
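
The render-and-compare search described above can be illustrated with a toy sketch. This is not the thesis implementation: the 2x2 colour template standing in for the mesh model, the translation-only pose space, the inverse-frequency rarity weight, and all function names (`render`, `rarity_weights`, `cost`, `estimate_pose`) are assumptions made purely for illustration. The actual cost function and renderer are described in the thesis itself.

```python
from collections import Counter
from itertools import product

# Hypothetical toy "mesh model": a 2x2 patch of colour labels.
TEMPLATE = [["r", "g"],
            ["b", "r"]]
BACKGROUND = "."


def render(pose, size=5):
    """Draw TEMPLATE with its top-left corner at pose=(row, col) on a blank grid."""
    row0, col0 = pose
    image = [[BACKGROUND] * size for _ in range(size)]
    for dr, line in enumerate(TEMPLATE):
        for dc, colour in enumerate(line):
            image[row0 + dr][col0 + dc] = colour
    return image


def rarity_weights(observed):
    """Assumed rarity measure: inverse frequency of each colour in the observed image."""
    counts = Counter(c for line in observed for c in line)
    total = sum(counts.values())
    return {colour: total / n for colour, n in counts.items()}


def cost(rendered, observed, weights):
    """Sum the rarity weights of observed pixels that the rendering fails to match."""
    return sum(
        weights[obs]
        for r_line, o_line in zip(rendered, observed)
        for ren, obs in zip(r_line, o_line)
        if ren != obs
    )


def estimate_pose(observed, size=5):
    """Exhaustively score every candidate pose and return the cheapest one."""
    weights = rarity_weights(observed)
    rows = range(size - len(TEMPLATE) + 1)
    cols = range(size - len(TEMPLATE[0]) + 1)
    return min(
        product(rows, cols),
        key=lambda pose: cost(render(pose, size), observed, weights),
    )


# Observed scene: object at (2, 1) with one pixel hidden by an occluder "x".
observed = render((2, 1))
observed[2][1] = "x"
print(estimate_pose(observed))  # prints (2, 1)
```

In this sketch, rare colours dominate the cost, so the three visible object pixels outweigh the many common background pixels and the correct pose is still recovered despite the occluded pixel. No training data or annotation is involved; only the template and the observed image are used.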

BibTeX

@mastersthesis{Xie-2021-127382,
author = {Shanshan (Jessy) Xie},
title = {High-Quality GPU-based Deliberative Perception for Object Pose Estimation with RGB data},
year = {2021},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-21-11},
keywords = {pose estimation, deliberative perception, manipulation},
}