6D Object Pose Estimation for Manipulation via Weak Supervision
Abstract
6D object pose estimation is essential for robotic manipulation. Existing learning-based pose estimators are typically trained on labeled absolute poses defined with respect to fixed object canonical frames. This approach 1) requires datasets annotated with absolute object poses, which are resource-intensive to collect, and 2) generalizes poorly to novel configurations and unseen objects. Instead, we propose to use relative poses between: i) a single object in different orientations; and ii) pairs of interacting objects in manipulation tasks. In this thesis, we show that using relative poses as weak supervision improves label efficiency and generalization to novel object configurations and unseen objects.
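Why relative poses can supervise a pose estimator without a fixed canonical frame can be made concrete with a minimal NumPy sketch. It assumes rotations are 3x3 matrices and that the relative rotation between two views of the same object is known. The names `relative_rotation_loss`, `geodesic_angle`, `rot_x` and `rot_z` are hypothetical helpers for illustration, not the thesis' training code. Suppose every predicted rotation is offset by the same unknown change of canonical frame G, so the network outputs R_i G. The error against absolute labels is then large, but the relative rotation R_2 R_1^T is unchanged, because G G^T = I cancels.

```python
import numpy as np


def rot_x(theta: float) -> np.ndarray:
    """Rotation about the x-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


def rot_z(theta: float) -> np.ndarray:
    """Rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def geodesic_angle(R_a: np.ndarray, R_b: np.ndarray) -> float:
    """Geodesic distance on SO(3): the rotation angle of R_a^T R_b, in radians."""
    cos_angle = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    # Clip guards against values slightly outside [-1, 1] from rounding.
    return float(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


def relative_rotation_loss(R_pred_1, R_pred_2, R_rel_true) -> float:
    """Hypothetical loss that penalizes only the relative rotation R_2 R_1^T."""
    R_rel_pred = R_pred_2 @ R_pred_1.T
    return geodesic_angle(R_rel_pred, R_rel_true)


# Ground truth in some canonical frame: view 2 is view 1 rotated by R_rel.
R1 = rot_z(0.3)
R_rel = rot_x(0.5)
R2 = R_rel @ R1

# A network that settled on a different canonical frame G predicts R_i @ G.
G = rot_x(1.2) @ rot_z(-0.7)
abs_error = geodesic_angle(R1 @ G, R1)  # large under absolute-pose labels
rel_error = relative_rotation_loss(R1 @ G, R2 @ G, R_rel)  # ~0: G cancels

print(f"absolute error: {abs_error:.3f} rad, relative loss: {rel_error:.1e}")
```

The relative loss is blind to the global frame choice, which is the property that removes the need for absolute-pose annotations. The trade-off is that the learned frame is only defined up to that global rotation.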
In the first part of this thesis, we investigate learning an image-based object pose estimator that is self-supervised by relative object poses. The local rotation averaging problems that arise during training can be difficult to optimize because SO(3) is a closed manifold. To tackle this, we propose a new algorithm that uses Modified Rodrigues Parameters (MRPs) to stereographically project 3D rotations from the closed manifold SO(3) onto the open manifold R^3. Optimization can then be carried out on an open manifold, which improves convergence speed. Empirically, we show that the proposed algorithm converges to a consistent relative orientation frame much faster than algorithms that operate purely in SO(3). This in turn enables training pose estimators self-supervised by relative poses.
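The MRP projection itself can be illustrated with a short NumPy sketch. It assumes unit quaternions in (w, x, y, z) order and uses the standard MRP formulas. The forward map p = (x, y, z) / (1 + w) is a stereographic projection of the unit-quaternion sphere from the pole (-1, 0, 0, 0) onto the hyperplane w = 0. Its inverse maps every point of R^3 back to a valid rotation, which is what permits unconstrained optimization. The "shadow" MRP is a standard property of the parameterization, included for completeness. The function names are hypothetical, and this sketch is not the thesis' rotation averaging algorithm.

```python
import numpy as np


def quat_to_mrp(q) -> np.ndarray:
    """Stereographic projection of a unit quaternion q = (w, x, y, z) to R^3.

    Projects from the pole (-1, 0, 0, 0) onto the hyperplane w = 0,
    giving p = (x, y, z) / (1 + w).
    """
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)
    if q[0] < 0:
        q = -q  # q and -q encode the same rotation; w >= 0 keeps |p| <= 1
    return q[1:] / (1.0 + q[0])


def mrp_to_quat(p) -> np.ndarray:
    """Inverse projection: every p in R^3 maps to a unit quaternion."""
    p = np.asarray(p, dtype=float)
    n2 = float(p @ p)
    w = (1.0 - n2) / (1.0 + n2)
    return np.concatenate(([w], 2.0 * p / (1.0 + n2)))


def mrp_shadow(p) -> np.ndarray:
    """Shadow MRP -p / |p|^2: same rotation. Intended for |p| > 1 (undefined at p = 0)."""
    p = np.asarray(p, dtype=float)
    return -p / float(p @ p)


def quat_to_matrix(q) -> np.ndarray:
    """Rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


# Rotation of 2 rad about z: q = (cos 1, 0, 0, sin 1).
q = np.array([np.cos(1.0), 0.0, 0.0, np.sin(1.0)])
p = quat_to_mrp(q)
print("MRP:", p, "round trip ok:", np.allclose(mrp_to_quat(p), q))

# An arbitrary, unconstrained vector in R^3 still decodes to a valid rotation.
p_big = np.array([3.0, -1.0, 0.5])
R = quat_to_matrix(mrp_to_quat(p_big))
print("orthonormal:", np.allclose(R.T @ R, np.eye(3)), "det:", np.linalg.det(R))
```

Because `mrp_to_quat` accepts any vector, a gradient-based optimizer can update p directly, without re-normalizing or projecting back onto a constraint set. When |p| grows beyond 1, switching to `mrp_shadow(p)` keeps the parameters inside the unit ball and away from the projection pole.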
In the second part, we study the problem of learning task-specific relative poses between interacting objects to solve manipulation tasks. For example, hanging a mug on a rack requires reasoning about the relative pose between the mug and the rack. We conjecture that the relative pose between objects is a generalizable representation of a manipulation task that can transfer to new objects of the same category. We call this relative pose "cross-pose" and propose a vision-based method that learns to estimate it for a variety of manipulation tasks. Finally, we show empirically that our system generalizes to unseen objects, in both simulation and the real world, from very few demonstrations.
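Rigid-transform algebra shows why a relative pose, unlike a pair of absolute poses, still applies after the objects move. This minimal sketch assumes object poses are 4x4 homogeneous transforms in the world frame and that they are known exactly. The thesis instead estimates cross-pose from visual observations, and its precise definition may differ from the simple T_A^{-1} T_B used here. The function names and the numbers in the mug-and-rack example are hypothetical.

```python
import numpy as np


def rot_z(theta: float) -> np.ndarray:
    """Rotation about the z-axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def make_pose(R, t) -> np.ndarray:
    """4x4 homogeneous transform from a rotation matrix and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def invert_pose(T) -> np.ndarray:
    """Closed-form rigid inverse: (R, t) -> (R^T, -R^T t)."""
    R, t = T[:3, :3], T[:3, 3]
    return make_pose(R.T, -R.T @ t)


def relative_pose(T_world_a, T_world_b) -> np.ndarray:
    """Pose of object B expressed in object A's frame: T_A^{-1} T_B."""
    return invert_pose(T_world_a) @ T_world_b


def goal_pose_for_b(T_world_a_new, T_a_b) -> np.ndarray:
    """World pose that puts B at the demonstrated pose relative to A."""
    return T_world_a_new @ T_a_b


# Demonstration: mug (B) hung on rack (A), both poses in the world frame.
T_rack_demo = make_pose(rot_z(0.0), [0.5, 0.0, 0.0])
T_mug_demo = make_pose(rot_z(np.pi / 2), [0.5, 0.1, 0.3])
T_rack_mug = relative_pose(T_rack_demo, T_mug_demo)

# New scene: the rack is moved and rotated; the relative pose still applies.
T_rack_new = make_pose(rot_z(np.pi / 4), [0.2, -0.3, 0.0])
T_mug_goal = goal_pose_for_b(T_rack_new, T_rack_mug)
print("mug goal position:", np.round(T_mug_goal[:3, 3], 4))
```

The relative transform is computed once from the demonstration and reused in the new scene, where only the rack's new pose is needed to place the mug. The sketch does not cover unseen objects of the same category, whose shapes differ; handling them is the part that requires learning.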
BibTeX
@mastersthesis{Pan-2022-133163,
author = {Chuer Pan},
title = {6D Object Pose Estimation for Manipulation via Weak Supervision},
year = {2022},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-22-36},
keywords = {Robot Learning, Computer Vision, Pose Estimation, Manipulation, Self-Supervision, Rotation Averaging, Rotation Parameterization, Learning from Demonstration, 3D Visual Learning},
}