Articulated Pose Estimation with Tiny Synthetic Videos

D. Park and D. Ramanan
Workshop Paper, CVPR '15 Workshops: ChaLearn Workshop on Looking at People, pp. 58-66, June 2015

Abstract

We address the task of articulated pose estimation from video sequences. We consider an interactive setting where the initial pose is annotated in the first frame. Our system synthesizes a large number of hypothetical scenes with different poses and camera positions by applying geometric deformations to the first frame. We use these synthetic images to generate a custom labeled training set for the video in question. This training data is then used to learn a regressor (for future frames) that predicts joint locations from image data. Notably, our training set is so accurate that nearest-neighbor (NN) matching on low-resolution pixel features works well. As such, we name our underlying representation “tiny synthetic videos”. We present quantitative results on the Friends benchmark dataset that suggest our simple approach matches or exceeds the state-of-the-art.
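To make the matching step concrete, the sketch below performs nearest-neighbor lookup over tiny (downsampled) pixel features. This is a minimal illustration under stated assumptions, not the paper's implementation: the 40x40 feature size, the use of OpenCV and NumPy, and the names tiny_feature and predict_pose are all hypothetical, and the synthetic frames and their joint annotations are assumed to have been produced elsewhere (e.g., by deforming the annotated first frame).

import numpy as np
import cv2  # OpenCV, assumed available for grayscale conversion and resizing

TINY_SIZE = (40, 40)  # assumed low-resolution feature size

def tiny_feature(frame):
    """Downsample a BGR frame to a tiny grayscale patch, flattened to a vector."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    tiny = cv2.resize(gray, TINY_SIZE, interpolation=cv2.INTER_AREA)
    return tiny.astype(np.float32).ravel() / 255.0

def predict_pose(query_frame, synthetic_frames, synthetic_poses):
    """Return the joint locations of the synthetic frame nearest to the query.

    synthetic_frames: list of images synthesized from the annotated first frame.
    synthetic_poses:  list of (num_joints, 2) arrays of joint coordinates,
                      one per synthetic frame.
    """
    query = tiny_feature(query_frame)
    feats = np.stack([tiny_feature(f) for f in synthetic_frames])
    # L2 distance in tiny-pixel space; the closest synthetic frame's
    # annotated joints become the prediction for this frame.
    dists = np.linalg.norm(feats - query, axis=1)
    return synthetic_poses[int(np.argmin(dists))]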

BibTeX

@workshop{Park-2015-121188,
author = {D. Park and D. Ramanan},
title = {Articulated Pose Estimation with Tiny Synthetic Videos},
booktitle = {Proceedings of CVPR '15 Workshops: ChaLearn Workshop on Looking at People},
year = {2015},
month = {June},
pages = {58-66},
}