TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking
Abstract
We consider the task of 3D pose estimation and tracking of multiple people seen in an arbitrary number of camera feeds. We propose TesseTrack, a novel top-down approach that simultaneously reasons about multiple individuals’ 3D body joint reconstructions and associations in space and time in a single end-to-end learnable framework. At the core of our approach is a spatio-temporal formulation that operates in a common voxelized feature space aggregated from one or more camera views.
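For concreteness, the sketch below shows one way such a shared voxel volume can be built: per-view 2D feature maps are unprojected into a common grid and averaged over the views in which each voxel is visible, so the representation degrades gracefully to a single view. The function name, nearest-neighbor sampling, and mean pooling here are illustrative assumptions, not the paper's exact aggregation.

```python
import numpy as np

def aggregate_voxel_features(feature_maps, proj_matrices, grid_points):
    """Unproject per-view 2D feature maps into a shared voxel grid.

    feature_maps : list of (C, H, W) arrays, one per camera view
    proj_matrices: list of (3, 4) camera projection matrices
    grid_points  : (N, 3) world coordinates of the voxel centers

    Returns (N, C) per-voxel features, averaged over the views in
    which the voxel projects inside the image.
    """
    n_pts = grid_points.shape[0]
    n_ch = feature_maps[0].shape[0]
    accum = np.zeros((n_pts, n_ch))
    counts = np.zeros((n_pts, 1))
    homog = np.hstack([grid_points, np.ones((n_pts, 1))])  # (N, 4)

    for feats, P in zip(feature_maps, proj_matrices):
        _, h, w = feats.shape
        uvw = homog @ P.T                                  # (N, 3) homogeneous pixels
        uv = uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)
        u = np.round(uv[:, 0]).astype(int)                 # nearest-neighbor sampling
        v = np.round(uv[:, 1]).astype(int)
        # keep voxels that land inside the image, in front of the camera
        valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (uvw[:, 2] > 0)
        accum[valid] += feats[:, v[valid], u[valid]].T
        counts[valid] += 1

    return accum / np.clip(counts, 1, None)                # mean over contributing views
```

Averaging over whichever views contribute is what makes such a volume indifferent to the number of cameras, consistent with the robustness claim below.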
After a person detection step, a 4D CNN produces short-term, person-specific representations, which a differentiable matcher then links across time. The linked representations are then merged and deconvolved into 3D poses. This joint spatio-temporal formulation contrasts with previous piecewise strategies that treat 2D pose estimation, 2D-to-3D lifting, and 3D pose tracking as independent sub-problems, which are error-prone when solved in isolation. Furthermore, unlike previous methods, TesseTrack is robust to changes in the number of camera views and performs strongly even when only a single view is available at inference time. Quantitative evaluation of 3D pose reconstruction accuracy on standard benchmarks shows significant improvements over the state of the art. Evaluation of multi-person articulated 3D pose tracking in our novel evaluation framework demonstrates the superiority of TesseTrack over strong baselines.
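The abstract does not spell out the matcher itself; a common way to make cross-frame association differentiable is Sinkhorn normalization over a descriptor-similarity matrix. The sketch below is a minimal illustration under that assumption; `sinkhorn_match`, the temperature, and the iteration count are hypothetical names and values, not the paper's API.

```python
import numpy as np
from scipy.special import logsumexp

def sinkhorn_match(desc_t, desc_t1, n_iters=50, temperature=0.1):
    """Soft, differentiable assignment between person descriptors
    at consecutive time steps via Sinkhorn normalization.

    desc_t, desc_t1 : (M, D) and (N, D) L2-normalized descriptors
    Returns an (M, N) approximately row- and column-normalized
    matrix whose entry (i, j) is the soft probability that track i
    continues as detection j.
    """
    scores = desc_t @ desc_t1.T / temperature   # (M, N) similarities
    log_p = scores - scores.max()               # numerical stability
    for _ in range(n_iters):
        # alternate row / column normalization in the log domain
        log_p = log_p - logsumexp(log_p, axis=1, keepdims=True)
        log_p = log_p - logsumexp(log_p, axis=0, keepdims=True)
    return np.exp(log_p)
```

At inference, a hard assignment can be read off the soft matrix, e.g. by running the Hungarian algorithm (scipy.optimize.linear_sum_assignment) on its negative log.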
BibTeX
@conference{Narapureddy-2021-126995,
author = {N. Dinesh Reddy and Laurent Guigues and Leonid Pishchulin and Jayan Eledath and Srinivasa Narasimhan},
title = {TesseTrack: End-to-End Learnable Multi-Person Articulated 3D Pose Tracking},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year = {2021},
month = {June},
publisher = {IEEE},
keywords = {person tracking, 3D pose estimation, 4D reconstruction},
}