Face Detection, Pose Estimation, and Landmark Localization in the Wild

Conference Paper, Proceedings of (CVPR) Computer Vision and Pattern Recognition, pp. 2879-2886, June 2012

Abstract

We present a unified model for face detection, pose estimation, and landmark estimation in real-world, cluttered images. Our model is based on a mixture of trees with a shared pool of parts; we model every facial landmark as a part and use global mixtures to capture topological changes due to viewpoint. We show that tree-structured models are surprisingly effective at capturing global elastic deformation, while being easy to optimize, unlike dense graph structures. We present extensive results on standard face benchmarks, as well as a new “in the wild” annotated dataset, that suggest our system advances the state-of-the-art, sometimes considerably, for all three tasks. Though our model is modestly trained with hundreds of faces, it compares favorably to commercial systems trained with billions of examples (such as Google Picasa and face.com).
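
To illustrate why a tree-structured part model is easy to optimize, here is a minimal sketch of max-sum dynamic programming over a tree of parts. It is not the authors' implementation: the candidate locations, the appearance scores, the quadratic spring cost, and every name (`spring_cost`, `best_configuration`, `children`, `unary`, `rest`) are illustrative assumptions. Each part picks one candidate location, each child is penalized for straying from a resting offset relative to its parent, and the best joint placement is found exactly by passing scores from the leaves up to the root.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]


def spring_cost(child: Location, parent: Location, rest: Location) -> float:
    """Quadratic penalty for deviating from the resting offset (assumed form)."""
    dx = child[0] - parent[0] - rest[0]
    dy = child[1] - parent[1] - rest[1]
    return float(dx * dx + dy * dy)


def best_configuration(
    children: Dict[int, List[int]],
    unary: Dict[int, Dict[Location, float]],
    rest: Dict[int, Location],
    root: int = 0,
) -> Tuple[float, Dict[int, Location]]:
    """Return the best total score and the placement of every part.

    children: tree structure, parent part -> list of child parts
    unary:    appearance score of each part at each candidate location
    rest:     resting offset of each non-root part relative to its parent
    """
    # back[(child, parent_loc)] = best child location given the parent location
    back: Dict[Tuple[int, Location], Location] = {}

    def subtree(part: int) -> Dict[Location, float]:
        # Best score of the subtree rooted at `part`, for each location of `part`.
        scores = dict(unary[part])
        for child in children.get(part, []):
            child_scores = subtree(child)
            for loc in scores:
                best_loc, best_val = max(
                    (
                        (c_loc, c_val - spring_cost(c_loc, loc, rest[child]))
                        for c_loc, c_val in child_scores.items()
                    ),
                    key=lambda pair: pair[1],
                )
                scores[loc] += best_val
                back[(child, loc)] = best_loc
        return scores

    root_scores = subtree(root)
    root_loc = max(root_scores, key=root_scores.get)

    # Walk back down the tree to read off the best location of every part.
    placement = {root: root_loc}
    stack = [root]
    while stack:
        part = stack.pop()
        for child in children.get(part, []):
            placement[child] = back[(child, placement[part])]
            stack.append(child)
    return root_scores[root_loc], placement


# Toy example: a root part with two children and two candidate locations each.
children = {0: [1, 2]}
unary = {
    0: {(0, 0): 1.0, (5, 5): 2.0},
    1: {(1, 0): 3.0, (6, 5): 1.0},
    2: {(0, 1): 0.5, (5, 6): 2.5},
}
rest = {1: (1, 0), 2: (0, 1)}
score, placement = best_configuration(children, unary, rest)
print(score, placement)
```

Because each child interacts only with its parent, the cost of this search grows with the number of parts times the square of the number of candidate locations, rather than exponentially in the number of parts. Under the same assumptions, a mixture of trees could be handled by running the routine once per viewpoint-specific tree and keeping the highest-scoring result.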

BibTeX

@conference{Zhu-2012-121206,
author = {X. Zhu and D. Ramanan},
title = {Face Detection, Pose Estimation, and Landmark Localization in the Wild},
booktitle = {Proceedings of (CVPR) Computer Vision and Pattern Recognition},
year = {2012},
month = {June},
pages = {2879--2886},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.