Exploiting Space-Time Statistics of Videos for Face “Hallucination”
Abstract
Face "Hallucination" aims to recover high quality, high-resolution images of human faces from low-resolution, blurred, and degraded images or video. This thesis presents person-specific solutions to this problem through careful exploitation of space (image) and space-time (video) models. The results demonstrate accurate restoration of facial details, with resolution enhancements upto a scaling factor of 16. The algorithms proposed in this thesis follow the analysis-by-synthesis paradigm; they explain the observed (low-resolution) data by fitting a (high-resolution) model. In this context, the first contribution is the discovery of a scaling-induced bias that plagues most model-to-image (or image-to-image) fitting algorithms. It was found that models and observations should be treated asymmetrically, both to formulate an unbiased objective function and to derive an accurate optimization algorithm. This asymmetry is most relevant to Face Hallucination: when applied to the popular Active Appearance Model, it leads to a novel face tracking and reconstruction algorithm that is significantly more accurate than state-of-the-art methods. The analysis also reveals the inherent trade-off between computational efficiency and estimation accuracy in low-resolution regimes. The second contribution is a statistical generative model of face videos. By treating a video as a composition of space-time patches, this model efficiently encodes the temporal dynamics of complex visual phenomena such as eye-blinks and the occlusion or appearance of teeth. The same representation is also used to define a data-driven prior on a three-dimensional Markov Random Field in space and time. Experimental results demonstrate that temporal representation and reasoning about facial expressions improves robustness by regularizing the Face Hallucination problem. The final contribution is an approximate compensation scheme against illumination effects. It is observed that distinct illumination subspaces of a face (each coming from a different pose and expression) still exhibit similar variation with respect to illumination. This motivates augmenting the video model with a low-dimensional illumination subspace, whose parameters are estimated jointly with high-resolution face details. Successful Face Hallucinations beyond the lighting conditions of the training videos are reported.
Video demonstrations are available at http://www.cs.cmu.edu/~dedeoglu/thesis/
BibTeX
@phdthesis{Dedeoglu-2007-9700,
  author  = {Goksel Dedeoglu},
  title   = {Exploiting Space-Time Statistics of Videos for Face ``Hallucination''},
  year    = {2007},
  month   = {April},
  school  = {Carnegie Mellon University},
  address = {Pittsburgh, PA},
  number  = {CMU-RI-TR-07-05},
}