Abstract:
In this thesis, we aim to build photorealistic animatable avatars of humans wearing complex clothing in a data-driven manner. Such avatars will be a critical technology for future applications such as immersive telepresence in Virtual Reality (VR) and Augmented Reality (AR). Existing full-body avatars that jointly model geometry and view-dependent texture using Variational Autoencoders (VAEs) can be efficiently animated and rendered from arbitrary viewpoints, but they only work well for a limited set of tight-fitting garments such as T-shirts and pants. Loose-fitting clothing, however, does not closely follow the body motion and has a much larger deformation space. Most clothing styles therefore pose significant challenges to various aspects of existing frameworks for avatar modeling, including tracking, animation, and rendering.
This thesis builds a systematic solution for modeling dynamic clothing in data-driven photorealistic avatars. As opposed to the single-layer representation of existing full-body avatars, where clothing is treated as a residual deformation on top of the human body, we use a separate clothing representation that allows modeling at a finer granularity. We address the challenge by unifying three components of avatar modeling: a model-based statistical prior learned from pre-captured data, a physics-based prior from simulation, and real-time measurements from sparse sensor input.
In the first work, we introduce a two-layer representation that separates body and clothing in animatable full-body avatars. This separation allows us to disentangle the dynamics of the pose-driven body part from the temporally dependent clothing part, which leads to results of much higher overall quality. This formulation also enables photorealistic editing of clothing color and texture in a temporally coherent manner. In the second work, we further combine physics-based cloth simulation with photorealistic avatars, which generates rich and natural dynamics even for loose clothing such as skirts and dresses. We develop a physics-inspired neural rendering model to bridge the generalization gap between the training data from registered captured clothing and the test data from simulated clothing. This approach further allows animating a captured garment on the body of a novel avatar without the person ever wearing the clothing in reality, thus opening up the possibility of photorealistic virtual try-on.

Going beyond pose-driven animation of clothing, we then incorporate denser sensor input to achieve more faithful telepresence of clothing. We first investigate a simpler setting, where we build a linear clothing model that captures clothing deformation in a temporally coherent manner from monocular RGB video input. Finally, we develop a two-stage method to faithfully drive photorealistic avatars with loose clothing using several RGB-D cameras. We first coarsely track the clothing surface online to produce texel-aligned, unwrapped image and geometric features in UV space. This sensor-based conditioning input is then fed to the avatar to reproduce the clothing appearance from an arbitrary viewpoint. We demonstrate that such avatars can be driven not only in the original capture environment, but also in a novel environment with different illumination and background, again using several RGB-D cameras.
Thesis Committee Members:
Jessica K. Hodgins, Chair
Fernando De la Torre
Matthew P. O’Toole
Chenglei Wu, Google
Niloy J. Mitra, University College London, Adobe