VR facial animation via multiview image translation

VASC Seminar

Shih-En Wei, Research Scientist, Facebook Reality Labs
Monday, October 21
3:00 pm to 3:30 pm
GHC 6501
VR facial animation via multiview image translation

Abstract: A key promise of Virtual Reality (VR) is the possibility of remote social interaction that is more immersive than any prior telecommunication medium. However, existing social VR experiences are mediated by inauthentic digital representations of the user (i.e., stylized avatars). These stylized representations have limited the adoption of social VR applications in precisely those cases where immersion is most necessary (e.g., professional interactions and intimate conversations). In this work, we present a bidirectional system that can animate avatar heads in both users' full likeness using consumer-friendly headset-mounted cameras (HMCs). There are two main challenges in doing this: unaccommodating camera views and the image-to-avatar domain gap. We address both challenges by leveraging constraints imposed by multiview geometry to establish precise image-to-avatar correspondences, which are then used to learn an end-to-end model for real-time tracking. We present designs for a training HMC, aimed at data collection and model building, and a tracking HMC for use during interactions in VR. Correspondences between the avatar and the HMC-acquired images are found automatically through self-supervised multiview image translation, which requires neither manual annotation nor one-to-one correspondence between domains. We evaluate the system on a variety of users and demonstrate significant improvements over prior work.


Bio: Shih-En Wei is currently a research scientist at Facebook Reality Labs (FRL), Pittsburgh. At FRL, he has been working on human-related computer vision problems, including real-time face and eye tracking to enable authentic telepresence in VR. His interests also include inverse rendering to establish correspondence between graphical representations and real-time sensor data across modality gaps. Prior to FRL, he received an M.S. degree from the Robotics Institute at Carnegie Mellon University, Pittsburgh, where he worked on real-time body pose estimation from monocular images. Before that, he received his B.S. degree in Electrical Engineering and M.S. degree in Communication Engineering from National Taiwan University, Taipei, Taiwan.