Name: Audio-Visual Learning for Social Telepresence
Start: 2022-09-12T15:00:00-04:00
End: 2022-09-12T16:00:00-04:00
Location: Newell-Simon Hall 3305

Alexander Richard, Research Scientist, Reality Labs Research

Monday, September 12
3:00 pm to 4:00 pm
Newell-Simon Hall 3305

Audio-Visual Learning for Social Telepresence

Abstract

Relationships between people are strongly influenced by distance. Even with today’s technology, remote communication is limited to a two-dimensional audio-visual experience and lacks a shared, three-dimensional space in which people can interact with each other across distance. Our mission at Reality Labs Research (RLR) in Pittsburgh is to develop a telepresence system that is indistinguishable from reality, i.e., a system that provides photo- and phono-realistic social interactions in VR. Building such a system requires modeling complex interactions between visual and acoustic signals: the facial expression of an avatar is strongly influenced by the content and tone of their speech, and, conversely, the tone and content of speech are strongly correlated with the facial expression. We demonstrate that these audio-visual relationships can be modeled through a codify-and-resynthesize paradigm for both acoustic and visual outputs, unlocking state-of-the-art systems for face animation and speech enhancement. Further, as avatars can move freely through space, I will talk about novel neural rendering approaches for 3D audio that overcome limitations of traditional sound spatialization. In the future, these technologies will help build a realistic virtual environment with lifelike avatars that allow for authentic social interactions, connecting people all over the world, anywhere and at any time.

Bio

Alexander Richard is a Research Scientist at Reality Labs Research (RLR) in Pittsburgh, where he leads the audio-visual research team. With his team, he concentrates on audio-visual learning to build photo- and phono-realistic immersive experiences in Virtual Reality that enable remote communication indistinguishable from reality. Combining computer vision, machine learning, and audio processing, he develops key technologies for lifelike audio-visual avatars and novel 3D rendering approaches for spatial and binaural audio. Before joining RLR, Alexander was a Speech Scientist at Amazon Alexa in Aachen, Germany. He received his PhD from the University of Bonn for his work on temporal segmentation of human actions in videos.

Homepage: https://alexanderrichard.github.io/

Sponsored in part by: Facebook Reality Labs Pittsburgh

VASC Seminar
