Three 15-minute talks are scheduled to showcase the AR/VR research conducted at CMU. Below, you’ll find the speaker names, talk titles, and abstracts.
Speaker: Yehonathan Litman
Title: StableMaterial: Lighting-Aware Texture & Material Diffusion Prior for Inverse Rendering
Abstract: Recent work in inverse rendering has shown promise in using multi-view images of an object to reconstruct a 3D mesh along with its underlying material and albedo representation. In practice, however, the appearance of a reconstructed object is subpar under novel relighting. This stems from the inherent ambiguity of decoupling albedo and material from input images, as multiple combinations can reproduce the same pixel appearance. We improve a typical 3D inverse rendering pipeline by integrating a 2D diffusion prior that estimates the most probable texture and material under an SDS loss formulation. After training on a large-scale curated dataset of high-quality objects conditioned on novel relighting appearances, we show improved appearance when a reconstructed object is relit under novel lighting conditions.
Speaker: Alexander Wang
Title: MARingBA: Music-Adaptive Ringtones for Blended Audio Notification Delivery
Abstract: Audio notifications provide users with an efficient way to access information beyond their current focus of attention. Current notification delivery methods, like phone ringtones, are primarily optimized for high noticeability, enhancing situational awareness in some scenarios but causing disruption and annoyance in others. In this work, we build on the observation that music listening is now a commonplace practice and present MARingBA, a novel approach that blends ringtones into background music to modulate their noticeability. We contribute a design space exploration and evaluation of music-adaptive manipulation parameters.
Speaker: Zhenyi Luo
Title: Real-Time Simulated Avatar from Head-Mounted Sensors
Abstract: We present SimXR, a method for controlling a simulated avatar from information (headset pose and cameras) obtained from AR/VR headsets. Due to the challenging viewpoint of head-mounted cameras, the human body is often clipped out of view, making traditional image-based egocentric pose estimation difficult. Headset poses, on the other hand, provide valuable information about overall body motion but lack fine-grained detail about the hands and feet. To synergize headset poses with camera input, we control a humanoid to track headset movement while analyzing input images to determine body movement. When body parts are visible, the images guide the movements of the hands and feet; when they are not, the laws of physics guide the controller to generate plausible motion. We design an end-to-end method that does not rely on any intermediate representations and learns to map directly from images and headset poses to humanoid control signals.
Sponsored in part by: Meta Reality Labs Pittsburgh