3:30 pm to 4:30 pm
3305 Newell-Simon Hall
Abstract:
In this talk, we will dive into our recent work on vision-centric generative AI, focusing on how it helps us understand and create visual content such as images and videos. We’ll cover the latest advances, including multimodal large language models for visual understanding and diffusion transformers for visual generation. We’ll explore how these two areas are closely connected, along with the challenges and opportunities in building powerful, scalable visual intelligence. Finally, we’ll look at why these developments matter, both for practical applications and as key steps toward robust visual intelligence that can better understand and interact with the sensory-rich world around us.
Bio:
Saining Xie is an Assistant Professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University and is affiliated with the NYU Center for Data Science. He is also a visiting faculty researcher at Google DeepMind. Before joining NYU in 2023, he was a research scientist at FAIR, Meta. He received his Ph.D. in computer science from the University of California, San Diego in 2018. He works in computer vision and machine learning, with a particular interest in scalable visual representation learning for multimodal understanding and generation. His work has been recognized with a Marr Prize honorable mention, CVPR Best Paper finalist selections, and an Amazon Research Award.
Homepage: Saining Xie
Sponsored in part by: Meta Reality Labs Pittsburgh