Building Scalable Visual Intelligence: From Representation to Understanding and Generation - Robotics Institute Carnegie Mellon University

VASC Seminar

Saining Xie Assistant Professor Courant Institute of Mathematical Sciences, New York University
Monday, October 21
3:30 pm to 4:30 pm
3305 Newell-Simon Hall
Building Scalable Visual Intelligence: From Representation to Understanding and Generation

Abstract:

In this talk, we will dive into our recent work on vision-centric generative AI, focusing on how it helps with understanding and creating visual content like images and videos. We’ll cover the latest advances, including multimodal large language models for visual understanding and diffusion transformers for visual generation. We’ll explore how these two areas are closely connected, along with the challenges and opportunities in building powerful and scalable visual intelligence. Plus, we’ll look at why these developments matter, both in practical applications and as key steps toward creating robust visual intelligence that can better understand and interact with the sensory-rich world around us.

Bio:

Saining Xie is an Assistant Professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University and is affiliated with the NYU Center for Data Science. He is also a visiting faculty researcher at Google DeepMind. Before joining NYU in 2023, he was a research scientist at FAIR, Meta. In 2018, he received his Ph.D. in computer science from the University of California San Diego. He works in computer vision and machine learning, with a particular interest in scalable visual representation learning for multimodal understanding and generation. His work has been recognized with a Marr Prize honorable mention, CVPR best paper finalist selections, and an Amazon Research Award.

Homepage:  Saining Xie

Sponsored in part by:   Meta Reality Labs Pittsburgh