Bridging Generative and Discriminative Learning with Diffusion Models - Robotics Institute Carnegie Mellon University

PhD Thesis Proposal

Zhipeng Bao, PhD Student, Robotics Institute, Carnegie Mellon University
Tuesday, November 26
12:00 pm to 1:30 pm
GHC 4405
Bridging Generative and Discriminative Learning with Diffusion Models

Abstract:
Generative models have advanced significantly, synthesizing photorealistic images, videos, and text. Building on this progress, our work explores the potential of diffusion models to bridge generative and discriminative learning, uncovering new pathways for leveraging their strengths in visual perception tasks.
In the first part, we propose Diff-2-in-1, a unified framework for multi-modal data generation and dense visual perception. By exploiting the diffusion-denoising process, it improves visual perception through a self-improving learning mechanism and by generating multi-modal data that mirrors the training data distribution.

Next, we extend diffusion models to video understanding, analyzing their feature representations in comparison to non-generative, self-supervised approaches. Video diffusion models consistently excel in capturing temporal dynamics and scene structure, outperforming image-based diffusion models and offering a novel direction for video understanding.

Finally, we present REM, a framework for referring video segmentation using text-to-video diffusion models. By preserving generative representations while fine-tuning on narrow-domain datasets, REM achieves state-of-the-art performance on in-domain datasets and significantly outperforms prior methods on out-of-domain data.

Looking ahead, we propose two directions: leveraging video diffusion models as RGB visual encoders for robot learning and exploring lightweight fine-tuning for video editing using pretrained image-to-video diffusion models.

Thesis Committee Members:

Martial Hebert, Chair
Deva Ramanan
Jun-Yan Zhu
Alexei Efros, University of California, Berkeley
Yu-Xiong Wang, University of Illinois Urbana-Champaign
Pavel Tokmakov, Toyota Research Institute
