Abstract:
Recent advancements in vision-language foundation models, exemplified by GPT4-Vision and DALL-E 3, have significantly transformed both research and practical applications, ranging from professional assistance to content creation. However, aligning these models precisely with specific user goals remains a notable challenge. This thesis introduces novel strategies for improving this alignment. I will first present our cross-modal adaptation framework, which uses textual or audio data to tailor foundation models such as CLIP more effectively to target tasks like visual recognition. Next, I will present an optimization approach based on ChatGPT for automatically aligning popular proprietary (black-box) models, like DALL-E 3, to better meet user needs. Lastly, I will share our latest efforts to assess and enhance model fidelity on target tasks requiring advanced visio-linguistic reasoning over compositions of objects, attributes, and their relations.
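The cross-modal adaptation mentioned above builds on the shared image-text embedding space of CLIP-style models, where recognition reduces to comparing an image embedding against text embeddings of class descriptions. The following is a minimal sketch of that core mechanism only; the embeddings here are small stand-in vectors, not outputs of an actual CLIP model, and the function name is illustrative.

```python
import numpy as np

def cosine_classify(image_emb, class_text_embs):
    """Pick the class whose text embedding is most similar to the image embedding.

    image_emb:       1-D array, embedding of one image.
    class_text_embs: 2-D array, one row per class description embedding.
    Returns (predicted class index, cosine similarities to each class).
    """
    # L2-normalize both sides so the dot product equals cosine similarity
    img = image_emb / np.linalg.norm(image_emb)
    txt = class_text_embs / np.linalg.norm(class_text_embs, axis=1, keepdims=True)
    sims = txt @ img
    return int(np.argmax(sims)), sims

# Toy stand-in embeddings: the image vector points along the second class.
pred, sims = cosine_classify(
    np.array([1.0, 0.0]),
    np.array([[0.0, 1.0],   # class 0 text embedding
              [1.0, 0.1]])  # class 1 text embedding
)
# pred → 1
```

In a real pipeline the rows of `class_text_embs` would come from a text (or audio) encoder, which is what lets extra modalities adapt the classifier without any labeled images.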
Committee:
Prof. Deva Ramanan (advisor)
Prof. Deepak Pathak
Prof. Graham Neubig (LTI)
Mihir Prabhudesai (RI PhD student)
MSR Thesis Defense
Friday, December 8
Alignment for Vision-Language Foundation Model