Towards Equitable Representation in Text-to-Image Generation - Robotics Institute Carnegie Mellon University

MSR Thesis Defense

Zhixuan Liu, PhD Student
Robotics Institute, Carnegie Mellon University
Tuesday, April 23
12:00 pm to 1:30 pm
Gates Hillman Center 4405
Towards Equitable Representation in Text-to-Image Generation

Abstract:

Accurate representation in media is known to improve the well-being of the people who consume it. Concern is growing about the increasing use of generative AI in media, because generative image models trained on large web-crawled datasets such as LAION are known to produce images with harmful stereotypes and misrepresentations of various groups, including underrepresented cultures. It is infeasible to collect a sufficiently large dataset of representative, highly curated data to retrain a model such as Stable Diffusion from scratch. We improve inclusive representation in generated images by (1) engaging with communities to collect a culturally representative dataset that we call the Cross-Cultural Understanding Benchmark (CCUB) dataset and (2) proposing a novel Self-Contrastive Fine-Tuning (SCoFT) method that leverages the model’s known biases to self-improve. SCoFT is designed to prevent overfitting on small datasets, encode only high-level information from the data, and shift the generated distribution away from misrepresentations encoded in a pre-trained model. We evaluate our method with participants who are personally familiar with the cultures in the CCUB dataset. Our findings indicate that fine-tuning on CCUB decreases offensiveness and increases the cultural representation of generated images, a trend further enhanced by SCoFT. Additionally, we show that our models produce a greater diversity of generated images.

Committee:

Prof. Jean Oh (advisor)

Dr. Ji Zhang

Prof. Jun-Yan Zhu

Peter Schaldenbrand