12/05/2024    Mallory Lindahl

Sheng-Yu Wang, fifth-year Ph.D. student at the Carnegie Mellon University Robotics Institute.

Sheng-Yu Wang, a fifth-year Ph.D. student at the Carnegie Mellon University Robotics Institute, has received a Google Ph.D. Fellowship for his work on data attribution for text-to-image models. Wang is advised by Jun-Yan Zhu, an assistant professor at the Robotics Institute.

The Google Ph.D. Fellowship Program, established 15 years ago, supports graduate students conducting impressive research in computer science and related fields. Wang’s fellowship, awarded in the “Machine Perception” category, will last until 2026. 

Wang’s research addresses key challenges in text-to-image synthesis. While generative models have advanced significantly, they currently lack effective mechanisms to safeguard the copyright and ownership of data. These models could also be exploited maliciously to spread misinformation. As a result, anxieties are growing among artists, designers, and the general public.

To address these issues, his research focuses on three areas: attributing synthetic content to the data that influenced it, giving data contributors the ability to remove their data from model training, and developing forensics tools to detect synthetic content produced by a range of generative visual models. Wang currently prioritizes developing methods that can eventually be deployed on large text-to-image models for practical use.

His research introduces methods such as “Attribution by Customization” and “Attribution by Unlearning,” which measure how individual training images affect the results of text-to-image models. By selectively adding or removing sample images, these methods generate datasets that reveal which training images have the most significant impact on the model’s outputs. He also collaborated with Nupur Kumari to develop algorithms that remove copyrighted materials from text-to-image generative models.

With the support of the Google Ph.D. Fellowship, Wang plans to continue expanding his data attribution work, advance his efforts to empower creators to opt out of model training, and improve synthetic content detection to build more trust in generative models.

For More Information: Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu