3:30 pm to 4:30 pm
3305 Newell-Simon Hall
Abstract:
The rapid growth of generative models enables an ever-increasing variety of capabilities. Yet these models may also produce undesired content, such as unsafe or misleading images, private information, or copyrighted material.
In this talk, I will discuss practical methods to prevent undesired generations. First, I will show how this challenge manifested in a simple Capture-the-Flag LLM setting, where even our top defense strategy was breached. Next, I will demonstrate a similar vulnerability in state-of-the-art concept erasure methods for Text-to-Image models. Finally, I will distinguish between two approaches to erasure, Guidance-Based Avoidance and Destruction-Based Removal, and discuss the trade-offs of each and their behavior in various settings.
Bio:
Niv is a postdoctoral researcher at New York University hosted by Prof. Chinmay Hegde. He received a BSc in mathematics with physics as part of the Technion Excellence Program. He received his PhD in computer science from the Hebrew University of Jerusalem, advised by Prof. Yedid Hoshen. Niv was awarded the Israeli data science scholarship for outstanding postdoctoral fellows (VATAT). He is interested in anomaly detection, representation learning, and AI safety for Vision & Language models.
Homepage: https://nivc.github.io/