Weak Multi-modal Supervision for Object Detection and Persuasive Media - Robotics Institute Carnegie Mellon University

VASC Seminar

Adriana Kovashka, Associate Professor in Computer Science, University of Pittsburgh
Monday, November 7
3:00 pm to 4:00 pm
Newell-Simon Hall 3305
Weak Multi-modal Supervision for Object Detection and Persuasive Media

Abstract: The diversity of visual content available on the web presents new challenges and opportunities for computer vision models. In this talk, I present our work on learning object detection models from potentially noisy multi-modal data, retrieving complementary content across modalities, transferring reasoning models across dataset boundaries, and recognizing objects in non-photorealistic media. While the work has applications to common benchmark datasets, its motivation stems from a single goal: the ability to analyze content in complex persuasive media, such as visual advertisements and political articles.

Reasoning about persuasive media challenges state-of-the-art computer vision for three reasons. First, it requires extensive human common-sense knowledge, including non-factual knowledge such as an understanding of symbolism. Second, data is not abundant, and providing all the required supervision is not feasible; however, weak supervision can be extracted from loosely aligned, complementary modalities. Third, objects may be portrayed in creative, atypical ways, and each portrayal style is too rare to form a proper domain, so typical object recognition and domain adaptation methods fail.

To tackle these challenges, we first collect a large dataset of advertisements and public service announcements, covering topics ranging from automobiles and clothing to health and domestic violence. We pose decoding an ad as answering two questions: “What should the viewer do, according to the ad?” (the suggested action) and “Why should the viewer do the suggested action, according to the ad?” (the suggested reason). We show how to effectively use a general-purpose knowledge base to find the correct action-reason statement, and how to cope with shortcut solutions that bypass the need for reasoning, on both our ads dataset and another visual common-sense dataset. We further investigate the challenges of transferring reasoning knowledge from other datasets.

Second, we present an approach for learning to recognize new concepts given supervision only in the form of captions. Captions can be considered “free” supervision, in that humans naturally caption the content they upload, and narratives are available in video content. However, such multi-modal data contains little redundancy: the visual and textual channels complement each other rather than repeating each other, as they do in common image-text datasets. We thus develop techniques to retrieve content across complementary modalities, specifically in political articles, and to preserve semantics despite visual variability. We also take early steps toward cleaning the supervision available in captions for training object detection models.

Third, we examine atypical portrayals of objects in persuasive media. We categorize atypicality and develop techniques that leverage spatial and semantic compatibility to detect it. We propose domain generalization techniques to transfer object recognition models to new, atypical styles. Specifically, we focus on shape as a bridge between domains and show that it is especially effective for domains with sparse texture. Together, our work greatly advances the ability to recognize concepts from real-world multi-modal data, including for reasoning about persuasive media.

Bio: Adriana Kovashka is an Associate Professor in Computer Science at the University of Pittsburgh. Her research interests are in computer vision and machine learning. She has authored over twenty publications in top-tier computer vision and artificial intelligence conferences and journals (CVPR, ICCV, ECCV, NeurIPS, AAAI, ACL, TPAMI, IJCV) and over ten second-tier conference publications (BMVC, ACCV, WACV). Her research is funded by the National Science Foundation, Google, Amazon and Adobe. She received the NSF CAREER award in 2021. She has served as an Area Chair for CVPR in 2018-2021 and 2023, NeurIPS 2020, ICLR 2021 and 2023, AAAI 2021-2022, and will serve as co-Program Chair of ICCV 2025. She has been on program committees for over twenty conferences and journals, and has co-organized seven workshops.

Homepage: Adriana Kovashka (pitt.edu)

Sponsored in part by: Meta Reality Labs Pittsburgh