Weakly supervised learning from images and videos - Robotics Institute Carnegie Mellon University

VASC Seminar

Cordelia Schmid, Research Director, INRIA
Wednesday, May 27
11:00 am to 12:00 pm
Weakly supervised learning from images and videos

Event Location: NSH 1305
Bio: Cordelia Schmid holds an M.S. degree in Computer Science from the University of Karlsruhe and a Doctorate, also in Computer Science, from the Institut National Polytechnique de Grenoble (INPG). Her doctoral thesis, “Local Greyvalue Invariants for Image Matching and Retrieval”, received the best thesis award from INPG in 1996. She received the Habilitation degree in 2001 for her thesis entitled “From Image Matching to Learning Visual Models”. Dr. Schmid was a post-doctoral research assistant in the Robotics Research Group of Oxford University in 1996–1997. Since 1997 she has held a permanent research position at INRIA Rhône-Alpes, where she is a research director and heads the INRIA team LEAR (LEArning and Recognition in Vision). Dr. Schmid is the author of over a hundred technical publications. She has been an Associate Editor for IEEE PAMI (2001–2005) and for IJCV (2004–2012), editor-in-chief of IJCV (since 2013), a program chair of IEEE CVPR 2005 and ECCV 2012, and a general chair of IEEE CVPR 2015. In 2006 and 2014, she was awarded the Longuet-Higgins Prize for fundamental contributions in computer vision that have withstood the test of time. She is an IEEE Fellow. In 2013, she was awarded an ERC Advanced Grant on “Active Large-scale LEarninG for visual RecOgnition”.

Abstract: With the amount of digital content available online growing daily, large-scale weakly supervised learning is becoming increasingly important. In this talk we present some recent results on weakly supervised learning from images and videos.

Standard approaches to object category localization require bounding box annotations of object instances. Weakly supervised learning sidesteps this time-consuming annotation process: the annotation is restricted to binary image-level labels that indicate the presence or absence of object instances in the image. Our main contribution is a multi-fold multiple instance learning procedure, which prevents training from prematurely locking onto erroneous object locations. We also propose a window refinement method, which improves localization accuracy by incorporating an objectness prior.
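To make the multi-fold idea concrete, here is a minimal sketch under loudly stated assumptions. Positive images are split into folds, and the object window in each fold is re-selected by a detector trained only on the *other* folds, so a detector never re-scores the window it was just trained on. The helper names (`multi_fold_mil`, `train_linear_scorer`), the feature vectors, the whole-image initialization, and the mean-difference linear scorer (standing in for a real detector such as an SVM) are all hypothetical illustrations, not the speaker's implementation.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_linear_scorer(pos: List[Vector], neg: List[Vector]) -> List[float]:
    """Hypothetical toy detector: w = mean(positive windows) - mean(negative windows)."""
    dim = len(pos[0])
    mean_pos = [sum(v[d] for v in pos) / len(pos) for d in range(dim)]
    mean_neg = [sum(v[d] for v in neg) / len(neg) for d in range(dim)]
    return [p - n for p, n in zip(mean_pos, mean_neg)]


def multi_fold_mil(pos_images: List[List[Vector]], neg_windows: List[Vector],
                   num_folds: int = 3, iterations: int = 5, seed: int = 0) -> List[int]:
    """Return the index of the selected window in each positive image.

    pos_images[i] lists candidate-window features for positive image i.
    Assumption: window 0 is the whole image and is used for initialization.
    """
    if num_folds < 2 or len(pos_images) < num_folds:
        raise ValueError("need at least 2 folds and one positive image per fold")
    rng = random.Random(seed)
    selected = [0] * len(pos_images)
    order = list(range(len(pos_images)))
    for _ in range(iterations):
        rng.shuffle(order)
        folds = [order[k::num_folds] for k in range(num_folds)]
        updated = list(selected)
        for fold in folds:
            held_out = set(fold)
            # Train only on the other folds, so the detector that re-localizes an
            # image never saw that image's current window during training.
            train_pos = [pos_images[i][selected[i]] for i in order if i not in held_out]
            w = train_linear_scorer(train_pos, neg_windows)
            for i in fold:
                scores = [dot(w, x) for x in pos_images[i]]
                updated[i] = max(range(len(scores)), key=scores.__getitem__)
        selected = updated
    return selected


# Toy data: object windows look like (1, 0), background windows like (0, 1).
toy_neg = [(0.0, 1.0), (0.0, 1.2), (0.1, 0.9)]
toy_pos = [
    [(0.5, 0.6), (1.0, 0.1), (0.0, 1.0)],
    [(0.5, 0.5), (0.1, 1.1), (0.9, 0.0)],
    [(0.6, 0.5), (1.1, 0.0), (0.0, 0.9)],
    [(0.4, 0.5), (0.0, 1.0), (1.0, 0.2)],
]
print(multi_fold_mil(toy_pos, toy_neg))  # [1, 2, 1, 2]
```

In this toy run the procedure moves from the whole-image windows to the object-like window in every positive image. A single-fold variant would retrain on exactly the windows it then re-scores, which is the self-reinforcing behavior the multi-fold split is meant to avoid.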

We then show how to move towards unsupervised discovery and localization of dominant objects in a noisy image collection spanning multiple object classes. The problem setting is fully unsupervised: there are no image-level annotations and no assumption that a single class dominates the collection. We tackle discovery and localization with a part-based matching approach that considers both the appearance similarity and the spatial consistency of candidate regions. Dominant objects are discovered and localized by comparing the scores of candidate regions and selecting those that stand out over the other regions containing them.
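The final selection step can be illustrated with a small sketch. It assumes each candidate region already carries a confidence score from the part-based matching (which is not reproduced here) and computes a "standout" value as the region's score minus the best score among regions that enclose it. The `Region` class, the containment rule, and the convention of comparing uncontained regions against 0 are hypothetical choices for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Region:
    x0: float
    y0: float
    x1: float
    y1: float
    score: float  # assumed: confidence produced by the part-based matching step

    def area(self) -> float:
        return (self.x1 - self.x0) * (self.y1 - self.y0)

    def contains(self, other: "Region") -> bool:
        # Encloses `other` and is strictly larger, so a region never contains itself.
        return (self.x0 <= other.x0 and self.y0 <= other.y0
                and self.x1 >= other.x1 and self.y1 >= other.y1
                and self.area() > other.area())


def standout_scores(regions: List[Region]) -> List[float]:
    """Score of each region minus the best score among regions that contain it."""
    out = []
    for r in regions:
        container_scores = [c.score for c in regions if c.contains(r)]
        # Assumption: a region with no container is compared against 0.
        out.append(r.score - max(container_scores, default=0.0))
    return out


def dominant_region(regions: List[Region]) -> int:
    """Index of the region with the highest standout value."""
    s = standout_scores(regions)
    return max(range(len(s)), key=s.__getitem__)


toy_regions = [
    Region(0, 0, 100, 100, score=0.4),   # whole image: cluttered, modest score
    Region(20, 20, 60, 60, score=0.9),   # the object
    Region(30, 30, 40, 40, score=0.95),  # a part of the object
    Region(70, 70, 90, 90, score=0.3),   # background patch
]
print(dominant_region(toy_regions))  # 1
```

The object part has the highest raw score, but it barely improves on the object box that contains it, so the full object box stands out most.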

Finally, we present work on learning object detectors from real-world web videos known only to contain objects of a target class. We propose a fully automatic pipeline that localizes objects in a set of videos of the class and learns a detector for it. The approach extracts candidate spatio-temporal tubes based on motion segmentation and then selects one tube per video, jointly over all videos.
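Joint selection of one tube per video can be sketched as maximizing a per-tube quality term plus the pairwise appearance agreement between the tubes chosen in different videos. In the sketch below, the tube descriptors, the `unary` quality scores, cosine similarity as the pairwise term, and the coordinate-ascent optimizer are all assumptions made for illustration. The optimizer is a simple stand-in and not necessarily the inference method used in the work described.

```python
from math import sqrt
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    num = sum(x * y for x, y in zip(a, b))
    den = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return num / den if den else 0.0


def select_tubes(tubes: List[List[Sequence[float]]], unary: List[List[float]],
                 pairwise_weight: float = 1.0, max_sweeps: int = 20) -> List[int]:
    """Pick one tube per video to (locally) maximize
    sum_v unary[v][c_v] + pairwise_weight * sum_{v<u} cosine(tube_v, tube_u)."""
    n = len(tubes)
    choice = [max(range(len(u)), key=u.__getitem__) for u in unary]  # per-video best

    def local_objective(v: int, t: int) -> float:
        # The terms of the total objective that depend on video v's choice.
        sim = sum(cosine(tubes[v][t], tubes[o][choice[o]]) for o in range(n) if o != v)
        return unary[v][t] + pairwise_weight * sim

    for _ in range(max_sweeps):
        changed = False
        for v in range(n):
            best = max(range(len(tubes[v])), key=lambda t: local_objective(v, t))
            # Switch only on strict improvement, so the total objective increases.
            if local_objective(v, best) > local_objective(v, choice[v]):
                choice[v] = best
                changed = True
        if not changed:
            break
    return choice


# Toy data: object tubes look like (1, 0); each video also has one distractor tube.
toy_tubes = [
    [(1.0, 0.1), (0.0, 1.0)],
    [(0.9, 0.0), (0.1, -1.0)],
    [(-1.0, 0.2), (1.0, 0.2)],
]
toy_unary = [[0.4, 0.6], [0.7, 0.3], [0.4, 0.6]]
print(select_tubes(toy_tubes, toy_unary))  # [0, 0, 1]
```

In video 0 the distractor tube has the higher individual score, but the joint objective switches it to the tube that agrees in appearance with the tubes selected in the other videos. Coordinate ascent only guarantees a local optimum, which is why it is presented here as a sketch rather than a faithful reproduction.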