VASC Seminar

Xiaofeng Ren, Research Assistant Professor, Toyota Technological Institute at Chicago
Tuesday, December 11
3:30 pm
Segmentation, Tracking and Recognition: a Visual Trio

Event Location: NSH 1305
Bio: Xiaofeng Ren received his B.S. from Zhejiang University, his M.S. from
Stanford University, and his Ph.D. from UC Berkeley in 2006. He is
currently a research assistant professor at the Toyota Technological
Institute at Chicago. His research interests lie broadly in computer
vision. His recent work focuses on mid-level vision, including contour
grouping, segmentation, figure-ground organization, and their interactions
with both low-level image cues and high-level object knowledge.

Abstract: Segmentation, tracking and object recognition are all fundamental problems
in vision. Largely studied in isolation, they are nonetheless inseparable
aspects of a single visual perception process. In this talk I will discuss
several lines of work that explore synergies among the “big three”.

Bottom-up segmentation is very challenging by itself, and segmenting
objects from the background is usually near-impossible: a great deal of
semantic knowledge is involved. If we know what object we are looking for,
such as a horse, the task becomes much easier, because we can model
semantic knowledge of the object, e.g. its shape, and use it to guide
segmentation.

In a video setting, object segmentation is possible without a priori
object knowledge. I will discuss a joint paradigm where we track an object
by repeatedly segmenting it from background. Tracking makes segmentation
easy: by tracking an object over time, we may learn about the object,
including its appearance and shape, on-the-fly. This knowledge may be
combined with temporal/motion cues to guide segmentation. Conversely,
segmentation makes tracking robust: by exploiting low-level cues
(e.g. boundary contrast) and mid-level cues (e.g. convexity),
segmentation-based tracking can avoid drifting and tolerate large
variations in object shape, appearance, and scale, as well as in the
background scene.
I will show results on long sequences of sports video.
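The tracking-by-segmentation loop described above can be sketched in a few lines. This is a deliberately minimal illustration, not the method from the talk: the "appearance model" is just a mean intensity, the frames are toy 1-D pixel rows, and the segmentation rule is a hypothetical similarity threshold. The point is the structure of the loop: segment with the current model, then update the model from the new segment, so appearance is learned on the fly.

```python
# Minimal sketch of tracking by repeated segmentation (illustrative
# assumptions: 1-D frames, a mean-intensity appearance model, and a
# fixed similarity threshold -- none of these come from the talk).

def segment(frame, model, threshold=0.2):
    """Label pixels as object (True) if they are close to the model."""
    return [abs(p - model) < threshold for p in frame]

def update_model(frame, mask, model, rate=0.5):
    """Blend the mean intensity of the segmented pixels into the model."""
    obj = [p for p, m in zip(frame, mask) if m]
    if not obj:
        return model  # lost the object: keep the old model
    return (1 - rate) * model + rate * (sum(obj) / len(obj))

def track(frames, model):
    """Track by repeatedly segmenting, learning appearance on the fly."""
    masks = []
    for frame in frames:
        mask = segment(frame, model)
        model = update_model(frame, mask, model)
        masks.append(mask)
    return masks, model

# Object pixels drift from 0.5 toward 0.7; background stays near 0.0.
frames = [[0.0, 0.5, 0.5, 0.0],
          [0.0, 0.6, 0.6, 0.1],
          [0.0, 0.7, 0.7, 0.0]]
masks, model = track(frames, model=0.5)
```

Note that with a fixed model of 0.5 the third frame's object pixels (0.7) would fall outside the threshold; it is the per-frame model update that keeps the object segmented as its appearance drifts, which is the "learning on-the-fly" idea in the paragraph above.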

A more challenging setting of the problem is to find and track people in
archive films, having to handle crowded scenes as well as poor image
quality. I will discuss a joint detection-tracking approach, where we
track people by linking face detections and switching to low-level
tracking where detection fails. Here, detection makes tracking robust to
variations in appearance, pose, or movement. Moreover, tracking establishes
temporal correspondences and allows us to integrate information over time.
I will show that temporal integration greatly improves detection precision
(suppressing isolated false positives) and at the same time boosts recall
(finding people missed by a face detector).
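The detect-then-link idea above can be sketched as follows. This is a hedged toy example, not the paper's algorithm: per-frame face detections (here, x-positions, with None where the detector failed) are linked into a track, and the gaps are filled by a stand-in for low-level tracking, implemented here as linear interpolation between the surrounding detections. The data and the interpolation fallback are illustrative assumptions.

```python
# Hypothetical sketch of joint detection-tracking: link sparse
# detections into a track, falling back to a simple low-level
# strategy (linear interpolation) on frames where detection fails.

def link_track(detections):
    """Build one track from sparse detections, interpolating gaps."""
    track = list(detections)
    known = [i for i, d in enumerate(track) if d is not None]
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):  # frames the detector missed
            track[i] = track[a] + (track[b] - track[a]) * (i - a) / (b - a)
    return track

# Face x-positions over 6 frames; detection fails on frames 2-3.
detections = [10.0, 12.0, None, None, 18.0, 20.0]
track = link_track(detections)
# track -> [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```

The temporal correspondence is what enables the integration described above: an isolated detection with no neighbors to link to can be suppressed as a likely false positive, while a missed detection inside a track is recovered, raising recall.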