3:00 pm to 4:00 pm
Event Location: NSH 1305
Bio: Talk 1: Carl Doersch is a second-year Ph.D. student in the Machine Learning Department, advised by Alyosha Efros. His research centers on computer vision and machine learning. He holds a B.S. in computer science and cognitive science from CMU.
Talk 2: Marek Vondrak was born in Prague, Czech Republic. He received his Sc.M. degree in computer science from Charles University, Prague, and is currently pursuing a Ph.D. degree at Brown University, Providence, RI. Marek’s research interests include the recovery of articulated human motion from video, physical simulation, motion control of humanoids, and character animation. His current focus is on bringing techniques from computer graphics, robotics, and animation into computer vision in order to build effective models of human motion for tracking.
Abstract: Talk 1: Given a large repository of geotagged imagery, we seek to automatically find visual elements, e.g., windows, balconies, and street signs, that are most distinctive for a certain geo-spatial area, for example, the city of Paris. This is a tremendously difficult task, as the visual features distinguishing architectural elements of different places can be very subtle. In addition, we face a hard search problem: given all possible patches in all images, which of them are both frequently occurring and geographically informative? To address these issues, we propose a discriminative clustering approach that takes into account the weak geographic supervision. We show that geographically representative image elements can be discovered automatically from Google Street View imagery in a discriminative manner. We demonstrate that these elements are visually interpretable and perceptually geo-informative. The discovered visual elements can also support a variety of computational geography tasks, such as mapping architectural correspondences and influences within and across cities, finding representative elements at different geo-spatial scales, and geographically informed image retrieval.
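The core loop is easier to see in code. Below is a minimal sketch of the discriminative clustering idea under strong simplifying assumptions; it is not the authors' pipeline. The patch features are random stand-ins for real descriptors such as HOG, and the positive/negative split (Paris vs. non-Paris Street View patches) is assumed to be given. Starting from a seed patch, a linear SVM is retrained against the negative world and the top-scoring positive patches are re-selected as the cluster; detectors that fire often in Paris but rarely elsewhere are the geo-informative candidates.

    # Hypothetical sketch of discriminative patch clustering; not the authors' code.
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    pos = rng.normal(0.5, 1.0, size=(500, 64))   # stand-in features for Paris patches
    neg = rng.normal(0.0, 1.0, size=(5000, 64))  # stand-in features for non-Paris patches

    def refine_detector(seed_idx, rounds=3, top_k=20):
        """Iteratively retrain a patch detector: start from a seed patch,
        fit an SVM against the negative set, then re-select the top-scoring
        positive patches as the new cluster (the discriminative step)."""
        members = np.array([seed_idx])
        clf = LinearSVC(C=0.1)
        for _ in range(rounds):
            X = np.vstack([pos[members], neg])
            y = np.r_[np.ones(len(members)), np.zeros(len(neg))]
            clf.fit(X, y)
            scores = clf.decision_function(pos)
            members = np.argsort(scores)[-top_k:]  # patches the detector likes best
        # geo-informativeness: fires often on Paris, rarely elsewhere
        fire_pos = (clf.decision_function(pos) > 0).mean()
        fire_neg = (clf.decision_function(neg) > 0).mean()
        return clf, fire_pos, fire_neg

    clf, fp, fn_ = refine_detector(seed_idx=0)
    print(f"fires on {fp:.1%} of Paris patches, {fn_:.1%} of non-Paris patches")

In the talk's setting this loop runs over many seed patches in parallel, and the surviving detectors are the distinctive visual elements.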
Talk 2: Marker-less motion capture is a challenging problem, particularly when only monocular video is available. We estimate human motion from monocular video by recovering three-dimensional controllers capable of implicitly simulating the observed human behavior and replaying this behavior in other environments and under physical perturbations. Our approach employs a state-space biped controller with a balance feedback mechanism that encodes control as a sequence of simple control tasks. Transitions among these tasks are triggered on time and on proprioceptive events (e.g., contact). Inference takes the form of optimal control where we optimize a high-dimensional vector of control parameters and the structure of the controller based on an objective function that compares the resulting simulated motion with input observations. We illustrate our approach by automatically estimating controllers for a variety of motions directly from monocular video. We show that the estimation of controller structure through incremental optimization and refinement leads to controllers that are more stable and that better approximate the reference motion. We demonstrate our approach by capturing sequences of walking, jumping, and gymnastics.
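The estimation loop described in this abstract can be illustrated as a black-box optimization over control parameters. The sketch below is not the authors' system: the simulate function is a toy one-dimensional stand-in for a biped controller running inside a physics engine, and the observed trajectory is synthetic rather than tracked from monocular video. It only shows the shape of the objective: simulate under candidate parameters, compare against the observations, and search the parameter space with a derivative-free optimizer.

    # Hypothetical sketch of controller estimation as optimal control; not the authors' code.
    import numpy as np
    from scipy.optimize import minimize

    T = 100
    t = np.linspace(0, 2 * np.pi, T)
    observed = np.sin(t)  # stand-in for poses tracked from video

    def simulate(params):
        """Toy 'simulator': a controller parameterized by amplitude, frequency,
        and phase produces a trajectory. A real system would run a state-space
        biped controller with balance feedback in a physics engine here."""
        amp, freq, phase = params
        return amp * np.sin(freq * t + phase)

    def objective(params):
        # Compare the simulated motion with the input observations,
        # as in the optimal-control formulation from the abstract.
        return np.sum((simulate(params) - observed) ** 2)

    # Derivative-free search over control parameters; the talk additionally
    # optimizes the controller's structure through incremental refinement.
    result = minimize(objective, x0=[0.5, 0.8, 0.1], method="Nelder-Mead")
    print("recovered controller parameters:", result.x)

Because the recovered controller is a physical policy rather than a pose sequence, the same parameters can then replay the behavior in new environments and under perturbations, as the abstract describes.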