Human Action Analysis: Understanding the Spatio-Temporal Structure

VASC Seminar

Michalis Raptis, Postdoctoral Researcher, Disney Research Pittsburgh
Monday, March 19
3:00 pm
Human Action Analysis: Understanding the Spatio-Temporal Structure

Event Location: NSH 1305
Bio: Michalis Raptis is a Postdoctoral Researcher at Disney Research, Pittsburgh. He received his M.Sc. and Ph.D. degrees from the Computer Science Department of the University of California, Los Angeles (UCLA), in 2008 and 2011, respectively. In 2006, he obtained his diploma in Electrical and Computer Engineering from the National Technical University of Athens, Greece. From 2006 until 2011, he was a graduate research assistant in the UCLA Vision Lab. His research interests lie in the broader fields of computer vision, pattern recognition, and time series analysis, in particular the application of discriminative approaches to video analysis tasks.

Abstract: The analysis of activities from video sequences has been one of the most challenging and important problems in computer vision. A key challenge in human activity recognition lies in explicitly modeling the spatio-temporal structure of the data. We address this problem by introducing a mid-level representation of video sequences for video analysis. From an input video, we extract salient spatio-temporal structures by forming clusters of trajectories, which serve as candidate parts of an action. The assembly of these parts into an action class is governed by a graphical model that incorporates appearance and motion constraints for each part as well as the spatio-temporal dependencies among them. The activities present in a new video sequence are then discerned by a discriminative framework that matches the model of each action to the parts present in the sequence via discrete optimization. We demonstrate the performance of our framework on standard benchmark datasets and illustrate its potential to support fine-grained analysis that not only assigns a label to a video but also identifies and localizes its constituent parts.
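
For readers who want a concrete picture of the trajectory-clustering stage described above, the following is a minimal Python sketch using numpy and scikit-learn on synthetic tracks. It is an illustration under assumed details, not the speaker's implementation: the trajectories here are randomly generated stand-ins for tracker output, the descriptor (mean position plus mean displacement) and the motion weight are invented for the example, and the graphical-model assembly and discrete-optimization matching stages are omitted entirely.

    # Hypothetical sketch: cluster point trajectories into candidate action
    # "parts" by joint spatial and motion affinity. Not the speaker's code.
    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    rng = np.random.default_rng(0)

    # Synthetic stand-in for tracked trajectories: N tracks, each a sequence
    # of T (x, y) image positions. A real system would obtain these from a
    # point tracker run over the input video.
    N, T = 60, 15
    centers = np.array([[40.0, 40.0], [120.0, 60.0], [80.0, 130.0]])
    labels_true = rng.integers(0, 3, size=N)
    starts = centers[labels_true] + rng.normal(0, 5, size=(N, 2))
    # Give each synthetic part its own dominant motion direction.
    velocities = rng.normal(0, 1, size=(N, 2)) + labels_true[:, None]
    tracks = starts[:, None, :] + velocities[:, None, :] * np.arange(T)[:, None]

    # Descriptor per trajectory: mean position (spatial layout) concatenated
    # with mean frame-to-frame displacement (motion). Clustering in this
    # joint space groups tracks that stay close and move together.
    positions = tracks.mean(axis=1)
    displacements = np.diff(tracks, axis=1).mean(axis=1)
    descriptors = np.hstack([positions, 10.0 * displacements])  # weight motion

    parts = AgglomerativeClustering(n_clusters=3).fit_predict(descriptors)
    for k in range(3):
        members = np.flatnonzero(parts == k)
        print(f"part {k}: {members.size} trajectories, "
              f"centroid {positions[members].mean(axis=0).round(1)}")

Each resulting cluster plays the role of a candidate part; in the full approach, the spatio-temporal relations among such parts are what the graphical model scores when matching an action class to a new video.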