Understanding Videos, Constructing Plots: Learning a Visually Grounded Storyline Model from Annotated Videos

Abhinav Gupta, Praveen Srinivasan, Jianbo Shi, and Larry S. Davis
Conference Paper, Proceedings of (CVPR) Computer Vision and Pattern Recognition, pp. 2012–2019, June 2009

Abstract

Analyzing videos of human activities involves not only recognizing actions (typically based on their appearances), but also determining the story/plot of the video. The storyline of a video describes causal relationships between actions. Beyond recognition of individual actions, discovering causal relationships helps to better understand the semantic meaning of the activities. We present an approach to learn a visually grounded storyline model of videos directly from weakly labeled data. The storyline model is represented as an AND-OR graph, a structure that can compactly encode storyline variation across videos. The edges in the AND-OR graph correspond to causal relationships which are represented in terms of spatio-temporal constraints. We formulate an Integer Programming framework for action recognition and storyline extraction using the storyline model and visual groundings learned from training data.
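As an informal illustration only (this is not the authors' implementation), the sketch below shows one way a storyline could be encoded as an AND-OR graph in Python: AND nodes require all of their children to occur, OR nodes encode storyline variation by requiring exactly one alternative, and causal edges are reduced here to simple temporal precedence as a stand-in for the paper's richer spatio-temporal constraints. All names (Node, Action, valid_storyline) and the toy baseball storyline are hypothetical.

# Minimal sketch of an AND-OR storyline graph (assumptions, not the paper's code).
# The paper additionally learns visual groundings and solves an Integer Program
# to jointly recognize actions and extract the storyline; this sketch only
# checks whether a given set of detected actions fits the grammar.
from dataclasses import dataclass, field

@dataclass
class Action:
    name: str
    t_start: float  # start time of the detected action (seconds)
    t_end: float    # end time of the detected action (seconds)

@dataclass
class Node:
    label: str
    kind: str                                  # "AND": all children; "OR": one child
    children: list["Node"] = field(default_factory=list)

def precedes(cause: Action, effect: Action) -> bool:
    # Causal edge reduced to a temporal constraint: cause ends before effect begins.
    return cause.t_end <= effect.t_start

def valid_storyline(node: Node, detections: dict[str, Action]) -> bool:
    """Check whether the detected actions instantiate this storyline subtree."""
    if not node.children:                      # leaf: an atomic action
        return node.label in detections
    if node.kind == "AND":
        if not all(valid_storyline(c, detections) for c in node.children):
            return False
        # Enforce causal order between consecutive leaf children (a simplification;
        # a full model would order composite subtrees as well).
        leaf_actions = [detections[c.label] for c in node.children
                        if not c.children and c.label in detections]
        return all(precedes(a, b) for a, b in zip(leaf_actions, leaf_actions[1:]))
    if node.kind == "OR":                      # storyline variation across videos
        return any(valid_storyline(c, detections) for c in node.children)
    return False

# Toy storyline: a pitch, then either a hit followed by a run, or a miss.
storyline = Node("game", "AND", [
    Node("pitch", "AND"),
    Node("outcome", "OR", [
        Node("hit_then_run", "AND", [Node("hit", "AND"), Node("run", "AND")]),
        Node("miss", "AND"),
    ]),
])

detections = {"pitch": Action("pitch", 0.0, 1.0),
              "hit": Action("hit", 1.2, 1.8),
              "run": Action("run", 2.0, 5.0)}
print(valid_storyline(storyline, detections))  # True: pitch -> hit -> run

Note that this sketch verifies a candidate assignment of detections to storyline leaves; the paper's Integer Programming framework instead searches over such assignments jointly with action recognition.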

BibTeX

@conference{Gupta-2009-113366,
  author    = {Abhinav Gupta and Praveen Srinivasan and Jianbo Shi and Larry S. Davis},
  title     = {Understanding Videos, Constructing Plots: Learning a Visually Grounded Storyline Model from Annotated Videos},
  booktitle = {Proceedings of (CVPR) Computer Vision and Pattern Recognition},
  year      = {2009},
  month     = {June},
  pages     = {2012--2019},
}