10:00 am to 12:00 pm
Event Location: NSH 1507
Abstract: Action recognition techniques rely heavily on well-chosen features, such as trajectory-based motion descriptors, to make the most of relatively scarce video training data. Typically these features must be hand-selected, because the very paucity of suitably annotated data that makes feature selection critical also restricts the degree to which those features can be learned directly. However, low-quality, coarsely annotated data is readily available in the form of tagged videos on websites such as YouTube.com, and public motion capture databases, combined with simple graphics techniques, make it possible to generate vast amounts of synthetic human action videos with plausible motion. The difficulty lies in taking advantage of these types of data: YouTube clips are coarsely and inconsistently annotated, while synthetic data differs from real data in many ways, both overt and subtle. These data sources should not be treated as substitute data for any particular action recognition task, but rather as sources of related tasks in the broader domain of action recognition. This proposal seeks to exploit these plentiful related action recognition tasks to improve performance on a target task, by using them to select or generate good feature representations.
While many approaches have attempted to learn or select ‘generally good’ features that perform well when shared across many tasks, we propose instead to use collaborative filtering techniques on these related tasks to recommend features tailored specifically to the target task. These recommendations are made by evaluating, or rating, a small subset of features on the target task, and then using that small set of ratings, together with the ratings of features on a large set of synthetic or coarsely labeled real tasks, to predict the ratings of the unevaluated features on the target task. Additionally, we propose a layered bag-of-words representation that enables us to exploit the detailed annotations (pixel-level body part labels) available in synthetic human action data.
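The abstract does not specify the collaborative filtering model, so the following is only a rough sketch of the idea it describes: complete a feature-by-task rating matrix, observed densely on the related tasks and sparsely on the target task, via low-rank matrix factorization, then rank the unevaluated features by their predicted target-task ratings. All names (factorize_ratings, the probe set, matrix sizes) are hypothetical illustrations, not the proposal's actual method.

import numpy as np

def factorize_ratings(R, mask, rank=5, lam=0.1, iters=200, lr=0.01, seed=0):
    """Complete a feature-by-task rating matrix by low-rank factorization.

    R    : (n_features, n_tasks) ratings; entries where mask == 0 are unknown.
    mask : 1 where a feature was actually evaluated on a task, 0 otherwise.
    Returns the completed rating matrix U @ V.T.
    """
    rng = np.random.default_rng(seed)
    n_feat, n_task = R.shape
    U = 0.1 * rng.standard_normal((n_feat, rank))
    V = 0.1 * rng.standard_normal((n_task, rank))
    for _ in range(iters):
        E = mask * (U @ V.T - R)          # reconstruction error on observed entries only
        U -= lr * (E @ V + lam * U)       # gradient step with L2 regularization
        V -= lr * (E.T @ U + lam * V)
    return U @ V.T

# Toy setup: 50 candidate features rated on 20 related (synthetic or coarsely
# labeled) tasks, plus a target task where only 5 probe features were evaluated.
rng = np.random.default_rng(1)
n_feat, n_related = 50, 20
R = np.zeros((n_feat, n_related + 1))
mask = np.zeros_like(R)
R[:, :n_related] = rng.random((n_feat, n_related))   # dense related-task ratings
mask[:, :n_related] = 1.0
probe = rng.choice(n_feat, size=5, replace=False)    # small probe set on the target task
R[probe, -1] = rng.random(5)
mask[probe, -1] = 1.0

R_hat = factorize_ratings(R, mask)
recommended = np.argsort(-R_hat[:, -1])[:10]         # top-10 predicted features for the target
print("recommended feature indices:", recommended)

Under these assumptions, the small probe set plays the role of a new user's few explicit ratings in a recommender system, and the related tasks supply the collaborative signal that fills in the rest of the target-task column.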
Committee: Martial Hebert, Co-chair
Rahul Sukthankar, Co-chair
Yaser Sheikh
Ivan Laptev, INRIA