Long-Term Activity Forecasting using First-Person Vision - Robotics Institute Carnegie Mellon University

Conference Paper, Proceedings of 13th Asian Conference on Computer Vision (ACCV '16), pp. 346 - 360, November, 2016

Abstract

Long-term activity forecasting deals with the problem of predicting how an agent will complete a full activity, defined as a continuous trajectory and a discrete sequence of sub-actions. While previous data-driven methods only dealt with forecasting 2D trajectories, we present a method that leverages common-sense prior knowledge and minimal data. In order to forecast the trajectories, we learn a policy function that maps from states to actions the agent should perform next. Through the use of deep reinforcement learning, our method is able to learn a highly non-linear mapping from agent states to actions. We develop the first forecasting framework that uses ego-centric video input, which is an optimal vantage point for understanding human activities over large spaces. Given an annotated first-person video sequence for the activity, we construct a 3D point cloud of the environment and activity paths through 3D space. Based on a limited number of examples, we use reinforcement learning to derive a policy for the entire environment, even for areas that have never been visited during the demonstrated examples. We explore the use of deep reinforcement learning to recover a direct mapping from environmental features to the best action. Our approach makes it possible to combine a high-dimensional continuous state (namely the local point cloud density surrounding the agent) with a discrete state portion (the action stage of an activity) into a single state for behavior forecasting. The result is a policy that generalizes very well from only a few activity samples. We validate our approach on our First-Person Office Behavior Dataset and show that our method of encoding more prior knowledge leads to an increase in forecasting accuracy. We also demonstrate that the deep reinforcement learning approach is able to achieve higher forecasting accuracy than the traditional alternatives.
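
As a rough illustration of the combined state described in the abstract, the sketch below concatenates a continuous local-density vector with a one-hot encoding of the discrete action stage, then picks the highest-scoring action. This is only a minimal sketch under stated assumptions: the function names, the feature layout, and the linear scoring table (standing in for a deep network) are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def one_hot(index: int, size: int) -> List[float]:
    """Encode a discrete action stage as a one-hot vector."""
    if not 0 <= index < size:
        raise ValueError("stage index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def build_state(local_density: Sequence[float], stage: int, num_stages: int) -> List[float]:
    """Concatenate continuous density features with the one-hot stage.

    Assumption: the density features are already a flat vector; how the
    paper extracts them from the point cloud is not specified here.
    """
    return list(local_density) + one_hot(stage, num_stages)


def greedy_action(state: Sequence[float], weights: Sequence[Sequence[float]]) -> int:
    """Return the index of the action with the highest score.

    Assumption: one weight row per action, i.e. a linear stand-in for
    the learned state-to-action mapping.
    """
    scores = [sum(w * s for w, s in zip(row, state)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical example: two density features, three activity stages, two actions.
state = build_state([0.2, 0.5], stage=1, num_stages=3)
weights = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # action 0 responds to the first density feature
    [0.0, 0.0, 0.0, 1.0, 0.0],  # action 1 responds to being in stage 1
]
action = greedy_action(state, weights)
```

Because the stage is part of the state vector, the same location can map to different actions depending on how far the activity has progressed.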

BibTeX

@conference{Bokhari-2016-109798,
author = {Syed Zahir Bokhari and Kris M. Kitani},
title = {Long-Term Activity Forecasting using First-Person Vision},
booktitle = {Proceedings of 13th Asian Conference on Computer Vision (ACCV '16)},
year = {2016},
month = {November},
pages = {346--360},
}