Inside-out: First Person Vision for Personalized Intelligence - Robotics Institute, Carnegie Mellon University

VASC Seminar

Jianbo Shi, Professor, University of Pennsylvania
Monday, April 18
11:00 am to 12:00 pm
Inside-out: First Person Vision for Personalized Intelligence

Event Location: Gates 2109
Bio: Jianbo Shi studied Computer Science and Mathematics as an undergraduate at Cornell University, where he received his B.A. in 1994. He received his Ph.D. in Computer Science from the University of California at Berkeley in 1998. He joined The Robotics Institute at Carnegie Mellon University in 1999 as a research faculty member, where he led the Human Identification at a Distance (HumanID) project, developing vision techniques for human identification and activity inference. In 2003 he joined the University of Pennsylvania, where he is currently a Professor of Computer and Information Science. In 2007 he was awarded the Longuet-Higgins Prize for his work on Normalized Cuts. His current research focuses on first person human behavior analysis and image recognition-segmentation. His other research interests include image/video retrieval, 3D vision, and vision-based desktop computing. His long-term interests center on the broader area of machine intelligence: he wishes to develop a “visual thinking” module that allows computers not only to understand the environment around us, but also to achieve cognitive abilities such as machine memory and learning.

Abstract: A first person camera placed on the person’s head captures candid moments in our lives, providing detailed visual data of how we interact with people, objects, and scenes. It reveals our future intentions and momentary visual sensorimotor behaviors. With first person vision, can we build a computational model for personalized intelligence that predicts what we see and how we act by “putting yourself in her/his shoes”?

We provide three examples. (1) At the physical level, we predict the wearer’s intent in the form of the forces and torques that control his or her movements. Our model integrates visual scene semantics, 3D reconstruction, and inverse optimal control to compute the active forces (pedaling and braking while biking) and the experienced passive forces (gravity, air drag, and friction) in a first person sport video. (2) At the spatial scene level, we predict plausible future trajectories of ego-motion. The predicted paths avoid obstacles, move between objects, and even turn around a corner into the unseen space behind objects. (3) At the object level, we study the holistic correlation of visual attention with motor action by introducing “action-objects” associated with seeing and touching actions. Such action-objects exhibit characteristic 3D spatial distances and orientations with respect to the person, which allow us to build a predictive model using EgoNet. We demonstrate that we can predict momentary visual attention and motor actions for first person videos without gaze tracking or tactile sensing.
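
For readers curious how the force estimate in example (1) might look computationally, the following is a minimal, purely illustrative Python sketch. It assumes the wearer’s 3D ego-trajectory has already been recovered (e.g. from 3D reconstruction) and applies a simple Newton’s-second-law decomposition into passive and residual “active” forces. The function decompose_forces, the mass, and the drag and friction coefficients are hypothetical placeholders; the sketch omits the visual scene semantics and the inverse optimal control that the speaker’s actual model integrates.

import numpy as np

def decompose_forces(positions, dt, mass=80.0, drag_coeff=0.4, friction_coeff=0.01):
    """Split the net force along a 3D ego-trajectory into modeled passive terms
    (gravity, air drag, rolling friction) and a residual 'active' term."""
    positions = np.asarray(positions, dtype=float)        # (T, 3) camera positions in meters
    velocity = np.gradient(positions, dt, axis=0)         # finite-difference velocity (m/s)
    accel = np.gradient(velocity, dt, axis=0)             # finite-difference acceleration (m/s^2)

    gravity = np.tile([0.0, 0.0, -9.81 * mass], (len(positions), 1))
    speed = np.linalg.norm(velocity, axis=1, keepdims=True)
    drag = -drag_coeff * speed * velocity                 # quadratic air drag, opposes motion
    friction = -friction_coeff * mass * 9.81 * np.sign(velocity)  # crude rolling-friction term

    passive = gravity + drag + friction
    # Residual force not explained by the passive terms; in a biking video this
    # lumps together pedaling, braking, and the ground reaction force.
    active = mass * accel - passive
    return active, passive

# Toy usage: a straight-line ride that accelerates for 5 s and then coasts.
t = np.linspace(0.0, 10.0, 101)
x = 0.5 * np.minimum(t, 5.0) ** 2 + 5.0 * np.maximum(t - 5.0, 0.0)
path = np.stack([x, np.zeros_like(t), np.zeros_like(t)], axis=1)
active, passive = decompose_forces(path, dt=t[1] - t[0])
print("active force at t=1s:", active[10])
print("passive force at t=1s:", passive[10])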

This is joint work with Hyun Soo Park.