Understanding Social and Physical Interactions from First Person Cameras - Robotics Institute Carnegie Mellon University

VASC Seminar

Hyun Soo Park, Assistant Professor, University of Minnesota
Wednesday, August 17
3:00 pm to 4:00 pm
Understanding Social and Physical Interactions from First Person Cameras

Event Location: Newell Simon Hall 1507
Bio: Hyun Soo Park is an Assistant Professor in the Department of Computer Science and Engineering at the University of Minnesota. He is interested in understanding human visual sensorimotor behaviors from first person cameras. Prior to joining UMN, he was a Postdoctoral Fellow working with Jianbo Shi at the University of Pennsylvania. He earned his Ph.D. from Carnegie Mellon University under the supervision of Yaser Sheikh.

Abstract: A first person video records not only what is out in the environment but also what is in our heads (intention and attention) at that moment, as expressed through social and physical interactions. This internal state is invisible, but it can be revealed through fixation, camera motion, and visual semantics. In this talk, I will present a computational model that decodes our intention and attention from first person cameras when we interact with (1) scenes and (2) people.

A person exerts his or her intention by applying physical force and torque to scenes and objects, which in turn produces visual sensation. We leverage this first person visual sensation to precisely compute the force and torque experienced by the camera wearer, integrating visual semantics, 3D reconstruction, and inverse optimal control. Such visual sensation also allows us to associate the present with our past experiences, which provide a strong cue for predicting future activities.

When interacting with other people, social attention is a medium that governs group behavior, e.g., how people form a group and how they move. We learn the geometric and visual relationship between group behavior and the social attention measured from first person cameras. Based on the learned relationship, we derive a predictive model that localizes social attention from a third person view.