3:00 pm to 4:00 pm
Event Location: Newell Simon Hall 1507
Bio: Yin Li is currently a doctoral candidate in the School of Interactive Computing at the Georgia Institute of Technology. His research interests lie at the intersection of computer vision and mobile health. Specifically, he creates methods and systems to automatically analyze first person videos, an area known as First Person Vision (FPV). He is particularly interested in recognizing the camera wearer’s activities and developing FPV for health care applications. He is the co-recipient of the best student paper awards at MobiHealth 2014 and IEEE Face & Gesture 2015. His work has been covered by MIT Tech Review, WIRED UK and New Scientist.
Abstract: Advances in sensor miniaturization, low-power computing, and battery life have enabled the first generation of mainstream wearable cameras. Millions of hours of video have been captured by these devices, creating a record of our daily visual experiences at an unprecedented scale. This has created a major opportunity to develop new capabilities and products based on First Person Vision (FPV), the automatic analysis of videos captured from wearable cameras. Meanwhile, vision technology is at a tipping point. Major progress has been made over the last few years in both visual recognition and 3D reconstruction. The stage is set for a grand challenge of activity recognition in FPV. My research focuses on understanding naturalistic daily activities of the camera wearer in FPV to advance both computer vision and mobile health.
In the first part of this talk, I will demonstrate that first person video has the unique property of encoding the intentions and goals of the camera wearer. I will introduce a set of first person visual cues that capture the wearer’s intent and can be used to predict their point of gaze and the actions they are performing during activities of daily living. Our methods are demonstrated using a benchmark dataset that I helped to create. In the second part, I will describe a novel approach to measuring children’s social behaviors during naturalistic face-to-face interactions with an adult partner who is wearing a camera. I will show that first person video can support fine-grained coding of gaze (differentiating looks to eyes vs. face), which is valuable for autism research. Going further, I will present a method for automatically detecting moments of eye contact. Finally, I will briefly cover my work on cross-modal learning using deep models. This is joint work with Zhefan Ye, Sarah Edmunds, Dr. Alireza Fathi, Dr. Agata Rozga and Dr. Wendy Stone.