
VASC Seminar

Yezhou Yang, Postdoctoral Research Associate, University of Maryland Institute for Advanced Computer Studies
Monday, May 9
3:00 pm to 4:00 pm
Human Manipulation Action Understanding for Cognitive Robots

Event Location: Newell Simon Hall 1507
Bio: Dr. Yezhou Yang is a Postdoctoral Research Associate at the Computer Vision Lab and the Automation, Robotics and Cognition (ARC) Lab at the University of Maryland Institute for Advanced Computer Studies, where he works with his Ph.D. advisors, Prof. Yiannis Aloimonos and Dr. Cornelia Fermuller. His main interests lie in Cognitive Robotics, Computer Vision, and Robot Vision, especially exploring visual primitives in human action understanding from visual input, grounding them in natural language, and performing high-level reasoning over these primitives for intelligent robots. He was a recipient of the Qualcomm Innovation Fellowship in 2011, the UMD CS Department Dean’s Fellowship award, and the Microsoft Research Asia Young Researcher Scholarship in 2009. He received a B.A. in Computer Science from Zhejiang University in 2010 and a Ph.D. in Computer Science from the University of Maryland, College Park in 2015.

Abstract: Modern intelligent agents will need to learn the manipulation actions that humans perform. They will need to recognize these actions when they see them, and they will need to perform these actions themselves. The lesson from the findings on mirror neurons is that the two processes, interpreting visually observed actions and generating actions, should share the same underlying cognitive process. The talk will present a cognitive system that interprets human manipulation actions from perceptual information (image and depth data) and consists of perceptual modules and reasoning modules that interact with each other. The talk focuses on two core problems at the heart of manipulation action understanding: a) grounding the relevant information about actions in perception (the perception-action integration problem), and b) organizing perceptual and high-level symbolic information for interpreting the actions (the sequencing problem). At the high level, actions are represented with the Manipulation Action Context-free Grammar (MACFG), a syntactic grammar with associated parsing algorithms, which organizes actions as sequences of sub-events. Each sub-event is described by the hand (including grasp type), the movements (actions), and the objects and tools involved; the relevant information about these quantities is obtained from biologically inspired perception modules, which recognize the hand grasp, manipulation action consequences, and object-wise spatial primitives. Furthermore, a probabilistic semantic parsing framework based on Combinatory Categorial Grammar (CCG) theory is adopted to model the semantic meaning of human manipulation actions. By analogy, understanding manipulation actions is like understanding language, while executing them is like generating language. Experiments on two tasks, 1) a robot observing people performing manipulation actions, and 2) the robot then executing manipulation actions accordingly, validate the formalism.
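
To make the grammar idea concrete, the following is a minimal sketch, not the MACFG from the talk, of how a sequence of perceptual symbols (hand, grasp type, movement, object) could be parsed by an ordinary context-free grammar using Python's NLTK library. All production rules and terminal symbols here are illustrative assumptions chosen for the example.

    import nltk

    # Toy grammar in the spirit of the talk: an action decomposes into a
    # hand phrase (hand + grasp type), a movement, and an object. These
    # rules are invented for illustration, not taken from the MACFG paper.
    grammar = nltk.CFG.fromstring("""
    ACTION -> HANDPHRASE MOVEMENT OBJECT
    HANDPHRASE -> HAND GRASP
    HAND -> 'lefthand' | 'righthand'
    GRASP -> 'powergrasp' | 'precisiongrasp'
    MOVEMENT -> 'cut' | 'pour' | 'stir'
    OBJECT -> 'cucumber' | 'bowl'
    """)

    # In the cognitive system described above, these tokens would come from
    # perception modules (grasp recognition, action consequence detection,
    # object recognition); here they are hand-written stand-ins.
    tokens = ['righthand', 'powergrasp', 'cut', 'cucumber']

    # Parse the observed symbol sequence into a sub-event tree.
    parser = nltk.ChartParser(grammar)
    for tree in parser.parse(tokens):
        tree.pretty_print()

Running the sketch prints a parse tree that groups the grasp with the hand and attaches the movement and object, which is the sense in which a grammar can organize an observed action into structured sub-events.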