Multimodal Analysis, Recognition and Synthesis of Expressive Human Behaviors - Robotics Institute, Carnegie Mellon University

VASC Seminar

Carlos Busso, Assistant Professor, The University of Texas at Dallas
Monday, June 30
3:00 pm to 4:00 pm
Multimodal Analysis, Recognition and Synthesis of Expressive Human Behaviors

Event Location: NSH 1507
Bio: Carlos Busso is an Assistant Professor in the Electrical Engineering Department of The University of Texas at Dallas (UTD). He received his B.S. (2000) and M.S. (2003) degrees with high honors in electrical engineering from the University of Chile, Santiago, Chile, and his Ph.D. (2008) in electrical engineering from the University of Southern California (USC), Los Angeles, USA. He was selected by the School of Engineering of Chile as the best electrical engineer to graduate in 2003 across Chilean universities. At USC, he received a Provost Doctoral Fellowship from 2003 to 2005 and a Fellowship in Digital Scholarship from 2007 to 2008. At UTD, he leads the Multimodal Signal Processing (MSP) laboratory [http://msp.utdallas.edu]. He received the Hewlett Packard Best Paper Award at IEEE ICME 2011 (with J. Jain) and is a co-author of the winning paper of the Classifier Sub-Challenge event at the Interspeech 2009 Emotion Challenge. His research interests are in digital signal processing, speech and video processing, and multimodal interfaces. His current research covers the broad areas of affective computing, multimodal human-machine interfaces, modeling and synthesis of verbal and nonverbal behaviors, sensing human interaction, in-vehicle active safety systems, and machine learning methods for multimodal processing.

Abstract: The verbal and non-verbal channels of human communication are internally and intricately connected. As a result, gestures and speech present high levels of correlation and coordination. This relationship is greatly affected by the linguistic and emotional content of the message being communicated. The interplay is observed across the different communication channels, including various aspects of speech, facial expressions, and movements of the hands, head, and body. Understanding and modeling this complex interplay has direct implications for the recognition and synthesis of expressive human behaviors. For recognition, building robust emotion models requires careful consideration to compensate for the variability introduced by lexical and speaker information. This presentation will discuss strategies to normalize acoustic and facial features, improving system performance. We quantify the dependency between facial or acoustic features and communication traits (i.e., speaker, lexical, and emotional factors) with a metric that estimates the uncertainty reduction in the trajectory models when a given trait is considered. The analysis provides important insights into the dependency between the features and the aforementioned factors. Likewise, we present neutral reference models built with functional data analysis to contrast expressive behaviors in the fundamental frequency. For synthesis, we generate expressive behaviors by modeling the interrelation between speech and head and eyebrow motion. We propose to synthesize natural head motion and eyebrow sequences from acoustic prosodic features by sampling from trained Dynamic Bayesian Networks (DBNs). Our comparison experiments show that the synthesized head motions produce conversational agents with human-like behaviors that are tightly coupled with speech.
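The dependency metric mentioned in the abstract is framed as an uncertainty reduction: how much the entropy of a facial or acoustic feature drops once a trait (speaker, lexical, or emotional factor) is known. As a rough illustration only (the talk's actual metric operates on trajectory models, not raw histograms), a minimal mutual-information-style sketch might look like the following; the function name, binning choices, and toy data are assumptions made for this sketch.

import numpy as np


def uncertainty_reduction(feature, trait, n_bins=20):
    """Estimate I(X; T) = H(X) - H(X | T) with histogram entropies:
    how much knowing a categorical trait (speaker, lexical, or emotional
    label) reduces the uncertainty of a feature (e.g., F0)."""
    feature = np.asarray(feature, dtype=float)
    trait = np.asarray(trait)
    edges = np.histogram_bin_edges(feature, bins=n_bins)

    def entropy(x):
        counts, _ = np.histogram(x, bins=edges)
        p = counts / counts.sum()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    h_x = entropy(feature)                      # H(X)
    h_x_given_t = 0.0                           # H(X | T), weighted by label frequency
    for label in np.unique(trait):
        mask = trait == label
        h_x_given_t += mask.mean() * entropy(feature[mask])
    return h_x - h_x_given_t                    # bits of uncertainty removed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy example: a pitch-like feature whose mean shifts with an emotion label.
    emotion = rng.integers(0, 4, size=5000)
    pitch = 120 + 15 * emotion + rng.normal(0, 10, size=5000)
    print(f"Uncertainty reduction: {uncertainty_reduction(pitch, emotion):.2f} bits")

A larger value means the trait explains more of the feature's variability, which is the sense in which the abstract compares speaker, lexical, and emotional factors.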
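For the synthesis side, the abstract describes sampling head motion conditioned on prosodic features from trained DBNs. The sketch below substitutes a much simpler first-order linear-Gaussian transition model (itself a special case of a DBN) for the trained networks described in the talk; all names and parameter values (A, B, noise_std, the toy prosody contours) are illustrative assumptions, not the presenter's model.

import numpy as np


def sample_head_motion(prosody, noise_std=0.05, seed=0):
    """Sample a head-pose trajectory (pitch, yaw, roll) driven by prosodic
    features with a first-order linear-Gaussian transition:
        pose[t] = A @ pose[t-1] + B @ prosody[t] + noise.
    Simplified stand-in for sampling a trained DBN.

    prosody : (T, 2) array, e.g., normalized F0 and energy per frame.
    Returns : (T, 3) array of head rotations (arbitrary units)."""
    rng = np.random.default_rng(seed)
    prosody = np.asarray(prosody, dtype=float)
    T = len(prosody)
    A = np.eye(3) * 0.9                                   # smooth temporal dynamics
    B = rng.normal(0, 0.1, size=(3, prosody.shape[1]))    # speech-to-motion coupling
    pose = np.zeros((T, 3))
    for t in range(1, T):
        drive = B @ prosody[t]                            # speech-driven component
        pose[t] = A @ pose[t - 1] + drive + rng.normal(0, noise_std, size=3)
    return pose


if __name__ == "__main__":
    # Toy prosody: slowly varying F0 and energy contours.
    t = np.linspace(0, 10, 500)
    prosody = np.column_stack([np.sin(0.5 * t), np.abs(np.sin(1.3 * t))])
    motion = sample_head_motion(prosody)
    print(motion.shape)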