Automatic detection of human affective behavior in dyadic conversations
Abstract
Within the past decade, major strides have been made in automatic emotion detection. Most research has focused on frame-level detection of emotion or facial action descriptors (i.e., action units in the Facial Action Coding System). More recently, attention has turned to the prediction of session-level descriptors, such as depression severity, from automated analysis of emotion. This thesis addresses two challenges. The first is the detection of emotion descriptors when an unknown latency exists between the onset of an event and its time stamp. Latency of this type occurs when continuous manual annotation is performed without stopping and reviewing the video to determine onsets and offsets with temporal precision. This problem has been addressed to a limited extent in the continuous annotation of valence and arousal, but never before for coding multiple categorical descriptors (e.g., happy, angry, sad). The second challenge is the detection of session-level characteristics (e.g., gender) from video. Session-level descriptors pose a unique challenge for machine learning because the total amount of data per person is limited while each individual recording is relatively long (an average of 20 minutes in our data). This is difficult because temporal models such as long short-term memory (LSTM) networks are poorly suited to long videos. To address these challenges, we pursue both hand-crafted and deep-learning approaches.
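To make the latency problem concrete, the sketch below shows one generic way to estimate an unknown annotation lag: shift the continuous labels earlier in time against per-frame detector scores and keep the shift with the highest agreement. This is an illustration only, not the method developed in the thesis; the function name, the correlation criterion, and the lag range are our own assumptions.

import numpy as np

def best_label_lag(frame_scores, labels, max_lag=150):
    # Hypothetical illustration (not the thesis's method): estimate
    # annotation latency in frames by shifting the labels earlier in
    # time and keeping the lag that best agrees with per-frame scores.
    #   frame_scores: shape (T,), detector confidence for one class
    #   labels:       shape (T,), 0/1 continuous annotations
    #   max_lag:      largest latency considered, e.g. 5 s at 30 fps
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        shifted = labels[lag:]               # labels moved earlier by `lag`
        scores = frame_scores[:len(shifted)]
        corr = np.corrcoef(scores, shifted)[0, 1]
        if np.isnan(corr):                   # constant segment; skip
            continue
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag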
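For the second challenge, one common workaround when sequence models struggle with ~20-minute videos is to collapse frame-level features into fixed-length summary statistics before session-level classification. The sketch below illustrates that general idea under our own assumptions (function name and choice of statistics are hypothetical); it is not the architecture proposed in the thesis.

import numpy as np

def pool_session_features(frame_feats):
    # Hypothetical illustration: collapse one long video (T x D
    # per-frame features, e.g. AU intensities) into a single
    # fixed-length session vector via summary statistics, avoiding
    # an LSTM over the full sequence.
    return np.concatenate([
        frame_feats.mean(axis=0),                # average level
        frame_feats.std(axis=0),                 # variability
        np.percentile(frame_feats, 90, axis=0),  # near-peak intensity
    ])

A session-level classifier (for example, an SVM) could then be trained on the pooled vectors, one per video.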
BibTeX
@mastersthesis{Jasani-2019-117146,
  author   = {Bhavan Jasani},
  title    = {Automatic detection of human affective behavior in dyadic conversations},
  year     = {2019},
  month    = {August},
  school   = {Carnegie Mellon University},
  address  = {Pittsburgh, PA},
  number   = {CMU-RI-TR-19-53},
  keywords = {Human affective behaviour, dyadic conversation, behaviour science, computer vision, machine learning},
}