1:30 pm to 12:00 am
Event Location: NSH 1507
Abstract: Facial actions speak louder than words. Facial actions can reveal a person’s emotion, intention, and physical state, and they enable a range of applications that include market research, human-robot interaction, drowsiness detection, and clinical and developmental psychology research. In this proposal, we investigate both supervised and unsupervised approaches to facial action discovery.
Supervised approaches seek to train and validate classifiers for facial action detection. This task is challenging for two major reasons. First, classifiers must generalize to previously unseen subjects that may differ markedly in behavior, facial morphology, and recording environment. To address this problem, we propose the Selective Transfer Machine (STM), a transductive learning method that personalizes generic classifiers for facial expression analysis. By personalizing the classifier, STM generalizes to unseen subjects better than state-of-the-art approaches. In addition, the STM framework can incorporate partially labeled data from a test subject.
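The core idea behind this kind of personalization can be sketched in a few lines. The snippet below is an illustrative approximation, not the STM algorithm itself: STM jointly optimizes the instance weights and the SVM, whereas here the weights come from a crude RBF-similarity heuristic against the test subject's feature mean, fed into scikit-learn's weighted SVM. The two synthetic "subjects" and all parameters are invented for illustration.

```python
# Sketch of STM's core intuition: reweight generic training samples so the
# classifier emphasizes those resembling the (unlabeled) test subject.
# NOTE: illustrative stand-in only; STM solves a joint optimization instead
# of using this fixed heuristic weighting.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Generic training pool: two "subjects" whose features are shifted apart.
X_a = rng.normal(0.0, 1.0, (100, 2)); y_a = (X_a[:, 0] > 0).astype(int)
X_b = rng.normal(4.0, 1.0, (100, 2)); y_b = (X_b[:, 0] > 4).astype(int)
X_train = np.vstack([X_a, X_b]); y_train = np.concatenate([y_a, y_b])

# Unlabeled test subject, distributed like subject A.
X_test = rng.normal(0.0, 1.0, (50, 2))
y_test = (X_test[:, 0] > 0).astype(int)

# Heuristic importance weights: RBF similarity of each training sample to the
# test subject's mean (a crude stand-in for STM's distribution-matching term).
d = np.linalg.norm(X_train - X_test.mean(axis=0), axis=1)
w = np.exp(-0.5 * d ** 2)

generic = SVC(kernel="linear").fit(X_train, y_train)
personal = SVC(kernel="linear").fit(X_train, y_train, sample_weight=w)

acc_generic = generic.score(X_test, y_test)
acc_personal = personal.score(X_test, y_test)
```

Because subject B's samples lie far from the test distribution, their weights are near zero and the personalized classifier effectively trains on subject-A-like data, which is the behavior STM formalizes.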
Second, supervised learning typically uses hand-crafted, a priori features, such as Gabor, HOG, and SIFT, together with independent approaches to classifier training (e.g., SVM). Recent research suggests that an alternative approach that integrates feature learning with an alternative learning paradigm (Deep Learning) may provide superior performance and greatly reduce or eliminate the problem of domain transfer. With more than 0.5 million newly annotated frames in our GFT and BP4D+ datasets, this is a golden era for this exploration.
This thesis will test the hypothesis that Deep Learning achieves greater accuracy than baseline SVMs and domain transfer approaches when normalized by the number of independent parameters.
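A minimal sketch of this evaluation protocol, on toy data rather than GFT/BP4D+: train an SVM baseline and a small neural network, then report accuracy alongside each model's independent parameter count so that gains cannot be attributed to capacity alone. The dataset, architecture, and scikit-learn models here are illustrative stand-ins for the thesis experiments.

```python
# Illustrative protocol: report accuracy together with parameter counts.
# Toy data and models; NOT the architectures or datasets used in the thesis.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

svm = LinearSVC(dual=False).fit(X_tr, y_tr)
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X_tr, y_tr)

# Independent parameters: weights plus biases for each model.
svm_params = svm.coef_.size + svm.intercept_.size
mlp_params = (sum(c.size for c in mlp.coefs_)
              + sum(b.size for b in mlp.intercepts_))

print(f"SVM: acc={svm.score(X_te, y_te):.3f} params={svm_params}")
print(f"MLP: acc={mlp.score(X_te, y_te):.3f} params={mlp_params}")
```

Normalizing by parameter count in this way separates "better model class" from "more parameters," which is the point of the stated hypothesis.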
A major limitation of supervised approaches, including Deep Learning, is the need for annotations, which can be time-consuming, error-prone, and limited to phenomena that observers have previously described in other contexts. We explore, for the first time, the use of unsupervised approaches for facial action discovery. In particular, we introduce the Common Event Discovery (CED) problem, which, in an unsupervised manner, discovers correlated facial actions from a set of videos. An exhaustive search for such facial actions has quartic complexity in the length of the videos and is thus impractical. This thesis proposes an efficient branch-and-bound (B&B) method that guarantees a globally optimal solution. We will evaluate CED on three human interaction tasks: video-recorded three-person social interactions, video-recorded parent-infant interactions, and motion-captured body movement. We hypothesize that CED will show moderate convergence with supervised approaches and will identify novel intra- and interpersonal action patterns hidden from supervised approaches.
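To see where the quartic complexity comes from, consider two feature sequences of length n: a common event is a pair of temporal intervals, one per sequence, so there are O(n^2) candidate intervals in each sequence and O(n^4) interval pairs to score. The sketch below enumerates them exhaustively with an invented commonality score (negative distance between interval feature means); it is a stand-in for the measures in the thesis, and the B&B method exists precisely to avoid this enumeration while keeping the global optimum.

```python
# Naive Common Event Discovery: score every pair of intervals across two
# sequences. O(n1^2 * n2^2) pairs -- quartic, hence impractical for video.
# The commonality score is an illustrative stand-in; CED's B&B search prunes
# this space with bounds instead of enumerating it.
import numpy as np

def naive_ced(F1, F2, min_len=2):
    """Exhaustively score all interval pairs; returns the best pair."""
    best, best_pair = -np.inf, None
    n1, n2 = len(F1), len(F2)
    for b1 in range(n1):
        for e1 in range(b1 + min_len, n1 + 1):
            m1 = F1[b1:e1].mean(axis=0)
            for b2 in range(n2):
                for e2 in range(b2 + min_len, n2 + 1):
                    m2 = F2[b2:e2].mean(axis=0)
                    score = -np.linalg.norm(m1 - m2)
                    if score > best:
                        best, best_pair = score, (b1, e1, b2, e2)
    return best_pair, best

rng = np.random.default_rng(1)
F1 = rng.normal(0, 1, (20, 3))
F2 = rng.normal(0, 1, (20, 3))
# Plant the same "event" (e.g., a shared expression) in both sequences.
event = rng.normal(5, 1, (5, 3))
F1[5:10] = event
F2[12:17] = event
pair, score = naive_ced(F1, F2)
```

Even at n = 20 this scores tens of thousands of interval pairs; at video scale (thousands of frames per sequence) the quartic count is infeasible, motivating the branch-and-bound formulation.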
Committee: Fernando De la Torre, Co-chair
Jeffrey F. Cohn, Co-chair
Simon Lucey
Deva Ramanan
Vladimir Pavlovic, Rutgers University