
How much training data for facial action unit detection?

Conference Paper, Proceedings of the 11th IEEE International Conference and Workshops on Automatic Face & Gesture Recognition (FG '15), May 2015

Abstract

By systematically varying the number of subjects and the number of frames per subject, we explored the influence of training set size on appearance- and shape-based approaches to facial action unit (AU) detection. Digital video and expert coding of spontaneous facial activity from 80 subjects (over 350,000 frames) were used to train and test support vector machine classifiers. Appearance features were shape-normalized SIFT descriptors, and shape features were 66 facial landmarks. Ten-fold cross-validation was used in all evaluations. The number of subjects and the number of frames per subject differentially affected appearance- and shape-based classifiers. For appearance features, which are high-dimensional, increasing the number of training subjects from 8 to 64 incrementally improved performance, regardless of the number of frames taken from each subject (ranging from 450 to 3,600). In contrast, for shape features, increases in the number of training subjects and frames were associated with mixed results. In summary, maximal performance was attained using appearance features from large numbers of subjects with as few as 450 frames per subject. These findings suggest that varying the number of subjects, rather than the number of frames per subject, is the more efficient route to high performance.
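The experimental protocol the abstract describes can be summarized in a short sketch. Below is a minimal, illustrative Python version (not the authors' code): subject-independent 10-fold cross-validation, with a fixed number of subjects and a fixed number of frames per subject subsampled from each training fold before fitting a linear SVM. The array names (X, y, groups), the use of scikit-learn's LinearSVC, the C value, and the F1 metric are all assumptions made for illustration, not details taken from the paper.

# Illustrative sketch only, not the authors' implementation. Assumes:
#   X      -- (n_frames_total, n_features) precomputed SIFT or landmark features
#   y      -- (n_frames_total,) binary per-frame AU labels
#   groups -- (n_frames_total,) subject id for each frame
import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

def subsample(X, y, groups, n_subjects, n_frames, rng):
    """Pick n_subjects at random, then up to n_frames at random from each."""
    chosen = rng.choice(np.unique(groups), size=n_subjects, replace=False)
    idx = np.concatenate([
        rng.choice(np.flatnonzero(groups == s),
                   size=min(n_frames, np.sum(groups == s)), replace=False)
        for s in chosen
    ])
    return X[idx], y[idx]

def evaluate(X, y, groups, n_subjects, n_frames, n_splits=10):
    """Mean F1 over subject-independent folds for one (subjects, frames) cell.
    n_subjects must not exceed the number of subjects in a training fold."""
    scores = []
    for train_idx, test_idx in GroupKFold(n_splits).split(X, y, groups):
        X_tr, y_tr = subsample(X[train_idx], y[train_idx],
                               groups[train_idx], n_subjects, n_frames, rng)
        clf = LinearSVC(C=1.0).fit(X_tr, y_tr)  # C is a placeholder choice
        scores.append(f1_score(y[test_idx], clf.predict(X[test_idx])))
    return float(np.mean(scores))

Sweeping evaluate over n_subjects in {8, 16, 32, 64} and n_frames in {450, ..., 3600} would reproduce the kind of grid the abstract reports, under these assumptions.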

BibTeX

@conference{Girard-2015-119670,
author = {Jeffrey M. Girard and Jeffrey F. Cohn and Laszlo A. Jeni and Simon Lucey and Fernando De la Torre},
title = {How much training data for facial action unit detection?},
booktitle = {Proceedings of the 11th IEEE International Conference and Workshops on Automatic Face \& Gesture Recognition (FG '15)},
year = {2015},
month = {May},
}