A Probabilistic Framework to Learn from Multiple Annotators with Time-Varying Accuracy - Robotics Institute Carnegie Mellon University

P. Donmez, J. Carbonell, and J. Schneider
Conference Paper, Proceedings of SIAM International Conference on Data Mining (SDM '10), pp. 826-837, April 2010

Abstract

This paper addresses the challenging problem of learning from multiple annotators whose labeling accuracy (reliability) differs across annotators and varies over time. We propose a framework based on sequential Bayesian estimation that learns each annotator's expected accuracy at every time step while simultaneously deciding which annotators to query for a label in an incremental learning setting. We develop a variant of the particle filtering method that represents the expected accuracy at each time step with a set of weighted samples and performs sequential Bayes updates. The estimated expected accuracies are then used to decide which annotators to query at the next time step. The empirical analysis shows that the proposed method predicts the true label effectively with only moderate labeling effort, yielding cleaner labels for training classifiers. It significantly outperforms a repeated-labeling baseline that queries all labelers for every example and takes the majority vote to predict the true label. Moreover, the method tracks an annotator's true accuracy quite well in the absence of gold-standard labels. These results demonstrate the strength of the proposed method in estimating the time-varying reliability of multiple annotators and in producing cleaner, higher-quality labels without extensive label queries.
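
The idea of tracking a drifting accuracy with weighted samples can be illustrated with a minimal bootstrap particle filter. This is a sketch under stated assumptions, not the paper's implementation: the function name `track_accuracy`, the Gaussian random-walk drift with standard deviation `drift_sd`, the restriction of accuracy to [0.5, 1], and the use of a 0/1 "agreement with a reference label" signal as the observation are all illustrative choices made here. The abstract does not specify the transition model, the observation model, or how annotators are selected.

```python
import random


def track_accuracy(agreements, n_particles=1000, drift_sd=0.02, seed=0):
    """Estimate one annotator's time-varying accuracy with a particle filter.

    agreements: sequence of 0/1 values; 1 means the annotator's label matched
    a reference label (hypothetical signal, e.g. a consensus label).
    Returns a list with the posterior-mean accuracy after each observation.
    """
    rng = random.Random(seed)
    # Assumed prior: the annotator is no worse than chance on a binary task.
    particles = [rng.uniform(0.5, 1.0) for _ in range(n_particles)]
    estimates = []
    for agree in agreements:
        # Predict: assumed Gaussian random-walk drift, clipped to [0.5, 1].
        particles = [
            min(1.0, max(0.5, p + rng.gauss(0.0, drift_sd))) for p in particles
        ]
        # Update: Bernoulli likelihood of the observed agreement.
        weights = [p if agree else 1.0 - p for p in particles]
        total = sum(weights)
        if total <= 0.0:
            # Degenerate case (all weights zero): fall back to uniform weights.
            weights = [1.0] * n_particles
            total = float(n_particles)
        # Posterior mean is the weighted average of the particles.
        estimates.append(sum(w * p for w, p in zip(weights, particles)) / total)
        # Resample in proportion to the weights to limit weight degeneracy.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


if __name__ == "__main__":
    # Simulated annotator whose accuracy decays linearly from 0.9 to 0.6.
    sim = random.Random(1)
    true_acc = [0.9 - 0.3 * t / 299 for t in range(300)]
    observed = [1 if sim.random() < a else 0 for a in true_acc]
    est = track_accuracy(observed)
    print(f"estimate at start: {est[0]:.2f}, at end: {est[-1]:.2f}")
```

In a multi-annotator setting, one such filter per annotator would supply the per-step accuracy estimates, and a selection rule (for example, querying only annotators whose estimate exceeds a threshold) would decide whom to ask next. That selection rule is likewise an assumption of this sketch.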

BibTeX

@conference{Donmez-2010-119817,
author = {P. Donmez and J. Carbonell and J. Schneider},
title = {A Probabilistic Framework to Learn from Multiple Annotators with Time-Varying Accuracy},
booktitle = {Proceedings of SIAM International Conference on Data Mining (SDM '10)},
year = {2010},
month = {April},
pages = {826--837},
}