Bayesian Optimal Active Search on Graphs - Robotics Institute Carnegie Mellon University

Bayesian Optimal Active Search on Graphs

R. Garnett, Y. Krishnamurthy, D. Wang, J. Schneider, and R. Mann
Workshop Paper, KDD '11 9th Workshop on Mining and Learning with Graphs (MLG '11), August, 2011

Abstract

In many classification problems, including numerous examples on modern large-scale graph datasets, a large quantity of unlabeled data are available. The cost of obtaining a label for such data can be very expensive, for example when human intervention is required. Both the semi-supervised and active learning communities have approached such problems. The former focuses on how to aid the classification task by exploiting the distribution of unlabeled data, and the latter addresses the problem of choosing the most useful labels to acquire, that would subsequently minimize the cost incurred in the pursuit of the learning goal. Various algorithms have been proposed by these communities that learn from a large dataset with few labeled data, and graph datasets have in particular received a lot of attention for large-scale applications such as collaborative filtering for recommendation systems and link prediction in social networks. Here, we focus on a specific active binary-classification problem, where the goal is to find the members of a particular class as quickly as possible. In some situations, such as fraud detection or the investigative analysis of potentially criminal social networks, only the members of the malicious class are sought, whereas obtaining labels for points in the positive class is only useful for the purpose of facilitating that task. We derive the Bayesian optimal policy for this decision problem and test our proposed algorithm on two large-scale graph datasets. The optimal policy can be implemented in parallel, and it is our hope that the described algorithm can scale to even larger graphs.

BibTeX

@workshop{Garnett-2011-119840,
author = {R. Garnett and Y. Krishnamurthy and D. Wang and J. Schneider and R. Mann},
title = {Bayesian Optimal Active Search on Graphs},
booktitle = {Proceedings of KDD '11 9th Workshop on Mining and Learning with Graphs (MLG '11)},
year = {2011},
month = {August},
}