Leveraging Common Structure to Improve Prediction across Related Datasets
Conference Paper, Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI '15), pp. 4144-4145, January 2015
Abstract
In many applications, training data comes as related datasets obtained from several sources, each of which typically affects the sample distribution. Classification models learned from such data, which are expected to perform well on similar data from new sources, often suffer from bias introduced by what we call 'spurious' samples: those arising from source-specific characteristics and not representative of any other part of the data. Because standard outlier detection and robust classification usually fall short of identifying groups of spurious samples, we propose a procedure that recovers the common structure across datasets by minimizing a multi-dataset divergence metric, increasing accuracy on new datasets.
BibTeX
@conference{Barnes-2015-121852,
author = {Matt Barnes and Nick Gisolfi and Madalina Fiterau and Artur Dubrawski},
title = {Leveraging Common Structure to Improve Prediction across Related Datasets},
booktitle = {Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI '15)},
year = {2015},
month = {January},
pages = {4144--4145},
}
Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.