Multi-Label Output Codes using Canonical Correlation Analysis

Y. Zhang and J. Schneider

Conference Paper, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS '11), Vol. 15, pp. 873-882, April 2011

Abstract

Traditional error-correcting output codes (ECOCs) decompose a multi-class classification problem into many binary problems. Although it seems natural to use ECOCs for multi-label problems as well, doing so naively creates issues related to: the validity of the encoding, the efficiency of the decoding, the predictability of the generated codeword, and the exploitation of the label dependency. Using canonical correlation analysis, we propose an error-correcting code for multi-label classification. Label dependency is characterized as the most predictable directions in the label space, which are extracted as canonical output variates and encoded into the codeword. Predictions for the codeword define a graphical model of labels with both Bernoulli potentials (from classifiers on the labels) and Gaussian potentials (from regression on the canonical output variates). Decoding is performed by efficient mean-field approximation. We establish connections between the proposed code and research areas such as compressed sensing and ensemble learning. Some of these connections contribute to better understanding of the new code, and others lead to practical improvements in code design. In our empirical study, the proposed code leads to substantial improvements compared to various competitors in music emotion classification and outdoor scene recognition.
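
As a rough illustration of the encoding and decoding ideas summarized above, the following NumPy sketch may help. It is not the authors' implementation. It assumes that the codeword is the original label vector followed by its projections onto the canonical output directions, that the Gaussian potentials share a known variance per direction, and that a small ridge term `reg` keeps the covariance matrices invertible. The function names (`cca_output_directions`, `encode`, `mean_field_decode`) and the parameters `reg`, `sigma2`, and `n_iters` are hypothetical and chosen only for this example. The mean-field update is derived for this assumed model, in which independent Bernoulli factors on the labels are multiplied by Gaussian factors on the projections.

```python
import numpy as np


def cca_output_directions(X, Y, n_components, reg=1e-3):
    """Regularized CCA between features X (n x p) and labels Y (n x q).

    Returns V (q x n_components), the label-space directions, and the
    squared canonical correlations, sorted from most to least predictable.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten the label side with the Cholesky factor of Cyy (Cyy = L L^T),
    # which turns the generalized eigenproblem into a symmetric one.
    L = np.linalg.cholesky(Cyy)
    Linv = np.linalg.inv(L)
    M = Linv @ Cxy.T @ np.linalg.solve(Cxx, Cxy) @ Linv.T
    M = (M + M.T) / 2  # remove round-off asymmetry before eigh
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1][:n_components]
    V = Linv.T @ evecs[:, order]
    return V, np.clip(evals[order], 0.0, 1.0)


def encode(Y, V):
    """Assumed codeword layout: original labels, then canonical output variates."""
    return np.hstack([Y, Y @ V])


def mean_field_decode(p, m, V, sigma2, n_iters=50):
    """Coordinate-wise mean-field updates for the assumed model.

    p      : (q,) classifier probabilities for each label (Bernoulli potentials)
    m      : (d,) regression predictions for the variates (Gaussian means)
    V      : (q, d) canonical output directions
    sigma2 : scalar or (d,) assumed variances of the Gaussian potentials
    Returns approximate marginals q_j = P(y_j = 1).
    """
    p = np.clip(np.asarray(p, dtype=float), 1e-6, 1 - 1e-6)
    logit = np.log(p) - np.log1p(-p)
    q = p.copy()
    for _ in range(n_iters):
        for j in range(len(q)):
            # Expected projection of all labels except label j.
            mu_rest = V.T @ q - V[j] * q[j]
            # Expected change in the Gaussian energy when y_j goes from 0 to 1.
            delta = (V[j] ** 2 + 2 * V[j] * (mu_rest - m)) / (2 * sigma2)
            q[j] = 1.0 / (1.0 + np.exp(-(logit[j] - delta.sum())))
    return q
```

In this sketch the classifier probabilities `p` and the regression outputs `m` would come from any models trained on the first and second parts of the codeword, respectively; thresholding the returned marginals at 0.5 gives a label prediction.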

BibTeX

@conference{Zhang-2011-119808,
author = {Y. Zhang and J. Schneider},
title = {Multi-Label Output Codes using Canonical Correlation Analysis},
booktitle = {Proceedings of 14th International Conference on Artificial Intelligence and Statistics (AISTATS '11)},
year = {2011},
month = {April},
volume = {15},
pages = {873--882},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.