Improving Text Classification by Shrinkage in a Hierarchy of Classes
Abstract
When documents are organized in a large number of topic categories, the categories are often arranged in a hierarchy. The U.S. patent database and Yahoo are two examples. This paper shows that the accuracy of a naive Bayes text classifier can be significantly improved by taking advantage of a hierarchy of classes. We adopt an established statistical technique called shrinkage that smooths the parameter estimates of a data-sparse child class with those of its parent in order to obtain more robust estimates. The same approach underlies deleted interpolation, a technique for smoothing n-grams in language modeling for speech recognition. Our method scales well to large data sets with numerous categories in large hierarchies. Experimental results on three real-world data sets from UseNet, Yahoo, and corporate web pages show improved performance, with a reduction in error of up to 29% over the traditional flat classifier.
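The shrinkage idea described above can be sketched in a few lines: a child class's maximum-likelihood word distribution is linearly interpolated with its parent's distribution and a uniform prior. This is a minimal illustration, not the paper's implementation; in particular, the fixed mixture weights below are hypothetical, whereas the paper learns them with EM on held-out data.

```python
from collections import Counter

def mle(counts):
    """Maximum-likelihood word probabilities from raw counts."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def shrink(child_counts, parent_counts, vocab, weights=(0.6, 0.3, 0.1)):
    """Interpolate child MLE, parent MLE, and a uniform distribution.

    weights = (lambda_child, lambda_parent, lambda_uniform); they sum to 1.
    Fixed values here are purely illustrative -- the paper estimates the
    mixture weights with EM rather than setting them by hand.
    """
    lc, lp, lu = weights
    child = mle(child_counts)
    parent = mle(parent_counts)
    uniform = 1.0 / len(vocab)
    return {w: lc * child.get(w, 0.0) + lp * parent.get(w, 0.0) + lu * uniform
            for w in vocab}

# Example: a data-sparse child class borrows strength from its parent,
# so words unseen in the child still receive nonzero probability.
vocab = ["nasa", "launch", "game", "team"]
child_counts = Counter({"nasa": 3})                        # sparse child
parent_counts = Counter({"nasa": 5, "launch": 4, "game": 1})
theta = shrink(child_counts, parent_counts, vocab)
```

Note that `theta["launch"]` is positive even though "launch" never occurs in the child class; the parent's estimate rescues the zero count, which is exactly the robustness the abstract refers to.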
BibTeX
@conference{McCallum-1998-14715,
author = {A. McCallum and R. Rosenfeld and T. Mitchell and A. Ng},
title = {Improving Text Classification by Shrinkage in a Hierarchy of Classes},
booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
year = {1998},
month = {July},
pages = {359--367},
}