A Machine Learning Approach to Building Domain-Specific Search Engines

Andrew McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore

Conference Paper, Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI '99), Vol. 2, pp. 662-667, July 1999

Abstract

Domain-specific search engines are becoming increasingly popular because they offer increased accuracy and extra features not possible with general, Web-wide search engines. Unfortunately, they are also difficult and time-consuming to maintain. This paper proposes the use of machine learning techniques to greatly automate the creation and maintenance of domain-specific search engines. We describe new research in reinforcement learning, text classification and information extraction that enables efficient spidering, populates topic hierarchies, and identifies informative text segments. Using these techniques, we have built a demonstration system: a search engine for computer science research papers available at www.cora.justresearch.com.

BibTeX

@conference{McCallum-1999-16661,
author = {Andrew McCallum and Kamal Nigam and Jason Rennie and Kristie Seymore},
title = {A Machine Learning Approach to Building Domain-Specific Search Engines},
booktitle = {Proceedings of the 16th International Joint Conference on Artificial Intelligence (IJCAI '99)},
year = {1999},
month = {July},
volume = {2},
pages = {662--667},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.