Text Classification for Intelligent Portfolio Management

Young-Woo Seo, Joseph Andrew Giampapa, and Katia Sycara
Tech. Report, CMU-RI-TR-02-14, Robotics Institute, Carnegie Mellon University, May 2002

Abstract

In the application domain of stock portfolio management, software agents that evaluate the risks associated with the individual companies of a portfolio should be able to read electronic news articles that are written to give investors an indication of the financial outlook of a company. There is a positive correlation between news reports on a company's financial outlook and the company's attractiveness as an investment. However, because of the volume of such reports, it is impossible for financial analysts or investors to track and read each one. Therefore, it would be very helpful to have a system that automatically classifies news reports according to whether they reflect positively or negatively on a company's financial outlook. To accomplish this task, we treat the understanding of news articles as a text classification problem. In this paper, we propose a text classification method that we call "Domain Experts" and "Self-Confident" sampling, and compare it with naive Bayes with expectation maximization (EM). We evaluate these learning techniques in terms of how well they improve with unlabeled data after being initially trained on a small number of human-labeled articles, and how well they classify the latest financial news articles. The significance of this work lies in the new classification method that we propose and in the sampling technique we used for improving classification accuracy.
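
For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of multinomial naive Bayes with EM: train on a few labeled documents, then alternate between soft-labeling the unlabeled documents (E-step) and retraining on everything (M-step). It is not the authors' implementation, and it does not implement "Domain Experts" or "Self-Confident" sampling, which the abstract does not describe in detail. The function names, the Laplace smoothing constant `alpha`, the class labels, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """Fit multinomial naive Bayes from soft labels.

    docs: list of token lists; weights: list of {class: probability} dicts.
    alpha is a Laplace smoothing constant (an illustrative choice).
    """
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for tokens, w in zip(docs, weights):
        for c in classes:
            prior[c] += w[c]
            for t in tokens:
                counts[c][t] += w[c]  # fractional counts from soft labels
    total = sum(prior.values())
    log_prior = {c: math.log(prior[c] / total) for c in classes}
    log_lik = {}
    for c in classes:
        denom = sum(counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((counts[c][t] + alpha) / denom) for t in vocab}
    return log_prior, log_lik

def posterior(model, tokens, classes):
    """Return P(class | tokens); tokens outside the vocabulary are skipped."""
    log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][t] for t in tokens if t in log_lik[c])
              for c in classes}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: v / z for c, v in exp.items()}

def nb_em(labeled, unlabeled, classes, iterations=10):
    """Naive Bayes with EM: labeled is [(tokens, label)], unlabeled is [tokens]."""
    vocab = {t for toks, _ in labeled for t in toks} | {t for toks in unlabeled for t in toks}
    l_docs = [toks for toks, _ in labeled]
    l_w = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    model = train_nb(l_docs, l_w, classes, vocab)  # initial model: labeled data only
    for _ in range(iterations):
        u_w = [posterior(model, toks, classes) for toks in unlabeled]    # E-step
        model = train_nb(l_docs + unlabeled, l_w + u_w, classes, vocab)  # M-step
    return model

def classify(model, tokens, classes):
    p = posterior(model, tokens, classes)
    return max(p, key=p.get)

# Hypothetical toy data: two labeled headlines and four unlabeled ones.
CLASSES = ["positive", "negative"]
labeled = [(["profit", "rises", "strong"], "positive"),
           (["loss", "widens", "weak"], "negative")]
unlabeled = [["strong", "growth", "profit"], ["weak", "loss", "layoffs"],
             ["growth", "strong"], ["layoffs", "weak"]]

model = nb_em(labeled, unlabeled, CLASSES)
print(classify(model, ["growth"], CLASSES), classify(model, ["layoffs"], CLASSES))
```

In this toy run, the words "growth" and "layoffs" never appear in the labeled documents, so a model trained on labeled data alone has no evidence about them. They acquire class associations only through co-occurrence with labeled vocabulary in the unlabeled documents, which is the kind of improvement from unlabeled data that the abstract refers to.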

BibTeX

@techreport{Seo-2002-8430,
author = {Young-Woo Seo and Joseph Andrew Giampapa and Katia Sycara},
title = {Text Classification for Intelligent Portfolio Management},
year = {2002},
month = {May},
institution = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-02-14},
keywords = {Text Classification, Sampling technique of unlabeled data},
}