Effect of Part-of-Speech and Lemmatization Filtering in Email Classification for Automatic Reply - Robotics Institute Carnegie Mellon University

Rogerio Bonatti, Arthur G. de Paula, Victor S. Lamarca, and Fabio Gagliardi Cozman
Workshop Paper, AAAI '16 Knowledge Extraction from Text Workshop, pp. 496-501, February 2016

Abstract

We study automatic reply to business email messages in Brazilian Portuguese. We present a novel corpus of messages from a real application, along with baseline categorization experiments using Naive Bayes and Support Vector Machines (SVM). We then discuss the effect of lemmatization and of part-of-speech filtering on precision and recall. SVM classification coupled with a non-lemmatized selection of verbs, nouns, adjectives, and adverbs was the best approach, with a maximum accuracy of 87.3%. Straightforward lemmatization in Portuguese led to the lowest classification results in the group, with 85.3% and 81.7% precision for SVM and Naive Bayes, respectively. Thus, while lemmatization reduced precision and recall, part-of-speech filtering improved overall results.
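
As a rough illustration of the two ideas the abstract combines, the sketch below shows part-of-speech filtering without lemmatization, followed by a per-class precision and recall computation. It is not the authors' implementation: the tag names, the helper functions `pos_filter` and `precision_recall`, and the hand-tagged example message are all assumptions made for illustration, and the classifier itself (Naive Bayes or SVM) is left out.

```python
# Sketch only: tokens are assumed to arrive already tagged as (word, tag) pairs.
# The tag names below are hypothetical and are not the paper's tag set.
KEPT_TAGS = {"VERB", "NOUN", "ADJ", "ADV"}


def pos_filter(tagged_tokens, kept_tags=KEPT_TAGS):
    """Keep the lowercased surface form (no lemmatization) of tokens with a kept tag."""
    return [word.lower() for word, tag in tagged_tokens if tag in kept_tags]


def precision_recall(gold, predicted, label):
    """Compute precision and recall for one class from parallel label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy Brazilian Portuguese message, hand-tagged for illustration only.
message = [
    ("Gostaria", "VERB"), ("de", "ADP"), ("cancelar", "VERB"),
    ("o", "DET"), ("meu", "PRON"), ("pedido", "NOUN"),
    ("urgentemente", "ADV"),
]
features = pos_filter(message)
```

The filtered tokens would then feed a bag-of-words representation for whichever classifier is used. Skipping lemmatization means inflected forms such as "gostaria" are kept as distinct features.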

BibTeX

@workshop{Bonatti-2016-124292,
author = {Rogerio Bonatti and Arthur G. de Paula and Victor S. Lamarca and Fabio Gagliardi Cozman},
title = {Effect of Part-of-Speech and Lemmatization Filtering in Email Classification for Automatic Reply},
booktitle = {Proceedings of AAAI '16 Knowledge Extraction from Text Workshop},
year = {2016},
month = {February},
pages = {496--501},
}