Anomaly Pattern Detection in Categorical Datasets
Abstract
We propose a new method for detecting patterns of anomalies in categorical datasets. We assume that anomalies are generated by some underlying process that affects only a particular subset of the data. Our method consists of two steps: we first use a "local anomaly detector" to identify individual records with anomalous attribute values, and then detect patterns where the number of anomalous records is higher than expected. Given the set of anomalies flagged by the local anomaly detector, we search over all subsets of the data defined by fixing the values of some subset of the attributes, in order to detect self-similar patterns of anomalies. We wish to detect any such subset of the test data that displays a significant increase in anomalous activity compared to the normal behavior of the system (as indicated by the training data). We perform significance testing to determine whether the number of anomalies in any subset of the test data is significantly higher than expected, and propose an efficient algorithm to perform this test over all such subsets of the data. We show that this algorithm is able to accurately detect anomalous patterns in real-world hospital, container shipping, and network intrusion data.
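The two-step procedure described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the local detector is a simple rare-value rule (`flag_anomalies`, with a hypothetical `threshold`), the significance test is a one-sided Fisher's exact test on a 2x2 table of anomalous versus normal counts in test versus training data, and the search is a brute-force enumeration over up to `max_attrs` fixed attributes rather than the efficient algorithm the paper proposes. No correction for multiple testing is applied.

```python
from collections import Counter
from itertools import combinations
from math import comb


def flag_anomalies(records, train, threshold=0.05):
    """Hypothetical local detector: flag a record if any of its attribute
    values has a relative frequency below `threshold` in the training data."""
    n = len(train)
    freq = Counter((a, v) for t in train for a, v in t.items())
    return [any(freq[(a, v)] / n < threshold for a, v in r.items())
            for r in records]


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]]:
    probability of seeing at least `a` in the top-left cell with the
    row and column totals held fixed (hypergeometric upper tail)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, col1)


def scan_patterns(train, train_flags, test, test_flags,
                  max_attrs=2, alpha=0.05):
    """Brute-force search over subsets defined by fixing the values of up
    to `max_attrs` attributes; returns (pattern, p_value) pairs whose
    anomaly rate in the test data is significantly above training."""
    attrs = sorted(test[0])
    results = []
    for k in range(1, max_attrs + 1):
        for subset in combinations(attrs, k):
            # counts[key] = [test anomalous, test normal,
            #                train anomalous, train normal]
            counts = {}
            for recs, flags, offset in ((test, test_flags, 0),
                                        (train, train_flags, 2)):
                for rec, flagged in zip(recs, flags):
                    key = tuple(rec[a] for a in subset)
                    cell = counts.setdefault(key, [0, 0, 0, 0])
                    cell[offset + (0 if flagged else 1)] += 1
            for key, (a, b, c, d) in counts.items():
                if a == 0:
                    continue  # no anomalies in this test subset
                p = fisher_one_sided(a, b, c, d)
                if p < alpha:
                    results.append((dict(zip(subset, key)), p))
    return sorted(results, key=lambda item: item[1])
```

Records are assumed to be dictionaries mapping attribute names to categorical values. Enumerating every attribute subset grows combinatorially with the number of attributes, which is why the sketch caps the subset size with `max_attrs`; the paper's efficient search is not reproduced here.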
BibTeX
@conference{Das-2008-119821,
author = {K. Das and J. Schneider and D. Neill},
title = {Anomaly Pattern Detection in Categorical Datasets},
booktitle = {Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '08)},
year = {2008},
month = {August},
pages = {169--176},
}