Bayesian Networks for Lossless Dataset Compression
Conference Paper, Proceedings of 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '99), pp. 387 - 391, August, 1999
Abstract
The recent explosion in research on probabilistic data mining algorithms such as Bayesian networks has been focused primarily on their use in diagnostics, prediction and efficient inference. In this paper, we examine the use of Bayesian networks for a different purpose: lossless compression of large datasets. We present algorithms for automatically learning Bayesian networks and new structures called "Huffman networks" that model statistical relationships in the datasets, and algorithms for using these models to then compress the datasets. These algorithms often achieve significantly better compression ratios than those achieved with common dictionary-based algorithms such as those used by programs like ZIP.
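As a rough illustration of why modeling statistical relationships between attributes can help compression, the sketch below compares the number of bits needed to Huffman-code one column on its own with the number needed when a separate Huffman code is built for each value of a "parent" column. This is not the paper's algorithm or its definition of a Huffman network: the function names, the toy data, and the choice to ignore the cost of storing the code tables are all assumptions made for the example.

```python
import heapq
from collections import Counter, defaultdict


def huffman_code_lengths(counts):
    """Return {symbol: code length in bits} for a Huffman code built from counts."""
    if len(counts) == 1:
        # Assumption for this sketch: a lone symbol is still charged one bit.
        return {sym: 1 for sym in counts}
    # Heap entries are (weight, tiebreak, {symbol: depth}); the integer
    # tiebreak guarantees the dicts themselves are never compared.
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def total_bits(column):
    """Bits needed to Huffman-code a column, ignoring the code table itself."""
    counts = Counter(column)
    lengths = huffman_code_lengths(counts)
    return sum(lengths[s] * n for s, n in counts.items())


def total_bits_given_parent(parent, child):
    """Bits needed when a separate Huffman code is used per parent value."""
    groups = defaultdict(list)
    for p, c in zip(parent, child):
        groups[p].append(c)
    return sum(total_bits(vals) for vals in groups.values())


# Hypothetical toy dataset: the child's value is strongly tied to the parent's.
parent = ["a"] * 4 + ["b"] * 4
child = ["w", "w", "x", "x", "y", "y", "z", "z"]

independent_bits = total_bits(child)
conditional_bits = total_bits_given_parent(parent, child)
print(independent_bits, conditional_bits)
```

On this toy data the child column costs 16 bits when coded on its own and 8 bits when coded conditionally on the parent. A real compressor would also have to account for the space taken by the model and code tables, which this sketch leaves out.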
BibTeX
@conference{Davies-1999-16687,
author = {Scott Davies and Andrew Moore},
title = {Bayesian Networks for Lossless Dataset Compression},
booktitle = {Proceedings of 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '99)},
year = {1999},
month = {August},
pages = {387--391},
}