Word Clustering with Parallel Spoken Language Corpora

Ye-Yi Wang, John Lafferty, and Alex Waibel

Conference Paper, Proceedings of 4th International Conference on Spoken Language Processing (ICSLP '96), Vol. 4, pp. 2364-2367, October 1996

Abstract

We introduce a word clustering algorithm which uses a bilingual, parallel corpus to group together words in the source and target language. Our method generalizes previous mutual information clustering algorithms for monolingual data by incorporating a statistical translation model. Preliminary experiments have shown that the algorithm can effectively employ the constraints implicit in bilingual data to extract classes which are well suited to machine translation tasks.

BibTeX

@conference{Wang-1996-14227,
author = {Ye-Yi Wang and John Lafferty and Alex Waibel},
title = {Word Clustering with Parallel Spoken Language Corpora},
booktitle = {Proceedings of 4th International Conference on Spoken Language Processing (ICSLP '96)},
year = {1996},
month = {October},
volume = {4},
pages = {2364--2367},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.