Word Clustering with Parallel Spoken Language Corpora - Robotics Institute Carnegie Mellon University

Ye-Yi Wang, John Lafferty, and Alex Waibel
Conference Paper, Proceedings of the 4th International Conference on Spoken Language Processing (ICSLP '96), Vol. 4, pp. 2364-2367, October 1996

Abstract

We introduce a word clustering algorithm that uses a bilingual, parallel corpus to group together words in the source and target languages. Our method generalizes previous mutual information clustering algorithms for monolingual data by incorporating a statistical translation model. Preliminary experiments show that the algorithm can effectively employ the constraints implicit in bilingual data to extract classes that are well suited to machine translation tasks.
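
For readers unfamiliar with the monolingual starting point, the following is a minimal sketch of one common form of the mutual information objective: the average mutual information between the classes of adjacent words. It is an illustration only, not the paper's bilingual algorithm; the translation-model component is not shown, and the function name `class_bigram_mutual_information` and the `word_to_class` mapping are hypothetical names chosen for this example.

```python
import math
from collections import Counter


def class_bigram_mutual_information(tokens, word_to_class):
    """Average mutual information (in bits) between classes of adjacent tokens.

    tokens: list of words from a monolingual corpus.
    word_to_class: dict mapping each word to a class label (assumed given).
    """
    classes = [word_to_class[w] for w in tokens]
    # Count adjacent class pairs (c1 followed by c2).
    pairs = Counter(zip(classes, classes[1:]))
    total = sum(pairs.values())
    if total == 0:
        return 0.0

    # Marginal counts for the left and right positions of each pair.
    left = Counter()
    right = Counter()
    for (c1, c2), n in pairs.items():
        left[c1] += n
        right[c2] += n

    # Sum p(c1, c2) * log2( p(c1, c2) / (p(c1) * p(c2)) ) over observed pairs.
    mi = 0.0
    for (c1, c2), n in pairs.items():
        mi += (n / total) * math.log2(n * total / (left[c1] * right[c2]))
    return mi


tokens = "a b a b a b".split()
mi_two_classes = class_bigram_mutual_information(tokens, {"a": 0, "b": 1})
mi_one_class = class_bigram_mutual_information(tokens, {"a": 0, "b": 0})
```

A clustering procedure of this family would search for the class assignment that keeps this quantity high; placing every word in a single class drives it to zero, while separating words that predict each other's neighbors raises it.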

BibTeX

@conference{Wang-1996-14227,
author = {Ye-Yi Wang and John Lafferty and Alex Waibel},
title = {Word Clustering with Parallel Spoken Language Corpora},
booktitle = {Proceedings of 4th International Conference on Spoken Language Processing (ICSLP '96)},
year = {1996},
month = {October},
volume = {4},
pages = {2364--2367},
}