Phonetic-Distance-Based Hypothesis Driven Lexical Adaptation for Transcribing Multilingual Broadcast News
Abstract
High out-of-vocabulary (OOV) rates are one of the most prevalent problems for languages whose vocabulary grows rapidly because of a large number of inflections. When transcribing Serbo-Croatian and German broadcast news in particular, the OOV rate lies between 4.5% and 8.7%. Hypothesis Driven Lexical Adaptation (HDLA) has already been shown to reduce high OOV rates significantly by using morphology-based linguistic knowledge. This paper introduces another approach to dynamically adapting a recognition lexicon to the utterance being recognized. Instead of morphological knowledge about word stems and inflectional endings, it uses distance measures based on the Levenshtein distance. Results are presented for both phoneme-based and grapheme-based distances. Compared to the use of morphological knowledge, our distance-based approach has the distinct advantage that it requires neither expert knowledge about a specific language nor the definition of complex grammar rules. Instead, the grapheme sequences or phoneme representations of words are sufficient to apply our HDLA algorithm easily to any new language. With the proposed technique, we were able to reduce the OOV rate by more than half, from 8.7% to 4%, thereby also improving recognition performance by 4.1% absolute, from 29.5% to 25.4% word error rate.
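The abstract names the Levenshtein distance over graphemes or phonemes as the core of the adaptation but does not spell out how candidate words are selected. The following is a minimal sketch, assuming a two-pass setup in which words from a first-pass hypothesis are compared against a larger background word list and every background word within a fixed distance threshold is added to the lexicon. The helper `select_similar_words`, the threshold `max_dist`, the word lists and the toy SAMPA-like pronunciations in `PRON` are all hypothetical illustrations, not the authors' implementation or data.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance: minimum insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute, free if equal
        prev = curr
    return prev[-1]


def select_similar_words(hypothesis: Iterable[str],
                         background: Iterable[str],
                         represent: Callable[[str], Sequence[str]],
                         max_dist: int = 1) -> List[str]:
    """Hypothetical selection step: keep background words within max_dist
    (under the chosen representation) of at least one hypothesis word."""
    hyp = [represent(w) for w in hypothesis]
    return [w for w in background
            if any(levenshtein(h, represent(w)) <= max_dist for h in hyp)]


# Toy German pronunciations (SAMPA-like); illustrative only, not from the paper.
PRON: Dict[str, List[str]] = {
    "spielt":   ["S", "p", "i:", "l", "t"],
    "spielte":  ["S", "p", "i:", "l", "t", "@"],
    "spielten": ["S", "p", "i:", "l", "t", "@", "n"],
    "spiel":    ["S", "p", "i:", "l"],
    "stiehlt":  ["S", "t", "i:", "l", "t"],
}

hypothesis = ["spielt"]                                   # word(s) from a first pass
background = ["spielte", "spielten", "spiel", "stiehlt"]  # assumed fallback word list

# Grapheme distance: compare character sequences.
by_grapheme = select_similar_words(hypothesis, background, represent=list)
# Phoneme distance: compare pronunciation sequences.
by_phoneme = select_similar_words(hypothesis, background, represent=PRON.__getitem__)
print(by_grapheme)  # ['spielte', 'spiel']
print(by_phoneme)   # ['spielte', 'spiel', 'stiehlt']
```

With the same threshold, the two representations select different words: "stiehlt" differs from "spielt" by two graphemes but by only one phoneme, so it is added only under the phoneme-based distance. Neither measure needs stem or inflection rules, which is the language independence the abstract claims; the actual threshold and any ranking or pruning used in the paper are not stated here.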
BibTeX
@conference{Geutner-1998-14818,
author = {Petra Geutner and Michael Finke and Alex Waibel},
title = {Phonetic-Distance-Based Hypothesis Driven Lexical Adaptation for Transcribing Multilingual Broadcast News},
booktitle = {Proceedings of 5th International Conference on Spoken Language Processing (ICSLP '98)},
year = {1998},
month = {December},
}