A Flexible Stream Architecture for ASR Using Articulatory Features
Abstract
Recently, speech recognition systems based on articulatory features such as "voicing" or the position of lips and tongue have gained interest, because they promise advantages with respect to robustness and permit new adaptation methods to compensate for channel, noise, and speaker variability. These approaches are also interesting from a general point of view, because their models use phonological and phonetic concepts, which allow for a richer description of a speech act than the sequence of HMM states that forms the prevalent ASR architecture today. In this work, we present a multi-stream architecture in which CD-HMMs are supported by detectors for articulatory features through a linear combination of log-likelihood scores. This multi-stream approach reduces WER by 15% on a read Broadcast News (BN) task and improves performance on a spontaneous scheduling task (ESST) by 7%. The proposed architecture also allows for new speaker and channel adaptation schemes, including asynchronicity between streams.
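As a rough illustration of the score combination described in the abstract, the following Python sketch shows how a per-state acoustic score could be formed as a weighted sum of log-likelihoods from the CD-HMM stream and a set of articulatory-feature detectors. The stream names, weights, and scores below are illustrative assumptions, not values from the paper.

def combined_score(stream_log_likelihoods, stream_weights):
    # Linear combination of per-stream log-likelihood scores for one HMM state.
    return sum(w * ll for w, ll in zip(stream_weights, stream_log_likelihoods))

# Hypothetical example: main CD-HMM stream plus two articulatory-feature
# detectors ("VOICED" and "ROUND" are placeholder feature names).
log_likelihoods = [-42.7, -1.3, -0.8]   # [CD-HMM, VOICED detector, ROUND detector]
weights         = [1.0, 0.05, 0.05]     # main stream dominates; detectors add evidence
print(combined_score(log_likelihoods, weights))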
BibTeX
@conference{Metze-2002-8546,
  author    = {Florian Metze and Alex Waibel},
  title     = {A Flexible Stream Architecture for ASR Using Articulatory Features},
  booktitle = {Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP '02)},
  year      = {2002},
  month     = {September},
}