A Flexible Stream Architecture for ASR Using Articulatory Features
Abstract
Recently, speech recognition systems based on articulatory features such as "voicing" or the position of lips and tongue have gained interest, because they promise advantages with respect to robustness and permit new adaptation methods to compensate for channel, noise, and speaker variability. These approaches are also interesting from a general point of view, because their models use phonological and phonetic concepts, which allow for a richer description of a speech act than the sequence of HMM states that forms the prevalent ASR architecture today. In this work, we present a multi-stream architecture in which CD-HMMs are supported by detectors for articulatory features through a linear combination of log-likelihood scores. This multi-stream approach reduces WER by 15% on a read Broadcast News (BN) task and improves performance on a spontaneous scheduling task (ESST) by 7%. The proposed architecture also allows for new speaker and channel adaptation schemes, including asynchronicity between streams.
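As a rough illustration of the score combination described in the abstract, the following Python sketch shows how a per-state acoustic score could be formed as a weighted sum of log-likelihoods from the CD-HMM stream and a set of articulatory-feature detectors. The stream names, weights, and scores below are illustrative assumptions, not values from the paper.

def combined_score(stream_log_likelihoods, stream_weights):
    # Linear combination of per-stream log-likelihood scores for one HMM state.
    return sum(w * ll for w, ll in zip(stream_weights, stream_log_likelihoods))

# Hypothetical example: main CD-HMM stream plus two articulatory-feature
# detectors ("VOICED" and "ROUND" are placeholder feature names).
log_likelihoods = [-42.7, -1.3, -0.8]   # [CD-HMM, VOICED detector, ROUND detector]
weights         = [1.0, 0.05, 0.05]     # main stream dominates; detectors add evidence
print(combined_score(log_likelihoods, weights))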
BibTeX
@conference{Metze-2002-8546,
  author    = {Florian Metze and Alex Waibel},
  title     = {A Flexible Stream Architecture for ASR Using Articulatory Features},
  booktitle = {Proceedings of the 7th International Conference on Spoken Language Processing (ICSLP '02)},
  year      = {2002},
  month     = {September},
}