A Flexible Stream Architecture for ASR Using Articulatory Features - Robotics Institute Carnegie Mellon University

Florian Metze and Alex Waibel
Conference Paper, Proceedings of 7th International Conference on Spoken Language Processing (ICSLP '02), September, 2002

Abstract

Recently, speech recognition systems based on articulatory features such as "voicing" or the position of lips and tongue have gained interest, because they promise advantages with respect to robustness and permit new adaptation methods to compensate for channel, noise, and speaker variability. These approaches are also interesting from a general point of view, because their models use phonological and phonetic concepts, which allow a richer description of a speech act than a sequence of HMM states, the prevalent ASR architecture today. In this work, we present a multi-stream architecture in which CD-HMMs are supported by detectors for articulatory features, using a linear combination of log-likelihood scores. This multi-stream approach results in a 15% reduction of WER on a read Broadcast-News (BN) task and improves performance on a spontaneous scheduling task (ESST) by 7%. The proposed architecture potentially allows for new speaker and channel adaptation schemes, including stream asynchronicity.
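
The linear combination of log-likelihood scores mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the weights, the set of feature detectors, or how the scores are normalized, so the function names, the example weights, and the toy scores below are all hypothetical and chosen only to show the mechanics of a weighted sum over streams.

```python
def combine_streams(hmm_loglik, feature_logliks, hmm_weight, feature_weights):
    """Weighted sum of per-stream log-likelihoods for one state at one frame.

    hmm_loglik      -- log-likelihood from the main CD-HMM stream
    feature_logliks -- log-likelihoods from the articulatory feature detectors
    hmm_weight, feature_weights -- stream weights (hypothetical values; the
        abstract does not state how they are chosen)
    """
    if len(feature_logliks) != len(feature_weights):
        raise ValueError("need exactly one weight per feature stream")
    score = hmm_weight * hmm_loglik
    for weight, loglik in zip(feature_weights, feature_logliks):
        score += weight * loglik
    return score


def best_state(frame_scores, hmm_weight, feature_weights):
    """Return the state whose combined score is highest for this frame.

    frame_scores maps a state name to a pair
    (hmm_loglik, [feature_loglik, ...]).
    """
    return max(
        frame_scores,
        key=lambda state: combine_streams(
            *frame_scores[state], hmm_weight, feature_weights
        ),
    )


# Toy scores for two competing states and two feature detectors
# (for example "voiced" and "rounded"); the numbers are made up.
toy_frame = {
    "a": (-2.0, [-1.0, -3.0]),
    "b": (-1.5, [-4.0, -2.0]),
}

hmm_dominated = best_state(toy_frame, 0.8, [0.1, 0.1])
feature_heavy = best_state(toy_frame, 0.5, [0.4, 0.1])
```

In this toy setup the preferred state changes when more weight is shifted to the first feature detector, which is the kind of effect that stream weighting is meant to allow. Setting all feature weights to zero reduces the score to the scaled CD-HMM stream alone.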

BibTeX

@conference{Metze-2002-8546,
author = {Florian Metze and Alex Waibel},
title = {A Flexible Stream Architecture for ASR Using Articulatory Features},
booktitle = {Proceedings of 7th International Conference on Spoken Language Processing (ICSLP '02)},
year = {2002},
month = {September},
}