Context-dependent Hybrid HME/HMM Speech Recognition Using Polyphone Clustering Decision Trees - Robotics Institute Carnegie Mellon University

Jurgen Fritsch and Alex Waibel
Conference Paper, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), Vol. 3, pp. 1759-1762, April 1997

Abstract

This paper presents a context-dependent hybrid connectionist speech recognition system that uses a set of generalized hierarchical mixtures of experts (HME) to estimate context-dependent posterior acoustic class probabilities. The connectionist part of the system is organized in a modular fashion, allowing distributed training of the system on regular workstations. Context classes are based on polyphonic contexts, clustered using decision trees, which we adopt from our continuous-density HMM recognizer JANUS (Waibel et al., 1996). The system is evaluated on ESST, an English speaker-independent spontaneous speech database. Context-dependent modeling is shown to yield significant improvements over simple context-independent modeling, requiring only a small additional overhead in training and decoding time.
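The abstract does not spell out how the modular estimator is composed, so the following is only a minimal sketch under stated assumptions. It assumes a one-level mixture of experts with linear softmax gates and experts (a simplification of a full hierarchy), and it assumes the context-dependent posterior is factored as p(m, c | x) = p(m | x) · p(c | m, x), with one module for the phone m and one module per phone for its context class c. All function names and weights are hypothetical and chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights: Matrix, x: Vector) -> List[float]:
    """One score per row of `weights` (dot product of the row with x)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def moe_posterior(gate: Matrix, experts: Sequence[Matrix], x: Vector) -> List[float]:
    """One-level mixture of experts: p(k | x) = sum_i g_i(x) * p_i(k | x).

    Assumption: linear softmax gate and experts; a real HME stacks
    several gating levels in a tree.
    """
    g = softmax(linear(gate, x))
    expert_posts = [softmax(linear(e, x)) for e in experts]
    n_classes = len(expert_posts[0])
    return [
        sum(g[i] * expert_posts[i][k] for i in range(len(g)))
        for k in range(n_classes)
    ]


def joint_posterior(
    p_phone: Sequence[float],
    p_ctx_given_phone: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Assumed factorization: p(m, c | x) = p(m | x) * p(c | m, x)."""
    return [[pm * pc for pc in ctx] for pm, ctx in zip(p_phone, p_ctx_given_phone)]


if __name__ == "__main__":
    x = [1.0, -0.5]  # toy 2-dimensional acoustic feature vector

    # Hypothetical phone module: 2 experts, 2 phone classes.
    phone_gate = [[0.2, 0.1], [-0.3, 0.4]]
    phone_experts = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.5, 0.5], [-0.5, 0.2]],
    ]
    p_phone = moe_posterior(phone_gate, phone_experts, x)

    # Hypothetical context modules, one per phone, 3 context classes each.
    ctx_gate = [[0.1, 0.0], [0.0, 0.1]]
    ctx_experts = [
        [[0.3, -0.2], [0.0, 0.4], [-0.1, 0.1]],
        [[-0.4, 0.2], [0.2, 0.2], [0.5, -0.3]],
    ]
    p_ctx = [moe_posterior(ctx_gate, ctx_experts, x) for _ in p_phone]

    joint = joint_posterior(p_phone, p_ctx)
    print(joint)
    print(sum(sum(row) for row in joint))  # sums to 1 up to rounding
```

Under this assumed factorization each module can be trained separately on its own subset of the data, which is one way a modular organization could permit distributed training on ordinary workstations.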

BibTeX

@conference{Fritsch-1997-16438,
author = {Jurgen Fritsch and Alex Waibel},
title = {Context-dependent Hybrid HME/HMM Speech Recognition Using Polyphone Clustering Decision Trees},
booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97)},
year = {1997},
month = {April},
volume = {3},
pages = {1759--1762},
}