Multi-speaker/Speaker-independent Architectures for the Multi-state Time Delay Neural Network

Hermann Hild and Alex Waibel
Conference Paper, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), Vol. 2, pp. 255-258, April 1993

Abstract

In this paper we present an improved Multi-State Time Delay Neural Network (MS-TDNN) for speaker-independent, connected letter recognition, which outperforms an HMM-based system (SPHINX) and previous MS-TDNNs, and we explore new network architectures with "internal speaker models". Four different architectures, characterized by an increasing number of speaker-specific parameters, are introduced. The speaker-specific parameters can be adjusted by "automatic speaker identification" or by speaker adaptation, allowing for "tuning-in" to a new speaker. Both methods lead to significant improvements over the straightforward speaker-independent architecture. As described in [BRIDLE91], even unsupervised "tuning-in" (where the speech is unlabeled) works astonishingly well.

BibTeX

@conference{Hild-1993-13473,
author = {Hermann Hild and Alex Waibel},
title = {Multi-speaker/Speaker-independent Architectures for the Multi-state Time Delay Neural Network},
booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93)},
year = {1993},
month = {April},
volume = {2},
pages = {255--258},
}