Multi-speaker/Speaker-independent Architectures for the Multi-state Time Delay Neural Network
Abstract
In this paper we present an improved Multi-State Time Delay Neural Network (MS-TDNN) for speaker-independent, connected-letter recognition that outperforms an HMM-based system (SPHINX) and previous MS-TDNNs, and we explore new network architectures with "internal speaker models". Four architectures characterized by an increasing number of speaker-specific parameters are introduced. These speaker-specific parameters can be adjusted either by "automatic speaker identification" or by speaker adaptation, allowing the network to "tune in" to a new speaker. Both methods lead to significant improvements over the straightforward speaker-independent architecture. As in [BRIDLE91], even unsupervised "tuning in" (i.e., on unlabeled speech) works astonishingly well.
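The abstract does not include code, but the core building block of an MS-TDNN is the time-delay layer: the same weights are applied to a sliding window of consecutive input frames, giving shift invariance over time. The following is a minimal NumPy sketch of such a layer; the function name, dimensions, and random toy inputs are illustrative, not taken from the paper.

```python
import numpy as np

def tdnn_layer(frames, weights, bias, delay=2):
    """One time-delay layer: each output frame is computed from a window
    of `delay + 1` consecutive input frames, with the same weights tied
    across all time steps (shift invariance over time).

    frames:  (T, n_in)                 input feature frames over time
    weights: (delay + 1, n_in, n_out)  one weight matrix per delay tap
    bias:    (n_out,)
    returns: (T - delay, n_out)        activations per output frame
    """
    T = frames.shape[0]
    n_out = weights.shape[2]
    out = np.zeros((T - delay, n_out))
    for t in range(T - delay):
        # sum the contributions of the delayed copies of the input
        for d in range(delay + 1):
            out[t] += frames[t + d] @ weights[d]
        out[t] += bias
    return np.tanh(out)  # squashing nonlinearity, as in the original TDNN

# toy example: 10 frames of 16 spectral coefficients, 8 hidden units
rng = np.random.default_rng(0)
x = rng.standard_normal((10, 16))
W = rng.standard_normal((3, 16, 8)) * 0.1
b = np.zeros(8)
h = tdnn_layer(x, W, b, delay=2)
print(h.shape)  # (8, 8): 10 - 2 output frames, 8 units each
```

In the speaker-dependent variants sketched in the paper, some of these parameters (e.g. a subset of the weight matrices) would be made speaker-specific and selected or adapted per speaker, while the rest stay shared.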
BibTeX
@conference{Hild-1993-13473,
  author    = {Hermann Hild and Alex Waibel},
  title     = {Multi-speaker/Speaker-independent Architectures for the Multi-state Time Delay Neural Network},
  booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93)},
  year      = {1993},
  month     = {April},
  volume    = {2},
  pages     = {255--258},
}