Multi-speaker/Speaker-independent Architectures for the Multi-state Time Delay Neural Network
Abstract
In this paper we present an improved Multi-State Time Delay Neural Network (MS-TDNN) for speaker-independent, connected-letter recognition that outperforms an HMM-based system (SPHINX) and previous MS-TDNNs, and we explore new network architectures with "internal speaker models". Four architectures characterized by an increasing number of speaker-specific parameters are introduced. These speaker-specific parameters can be adjusted either by "automatic speaker identification" or by speaker adaptation, allowing the network to "tune in" to a new speaker. Both methods lead to significant improvements over the straightforward speaker-independent architecture. As in [BRIDLE91], even unsupervised "tuning in" (i.e., on unlabeled speech) works astonishingly well.
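The abstract does not include code, but the core building block of an MS-TDNN is the time-delay layer: the same weights are applied to a sliding window of consecutive input frames, giving shift invariance over time. The following is a minimal NumPy sketch of such a layer; the function name, dimensions, and random toy inputs are illustrative, not taken from the paper.

```python
import numpy as np

def tdnn_layer(frames, weights, bias, delay=2):
    """One time-delay layer: each output frame is computed from a window
    of `delay + 1` consecutive input frames, with the same weights tied
    across all time steps (shift invariance over time).

    frames:  (T, n_in)                 input feature frames over time
    weights: (delay + 1, n_in, n_out)  one weight matrix per delay tap
    bias:    (n_out,)
    returns: (T - delay, n_out)        activations per output frame
    """
    T = frames.shape[0]
    n_out = weights.shape[2]
    out = np.zeros((T - delay, n_out))
    for t in range(T - delay):
        # sum the contributions of the delayed copies of the input
        for d in range(delay + 1):
            out[t] += frames[t + d] @ weights[d]
        out[t] += bias
    return np.tanh(out)  # squashing nonlinearity, as in the original TDNN

# toy example: 10 frames of 16 spectral coefficients, 8 hidden units
rng = np.random.default_rng(0)
x = rng.standard_normal((10, 16))
W = rng.standard_normal((3, 16, 8)) * 0.1
b = np.zeros(8)
h = tdnn_layer(x, W, b, delay=2)
print(h.shape)  # (8, 8): 10 - 2 output frames, 8 units each
```

In the speaker-dependent variants sketched in the paper, some of these parameters (e.g. a subset of the weight matrices) would be made speaker-specific and selected or adapted per speaker, while the rest stay shared.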
BibTeX
@conference{Hild-1993-13473,
  author    = {Hermann Hild and Alex Waibel},
  title     = {Multi-speaker/Speaker-independent Architectures for the Multi-state Time Delay Neural Network},
  booktitle = {Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93)},
  year      = {1993},
  month     = {April},
  volume    = {2},
  pages     = {255--258},
}