Bimodal sensor integration on the example of “speechreading”
Conference Paper, Proceedings of IEEE International Conference on Neural Networks (ICNN '93), Vol. 2, pp. 667 - 671, March, 1993
Abstract
It is shown how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called speechreading. This is demonstrated on an extension of an existing state-of-the-art speech recognition system, a modular multi-state time-delay neural network (MS-TDNN). The acoustic and visual speech data are preclassified by two separate front-end phoneme TDNNs and combined into acoustic-visual hypotheses for the dynamic time warping algorithm. The approach is evaluated on a connected word recognition problem, the letter-spelling task. With speechreading, the error rate can be reduced to as little as half the error rate of purely acoustic recognition.
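The fusion step described in the abstract can be illustrated with a minimal sketch: two frame-synchronous streams of phoneme scores (one from the acoustic TDNN, one from the visual TDNN) are merged into a single score stream before dynamic time warping. The weighting parameter `lambda_acoustic` and the function name below are illustrative assumptions, not taken from the paper, which may use a different combination rule.

```python
# Sketch of bimodal score fusion, assuming frame-synchronous acoustic
# and visual phoneme score vectors. lambda_acoustic is a hypothetical
# tuning weight favoring the acoustic stream; the paper's actual
# combination scheme may differ.

def fuse_scores(acoustic, visual, lambda_acoustic=0.7):
    """Combine per-frame phoneme scores from two modalities into one
    acoustic-visual hypothesis stream by weighted averaging."""
    fused = []
    for a_frame, v_frame in zip(acoustic, visual):
        fused.append([
            lambda_acoustic * a + (1.0 - lambda_acoustic) * v
            for a, v in zip(a_frame, v_frame)
        ])
    return fused

# Toy example: 2 frames, 3 phoneme classes per frame.
acoustic = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]]
visual   = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]
combined = fuse_scores(acoustic, visual)
```

The fused stream `combined` would then be passed to the dynamic time warping stage in place of a single-modality score stream.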
BibTeX
@conference{Bregler-1993-15947,
author = {C. Bregler and S. Manke and H. Hild and Alex Waibel},
title = {Bimodal sensor integration on the example of "speechreading"},
booktitle = {Proceedings of IEEE International Conference on Neural Networks (ICNN '93)},
year = {1993},
month = {March},
volume = {2},
pages = {667--671},
}
Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.