Bimodal sensor integration on the example of “speechreading”

C. Bregler, S. Manke, H. Hild, and Alex Waibel
Conference Paper, Proceedings of IEEE International Conference on Neural Networks (ICNN '93), Vol. 2, pp. 667 - 671, March, 1993

Abstract

It is shown how recognition performance in automated speech perception can be significantly improved by additional lipreading, so-called speechreading. This is demonstrated on an extension of an existing state-of-the-art speech recognition system, a modular multi-state time-delay neural network (MS-TDNN). The acoustic and visual speech data are preclassified in two separate front-end phoneme TDNNs and combined into acoustic-visual hypotheses for the dynamic time warping algorithm. The approach is evaluated on a connected word recognition problem, the letter-spelling task. With speechreading, the error rate can be reduced to as little as half that of pure acoustic recognition.
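The abstract describes a late-fusion scheme: each front-end TDNN produces per-frame phoneme scores, and the two streams are combined into joint acoustic-visual hypotheses before dynamic time warping. A minimal sketch of that combination step, assuming a simple weighted sum of the two score streams (the paper's actual fusion rule, weights, and phoneme inventory are not given here, so these are illustrative):

```python
import numpy as np

# Hypothetical phoneme inventory; the real system uses a larger set.
PHONEMES = ["eh", "iy", "uw", "sil"]

def fuse_scores(acoustic, visual, lam=0.7):
    """Combine per-frame phoneme scores from the acoustic and visual
    front-end networks into joint acoustic-visual scores.

    `lam` weights the acoustic stream; (1 - lam) weights the visual
    stream. Both inputs are (frames x phonemes) arrays of scores.
    """
    acoustic = np.asarray(acoustic, dtype=float)
    visual = np.asarray(visual, dtype=float)
    return lam * acoustic + (1.0 - lam) * visual

# Toy per-frame scores: rows = frames, columns = phonemes.
acoustic = np.array([[0.6, 0.2, 0.1, 0.1],
                     [0.3, 0.4, 0.2, 0.1]])
visual   = np.array([[0.2, 0.5, 0.2, 0.1],
                     [0.1, 0.6, 0.2, 0.1]])

# Joint scores would feed the dynamic time warping stage; here we
# just take the per-frame best phoneme to show the fusion effect.
joint = fuse_scores(acoustic, visual, lam=0.5)
best = [PHONEMES[i] for i in joint.argmax(axis=1)]
print(best)  # → ['eh', 'iy']
```

Note how the visual stream overturns the acoustic best guess in the second frame — this is the kind of disambiguation (e.g. between acoustically confusable spelled letters) that motivates speechreading.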

BibTeX

@conference{Bregler-1993-15947,
author = {C. Bregler and S. Manke and H. Hild and Alex Waibel},
title = {Bimodal sensor integration on the example of "speechreading"},
booktitle = {Proceedings of IEEE International Conference on Neural Networks (ICNN '93)},
year = {1993},
month = {March},
volume = {2},
pages = {667-671},
}