Improving Connected Letter Recognition by Lipreading - Robotics Institute Carnegie Mellon University

C. Bregler, S. Manke, H. Hild, and Alex Waibel
Conference Paper, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), Vol. 1, pp. 557-560, April 1993

Abstract

The authors show that recognition performance in automated speech perception can be significantly improved by adding lipreading, so-called speech-reading. They demonstrate this on an extension of a state-of-the-art speech recognition system, a modular multi-state time delay neural network architecture (MS-TDNN). The acoustic and visual speech data are preclassified in two separate front-end phoneme TDNNs and combined into acoustic-visual hypotheses for the dynamic time warping algorithm. The approach is evaluated on a connected word recognition problem, the notoriously difficult letter spelling task. With speech-reading, the error rate was reduced by up to half compared with purely acoustic recognition.
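
The pipeline described in the abstract (two front-end classifiers whose outputs are merged and then aligned by dynamic time warping) can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed `weight`, the function names `combine_scores` and `dtw_word_score`, the shared unit inventory and frame rate for both streams, and all numbers are assumptions made for illustration only.

```python
def combine_scores(acoustic, visual, weight=0.5):
    """Frame-wise weighted sum of two score matrices (frames x units).

    `weight` is a hypothetical fixed mixing factor; the abstract does not
    say how the two streams are actually weighted.
    """
    if len(acoustic) != len(visual):
        raise ValueError("streams must have the same number of frames")
    return [
        [weight * a + (1.0 - weight) * v for a, v in zip(a_row, v_row)]
        for a_row, v_row in zip(acoustic, visual)
    ]


def dtw_word_score(scores, unit_sequence):
    """Accumulated score of the best monotonic alignment of units to frames.

    Every frame is assigned to exactly one unit of `unit_sequence`; from one
    frame to the next the path either stays in the same unit or advances to
    the following one. It starts in the first unit and ends in the last.
    """
    n_frames, n_units = len(scores), len(unit_sequence)
    neg_inf = float("-inf")
    best = [[neg_inf] * n_units for _ in range(n_frames)]
    best[0][0] = scores[0][unit_sequence[0]]
    for t in range(1, n_frames):
        for j in range(n_units):
            stay = best[t - 1][j]
            advance = best[t - 1][j - 1] if j > 0 else neg_inf
            prev = max(stay, advance)
            if prev > neg_inf:
                best[t][j] = prev + scores[t][unit_sequence[j]]
    return best[-1][-1]


# Toy data (invented): 3 frames, 2 units, one score per unit per frame.
acoustic = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
visual = [[0.7, 0.3], [0.2, 0.8], [0.1, 0.9]]

combined = combine_scores(acoustic, visual, weight=0.5)
score_01 = dtw_word_score(combined, [0, 1])  # hypothesis: unit 0, then unit 1
score_10 = dtw_word_score(combined, [1, 0])  # hypothesis: unit 1, then unit 0
best_hypothesis = "0-1" if score_01 > score_10 else "1-0"
```

In this toy setup the word hypothesis whose unit sequence best matches the combined frame scores receives the higher accumulated score. A real system would also have to handle different frame rates for audio and video and would typically learn the combination rather than fix it by hand.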

BibTeX

@conference{Bregler-1993-15946,
author = {C. Bregler and S. Manke and H. Hild and Alex Waibel},
title = {Improving Connected Letter Recognition by Lipreading},
booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93)},
year = {1993},
month = {April},
volume = {1},
pages = {557--560},
}