Improving Connected Letter Recognition by Lipreading

C. Bregler, S. Manke, H. Hild, and Alex Waibel

Conference Paper, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93), Vol. 1, pp. 557-560, April 1993

Abstract

The authors show how recognition performance in automated speech perception can be significantly improved by adding lipreading, so-called speech-reading. They demonstrate this with an extension of a state-of-the-art speech recognition system, a modular multistage time delay neural network architecture (MS-TDNN). The acoustic and visual speech data are preclassified in two separate front-end phoneme TDNNs and combined into acoustic-visual hypotheses for the dynamic time warping algorithm. The approach is evaluated on a connected word recognition problem, the notoriously difficult letter spelling task. With speech-reading, the error rate is reduced by up to half relative to purely acoustic recognition.
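
The combination step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the two streams are merged, so the fixed weighted sum below, the `visual_weight` parameter, the function names, and the toy scores are all assumptions made for illustration. The alignment routine is a generic DTW-style dynamic program over a fixed phoneme sequence, standing in for the word-level stage of the MS-TDNN.

```python
import math


def combine_scores(acoustic, visual, visual_weight=0.5):
    """Merge per-frame phoneme scores from two streams.

    Assumption: a fixed weighted sum. The abstract does not specify
    the combination rule used in the paper.
    """
    if len(acoustic) != len(visual):
        raise ValueError("both streams need the same number of frames")
    w = visual_weight
    return [
        [(1.0 - w) * a + w * v for a, v in zip(a_frame, v_frame)]
        for a_frame, v_frame in zip(acoustic, visual)
    ]


def align_score(frame_scores, phoneme_seq):
    """Best monotonic alignment of frames to a phoneme sequence.

    Each frame either stays in the current phoneme or advances to the
    next one; the result is the highest accumulated score of any path
    that starts in the first phoneme and ends in the last.
    """
    n_frames, n_states = len(frame_scores), len(phoneme_seq)
    neg_inf = -math.inf
    best = [[neg_inf] * n_states for _ in range(n_frames)]
    best[0][0] = frame_scores[0][phoneme_seq[0]]
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = best[t - 1][s]
            advance = best[t - 1][s - 1] if s > 0 else neg_inf
            prev = max(stay, advance)
            if prev > neg_inf:
                best[t][s] = prev + frame_scores[t][phoneme_seq[s]]
    return best[-1][-1]


# Toy data: 3 frames, 2 phoneme classes (indices 0 and 1).
acoustic = [[0.6, 0.4], [0.5, 0.5], [0.4, 0.6]]  # ambiguous audio
visual = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.9]]    # clearer lip shapes
combined = combine_scores(acoustic, visual)

# Compare two hypothetical letter models: phonemes "0 then 1" vs "1 then 0".
score_01 = align_score(combined, [0, 1])
score_10 = align_score(combined, [1, 0])
```

In this toy case the visual stream sharpens the otherwise ambiguous acoustic scores, so the hypothesis `[0, 1]` scores higher than `[1, 0]` after combination. A real system would presumably learn or adapt the stream weight rather than fix it by hand.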

BibTeX

@conference{Bregler-1993-15946,
author = {C. Bregler and S. Manke and H. Hild and Alex Waibel},
title = {Improving Connected Letter Recognition by Lipreading},
booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '93)},
year = {1993},
month = {April},
volume = {1},
pages = {557--560},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.