Recognition of Conversational Telephone Speech Using the JANUS Speech Engine

Thorsten Zeppenfeld, Klaus Ries, Martin Westphal, and Alex Waibel

Conference Paper, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97), Vol. 3, pp. 1815 - 1818, April, 1997

Abstract

Recognition of conversational speech is one of the most challenging speech recognition tasks to date. While recognition error rates of 10% or lower can now be reached on speech dictation tasks over vocabularies in excess of 60,000 words, recognition of conversational speech has persistently resisted most attempts at improvement by way of these proven techniques. Difficulties arise from shorter words, telephone channel degradation, and highly disfluent and coarticulated speech. In this paper, we describe the application and adaptation of our JANUS speech recognition engine to the Switchboard conversational speech recognition task, and evaluate its performance. Through a number of algorithmic improvements, we have been able to reduce the word error rate from more than 50% to 38%, measured on the official 1996 NIST evaluation test set. Improvements include vocal tract length normalization, polyphonic modeling, label boosting, speaker adaptation with and without confidence measures, and speaking-mode-dependent pronunciation modeling.

BibTeX

@conference{Zeppenfeld-1997-16439,
author = {Thorsten Zeppenfeld and Klaus Ries and Martin Westphal and Alex Waibel},
title = {Recognition of Conversational Telephone Speech Using the JANUS Speech Engine},
booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '97)},
year = {1997},
month = {April},
volume = {3},
pages = {1815--1818},
}
