LVCSR-based Language Identification - Robotics Institute Carnegie Mellon University

Tanja Schultz, Ivica Rogina, and Alex Waibel
Conference Paper, Proceedings of 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '96), Vol. 2, pp. 781-784, May 1996

Abstract

Automatic language identification is an important problem in building multilingual speech recognition and understanding systems. In building a language identification module for four languages, we studied how applying different levels of knowledge (phonetic, phonotactic, lexical, and syntactic-semantic) influences a large vocabulary continuous speech recognition (LVCSR) approach. The resulting language identification (LID) module can identify the language of spontaneous speech input and can be used as a front end for the multilingual speech-to-speech translation system JANUS-II. A comparison of five LID systems showed that incorporating lexical and linguistic knowledge reduces the language identification error in the 2-language tests by up to 50%. Based on these results, we built an LID module for German, English, Spanish, and Japanese that yields an 84% identification rate on the spontaneous scheduling task (SST).

BibTeX

@conference{Schultz-1996-16278,
author = {Tanja Schultz and Ivica Rogina and Alex Waibel},
title = {LVCSR-based Language Identification},
booktitle = {Proceedings of 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '96)},
year = {1996},
month = {May},
volume = {2},
pages = {781--784},
}