An overview of the SPHINX speech recognition system
Abstract
This paper describes SPHINX, a system that demonstrates the feasibility of accurate, large-vocabulary, speaker-independent, continuous speech recognition. SPHINX is based on discrete hidden Markov models (HMMs) with parameters derived from linear predictive coding (LPC). To provide speaker independence, knowledge was added to these HMMs in several ways: multiple codebooks of fixed-width parameters, and an enhanced recognizer with carefully designed models and word-duration modeling. To deal with coarticulation in continuous speech while still adequately representing a large vocabulary, two new subword speech units are introduced: function-word-dependent phone models and generalized triphone models. With grammars of perplexity 997, 60, and 20, SPHINX attained word accuracies of 71%, 94%, and 96%, respectively, on a 997-word task.
see also IEEE Transactions on Signal Processing
BibTeX
@article{Lee-1990-13073,
author = {K. F. Lee and H. W. Hon and Raj Reddy},
title = {An overview of the SPHINX speech recognition system},
journal = {IEEE Transactions on Acoustics, Speech, and Signal Processing},
year = {1990},
month = {January},
volume = {38},
number = {1},
pages = {35--45},
}