Pronunciation Variations in Emotional Speech
Workshop Paper, ISCA '98 Workshop on Modeling Pronunciation Variation for Automatic Speech Recognition, May 1998
Abstract
In this paper we demonstrate how the emotional state of the speaker influences his or her speech. We show that recognition accuracy varies significantly depending on the emotional state of the speaker. Our system models the pronunciation variation of emotional speech at both the acoustic and prosodic levels. We show that using emotion-specific acoustic and prosodic models allows the system to discriminate among four emotions (happy, sad, angry, and afraid) well above chance level. Finally, we show that emotion-specific modeling improves the word accuracy of the speech recognition system when faced with emotional speech.
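The emotion-discrimination step described in the abstract can be pictured as a model-selection problem: each emotion has its own acoustic and prosodic models, and an utterance is assigned the emotion whose models score it highest. The following is a minimal sketch of that selection logic only; the function names, the score values, and the additive combination of acoustic and prosodic scores are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: pick the emotion whose emotion-specific models
# assign the highest combined log-likelihood to an utterance.
EMOTIONS = ["happy", "sad", "angry", "afraid"]

def classify_emotion(acoustic_ll, prosodic_ll):
    """Return the emotion with the highest combined model score.

    acoustic_ll, prosodic_ll: dicts mapping each emotion to the
    log-likelihood its acoustic / prosodic model assigns (stand-in
    values here; real scores would come from the trained models).
    """
    return max(EMOTIONS, key=lambda e: acoustic_ll[e] + prosodic_ll[e])

# Example with illustrative scores for one utterance:
acoustic = {"happy": -1234.5, "sad": -1210.2, "angry": -1250.8, "afraid": -1240.0}
prosodic = {"happy": -310.0, "sad": -295.4, "angry": -320.7, "afraid": -305.1}
print(classify_emotion(acoustic, prosodic))
```

Under this sketch, the "sad" models score the utterance least negatively overall, so "sad" is returned; a real system would derive the scores from emotion-specific HMM and prosodic models rather than fixed numbers.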
BibTeX
@inproceedings{Polzin-1998-14645,
  author    = {Thomas S. Polzin and Alex Waibel},
  title     = {Pronunciation Variations in Emotional Speech},
  booktitle = {Proceedings of the ISCA '98 Workshop on Modeling Pronunciation Variation for Automatic Speech Recognition},
  year      = {1998},
  month     = {May},
}
Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.