Telektronikk 2.2003
Spoken Language Technology in Telecommunications
Pronunciation Variation Modeling in Automatic Speech Recognition
Ingunn Amdal and Eric Fosler-Lussier
Robust speech recognition is a critical research topic: to make speech technology more user-friendly, systems must be able to handle a wide range of speech types. One major source of variation is speaking style, and handling this variation in user input is difficult for current state-of-the-art recognizers. Modeling pronunciation variation within the system can partly alleviate these difficulties. Pronunciation variation can be modeled in different parts of the recognizer; here we focus on lexical adaptation (other articles in this issue of Telektronikk cover other types of robust modeling).
This article gives an overview of methods for pronunciation variation modeling by lexical adaptation. First, the automatic speech recognition system is briefly explained, with a focus on the pronunciation lexicon. Then the main distinction between pronunciation modeling methods (knowledge-based versus data-driven) is explained and illustrated with examples from selected work in the field. Another common distinction is whether pronunciation variants are modeled directly, or indirectly through pronunciation rules that make it possible to generalize knowledge or observations from a training set to unseen data. Finally, a section on confusability reduction is included.
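To make the direct/indirect distinction concrete, here is a minimal sketch, assuming a toy lexicon in ARPAbet-style phone symbols and a hypothetical rule format of our own (optional phone replacement or deletion conditioned on the following phone). Directly modeling variants would mean listing each variant per word; the indirect approach below stores only canonical pronunciations plus a few rules, and expands every word (including words never seen in training data) into the variants the rules license. The rules, words, and function names are illustrative assumptions, not the authors' method.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# Hypothetical rule format: (target phone, replacement, right context).
# The target phone MAY be realized as the replacement ("" = deletion)
# when the next phone equals the right context (None = any context).
Rule = Tuple[str, str, Optional[str]]


def apply_rules(canonical: List[str], rules: List[Rule]) -> List[Tuple[str, ...]]:
    """Expand one canonical pronunciation into all variants licensed by the rules."""
    options: List[List[str]] = []
    for i, phone in enumerate(canonical):
        nxt = canonical[i + 1] if i + 1 < len(canonical) else None
        alts = [phone]  # the canonical realization is always kept
        for target, replacement, right in rules:
            if phone == target and (right is None or right == nxt) and replacement not in alts:
                alts.append(replacement)
        options.append(alts)
    variants: List[Tuple[str, ...]] = []
    for combo in product(*options):
        variant = tuple(p for p in combo if p)  # drop deleted phones ("")
        if variant not in variants:  # keep first occurrence, preserve order
            variants.append(variant)
    return variants


def build_lexicon(canonical_lexicon: Dict[str, List[str]],
                  rules: List[Rule]) -> Dict[str, List[Tuple[str, ...]]]:
    """Indirect modeling: derive a multi-variant lexicon from canonical forms plus rules."""
    return {word: apply_rules(pron, rules) for word, pron in canonical_lexicon.items()}


# Toy rules (assumptions): optional schwa deletion before /r/,
# optional /t/ deletion before /s/.
rules: List[Rule] = [("ax", "", "r"), ("t", "", "s")]
canonical = {
    "camera": ["k", "ae", "m", "ax", "r", "ax"],
    "tests": ["t", "eh", "s", "t", "s"],
}
lexicon = build_lexicon(canonical, rules)

for word, prons in lexicon.items():
    for pron in prons:
        print(word, " ".join(pron))
```

Because the rules are stated over phone contexts rather than over words, the same two rules apply unchanged to any new entry added to `canonical`. The flip side is that every added variant is another pronunciation the recognizer may confuse with other words, which is why confusability needs attention when variants are added.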