TY - GEN
T1 - Adjusting occurrence probabilities of automatically-generated abbreviated words in spoken dialogue systems
AU - Katsumaru, Masaki
AU - Komatani, Kazunori
AU - Ogata, Tetsuya
AU - Okuno, Hiroshi G.
PY - 2009
Y1 - 2009
AB - Users often abbreviate long words when using spoken dialogue systems, which results in automatic speech recognition (ASR) errors. We define abbreviated words as sub-words of an original word and add them to the ASR dictionary. The first problem we face is that long and compound words need to be segmented in agglutinative languages such as Japanese, but general morphological analyzers cannot correctly segment proper nouns. The second is that adding many abbreviated words increases the vocabulary size and degrades ASR accuracy. We have developed two methods: (1) segmenting words by using conjunction probabilities between characters, and (2) adjusting occurrence probabilities of generated abbreviated words on the basis of two cues: phonological similarities between the abbreviated and original words, and frequencies of abbreviated words in Web documents. Our method improves ASR accuracy by 34.9 points for utterances containing abbreviated words without degrading the accuracy for utterances containing original words.
UR - http://www.scopus.com/inward/record.url?scp=70350660538&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=70350660538&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-02568-6_49
DO - 10.1007/978-3-642-02568-6_49
M3 - Conference contribution
AN - SCOPUS:70350660538
SN - 3642025676
SN - 9783642025679
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 481
EP - 490
BT - Next-Generation Applied Intelligence - 22nd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2009, Proceedings
T2 - 22nd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2009
Y2 - 24 June 2009 through 27 June 2009
ER -