TY - GEN
T1 - Classification of causes of speech recognition errors using attention-based bidirectional long short-term memory and modulation spectrum
AU - Santoso, Jennifer
AU - Yamada, Takeshi
AU - Makino, Shoji
N1 - Funding Information:
V. ACKNOWLEDGEMENT This work was supported by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant No. 17K00224.
Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - In this paper, we address the problem of classifying four common utterance characteristics related to the utterance speed, which cause speech recognition errors. We previously proposed bidirectional long short-term memory (BLSTM) as a classifier and the modulation spectrum as an acoustic feature. However, the performance of it is still insufficient, since BLSTM classified the utterance characteristics from the overall utterance, while most of the recognition errors resulted from utterance characteristics occur in only a small part of utterance. In this paper, we propose an approach to enhance classifier by using attention mechanism (attention-based BLSTM). Attention-based BLSTM enables the classifier to weight each frame according to its importance instead of directly measuring overall information from the speech. Furthermore, we investigate the correspondence of utterance characteristics to different modulation spectrum block lengths. To evaluate the performance of the proposed method, we conducted a classification experiment on Japanese conversational speeches with four different utterance characteristics: 'fast', 'slow', 'filler', and 'stutter'. As a result, the proposed method improved the F-score by 0.033-0.129 compared with the previously proposed method using BLSTM. This result confirms the effectiveness of attention-based BLSTM in classifying cause of errors based on utterance characteristics.
AB - In this paper, we address the problem of classifying four common utterance characteristics related to the utterance speed, which cause speech recognition errors. We previously proposed bidirectional long short-term memory (BLSTM) as a classifier and the modulation spectrum as an acoustic feature. However, the performance of it is still insufficient, since BLSTM classified the utterance characteristics from the overall utterance, while most of the recognition errors resulted from utterance characteristics occur in only a small part of utterance. In this paper, we propose an approach to enhance classifier by using attention mechanism (attention-based BLSTM). Attention-based BLSTM enables the classifier to weight each frame according to its importance instead of directly measuring overall information from the speech. Furthermore, we investigate the correspondence of utterance characteristics to different modulation spectrum block lengths. To evaluate the performance of the proposed method, we conducted a classification experiment on Japanese conversational speeches with four different utterance characteristics: 'fast', 'slow', 'filler', and 'stutter'. As a result, the proposed method improved the F-score by 0.033-0.129 compared with the previously proposed method using BLSTM. This result confirms the effectiveness of attention-based BLSTM in classifying cause of errors based on utterance characteristics.
UR - http://www.scopus.com/inward/record.url?scp=85082389231&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85082389231&partnerID=8YFLogxK
U2 - 10.1109/APSIPAASC47483.2019.9023288
DO - 10.1109/APSIPAASC47483.2019.9023288
M3 - Conference contribution
AN - SCOPUS:85082389231
T3 - 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
SP - 302
EP - 306
BT - 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Y2 - 18 November 2019 through 21 November 2019
ER -