Classification of causes of speech recognition errors using attention-based bidirectional long short-term memory and modulation spectrum

Jennifer Santoso*, Takeshi Yamada, Shoji Makino

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

In this paper, we address the problem of classifying four common utterance characteristics related to utterance speed, which cause speech recognition errors. We previously proposed bidirectional long short-term memory (BLSTM) as a classifier and the modulation spectrum as an acoustic feature. However, its performance is still insufficient, since the BLSTM classifies the utterance characteristics from the overall utterance, while most of the recognition errors resulting from utterance characteristics occur in only a small part of the utterance. In this paper, we propose an approach to enhance the classifier by using an attention mechanism (attention-based BLSTM). Attention-based BLSTM enables the classifier to weight each frame according to its importance instead of directly measuring overall information from the speech. Furthermore, we investigate the correspondence of utterance characteristics to different modulation spectrum block lengths. To evaluate the performance of the proposed method, we conducted a classification experiment on Japanese conversational speech with four different utterance characteristics: 'fast', 'slow', 'filler', and 'stutter'. As a result, the proposed method improved the F-score by 0.033-0.129 compared with the previously proposed method using BLSTM. This result confirms the effectiveness of attention-based BLSTM in classifying the causes of errors based on utterance characteristics.
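The frame-weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact attention formulation, so the code below assumes a common additive form: each frame vector (standing in for a per-frame BLSTM output) gets a scalar score, the scores are normalized with a softmax, and the utterance-level vector is the weighted sum of the frames. The function name `attention_pool` and the parameters `w`, `b`, and `u` are hypothetical and chosen only for illustration.

```python
import math

def attention_pool(frames, w, b, u):
    """Pool per-frame vectors into one utterance vector with attention.

    frames: list of T frame vectors, each of length D (assumed BLSTM outputs)
    w:      list of A rows, each of length D (hypothetical projection weights)
    b:      list of A biases
    u:      list of A scoring weights
    Returns (context, weights): context has length D, weights has length T.
    """
    scores = []
    for h in frames:
        # Project the frame and squash it, then reduce to a scalar score.
        hidden = [
            math.tanh(sum(w_row[d] * h[d] for d in range(len(h))) + b_a)
            for w_row, b_a in zip(w, b)
        ]
        scores.append(sum(u_a * h_a for u_a, h_a in zip(u, hidden)))

    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted sum of frames: frames with larger weights contribute more.
    dim = len(frames[0])
    context = [
        sum(weights[t] * frames[t][d] for t in range(len(frames)))
        for d in range(dim)
    ]
    return context, weights

# Toy input: three 2-dimensional frames and a one-unit attention layer.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[1.0, -1.0]]
b = [0.0]
u = [2.0]
context, weights = attention_pool(frames, w, b, u)
print(weights)
print(context)
```

Setting `u` to all zeros makes every score equal, so the weights become uniform and the result reduces to plain averaging over frames. That is a rough analogue of classifying from the overall utterance, whereas non-uniform weights let a short region of the utterance dominate the pooled vector.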

Original language: English
Title of host publication: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 302-306
Number of pages: 5
ISBN (Electronic): 9781728132488
Publication status: Published - 2019 Nov
Externally published: Yes
Event: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019 - Lanzhou, China
Duration: 2019 Nov 18 → 2019 Nov 21

Publication series

Name: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019

Conference

Conference: 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Country/Territory: China
City: Lanzhou
Period: 19/11/18 → 19/11/21

ASJC Scopus subject areas

  • Information Systems
