An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR

Huaibo Zhao*, Yosuke Higuchi, Tetsuji Ogawa, Tetsunori Kobayashi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

In the present paper, an attempt is made to combine Mask-CTC and the triggered attention mechanism to construct a streaming end-to-end automatic speech recognition (ASR) system that provides high performance with low latency. The triggered attention mechanism, which performs autoregressive decoding triggered by the CTC spike, has shown to be effective in streaming ASR. However, in order to maintain high accuracy of alignment estimation based on CTC outputs, which is the key to its performance, it is inevitable that decoding should be performed with some future information input (i.e., with higher latency). It should be noted that in streaming ASR, it is desirable to be able to achieve high recognition accuracy while keeping the latency low. Therefore, the present study aims to achieve highly accurate streaming ASR with low latency by introducing Mask-CTC, which is capable of learning feature representations that anticipate future information (i.e., that can consider long-term cons), to the encoder pre-training. Experimental comparisons conducted using WSJ data demonstrate that the proposed method achieves higher accuracy with lower latency than the conventional triggered attention-based streaming ASR system.

Original languageEnglish
Title of host publication2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages477-483
Number of pages7
ISBN (Electronic)9789881476890
Publication statusPublished - 2021
Event2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Tokyo, Japan
Duration: 2021 Dec 142021 Dec 17

Publication series

Name2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings

Conference

Conference2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Country/TerritoryJapan
CityTokyo
Period21/12/1421/12/17

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Signal Processing
  • Instrumentation

Fingerprint

Dive into the research topics of 'An Investigation of Enhancing CTC Model for Triggered Attention-based Streaming ASR'. Together they form a unique fingerprint.

Cite this