Joint CTC/attention decoding for end-to-end speech recognition

Takaaki Hori, Shinji Watanabe, John R. Hershey

Research output: Chapter in Book/Report/Conference proceedingConference contribution

93 Citations (Scopus)

Abstract

End-to-end automatic speech recognition (ASR) has become a popular alternative to conventional DNN/HMM systems because it avoids the need for linguistic resources such as pronunciation dictionary, tokenization, and context-dependency trees, leading to a greatly simplified model-building process. There are two major types of end-to-end architectures for ASR: attention-based methods use an attention mechanism to perform alignment between acoustic frames and recognized symbols, and connectionist temporal classification (CTC), uses Markov assumptions to efficiently solve sequential problems by dynamic programming. This paper proposes a joint decoding algorithm for end-to-end ASR with a hybrid CTC/attention architecture, which effectively utilizes both advantages in decoding. We have applied the proposed method to two ASR benchmarks (spontaneous Japanese and Mandarin Chinese), and showing the comparable performance to conventional state-of-the-art DNN/HMM ASR systems without linguistic resources.

Original languageEnglish
Title of host publicationACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
PublisherAssociation for Computational Linguistics (ACL)
Pages518-529
Number of pages12
ISBN (Electronic)9781945626753
DOIs
Publication statusPublished - 2017
Externally publishedYes
Event55th Annual Meeting of the Association for Computational Linguistics, ACL 2017 - Vancouver, Canada
Duration: 2017 Jul 302017 Aug 4

Publication series

NameACL 2017 - 55th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Long Papers)
Volume1

Other

Other55th Annual Meeting of the Association for Computational Linguistics, ACL 2017
Country/TerritoryCanada
CityVancouver
Period17/7/3017/8/4

ASJC Scopus subject areas

  • Language and Linguistics
  • Artificial Intelligence
  • Software
  • Linguistics and Language

Fingerprint

Dive into the research topics of 'Joint CTC/attention decoding for end-to-end speech recognition'. Together they form a unique fingerprint.

Cite this