BECTRA: Transducer-Based End-To-End ASR with Bert-Enhanced Encoder

Yosuke Higuchi*, Tetsuji Ogawa, Tetsunori Kobayashi, Shinji Watanabe

*この研究の対応する著者

研究成果: Conference contribution

3 被引用数 (Scopus)

抄録

We present BERT-CTC-Transducer (BECTRA), a novel end-to-end automatic speech recognition (E2E-ASR) model formulated by the transducer with a BERT-enhanced encoder. Integrating a large-scale pre-trained language model (LM) into E2E-ASR has been actively studied, aiming to utilize versatile linguistic knowledge for generating accurate text. One crucial factor that makes this integration challenging lies in the vocabulary mismatch; the vocabulary constructed for a pre-trained LM is generally too large for E2E-ASR training and is likely to have a mismatch against a target ASR domain. To overcome such an issue, we propose BECTRA, an extended version of our previous BERT-CTC, that realizes BERT-based E2E-ASR using a vocabulary of interest. BECTRA is a transducer-based model, which adopts BERT-CTC for its encoder and trains an ASR-specific decoder using a vocabulary suitable for a target task. With the combination of the transducer and BERT-CTC, we also propose a novel inference algorithm for taking advantage of both autoregressive and non-autoregressive decoding. Experimental results on several ASR tasks, varying in amounts of data, speaking styles, and languages, demonstrate that BECTRA outperforms BERT-CTC by effectively dealing with the vocabulary mismatch while exploiting BERT knowledge.

本文言語English
ホスト出版物のタイトルICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, Proceedings
出版社Institute of Electrical and Electronics Engineers Inc.
ISBN(電子版)9781728163277
DOI
出版ステータスPublished - 2023
イベント48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023 - Rhodes Island, Greece
継続期間: 2023 6月 42023 6月 10

出版物シリーズ

名前ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
2023-June
ISSN(印刷版)1520-6149

Conference

Conference48th IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2023
国/地域Greece
CityRhodes Island
Period23/6/423/6/10

ASJC Scopus subject areas

  • ソフトウェア
  • 信号処理
  • 電子工学および電気工学

フィンガープリント

「BECTRA: Transducer-Based End-To-End ASR with Bert-Enhanced Encoder」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル