An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech

Hiroshi Seki, Shinji Watanabe, Takaaki Hori, Jonathan Le Roux, John R. Hershey

研究成果: Conference contribution

41 被引用数 (Scopus)

抄録

End-to-end automatic speech recognition (ASR) can significantly reduce the burden of developing ASR systems for new languages, by eliminating the need for linguistic information such as pronunciation dictionaries. This also creates an opportunity to build a monolithic multilingual ASR system with a language-independent neural network architecture. In our previous work, we proposed a monolithic neural network architecture that can recognize multiple languages, and showed its effectiveness compared with conventional language-dependent models. However, the model is not guaranteed to properly handle switches in language within an utterance, thus lacking the flexibility to recognize mixed-language speech such as code-switching. In this paper, we extend our model to enable dynamic tracking of the language within an utterance, and propose a training procedure that takes advantage of a newly created mixed-language speech corpus. Experimental results show that the extended model outperforms both language-dependent models and our previous model without suffering from performance degradation that could be associated with language switching.

本文言語English
ホスト出版物のタイトル2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
出版社Institute of Electrical and Electronics Engineers Inc.
ページ4919-4923
ページ数5
ISBN(印刷版)9781538646588
DOI
出版ステータスPublished - 2018 9月 10
外部発表はい
イベント2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
継続期間: 2018 4月 152018 4月 20

出版物シリーズ

名前ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
2018-April
ISSN(印刷版)1520-6149

Other

Other2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
国/地域Canada
CityCalgary
Period18/4/1518/4/20

ASJC Scopus subject areas

  • ソフトウェア
  • 信号処理
  • 電子工学および電気工学

フィンガープリント

「An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル