Multichannel end-to-end speech recognition

Tsubasa Ochiai*, Shinji Watanabe, Takaaki Hori, John R. Hershey

*Corresponding author for this work

Research output: Conference contribution

27 Citations (Scopus)

Abstract

The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology. Using an attention mechanism in a recurrent encoder-decoder architecture solves the dynamic time alignment problem, allowing joint end-to-end training of the acoustic and language modeling components. In this paper, we extend the end-to-end framework to encompass microphone array signal processing for noise suppression and speech enhancement within the acoustic encoding network. This allows the beamforming components to be optimized jointly within the recognition architecture to improve the end-to-end speech recognition objective. Experiments on the noisy speech benchmarks (CHiME-4 and AMI) show that our multichannel end-to-end system outperformed the attention-based baseline with input from a conventional adaptive beamformer.
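The front-end the abstract describes can be pictured as a mask-driven beamformer whose output feeds the attention encoder-decoder. The following is a minimal numpy sketch of that idea, not the authors' implementation: the learned mask network is stubbed with random masks, and an MVDR-style filter is derived from mask-weighted spatial covariances; all shapes and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
C, T, F = 4, 50, 8  # channels, time frames, frequency bins (illustrative sizes)
# Multichannel STFT of the noisy mixture (random stand-in data).
X = rng.standard_normal((C, T, F)) + 1j * rng.standard_normal((C, T, F))

# Stub for the mask-estimation network: in the end-to-end system this
# component is a neural network trained jointly with the recognizer.
speech_mask = rng.uniform(0.5, 1.0, size=(T, F))
noise_mask = 1.0 - speech_mask

def spatial_covariance(X, mask):
    # Phi[f] = (1 / sum_t m[t,f]) * sum_t m[t,f] * x[t,f] x[t,f]^H,  C x C per bin
    Phi = np.einsum('tf,atf,btf->fab', mask, X, X.conj())
    return Phi / mask.sum(axis=0)[:, None, None]

Phi_s = spatial_covariance(X, speech_mask)   # speech covariance
Phi_n = spatial_covariance(X, noise_mask)    # noise covariance

# MVDR-style filter w[f] = Phi_n^{-1} Phi_s u / trace(Phi_n^{-1} Phi_s),
# with u selecting a reference channel; a small diagonal load keeps the solve stable.
u = np.zeros(C); u[0] = 1.0
W = np.empty((F, C), dtype=complex)
for f in range(F):
    num = np.linalg.solve(Phi_n[f] + 1e-6 * np.eye(C), Phi_s[f])
    W[f] = (num @ u) / np.trace(num)

# Beamformed single-channel time-frequency features: the tensor that would be
# passed on to the attention-based encoder-decoder recognizer.
Y = np.einsum('fa,atf->tf', W.conj(), X)
print(Y.shape)  # (50, 8)
```

Because the mask estimator sits inside the recognition network, the enhancement stage can be optimized for the recognition objective rather than a separate signal-level criterion, which is the key point of the joint training described above.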

Original language: English
Host publication title: 34th International Conference on Machine Learning, ICML 2017
Publisher: International Machine Learning Society (IMLS)
Pages: 4033-4042
Number of pages: 10
Volume: 6
ISBN (electronic): 9781510855144
Publication status: Published - 1 Jan 2017
Externally published: Yes
Event: 34th International Conference on Machine Learning, ICML 2017 - Sydney, Australia
Duration: 6 Aug 2017 → 11 Aug 2017

Other

Other: 34th International Conference on Machine Learning, ICML 2017
Country/Territory: Australia
City: Sydney
Period: 17/8/6 → 17/8/11

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Human-Computer Interaction
  • Software

