Abstract
The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology. Using an attention mechanism in a recurrent encoder-decoder architecture solves the dynamic time alignment problem, allowing joint end-to-end training of the acoustic and language modeling components. In this paper we extend the end-to-end framework to encompass microphone array signal processing for noise suppression and speech enhancement within the acoustic encoding network. This allows the beamforming components to be optimized jointly within the recognition architecture to improve the end-to-end speech recognition objective. Experiments on the noisy speech benchmarks CHiME-4 and AMI show that our multichannel end-to-end system outperformed an attention-based baseline that takes its input from a conventional adaptive beamformer.
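The abstract does not spell out how the beamformer is made trainable. As an illustration only, the sketch below implements one widely used differentiable design, a mask-based minimum variance distortionless response (MVDR) beamformer in the Souden et al. formulation with a one-hot reference-microphone vector; this abstract does not confirm that it matches the paper's own network. The function names `masked_psd` and `mvdr_souden` are hypothetical, and the speech and noise masks are taken as given inputs, whereas in a jointly trained system they would be predicted by a neural network and updated through the recognition loss. NumPy is used for readability; every step (weighted outer products, a batched linear solve, a trace) is differentiable, so the same computation written in an autodiff framework would pass gradients back to the mask estimator.

```python
import numpy as np


def masked_psd(X, mask):
    """Mask-weighted spatial covariance (PSD) matrices.

    X: complex STFT, shape (C, T, F) = (channels, frames, frequency bins).
    mask: real time-frequency mask in [0, 1], shape (T, F).
    Returns an array of shape (F, C, C): one C x C matrix per frequency bin.
    """
    # sum over t of m(t, f) * x(t, f) x(t, f)^H, for every bin f at once
    num = np.einsum("tf,ctf,dtf->fcd", mask, X, X.conj())
    # normalize by the total mask weight per bin (guard against all-zero masks)
    den = np.maximum(mask.sum(axis=0), 1e-8)
    return num / den[:, None, None]


def mvdr_souden(X, speech_mask, noise_mask, ref=0, eps=1e-6):
    """Mask-based MVDR beamformer, Souden-style formulation (illustrative).

    w_f = (Phi_N^{-1} Phi_S) u / trace(Phi_N^{-1} Phi_S), where u is the
    one-hot vector selecting reference microphone `ref`.
    Returns (w, Y): filter weights of shape (F, C) and the enhanced
    single-channel STFT of shape (T, F).
    """
    C = X.shape[0]
    phi_s = masked_psd(X, speech_mask)
    # diagonal loading keeps the noise PSD matrix invertible
    phi_n = masked_psd(X, noise_mask) + eps * np.eye(C)[None]
    # batched linear solve gives Phi_N^{-1} Phi_S for every frequency bin
    ratio = np.linalg.solve(phi_n, phi_s)
    trace = np.trace(ratio, axis1=1, axis2=2)  # shape (F,)
    # multiplying by the one-hot vector u just selects column `ref`
    w = ratio[:, :, ref] / trace[:, None]
    # apply the filter: y(t, f) = w_f^H x(t, f)
    Y = np.einsum("fc,ctf->tf", w.conj(), X)
    return w, Y
```

Calling `mvdr_souden(X, speech_mask, noise_mask)` on a `(C, T, F)` multichannel STFT returns per-bin filter weights and a single-channel `(T, F)` enhanced spectrogram; in a pipeline like the one the abstract describes, features computed from that output would feed the attention-based encoder. For a speech component that is exactly rank one, the filter passes the reference-microphone signal undistorted.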
Original language | English |
---|---|
Host publication title | 34th International Conference on Machine Learning, ICML 2017 |
Publisher | International Machine Learning Society (IMLS) |
Pages | 4033-4042 |
Number of pages | 10 |
Volume | 6 |
ISBN (electronic) | 9781510855144 |
Publication status | Published - 1 Jan 2017 |
Externally published | Yes |
Event | 34th International Conference on Machine Learning, ICML 2017 - Sydney, Australia. Duration: 6 Aug 2017 → 11 Aug 2017 |
Other
Other | 34th International Conference on Machine Learning, ICML 2017 |
---|---|
Country/Territory | Australia |
City | Sydney |
Period | 6 Aug 2017 → 11 Aug 2017 |
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Human-Computer Interaction
- Software