Deep beamforming networks for multi-channel speech recognition

Xiong Xiao, Shinji Watanabe, Hakan Erdogan, Liang Lu, John Hershey, Michael L. Seltzer, Guoguo Chen, Yu Zhang, Michael Mandel, Dong Yu

研究成果: Conference contribution

138 被引用数 (Scopus)

抄録

Despite the significant progress in speech recognition enabled by deep neural networks, poor performance persists in some scenarios. In this work, we focus on far-field speech recognition which remains challenging due to high levels of noise and reverberation in the captured speech signals. We propose to represent the stages of acoustic processing including beamforming, feature extraction, and acoustic modeling, as three components of a single unified computational network. The parameters of a frequency-domain beam-former are first estimated by a network based on features derived from the microphone channels. These filter coefficients are then applied to the array signals to form an enhanced signal. Conventional features are then extracted from this signal and passed to a second network that performs acoustic modeling for classification. The parameters of both the beamforming and acoustic modeling networks are trained jointly using back-propagation with a common cross-entropy objective function. In experiments on the AMI meeting corpus, we observed improvements by pre-training each sub-network with a network-specific objective function before joint training of both networks. The proposed method obtained a 3.2% absolute word error rate reduction compared to a conventional pipeline of independent processing stages.

本文言語English
ホスト出版物のタイトル2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
出版社Institute of Electrical and Electronics Engineers Inc.
ページ5745-5749
ページ数5
ISBN(電子版)9781479999880
DOI
出版ステータスPublished - 2016 5月 18
外部発表はい
イベント41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
継続期間: 2016 3月 202016 3月 25

出版物シリーズ

名前ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
2016-May
ISSN(印刷版)1520-6149

Other

Other41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
国/地域China
CityShanghai
Period16/3/2016/3/25

ASJC Scopus subject areas

  • ソフトウェア
  • 信号処理
  • 電子工学および電気工学

フィンガープリント

「Deep beamforming networks for multi-channel speech recognition」の研究トピックを掘り下げます。これらがまとまってユニークなフィンガープリントを構成します。

引用スタイル