Minimum word error training of long short-term memory recurrent neural network language models for speech recognition

Takaaki Hori, Chiori Hori, Shinji Watanabe, John R. Hershey

Research output: Conference contribution

18 Citations (Scopus)

Abstract

This paper describes minimum word error (MWE) training of recurrent neural network language models (RNNLMs) for speech recognition. RNNLMs are usually trained to minimize the cross entropy of estimated word probabilities against the correct word sequence, which corresponds to the maximum likelihood criterion. However, this training does not necessarily maximize a performance measure in the target task, i.e., it does not explicitly minimize word error rate (WER) in speech recognition. To address this problem, several discriminative training methods have already been proposed for n-gram language models, but those for RNNLMs have not been sufficiently investigated. In this paper, we propose an MWE training method for RNNLMs and report significant WER reductions when applying the MWE method to a standard Elman-type RNNLM and to a more advanced model, a Long Short-Term Memory (LSTM) RNNLM. We also present efficient MWE training with N-best lists on Graphics Processing Units (GPUs).
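
The abstract does not state the training objective itself. As an illustration only, the following minimal sketch assumes a commonly used form of the expected-word-error objective over an N-best list: each hypothesis is weighted by its posterior (a softmax over per-hypothesis log scores) and by its word-level edit distance to the reference. The function names and the toy data are hypothetical, and how the log scores are produced (for example, by combining acoustic and RNNLM scores) is not specified here.

```python
import math

def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (r != h)))   # substitution or match
        prev = curr
    return prev[-1]

def expected_word_errors(ref, nbest, log_scores):
    """Return the expected word errors over an N-best list and the
    gradient of that expectation w.r.t. each hypothesis log score.

    Assumption: posteriors are a softmax of log_scores over the list.
    """
    m = max(log_scores)                        # subtract max for stability
    weights = [math.exp(s - m) for s in log_scores]
    z = sum(weights)
    post = [w / z for w in weights]
    errs = [edit_distance(ref, hyp) for hyp in nbest]
    expected = sum(p * e for p, e in zip(post, errs))
    # d(expected)/d(score_n) = p_n * (err_n - expected)
    grad = [p * (e - expected) for p, e in zip(post, errs)]
    return expected, grad

# Toy example (hypothetical data): three hypotheses for one utterance.
ref = "the cat sat".split()
nbest = ["the cat sat".split(),
         "the cat sad".split(),
         "a cat sat down".split()]
log_scores = [-1.0, -1.0, -2.0]
expected, grad = expected_word_errors(ref, nbest, log_scores)
```

In this sketch the gradient is negative for hypotheses with fewer errors than the current expectation, so a gradient-descent step raises their scores and lowers the scores of hypotheses with more errors. In an actual system this gradient would be backpropagated into the language model parameters that contribute to each hypothesis score.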

Original language: English
Title of host publication: 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 5990-5994
Number of pages: 5
ISBN (Electronic): 9781479999880
DOI
Publication status: Published - 18 May 2016
Externally published: Yes
Event: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Shanghai, China
Duration: 20 Mar 2016 to 25 Mar 2016

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
2016-May
ISSN (Print): 1520-6149

Other

Other: 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Country/Territory: China
City: Shanghai
Period: 20 Mar 2016 to 25 Mar 2016

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
