TY - GEN
T1 - Minimum word error training of long short-term memory recurrent neural network language models for speech recognition
AU - Hori, Takaaki
AU - Hori, Chiori
AU - Watanabe, Shinji
AU - Hershey, John R.
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2016/5/18
Y1 - 2016/5/18
N2 - This paper describes minimum word error (MWE) training of recurrent neural network language models (RNNLMs) for speech recognition. RNNLMs are usually trained to minimize the cross entropy of estimated word probabilities against the correct word sequence, which corresponds to the maximum likelihood criterion. However, this training does not necessarily maximize a performance measure in the target task, i.e., it does not explicitly minimize the word error rate (WER) in speech recognition. To address this problem, several discriminative training methods have already been proposed for n-gram language models, but those for RNNLMs have not been sufficiently investigated. In this paper, we propose an MWE training method for RNNLMs, and report significant WER reductions when applying the MWE method to a standard Elman-type RNNLM and a more advanced model, a Long Short-Term Memory (LSTM) RNNLM. We also present efficient MWE training with N-best lists on Graphics Processing Units (GPUs).
AB - This paper describes minimum word error (MWE) training of recurrent neural network language models (RNNLMs) for speech recognition. RNNLMs are usually trained to minimize the cross entropy of estimated word probabilities against the correct word sequence, which corresponds to the maximum likelihood criterion. However, this training does not necessarily maximize a performance measure in the target task, i.e., it does not explicitly minimize the word error rate (WER) in speech recognition. To address this problem, several discriminative training methods have already been proposed for n-gram language models, but those for RNNLMs have not been sufficiently investigated. In this paper, we propose an MWE training method for RNNLMs, and report significant WER reductions when applying the MWE method to a standard Elman-type RNNLM and a more advanced model, a Long Short-Term Memory (LSTM) RNNLM. We also present efficient MWE training with N-best lists on Graphics Processing Units (GPUs).
KW - Long short-term memory
KW - Minimum word error training
KW - Recurrent neural network language model
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=84973341477&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973341477&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2016.7472827
DO - 10.1109/ICASSP.2016.7472827
M3 - Conference contribution
AN - SCOPUS:84973341477
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 5990
EP - 5994
BT - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
Y2 - 20 March 2016 through 25 March 2016
ER -