TY - GEN
T1 - Automated structure discovery and parameter tuning of neural network language model based on evolution strategy
AU - Tanaka, Tomohiro
AU - Moriya, Takafumi
AU - Shinozaki, Takahiro
AU - Watanabe, Shinji
AU - Hori, Takaaki
AU - Duh, Kevin
N1 - Publisher Copyright:
© 2016 IEEE.
PY - 2017/2/7
Y1 - 2017/2/7
N2 - Long short-term memory (LSTM) recurrent neural network-based language models are known to improve speech recognition performance. However, significant effort is required to optimize network structures and training configurations. In this study, we automate the development process using evolutionary algorithms. In particular, we apply the covariance matrix adaptation evolution strategy (CMA-ES), which has demonstrated robustness in other black-box hyper-parameter optimization problems. By flexibly allowing optimization of various meta-parameters, including layer-wise unit types, our method automatically finds a configuration that gives improved recognition performance. Further, by using a Pareto-based multi-objective CMA-ES, both WER and computational time were reduced jointly: after 10 generations, the relative WER and decoding-time reductions were 4.1% and 22.7%, respectively, compared to an initial baseline system whose WER was 8.7%.
AB - Long short-term memory (LSTM) recurrent neural network-based language models are known to improve speech recognition performance. However, significant effort is required to optimize network structures and training configurations. In this study, we automate the development process using evolutionary algorithms. In particular, we apply the covariance matrix adaptation evolution strategy (CMA-ES), which has demonstrated robustness in other black-box hyper-parameter optimization problems. By flexibly allowing optimization of various meta-parameters, including layer-wise unit types, our method automatically finds a configuration that gives improved recognition performance. Further, by using a Pareto-based multi-objective CMA-ES, both WER and computational time were reduced jointly: after 10 generations, the relative WER and decoding-time reductions were 4.1% and 22.7%, respectively, compared to an initial baseline system whose WER was 8.7%.
KW - Evolution strategy
KW - Language model
KW - Large vocabulary speech recognition
KW - Long short-term memory
KW - Multi-objective optimization
UR - http://www.scopus.com/inward/record.url?scp=85015997753&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85015997753&partnerID=8YFLogxK
U2 - 10.1109/SLT.2016.7846334
DO - 10.1109/SLT.2016.7846334
M3 - Conference contribution
AN - SCOPUS:85015997753
T3 - 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings
SP - 665
EP - 671
BT - 2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 IEEE Workshop on Spoken Language Technology, SLT 2016
Y2 - 13 December 2016 through 16 December 2016
ER -