TY - GEN
T1 - Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR
AU - Weninger, Felix
AU - Erdogan, Hakan
AU - Watanabe, Shinji
AU - Vincent, Emmanuel
AU - Le Roux, Jonathan
AU - Hershey, John R.
AU - Schuller, Björn
N1 - Publisher Copyright: © Springer International Publishing Switzerland 2015.
PY - 2015
Y1 - 2015
N2 - We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs which are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech enhancement, even when used ‘naïvely’ as front-end processing, delivers competitive results on the CHiME-2 speech recognition task. Furthermore, simple, feature-level fusion based extensions to the framework are proposed to improve the integration with the ASR back-end. These yield a best result of 13.76% average word error rate, which is, to our knowledge, the best score to date.
UR - http://www.scopus.com/inward/record.url?scp=84944675581&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84944675581&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-22482-4_11
DO - 10.1007/978-3-319-22482-4_11
M3 - Conference contribution
AN - SCOPUS:84944675581
SN - 9783319224817
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 91
EP - 99
BT - Latent Variable Analysis and Signal Separation - 12th International Conference, LVA/ICA 2015, Proceedings
A2 - Koldovský, Zbyněk
A2 - Vincent, Emmanuel
A2 - Yeredor, Arie
A2 - Tichavský, Petr
PB - Springer Verlag
T2 - 12th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2015
Y2 - 25 August 2015 through 28 August 2015
ER -