Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR

Felix Weninger*, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, John R. Hershey, Björn Schuller

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

414 Citations (Scopus)

Abstract

We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR). The proposed framework is based on Long Short-Term Memory (LSTM) RNNs which are discriminatively trained according to an optimal speech reconstruction objective. We demonstrate that LSTM speech enhancement, even when used ‘naïvely’ as front-end processing, delivers competitive results on the CHiME-2 speech recognition task. Furthermore, simple, feature-level fusion based extensions to the framework are proposed to improve the integration with the ASR back-end. These yield a best result of 13.76% average word error rate, which is, to our knowledge, the best score to date.
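The abstract says the LSTM networks are trained "according to an optimal speech reconstruction objective". The minimal Python sketch below shows what such an objective can look like for a mask-based enhancer. This record does not give the exact formulation. Several details are assumptions for illustration only:

- the network predicts a time-frequency mask that is applied to the noisy magnitude spectrogram;
- the loss is a mean squared error against the clean magnitude;
- the function names and the frames-by-bins list layout are hypothetical.

```python
from typing import List

# Hypothetical layout: spectrograms stored as frames x frequency bins.
Matrix = List[List[float]]


def masked_reconstruction_loss(mask: Matrix, noisy_mag: Matrix, clean_mag: Matrix) -> float:
    """Mean squared error between the masked noisy magnitude M * |Y| and the
    clean magnitude |S|, averaged over all time-frequency bins.

    Assumption: a mask-based "signal approximation" style objective; the
    paper's exact objective may differ.
    """
    if not (len(mask) == len(noisy_mag) == len(clean_mag)):
        raise ValueError("all inputs must have the same number of frames")
    total, count = 0.0, 0
    for m_row, y_row, s_row in zip(mask, noisy_mag, clean_mag):
        if not (len(m_row) == len(y_row) == len(s_row)):
            raise ValueError("all frames must have the same number of bins")
        for m, y, s in zip(m_row, y_row, s_row):
            total += (m * y - s) ** 2  # squared reconstruction error per bin
            count += 1
    if count == 0:
        raise ValueError("empty spectrogram")
    return total / count


def oracle_mask(noisy_mag: Matrix, clean_mag: Matrix) -> Matrix:
    """Hypothetical helper: per-bin mask |S| / |Y| capped at 1; bins with |Y| = 0 get 0."""
    return [
        [min(1.0, s / y) if y > 0 else 0.0 for y, s in zip(y_row, s_row)]
        for y_row, s_row in zip(noisy_mag, clean_mag)
    ]


if __name__ == "__main__":
    noisy = [[2.0, 3.0], [4.0, 1.0]]
    clean = [[1.0, 3.0], [2.0, 0.5]]
    # Passing the noisy input through unchanged (mask of ones) leaves residual error.
    print(masked_reconstruction_loss([[1.0, 1.0], [1.0, 1.0]], noisy, clean))  # 1.3125
    # The oracle mask reconstructs |S| exactly here because |S| <= |Y| in every bin.
    print(masked_reconstruction_loss(oracle_mask(noisy, clean), noisy, clean))  # 0.0
```

Where the clean magnitude exceeds the noisy magnitude, a mask capped at 1 cannot reach zero loss. This is one reason why the choice of reconstruction target matters in this line of work.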

Original language: English
Title of host publication: Latent Variable Analysis and Signal Separation - 12th International Conference, LVA/ICA 2015, Proceedings
Editors: Zbyněk Koldovský, Emmanuel Vincent, Arie Yeredor, Petr Tichavský
Publisher: Springer Verlag
Pages: 91-99
Number of pages: 9
ISBN (Print): 9783319224817
Publication status: Published - 2015
Externally published: Yes
Event: 12th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2015 - Liberec, Czech Republic
Duration: 2015 Aug 25 to 2015 Aug 28

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9237
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 12th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2015
Country/Territory: Czech Republic
City: Liberec
Period: 2015 Aug 25 to 2015 Aug 28

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science(all)
