Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition

Felix Weninger, Shinji Watanabe, Yuuki Tachioka, Bjorn Schuller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

71 Citations (Scopus)

Abstract

This paper describes our joint efforts to provide robust automatic speech recognition (ASR) for reverberated environments, such as in hands-free human-machine interaction. We investigate blind feature space de-reverberation and deep recurrent de-noising auto-encoders (DAE) in an early fusion scheme. Results on the 2014 REVERB Challenge development set indicate that the DAE front-end provides complementary performance gains to multi-condition training, feature transformations, and model adaptation. The proposed ASR system achieves word error rates of 17.62 % and 36.6 % on simulated and real data, respectively, which is a significant improvement over the Challenge baseline (25.16 % and 47.2 %).
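To put the reported improvement in proportion, the word error rates quoted in the abstract can be turned into relative reductions over the Challenge baseline. The sketch below is only illustrative arithmetic on those four figures; the helper name `relative_wer_reduction` is hypothetical and is not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, system_wer: float) -> float:
    """Return the relative WER reduction in percent of the baseline WER."""
    return (baseline_wer - system_wer) / baseline_wer * 100.0


# Word error rates (in %) as quoted in the abstract.
simulated = relative_wer_reduction(baseline_wer=25.16, system_wer=17.62)
real = relative_wer_reduction(baseline_wer=47.2, system_wer=36.6)

print(f"Simulated data: {simulated:.2f} % relative reduction")
print(f"Real data:      {real:.2f} % relative reduction")
```

Under this calculation the relative reduction comes to roughly 30 % on simulated data and roughly 22 % on real data.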

Original language: English
Title of host publication: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 4623-4627
Number of pages: 5
ISBN (Print): 9781479928927
DOIs
Publication status: Published - 2014
Externally published: Yes
Event: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014 - Florence, Italy
Duration: 2014 May 4 to 2014 May 9

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2014
Country/Territory: Italy
City: Florence
Period: 14/5/4 to 14/5/9

Keywords

  • De-reverberation
  • automatic speech recognition
  • feature enhancement
  • recurrent neural networks

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
