Uncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced features

Yuuki Tachioka, Shinji Watanabe

Research output: Contribution to journal › Article › peer-review

10 Citations (Scopus)

Abstract

Speech enhancement is an important front-end technique for improving automatic speech recognition (ASR) in noisy environments. However, incorrect noise suppression during speech enhancement often introduces additional distortions into the speech signal, which degrades ASR performance. To compensate for these distortions, ASR needs to consider the uncertainty of the enhanced features, which can be achieved by taking the expectation of the ASR decoding/training process with respect to a probabilistic representation of the input features. However, unlike a Gaussian mixture model, a deep neural network (DNN) cannot easily handle this expectation analytically because of its nonlinear activations. This paper proposes efficient Monte Carlo approximation methods for this expectation calculation to realize DNN-based uncertainty decoding and training. It first models the uncertainty of the input features as a linear interpolation between the original and enhanced feature vectors with a random interpolation coefficient. By sampling input features from this stochastic process during training, the DNN can learn to generalize over the variations of the enhanced features. Our method also samples input features in decoding and integrates the multiple recognition hypotheses obtained from the samples. Experiments on reverberated noisy speech recognition tasks (the second CHiME and REVERB challenges) show the effectiveness of our techniques.
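The sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract only says the interpolation coefficient is random, so the uniform distribution on [0, 1] is an assumption, as is the direction of the interpolation. The names `sample_interpolated`, `monte_carlo_expectation`, and `toy_model` are hypothetical. The paper integrates recognition hypotheses in decoding, whereas this sketch simply averages model outputs over samples as a stand-in for the Monte Carlo expectation.

```python
import random


def sample_interpolated(original, enhanced, rng, low=0.0, high=1.0):
    """Draw one feature vector on the line between original and enhanced."""
    # Assumption: the coefficient is uniform on [low, high]; the abstract
    # only states that it is random.
    alpha = rng.uniform(low, high)
    return [alpha * e + (1.0 - alpha) * o for o, e in zip(original, enhanced)]


def monte_carlo_expectation(model, original, enhanced, n_samples=8, seed=0):
    """Approximate E[model(x)] by averaging outputs over sampled inputs."""
    rng = random.Random(seed)
    outputs = [
        model(sample_interpolated(original, enhanced, rng))
        for _ in range(n_samples)
    ]
    dim = len(outputs[0])
    return [sum(out[k] for out in outputs) / n_samples for k in range(dim)]


def toy_model(x):
    # Hypothetical stand-in for a DNN: an element-wise ReLU nonlinearity.
    return [max(0.0, v) for v in x]


original = [1.0, -2.0]   # features before enhancement (made-up values)
enhanced = [0.0, 2.0]    # features after enhancement (made-up values)
avg = monte_carlo_expectation(toy_model, original, enhanced, n_samples=100)
```

With these made-up values, the second component of `avg` is positive even though `toy_model` applied to the midpoint input `[0.5, 0.0]` gives zero there. This is the reason a nonlinear model needs the expectation over sampled inputs rather than a single evaluation at an average input.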

Original language: English
Pages (from-to): 3541-3545
Number of pages: 5
Journal: Unknown Journal
Volume: 2015-January
Publication status: Published - 2015
Externally published: Yes

Keywords

  • Deep neural networks
  • Noise-robust speech recognition
  • Stochastic process of enhanced features
  • Uncertainty training/decoding

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
