TY - JOUR
T1 - Uncertainty propagation through deep neural networks
AU - Abdelaziz, Ahmed Hussen
AU - Watanabe, Shinji
AU - Hershey, John R.
AU - Vincent, Emmanuel
AU - Kolossa, Dorothea
PY - 2015
Y1 - 2015
N2 - In order to improve the ASR performance in noisy environments, distorted speech is typically pre-processed by a speech enhancement algorithm, which usually results in a speech estimate containing residual noise and distortion. We may also have some measures of uncertainty or variance of the estimate. Uncertainty decoding is a framework that utilizes this knowledge of uncertainty in the input features during acoustic model scoring. Such frameworks have been well explored for traditional probabilistic models, but their optimal use for deep neural network (DNN)-based ASR systems is not yet clear. In this paper, we study the propagation of observation uncertainties through the layers of a DNN-based acoustic model. Since this is intractable due to the nonlinearities of the DNN, we employ approximate propagation methods, including Monte Carlo sampling, the unscented transform, and the piecewise exponential approximation of the activation function, to estimate the distribution of acoustic scores. Finally, the expected value of the acoustic score distribution is used for decoding, which is shown to further improve the ASR accuracy on the CHiME database, relative to a highly optimized DNN baseline.
KW - Deep Neural Networks
KW - Noise-robust ASR
KW - Observation Uncertainty
KW - Uncertainty Propagation
UR - http://www.scopus.com/inward/record.url?scp=84959121946&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84959121946&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:84959121946
VL - 2015-January
SP - 3561
EP - 3565
JO - Unknown Journal
JF - Unknown Journal
ER -