TY - GEN
T1 - Postfiltering Using an Adversarial Denoising Autoencoder with Noise-aware Training
AU - Tawara, Naohiro
AU - Tanabe, Hikari
AU - Kobayashi, Tetsunori
AU - Fujieda, Masaru
AU - Katagiri, Kazuhiro
AU - Yazu, Takashi
AU - Ogawa, Tetsuji
N1 - Funding Information: This work was supported by JSPS KAKENHI Grant Number 17K12718. Publisher Copyright: © 2019 IEEE.
PY - 2019/5
Y1 - 2019/5
N2 - An adversarial denoising autoencoder (ADAE) with noise-aware training is proposed and successfully applied to post-filtering for linear noise reduction. The ADAE is effective for attenuating interference sounds; however, it is difficult to train a single network to handle their various unexpected harmful effects (e.g., various types of noise). Legacy speech enhancement was introduced as a preprocessor so that the ADAEs can be trained efficiently by reducing the unexpected variability in their inputs. Time-frequency masking suppressed this variability well; however, it induced unpleasant distortion, which is difficult for the ADAE to compensate for. In this paper, a minimum variance distortionless response (MVDR) beamformer, which can avoid troublesome non-linear distortions, is exploited as a preprocessor, and the MVDR outputs are used as the inputs to the ADAE-based post-filter. In addition, noise-dominant signals derived from the MVDR beamformer can improve the accuracy of the ADAE-based post-filter because the residual noise depends on the original noise signals. Experimental comparisons conducted using multichannel speech enhancement demonstrate that ADAE-based post-filtering yields significant improvements over the MVDR- and ADAE-based speech enhancement systems, and that noise-aware training of the ADAE works well.
AB - An adversarial denoising autoencoder (ADAE) with noise-aware training is proposed and successfully applied to post-filtering for linear noise reduction. The ADAE is effective for attenuating interference sounds; however, it is difficult to train a single network to handle their various unexpected harmful effects (e.g., various types of noise). Legacy speech enhancement was introduced as a preprocessor so that the ADAEs can be trained efficiently by reducing the unexpected variability in their inputs. Time-frequency masking suppressed this variability well; however, it induced unpleasant distortion, which is difficult for the ADAE to compensate for. In this paper, a minimum variance distortionless response (MVDR) beamformer, which can avoid troublesome non-linear distortions, is exploited as a preprocessor, and the MVDR outputs are used as the inputs to the ADAE-based post-filter. In addition, noise-dominant signals derived from the MVDR beamformer can improve the accuracy of the ADAE-based post-filter because the residual noise depends on the original noise signals. Experimental comparisons conducted using multichannel speech enhancement demonstrate that ADAE-based post-filtering yields significant improvements over the MVDR- and ADAE-based speech enhancement systems, and that noise-aware training of the ADAE works well.
KW - Adversarial denoising autoencoder
KW - minimum variance distortionless response
KW - noise-aware training
KW - speech enhancement
UR - http://www.scopus.com/inward/record.url?scp=85068975809&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068975809&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2019.8682684
DO - 10.1109/ICASSP.2019.8682684
M3 - Conference contribution
AN - SCOPUS:85068975809
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 3282
EP - 3286
BT - 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 44th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2019
Y2 - 12 May 2019 through 17 May 2019
ER -