TY - JOUR
T1 - Multi-channel speech enhancement using time-domain convolutional denoising autoencoder
AU - Tawara, Naohiro
AU - Kobayashi, Tetsunori
AU - Ogawa, Tetsuji
N1 - Publisher Copyright:
Copyright © 2019 ISCA
PY - 2019
Y1 - 2019
N2 - This paper investigates the use of time-domain convolutional denoising autoencoders (TCDAEs) with multiple channels as a method of speech enhancement. In general, denoising autoencoders (DAEs), deep learning systems that map noise-corrupted into clean waveforms, have been shown to generate high-quality signals while working in the time domain without the intermediate stage of phase modeling. Convolutional DAEs are one of the popular structures which learns a mapping between noise-corrupted and clean waveforms with convolutional denoising autoencoder. Multi-channel signals for TCDAEs are promising because the different times of arrival of a signal can be directly processed with their convolutional structure, Up to this time, TCDAEs have only been applied to single-channel signals. This paper explorers the effectiveness of TCDAEs in a multichannel configuration. A multi-channel TCDAEs are evaluated on multi-channel speech enhancement experiments, yielding significant improvement over single-channel DAEs in terms of signal-to-distortion ratio, perceptual evaluation of speech quality (PESQ), and word error rate.
AB - This paper investigates the use of time-domain convolutional denoising autoencoders (TCDAEs) with multiple channels as a method of speech enhancement. In general, denoising autoencoders (DAEs), deep learning systems that map noise-corrupted into clean waveforms, have been shown to generate high-quality signals while working in the time domain without the intermediate stage of phase modeling. Convolutional DAEs are one of the popular structures which learns a mapping between noise-corrupted and clean waveforms with convolutional denoising autoencoder. Multi-channel signals for TCDAEs are promising because the different times of arrival of a signal can be directly processed with their convolutional structure, Up to this time, TCDAEs have only been applied to single-channel signals. This paper explorers the effectiveness of TCDAEs in a multichannel configuration. A multi-channel TCDAEs are evaluated on multi-channel speech enhancement experiments, yielding significant improvement over single-channel DAEs in terms of signal-to-distortion ratio, perceptual evaluation of speech quality (PESQ), and word error rate.
KW - Dilated convolutional network
KW - Multi-channel speech enhancement
KW - Time-domain denoising autoencoder
UR - http://www.scopus.com/inward/record.url?scp=85074701829&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85074701829&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2019-3197
DO - 10.21437/Interspeech.2019-3197
M3 - Conference article
AN - SCOPUS:85074701829
SN - 2308-457X
VL - 2019-September
SP - 86
EP - 90
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 20th Annual Conference of the International Speech Communication Association: Crossroads of Speech and Language, INTERSPEECH 2019
Y2 - 15 September 2019 through 19 September 2019
ER -