TY - GEN
T1 - Invertible DNN-Based Nonlinear Time-Frequency Transform for Speech Enhancement
AU - Takeuchi, Daiki
AU - Yatabe, Kohei
AU - Koizumi, Yuma
AU - Oikawa, Yasuhiro
AU - Harada, Noboru
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
AB - We propose an end-to-end speech enhancement method with a trainable time-frequency (T-F) transform based on an invertible deep neural network (DNN). Recent progress in speech enhancement has been driven by DNNs. Ordinary DNN-based speech enhancement employs a T-F transform, typically the short-time Fourier transform (STFT), and estimates a T-F mask using a DNN. On the other hand, some methods have considered end-to-end networks that directly estimate the enhanced signals without a T-F transform. While end-to-end methods have shown promising results, they are black boxes and hard to understand. Therefore, some end-to-end methods have used a DNN to learn a linear T-F transform, which is much easier to understand. However, the learned transform may lack a property important for ordinary signal processing. In this paper, perfect reconstruction is considered as the important property of the T-F transform. An invertible nonlinear T-F transform is constructed from DNNs and learned from data so that the obtained transform is a perfect-reconstruction filterbank.
KW - Deep neural network (DNN)
KW - filterbank
KW - i-RevNet
KW - invertible DNN
KW - lifting scheme
UR - http://www.scopus.com/inward/record.url?scp=85089239744&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089239744&partnerID=8YFLogxK
U2 - 10.1109/ICASSP40776.2020.9053723
DO - 10.1109/ICASSP40776.2020.9053723
M3 - Conference contribution
AN - SCOPUS:85089239744
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 6644
EP - 6648
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -