TY - GEN
T1 - Reducing algorithmic delay using low-overlap window for online Wave-U-Net
AU - Nakaoka, Sotaro
AU - Li, Li
AU - Makino, Shoji
AU - Yamada, Takeshi
N1 - Funding Information:
This work was partly supported by JSPS KAKENHI Grant Number 19H04131 and JST CREST JPMJCR19A3.
Publisher Copyright:
© 2021 APSIPA.
PY - 2021
Y1 - 2021
N2 - Wave-U-Net is an end-to-end single-channel source separation method that operates in the time domain and can therefore take phase information into account during separation. It has shown high performance in tasks such as singing voice separation and speech enhancement. We previously proposed an extension of Wave-U-Net to online processing with a short input using teacher-student learning. Because online Wave-U-Net processes input signals frame by frame, with the frames segmented by applying a window function, the window length is generally the lower bound of the algorithmic delay. In this paper, based on the observation that the separation performance of online Wave-U-Net is concentrated at the center of each segment, we propose reducing the algorithmic delay by applying windows with a zero region near the edges to online Wave-U-Net. Experimental results showed that the proposed method reduced the algorithmic delay by 40% relative to the conventional method while maintaining high speech enhancement performance, with a source-to-distortion ratio improvement of about 15 dB, thus enabling low-delay, high-performance speech enhancement.
UR - http://www.scopus.com/inward/record.url?scp=85126663448&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85126663448&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85126663448
T3 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
SP - 1210
EP - 1214
BT - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2021
Y2 - 14 December 2021 through 17 December 2021
ER -