TY - GEN
T1 - Sampling-Frequency-Independent Audio Source Separation Using Convolution Layer Based on Impulse Invariant Method
AU - Saito, Koichi
AU - Nakamura, Tomohiko
AU - Yatabe, Kohei
AU - Koizumi, Yuma
AU - Saruwatari, Hiroshi
N1 - Funding Information:
This work was supported by JSPS KAKENHI Grant Number JP20K19818.
Publisher Copyright:
© 2021 European Signal Processing Conference. All rights reserved.
PY - 2021
Y1 - 2021
N2 - Audio source separation is often used as preprocessing for various applications, and one of its ultimate goals is to construct a single versatile model capable of dealing with a variety of audio signals. Since the sampling frequency, one such variety, is usually application specific, the preceding audio source separation model should be able to handle audio signals of all sampling frequencies specified in the target applications. However, conventional models based on deep neural networks (DNNs) are trained only at the sampling frequency specified by the training data, and there is no guarantee that they work with unseen sampling frequencies. In this paper, we propose a convolution layer capable of handling arbitrary sampling frequencies with a single DNN. Through music source separation experiments, we show that introducing the proposed layer enables a conventional audio source separation model to work consistently even with unseen sampling frequencies.
AB - Audio source separation is often used as preprocessing for various applications, and one of its ultimate goals is to construct a single versatile model capable of dealing with a variety of audio signals. Since the sampling frequency, one such variety, is usually application specific, the preceding audio source separation model should be able to handle audio signals of all sampling frequencies specified in the target applications. However, conventional models based on deep neural networks (DNNs) are trained only at the sampling frequency specified by the training data, and there is no guarantee that they work with unseen sampling frequencies. In this paper, we propose a convolution layer capable of handling arbitrary sampling frequencies with a single DNN. Through music source separation experiments, we show that introducing the proposed layer enables a conventional audio source separation model to work consistently even with unseen sampling frequencies.
KW - Analog-to-digital filter conversion
KW - Audio source separation
KW - Deep neural networks
UR - http://www.scopus.com/inward/record.url?scp=85123169452&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123169452&partnerID=8YFLogxK
U2 - 10.23919/EUSIPCO54536.2021.9615941
DO - 10.23919/EUSIPCO54536.2021.9615941
M3 - Conference contribution
AN - SCOPUS:85123169452
T3 - European Signal Processing Conference
SP - 321
EP - 325
BT - 29th European Signal Processing Conference, EUSIPCO 2021 - Proceedings
PB - European Signal Processing Conference, EUSIPCO
T2 - 29th European Signal Processing Conference, EUSIPCO 2021
Y2 - 23 August 2021 through 27 August 2021
ER -
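
Note: as a rough illustration of the idea named in the title and abstract (a convolution layer whose discrete kernel is derived from continuous-time parameters via the impulse invariant method, so one set of learned weights can be re-discretized for any sampling frequency), the following minimal PyTorch sketch may help. It is not the authors' implementation; the damped-cosine parameterization of the analog impulse response and all names (ImpulseInvariantConv1d, amp, decay, freq) are illustrative assumptions.

    # Minimal sketch, NOT the paper's code: a 1-D convolution layer whose
    # learnable parameters live in continuous time. For a given sampling
    # frequency fs, the discrete kernel is obtained by the impulse invariant
    # method, i.e. by sampling the analog impulse response:
    #   h[n] = T * h_a(n * T),  T = 1 / fs.
    # The damped-cosine form of h_a below is an assumption for illustration.
    import math
    import torch
    import torch.nn.functional as F


    class ImpulseInvariantConv1d(torch.nn.Module):
        def __init__(self, in_channels, out_channels, num_modes=4, kernel_size=65):
            super().__init__()
            self.kernel_size = kernel_size
            shape = (out_channels, in_channels, num_modes)
            # Analog parameters: h_a(t) = sum_k a_k * exp(-d_k * t) * cos(2*pi*f_k*t)
            self.amp = torch.nn.Parameter(0.1 * torch.randn(shape))      # a_k
            self.decay = torch.nn.Parameter(100.0 * torch.rand(shape))   # d_k [1/s]
            self.freq = torch.nn.Parameter(4000.0 * torch.rand(shape))   # f_k [Hz]

        def kernel(self, fs):
            """Discretize the analog impulse response for sampling rate fs [Hz]."""
            T = 1.0 / fs
            t = torch.arange(self.kernel_size, dtype=self.amp.dtype) * T  # t_n = n*T
            decay = F.softplus(self.decay)                                # keep decays positive
            h = (self.amp.unsqueeze(-1)
                 * torch.exp(-decay.unsqueeze(-1) * t)
                 * torch.cos(2.0 * math.pi * self.freq.unsqueeze(-1) * t))
            # Impulse invariance: scale by T and sum the modes.
            return T * h.sum(dim=2)  # shape: (out_channels, in_channels, kernel_size)

        def forward(self, x, fs):
            # x: (batch, in_channels, num_samples) recorded at sampling rate fs
            return F.conv1d(x, self.kernel(fs), padding=self.kernel_size // 2)


    # The same layer (same learned parameters) can then be applied to signals
    # at sampling frequencies not seen during training, e.g.:
    layer = ImpulseInvariantConv1d(in_channels=2, out_channels=8)
    y_32k = layer(torch.randn(1, 2, 32000), fs=32000)
    y_48k = layer(torch.randn(1, 2, 48000), fs=48000)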