TY - JOUR
T1 - The fundamental limitation of frequency domain blind source separation for convolutive mixtures of speech
AU - Araki, Shoko
AU - Mukai, Ryo
AU - Makino, Shoji
AU - Nishikawa, Tsuyoki
AU - Saruwatari, Hiroshi
PY - 2003/3
Y1 - 2003/3
AB - Despite several recent proposals to achieve blind source separation (BSS) for realistic acoustic signals, the separation performance is still not good enough. In particular, when the impulse responses are long, performance is highly limited. In this paper, we consider a two-input, two-output convolutive BSS problem. First, we show that it is not good to be constrained by the condition T > P, where T is the frame length of the DFT and P is the length of the room impulse responses. We show that there is an optimum frame size that is determined by the trade-off between maintaining the number of samples in each frequency bin to estimate statistics and covering the whole reverberation. We also clarify the reason for the poor performance of BSS in long reverberant environments, highlighting that the framework of BSS works as two sets of frequency-domain adaptive beamformers. Although BSS can reduce reverberant sounds to some extent, as adaptive beamformers do, it mainly removes the sounds from the jammer direction. This is the reason for the difficulty of BSS in reverberant environments.
KW - Blind source separation
KW - Convolutive mixture
KW - Frame size
KW - Frequency domain
KW - Independent component analysis
KW - Reverberant speech
UR - http://www.scopus.com/inward/record.url?scp=0037367812&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0037367812&partnerID=8YFLogxK
U2 - 10.1109/TSA.2003.809193
DO - 10.1109/TSA.2003.809193
M3 - Article
AN - SCOPUS:0037367812
SN - 1063-6676
VL - 11
SP - 109
EP - 116
JO - IEEE Transactions on Speech and Audio Processing
JF - IEEE Transactions on Speech and Audio Processing
IS - 2
ER -