TY - JOUR
T1 - Nonparametric Bayesian sparse factor analysis for frequency domain blind source separation without permutation ambiguity Sparse modeling for speech and audio processing
AU - Nagira, Kohei
AU - Otsuka, Takuma
AU - Okuno, Hiroshi G.
PY - 2013
Y1 - 2013
N2 - Blind source separation (BSS) and sound activity detection (SAD) from a sound source mixture with minimum prior information are two major requirements for computational auditory scene analysis that recognizes auditory events in many environments. In daily environments, BSS suffers from many problems such as reverberation, a permutation problem in frequency-domain processing, and uncertainty about the number of sources in the observed mixture. While many conventional BSS methods resort to a cascaded combination of subprocesses, e.g., frequency-wise separation and permutation resolution, to overcome these problems, their outcomes may be affected by the worst subprocess. Our aim is to develop a unified framework to cope with these problems. Our method, called permutation-free infinite sparse factor analysis (PF-ISFA), is based on a nonparametric Bayesian framework that enables inference without a pre-determined number of sources. It solves BSS, SAD and the permutation problem at the same time. Our method has two key ideas: unified source activities for all the frequency bins and the activation probabilities of all the frequency bins of all the sources. Experiments were carried out to evaluate the separation performance and the SAD performance under four reverberant conditions. For separation performance in the BSS-EVAL criteria, our method outperformed conventional complex ISFA under all conditions. For SAD performance, our method outperformed the conventional method by 5.9-0.5% in F-measure under the condition RT20 = 30-600 [ms], respectively.
UR - http://www.scopus.com/inward/record.url?scp=84887087700&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84887087700&partnerID=8YFLogxK
U2 - 10.1186/1687-4722-2013-4
DO - 10.1186/1687-4722-2013-4
M3 - Article
AN - SCOPUS:84887087700
SN - 1687-4714
VL - 2013
JO - Eurasip Journal on Audio, Speech, and Music Processing
JF - Eurasip Journal on Audio, Speech, and Music Processing
IS - 1
M1 - 4
ER -