Underdetermined convolutive blind source separation via frequency bin-wise clustering and permutation alignment

Hiroshi Sawada*, Shoko Araki, Shoji Makino

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

293 Citations (Scopus)

Abstract

This paper presents a blind source separation method for convolutive mixtures of speech/audio sources. The method can even be applied to an underdetermined case where there are fewer microphones than sources. The separation operation is performed in the frequency domain and consists of two stages. In the first stage, frequency-domain mixture samples are clustered into each source by an expectation-maximization (EM) algorithm. Since the clustering is performed in a frequency bin-wise manner, the permutation ambiguities of the bin-wise clustered samples should be aligned. This is solved in the second stage by using the probability on how likely each sample belongs to the assigned class. This two-stage structure makes it possible to attain a good separation even under reverberant conditions. Experimental results for separating four speech signals with three microphones under reverberant conditions show the superiority of the new method over existing methods. We also report separation results for a benchmark data set and live recordings of speech mixtures.
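
The second stage described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract only states that class-membership probabilities are used for alignment, so the use of Pearson correlation as the similarity measure, the running-sum reference, and the names `pearson`, `align_permutations`, and `posteriors` are all assumptions made for illustration. The input is assumed to be the posterior probabilities already produced by a bin-wise clustering step.

```python
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def align_permutations(posteriors):
    """Align class labels across frequency bins (hypothetical helper).

    posteriors[f][k][t] is the probability that time frame t in frequency
    bin f belongs to bin-local class k. Returns one permutation per bin,
    where perm[k] is the bin-local class assigned to global source k.
    """
    n_classes = len(posteriors[0])
    n_frames = len(posteriors[0][0])
    # The first bin defines the global labelling; the running sum of aligned
    # posterior sequences serves as the reference for each source.
    reference = [list(seq) for seq in posteriors[0]]
    result = [tuple(range(n_classes))]
    for bin_post in posteriors[1:]:
        # Exhaustive search over permutations; only practical for a few sources.
        best = max(
            permutations(range(n_classes)),
            key=lambda p: sum(
                pearson(reference[k], bin_post[p[k]]) for k in range(n_classes)
            ),
        )
        result.append(best)
        for k in range(n_classes):
            for t in range(n_frames):
                reference[k][t] += bin_post[best[k]][t]
    return result
```

The sketch relies on the assumption that a source's activity pattern over time is similar across frequency bins, so posterior sequences belonging to the same source correlate positively. The exhaustive permutation search grows factorially with the number of sources and is shown only for clarity.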

Original language: English
Article number: 5473129
Pages (from-to): 516-527
Number of pages: 12
Journal: IEEE Transactions on Audio, Speech and Language Processing
Volume: 19
Issue number: 3
Publication status: Published - 2011
Externally published: Yes

Keywords

  • Blind source separation (BSS)
  • convolutive mixture
  • expectation-maximization (EM) algorithm
  • permutation problem
  • short-time Fourier transform (STFT)
  • sparseness
  • time-frequency (TF) masking

ASJC Scopus subject areas

  • Acoustics and Ultrasonics
  • Electrical and Electronic Engineering
