A GMM sound source model for blind speech separation in under-determined conditions

Yasuharu Hirasawa*, Naoki Yasuraoka, Toru Takahashi, Tetsuya Ogata, Hiroshi G. Okuno

*Corresponding author for this work

Research output: Conference contribution

2 Citations (Scopus)

Abstract

This paper focuses on blind speech separation in under-determined conditions, that is, when there are more sound sources than microphones. We introduce a sound source model based on the Gaussian mixture model (GMM) to represent a speech signal in the time-frequency domain, and derive rules for updating the model parameters using the auxiliary function method. Our GMM sound source model consists of two kinds of Gaussians: sharp ones representing harmonic parts and smooth ones representing nonharmonic parts. Experimental results reveal that our method outperforms the method based on non-negative matrix factorization (NMF) by 0.7 dB in the signal-to-distortion ratio (SDR), and by 1.7 dB in the signal-to-interference ratio (SIR). This means that our method effectively removes interference coming from other talkers.
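
As a rough illustration of the two kinds of Gaussians described in the abstract, the following Python sketch builds a single-frame spectral shape from sharp Gaussians placed at multiples of a fundamental frequency plus a few smooth Gaussians. It is not the authors' model and does not include the auxiliary-function update rules; the function names, the parameter values (`f0_hz`, `sharp_std_hz`, `smooth_means_hz`, `harmonic_weight`), and the choice of Hz as the frequency axis are all assumptions made for illustration.

```python
import math


def gaussian(x, mean, std):
    """Normalized Gaussian density evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def gmm_spectrum(freqs_hz, f0_hz, n_harmonics=5, sharp_std_hz=10.0,
                 smooth_means_hz=(500.0, 2000.0), smooth_std_hz=400.0,
                 harmonic_weight=0.7):
    """Hypothetical spectral shape for one frame.

    Sharp (narrow) Gaussians sit at integer multiples of f0_hz and stand in
    for harmonic parts; smooth (wide) Gaussians stand in for nonharmonic
    parts. All default values are illustrative assumptions.
    """
    # Split the total weight evenly within each group of components.
    sharp_w = harmonic_weight / n_harmonics
    smooth_w = (1.0 - harmonic_weight) / len(smooth_means_hz)

    spectrum = []
    for f in freqs_hz:
        harmonic = sum(sharp_w * gaussian(f, k * f0_hz, sharp_std_hz)
                       for k in range(1, n_harmonics + 1))
        nonharmonic = sum(smooth_w * gaussian(f, m, smooth_std_hz)
                          for m in smooth_means_hz)
        spectrum.append(harmonic + nonharmonic)
    return spectrum


# Evaluate the shape on a 10 Hz grid for an assumed 200 Hz fundamental.
freqs = [float(f) for f in range(0, 4000, 10)]
spec = gmm_spectrum(freqs, f0_hz=200.0)
```

With these assumed settings the value at the 200 Hz bin is dominated by a sharp component, while the value midway between harmonics (300 Hz) comes almost entirely from the smooth components.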

Original language: English
Title of host publication: Latent Variable Analysis and Signal Separation - 10th International Conference, LVA/ICA 2012, Proceedings
Pages: 446-453
Number of pages: 8
DOI
Publication status: Published - 2012
Externally published: Yes
Event: 10th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2012 - Tel Aviv, Israel
Duration: 12 Mar 2012 to 15 Mar 2012

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7191 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th International Conference on Latent Variable Analysis and Signal Separation, LVA/ICA 2012
Country/Territory: Israel
City: Tel Aviv
Period: 12 Mar 2012 to 15 Mar 2012

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
