Fully Bayesian inference of multi-mixture Gaussian model and its evaluation using speaker clustering

Naohiro Tawara*, Tetsuji Ogawa, Shinji Watanabe, Tetsunori Kobayashi

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

6 Citations (Scopus)

Abstract

This study aims to identify effective optimization methods for estimating parametric, fully Bayesian models in speech processing. To that end, we investigate how the choice of optimization method for the multi-scale Gaussian mixture model, which is well suited to speaker clustering, affects clustering accuracy. The Markov chain Monte Carlo (MCMC)-based method was compared with the variational Bayesian method in a speaker clustering experiment. With a small amount of data, the MCMC-based method was more effective. With large-scale data (more than one million samples), the difference in clustering accuracy between the two methods decreased, and the MCMC-based method was computationally efficient.

Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Proceedings
Pages: 5253-5256
Number of pages: 4
Publication status: Published - 2012
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012 - Kyoto, Japan
Duration: 2012 Mar 25 to 2012 Mar 30

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
ISSN (Print): 1520-6149

Conference

Conference: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2012
Country/Territory: Japan
City: Kyoto
Period: 12/3/25 to 12/3/30

Keywords

  • Gibbs sampling
  • Speaker clustering
  • multi-scale Gaussian mixture model
  • variational Bayesian method

ASJC Scopus subject areas

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering
