TY - GEN
T1 - Gibbs sampling based multi-scale mixture model for speaker clustering
AU - Watanabe, Shinji
AU - Mochihashi, Daichi
AU - Hori, Takaaki
AU - Nakamura, Atsushi
PY - 2011
Y1 - 2011
N2 - The aim of this work is to apply a sampling approach to speech modeling and to propose a Gibbs sampling based Multi-scale Mixture Model (M3). The proposed approach focuses on the multi-scale property of speech dynamics, i.e., dynamics in speech can be observed on, for instance, short-time acoustic, linguistic-segmental, and utterance-wise temporal scales. M3 is an extension of the Gaussian mixture model and can be regarded as a hierarchical mixture model in which the mixture components at each time scale change at intervals of the corresponding time unit. We derive a fully Bayesian treatment of the multi-scale mixture model based on Gibbs sampling. The advantage of the proposed model is that each speaker cluster can be precisely modeled by a Gaussian mixture model, unlike conventional single-Gaussian-based speaker clustering (e.g., using the Bayesian Information Criterion (BIC)). In addition, Gibbs sampling offers the potential to avoid a serious local optimum problem. Speaker clustering experiments confirmed these advantages and showed a significant improvement over conventional BIC-based approaches.
AB - The aim of this work is to apply a sampling approach to speech modeling and to propose a Gibbs sampling based Multi-scale Mixture Model (M3). The proposed approach focuses on the multi-scale property of speech dynamics, i.e., dynamics in speech can be observed on, for instance, short-time acoustic, linguistic-segmental, and utterance-wise temporal scales. M3 is an extension of the Gaussian mixture model and can be regarded as a hierarchical mixture model in which the mixture components at each time scale change at intervals of the corresponding time unit. We derive a fully Bayesian treatment of the multi-scale mixture model based on Gibbs sampling. The advantage of the proposed model is that each speaker cluster can be precisely modeled by a Gaussian mixture model, unlike conventional single-Gaussian-based speaker clustering (e.g., using the Bayesian Information Criterion (BIC)). In addition, Gibbs sampling offers the potential to avoid a serious local optimum problem. Speaker clustering experiments confirmed these advantages and showed a significant improvement over conventional BIC-based approaches.
KW - Fully Bayesian approach
KW - Gaussian mixture
KW - Gibbs sampling
KW - multi-scale mixture model
KW - speaker clustering
UR - http://www.scopus.com/inward/record.url?scp=80051606569&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80051606569&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2011.5947360
DO - 10.1109/ICASSP.2011.5947360
M3 - Conference contribution
AN - SCOPUS:80051606569
SN - 9781457705397
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4524
EP - 4527
BT - 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
T2 - 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Y2 - 22 May 2011 through 27 May 2011
ER -