TY - GEN
T1 - Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data
AU - Tawara, Naohiro
AU - Ogawa, Tetsuji
AU - Watanabe, Shinji
AU - Nakamura, Atsushi
AU - Kobayashi, Tetsunori
PY - 2013
Y1 - 2013
N2 - A novel sampling method is proposed for estimating a continuous multi-scale mixture model. The multi-scale mixture models we assume have a hierarchical structure in which each component of the mixture is itself represented by a Gaussian mixture model (GMM). In speaker modeling from speech, each GMM represents intra-speaker dynamics arising from differences in attributes such as phoneme context and the presence of non-stationary noise, while the mixture of GMMs (MoGMMs) represents inter-speaker dynamics arising from differences between speakers. Gibbs sampling is a powerful technique for estimating such hierarchically structured models, but depending on how it is applied it can easily become trapped in local optima, especially when the elemental GMMs are complex in structure. To solve this problem, a highly accurate and robust sampling method based on blocked Gibbs sampling and iterative conditional modes (ICM) is proposed and applied to reduce the singular solutions that arise in models with complex multi-modal distributions. In speaker clustering experiments under non-stationary noise, the proposed sampling-based model estimation improved clustering performance by 17% on average compared to conventional sampling-based methods.
AB - A novel sampling method is proposed for estimating a continuous multi-scale mixture model. The multi-scale mixture models we assume have a hierarchical structure in which each component of the mixture is itself represented by a Gaussian mixture model (GMM). In speaker modeling from speech, each GMM represents intra-speaker dynamics arising from differences in attributes such as phoneme context and the presence of non-stationary noise, while the mixture of GMMs (MoGMMs) represents inter-speaker dynamics arising from differences between speakers. Gibbs sampling is a powerful technique for estimating such hierarchically structured models, but depending on how it is applied it can easily become trapped in local optima, especially when the elemental GMMs are complex in structure. To solve this problem, a highly accurate and robust sampling method based on blocked Gibbs sampling and iterative conditional modes (ICM) is proposed and applied to reduce the singular solutions that arise in models with complex multi-modal distributions. In speaker clustering experiments under non-stationary noise, the proposed sampling-based model estimation improved clustering performance by 17% on average compared to conventional sampling-based methods.
KW - Fully Bayesian approach
KW - blocked Gibbs sampling
KW - iterative conditional modes
KW - multi-scale mixture model
KW - speaker clustering
UR - http://www.scopus.com/inward/record.url?scp=84893299796&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84893299796&partnerID=8YFLogxK
U2 - 10.1109/MLSP.2013.6661902
DO - 10.1109/MLSP.2013.6661902
M3 - Conference contribution
AN - SCOPUS:84893299796
SN - 9781479911806
T3 - IEEE International Workshop on Machine Learning for Signal Processing, MLSP
BT - 2013 IEEE International Workshop on Machine Learning for Signal Processing - Proceedings of MLSP 2013
T2 - 2013 16th IEEE International Workshop on Machine Learning for Signal Processing, MLSP 2013
Y2 - 22 September 2013 through 25 September 2013
ER -