TY - JOUR
T1 - Bayesian modelling of the speech spectrum using mixture of Gaussians
AU - Zolfaghari, Parham
AU - Watanabe, Shinji
AU - Nakamura, Atsushi
AU - Katagiri, Shigeru
PY - 2004
Y1 - 2004
AB - This paper presents a method for modelling the speech spectral envelope using a mixture of Gaussians (MOG). A novel variational Bayesian (VB) framework for Gaussian mixture modelling of a histogram enables the derivation of an objective function that can be used to optimise both the model parameter distributions and the model structure simultaneously. A histogram representation of the STRAIGHT spectral envelope, which is free of glottal excitation information, is parameterised with this MOG model. The resulting parameterisation scheme models only the resonant characteristics of the vocal tract. Maximum likelihood (ML) and VB solutions of the mixture model on histogram data are found using an iterative algorithm. ML-MOG and VB-MOG spectral modelling are compared using spectral distortion measures and mean opinion scores (MOS). The main advantages of VB-MOG highlighted in this paper are better modelling with fewer Gaussians in the mixture, which yields closer correspondence between the Gaussians and formant-like peaks, and an objective measure of the number of Gaussians required to best fit the spectral envelope.
UR - http://www.scopus.com/inward/record.url?scp=4544260276&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=4544260276&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:4544260276
SN - 1520-6149
VL - 1
SP - I553
EP - I556
JO - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
JF - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
T2 - Proceedings - IEEE International Conference on Acoustics, Speech, and Signal Processing
Y2 - 17 May 2004 through 21 May 2004
ER -