TY - GEN
T1 - Probabilistic integration of joint density model and speaker model for voice conversion
AU - Saito, Daisuke
AU - Watanabe, Shinji
AU - Nakamura, Atsushi
AU - Minematsu, Nobuaki
N1 - Funding Information:
The authors would like to thank Dr. T. Toda of NAIST, Japan, for fruitful discussion on VC approaches.
PY - 2010
Y1 - 2010
N2 - This paper describes a novel approach to voice conversion using both a joint density model and a speaker model. In voice conversion studies, approaches based on a Gaussian Mixture Model (GMM) with probabilistic densities of joint vectors of a source and a target speaker are widely used to estimate a transformation. However, for sufficient quality, they require a parallel corpus which contains plenty of utterances with the same linguistic content spoken by both speakers. In addition, the joint density GMM methods often suffer from over-training effects when the amount of training data is small. To address these problems, we propose a novel approach that integrates the speaker GMM of the target with the joint density model using a probabilistic formulation. The proposed method independently trains the joint density model with a few parallel utterances and the speaker model with non-parallel data of the target. This eases the burden on the source speaker. Experiments demonstrate the effectiveness of the proposed method, especially when the parallel corpus is small.
AB - This paper describes a novel approach to voice conversion using both a joint density model and a speaker model. In voice conversion studies, approaches based on a Gaussian Mixture Model (GMM) with probabilistic densities of joint vectors of a source and a target speaker are widely used to estimate a transformation. However, for sufficient quality, they require a parallel corpus which contains plenty of utterances with the same linguistic content spoken by both speakers. In addition, the joint density GMM methods often suffer from over-training effects when the amount of training data is small. To address these problems, we propose a novel approach that integrates the speaker GMM of the target with the joint density model using a probabilistic formulation. The proposed method independently trains the joint density model with a few parallel utterances and the speaker model with non-parallel data of the target. This eases the burden on the source speaker. Experiments demonstrate the effectiveness of the proposed method, especially when the parallel corpus is small.
KW - Joint density model
KW - Probabilistic unification
KW - Speaker model
KW - Voice conversion
UR - http://www.scopus.com/inward/record.url?scp=79959834571&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=79959834571&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:79959834571
T3 - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
SP - 1728
EP - 1731
BT - Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010
PB - International Speech Communication Association
ER -