TY - GEN
T1 - High accurate model-integration-based voice conversion using dynamic features and model structure optimization
AU - Saito, Daisuke
AU - Watanabe, Shinji
AU - Nakamura, Atsushi
AU - Minematsu, Nobuaki
PY - 2011
Y1 - 2011
AB - This paper combines a parameter generation algorithm and a model optimization approach with model-integration-based voice conversion (MIVC). We have proposed the probabilistic integration of a joint density model and a speaker model to mitigate the requirement for a parallel corpus in voice conversion (VC) based on Gaussian mixture models (GMMs). Like other VC methods, MIVC suffers from two problems: degradation of perceptual quality caused by discontinuities in the parameter trajectory, and the difficulty of optimizing the model structure. To address the first problem, this paper proposes a parameter generation algorithm constrained by dynamic features; to address the second, it proposes an information criterion that accounts for the mutual influences between the joint density model and the speaker model. Experimental results show that the first approach improved VC performance and that the second approach appropriately predicted the optimal number of mixtures of the speaker model for MIVC.
KW - Voice conversion
KW - dynamic features
KW - information criterion
KW - probabilistic integration
UR - http://www.scopus.com/inward/record.url?scp=80051615070&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80051615070&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2011.5947373
DO - 10.1109/ICASSP.2011.5947373
M3 - Conference contribution
AN - SCOPUS:80051615070
SN - 9781457705397
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4576
EP - 4579
BT - 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
T2 - 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Y2 - 22 May 2011 through 27 May 2011
ER -