TY - GEN
T1 - Perceptual similarity measurement of speech by combination of acoustic features
AU - Adachi, Yoshihiro
AU - Kawamoto, Shinichi
AU - Morishima, Shigeo
AU - Nakamura, Satoshi
PY - 2008
Y1 - 2008
N2 - Future cast system is a new entertainment system where participant's face is captured and rendered into the movie as an instant Computer Graphics (CG) movie star, which had been first exhibited at the 2005 World Exposition in Aichi Japan. We are working to add new functionality which enables mapping not only faces but also speech individualities to the cast. Our approach is to find a speaker with the closest speech individuality and apply voice conversion. This paper investigates acoustic features to estimate perceptual similarity of speech individuality. We propose a method linearly combined eight acoustic features related to the perception of speech individualities. The proposed method optimizes weights for the acoustic features considering perceptual similarities. We have evaluated performance of our method with Spearman's rank correlation coefficients to perceptual similarities. As the results, the experiments evidenced that the proposed method achieves a correlation coefficient of 0.66.
AB - Future cast system is a new entertainment system where participant's face is captured and rendered into the movie as an instant Computer Graphics (CG) movie star, which had been first exhibited at the 2005 World Exposition in Aichi Japan. We are working to add new functionality which enables mapping not only faces but also speech individualities to the cast. Our approach is to find a speaker with the closest speech individuality and apply voice conversion. This paper investigates acoustic features to estimate perceptual similarity of speech individuality. We propose a method linearly combined eight acoustic features related to the perception of speech individualities. The proposed method optimizes weights for the acoustic features considering perceptual similarities. We have evaluated performance of our method with Spearman's rank correlation coefficients to perceptual similarities. As the results, the experiments evidenced that the proposed method achieves a correlation coefficient of 0.66.
KW - Acoustic correlators
KW - Speaker recognition
KW - Speech analysis
UR - http://www.scopus.com/inward/record.url?scp=51449118243&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=51449118243&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2008.4518746
DO - 10.1109/ICASSP.2008.4518746
M3 - Conference contribution
AN - SCOPUS:51449118243
SN - 1424414849
SN - 9781424414840
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4861
EP - 4864
BT - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
T2 - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Y2 - 31 March 2008 through 4 April 2008
ER -