TY - GEN
T1 - F0 control characterization by perceptual impressions on speaking attitudes using Multiple Dimensional Scaling analysis
AU - Kokenawa, Yoko
AU - Tsuzaki, Minoru
AU - Kato, Hiroaki
AU - Sagisaka, Yoshinori
PY - 2005/1/1
Y1 - 2005/1/1
N2 - Aiming at prosody control for speech synthesis expressing speaking attitudes, F0 shapes were characterized by their perceptual impressions. To directly correlate F0 shapes with perceptual impressions, single-word utterances "n" extracted from daily conversations were employed. The analysis showed that speaking attitudes were manifested in the global F0 control of "n" as differences in average height (high-low) and dynamic patterns (rise, flat, fall and rise&fall). Next, controlled utterances of "n" were perceptually examined through Multiple Dimensional Scaling analysis to confirm the F0 control freedoms found in the analysis. The result showed a three-dimensional structure of the perceptual impression space and factor-dependent F0 control characteristics. The positive-negative attitude can be controlled by average F0 height, while the confident-doubtful and allowable-unacceptable attitudes are manifested through dynamic F0 patterns. These findings provide new possibilities for systematic F0 control in conversational speech synthesis with speaking attitudes using a corpus-based approach.
AB - Aiming at prosody control for speech synthesis expressing speaking attitudes, F0 shapes were characterized by their perceptual impressions. To directly correlate F0 shapes with perceptual impressions, single-word utterances "n" extracted from daily conversations were employed. The analysis showed that speaking attitudes were manifested in the global F0 control of "n" as differences in average height (high-low) and dynamic patterns (rise, flat, fall and rise&fall). Next, controlled utterances of "n" were perceptually examined through Multiple Dimensional Scaling analysis to confirm the F0 control freedoms found in the analysis. The result showed a three-dimensional structure of the perceptual impression space and factor-dependent F0 control characteristics. The positive-negative attitude can be controlled by average F0 height, while the confident-doubtful and allowable-unacceptable attitudes are manifested through dynamic F0 patterns. These findings provide new possibilities for systematic F0 control in conversational speech synthesis with speaking attitudes using a corpus-based approach.
UR - http://www.scopus.com/inward/record.url?scp=33646770621&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33646770621&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2005.1415103
DO - 10.1109/ICASSP.2005.1415103
M3 - Conference contribution
AN - SCOPUS:33646770621
SN - 0780388747
SN - 9780780388741
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - I273-I276
BT - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05
Y2 - 18 March 2005 through 23 March 2005
ER -