TY - GEN
T1 - Expressing speaker's intentions through sentence-final intonations for Japanese conversational speech synthesis
AU - Iwata, Kazuhiko
AU - Kobayashi, Tetsunori
PY - 2012/12/1
Y1 - 2012/12/1
AB - In this study, we investigated the speaker's intentions that listeners perceive through subtly different sentence-final intonations. Approximately 2,000 sentence utterances were recorded, and the fundamental frequency (F0) contours at the last vowel of those sentences were classified using a standard clustering algorithm. Various F0 contours were found: not only simple rising and falling intonations but also rise-fall and fall-rise intonations. To reveal the relationship between intonation and intention, 10 representative contours were selected on the basis of the clustering results. Using the selected contours, a subjective evaluation was conducted. Six Japanese sentences whose meanings could differ according to the sentence-final intonation were synthesized, and the F0 contour at the last vowel of each sentence was replaced with each of the selected contours. The results of the evaluation by nine listeners showed that, for example, a certain falling intonation could express "conviction," while another that differed slightly in shape could convey "doubt." Subtle differences in the sentence-final F0 shape were thus found to convey various nuances and connotations.
KW - Sentence-final intonation
KW - Speaker's intention
KW - Speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=84878387409&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84878387409&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:84878387409
SN - 9781622767595
T3 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
SP - 442
EP - 445
BT - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
T2 - 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012
Y2 - 9 September 2012 through 13 September 2012
ER -