TY - JOUR
T1 - Effect of intra-phrase position on acceptability of change in segment duration in sentence speech
AU - Muto, Makiko
AU - Kato, Hiroaki
AU - Tsuzaki, Minoru
AU - Sagisaka, Yoshinori
N1 - Funding Information: This work was supported in part by the National Institute of Information and Communications Technology.
PY - 2005/4
Y1 - 2005/4
AB - For use as a naturalness criterion for duration rules in speech synthesis, human acceptability of change in segment duration is investigated with regard to the temporal position within a phrase. Three perceptual experiments are carried out to introduce variations in the attributes and context of a phrase in sentence speech: (1) the length of a phrase and the type of a phrase accent (2 lengths × 3 types), (2) variation in carrier sentence (3 carriers + 1 without carrier), and (3) the position of a phrase in a breath group (2 positions). In total, 22 listeners evaluate the acceptability of resynthesized speech stimuli in which one of the vowel segments is either lengthened or shortened by up to 50 ms. Overall results show that a duration change in the phrase-initial segment is generally the least acceptable and one in the phrase-final segment the most acceptable, with changes in segments at intermediate positions falling in between. This position-dependent tendency is observed regardless of the variations in phrase length, accent type, carrier sentence, presence of carrier sentence, and position in a breath group. These results suggest that the error criteria of duration modeling should be reconsidered by taking into account such perceptual characteristics in order to improve temporal naturalness in synthesized speech.
UR - http://www.scopus.com/inward/record.url?scp=15844399704&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=15844399704&partnerID=8YFLogxK
U2 - 10.1016/j.specom.2004.11.004
DO - 10.1016/j.specom.2004.11.004
M3 - Article
AN - SCOPUS:15844399704
SN - 0167-6393
VL - 45
SP - 361
EP - 372
JO - Speech Communication
JF - Speech Communication
IS - 4
ER -