Effect of intra-phrase position on acceptability of change in segment duration in sentence speech

Makiko Muto*, Hiroaki Kato, Minoru Tsuzaki, Yoshinori Sagisaka

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Citations (Scopus)


For use as a naturalness criterion for duration rules in speech synthesis, human acceptability of change in segment duration is investigated with regard to the temporal position within a phrase. Three perceptual experiments are carried out to introduce variations in the attribute and context of a phrase in sentence speech: (1) the length of a phrase and the type of a phrase accent (2 lengths × 3 types), (2) variation in carrier sentence (3 carriers + 1 without carrier), and (3) the position of a phrase in a breath group (2 positions). In total, 22 listeners evaluate the acceptability of resynthesized speech stimuli in which one of the vowel segments is either lengthened or shortened by up to 50 ms. Overall results show that a duration change in the phrase-initial segment is generally the least acceptable and one in the phrase-final segment the most acceptable, with changes at intermediate positions in the phrase falling in between. This position-dependent tendency is observed regardless of the variations in phrase length, accent type, carrier sentence, presence of carrier sentence, and position in a breath group. These results suggest that the error criteria of duration modeling should be reconsidered by taking into account such perceptual characteristics in order to improve temporal naturalness in synthesized speech.

Original language: English
Pages (from-to): 361-372
Number of pages: 12
Journal: Speech Communication
Issue number: 4
Publication status: Published - 2005 Apr

ASJC Scopus subject areas

  • Software
  • Modelling and Simulation
  • Communication
  • Language and Linguistics
  • Linguistics and Language
  • Computer Vision and Pattern Recognition
  • Computer Science Applications

