Pitch pattern clustering of user utterances in human-machine dialogue

Takashi Yoshimura*, Satoru Hayamizu, Hiroshi Ohmura, Kazuyo Tanaka

*Corresponding author for this work

Research output: Contribution to conference, Paper, peer-review

12 Citations (Scopus)


This paper discusses pitch pattern variation in user utterances in human-machine dialogue. For intelligent human-machine communication, it is essential that machines understand the prosodic characteristics that convey a user's attitudes, emotions, and intentions beyond the words themselves. Our focus is on particularly distinct pitch patterns and their roles in actual dialogues. We used human-machine dialogues collected through a Wizard of Oz simulation and clustered the pitch patterns of the utterance segments. Many utterance segments belonged to clusters with prosodically flat patterns. From this result, we considered that utterances belonging to the other clusters, and those far from the cluster centroids, carried non-verbal information. These utterances included users talking to themselves and questions to the machine that expressed puzzlement or surprise. Their pitch patterns were not only rich in rises and falls but also sloped upward, whereas pitch patterns in general were flat or sloped slightly downward. These results indicate that peculiar pitch period patterns signal non-verbal expressions. To actually use such information in human-machine interaction, the relationship between the representative pitch patterns and various types of communication should be investigated.

Original language: English
Number of pages: 4
Publication status: Published - 1996
Externally published: Yes
Event: Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4) - Philadelphia, PA, USA
Duration: 1996 Oct 3 to 1996 Oct 6


Other: Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4)
City: Philadelphia, PA, USA

ASJC Scopus subject areas

  • Computer Science (all)

