Back-channel feedback generation using linguistic and nonlinguistic information and its application to spoken dialogue system

Shinya Fujie*, Kenta Fukushima, Tetsunori Kobayashi

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

26 Citations (Scopus)

Abstract

We propose a conversational system that generates back-channel feedback with proper content at the proper timing, using an FST-based decoder with early detection and prosody analysis. In human conversation, we do not simply take turns in order; we also give back-channel feedback during the partner's speech. This feedback lets the speaker gauge the listener's state and speak comfortably. Spoken dialogue systems should therefore be able to generate back-channel feedback in synchronization with the user's utterances. The appropriateness of such feedback depends on both its content and its timing: the content depends strongly on the content of the dialogue partner's utterance, while the timing depends strongly on the prosody of that utterance. To determine the content of the feedback before the end of the utterance, we use a finite-state-transducer-based speech recognizer. To extract the proper timing of the feedback, we use prosodic information, in particular the F0 and power of the utterance. We implemented these modules and applied them to the spoken dialogue system on the humanoid robot ROBISUKE. Experimental results show the effectiveness of our methods.
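The abstract states that feedback timing is extracted from the F0 and power of the user's utterance, but does not give the decision rule. The sketch below is an illustrative assumption, not the authors' algorithm: it flags a back-channel opportunity when a sufficiently long voiced, energetic stretch ends with falling pitch and is followed by a low-energy frame, a heuristic commonly used in the back-channel literature. The function name and all thresholds are hypothetical.

```python
def backchannel_candidates(f0, power, power_floor=0.1, min_voiced=5):
    """Return frame indices immediately after a voiced stretch ends.

    f0    : per-frame fundamental frequency in Hz (0 = unvoiced)
    power : per-frame energy (arbitrary units)

    A candidate is emitted when at least `min_voiced` consecutive
    voiced, energetic frames are followed by a low-energy/unvoiced
    frame, and F0 fell across the voiced stretch (an assumed cue
    for a back-channel opportunity; thresholds are illustrative).
    """
    candidates = []
    run = 0  # length of the current voiced stretch
    for i, (f, p) in enumerate(zip(f0, power)):
        voiced = f > 0 and p >= power_floor
        if voiced:
            run += 1
        else:
            # Did the stretch that just ended show falling pitch?
            if run >= min_voiced and f0[i - 1] < f0[i - run]:
                candidates.append(i)
            run = 0
    return candidates


# Toy frame sequence: a falling-pitch phrase, a pause, then a
# rising-pitch phrase. Only the first phrase ends in a candidate.
f0 = [120, 118, 115, 110, 105, 0, 0, 130, 131, 132, 133, 134, 0]
power = [1.0] * 5 + [0.0] * 2 + [1.0] * 5 + [0.0]
print(backchannel_candidates(f0, power))  # → [5]
```

A real system would run this on F0 and energy contours computed frame-by-frame from the microphone signal, so that a feedback utterance can be triggered as soon as the pause begins rather than after end-of-utterance recognition.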

Original language: English
Pages: 889-892
Number of pages: 4
Publication status: Published - 2005 Dec 1
Event: 9th European Conference on Speech Communication and Technology - Lisbon, Portugal
Duration: 2005 Sept 4 - 2005 Sept 8

Conference

Conference: 9th European Conference on Speech Communication and Technology
Country/Territory: Portugal
City: Lisbon
Period: 05/9/4 - 05/9/8

ASJC Scopus subject areas

  • Engineering (all)
