Abstract
This paper describes a novel speech-interface function, called "speech spotter", which enables a user to enter voice commands into a speech recognizer in the midst of natural human-human conversation. In the past, it was difficult to use automatic speech recognition in human-human conversation because it was not easy to judge, from microphone input alone, whether a user was speaking to another person or to a speech recognizer. We solve this problem by using two kinds of nonverbal speech information: a filled pause (a vowel-lengthening hesitation like "er...") and voice pitch. A voice command is accepted by the speech recognizer only when the user utters it with a high pitch just after a filled pause. Using this speech-spotter function, we have built two application systems: an on-demand information system for assisting human-human conversation and a music-playback system for enriching telephone conversation. Results from using these systems show that the speech-spotter function is robust and convenient enough to be used in face-to-face or cellular-phone conversations.
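The abstract describes a gating rule: an utterance is treated as a voice command only if it follows a filled pause and is spoken with a high pitch. The sketch below illustrates that decision logic only; the segment representation, pitch threshold, and timing parameters are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the speech-spotter gating rule: forward a segment to
# the speech recognizer only when it is uttered with a high pitch immediately
# after a filled pause. Thresholds and feature names are assumed for the example.

from dataclasses import dataclass


@dataclass
class SpeechSegment:
    text: str                    # transcribed content of the segment
    mean_f0_hz: float            # average voice pitch (fundamental frequency)
    follows_filled_pause: bool   # True if a lengthened vowel ("er...") precedes it
    gap_after_pause_s: float     # time between the filled pause and the segment


def is_spotted_command(seg: SpeechSegment,
                       pitch_threshold_hz: float = 220.0,
                       max_gap_s: float = 0.5) -> bool:
    """Return True if the segment should be accepted as a voice command."""
    return (seg.follows_filled_pause
            and seg.gap_after_pause_s <= max_gap_s
            and seg.mean_f0_hz >= pitch_threshold_hz)


if __name__ == "__main__":
    # Speech addressed to a human partner: no filled pause, normal pitch.
    chat = SpeechSegment("what was that song called", 140.0, False, 0.0)
    # Speech addressed to the system: "er..." followed by a high-pitched query.
    command = SpeechSegment("play that song again", 250.0, True, 0.2)

    print(is_spotted_command(chat))     # False -> ignored by the recognizer
    print(is_spotted_command(command))  # True  -> passed to the recognizer
```

In a real system the pitch threshold would likely be adapted per speaker rather than fixed, since "high pitch" is relative to each user's normal speaking range.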
| Original language | English |
|---|---|
| Pages | 1533-1536 |
| Number of pages | 4 |
| Publication status | Published - 2004 Jan 1 |
| Event | 8th International Conference on Spoken Language Processing, ICSLP 2004 - Jeju, Jeju Island, Korea, Republic of; Duration: 2004 Oct 4 → 2004 Oct 8 |
Other
| Other | 8th International Conference on Spoken Language Processing, ICSLP 2004 |
|---|---|
| Country/Territory | Korea, Republic of |
| City | Jeju, Jeju Island |
| Period | 2004 Oct 4 → 2004 Oct 8 |
ASJC Scopus subject areas
- Language and Linguistics
- Linguistics and Language