Abstract
This paper proposes a speech interface function, called speech starter, that enables noise-robust endpoint (utterance) detection for speech recognition. When current speech recognizers are used in a noisy environment, recognition errors are typically caused by incorrect endpoints, because automatic endpoint detection is easily disturbed by non-stationary noise. Speech starter lets a user specify the beginning of each utterance by uttering a filler with a filled pause, which serves as a trigger to start the speech-recognition process. Because filled pauses can be detected robustly in a noisy environment, practical endpoint detection is achieved. Speech starter also provides a hands-free speech interface, and it is user-friendly because speakers tend to utter filled pauses (e.g., "er.") at the beginning of utterances when hesitating in human-human communication. Experimental results in a noisy environment with a 10-dB SNR show that the recognition error rate with speech starter was lower than with conventional endpoint-detection methods.
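The triggering idea can be illustrated with a minimal sketch. It is not the detector used in the paper: it assumes, purely for illustration, that a filled pause shows up as a sustained run of voiced frames with nearly constant pitch, and that a per-frame F0 track (0.0 for unvoiced frames) is already available. The function names, `min_frames`, and `max_delta_hz` are hypothetical.

```python
from typing import List, Optional, Sequence


def find_filled_pause_end(
    f0_hz: Sequence[float],
    min_frames: int = 20,       # hypothetical: minimum length of the flat-pitch run
    max_delta_hz: float = 3.0,  # hypothetical: allowed frame-to-frame pitch change
) -> Optional[int]:
    """Return the index of the frame at which a voiced, flat-pitch run
    first reaches min_frames, or None if no such run exists."""
    run = 0
    prev = 0.0
    for i, f0 in enumerate(f0_hz):
        voiced = f0 > 0.0
        if voiced and prev > 0.0 and abs(f0 - prev) <= max_delta_hz:
            run += 1   # pitch stayed flat: extend the current run
        elif voiced:
            run = 1    # voiced, but pitch jumped or run just began
        else:
            run = 0    # unvoiced frame ends any run
        prev = f0
        if run >= min_frames:
            return i
    return None


def frames_to_recognize(
    f0_hz: Sequence[float], frames: Sequence[int], **kwargs
) -> List[int]:
    """Pass on only the frames that follow the trigger; an empty list
    means the recognizer is never started."""
    trigger = find_filled_pause_end(f0_hz, **kwargs)
    if trigger is None:
        return []
    return list(frames[trigger + 1:])
```

In this sketch, audio that never contains a sustained flat-pitch run is not sent to the recognizer at all, which mirrors the role of the filled pause as an explicit start trigger.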
Original language | English |
---|---|
Title of host publication | EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology |
Publisher | International Speech Communication Association |
Pages | 1237-1240 |
Number of pages | 4 |
Publication status | Published - 2003 |
Event | 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 - Geneva, Switzerland; Duration: 2003 Sept 1 → 2003 Sept 4 |
Other
Other | 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003 |
---|---|
Country/Territory | Switzerland |
City | Geneva |
Period | 2003 Sept 1 → 2003 Sept 4 |
ASJC Scopus subject areas
- Computer Science Applications
- Software
- Linguistics and Language
- Communication