Abstract
In this paper, a novel index combination method for spoken term detection is proposed. In our method, the outputs of four different recognizers (a word, a syllable, a word-syllable, and a fragment recognizer) are combined into a single confusion network. Because combining multiple indices enlarges the index, a novel index-selection method is then applied to suppress this growth. Two selection methods are proposed to reduce the index size, (1) arc selection and (2) unit selection, both of which are based on the score of an out-of-vocabulary (OOV) region classifier. Experimental results on 39 hours of Japanese lecture recordings showed that the index-selection method reduced the index size of the best confusion network by 22% while maintaining its high accuracy. Compared with the best phoneme-based index from a single recognizer, the proposed method achieved relative error reductions of 25.0% and 14.8% for in-vocabulary (IV) and OOV queries, respectively, without increasing the index size.
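To make the two selection ideas concrete, here is a minimal sketch, not the authors' implementation. It assumes a combined confusion network stored as a list of bins, where each arc carries the name of the recognizer that produced it and a posterior, and each bin carries an OOV-region classifier score in [0, 1]. The classes `Arc` and `Bin`, the functions `unit_selection`, `arc_selection`, and `index_size`, the thresholds, and the rules themselves (keep word-level units in likely-IV bins and subword units in likely-OOV bins; prune low-posterior arcs less aggressively where the OOV score is high) are all hypothetical illustrations of the general idea.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Arc:
    label: str        # phoneme string carried by this arc
    source: str       # which recognizer produced it: "word", "syllable", "word-syllable", "fragment"
    posterior: float  # posterior probability of the arc within its bin

@dataclass
class Bin:
    arcs: List[Arc]
    oov_score: float  # hypothetical OOV-region classifier output in [0, 1]

# Hypothetical grouping of the four recognizers into word-level and subword-level units.
WORD_LIKE = {"word", "word-syllable"}
SUBWORD_LIKE = {"syllable", "fragment"}

def unit_selection(bins: List[Bin], oov_threshold: float = 0.5) -> List[Bin]:
    """Keep word-level arcs in likely-IV bins and subword arcs in likely-OOV bins."""
    out = []
    for b in bins:
        keep = SUBWORD_LIKE if b.oov_score >= oov_threshold else WORD_LIKE
        arcs = [a for a in b.arcs if a.source in keep]
        # Fall back to all arcs if no arc of the preferred unit type exists in this bin.
        out.append(Bin(arcs or list(b.arcs), b.oov_score))
    return out

def arc_selection(bins: List[Bin], base_min_posterior: float = 0.1) -> List[Bin]:
    """Drop low-posterior arcs, using a lower posterior floor in likely-OOV bins."""
    out = []
    for b in bins:
        # Hypothetical rule: the floor shrinks linearly as the OOV score grows.
        floor = base_min_posterior * (1.0 - b.oov_score)
        arcs = [a for a in b.arcs if a.posterior >= floor]
        if not arcs and b.arcs:
            # Always keep the single best arc so no bin becomes empty.
            arcs = [max(b.arcs, key=lambda a: a.posterior)]
        out.append(Bin(arcs, b.oov_score))
    return out

def index_size(bins: List[Bin]) -> int:
    """Index size measured here simply as the total number of arcs."""
    return sum(len(b.arcs) for b in bins)
```

In this sketch both selectors return a new network rather than modifying the input, so they can be applied separately or chained (for example `arc_selection(unit_selection(bins))`) and compared by `index_size`.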
Original language | English |
---|---|
Title of host publication | ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings |
Pages | 8540-8544 |
Number of pages | 5 |
DOIs | |
Publication status | Published - 2013 Oct 18 |
Externally published | Yes |
Event | 2013 38th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2013 - Vancouver, BC (2013 May 26 → 2013 May 31) |
Keywords
- keyword spotting
- out-of-vocabulary detection
- spoken term detection
ASJC Scopus subject areas
- Signal Processing
- Software
- Electrical and Electronic Engineering