Abstract
Speech stream segregation is presented as a new speech enhancement technique for automatic speech recognition. Two issues are addressed: segregating speech streams from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping the extracted fragments, and substituting non-harmonic residue for the non-harmonic parts of each group. The main problem in interfacing speech stream segregation with HMM-based speech recognition is how to reduce the degradation of recognition performance caused by spectral distortion of the segregated sounds, which stems mainly from the transfer function of the binaural input. Our solution is to re-train the HMM parameters with training data binauralized for four directions. In experiments with 500 mixtures of two women's utterances of a word, the cumulative word recognition accuracy up to the 10th candidate for each woman's utterance was 75% on average.
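The evaluation metric in the abstract, cumulative accuracy up to the 10th candidate, can be read as the fraction of utterances whose correct word appears among the recognizer's top 10 candidates. The following is a minimal sketch of that computation, not the authors' evaluation code. The function name `cumulative_accuracy`, its parameters, and the toy candidate lists are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def cumulative_accuracy(
    ranked: Sequence[Sequence[str]],
    truth: Sequence[str],
    max_rank: int = 10,
) -> List[float]:
    """Return a list where entry k-1 is the fraction of utterances whose
    true word appears among the top k recognition candidates.

    `ranked[i]` is the candidate list for utterance i, best first
    (hypothetical input format, assumed for this sketch).
    """
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        raise ValueError("at least one utterance is required")

    hits = [0] * max_rank
    for candidates, word in zip(ranked, truth):
        top = list(candidates[:max_rank])
        if word in top:
            first = top.index(word)
            # A hit at rank `first` also counts for every larger cutoff.
            for k in range(first, max_rank):
                hits[k] += 1
    return [h / len(truth) for h in hits]


# Toy data (hypothetical): four utterances of the same target word.
toy_ranked = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["c", "b", "a"],
    ["x", "y", "z"],
]
toy_truth = ["a", "a", "a", "a"]
toy_result = cumulative_accuracy(toy_ranked, toy_truth, max_rank=3)
```

With `max_rank=10` and one candidate list per segregated utterance, the last entry of the returned list corresponds to the kind of figure reported in the abstract.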
Original language | English |
---|---|
Title of host publication | International Conference on Spoken Language Processing, ICSLP, Proceedings |
Editors | Anon |
Place of Publication | Piscataway, NJ, United States |
Publisher | IEEE |
Pages | 2356-2359 |
Number of pages | 4 |
Volume | 4 |
Publication status | Published - 1996 |
Externally published | Yes |
Event | Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4) - Philadelphia, PA, USA. Duration: 1996 Oct 3 → 1996 Oct 6 |
Other
Other | Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4) |
---|---|
City | Philadelphia, PA, USA |
Period | 96/10/3 → 96/10/6 |
ASJC Scopus subject areas
- Computer Science(all)