New speech enhancement: Speech stream segregation

Hiroshi G. Okuno*, Tomohiro Nakatani, Takeshi Kawabata

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)


Speech stream segregation is presented as a new speech enhancement technique for automatic speech recognition. Two issues are addressed: speech stream segregation from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping these extracted harmonic fragments, and substituting non-harmonic residue for non-harmonic parts of groups. The main problem in interfacing speech stream segregation with HMM-based speech recognition is how to reduce the degradation of recognition performance due to spectral distortion of segregated sounds, which is caused mainly by the transfer function of a binaural input. Our solution is to re-train the HMM parameters with training data binauralized for four directions. Experiments with 500 mixtures of two women's utterances of a word showed that the cumulative accuracy of word recognition up to the 10th candidate for each woman's utterance is, on average, 75%.
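
The evaluation metric in the abstract, cumulative accuracy up to the 10th candidate, can be read as the fraction of utterances whose correct word appears among the recognizer's top-k candidates, for k up to 10. The following is a minimal sketch of that computation under this reading; the function name `cumulative_accuracy` and the data layout (one ranked candidate list per utterance) are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def cumulative_accuracy(
    ranked: Sequence[Sequence[str]],
    truth: Sequence[str],
    max_rank: int = 10,
) -> List[float]:
    """Return a list where entry k-1 is the fraction of utterances whose
    correct word is among the top k candidates (k = 1 .. max_rank).

    `ranked[i]` is the hypothetical recognizer output for utterance i,
    ordered from best to worst; `truth[i]` is the correct word.
    """
    if len(ranked) != len(truth):
        raise ValueError("ranked and truth must have the same length")
    if not truth:
        raise ValueError("at least one utterance is required")

    hits = [0] * max_rank
    for candidates, word in zip(ranked, truth):
        top = list(candidates[:max_rank])
        if word in top:
            rank = top.index(word)  # 0-based rank of the correct word
            # A hit at this rank also counts for every larger cutoff.
            for k in range(rank, max_rank):
                hits[k] += 1
    return [h / len(truth) for h in hits]
```

With `max_rank=10`, the last entry of the returned list corresponds to the kind of figure the abstract reports as 75% on average.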

Original language: English
Title of host publication: International Conference on Spoken Language Processing, ICSLP, Proceedings
Editors: Anon
Place of Publication: Piscataway, NJ, United States
Number of pages: 4
Publication status: Published - 1996
Externally published: Yes
Event: Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4) - Philadelphia, PA, USA
Duration: 1996 Oct 3 to 1996 Oct 6


Other: Proceedings of the 1996 International Conference on Spoken Language Processing, ICSLP. Part 1 (of 4)
City: Philadelphia, PA, USA

ASJC Scopus subject areas

  • Computer Science (all)

