Listening to two simultaneous speeches

Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata

Research output: Contribution to journal › Article › peer-review

15 Citations (Scopus)


Speech stream segregation is presented as a new speech enhancement technique for automatic speech recognition. Two issues are addressed: segregating speech streams from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping the extracted harmonic fragments, and substituting the non-harmonic residue for the non-harmonic parts of each group. The main problem in interfacing speech stream segregation with hidden Markov model (HMM)-based speech recognition is how to mitigate the degradation of recognition performance caused by spectral distortion of the segregated sounds, which arises mainly from the transfer function of the binaural input. Our solution is to re-train the HMM parameters with training data binauralized for four directions. Experiments with 500 mixtures of two women's utterances of isolated words showed that the error reduction rates of 1-best and 10-best word recognition of each woman's utterance are, on average, 64% and 75%, respectively.
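The extract/group/substitute pipeline described above can be illustrated with a toy sketch. This is not the paper's implementation: the frequency grid, tolerance, and fundamental frequencies below are hypothetical, and in practice each speaker's f0 would come from a pitch tracker rather than being given. The sketch only shows the idea of marking spectrogram bins near harmonics k*f0 as one speaker's fragment and treating the remaining bins as non-harmonic residue shared by both streams.

```python
import numpy as np

def harmonic_mask(freqs, f0, tol=0.04):
    """Boolean mask over frequency bins: True where a bin lies within a
    relative tolerance of some harmonic k*f0. A toy stand-in for the
    paper's harmonic-fragment extraction step."""
    mask = np.zeros_like(freqs, dtype=bool)
    k_max = int(freqs.max() // f0)
    for k in range(1, k_max + 1):
        mask |= np.abs(freqs - k * f0) <= tol * k * f0
    return mask

def segregate(spec, freqs, f0_a, f0_b):
    """Split one magnitude spectrum of a two-speaker mixture into two
    streams: each stream keeps its own harmonic fragments, and the
    non-harmonic residue is substituted into both (mirroring the
    extract/group/substitute steps, very loosely)."""
    mask_a = harmonic_mask(freqs, f0_a)
    mask_b = harmonic_mask(freqs, f0_b)
    residue = ~(mask_a | mask_b)
    stream_a = np.where(mask_a | residue, spec, 0.0)
    stream_b = np.where(mask_b | residue, spec, 0.0)
    return stream_a, stream_b
```

For example, with bins spaced 10 Hz apart and assumed fundamentals of 200 Hz and 310 Hz, the 200 Hz and 400 Hz bins fall into the first stream, the 310 Hz bin into the second, and a bin near 130 Hz is residue carried by both.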

Original language: English
Pages (from-to): 299-310
Number of pages: 12
Journal: Speech Communication
Issue number: 3
Publication status: Published - 1999 Apr
Externally published: Yes

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Experimental and Cognitive Psychology
  • Linguistics and Language
