Listening to two simultaneous speeches

Hiroshi G. Okuno, Tomohiro Nakatani, Takeshi Kawabata

Research output: Article, peer-reviewed

15 citations (Scopus)

Abstract

Speech stream segregation is presented as a new speech enhancement technique for automatic speech recognition. Two issues are addressed: segregating speech streams from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping the extracted fragments, and substituting non-harmonic residue for the non-harmonic parts of a group. The main problem in interfacing speech stream segregation with hidden Markov model (HMM)-based speech recognition is how to reduce the degradation of recognition performance due to spectral distortion of the segregated sounds, which is caused mainly by the transfer function of a binaural input. Our solution is to retrain the HMM parameters with training data binauralized for four directions. Experiments with 500 mixtures of two women's utterances of an isolated word showed that the error reduction rates for 1-best and 10-best word recognition of each woman's utterance are, on average, 64% and 75%, respectively.
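
The error reduction rate quoted in the abstract can be read as a relative measure. The sketch below assumes the common definition, (baseline error - improved error) / baseline error; the abstract does not spell out the formula, so this definition, the function name `error_reduction_rate`, and the example error values are assumptions for illustration and are not taken from the paper.

```python
def error_reduction_rate(baseline_error: float, improved_error: float) -> float:
    """Relative error reduction, assuming the common definition.

    baseline_error: word error rate without segregation (fraction in (0, 1]).
    improved_error: word error rate with segregation (fraction in [0, 1]).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - improved_error) / baseline_error


# Hypothetical error rates, NOT results from the paper:
# a drop from 50% to 18% word error is a 64% relative reduction.
example_rate = error_reduction_rate(0.50, 0.18)
print(f"{example_rate:.2f}")
```

Under this reading, a 64% error reduction rate means that about two thirds of the recognition errors made on the unprocessed mixture no longer occur; it does not mean that recognition accuracy is 64%.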

Original language: English
Pages (from-to): 299-310
Number of pages: 12
Journal: Speech Communication
Volume: 27
Issue: 3
DOI
Publication status: Published - April 1999
Externally published: Yes

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering
  • Experimental and Cognitive Psychology
  • Linguistics and Language
