TY  - JOUR
T1  - Listening to two simultaneous speeches
AU  - Okuno, Hiroshi G.
AU  - Nakatani, Tomohiro
AU  - Kawabata, Takeshi
PY  - 1999/4
Y1  - 1999/4
AB  - Speech stream segregation is presented as a new speech enhancement for automatic speech recognition. Two issues are addressed: speech stream segregation from a mixture of sounds, and interfacing speech stream segregation with automatic speech recognition. Speech stream segregation is modeled as a process of extracting harmonic fragments, grouping these extracted harmonic fragments, and substituting non-harmonic residue for non-harmonic parts of a group. The main problem in interfacing speech stream segregation with hidden Markov model (HMM)-based speech recognition is how to improve the degradation of recognition performance due to spectral distortion of segregated sounds, which is caused mainly by transfer function of a binaural input. Our solution is to re-train the parameters of HMM with training data binauralized for four directions. Experiments with 500 mixtures of two women's utterances of an isolated word showed that the error reduction rate of the 1-best/10-best word recognition of each woman's utterance is, on average, 64% and 75%, respectively.
UR  - http://www.scopus.com/inward/record.url?scp=0032633660&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0032633660&partnerID=8YFLogxK
U2  - 10.1016/S0167-6393(98)00080-6
DO  - 10.1016/S0167-6393(98)00080-6
M3  - Article
AN  - SCOPUS:0032633660
SN  - 0167-6393
VL  - 27
SP  - 299
EP  - 310
JO  - Speech Communication
JF  - Speech Communication
IS  - 3
ER  - 