TY - GEN
T1 - Genetic algorithm-based improvement of robot hearing capabilities in separating and recognizing simultaneous speech signals
AU - Yamamoto, Shun'ichi
AU - Nakadai, Kazuhiro
AU - Nakano, Mikio
AU - Tsujino, Hiroshi
AU - Valin, Jean Marc
AU - Takeda, Ryu
AU - Komatani, Kazunori
AU - Ogata, Tetsuya
AU - Okuno, Hiroshi G.
PY - 2006/1/1
Y1 - 2006/1/1
N2 - Since a robot usually hears a mixture of sounds, in particular simultaneous speech signals, it should be able to localize, separate, and recognize each speech signal. Because separated speech signals suffer from spectral distortion, normal automatic speech recognition (ASR) may fail to recognize such distorted speech signals. Yamamoto et al. proposed using the Missing Feature Theory to mask corrupt features in ASR, and developed the automatic missing-feature-mask generation (AMG) system by using information obtained by sound source separation (SSS). Our evaluations of the recognition performance of the system indicate possibilities for improving it by optimizing many of its parameters. We used genetic algorithms to optimize these parameters. Each chromosome consists of a set of parameters for SSS and AMG, and each chromosome is evaluated by the recognition rate of separated sounds. We obtained an optimized set of parameters for each distance (from 50 cm to 250 cm in 50 cm steps) and direction (30-, 60-, and 90-degree intervals) for two simultaneous speech signals. The average isolated word recognition rates ranged from 84.9% to 94.7%.
AB - Since a robot usually hears a mixture of sounds, in particular simultaneous speech signals, it should be able to localize, separate, and recognize each speech signal. Because separated speech signals suffer from spectral distortion, normal automatic speech recognition (ASR) may fail to recognize such distorted speech signals. Yamamoto et al. proposed using the Missing Feature Theory to mask corrupt features in ASR, and developed the automatic missing-feature-mask generation (AMG) system by using information obtained by sound source separation (SSS). Our evaluations of the recognition performance of the system indicate possibilities for improving it by optimizing many of its parameters. We used genetic algorithms to optimize these parameters. Each chromosome consists of a set of parameters for SSS and AMG, and each chromosome is evaluated by the recognition rate of separated sounds. We obtained an optimized set of parameters for each distance (from 50 cm to 250 cm in 50 cm steps) and direction (30-, 60-, and 90-degree intervals) for two simultaneous speech signals. The average isolated word recognition rates ranged from 84.9% to 94.7%.
KW - Microphone array
KW - Robot audition
KW - Robot-human interaction
KW - Simultaneous Speakers
KW - Sound source separation
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=33746191291&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33746191291&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33746191291
SN - 3540354530
SN - 9783540354536
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 207
EP - 217
BT - Advances in Applied Artificial Intelligence - 19th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2006, Proceedings
PB - Springer Verlag
T2 - 19th International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2006
Y2 - 27 June 2006 through 30 June 2006
ER -