TY - GEN
T1 - Real-time robot audition system that recognizes simultaneous speech in the real world
AU - Yamamoto, Shun'ichi
AU - Nakadai, Kazuhiro
AU - Nakano, Mikio
AU - Tsujino, Hiroshi
AU - Valin, Jean Marc
AU - Komatani, Kazunori
AU - Ogata, Tetsuya
AU - Okuno, Hiroshi G.
PY - 2006
Y1 - 2006
N2 - This paper presents a robot audition system that recognizes simultaneous speech in the real world by using robot-embedded microphones. We have previously reported Missing Feature Theory (MFT) based integration of Sound Source Separation (SSS) and Automatic Speech Recognition (ASR) for building robust robot audition. We demonstrated that an MFT-based prototype system drastically improved the performance of speech recognition even when three speakers talked to a robot simultaneously. However, the prototype system had three problems: offline processing, hand-tuned system parameters, and failures in Voice Activity Detection (VAD). To attain online processing, we introduced a FlowDesigner-based architecture to integrate sound source localization (SSL), SSS, and ASR. This architecture brings fast processing and easy implementation because it provides a simple framework of shared-object-based integration. To optimize the parameters, we developed Genetic Algorithm (GA) based parameter optimization, because it is difficult to build an analytical optimization model for mutually dependent system parameters. To improve VAD, we integrated a new VAD based on the power spectrum and location of a sound source into the system, since conventional VAD relying only on power often fails due to the low signal-to-noise ratio of simultaneous speech. We then constructed a robot audition system for Honda ASIMO. Experiments on recognition of simultaneous speech in a noisy and echoic environment showed that the system worked online and fast, with better robustness and accuracy.
KW - Genetic algorithm
KW - Missing feature theory
KW - Parameter optimization
KW - Real-time processing
KW - Robot audition
KW - Voice activity detection
UR - http://www.scopus.com/inward/record.url?scp=34250652551&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34250652551&partnerID=8YFLogxK
U2 - 10.1109/IROS.2006.282037
DO - 10.1109/IROS.2006.282037
M3 - Conference contribution
AN - SCOPUS:34250652551
SN - 142440259X
SN - 9781424402595
T3 - IEEE International Conference on Intelligent Robots and Systems
SP - 5333
EP - 5338
BT - 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2006
T2 - 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2006
Y2 - 9 October 2006 through 15 October 2006
ER -