Target speech detection and separation for communication with humanoid robots in noisy home environments

Hyun Don Kim*, Jinsung Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno


Research output: Article, peer-reviewed

1 citation (Scopus)


People usually talk face to face when they communicate with a partner. In robot audition, recognizing the talker in front of the robot is therefore critical for smooth interaction. This paper presents an enhanced speech detection method for a humanoid robot that can separate and recognize speech signals originating from the front, even in noisy home environments. The robot audition system consists of a new type of voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method and a maximum signal-to-noise ratio (SNR) beamformer. The CSCC-based VAD classifies speech signals that arrive from the frontal region of two microphones embedded in the robot. The system works in real time, even in a noisy environment (SNR > 0 dB), without requiring filter coefficients trained in advance. It can cope with speech noise from televisions and audio devices, provided that the noise does not originate from the center. Experiments using a humanoid robot, SIG2, with two microphones showed that the system improved the SNR of the extracted target speech signals by more than 12 dB and increased the success rate of automatic speech recognition for Japanese words by about 17 points.
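
As a rough illustration of the maximum-SNR beamformer mentioned in the abstract, the sketch below uses the textbook formulation: for one frequency bin, the weight vector that maximizes the ratio of output speech power to output noise power is the principal generalized eigenvector of the speech and noise covariance matrices. This is not the authors' implementation. The function names, the toy steering vectors, the phase offset of 1.2 rad, and the noise levels are all assumptions chosen for the example, and the abstract does not say how the covariances are estimated in the actual system.

```python
import numpy as np


def max_snr_weights(R_speech, R_noise):
    """Return unit-norm weights w maximizing (w^H R_speech w) / (w^H R_noise w)."""
    # Generalized eigenproblem R_speech w = lambda R_noise w,
    # solved here as an ordinary eigenproblem of inv(R_noise) @ R_speech.
    vals, vecs = np.linalg.eig(np.linalg.solve(R_noise, R_speech))
    w = vecs[:, np.argmax(vals.real)]
    return w / np.linalg.norm(w)


def output_snr_db(w, R_speech, R_noise):
    """SNR in dB at the beamformer output for weights w."""
    speech_power = np.real(np.conj(w) @ R_speech @ w)
    noise_power = np.real(np.conj(w) @ R_noise @ w)
    return 10.0 * np.log10(speech_power / noise_power)


# Toy two-microphone scene for a single frequency bin (all values hypothetical).
d_front = np.array([1.0, 1.0], dtype=complex)              # frontal source: same phase at both mics
d_side = np.array([1.0, np.exp(-1j * 1.2)], dtype=complex)  # off-center source: phase offset between mics

R_speech = np.outer(d_front, d_front.conj())
R_noise = 2.0 * np.outer(d_side, d_side.conj()) + 0.1 * np.eye(2)  # directional noise plus sensor noise

w = max_snr_weights(R_speech, R_noise)
snr_single = output_snr_db(np.array([1.0, 0.0], dtype=complex), R_speech, R_noise)
snr_beam = output_snr_db(w, R_speech, R_noise)
print(f"single microphone: {snr_single:.1f} dB, max-SNR beamformer: {snr_beam:.1f} dB")
```

Because the weights maximize the output SNR over all possible weight vectors, the beamformer output can never be worse than picking a single microphone. In a system like the one described, a VAD would plausibly decide which frames contribute to the speech and noise statistics, but that coupling is an assumption here.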

Journal: Advanced Robotics
Publication status: Published - Oct 1, 2009

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Human-Computer Interaction
  • Hardware and Architecture
  • Computer Science Applications

