Target speech detection and separation for humanoid robots in sparse dialogue with noisy home environments

Hyun Don Kim*, Jinsung Kim, Kazunori Komatani, Tetsuya Ogata, Hiroshi G. Okuno

*Corresponding author for this work

Research output: Conference contribution

10 Citations (Scopus)

Abstract

In normal human communication, people face the speaker when listening and usually pay attention to the speaker's face. Therefore, in robot audition, recognizing the talker in front of the robot is critical for smooth interaction. This paper presents an enhanced speech detection method for a humanoid robot that can separate and recognize speech signals originating from the front, even in noisy home environments. The robot audition system consists of a new type of voice activity detection (VAD) based on the complex spectrum circle centroid (CSCC) method and a maximum signal-to-noise ratio (Max-SNR) beamformer. The CSCC-based VAD can classify speech signals that arrive from the frontal region of two microphones embedded in the robot. The system works in real time, without requiring filter coefficients trained in advance, even in noisy environments (SNR > 0 dB). It can cope with speech noise from televisions and audio devices that does not originate from the center. Experiments using a humanoid robot, SIG2, with two microphones showed that our system improved the SNR of the extracted target speech signals by more than 12 dB and increased the success rate of automatic speech recognition for Japanese words by about 17 points.
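
The abstract names a Max-SNR beamformer but gives no implementation details. As a rough illustration of the general textbook idea only (not the authors' implementation), the sketch below picks two-microphone weights that maximize the ratio of speech power to noise power at a single frequency bin. The covariance matrices are synthetic, and the function names (`max_snr_weights`, `output_snr`) and the interferer geometry are assumptions made for this example.

```python
import math

def inv2(m):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def quad(w, m):
    """Real part of the quadratic form w^H m w (the output power)."""
    mv = matvec(m, w)
    return (w[0].conjugate() * mv[0] + w[1].conjugate() * mv[1]).real

def output_snr(w, r_speech, r_noise):
    """Ratio of speech power to noise power after applying weights w."""
    return quad(w, r_speech) / quad(w, r_noise)

def max_snr_weights(r_speech, r_noise, iterations=200):
    """Approximate the dominant eigenvector of inv(r_noise) @ r_speech
    by power iteration; this vector maximizes output_snr."""
    rn_inv = inv2(r_noise)
    w = [1 + 0j, 1 + 0j]  # hypothetical starting guess
    for _ in range(iterations):
        w = matvec(rn_inv, matvec(r_speech, w))
        norm = math.sqrt(abs(w[0]) ** 2 + abs(w[1]) ** 2)
        if norm == 0:
            raise ValueError("starting vector is orthogonal to the speech subspace")
        w = [w[0] / norm, w[1] / norm]
    return w

# Synthetic example (assumed values, not taken from the paper):
# frontal speech reaches both microphones in phase, so d = [1, 1].
r_speech = [[1 + 0j, 1 + 0j],
            [1 + 0j, 1 + 0j]]
# One off-center interferer with a 90-degree inter-microphone phase shift,
# a = [1, j], plus a small amount of uncorrelated sensor noise (0.1 * I).
r_noise = [[1.1 + 0j, -1j],
           [1j, 1.1 + 0j]]

w = max_snr_weights(r_speech, r_noise)
snr_single_mic = output_snr([1 + 0j, 0j], r_speech, r_noise)
snr_beamformed = output_snr(w, r_speech, r_noise)
gain_db = 10 * math.log10(snr_beamformed / snr_single_mic)
print(f"SNR gain over one microphone: {gain_db:.1f} dB")
```

In a real system the two covariance matrices would have to be estimated from audio frames, for instance using a VAD decision to label frames as speech or noise; that estimation step is outside this sketch.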

Original language: English
Title of host publication: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS
Pages: 1705-1711
Number of pages: 7
DOI
Publication status: Published - 2008 Dec 1
Externally published: Yes
Event: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS - Nice, France
Duration: 2008 Sep 22 to 2008 Sep 26

Publication series

Name: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS

Conference

Conference: 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS
Country/Territory: France
City: Nice
Period: 08/9/22 to 08/9/26

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Vision and Pattern Recognition
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
