Abstract
We have developed a human tracking system for robots that integrates sound and face localization. Conventional systems usually require many microphones and/or prior information to localize several sound sources, and they cannot cope with various types of background noise. In our system, cross-power spectrum phase (CSP) analysis of sound signals obtained with only two microphones is used to localize the sound source without prior information such as impulse response data. An expectation-maximization (EM) algorithm helps the system cope with several moving sound sources. The problem of distinguishing whether sounds come from the front or the back is also solved with only two microphones by rotating the robot's head. For face localization, a method that uses facial skin colors classified by another EM algorithm enables the system to detect faces in various poses. By detecting a human face, the system can compensate for errors in the sound localization of a speaker and identify noise signals entering from undesired directions. A probability-based method integrates the auditory and visual information to produce a reliable tracking path in real time. Experiments using a robot showed that our system can localize two sounds at the same time and track a communication partner while dealing with various types of background noise.
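The abstract does not give implementation details, so the following is only a minimal sketch of two-microphone CSP analysis in its commonly used form (also known as GCC-PHAT), not the authors' code. The function names `csp_delay` and `delay_to_angle`, the far-field assumption, the speed of sound of 343 m/s, and the microphone spacing passed in by the caller are all illustrative assumptions.

```python
import numpy as np

def csp_delay(x1, x2, fs, max_delay_s=None):
    """Estimate the delay (seconds) of x1 relative to x2 from the CSP peak.

    A positive result means the sound reached microphone 2 first.
    """
    n = len(x1) + len(x2)                  # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    cross = X1 * np.conj(X2)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12         # keep only the phase (PHAT whitening)
    csp = np.fft.irfft(cross, n=n)         # CSP coefficients over lag
    max_shift = n // 2
    if max_delay_s is not None:
        max_shift = min(int(round(max_delay_s * fs)), max_shift)
    # Reorder so that index 0 corresponds to lag -max_shift.
    csp = np.concatenate((csp[-max_shift:], csp[:max_shift + 1]))
    lag = int(np.argmax(csp)) - max_shift
    return lag / fs

def delay_to_angle(tau, mic_distance_m, speed_of_sound=343.0):
    """Convert a delay to a far-field arrival angle in degrees.

    arcsin only covers -90..90 degrees, which is why two microphones
    alone cannot tell front from back.
    """
    s = np.clip(speed_of_sound * tau / mic_distance_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

Because the arrival angle is recovered through an arcsine, a source in front of the microphone pair and its mirror image behind the pair give the same delay. This is the front/back ambiguity that the abstract says is resolved by rotating the robot's head.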
Original language | English |
---|---|
Pages (from-to) | 629-653 |
Number of pages | 25 |
Journal | Advanced Robotics |
Volume | 23 |
Issue number | 6 |
DOI | |
Publication status | Published - 2009 |
Externally published | Yes |
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Human-Computer Interaction
- Hardware and Architecture
- Computer Science Applications