TY - GEN
T1 - Barge-in-able robot audition based on ICA and missing feature theory under semi-blind situation
AU - Takeda, Ryu
AU - Nakadai, Kazuhiro
AU - Komatani, Kazunori
AU - Ogata, Tetsuya
AU - Okuno, Hiroshi G.
PY - 2008
Y1 - 2008
N2 - This paper describes a robot audition system that allows the user to barge in; that is, the user can speak while the robot is speaking. Our "barge-in-able" system consists of two stages: (1) cancellation of robot speech and (2) recognition of the separated user speech under the "semi-blind situation". The semi-blind situation is one in which the robot's speech signal is known but the user's speech signal is not. The first stage uses an adaptive filter based on time-frequency domain Independent Component Analysis (ICA), because it separates robot speech more robustly against noise than conventional echo cancellers. To improve performance in online processing, we used known source normalization and the exponentially weighted step-size method. The second stage uses automatic speech recognition (ASR) based on missing feature theory, which provides robust recognition by exploiting the reliability of speech features distorted by noise and/or separation. The semi-blind situation simplifies the estimation of such reliabilities. Experiments demonstrated that our system improved the word correctness of ASR by 10.0%.
UR - http://www.scopus.com/inward/record.url?scp=69549122134&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=69549122134&partnerID=8YFLogxK
U2 - 10.1109/IROS.2008.4650799
DO - 10.1109/IROS.2008.4650799
M3 - Conference contribution
AN - SCOPUS:69549122134
SN - 9781424420582
T3 - 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS
SP - 1718
EP - 1723
BT - 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS
T2 - 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS
Y2 - 22 September 2008 through 26 September 2008
ER -