TY - GEN
T1 - Speech recognition in the blind condition based on multiple directivity patterns using a microphone array
AU - Sekiya, Toshiyuki
AU - Kobayashi, Tetsunori
PY - 2005/1/1
Y1 - 2005/1/1
N2 - A novel hands-free speech recognition method using a microphone array is proposed and applied to multi-talk recognition in the blind condition, that is, with no prior information about the sound sources or the room acoustics. The proposed system is a cascade of the sound localization system MUSIC and the sound segregation system SMDP (Segregation using Multiple Directivity Patterns), proposed in our previous paper. SMDP is characterized by its use of redundant directivity patterns. Usually, it is difficult for this sort of cascade system to achieve high performance, because the sound localization stage cannot be perfect and errors that occur in this first stage seriously damage the segregation stage. Missing a sound source is particularly critical. By arranging virtual sound sources, we handle excess sound sources. In the proposed method, by contrast, errors in the localization stage hardly cause problems as long as they are insertions. SMDP uses redundant directivity patterns from the outset, so it tolerates insertion errors. The proposed method achieved 70% word accuracy in a double-talk recognition experiment with a 20K-word vocabulary, which is 18 points better than ICA-based blind source separation given the number of sources.
AB - A novel hands-free speech recognition method using a microphone array is proposed and applied to multi-talk recognition in the blind condition, that is, with no prior information about the sound sources or the room acoustics. The proposed system is a cascade of the sound localization system MUSIC and the sound segregation system SMDP (Segregation using Multiple Directivity Patterns), proposed in our previous paper. SMDP is characterized by its use of redundant directivity patterns. Usually, it is difficult for this sort of cascade system to achieve high performance, because the sound localization stage cannot be perfect and errors that occur in this first stage seriously damage the segregation stage. Missing a sound source is particularly critical. By arranging virtual sound sources, we handle excess sound sources. In the proposed method, by contrast, errors in the localization stage hardly cause problems as long as they are insertions. SMDP uses redundant directivity patterns from the outset, so it tolerates insertion errors. The proposed method achieved 70% word accuracy in a double-talk recognition experiment with a 20K-word vocabulary, which is 18 points better than ICA-based blind source separation given the number of sources.
UR - http://www.scopus.com/inward/record.url?scp=33646801179&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33646801179&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2005.1415128
DO - 10.1109/ICASSP.2005.1415128
M3 - Conference contribution
AN - SCOPUS:33646801179
SN - 0780388747
SN - 9780780388741
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - I373-I376
BT - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05 - Proceedings - Image and Multidimensional Signal Processing Multimedia Signal Processing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2005 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP '05
Y2 - 18 March 2005 through 23 March 2005
ER -