TY - GEN
T1 - Ensemble integration of calibrated speaker localization and statistical speech detection in domestic environments
AU - Tachioka, Yuuki
AU - Narita, Tomohiro
AU - Watanabe, Shinji
AU - Le Roux, Jonathan
PY - 2014
Y1 - 2014
N2 - This paper describes speaker localization and speech detection techniques for domestic environments. In real environments, speakers are hard to localize because reverberation causes deviations from the simple spherical-wave assumption. We propose a template-based method that calibrates the localization errors of conventional methods. In addition, we use statistical speech detection methods to deal with noise. However, in this challenge there are five rooms, and utterances leaking from other rooms must be rejected. This kind of rejection is hard to perform using speech detection results alone. To address this problem, we also propose a method that integrates speaker localization and speech detection using either a minimum cost criterion or a classifier-based strategy. On the development set, the proposed method achieved an accuracy of 0.712 for speaker localization and an F value of 0.743 for speech detection, compared with baseline values of 0.559 and 0.570, respectively; on the test set, it achieved 0.666 and 0.706, compared with baseline values of 0.517 and 0.602.
AB - This paper describes speaker localization and speech detection techniques for domestic environments. In real environments, speakers are hard to localize because reverberation causes deviations from the simple spherical-wave assumption. We propose a template-based method that calibrates the localization errors of conventional methods. In addition, we use statistical speech detection methods to deal with noise. However, in this challenge there are five rooms, and utterances leaking from other rooms must be rejected. This kind of rejection is hard to perform using speech detection results alone. To address this problem, we also propose a method that integrates speaker localization and speech detection using either a minimum cost criterion or a classifier-based strategy. On the development set, the proposed method achieved an accuracy of 0.712 for speaker localization and an F value of 0.743 for speech detection, compared with baseline values of 0.559 and 0.570, respectively; on the test set, it achieved 0.666 and 0.706, compared with baseline values of 0.517 and 0.602.
KW - Speaker localization
KW - calibration
KW - rejection
KW - speech detection
UR - http://www.scopus.com/inward/record.url?scp=84904479862&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84904479862&partnerID=8YFLogxK
U2 - 10.1109/HSCMA.2014.6843272
DO - 10.1109/HSCMA.2014.6843272
M3 - Conference contribution
AN - SCOPUS:84904479862
SN - 9781479931095
T3 - 2014 4th Joint Workshop on Hands-Free Speech Communication and Microphone Arrays, HSCMA 2014
SP - 162
EP - 166
BT - 2014 4th Joint Workshop on Hands-Free Speech Communication and Microphone Arrays, HSCMA 2014
PB - IEEE Computer Society
T2 - 2014 4th Joint Workshop on Hands-Free Speech Communication and Microphone Arrays, HSCMA 2014
Y2 - 12 May 2014 through 14 May 2014
ER -