TY - JOUR
T1 - Sound source localization using deep learning models
AU - Yalta, Nelson
AU - Nakadai, Kazuhiro
AU - Ogata, Tetsuya
N1 - Funding Information:
The work has been supported by MEXT Grant-in-Aid for Scientific Research (A) 15H01710.
Publisher Copyright:
© 2017, Fuji Technology Press. All rights reserved.
PY - 2017/2
Y1 - 2017/2
AB - This study proposes the use of a deep neural network to localize a sound source using an array of microphones in a reverberant environment. During the last few years, applications based on deep neural networks have performed various tasks such as image classification or speech recognition to levels that exceed even human capabilities. In our study, we employ deep residual networks, which have recently shown remarkable performance in image classification tasks even when the training period is shorter than that of other models. Deep residual networks are used to process audio input similar to multiple signal classification (MUSIC) methods. We show that with end-to-end training and generic preprocessing, the performance of deep residual networks not only surpasses the block-level accuracy of linear models in nearly clean environments but also shows robustness to challenging conditions by exploiting the time delay on power information.
KW - Deep learning
KW - Deep residual networks
KW - Sound source localization
UR - http://www.scopus.com/inward/record.url?scp=85013969406&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85013969406&partnerID=8YFLogxK
U2 - 10.20965/jrm.2017.p0037
DO - 10.20965/jrm.2017.p0037
M3 - Article
AN - SCOPUS:85013969406
SN - 0915-3942
VL - 29
SP - 37
EP - 48
JO - Journal of Robotics and Mechatronics
JF - Journal of Robotics and Mechatronics
IS - 1
ER -