TY - GEN
T1 - The Sound of Bounding-Boxes
AU - Oya, Takashi
AU - Iwase, Shohei
AU - Morishima, Shigeo
N1 - Funding Information:
Acknowledgements: This research was fully supported by JST Mirai Program No. JPMJMI19B2, and JSPS KAKENHI Nos. 19H01129, 19H04137, 21H0504.
Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - In the task of audio-visual sound source separation, which leverages visual information to separate sound sources, identifying objects in an image is a crucial step prior to separating the sound source. However, existing methods that assign sound to detected bounding boxes suffer from the problem that they rely heavily on pre-trained object detectors: all possible categories of sound-producing objects must be predetermined, and an object detector covering all such categories must be available. To tackle this problem, we propose a fully unsupervised method that learns to detect objects in an image and separate sound sources simultaneously. As our method does not rely on any pre-trained detector, it is applicable to arbitrary categories without additional annotation. Furthermore, despite being fully unsupervised, our method performs comparably to existing detector-based methods in separation accuracy.
AB - In the task of audio-visual sound source separation, which leverages visual information to separate sound sources, identifying objects in an image is a crucial step prior to separating the sound source. However, existing methods that assign sound to detected bounding boxes suffer from the problem that they rely heavily on pre-trained object detectors: all possible categories of sound-producing objects must be predetermined, and an object detector covering all such categories must be available. To tackle this problem, we propose a fully unsupervised method that learns to detect objects in an image and separate sound sources simultaneously. As our method does not rely on any pre-trained detector, it is applicable to arbitrary categories without additional annotation. Furthermore, despite being fully unsupervised, our method performs comparably to existing detector-based methods in separation accuracy.
UR - http://www.scopus.com/inward/record.url?scp=85143621435&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85143621435&partnerID=8YFLogxK
U2 - 10.1109/ICPR56361.2022.9956384
DO - 10.1109/ICPR56361.2022.9956384
M3 - Conference contribution
AN - SCOPUS:85143621435
T3 - Proceedings - International Conference on Pattern Recognition
SP - 9
EP - 15
BT - 2022 26th International Conference on Pattern Recognition, ICPR 2022
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 26th International Conference on Pattern Recognition, ICPR 2022
Y2 - 21 August 2022 through 25 August 2022
ER -