TY - GEN
T1 - Weakly-Supervised Sound Event Detection with Self-Attention
AU - Miyazaki, Koichi
AU - Komatsu, Tatsuya
AU - Hayashi, Tomoki
AU - Watanabe, Shinji
AU - Toda, Tomoki
AU - Takeda, Kazuya
N1 - Publisher Copyright:
© 2020 IEEE.
PY - 2020/5
Y1 - 2020/5
AB - In this paper, we propose a novel sound event detection (SED) method that incorporates the self-attention mechanism of the Transformer for a weakly-supervised learning scenario. The proposed method utilizes the Transformer encoder, which consists of multiple self-attention modules, allowing the model to take both local and global context information of the input feature sequence into account. Furthermore, inspired by the great success of BERT in the natural language processing field, the proposed method introduces a special tag token into the input sequence for weak label prediction, which enables aggregation of information over the whole sequence. To demonstrate the performance of the proposed method, we conduct an experimental evaluation using the DCASE2019 Task4 dataset. The experimental results demonstrate that the proposed method outperforms the DCASE2019 Task4 baseline method, which is based on a convolutional recurrent neural network, and that the self-attention mechanism works effectively for SED.
KW - Transformer
KW - self-attention
KW - sound event detection
KW - weakly-supervised learning
UR - http://www.scopus.com/inward/record.url?scp=85089230850&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089230850&partnerID=8YFLogxK
U2 - 10.1109/ICASSP40776.2020.9053609
DO - 10.1109/ICASSP40776.2020.9053609
M3 - Conference contribution
AN - SCOPUS:85089230850
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 66
EP - 70
BT - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2020
Y2 - 4 May 2020 through 8 May 2020
ER -