TY - GEN
T1 - Performance evaluation of acoustic scene classification using DNN-GMM and frame-concatenated acoustic features
AU - Takahashi, Gen
AU - Yamada, Takeshi
AU - Ono, Nobutaka
AU - Makino, Shoji
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2018/2/5
Y1 - 2018/2/5
N2 - We previously proposed a method of acoustic scene classification using a deep neural network-Gaussian mixture model (DNN-GMM) and frame-concatenated acoustic features. It was submitted to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Challenge and was ranked eighth among 49 algorithms. In the proposed method, acoustic features from temporally distant frames are concatenated to capture their temporal relationship. The experimental results indicated that the classification accuracy improves as the number of concatenated frames increases. On the other hand, the frame concatenation interval, i.e., the spacing between the frames selected for concatenation, is another important parameter. In our previous method, the frame concatenation interval was fixed at 100 ms. In this paper, we optimize both the number of concatenated frames and the frame concatenation interval for the previously proposed method. As a result, it was confirmed that the classification accuracy of the method improved by 2.61% compared with the result submitted to the DCASE 2016 Challenge.
AB - We previously proposed a method of acoustic scene classification using a deep neural network-Gaussian mixture model (DNN-GMM) and frame-concatenated acoustic features. It was submitted to the Detection and Classification of Acoustic Scenes and Events (DCASE) 2016 Challenge and was ranked eighth among 49 algorithms. In the proposed method, acoustic features from temporally distant frames are concatenated to capture their temporal relationship. The experimental results indicated that the classification accuracy improves as the number of concatenated frames increases. On the other hand, the frame concatenation interval, i.e., the spacing between the frames selected for concatenation, is another important parameter. In our previous method, the frame concatenation interval was fixed at 100 ms. In this paper, we optimize both the number of concatenated frames and the frame concatenation interval for the previously proposed method. As a result, it was confirmed that the classification accuracy of the method improved by 2.61% compared with the result submitted to the DCASE 2016 Challenge.
UR - http://www.scopus.com/inward/record.url?scp=85050492634&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85050492634&partnerID=8YFLogxK
U2 - 10.1109/APSIPA.2017.8282314
DO - 10.1109/APSIPA.2017.8282314
M3 - Conference contribution
AN - SCOPUS:85050492634
T3 - Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
SP - 1739
EP - 1743
BT - Proceedings - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 9th Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2017
Y2 - 12 December 2017 through 15 December 2017
ER -