TY - GEN
T1 - Improving semantic video indexing
T2 - 41st IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016
AU - Ueki, Kazuya
AU - Kobayashi, Tetsunori
PY - 2016/5/18
Y1 - 2016/5/18
N2 - In this paper, we propose a method for improving the performance of semantic video indexing. Our approach involves extracting features from multiple convolutional neural networks (CNNs), creating multiple classifiers, and integrating them. We employed four measures to accomplish this: (1) using multiple pieces of evidence observed in each video and effectively compressing them into a fixed-length vector; (2) introducing gradient and motion features to CNNs; (3) enriching the variation in the training and test sets; and (4) extracting features from several CNNs trained with various large-scale datasets. Using the test dataset from TRECVID's 2014 evaluation benchmark, we evaluated the proposed method in terms of mean extended inferred average precision. On this measure, our system scored 35.7, outperforming the state-of-the-art TRECVID 2014 benchmark result of 33.2. Based on this work, our submission at TRECVID 2015 was ranked second among all submissions.
AB - In this paper, we propose a method for improving the performance of semantic video indexing. Our approach involves extracting features from multiple convolutional neural networks (CNNs), creating multiple classifiers, and integrating them. We employed four measures to accomplish this: (1) using multiple pieces of evidence observed in each video and effectively compressing them into a fixed-length vector; (2) introducing gradient and motion features to CNNs; (3) enriching the variation in the training and test sets; and (4) extracting features from several CNNs trained with various large-scale datasets. Using the test dataset from TRECVID's 2014 evaluation benchmark, we evaluated the proposed method in terms of mean extended inferred average precision. On this measure, our system scored 35.7, outperforming the state-of-the-art TRECVID 2014 benchmark result of 33.2. Based on this work, our submission at TRECVID 2015 was ranked second among all submissions.
KW - CNN
KW - Semantic video indexing
KW - TRECVID
KW - generic object recognition
KW - video search
UR - http://www.scopus.com/inward/record.url?scp=84973344429&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84973344429&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2016.7471863
DO - 10.1109/ICASSP.2016.7471863
M3 - Conference contribution
AN - SCOPUS:84973344429
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 1184
EP - 1188
BT - 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 20 March 2016 through 25 March 2016
ER -