TY - GEN
T1 - Video semantic indexing using object detection-derived features
AU - Kikuchi, Kotaro
AU - Ueki, Kazuya
AU - Ogawa, Tetsuji
AU - Kobayashi, Tetsunori
PY - 2016/11/28
Y1 - 2016/11/28
N2 - A new feature extraction method based on object detection is proposed to achieve accurate and robust semantic indexing of videos. Local features (e.g., SIFT and HOG) and convolutional neural network (CNN)-derived features, which have commonly been used for semantic indexing, are generally extracted from the entire image and do not explicitly represent the meaningful objects that determine semantic categories. As a result, the background region, which contains no meaningful objects, is unduly taken into account and degrades indexing performance. In the present study, an attempt was made to suppress these undesirable effects of redundant background information by incorporating object detection into semantic indexing. In the proposed method, the combination of meaningful objects detected in a video frame image is represented as a feature vector used to verify semantic categories. Experimental comparisons demonstrate that the proposed method is effective for the TRECVID semantic indexing task.
AB - A new feature extraction method based on object detection is proposed to achieve accurate and robust semantic indexing of videos. Local features (e.g., SIFT and HOG) and convolutional neural network (CNN)-derived features, which have commonly been used for semantic indexing, are generally extracted from the entire image and do not explicitly represent the meaningful objects that determine semantic categories. As a result, the background region, which contains no meaningful objects, is unduly taken into account and degrades indexing performance. In the present study, an attempt was made to suppress these undesirable effects of redundant background information by incorporating object detection into semantic indexing. In the proposed method, the combination of meaningful objects detected in a video frame image is represented as a feature vector used to verify semantic categories. Experimental comparisons demonstrate that the proposed method is effective for the TRECVID semantic indexing task.
UR - http://www.scopus.com/inward/record.url?scp=85005976115&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85005976115&partnerID=8YFLogxK
U2 - 10.1109/EUSIPCO.2016.7760456
DO - 10.1109/EUSIPCO.2016.7760456
M3 - Conference contribution
AN - SCOPUS:85005976115
T3 - European Signal Processing Conference
SP - 1288
EP - 1292
BT - 2016 24th European Signal Processing Conference, EUSIPCO 2016
PB - European Signal Processing Conference, EUSIPCO
T2 - 24th European Signal Processing Conference, EUSIPCO 2016
Y2 - 28 August 2016 through 2 September 2016
ER -