Video semantic indexing using object detection-derived features

Kotaro Kikuchi, Kazuya Ueki, Tetsuji Ogawa, Tetsunori Kobayashi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)


A new feature extraction method based on object detection is proposed to achieve accurate and robust semantic indexing of videos. Local features (e.g., SIFT and HOG) and convolutional neural network (CNN)-derived features, which have been used in semantic indexing, are generally extracted from the entire image and do not explicitly represent the meaningful objects that contribute to determining semantic categories. As a result, the background region, which does not contain the meaningful objects, is unduly considered and harms indexing performance. In the present study, object detection is incorporated into semantic indexing to suppress the undesirable effects of redundant background information. In the proposed method, the combination of meaningful objects detected in a video frame image is represented as a feature vector for verification of semantic categories. Experimental comparisons demonstrate that the proposed method is effective for the TRECVID semantic indexing task.
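
The idea of representing the objects detected in a frame as a feature vector can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the detector, the object vocabulary, or how detections are pooled, so the class list `OBJECT_CLASSES`, the function `detections_to_feature`, and the choice of keeping the maximum confidence per class are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical detector vocabulary (assumption; the abstract does not list classes).
OBJECT_CLASSES = ["person", "car", "dog", "chair"]

# One detection: (class label, confidence score in [0, 1]).
Detection = Tuple[str, float]


def detections_to_feature(
    detections: List[Detection],
    classes: List[str] = OBJECT_CLASSES,
) -> List[float]:
    """Pool per-frame detections into a fixed-length vector.

    Each dimension holds the highest confidence observed for that class
    (0.0 if the class was not detected). Max pooling is an assumption
    chosen for simplicity, not the method described in the paper.
    """
    index = {name: i for i, name in enumerate(classes)}
    feature = [0.0] * len(classes)
    for label, score in detections:
        i = index.get(label)
        # Ignore labels outside the vocabulary; keep the strongest detection.
        if i is not None and score > feature[i]:
            feature[i] = score
    return feature


# Example frame: two "person" boxes, one "car", and one out-of-vocabulary "boat".
frame_detections = [("person", 0.92), ("car", 0.40), ("person", 0.55), ("boat", 0.80)]
feature = detections_to_feature(frame_detections)
```

A vector of this kind depends only on the detected objects, not on background pixels, and could then be passed to any per-category classifier to score semantic concepts.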

Original language: English
Title of host publication: 2016 24th European Signal Processing Conference, EUSIPCO 2016
Publisher: European Signal Processing Conference, EUSIPCO
Number of pages: 5
ISBN (Electronic): 9780992862657
Publication status: Published - 2016 Nov 28
Event: 24th European Signal Processing Conference, EUSIPCO 2016 - Budapest, Hungary
Duration: 2016 Aug 28 to 2016 Sept 2

Publication series

Name: European Signal Processing Conference
ISSN (Print): 2219-5491


Other: 24th European Signal Processing Conference, EUSIPCO 2016

ASJC Scopus subject areas

  • Signal Processing
  • Electrical and Electronic Engineering

