TY - GEN
T1 - Fine-grained Video Retrieval using Query Phrases - Waseda-Meisei TRECVID 2017 AVS System
AU - Ueki, Kazuya
AU - Hirakawa, Koji
AU - Kikuchi, Kotaro
AU - Kobayashi, Tetsunori
N1 - Funding Information:
This work was partially supported by JSPS KAKENHI Grant Numbers 15K00249, 17H01831, and 18K11362, and by the Kayamori Foundation of Informational Science Advancement.
Publisher Copyright:
© 2018 IEEE.
PY - 2018/11/26
Y1 - 2018/11/26
N2 - In this paper, a joint team from Waseda University and Meisei University (team name: Waseda-Meisei) reports its efforts on the ad-hoc video search (AVS) task of the TRECVID benchmark, which is conducted annually by the National Institute of Standards and Technology (NIST). For the AVS task, a system is required to perform a fine-grained search of target videos from a large-scale video database using a query phrase including multiple keywords, such as objects, persons, scenes, and actions. The system we submitted has the following two characteristics. First, to improve the coverage rate of classes corresponding to keywords in query phrases, we prepared a large number of classifiers that can detect objects, persons, scenes, and actions, which were trained using various image and video datasets. Second, when choosing a concept classifier corresponding to a keyword, we introduced a mechanism that allows us to select additional concept classifiers by incorporating natural language processing techniques. We submitted multiple systems with these characteristics to the TRECVID 2017 AVS task, and one of our systems ranked the highest among all the submitted systems from 22 teams.
AB - In this paper, a joint team from Waseda University and Meisei University (team name: Waseda-Meisei) reports its efforts on the ad-hoc video search (AVS) task of the TRECVID benchmark, which is conducted annually by the National Institute of Standards and Technology (NIST). For the AVS task, a system is required to perform a fine-grained search of target videos from a large-scale video database using a query phrase including multiple keywords, such as objects, persons, scenes, and actions. The system we submitted has the following two characteristics. First, to improve the coverage rate of classes corresponding to keywords in query phrases, we prepared a large number of classifiers that can detect objects, persons, scenes, and actions, which were trained using various image and video datasets. Second, when choosing a concept classifier corresponding to a keyword, we introduced a mechanism that allows us to select additional concept classifiers by incorporating natural language processing techniques. We submitted multiple systems with these characteristics to the TRECVID 2017 AVS task, and one of our systems ranked the highest among all the submitted systems from 22 teams.
UR - http://www.scopus.com/inward/record.url?scp=85059742311&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85059742311&partnerID=8YFLogxK
U2 - 10.1109/ICPR.2018.8546122
DO - 10.1109/ICPR.2018.8546122
M3 - Conference contribution
AN - SCOPUS:85059742311
T3 - Proceedings - International Conference on Pattern Recognition
SP - 3322
EP - 3327
BT - 2018 24th International Conference on Pattern Recognition, ICPR 2018
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 24th International Conference on Pattern Recognition, ICPR 2018
Y2 - 20 August 2018 through 24 August 2018
ER -