TY - GEN
T1 - AxIoU: An Axiomatically Justified Measure for Video Moment Retrieval
T2 - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
AU - Togashi, Riku
AU - Otani, Mayu
AU - Nakashima, Yuta
AU - Rahtu, Esa
AU - Heikkilä, Janne
AU - Sakai, Tetsuya
N1 - Funding Information: This work was partly supported by JST CREST Grant No. JPMJCR20D3, FOREST Grant No. JPMJFR216O, and Academy of Finland project number 324346. Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
N2 - Evaluation measures have a crucial impact on the direction of research. Therefore, it is of utmost importance to develop appropriate and reliable evaluation measures for new applications where conventional measures are not well suited. Video Moment Retrieval (VMR) is one such application, and the current practice is to use R@K, θ for evaluating VMR systems. However, this measure has two disadvantages. First, it is rank-insensitive: it ignores the rank positions of successfully localised moments in the top-K ranked list by treating the list as a set. Second, it binarizes the Intersection over Union (IoU) of each retrieved video moment using the threshold θ, thereby ignoring the fine-grained localisation quality of ranked moments. We propose an alternative measure for evaluating VMR, called Average Max IoU (AxIoU), which is free from the above two problems. We show that AxIoU satisfies two important axioms for VMR evaluation, namely, Invariance against Redundant Moments and Monotonicity with respect to the Best Moment, and that R@K, θ satisfies only the first axiom. We also empirically examine how AxIoU agrees with R@K, θ, as well as its stability with respect to changes in the test data and human-annotated temporal boundaries.
KW - Datasets and evaluation
KW - Video analysis and understanding
UR - http://www.scopus.com/inward/record.url?scp=85141809701&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85141809701&partnerID=8YFLogxK
U2 - 10.1109/CVPR52688.2022.02040
DO - 10.1109/CVPR52688.2022.02040
M3 - Conference contribution
AN - SCOPUS:85141809701
T3 - Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
SP - 21044
EP - 21053
BT - Proceedings - 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022
PB - IEEE Computer Society
Y2 - 19 June 2022 through 24 June 2022
ER -