TY - GEN
T1 - Bi-directional attention flow for video alignment
AU - Abobeah, Reham
AU - Torki, Marwan
AU - Shoukry, Amin
AU - Katto, Jiro
N1 - Funding Information:
This work has been supported by the Ministry of Higher Education (MoHE) of Egypt and Waseda University, Japan, through a PhD scholarship.
Publisher Copyright:
Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved
PY - 2019
Y1 - 2019
N2 - In this paper, a novel technique is introduced to address the video alignment task, which is one of the hot topics in computer vision. Specifically, we aim at finding the best possible correspondences between two overlapping videos without the restrictions imposed by previous techniques. The novelty of this work is that the video alignment problem is solved by drawing an analogy between it and the machine comprehension (MC) task in natural language processing (NLP). Simply put, MC seeks to give the best answer to a question about a given paragraph. In our work, one of the two videos is considered as a query, while the other is considered as a context. First, a pre-trained CNN is used to obtain high-level features from the frames of both the query and context videos. Then, the bidirectional attention flow mechanism, which has achieved considerable success in MC, is used to compute the query-context interactions in order to find the best mapping between the two input videos. The proposed model has been trained using 10k collected video pairs from "YouTube". The initial experimental results show that it is a promising solution for the video alignment task when compared to the state-of-the-art techniques.
AB - In this paper, a novel technique is introduced to address the video alignment task, which is one of the hot topics in computer vision. Specifically, we aim at finding the best possible correspondences between two overlapping videos without the restrictions imposed by previous techniques. The novelty of this work is that the video alignment problem is solved by drawing an analogy between it and the machine comprehension (MC) task in natural language processing (NLP). Simply put, MC seeks to give the best answer to a question about a given paragraph. In our work, one of the two videos is considered as a query, while the other is considered as a context. First, a pre-trained CNN is used to obtain high-level features from the frames of both the query and context videos. Then, the bidirectional attention flow mechanism, which has achieved considerable success in MC, is used to compute the query-context interactions in order to find the best mapping between the two input videos. The proposed model has been trained using 10k collected video pairs from "YouTube". The initial experimental results show that it is a promising solution for the video alignment task when compared to the state-of-the-art techniques.
KW - Attention Mechanisms
KW - Bi-directional Attention
KW - Synchronization
KW - Temporal Alignment
UR - http://www.scopus.com/inward/record.url?scp=85068262659&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85068262659&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85068262659
T3 - VISIGRAPP 2019 - Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
SP - 583
EP - 589
BT - VISIGRAPP 2019 - Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications
A2 - Kerren, Andreas
A2 - Hurter, Christophe
A2 - Braz, Jose
PB - SciTePress
T2 - 14th International Conference on Computer Vision Theory and Applications, VISAPP 2019 - Part of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2019
Y2 - 25 February 2019 through 27 February 2019
ER -