TY - JOUR
T1 - Video Alignment Using Bi-Directional Attention Flow in a Multi-Stage Learning Model
AU - Abobeah, Reham
AU - Shoukry, Amin
AU - Katto, Jiro
N1 - Funding Information:
This work was supported in part by the Ministry of Higher Education (MoHE) of Egypt through the Ph.D. Scholarship granted to the first author to join the Egypt-Japan University of Science and Technology (E-JUST) as a graduate student. It is also a result of the joint collaboration between the Cyber-Physical System (CPS) Laboratory, Computer Science and Engineering Department, E-JUST, and Prof. Katto's Laboratory, Computer Science and Communication Engineering Department, Waseda University, Tokyo, Japan.
Publisher Copyright:
© 2020 IEEE.
PY - 2020
Y1 - 2020
N2 - Recently, deep learning techniques have contributed to solving a multitude of computer vision tasks. In this paper, we propose a deep-learning approach for video alignment, which involves finding the best correspondences between two overlapping videos. We formulate the video alignment task as a variant of the well-known machine comprehension (MC) task in natural language processing. While MC answers a question about a given paragraph, our technique determines the frame sequence in the context video that is most relevant to the query video. This is done by first representing the individual frames of the two videos by highly discriminative and compact descriptors. Next, the descriptors are fed into a multi-stage network that, with the help of the bidirectional attention flow mechanism, represents the context video at various granularity levels and estimates the query-aware context part. The proposed model was trained on 10k video pairs collected from YouTube. The results show that our model outperforms all known state-of-the-art techniques by a considerable margin, confirming its efficacy.
AB - Recently, deep learning techniques have contributed to solving a multitude of computer vision tasks. In this paper, we propose a deep-learning approach for video alignment, which involves finding the best correspondences between two overlapping videos. We formulate the video alignment task as a variant of the well-known machine comprehension (MC) task in natural language processing. While MC answers a question about a given paragraph, our technique determines the frame sequence in the context video that is most relevant to the query video. This is done by first representing the individual frames of the two videos by highly discriminative and compact descriptors. Next, the descriptors are fed into a multi-stage network that, with the help of the bidirectional attention flow mechanism, represents the context video at various granularity levels and estimates the query-aware context part. The proposed model was trained on 10k video pairs collected from YouTube. The results show that our model outperforms all known state-of-the-art techniques by a considerable margin, confirming its efficacy.
KW - Bi-directional attention
KW - temporal alignment
KW - video alignment
KW - video retrieval
KW - video synchronization
UR - http://www.scopus.com/inward/record.url?scp=85081103496&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85081103496&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2020.2967750
DO - 10.1109/ACCESS.2020.2967750
M3 - Article
AN - SCOPUS:85081103496
SN - 2169-3536
VL - 8
SP - 18097
EP - 18109
JO - IEEE Access
JF - IEEE Access
M1 - 8963636
ER -