TY - GEN
T1 - Geometric Features Informed Multi-person Human-Object Interaction Recognition in Videos
AU - Qiao, Tanqiu
AU - Men, Qianhui
AU - Li, Frederick W.B.
AU - Kubotani, Yoshiki
AU - Morishima, Shigeo
AU - Shum, Hubert P.H.
N1 - Publisher Copyright:
© 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Human-Object Interaction (HOI) recognition in videos is important for analyzing human activity. Most existing work focusing on visual features usually suffer from occlusion in the real-world scenarios. Such a problem will be further complicated when multiple people and objects are involved in HOIs. Consider that geometric features such as human pose and object position provide meaningful information to understand HOIs, we argue to combine the benefits of both visual and geometric features in HOI recognition, and propose a novel Two-level Geometric feature-informed Graph Convolutional Network (2G-GCN ). The geometric-level graph models the interdependency between geometric features of humans and objects, while the fusion-level graph further fuses them with visual features of humans and objects. To demonstrate the novelty and effectiveness of our method in challenging scenarios, we propose a new multi-person HOI dataset (MPHOI-72 ). Extensive experiments on MPHOI-72 (multi-person HOI), CAD-120 (single-human HOI) and Bimanual Actions (two-hand HOI) datasets demonstrate our superior performance compared to state-of-the-arts.
AB - Human-Object Interaction (HOI) recognition in videos is important for analyzing human activity. Most existing work focusing on visual features usually suffer from occlusion in the real-world scenarios. Such a problem will be further complicated when multiple people and objects are involved in HOIs. Consider that geometric features such as human pose and object position provide meaningful information to understand HOIs, we argue to combine the benefits of both visual and geometric features in HOI recognition, and propose a novel Two-level Geometric feature-informed Graph Convolutional Network (2G-GCN ). The geometric-level graph models the interdependency between geometric features of humans and objects, while the fusion-level graph further fuses them with visual features of humans and objects. To demonstrate the novelty and effectiveness of our method in challenging scenarios, we propose a new multi-person HOI dataset (MPHOI-72 ). Extensive experiments on MPHOI-72 (multi-person HOI), CAD-120 (single-human HOI) and Bimanual Actions (two-hand HOI) datasets demonstrate our superior performance compared to state-of-the-arts.
KW - Feature fusion
KW - Graph convolution neural networks
KW - Human-object interaction
KW - Multi-person interaction
UR - http://www.scopus.com/inward/record.url?scp=85142736903&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85142736903&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-19772-7_28
DO - 10.1007/978-3-031-19772-7_28
M3 - Conference contribution
AN - SCOPUS:85142736903
SN - 9783031197710
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 474
EP - 491
BT - Computer Vision – ECCV 2022 - 17th European Conference, 2022, Proceedings
A2 - Avidan, Shai
A2 - Brostow, Gabriel
A2 - Cissé, Moustapha
A2 - Farinella, Giovanni Maria
A2 - Hassner, Tal
PB - Springer Science and Business Media Deutschland GmbH
T2 - 17th European Conference on Computer Vision, ECCV 2022
Y2 - 23 October 2022 through 27 October 2022
ER -