TY - GEN
T1 - Image information assistance neural network for VideoPose3D-based monocular 3D pose estimation
AU - Wang, Hao
AU - Luo, Dingli
AU - Ikenaga, Takeshi
N1 - Funding Information:
This work was supported by Waseda University.
Publisher Copyright:
© 2021 MVA Organization.
PY - 2021/7/25
Y1 - 2021/7/25
N2 - 3D pose estimation based on a monocular camera can be applied to various fields such as human-computer interaction and human action recognition. As a two-stage 3D pose estimator, VideoPose3D achieves state-of-the-art accuracy. However, because of the limitation of two-stage processing, image information is partially lost when mapping 2D poses to 3D space, which limits the final accuracy. This paper proposes an image-assisting pose estimation model and a back-projection-based offset generating module. The image-assisting pose estimation model consists of a 2D pose processing branch and an image processing branch. Image information is processed to generate an offset that refines the intermediate 3D pose produced by the 2D pose processing network. The back-projection-based offset generating module projects the intermediate 3D poses to 2D space and calculates the error between the projection and the input 2D pose. Combining this error with the extracted image features, the neural network generates an offset that decreases the error. In evaluation, accuracy on each action of the Human3.6M dataset improves by an average of 0.9 mm over the VideoPose3D baseline.
AB - 3D pose estimation based on a monocular camera can be applied to various fields such as human-computer interaction and human action recognition. As a two-stage 3D pose estimator, VideoPose3D achieves state-of-the-art accuracy. However, because of the limitation of two-stage processing, image information is partially lost when mapping 2D poses to 3D space, which limits the final accuracy. This paper proposes an image-assisting pose estimation model and a back-projection-based offset generating module. The image-assisting pose estimation model consists of a 2D pose processing branch and an image processing branch. Image information is processed to generate an offset that refines the intermediate 3D pose produced by the 2D pose processing network. The back-projection-based offset generating module projects the intermediate 3D poses to 2D space and calculates the error between the projection and the input 2D pose. Combining this error with the extracted image features, the neural network generates an offset that decreases the error. In evaluation, accuracy on each action of the Human3.6M dataset improves by an average of 0.9 mm over the VideoPose3D baseline.
UR - http://www.scopus.com/inward/record.url?scp=85113999850&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85113999850&partnerID=8YFLogxK
U2 - 10.23919/MVA51890.2021.9511380
DO - 10.23919/MVA51890.2021.9511380
M3 - Conference contribution
AN - SCOPUS:85113999850
T3 - Proceedings of MVA 2021 - 17th International Conference on Machine Vision Applications
BT - Proceedings of MVA 2021 - 17th International Conference on Machine Vision Applications
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 17th International Conference on Machine Vision Applications, MVA 2021
Y2 - 25 July 2021 through 27 July 2021
ER -