TY - JOUR
T1 - Multi-task neural network with physical constraint for real-time multi-person 3D pose estimation from monocular camera
AU - Luo, Dingli
AU - Du, Songlin
AU - Ikenaga, Takeshi
N1 - Funding Information:
This work was jointly supported by the Waseda University Grant for Special Research Projects under grants 2020C-657 and 2020R-040, the National Natural Science Foundation of China under grant 62001110, the Natural Science Foundation of Jiangsu Province under grant BK20200353, the Guangdong Basic and Applied Basic Research Foundation under grant 2020A1515110145, the Shenzhen Science and Technology Program under grant RCBS20200714114858072, the 111 Project under grant B17040, and the Fundamental Research Funds for the Central Universities under grant 2242021R10115.
Publisher Copyright:
© 2021, The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature.
PY - 2021/7
Y1 - 2021/7
N2 - 3D human pose estimation has many important applications in human-computer interaction and human action recognition. Simultaneously achieving real-time speed, support for a varying number of people, and high accuracy from a single RGB image is a challenging problem. To this end, this paper proposes a multi-task and multi-level neural network structure with physical constraint. The unique network structure estimates 3D human poses from a single RGB image in an end-to-end way and achieves both high accuracy and high speed. Experimental results show that the proposed system achieves 21 fps on an RTX 2080 GPU with only 33 mm accuracy loss compared with conventional works. The mechanism of the network is also analyzed through network visualization. This work shows the possibility of estimating 3D human pose from a single RGB monocular camera with real-time speed.
AB - 3D human pose estimation has many important applications in human-computer interaction and human action recognition. Simultaneously achieving real-time speed, support for a varying number of people, and high accuracy from a single RGB image is a challenging problem. To this end, this paper proposes a multi-task and multi-level neural network structure with physical constraint. The unique network structure estimates 3D human poses from a single RGB image in an end-to-end way and achieves both high accuracy and high speed. Experimental results show that the proposed system achieves 21 fps on an RTX 2080 GPU with only 33 mm accuracy loss compared with conventional works. The mechanism of the network is also analyzed through network visualization. This work shows the possibility of estimating 3D human pose from a single RGB monocular camera with real-time speed.
KW - 3D human pose estimation
KW - Convolutional neural network
KW - Multi-task learning
KW - Real-time processing
UR - http://www.scopus.com/inward/record.url?scp=85105790729&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85105790729&partnerID=8YFLogxK
U2 - 10.1007/s11042-021-10982-1
DO - 10.1007/s11042-021-10982-1
M3 - Article
AN - SCOPUS:85105790729
SN - 1380-7501
VL - 80
SP - 27223
EP - 27244
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 18
ER -