TY - GEN
T1 - Multi-task and multi-level detection neural network based real-time 3D pose estimation
AU - Luo, Dingli
AU - Du, Songlin
AU - Ikenaga, Takeshi
N1 - Funding Information:
This work was supported by a Waseda University Grant for Special Research Projects (2019C-581).
Publisher Copyright:
© 2019 IEEE.
PY - 2019/11
Y1 - 2019/11
N2 - 3D pose estimation is a core step for human-computer interaction and human action recognition. However, time-sensitive applications such as virtual reality also require this task to run at real-time speed. This paper proposes a multi-task and multi-level neural network architecture with a high-speed-friendly 3D human pose representation. Based on this, we build a real-time multi-person 3D pose estimation system that takes a single RGB image as input. The network estimates 3D poses directly from the input image through the multi-task design and maintains both accuracy and speed through the multi-level detection design. In our evaluation, the system achieves 21 fps on an RTX 2080 with only a 33 mm loss in accuracy compared with related works. We also provide network visualizations to show that the network works as designed. This work shows that a 3D pose estimation system based on a single RGB image can achieve real-time speed, which is a basis for building a low-cost 3D motion capture system.
AB - 3D pose estimation is a core step for human-computer interaction and human action recognition. However, time-sensitive applications such as virtual reality also require this task to run at real-time speed. This paper proposes a multi-task and multi-level neural network architecture with a high-speed-friendly 3D human pose representation. Based on this, we build a real-time multi-person 3D pose estimation system that takes a single RGB image as input. The network estimates 3D poses directly from the input image through the multi-task design and maintains both accuracy and speed through the multi-level detection design. In our evaluation, the system achieves 21 fps on an RTX 2080 with only a 33 mm loss in accuracy compared with related works. We also provide network visualizations to show that the network works as designed. This work shows that a 3D pose estimation system based on a single RGB image can achieve real-time speed, which is a basis for building a low-cost 3D motion capture system.
UR - http://www.scopus.com/inward/record.url?scp=85082399248&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85082399248&partnerID=8YFLogxK
U2 - 10.1109/APSIPAASC47483.2019.9023084
DO - 10.1109/APSIPAASC47483.2019.9023084
M3 - Conference contribution
AN - SCOPUS:85082399248
T3 - 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
SP - 1427
EP - 1434
BT - 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2019
Y2 - 18 November 2019 through 21 November 2019
ER -