Resolving hand-object occlusion for mixed reality with joint deep learning and model optimization

Qi Feng, Hubert P.H. Shum*, Shigeo Morishima

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

1 Citation (Scopus)


By overlaying virtual imagery onto the real world, mixed reality facilitates diverse applications and has drawn increasing attention. Enhancing physical in-hand objects with a virtual appearance is a key component of many applications that require users to interact with tools, such as surgery simulations. However, because of complex hand articulations and severe hand-object occlusions, resolving occlusions in hand-object interactions remains challenging. Traditional tracking-based approaches are limited by strong ambiguities from occlusions and changing shapes, while reconstruction-based methods handle dynamic scenes poorly. In this article, we propose a novel real-time optimization system that resolves hand-object occlusions by spatially reconstructing the scene with estimated hand joints and masks. To acquire accurate results, we propose a joint learning process that shares information between two models and jointly estimates hand poses and semantic segmentation. To facilitate the joint learning system and improve its accuracy under occlusions, we propose an occlusion-aware RGB-D hand data set that mitigates the ambiguity through precise annotations and photorealistic appearance. Evaluations show more consistent overlays than existing methods in the literature, and a user study verifies a more realistic experience.
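To make the occlusion problem concrete, the following is a minimal sketch of a per-pixel occlusion test, written under stated assumptions. It is not the system described in the article, which estimates hand joints and masks with learned models and reconstructs the scene spatially. Here the hand mask and depth values are simply given as inputs, and the function name `composite_overlay` and all parameter names are hypothetical.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]


def composite_overlay(
    camera_rgb: List[List[Pixel]],
    hand_mask: List[List[bool]],
    hand_depth: List[List[float]],
    virtual_rgb: List[List[Optional[Pixel]]],
    virtual_depth: List[List[float]],
) -> List[List[Pixel]]:
    """Overlay a virtual object on a camera frame, keeping occluding hand pixels.

    Assumption: all inputs are aligned 2D grids of the same size, depths are
    distances from the camera in the same unit, and a virtual pixel of None
    means the virtual object does not cover that location.
    """
    out: List[List[Pixel]] = []
    for y, row in enumerate(camera_rgb):
        out_row: List[Pixel] = []
        for x, cam_px in enumerate(row):
            v_px = virtual_rgb[y][x]
            if v_px is None:
                # No virtual content here: show the real camera pixel.
                out_row.append(cam_px)
            elif hand_mask[y][x] and hand_depth[y][x] < virtual_depth[y][x]:
                # The hand is closer than the virtual surface: the hand occludes it.
                out_row.append(cam_px)
            else:
                # The virtual surface is visible at this pixel.
                out_row.append(v_px)
        out.append(out_row)
    return out


# Toy 1x3 frame: a finger in front of the tool, a finger behind it, and background.
SKIN, TABLE, TOOL = (200, 150, 120), (90, 90, 90), (0, 255, 0)
result = composite_overlay(
    camera_rgb=[[SKIN, SKIN, TABLE]],
    hand_mask=[[True, True, False]],
    hand_depth=[[0.40, 0.60, 0.0]],
    virtual_rgb=[[TOOL, TOOL, None]],
    virtual_depth=[[0.50, 0.50, 0.0]],
)
```

In the toy frame, the first pixel keeps the camera colour because the finger is nearer than the virtual tool, the second shows the tool because the finger lies behind it, and the third is untouched background. The quality of such a composite depends directly on how accurate the hand mask and depth are under occlusion, which is the part the article addresses with joint estimation of hand poses and segmentation.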

Original language: English
Article number: e1956
Journal: Computer Animation and Virtual Worlds
Issue number: 4-5
Publication status: Published - 2020 Jul 1


Keywords

  • deep learning
  • hand tracking
  • mixed reality
  • occlusion
  • optimization

ASJC Scopus subject areas

  • Software
  • Computer Graphics and Computer-Aided Design

