Abstract
In this paper we present the latent regression forest (LRF), a novel framework for real-time 3D hand pose estimation from a single depth image. Prior discriminative methods often fall into two categories: holistic and patch-based. Holistic methods are efficient but less flexible due to their nearest-neighbour nature. Patch-based methods can generalise to unseen samples by considering local appearance only. However, they are complex because each pixel needs to be classified or regressed during testing. In contrast to these two baselines, our method can be considered a structured coarse-to-fine search, starting from the centre of mass of a point cloud and proceeding until all the skeletal joints are located. The search is guided by a learnt latent tree model that reflects the hierarchical topology of the hand. Our main contributions can be summarised as follows: (i) Learning the topology of the hand in an unsupervised, data-driven manner. (ii) A new forest-based, discriminative framework for structured search in images, as well as an error regression step to avoid error accumulation. (iii) A new multi-view hand pose dataset containing 180K annotated images from 10 different subjects. Our experiments on two datasets show that the LRF outperforms baselines and prior art in both accuracy and efficiency.
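
The coarse-to-fine search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tree `TREE`, the joint names, and the `toy_offsets` function are hypothetical stand-ins for the learnt latent tree model and the learnt regressors, and the error regression step is omitted. The sketch only shows the control flow, which starts at the centre of mass and recursively splits each position into two child positions until the leaf joints are reached.

```python
# Hypothetical binary tree over the hand (an assumption for illustration);
# in the LRF this topology is learnt from data rather than written by hand.
TREE = {
    "hand": ("thumb_side", "finger_side"),
    "thumb_side": ("thumb_tip", "thumb_base"),
    "finger_side": ("index_tip", "pinky_tip"),
}


def centre_of_mass(points):
    """Return the mean of a list of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def add(a, b):
    """Component-wise sum of two 3D vectors."""
    return tuple(x + y for x, y in zip(a, b))


def coarse_to_fine_search(points, predict_offsets, root="hand"):
    """Walk the tree from the centre of mass down to the leaf joints.

    predict_offsets(node, position, points) returns one offset per child;
    it stands in for whatever learnt regressor sits at each tree node.
    """
    joints = {}
    stack = [(root, centre_of_mass(points))]
    while stack:
        node, pos = stack.pop()
        if node not in TREE:  # leaf node: treat as a skeletal joint
            joints[node] = pos
            continue
        left, right = TREE[node]
        d_left, d_right = predict_offsets(node, pos, points)
        stack.append((left, add(pos, d_left)))
        stack.append((right, add(pos, d_right)))
    return joints


def toy_offsets(node, pos, points):
    """Placeholder regressor: fixed, symmetric offsets along x."""
    return (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)


points = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (2, 2, 0)]
joints = coarse_to_fine_search(points, toy_offsets)
```

Each joint is reached by a single root-to-leaf path, so the number of regressor evaluations in this sketch depends on the size of the tree rather than on the number of pixels.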
Original language | English |
---|---|
Pages (from-to) | 1374-1387 |
Journal | IEEE Transactions on Pattern Analysis and Machine Intelligence |
Volume | 39 |
Issue number | 7 |
Early online date | 10 Aug 2016 |
Publication status | Published - Jul 2017 |
Keywords
- Random forest
- regression forest
- latent tree model
- hand pose estimation
- 3D
- depth-averaged velocity