Abstract
We propose a novel transformer-based framework that reconstructs two high-fidelity hands from multi-view RGB images. Unlike existing hand pose estimation methods, which typically train a deep network to regress hand model parameters from a single RGB image, we consider a more challenging problem setting in which we directly regress the absolute root poses of two hands with extended forearms at high resolution from an egocentric view. As existing datasets are either infeasible for egocentric viewpoints or lack background variations, we create a large-scale synthetic dataset with diverse scenarios and collect a real dataset from a calibrated multi-camera setup to verify our proposed multi-view image feature fusion strategy. To make the reconstruction physically plausible, we propose two strategies: (i) a coarse-to-fine spectral graph convolution decoder to smooth the meshes during upsampling and (ii) an optimisation-based refinement stage at inference to prevent self-penetrations. Through extensive quantitative and qualitative evaluations, we show that our framework is able to produce realistic two-hand reconstructions, and we demonstrate the generalisation of synthetic-trained models to real data as well as to real-time AR/VR applications.
Original language | English |
---|---|
Title of host publication | 2023 IEEE/CVF International Conference on Computer Vision (ICCV) |
Publisher | IEEE |
Pages | 14620-14631 |
Number of pages | 12 |
ISBN (Electronic) | 9798350307184 |
ISBN (Print) | 9798350307191 (PoD) |
DOIs | |
Publication status | Published - 15 Jan 2024 |
Event | 2023 International Conference on Computer Vision - Paris Convention Centre, Paris, France; Duration: 2 Oct 2023 → 6 Oct 2023 |
Publication series
Name | Proceedings of the IEEE International Conference on Computer Vision |
---|---|
Publisher | IEEE |
ISSN (Print) | 1550-5499 |
ISSN (Electronic) | 2380-7504 |
Conference
Conference | 2023 International Conference on Computer Vision |
---|---|
Abbreviated title | ICCV 2023 |
Country/Territory | France |
City | Paris |
Period | 2/10/23 → 6/10/23 |
Bibliographical note
Acknowledgments: This research was supported by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2023-2020-0-01789), supervised by the IITP (Institute for Information & Communications Technology Planning & Evaluation).
Keywords
- Laplace equations
- Image resolution
- Pose estimation
- Transformers
- Robustness
- Real-time systems
- Graph theory