Model-based 3D Hand Reconstruction via Self-Supervised Learning

Yujin Chen; Zhigang Tu; Di Kang; Linchao Bao; Ying Zhang; Xuefei Zhe; Ruizhi Chen; Junsong Yuan

doi:10.1109/CVPR46437.2021.01031

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity. To reliably reconstruct a 3D hand from a monocular image, most state-of-the-art methods heavily rely on 3D annotations at the training stage, but obtaining 3D annotations is expensive. To alleviate reliance on labeled training data, we propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint. Specifically, we obtain geometric cues from the input image through easily accessible 2D detected keypoints. To learn an accurate hand reconstruction model from these noisy geometric cues, we utilize the consistency between 2D and 3D representations and propose a set of novel losses to rationalize outputs of the neural network. For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations. Our experiments show that the proposed self-supervised method achieves comparable performance with recent fully-supervised methods. The code is available at https://github.com/TerenceCYJ/S2HAND.
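The abstract says the network learns from noisy 2D detected keypoints by enforcing consistency between 2D and 3D representations. The sketch below illustrates one common form of such a term, a confidence-weighted 2D keypoint reprojection loss. It is not the S2HAND loss. It assumes a weak-perspective camera (a scale and a 2D translation) and per-keypoint detector confidences. The function names and the exact loss form are hypothetical and do not come from the paper or its repository.

```python
import math
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def weak_perspective_project(
    joints_3d: Sequence[Point3D], scale: float, trans: Point2D
) -> List[Point2D]:
    """Project 3D joints to the image plane with an assumed weak-perspective camera:
    drop depth, then apply a uniform scale and a 2D translation."""
    return [(scale * x + trans[0], scale * y + trans[1]) for x, y, _z in joints_3d]


def keypoint_consistency_loss(
    joints_3d: Sequence[Point3D],
    detected_2d: Sequence[Point2D],
    confidences: Sequence[float],
    scale: float,
    trans: Point2D,
) -> float:
    """Hypothetical 2D/3D consistency term: confidence-weighted mean Euclidean
    distance between projected 3D joints and detected 2D keypoints."""
    if not (len(joints_3d) == len(detected_2d) == len(confidences)):
        raise ValueError("joint, keypoint and confidence counts must match")
    projected = weak_perspective_project(joints_3d, scale, trans)
    total_weight = sum(confidences)
    if total_weight == 0:
        return 0.0  # no reliable detections, so no supervision signal
    weighted = 0.0
    for (px, py), (dx, dy), c in zip(projected, detected_2d, confidences):
        # Low-confidence detections contribute proportionally less.
        weighted += c * math.hypot(px - dx, py - dy)
    return weighted / total_weight


if __name__ == "__main__":
    joints = [(0.0, 0.0, 0.5), (1.0, 0.0, 0.4), (0.0, 1.0, 0.6)]
    detected = [(10.0, 10.0), (12.0, 10.0), (10.0, 13.0)]
    # The third detection is off by one pixel but has zero confidence.
    print(keypoint_consistency_loss(joints, detected, [1.0, 1.0, 0.0], 2.0, (10.0, 10.0)))
```

Weighting by detector confidence is one simple way to keep unreliable detections, such as occluded fingertips, from dominating training. The losses actually proposed in the paper may differ in form and number.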

Original language	English
Title of host publication	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Publisher	IEEE
Pages	10446-10455
Number of pages	10
ISBN (Electronic)	9781665445092
ISBN (Print)	9781665445108
DOIs	https://doi.org/10.1109/CVPR46437.2021.01031
Publication status	Published - 2 Nov 2021
Event	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) - Nashville, TN, USA. Duration: 20 Jun 2021 → 25 Jun 2021

Publication series

Name	Conference on Computer Vision and Pattern Recognition (CVPR)
Publisher	IEEE
ISSN (Print)	1063-6919
ISSN (Electronic)	2575-7075

Conference

Conference	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Period	20/06/21 → 25/06/21

Bibliographical note

Funding Agency:
10.13039/501100012166-National Key Research and Development Program of China
10.13039/501100012226-Fundamental Research Funds for the Central Universities
10.13039/100000001-National Science Foundation

Keywords

Training
Solid modeling
Surface reconstruction
Three-dimensional displays
Annotations
Shape
Training data

Access to Document

10.1109/CVPR46437.2021.01031
Licence: None: All rights reserved

Cite this

Chen, Y., Tu, Z., Kang, D., Bao, L., Zhang, Y., Zhe, X., Chen, R., & Yuan, J. (2021). Model-based 3D Hand Reconstruction via Self-Supervised Learning. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 10446-10455). Article 9577060 (Conference on Computer Vision and Pattern Recognition (CVPR)). IEEE. https://doi.org/10.1109/CVPR46437.2021.01031

@inproceedings{ae5cb27611874e2987eac9101c29a459,

title = "Model-based 3D Hand Reconstruction via Self-Supervised Learning",

abstract = "Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity. To reliably reconstruct a 3D hand from a monocular image, most state-of-the-art methods heavily rely on 3D annotations at the training stage, but obtaining 3D annotations is expensive. To alleviate reliance on labeled training data, we propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint. Specifically, we obtain geometric cues from the input image through easily accessible 2D detected keypoints. To learn an accurate hand reconstruction model from these noisy geometric cues, we utilize the consistency between 2D and 3D representations and propose a set of novel losses to rationalize outputs of the neural network. For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations. Our experiments show that the proposed self-supervised method achieves comparable performance with recent fully-supervised methods. The code is available at https://github.com/TerenceCYJ/S2HAND.",

keywords = "Training, Solid modeling, Surface reconstruction, Three-dimensional displays, Annotations, Shape, Training data",

author = "Yujin Chen and Zhigang Tu and Di Kang and Linchao Bao and Ying Zhang and Xuefei Zhe and Ruizhi Chen and Junsong Yuan",

note = "Funding Agency: 10.13039/501100012166-National Key Research and Development Program of China 10.13039/501100012226-Fundamental Research Funds for the Central Universities 10.13039/100000001-National Science Foundation; 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); Conference date: 20-06-2021 through 25-06-2021",

year = "2021",

month = nov,

day = "2",

doi = "10.1109/CVPR46437.2021.01031",

language = "English",

isbn = "9781665445108",

series = "Conference on Computer Vision and Pattern Recognition (CVPR)",

publisher = "IEEE",

pages = "10446--10455",

booktitle = "2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)",

}

Chen, Y, Tu, Z, Kang, D, Bao, L, Zhang, Y, Zhe, X, Chen, R & Yuan, J 2021, Model-based 3D Hand Reconstruction via Self-Supervised Learning. in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9577060, Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, pp. 10446-10455, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 20/06/21. https://doi.org/10.1109/CVPR46437.2021.01031

Model-based 3D Hand Reconstruction via Self-Supervised Learning. / Chen, Yujin; Tu, Zhigang; Kang, Di et al.
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021. p. 10446-10455, 9577060 (Conference on Computer Vision and Pattern Recognition (CVPR)).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Model-based 3D Hand Reconstruction via Self-Supervised Learning

AU - Chen, Yujin

AU - Tu, Zhigang

AU - Kang, Di

AU - Bao, Linchao

AU - Zhang, Ying

AU - Zhe, Xuefei

AU - Chen, Ruizhi

AU - Yuan, Junsong

N1 - Funding Agency: 10.13039/501100012166-National Key Research and Development Program of China 10.13039/501100012226-Fundamental Research Funds for the Central Universities 10.13039/100000001-National Science Foundation

PY - 2021/11/2

Y1 - 2021/11/2

N2 - Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity. To reliably reconstruct a 3D hand from a monocular image, most state-of-the-art methods heavily rely on 3D annotations at the training stage, but obtaining 3D annotations is expensive. To alleviate reliance on labeled training data, we propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint. Specifically, we obtain geometric cues from the input image through easily accessible 2D detected keypoints. To learn an accurate hand reconstruction model from these noisy geometric cues, we utilize the consistency between 2D and 3D representations and propose a set of novel losses to rationalize outputs of the neural network. For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations. Our experiments show that the proposed self-supervised method achieves comparable performance with recent fully-supervised methods. The code is available at https://github.com/TerenceCYJ/S2HAND.

AB - Reconstructing a 3D hand from a single-view RGB image is challenging due to various hand configurations and depth ambiguity. To reliably reconstruct a 3D hand from a monocular image, most state-of-the-art methods heavily rely on 3D annotations at the training stage, but obtaining 3D annotations is expensive. To alleviate reliance on labeled training data, we propose S2HAND, a self-supervised 3D hand reconstruction network that can jointly estimate pose, shape, texture, and the camera viewpoint. Specifically, we obtain geometric cues from the input image through easily accessible 2D detected keypoints. To learn an accurate hand reconstruction model from these noisy geometric cues, we utilize the consistency between 2D and 3D representations and propose a set of novel losses to rationalize outputs of the neural network. For the first time, we demonstrate the feasibility of training an accurate 3D hand reconstruction network without relying on manual annotations. Our experiments show that the proposed self-supervised method achieves comparable performance with recent fully-supervised methods. The code is available at https://github.com/TerenceCYJ/S2HAND.

KW - Training

KW - Solid modeling

KW - Surface reconstruction

KW - Three-dimensional displays

KW - Annotations

KW - Shape

KW - Training data

U2 - 10.1109/CVPR46437.2021.01031

DO - 10.1109/CVPR46437.2021.01031

M3 - Conference contribution

SN - 9781665445108

T3 - Conference on Computer Vision and Pattern Recognition (CVPR)

SP - 10446

EP - 10455

BT - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

PB - IEEE

T2 - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Y2 - 20 June 2021 through 25 June 2021

ER -
