SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation

John Yang; Hyung Jin Chang; Seungeui Lee; Nojun Kwak

SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation

John Yang, Hyung Jin Chang, Seungeui Lee, Nojun Kwak

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

3D hand pose estimation based on RGB images has been studied for a long time. Most of the studies, however, have performed frame-by-frame estimation based on independent static images. In this paper, we attempt to not only consider the appearance of a hand but incorporate the temporal movement information of a hand in motion into the learning framework, which leads to the necessity of a large-scale dataset with sequential RGB hand images. We propose a novel method that generates a synthetic dataset that mimics natural human hand movements by re-engineering annotations of an extant static hand pose dataset into pose-flows. With the generated dataset, we train a newly proposed recurrent framework, exploiting visuo-temporal features from sequential synthetic hand images and emphasizing smoothness of estimations with temporal consistency constraints. Our novel training strategy of detaching the recurrent layer of the framework during domain finetuning from synthetic to real allows preservation of the visuo-temporal features learned from sequential synthetic hand images. Hand poses that are sequentially estimated consequently produce natural and smooth hand movements which lead to more robust estimations. Utilizing temporal information for 3D hand pose estimation significantly enhances general pose estimations by outperforming state-of-the-art methods in our experiments on hand pose estimation benchmarks.
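
As an illustration only, a temporal consistency constraint of the kind mentioned in the abstract can be sketched as a penalty on frame-to-frame changes in the estimated joints. The abstract does not specify the authors' loss, so the function name, the input layout (a sequence of frames, each a list of (x, y, z) joint tuples), and the squared-difference form are all assumptions made for this sketch.

```python
from typing import Sequence, Tuple

# Hypothetical layout: one (x, y, z) tuple per joint, one list of joints per frame.
Joint = Tuple[float, float, float]


def temporal_smoothness_penalty(poses: Sequence[Sequence[Joint]]) -> float:
    """Mean squared displacement of each joint between consecutive frames.

    Returns 0.0 for sequences with fewer than two frames, since there is
    no frame-to-frame change to penalize.
    """
    if len(poses) < 2:
        return 0.0
    total = 0.0
    count = 0
    # Pair every frame with its successor and accumulate per-joint squared distances.
    for prev, curr in zip(poses, poses[1:]):
        for (px, py, pz), (cx, cy, cz) in zip(prev, curr):
            total += (cx - px) ** 2 + (cy - py) ** 2 + (cz - pz) ** 2
            count += 1
    return total / count if count else 0.0
```

A jittery sequence scores higher than a smoothly moving one under this penalty, which is the behaviour a smoothness term is meant to discourage during training.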

Original language	English
Title of host publication	Computer Vision - ECCV 2020
Subtitle of host publication	16th European Conference, Glasgow, UK, August 23-28 2020, Proceedings
Publisher	Springer
Number of pages	17
Publication status	Accepted/In press - 3 Jul 2020
Event	16th European Conference on Computer Vision (ECCV2020) - Virtual Event, Duration: 23 Aug 2020 → 28 Aug 2020

Publication series

Name	Lecture Notes in Computer Science
Publisher	Springer
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	16th European Conference on Computer Vision (ECCV2020)
City	Virtual Event
Period	23/08/20 → 28/08/20

Keywords

3D Hand Pose Estimations
Pose-flow Generation
Synthetic-to-real domain gap reduction
Synthetic hand motion dataset

Cite this

@inproceedings{b238a8e84c7d401a82e53473cb844353,

title = "SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation",

abstract = "3D hand pose estimation based on RGB images has been studied for a long time. Most of the studies, however, have performed frame-by-frame estimation based on independent static images. In this paper, we attempt to not only consider the appearance of a hand but incorporate the temporal movement information of a hand in motion into the learning framework, which leads to the necessity of a large-scale dataset with sequential RGB hand images. We propose a novel method that generates a synthetic dataset that mimics natural human hand movements by re-engineering annotations of an extant static hand pose dataset into pose-flows. With the generated dataset, we train a newly proposed recurrent framework, exploiting visuo-temporal features from sequential synthetic hand images and emphasizing smoothness of estimations with temporal consistency constraints. Our novel training strategy of detaching the recurrent layer of the framework during domain finetuning from synthetic to real allows preservation of the visuo-temporal features learned from sequential synthetic hand images. Hand poses that are sequentially estimated consequently produce natural and smooth hand movements which lead to more robust estimations. Utilizing temporal information for 3D hand pose estimation significantly enhances general pose estimations by outperforming state-of-the-art methods in our experiments on hand pose estimation benchmarks.",

keywords = "3D Hand Pose Estimations, Pose-flow Generation, Synthetic-to-real domain gap reduction, Synthetic hand motion dataset",

author = "John Yang and Chang, {Hyung Jin} and Seungeui Lee and Nojun Kwak",

year = "2020",

month = jul,

day = "3",

language = "English",

series = "Lecture Notes in Computer Science",

publisher = "Springer",

booktitle = "Computer Vision - ECCV 2020",

note = "16th European Conference on Computer Vision (ECCV2020) ; Conference date: 23-08-2020 Through 28-08-2020",

}

TY - GEN

T1 - SeqHAND: RGB-Sequence-Based 3D Hand Pose and Shape Estimation

T2 - 16th European Conference on Computer Vision (ECCV2020)

AU - Yang, John

AU - Chang, Hyung Jin

AU - Lee, Seungeui

AU - Kwak, Nojun

PY - 2020/7/3

Y1 - 2020/7/3

AB - 3D hand pose estimation based on RGB images has been studied for a long time. Most of the studies, however, have performed frame-by-frame estimation based on independent static images. In this paper, we attempt to not only consider the appearance of a hand but incorporate the temporal movement information of a hand in motion into the learning framework, which leads to the necessity of a large-scale dataset with sequential RGB hand images. We propose a novel method that generates a synthetic dataset that mimics natural human hand movements by re-engineering annotations of an extant static hand pose dataset into pose-flows. With the generated dataset, we train a newly proposed recurrent framework, exploiting visuo-temporal features from sequential synthetic hand images and emphasizing smoothness of estimations with temporal consistency constraints. Our novel training strategy of detaching the recurrent layer of the framework during domain finetuning from synthetic to real allows preservation of the visuo-temporal features learned from sequential synthetic hand images. Hand poses that are sequentially estimated consequently produce natural and smooth hand movements which lead to more robust estimations. Utilizing temporal information for 3D hand pose estimation significantly enhances general pose estimations by outperforming state-of-the-art methods in our experiments on hand pose estimation benchmarks.

KW - 3D Hand Pose Estimations

KW - Pose-flow Generation

KW - Synthetic-to-real domain gap reduction

KW - Synthetic hand motion dataset

M3 - Conference contribution

T3 - Lecture Notes in Computer Science

BT - Computer Vision - ECCV 2020

PB - Springer

Y2 - 23 August 2020 through 28 August 2020

ER -