FS-Net: fast shape-based network for category-level 6D object pose estimation with decoupled rotation mechanism

Wei Chen; Xi Jia; Hyung Jin Chang; Jinming Duan; Linlin Shen; Ales Leonardis

doi:10.1109/CVPR46437.2021.00163

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we focus on category-level 6D pose and size estimation from a monocular RGB-D image. Previous methods suffer from inefficient category-level pose feature extraction, which leads to low accuracy and slow inference. To tackle this problem, we propose a fast shape-based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation. First, we design an orientation-aware autoencoder with 3D graph convolution for latent feature extraction. Thanks to the shift and scale invariance properties of 3D graph convolution, the learned latent feature is insensitive to point shift and object size. Then, to efficiently decode category-level rotation information from the latent feature, we propose a novel decoupled rotation mechanism that employs two decoders to complementarily access the rotation information. Translation and size are estimated as two residuals: the difference between the mean of the object points and the ground-truth translation, and the difference between the mean size of the category and the ground-truth size, respectively. Finally, to increase the generalization ability of FS-Net, we propose an online box-cage based 3D deformation mechanism to augment the training data. Extensive experiments on two benchmark datasets show that the proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation. In particular, for category-level pose estimation, without extra synthetic data, our method outperforms existing methods by 6.3% on the NOCS-REAL dataset.
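The residual and rotation steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and every function name below is hypothetical. It assumes that the residual targets are defined as ground truth minus reference (point mean or category mean size), and that each of the two rotation decoders outputs one axis of the object frame as a 3D vector; a rotation matrix is then recovered by Gram-Schmidt orthogonalisation, which may differ from the paper's exact recovery step. The last function only illustrates why normalised neighbour directions, the kind of quantity 3D graph convolution kernels operate on, are unaffected by a global shift and a positive uniform scale.

```python
import numpy as np


def translation_from_residual(obj_points: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """Translation = mean of the observed object points + predicted residual.

    Assumed convention: the residual target is (t_gt - point_mean), so adding
    it back to the point mean recovers the translation.
    """
    return obj_points.mean(axis=0) + residual


def size_from_residual(category_mean_size: np.ndarray, residual: np.ndarray) -> np.ndarray:
    """Size = category mean size + predicted residual (same assumed convention)."""
    return category_mean_size + residual


def rotation_from_two_axes(y_pred: np.ndarray, x_pred: np.ndarray) -> np.ndarray:
    """Build a rotation matrix from two roughly perpendicular predicted axes.

    Gram-Schmidt: keep the y-axis direction, remove its component from the
    x-axis, then complete a right-handed frame with a cross product.
    Columns of the result are the object's x, y, z axes in the camera frame.
    """
    y = y_pred / np.linalg.norm(y_pred)
    x = x_pred - np.dot(x_pred, y) * y
    x = x / np.linalg.norm(x)
    z = np.cross(x, y)  # right-handed: x cross y = z, so det(R) = +1
    return np.stack([x, y, z], axis=1)


def neighbor_directions(points: np.ndarray, k: int) -> np.ndarray:
    """Unit vectors from each point to its k nearest neighbours, shape (N, k, 3).

    A global shift cancels in the differences, and a positive uniform scale
    cancels in the normalisation, so these directions are shift- and
    scale-invariant.
    """
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    idx = np.argsort(dists, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    diffs = points[idx] - points[:, None, :]
    return diffs / np.linalg.norm(diffs, axis=-1, keepdims=True)
```

Given hypothetical network outputs, `translation_from_residual(points, t_res)` and `size_from_residual(mean_size, s_res)` return the final translation and size, and `rotation_from_two_axes(y_vec, x_vec)` returns a proper rotation (orthonormal, determinant +1) as long as the two predicted vectors are not parallel. Predicting residuals rather than absolute values keeps the regression targets small and centred, which is the motivation the abstract gives for this design.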

Original language	English
Title of host publication	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Publisher	IEEE
Pages	1581-1590
Number of pages	10
ISBN (Electronic)	9781665445092
ISBN (Print)	9781665445108
DOIs	https://doi.org/10.1109/CVPR46437.2021.00163
Publication status	Published - 2 Nov 2021
Event	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, United States (20 Jun 2021 → 25 Jun 2021)

Publication series

Name	Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publisher	IEEE
ISSN (Print)	1063-6919
ISSN (Electronic)	2575-7075

Conference

Conference	2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition
Abbreviated title	CVPR 2021
Country/Territory	United States
City	Nashville
Period	20/06/21 → 25/06/21

Keywords

measurement
training
solid modeling
three-dimensional displays
convolution
pose estimation
training data

Access to Document

10.1109/CVPR46437.2021.00163 (Licence: None; all rights reserved)

ChenW2021FSNet
This is the accepted manuscript of W. Chen, X. Jia, H. J. Chang, J. Duan, L. Shen and A. Leonardis, "FS-Net: Fast Shape-based Network for Category-Level 6D Object Pose Estimation with Decoupled Rotation Mechanism," 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 1581-1590, doi: 10.1109/CVPR46437.2021.00163. © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Accepted author manuscript, 5.04 MB (Licence: Other; please specify with Rights Statement)

Cite this

Chen, W., Jia, X., Chang, H. J., Duan, J., Shen, L., & Leonardis, A. (2021). FS-Net: fast shape-based network for category-level 6D object pose estimation with decoupled rotation mechanism. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 1581-1590). Article 9578410 (Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition). IEEE. https://doi.org/10.1109/CVPR46437.2021.00163

@inproceedings{7160c37fc444449a93f45e40d575c112,

title = "FS-Net: fast shape-based network for category-level 6D object pose estimation with decoupled rotation mechanism",

abstract = "In this paper, we focus on category-level 6D pose and size estimation from a monocular RGB-D image. Previous methods suffer from inefficient category-level pose feature extraction, which leads to low accuracy and inference speed. To tackle this problem, we propose a fast shape based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation. First, we design an orientation aware auto encoder with 3D graph convolution for latent feature extraction. Thanks to the shift and scale invariance properties of 3D graph convolution, the learned latent feature is insensitive to point shift and object size. Then, to efficiently decode category-level rotation information from the latent feature, we propose a novel decoupled rotation mechanism that employs two decoders to complementarily access the rotation information. For translation and size, we estimate them by two residuals: the difference between the mean of object points and ground truth translation, and the difference between the mean size of the category and ground truth size, respectively. Finally, to increase the generalization ability of the FS-Net, we propose an online box-cage based 3D deformation mechanism to augment the training data. Extensive experiments on two benchmark datasets show that the proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation. Especially in category-level pose estimation, without extra synthetic data, our method outperforms existing methods by 6.3% on the NOCS-REAL dataset.",

keywords = "measurement, training, solid modeling, three-dimensional displays, convolution, pose estimation, training data",

author = "Wei Chen and Xi Jia and Chang, {Hyung Jin} and Jinming Duan and Linlin Shen and Ales Leonardis",

year = "2021",

month = nov,

day = "2",

doi = "10.1109/CVPR46437.2021.00163",

language = "English",

isbn = "9781665445108",

series = "Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition",

publisher = "IEEE",

pages = "1581--1590",

booktitle = "2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)",

note = "2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2021 ; Conference date: 20-06-2021 Through 25-06-2021",

}

Chen, W, Jia, X, Chang, HJ, Duan, J, Shen, L & Leonardis, A 2021, FS-Net: fast shape-based network for category-level 6D object pose estimation with decoupled rotation mechanism. in 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9578410, Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, pp. 1581-1590, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, Tennessee, United States, 20/06/21. https://doi.org/10.1109/CVPR46437.2021.00163

FS-Net: fast shape-based network for category-level 6D object pose estimation with decoupled rotation mechanism. / Chen, Wei; Jia, Xi; Chang, Hyung Jin et al.
2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021. p. 1581-1590 9578410 (Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition).

TY - GEN

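T1 - FS-Net: fast shape-based network for category-level 6D object pose estimation with decoupled rotation mechanism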

T2 - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition

AU - Chen, Wei

AU - Jia, Xi

AU - Chang, Hyung Jin

AU - Duan, Jinming

AU - Shen, Linlin

AU - Leonardis, Ales

PY - 2021/11/2

Y1 - 2021/11/2

N2 - In this paper, we focus on category-level 6D pose and size estimation from a monocular RGB-D image. Previous methods suffer from inefficient category-level pose feature extraction, which leads to low accuracy and inference speed. To tackle this problem, we propose a fast shape based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation. First, we design an orientation aware auto encoder with 3D graph convolution for latent feature extraction. Thanks to the shift and scale invariance properties of 3D graph convolution, the learned latent feature is insensitive to point shift and object size. Then, to efficiently decode category-level rotation information from the latent feature, we propose a novel decoupled rotation mechanism that employs two decoders to complementarily access the rotation information. For translation and size, we estimate them by two residuals: the difference between the mean of object points and ground truth translation, and the difference between the mean size of the category and ground truth size, respectively. Finally, to increase the generalization ability of the FS-Net, we propose an online box-cage based 3D deformation mechanism to augment the training data. Extensive experiments on two benchmark datasets show that the proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation. Especially in category-level pose estimation, without extra synthetic data, our method outperforms existing methods by 6.3% on the NOCS-REAL dataset.

AB - In this paper, we focus on category-level 6D pose and size estimation from a monocular RGB-D image. Previous methods suffer from inefficient category-level pose feature extraction, which leads to low accuracy and inference speed. To tackle this problem, we propose a fast shape based network (FS-Net) with efficient category-level feature extraction for 6D pose estimation. First, we design an orientation aware auto encoder with 3D graph convolution for latent feature extraction. Thanks to the shift and scale invariance properties of 3D graph convolution, the learned latent feature is insensitive to point shift and object size. Then, to efficiently decode category-level rotation information from the latent feature, we propose a novel decoupled rotation mechanism that employs two decoders to complementarily access the rotation information. For translation and size, we estimate them by two residuals: the difference between the mean of object points and ground truth translation, and the difference between the mean size of the category and ground truth size, respectively. Finally, to increase the generalization ability of the FS-Net, we propose an online box-cage based 3D deformation mechanism to augment the training data. Extensive experiments on two benchmark datasets show that the proposed method achieves state-of-the-art performance in both category- and instance-level 6D object pose estimation. Especially in category-level pose estimation, without extra synthetic data, our method outperforms existing methods by 6.3% on the NOCS-REAL dataset.

KW - measurement

KW - training

KW - solid modeling

KW - three-dimensional displays

KW - convolution

KW - pose estimation

KW - training data

UR - https://ieeexplore.ieee.org/xpl/conhome/1000147/all-proceedings

U2 - 10.1109/CVPR46437.2021.00163

DO - 10.1109/CVPR46437.2021.00163

M3 - Conference contribution

SN - 9781665445108

T3 - Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition

SP - 1581

EP - 1590

BT - 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

PB - IEEE

Y2 - 20 June 2021 through 25 June 2021

ER -

Chen W, Jia X, Chang HJ, Duan J, Shen L, Leonardis A. FS-Net: fast shape-based network for category-level 6D object pose estimation with decoupled rotation mechanism. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. 2021. p. 1581-1590. 9578410. (Proceedings. IEEE Computer Society Conference on Computer Vision and Pattern Recognition). doi: 10.1109/CVPR46437.2021.00163