Trans6D: transformer-based 6D object pose estimation and refinement

Zhongqun Zhang*, Wei Chen, Linfang Zheng, Ales Leonardis, Hyung Jin Chang

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Estimating 6D object pose from a monocular RGB image remains challenging due to factors such as texture-less objects and occlusion. Although convolutional neural network (CNN)-based methods have made remarkable progress, they are inefficient at capturing global dependencies and often suffer from information loss due to downsampling operations. To extract robust feature representations, we propose a Transformer-based 6D object pose estimation approach (Trans6D). Specifically, we first build two strong Transformer-based baselines and compare their performance: pure Transformers following ViT (Trans6D-pure) and hybrid Transformers integrating CNNs with Transformers (Trans6D-hybrid). Furthermore, we propose two novel modules to make Trans6D-pure more accurate and robust: (i) a patch-aware feature fusion module, which decreases the number of tokens without information loss via shifted windows, cross-attention, and token pooling operations, and is used to predict dense 2D-3D correspondence maps; and (ii) a pure Transformer-based pose refinement module (Trans6D+), which refines the estimated poses iteratively. Extensive experiments show that the proposed approach achieves state-of-the-art performance on two datasets.
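
To make the token-reduction idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes a ViT-style tokenizer that cuts the image into non-overlapping square patches, and it uses plain average pooling over 2 x 2 neighbouring tokens as a stand-in for token pooling. The function names `num_patch_tokens` and `pool_tokens` are hypothetical. Plain average pooling on its own is lossy; the shifted windows and cross-attention mentioned in the abstract are not reproduced here.

```python
def num_patch_tokens(height, width, patch):
    """Count the tokens produced by non-overlapping patch x patch tiling (assumed ViT-style)."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch) * (width // patch)


def pool_tokens(grid, stride=2):
    """Average-pool a 2D grid of token vectors over stride x stride windows.

    `grid[r][c]` is one token, given as a list of floats. This is a simple
    stand-in for token pooling, not the module described in the abstract.
    """
    rows, cols = len(grid), len(grid[0])
    if rows % stride or cols % stride:
        raise ValueError("grid size must be divisible by the stride")
    dim = len(grid[0][0])
    pooled = []
    for r in range(0, rows, stride):
        out_row = []
        for c in range(0, cols, stride):
            # Collect the tokens in the current window, then average per channel.
            window = [grid[r + i][c + j] for i in range(stride) for j in range(stride)]
            out_row.append([sum(tok[k] for tok in window) / len(window) for k in range(dim)])
        pooled.append(out_row)
    return pooled


if __name__ == "__main__":
    # A 224 x 224 image with 16 x 16 patches gives a 14 x 14 token grid.
    tokens = num_patch_tokens(224, 224, 16)
    grid = [[[float(r * 14 + c)] for c in range(14)] for r in range(14)]
    pooled = pool_tokens(grid, stride=2)
    print(tokens, len(pooled) * len(pooled[0]))
```

Under these assumptions, one 2 x 2 pooling step cuts the token count by a factor of four, which is what makes attention over the remaining tokens cheaper.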
Original language: English
Title of host publication: Computer Vision – ECCV 2022 Workshops
Editors: Leonid Karlinsky, Tomer Michaeli, Ko Nishino
Place of Publication: Cham
Publisher: Springer
Pages: 112–128
Number of pages: 17
Edition: 1
ISBN (Electronic): 9783031250859
ISBN (Print): 9783031250842
DOIs
Publication status: Published - 12 Feb 2023
Event: 7th International Workshop on Recovering 6D Object Pose - Tel-Aviv, Israel
Duration: 23 Oct 2022 → 23 Oct 2022

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer
Volume: 13808
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Workshop

Workshop: 7th International Workshop on Recovering 6D Object Pose
Country/Territory: Israel
City: Tel-Aviv
Period: 23/10/22 → 23/10/22

Keywords

  • 6D object pose estimation
  • Transformer
