Towards generic 3D tracking in RGBD videos: benchmark and baseline

Jinyu Yang; Zhongqun Zhang; Zhe Li; Hyung Jin Chang; Ales Leonardis; Feng Zheng

doi:10.1007/978-3-031-20047-2_7

Corresponding author: Feng Zheng

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Tracking in 3D scenes is gaining momentum because of its numerous applications in robotics, autonomous driving, and scene understanding. Currently, 3D tracking is limited to specific model-based approaches involving point clouds, which prevents 3D trackers from being applied in natural 3D scenes. RGBD sensors provide a more reasonable and acceptable solution for 3D object tracking due to their readily available synchronised color and depth information. Thus, in this paper, we investigate a novel problem: is it possible to track a generic (class-agnostic) 3D object in RGBD videos and predict 3D bounding boxes of the object of interest? To inspire research on this topic, we construct a new standard benchmark for generic 3D object tracking, ‘Track-it-in-3D’, which contains 300 RGBD video sequences with dense 3D annotations and corresponding evaluation protocols. Furthermore, we propose an effective tracking baseline that estimates 3D bounding boxes for arbitrary objects in RGBD videos by fusing appearance and spatial information. Resources are available at https://github.com/yjybuaa/Track-it-in-3D.

Original language	English
Title of host publication	Computer Vision – ECCV 2022
Subtitle of host publication	17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII
Editors	Shai Avidan, Gabriel Brostow, Moustapha Cissé, Giovanni Maria Farinella, Tal Hassner
Publisher	Springer
Pages	112–128
Number of pages	17
Edition	1
ISBN (Electronic)	9783031200472
ISBN (Print)	9783031200465
DOIs	https://doi.org/10.1007/978-3-031-20047-2_7
Publication status	Published - 23 Oct 2022
Event	17th European Conference on Computer Vision (ECCV 2022) - Tel Aviv, Israel
Duration	24 Oct 2022 → 28 Oct 2022

Publication series

Name	Lecture Notes in Computer Science
Publisher	Springer
Volume	13682
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	17th European Conference on Computer Vision (ECCV 2022)
Abbreviated title	ECCV 2022
Country/Territory	Israel
City	Tel Aviv
Period	24/10/22 → 28/10/22

Access to Document

10.1007/978-3-031-20047-2_7

YangJ2022Towards
This version of the contribution has been accepted for publication, after peer review (when applicable) but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections. The Version of Record is available online at: http://dx.doi.org/10.1007/978-3-031-20047-2_7. Use of this Accepted Version is subject to the publisher’s Accepted Manuscript terms of use: https://www.springernature.com/gp/open-research/policies/accepted-manuscript-terms
Accepted author manuscript, 2.61 MB
Licence: Other (please specify with Rights Statement)

Cite this

Yang, J., Zhang, Z., Li, Z., Chang, H. J., Leonardis, A., & Zheng, F. (2022). Towards generic 3D tracking in RGBD videos: benchmark and baseline. In S. Avidan, G. Brostow, M. Cissé, G. M. Farinella, & T. Hassner (Eds.), Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII (1 ed., pp. 112–128). (Lecture Notes in Computer Science; Vol. 13682). Springer. https://doi.org/10.1007/978-3-031-20047-2_7

Yang, Jinyu ; Zhang, Zhongqun ; Li, Zhe et al. / Towards generic 3D tracking in RGBD videos : benchmark and baseline. Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII. editor / Shai Avidan ; Gabriel Brostow ; Moustapha Cissé ; Giovanni Maria Farinella ; Tal Hassner. 1. ed. Springer, 2022. pp. 112–128 (Lecture Notes in Computer Science).

@inproceedings{601f8a24724a4edc97b5a66980e1cf8b,
title = "Towards generic 3D tracking in RGBD videos: benchmark and baseline",
abstract = "Tracking in 3D scenes is gaining momentum because of its numerous applications in robotics, autonomous driving, and scene understanding. Currently, 3D tracking is limited to specific model-based approaches involving point clouds, which impedes 3D trackers from applying in natural 3D scenes. RGBD sensors provide a more reasonable and acceptable solution for 3D object tracking due to their readily available synchronised color and depth information. Thus, in this paper, we investigate a novel problem: is it possible to track a generic (class-agnostic) 3D object in RGBD videos and predict 3D bounding boxes of the object of interest? To inspire research on this topic, we newly construct a standard benchmark for generic 3D object tracking, {\textquoteleft}Track-it-in-3D{\textquoteright}, which contains 300 RGBD video sequences with dense 3D annotations and corresponding evaluation protocols. Furthermore, we propose an effective tracking baseline to estimate 3D bounding boxes for arbitrary objects in RGBD videos, by fusing appearance and spatial information effectively. Resources are available on https://github.com/yjybuaa/Track-it-in-3D.",
author = "Jinyu Yang and Zhongqun Zhang and Zhe Li and Chang, {Hyung Jin} and Ales Leonardis and Feng Zheng",
year = "2022",
month = oct,
day = "23",
doi = "10.1007/978-3-031-20047-2_7",
language = "English",
isbn = "9783031200465",
series = "Lecture Notes in Computer Science",
publisher = "Springer",
pages = "112–128",
editor = "Shai Avidan and Gabriel Brostow and Moustapha Ciss{\'e} and Farinella, {Giovanni Maria} and Tal Hassner",
booktitle = "Computer Vision – ECCV 2022",
edition = "1",
note = "17th European Conference on Computer Vision (ECCV 2022), ECCV 2022 ; Conference date: 24-10-2022 Through 28-10-2022",
}

Yang, J, Zhang, Z, Li, Z, Chang, HJ , Leonardis, A & Zheng, F 2022, Towards generic 3D tracking in RGBD videos: benchmark and baseline. in S Avidan, G Brostow, M Cissé, GM Farinella & T Hassner (eds), Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII. 1 edn, Lecture Notes in Computer Science, vol. 13682, Springer, pp. 112–128, 17th European Conference on Computer Vision (ECCV 2022), Tel Aviv, Israel, 24/10/22. https://doi.org/10.1007/978-3-031-20047-2_7

Towards generic 3D tracking in RGBD videos: benchmark and baseline. / Yang, Jinyu; Zhang, Zhongqun; Li, Zhe et al.
Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII. ed. / Shai Avidan; Gabriel Brostow; Moustapha Cissé; Giovanni Maria Farinella; Tal Hassner. 1. ed. Springer, 2022. p. 112–128 (Lecture Notes in Computer Science; Vol. 13682).

TY - GEN
T1 - Towards generic 3D tracking in RGBD videos
T2 - 17th European Conference on Computer Vision (ECCV 2022)
AU - Yang, Jinyu
AU - Zhang, Zhongqun
AU - Li, Zhe
AU - Chang, Hyung Jin
AU - Leonardis, Ales
AU - Zheng, Feng
PY - 2022/10/23
Y1 - 2022/10/23
N2 - Tracking in 3D scenes is gaining momentum because of its numerous applications in robotics, autonomous driving, and scene understanding. Currently, 3D tracking is limited to specific model-based approaches involving point clouds, which impedes 3D trackers from applying in natural 3D scenes. RGBD sensors provide a more reasonable and acceptable solution for 3D object tracking due to their readily available synchronised color and depth information. Thus, in this paper, we investigate a novel problem: is it possible to track a generic (class-agnostic) 3D object in RGBD videos and predict 3D bounding boxes of the object of interest? To inspire research on this topic, we newly construct a standard benchmark for generic 3D object tracking, ‘Track-it-in-3D’, which contains 300 RGBD video sequences with dense 3D annotations and corresponding evaluation protocols. Furthermore, we propose an effective tracking baseline to estimate 3D bounding boxes for arbitrary objects in RGBD videos, by fusing appearance and spatial information effectively. Resources are available on https://github.com/yjybuaa/Track-it-in-3D.
AB - Tracking in 3D scenes is gaining momentum because of its numerous applications in robotics, autonomous driving, and scene understanding. Currently, 3D tracking is limited to specific model-based approaches involving point clouds, which impedes 3D trackers from applying in natural 3D scenes. RGBD sensors provide a more reasonable and acceptable solution for 3D object tracking due to their readily available synchronised color and depth information. Thus, in this paper, we investigate a novel problem: is it possible to track a generic (class-agnostic) 3D object in RGBD videos and predict 3D bounding boxes of the object of interest? To inspire research on this topic, we newly construct a standard benchmark for generic 3D object tracking, ‘Track-it-in-3D’, which contains 300 RGBD video sequences with dense 3D annotations and corresponding evaluation protocols. Furthermore, we propose an effective tracking baseline to estimate 3D bounding boxes for arbitrary objects in RGBD videos, by fusing appearance and spatial information effectively. Resources are available on https://github.com/yjybuaa/Track-it-in-3D.
U2 - 10.1007/978-3-031-20047-2_7
DO - 10.1007/978-3-031-20047-2_7
M3 - Conference contribution
SN - 9783031200465
T3 - Lecture Notes in Computer Science
SP - 112
EP - 128
BT - Computer Vision – ECCV 2022
A2 - Avidan, Shai
A2 - Brostow, Gabriel
A2 - Cissé, Moustapha
A2 - Farinella, Giovanni Maria
A2 - Hassner, Tal
PB - Springer
Y2 - 24 October 2022 through 28 October 2022
ER -

Yang J, Zhang Z, Li Z, Chang HJ , Leonardis A, Zheng F. Towards generic 3D tracking in RGBD videos: benchmark and baseline. In Avidan S, Brostow G, Cissé M, Farinella GM, Hassner T, editors, Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part XXII. 1 ed. Springer. 2022. p. 112–128. (Lecture Notes in Computer Science). doi: 10.1007/978-3-031-20047-2_7
