Learning Dual-Fused Modality-Aware Representations for RGBD Tracking

Shang Gao; Jinyu Yang; Zhe Li; Feng Zheng; Aleš Leonardis; Jingkuan Song

doi:10.1007/978-3-031-25085-9_27

Shang Gao, Jinyu Yang, Zhe Li, Feng Zheng^*, Aleš Leonardis, Jingkuan Song

^*Corresponding author for this work

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the development of depth sensors in recent years, RGBD object tracking has received significant attention. Compared with traditional RGB object tracking, adding the depth modality can effectively resolve interference between the target and the background. However, some existing RGBD trackers use the two modalities separately and thus ignore particularly useful information shared between them. Other methods fuse the two modalities by treating them equally, which loses modality-specific features. To address these limitations, we propose a novel Dual-fused Modality-aware Tracker (termed DMTracker), which aims to learn informative and discriminative representations of target objects for robust RGBD tracking. The first fusion module extracts the information shared between modalities using cross-modal attention. The second integrates the RGB-specific and depth-specific information to enhance the fused features. By fusing both modality-shared and modality-specific information in a modality-aware scheme, DMTracker can learn discriminative representations in complex tracking scenes. Experiments show that the proposed tracker achieves very promising results on challenging RGBD benchmarks. Code is available at https://github.com/ShangGaoG/DMTracker.
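The two-stage fusion described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked above for that); it only shows, under simplifying assumptions, what "cross-modal attention followed by re-adding modality-specific features" could look like. The function names `cross_modal_attention` and `dual_fusion`, the averaging of the two attention directions, the additive second stage, and the random toy features are all assumptions made for illustration.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(queries, context):
    """Scaled dot-product attention: each query token (one modality)
    attends over all context tokens (the other modality)."""
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    attended, weights = [], []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, c)) for c in context]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of context tokens, one output value per feature.
        attended.append(
            [sum(wi * c[j] for wi, c in zip(w, context)) for j in range(dim)]
        )
    return attended, weights


def dual_fusion(rgb, depth):
    """Hypothetical two-stage fusion (an assumption, not the paper's design).

    Stage 1: shared information via cross-modal attention in both directions.
    Stage 2: modality-specific RGB and depth features are added back.
    Assumes both modalities have the same number of tokens and feature size.
    """
    rgb_from_depth, _ = cross_modal_attention(rgb, depth)
    depth_from_rgb, _ = cross_modal_attention(depth, rgb)
    fused = []
    for r, d, rd, dr in zip(rgb, depth, rgb_from_depth, depth_from_rgb):
        fused.append([0.5 * (x + y) + a + b for a, b, x, y in zip(r, d, rd, dr)])
    return fused


# Toy features: 4 tokens per modality, 8 features per token (made-up data).
random.seed(0)
rgb = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
depth = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]

fused = dual_fusion(rgb, depth)
print(len(fused), len(fused[0]))  # 4 8
```

The fused output keeps the token count and feature size of the inputs, so in this sketch it could be passed to any downstream head that expects single-modality features.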

Original language	English
Title of host publication	Computer Vision – ECCV 2022 Workshops, Proceedings
Editors	Leonid Karlinsky, Tomer Michaeli, Ko Nishino
Publisher	Springer
Pages	478-494
Number of pages	17
ISBN (Electronic)	9783031250859
ISBN (Print)	9783031250842
DOIs	https://doi.org/10.1007/978-3-031-25085-9_27
Publication status	Published - 12 Feb 2023
Event	17th European Conference on Computer Vision, ECCV 2022 - Tel Aviv, Israel; Duration: 23 Oct 2022 → 27 Oct 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13808 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	17th European Conference on Computer Vision, ECCV 2022
Country/Territory	Israel
City	Tel Aviv
Period	23/10/22 → 27/10/22

Bibliographical note

Funding Information:
Acknowledgements. This work is supported by the National Natural Science Foundation of China under Grant No. 61972188 and 62122035.

Publisher Copyright:
© 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords

Multi-modal learning
Object tracking
RGBD tracking

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Access to Document

10.1007/978-3-031-25085-9_27
Licence: None: All rights reserved

Cite this

Gao, S., Yang, J., Li, Z., Zheng, F., Leonardis, A., & Song, J. (2023). Learning Dual-Fused Modality-Aware Representations for RGBD Tracking. In L. Karlinsky, T. Michaeli, & K. Nishino (Eds.), Computer Vision – ECCV 2022 Workshops, Proceedings (pp. 478-494). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13808 LNCS). Springer. https://doi.org/10.1007/978-3-031-25085-9_27

Gao, Shang ; Yang, Jinyu ; Li, Zhe et al. / Learning Dual-Fused Modality-Aware Representations for RGBD Tracking. Computer Vision – ECCV 2022 Workshops, Proceedings. editor / Leonid Karlinsky ; Tomer Michaeli ; Ko Nishino. Springer, 2023. pp. 478-494 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{fd9921d6372a41ef97e37cfdd9ea690f,

title = "Learning Dual-Fused Modality-Aware Representations for RGBD Tracking",

abstract = "With the development of depth sensors in recent years, RGBD object tracking has received significant attention. Compared with the traditional RGB object tracking, the addition of the depth modality can effectively solve the target and background interference. However, some existing RGBD trackers use the two modalities separately and thus some particularly useful shared information between them is ignored. On the other hand, some methods attempt to fuse the two modalities by treating them equally, resulting in the missing of modality-specific features. To tackle these limitations, we propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking. The first fusion module focuses on extracting the shared information between modalities based on cross-modal attention. The second aims at integrating the RGB-specific and depth-specific information to enhance the fused features. By fusing both the modality-shared and modality-specific information in a modality-aware scheme, our DMTracker can learn discriminative representations in complex tracking scenes. Experiments show that our proposed tracker achieves very promising results on challenging RGBD benchmarks. Code is available at https://github.com/ShangGaoG/DMTracker.",

keywords = "Multi-modal learning, Object tracking, RGBD tracking",

author = "Shang Gao and Jinyu Yang and Zhe Li and Feng Zheng and Ale{\v s} Leonardis and Jingkuan Song",

note = "Funding Information: Acknowledgements. This work is supported by the National Natural Science Foundation of China under Grant No. 61972188 and 62122035. Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 17th European Conference on Computer Vision, ECCV 2022 ; Conference date: 23-10-2022 Through 27-10-2022",

year = "2023",

month = feb,

day = "12",

doi = "10.1007/978-3-031-25085-9_27",

language = "English",

isbn = "9783031250842",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

volume = "13808",

publisher = "Springer",

pages = "478--494",

editor = "Leonid Karlinsky and Tomer Michaeli and Ko Nishino",

booktitle = "Computer Vision -- ECCV 2022 Workshops, Proceedings",

}

Gao, S, Yang, J, Li, Z, Zheng, F, Leonardis, A & Song, J 2023, Learning Dual-Fused Modality-Aware Representations for RGBD Tracking. in L Karlinsky, T Michaeli & K Nishino (eds), Computer Vision – ECCV 2022 Workshops, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13808 LNCS, Springer, pp. 478-494, 17th European Conference on Computer Vision, ECCV 2022, Tel Aviv, Israel, 23/10/22. https://doi.org/10.1007/978-3-031-25085-9_27

Learning Dual-Fused Modality-Aware Representations for RGBD Tracking. / Gao, Shang; Yang, Jinyu; Li, Zhe et al.
Computer Vision – ECCV 2022 Workshops, Proceedings. ed. / Leonid Karlinsky; Tomer Michaeli; Ko Nishino. Springer, 2023. p. 478-494 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13808 LNCS).

TY - GEN

T1 - Learning Dual-Fused Modality-Aware Representations for RGBD Tracking

AU - Gao, Shang

AU - Yang, Jinyu

AU - Li, Zhe

AU - Zheng, Feng

AU - Leonardis, Aleš

AU - Song, Jingkuan

N1 - Funding Information: Acknowledgements. This work is supported by the National Natural Science Foundation of China under Grant No. 61972188 and 62122035. Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.

PY - 2023/2/12

Y1 - 2023/2/12

AB - With the development of depth sensors in recent years, RGBD object tracking has received significant attention. Compared with the traditional RGB object tracking, the addition of the depth modality can effectively solve the target and background interference. However, some existing RGBD trackers use the two modalities separately and thus some particularly useful shared information between them is ignored. On the other hand, some methods attempt to fuse the two modalities by treating them equally, resulting in the missing of modality-specific features. To tackle these limitations, we propose a novel Dual-fused Modality-aware Tracker (termed DMTracker) which aims to learn informative and discriminative representations of the target objects for robust RGBD tracking. The first fusion module focuses on extracting the shared information between modalities based on cross-modal attention. The second aims at integrating the RGB-specific and depth-specific information to enhance the fused features. By fusing both the modality-shared and modality-specific information in a modality-aware scheme, our DMTracker can learn discriminative representations in complex tracking scenes. Experiments show that our proposed tracker achieves very promising results on challenging RGBD benchmarks. Code is available at https://github.com/ShangGaoG/DMTracker.

KW - Multi-modal learning

KW - Object tracking

KW - RGBD tracking

UR - http://www.scopus.com/inward/record.url?scp=85151390839&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-25085-9_27

DO - 10.1007/978-3-031-25085-9_27

M3 - Conference contribution

AN - SCOPUS:85151390839

SN - 9783031250842

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 478

EP - 494

BT - Computer Vision – ECCV 2022 Workshops, Proceedings

A2 - Karlinsky, Leonid

A2 - Michaeli, Tomer

A2 - Nishino, Ko

PB - Springer

T2 - 17th European Conference on Computer Vision, ECCV 2022

Y2 - 23 October 2022 through 27 October 2022

ER -

Gao S, Yang J, Li Z, Zheng F, Leonardis A, Song J. Learning Dual-Fused Modality-Aware Representations for RGBD Tracking. In Karlinsky L, Michaeli T, Nishino K, editors, Computer Vision – ECCV 2022 Workshops, Proceedings. Springer. 2023. p. 478-494. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-25085-9_27