Robust fusion of colour and depth data for RGB-D target tracking using adaptive range-invariant depth models and spatio-temporal consistency constraints

Jingjing Xiao; Rustam Stolkin; Yuqing Gao; Ales Leonardis

doi:10.1109/TCYB.2017.2740952

Research output: Contribution to journal › Article › peer-review

Abstract

This paper presents a novel robust method for single target tracking in RGB-D images, and also contributes a substantial new benchmark dataset for evaluating RGB-D trackers. While a target object’s colour distribution is reasonably motion-invariant, this is not true for the target’s depth distribution, which continually varies as the target moves relative to the camera. It is therefore non-trivial to design target models which can fully exploit (potentially very rich) depth information for target tracking. For this reason, much of the previous RGB-D literature relies on colour information for tracking, while exploiting depth information only for occlusion reasoning. In contrast, we propose an adaptive range-invariant target depth model, and show how both depth and colour information can be fully and adaptively fused during the search for the target in each new RGB-D image. We introduce a new, hierarchical, two-layered target model (comprising local and global models) which uses spatio-temporal consistency constraints to achieve stable and robust on-the-fly target relearning. In the global layer, multiple features, derived from both colour and depth data, are adaptively fused to find a candidate target region. In ambiguous frames, where one or more features disagree, this global candidate region is further decomposed into smaller local candidate regions for matching to local-layer models of small target parts. We also note that conventional use of depth data for occlusion reasoning can easily trigger false occlusion detections when the target moves rapidly towards the camera. To overcome this problem, we show how combining target information with contextual information enables the target’s depth constraint to be relaxed. Our adaptively relaxed depth constraints can robustly accommodate large and rapid target motion in the depth direction, while still enabling the use of depth data for highly accurate reasoning about occlusions. For evaluation, we introduce a new RGB-D benchmark dataset with per-frame annotated attributes and extensive bias analysis. Our tracker is evaluated using two different state-of-the-art methodologies, VOT [20] and OTB [45], and in both cases it significantly outperforms four other state-of-the-art RGB-D trackers from the literature.
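The abstract describes these mechanisms only in outline and gives no model equations. The Python sketch below illustrates the general ideas under our own assumptions and is not the authors' formulation: a depth histogram built from offsets relative to the target's median depth (one simple way to make a depth model range-invariant), a hypothetical feature-weighting rule for fusing colour and depth scores, a disagreement check that would mark a frame as ambiguous, and an occlusion test whose depth margin widens as the target approaches the camera. All function names, parameters (bin count, margins, thresholds) and the weighting rule are hypothetical. The target's depth velocity is simply assumed to be available, rather than derived from contextual information as in the paper.

```python
import math
from typing import Dict, List, Sequence


def relative_depth_histogram(depths: Sequence[float], n_bins: int = 8,
                             half_range: float = 0.4) -> List[float]:
    """Normalised histogram of each pixel's depth minus the target's median depth.

    Only offsets from the median are binned, so moving the whole target nearer
    to or farther from the camera leaves the histogram unchanged (illustrative
    assumption). Depths are in metres; 0 marks a missing reading and is ignored.
    Offsets outside [-half_range, half_range) are clipped into the end bins.
    """
    valid = sorted(d for d in depths if d > 0)
    if not valid:
        return [0.0] * n_bins
    median = valid[len(valid) // 2]  # upper median for even counts
    bin_width = 2 * half_range / n_bins
    hist = [0.0] * n_bins
    for d in valid:
        idx = math.floor((d - median + half_range) / bin_width)
        hist[min(max(idx, 0), n_bins - 1)] += 1
    return [h / len(valid) for h in hist]


def adaptive_weights(fg: Dict[str, float], bg: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical rule: weight each feature by how much better it scores the
    target than the background; weights are normalised to sum to 1."""
    raw = {f: max(fg[f] - bg[f], 1e-6) for f in fg}
    total = sum(raw.values())
    return {f: w / total for f, w in raw.items()}


def fuse(score_maps: Dict[str, Sequence[float]],
         weights: Dict[str, float]) -> List[float]:
    """Weighted sum of per-feature scores over the same candidate positions."""
    n = len(next(iter(score_maps.values())))
    return [sum(weights[f] * m[i] for f, m in score_maps.items()) for i in range(n)]


def features_disagree(score_maps: Dict[str, Sequence[float]], tol: int = 1) -> bool:
    """True if the features' best candidate positions are more than tol apart,
    i.e. the frame is treated as ambiguous and handed to a local-part layer."""
    peaks = [max(range(len(m)), key=m.__getitem__) for m in score_maps.values()]
    return max(peaks) - min(peaks) > tol


def occlusion_detected(box_depths: Sequence[float], target_depth: float,
                       depth_velocity: float, base_margin: float = 0.15,
                       k: float = 1.5, min_fraction: float = 0.3) -> bool:
    """Flag an occlusion when enough box pixels lie clearly in front of the target.

    target_depth is the target's depth in the previous frame; depth_velocity is
    its estimated change in depth per frame (negative = approaching). The margin
    grows with approach speed, relaxing the depth constraint for fast targets.
    """
    margin = base_margin + k * max(0.0, -depth_velocity)
    valid = [d for d in box_depths if d > 0]
    if not valid:
        return False
    closer = sum(1 for d in valid if d < target_depth - margin)
    return closer / len(valid) >= min_fraction
```

For example, if every pixel in the target box now reads 1.7 m while the previous target depth was 2.0 m, a fixed margin (`depth_velocity=0.0`) reports an occlusion, whereas passing `depth_velocity=-0.3` widens the margin to about 0.6 m and the approaching target is no longer mistaken for its own occluder. A genuine occluder at 1.0 m covering half the box is still detected with the relaxed margin.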

Original language	English
Pages (from-to)	1-15
Number of pages	14
Journal	IEEE Transactions on Cybernetics
Volume	99
DOIs	https://doi.org/10.1109/TCYB.2017.2740952
Publication status	Published - 6 Sept 2017

Keywords

range-invariant depth models
clustered decision tree
RGB-D tracking

Access to Document

10.1109/TCYB.2017.2740952
Licence: None: All rights reserved

Xiao_et_al_Robust_fusion_of_color_and_depth_data_IEEE_Transactions_on_Cybernetics_2017
(c) 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Accepted author manuscript, 3.9 MB
Licence: Other (please specify with Rights Statement)

http://ieeexplore.ieee.org/document/8026575/
Licence: None: All rights reserved

Robotic systems for retrieval of contaminated material from hazardous zones
Stolkin, R., Leonardis, A. & Mistry, M.
Engineering & Physical Science Research Council
1/04/15 → 31/03/18
Project: Research

Cite this

@article{bb0e605586494247840550327cf75df8,

title = "Robust fusion of colour and depth data for RGB-D target tracking using adaptive range-invariant depth models and spatio-temporal consistency constraints",

abstract = "This paper presents a novel robust method for single target tracking in RGB-D images, and also contributes a substantial new benchmark dataset for evaluating RGB-D trackers. While a target object{\textquoteright}s colour distribution is reasonably motion-invariant, this is not true for the target{\textquoteright}s depth distribution, which continually varies as the target moves relative to the camera. It is therefore non-trivial to design target models which can fully exploit (potentially very rich) depth information for target tracking. For this reason, much of the previous RGB-D literature relies on colour information for tracking, while exploiting depth information only for occlusion reasoning. In contrast, we propose an adaptive range-invariant target depth model, and show how both depth and colour information can be fully and adaptively fused during the search for the target in each new RGB-D image. We introduce a new, hierarchical, two-layered target model (comprising local and global models) which uses spatio-temporal consistency constraints to achieve stable and robust on-the-fly target relearning. In the global layer, multiple features, derived from both colour and depth data, are adaptively fused to find a candidate target region. In ambiguous frames, where one or more features disagree, this global candidate region is further decomposed into smaller local candidate regions for matching to local-layer models of small target parts. We also note that conventional use of depth data for occlusion reasoning can easily trigger false occlusion detections when the target moves rapidly towards the camera. To overcome this problem, we show how combining target information with contextual information enables the target{\textquoteright}s depth constraint to be relaxed. Our adaptively relaxed depth constraints can robustly accommodate large and rapid target motion in the depth direction, while still enabling the use of depth data for highly accurate reasoning about occlusions. For evaluation, we introduce a new RGB-D benchmark dataset with per-frame annotated attributes and extensive bias analysis. Our tracker is evaluated using two different state-of-the-art methodologies, VOT [20] and OTB [45], and in both cases it significantly outperforms four other state-of-the-art RGB-D trackers from the literature.",

keywords = "range-invariant depth models, clustered decision tree, RGB-D tracking",

author = "Jingjing Xiao and Rustam Stolkin and Yuqing Gao and Ales Leonardis",

year = "2017",

month = sep,

day = "6",

doi = "10.1109/TCYB.2017.2740952",

language = "English",

volume = "99",

pages = "1--15",

journal = "IEEE Transactions on Cybernetics",

issn = "2168-2267",

publisher = "IEEE Advancing Technology for Humanity",

}
