BoIR: Box-Supervised Instance Representation for Multi Person Pose Estimation

Uyoung Jeong, Seungryul Baek, Hyung Jin Chang, Kwang In Kim^*

^*Corresponding author for this work

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Single-stage multi-person human pose estimation (MPPE) methods have shown great performance improvements, but existing methods fail to disentangle features by individual instances in crowded scenes. In this paper, we propose a bounding box-level instance representation learning method called BoIR, which simultaneously solves the instance detection, instance disentanglement and instance-keypoint association problems. Our new instance embedding loss provides a learning signal over the entire area of the image using bounding box annotations, achieving a globally consistent and disentangled instance representation. Our method exploits multi-task learning of bottom-up keypoint estimation, bounding box regression and contrastive instance embedding learning, without additional computational cost during inference. We demonstrate that BoIR outperforms state-of-the-art methods on COCO (0.5 AP), CrowdPose (4.9 AP) and OCHuman (3.5 AP).

Original language	English
Title of host publication	The 34th British Machine Vision Conference Proceedings
Publisher	British Machine Vision Association
Publication status	Accepted/In press - 25 Aug 2023
Event	The 34th British Machine Vision Conference, Aberdeen, United Kingdom (20 Nov 2023 → 24 Nov 2023)

Conference

Conference	The 34th British Machine Vision Conference
Country/Territory	United Kingdom
City	Aberdeen
Period	20/11/23 → 24/11/23

Bibliographical note

Not yet published as of 29/02/2024.

Cite this

@inproceedings{29fffb0d0152475d840220a460b11788,
  title = "BoIR: Box-Supervised Instance Representation for Multi Person Pose Estimation",
  abstract = "Single-stage multi-person human pose estimation (MPPE) methods have shown great performance improvements, but existing methods fail to disentangle features by individual instances in crowded scenes. In this paper, we propose a bounding box-level instance representation learning method called BoIR, which simultaneously solves the instance detection, instance disentanglement and instance-keypoint association problems. Our new instance embedding loss provides a learning signal over the entire area of the image using bounding box annotations, achieving a globally consistent and disentangled instance representation. Our method exploits multi-task learning of bottom-up keypoint estimation, bounding box regression and contrastive instance embedding learning, without additional computational cost during inference. We demonstrate that BoIR outperforms state-of-the-art methods on COCO (0.5 AP), CrowdPose (4.9 AP) and OCHuman (3.5 AP).",
  author = "Uyoung Jeong and Seungryul Baek and Chang, {Hyung Jin} and Kim, {Kwang In}",
  note = "Not yet published as of 29/02/2024.; The 34th British Machine Vision Conference ; Conference date: 20-11-2023 Through 24-11-2023",
  year = "2023",
  month = aug,
  day = "25",
  language = "English",
  booktitle = "The 34th British Machine Vision Conference Proceedings",
  publisher = "British Machine Vision Association",
}

TY - GEN
T1 - BoIR
T2 - The 34th British Machine Vision Conference
AU - Jeong, Uyoung
AU - Baek, Seungryul
AU - Chang, Hyung Jin
AU - Kim, Kwang In
N1 - Not yet published as of 29/02/2024.
PY - 2023/8/25
Y1 - 2023/8/25
AB - Single-stage multi-person human pose estimation (MPPE) methods have shown great performance improvements, but existing methods fail to disentangle features by individual instances in crowded scenes. In this paper, we propose a bounding box-level instance representation learning method called BoIR, which simultaneously solves the instance detection, instance disentanglement and instance-keypoint association problems. Our new instance embedding loss provides a learning signal over the entire area of the image using bounding box annotations, achieving a globally consistent and disentangled instance representation. Our method exploits multi-task learning of bottom-up keypoint estimation, bounding box regression and contrastive instance embedding learning, without additional computational cost during inference. We demonstrate that BoIR outperforms state-of-the-art methods on COCO (0.5 AP), CrowdPose (4.9 AP) and OCHuman (3.5 AP).
UR - https://bmvc2023.org/
M3 - Conference contribution
BT - The 34th British Machine Vision Conference Proceedings
PB - British Machine Vision Association
Y2 - 20 November 2023 through 24 November 2023
ER -
