Learning to exploit stability for 3D scene parsing

Yilun Du; Zhijian Liu; Hector Basevi; Ales Leonardis; William T. Freeman; Joshua T. Tenenbaum; Jiajun Wu

Computer Science

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Human scene understanding uses a variety of visual and non-visual cues to perform inference on object types, poses, and relations. Physics is a rich and universal cue that we exploit to enhance scene understanding. In this paper, we integrate the physical cue of stability into the learning process by looping a physics engine into bottom-up recognition models, and apply it to the problem of 3D scene parsing. We first show that applying physics supervision to an existing scene understanding model increases performance, produces more stable predictions, and allows training to an equivalent performance level with fewer annotated training examples. We then present a novel architecture for 3D scene parsing named Prim R-CNN, which learns to predict bounding boxes as well as their 3D size, translation, and rotation. With physics supervision, Prim R-CNN outperforms existing scene understanding approaches on this problem. Finally, we show that finetuning with physics supervision on unlabeled real images improves real-domain transfer of models trained on synthetic data.

Original language	English
Title of host publication	Advances in Neural Information Processing Systems 31 (NIPS 2018)
Editors	S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, R. Garnett
Publisher	NIPS
Number of pages	11
Publication status	E-pub ahead of print - 2 Dec 2018
Event	32nd Conference on Neural Information Processing Systems (NIPS 2018) - Vancouver, Canada. Duration: 2 Dec 2018 → 8 Dec 2018

Publication series

Name	Electronic Proceedings of the Neural Information Processing Systems Conference
Publisher	NIPS
Volume	31

Conference

Conference	32nd Conference on Neural Information Processing Systems (NIPS 2018)
Country/Territory	Canada
City	Vancouver
Period	2/12/18 → 8/12/18

Access to Document

Learning to Exploit - scenephys_nips_final
Checked for eligibility: 19/12/2018
Accepted author manuscript, 7.82 MB
Licence: None: All rights reserved

http://papers.nips.cc/paper/7444-learning-to-exploit-stability-for-3d-scene-parsing.pdf
Licence: None: All rights reserved

Cite this

Du, Y., Liu, Z., Basevi, H., Leonardis, A., Freeman, W. T., Tenenbaum, J. T., & Wu, J. (2018). Learning to exploit stability for 3D scene parsing. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.), Advances in Neural Information Processing Systems 31 (NIPS 2018) Article 7444 (Electronic Proceedings of the Neural Information Processing Systems Conference; Vol. 31). NIPS. Advance online publication. http://papers.nips.cc/paper/7444-learning-to-exploit-stability-for-3d-scene-parsing.pdf

@inproceedings{d56a225e6be74a02abd2de05c897aff2,
title = "Learning to exploit stability for 3D scene parsing",
author = "Yilun Du and Zhijian Liu and Hector Basevi and Ales Leonardis and Freeman, {William T.} and Tenenbaum, {Joshua T.} and Jiajun Wu",
year = "2018",
month = dec,
day = "2",
language = "English",
series = "Electronic Proceedings of the Neural Information Processing Systems Conference",
publisher = "NIPS",
editor = "S. Bengio and H. Wallach and H. Larochelle and K. Grauman and N. Cesa-Bianchi and R. Garnett",
booktitle = "Advances in Neural Information Processing Systems 31 (NIPS 2018)",
note = "32nd Conference on Neural Information Processing Systems (NIPS 2018) ; Conference date: 02-12-2018 Through 08-12-2018",
}

Du, Y, Liu, Z, Basevi, H, Leonardis, A, Freeman, WT, Tenenbaum, JT & Wu, J 2018, Learning to exploit stability for 3D scene parsing. in S Bengio, H Wallach, H Larochelle, K Grauman, N Cesa-Bianchi & R Garnett (eds), Advances in Neural Information Processing Systems 31 (NIPS 2018), 7444, Electronic Proceedings of the Neural Information Processing Systems Conference, vol. 31, NIPS, 32nd Conference on Neural Information Processing Systems (NIPS 2018), Vancouver, Canada, 2/12/18. <http://papers.nips.cc/paper/7444-learning-to-exploit-stability-for-3d-scene-parsing.pdf>

Learning to exploit stability for 3D scene parsing. / Du, Yilun; Liu, Zhijian; Basevi, Hector et al.
Advances in Neural Information Processing Systems 31 (NIPS 2018). ed. / S. Bengio; H. Wallach; H. Larochelle; K. Grauman; N. Cesa-Bianchi; R. Garnett. NIPS, 2018. 7444 (Electronic Proceedings of the Neural Information Processing Systems Conference; Vol. 31).

TY - GEN
T1 - Learning to exploit stability for 3D scene parsing
AU - Du, Yilun
AU - Liu, Zhijian
AU - Basevi, Hector
AU - Leonardis, Ales
AU - Freeman, William T.
AU - Tenenbaum, Joshua T.
AU - Wu, Jiajun
PY - 2018/12/2
Y1 - 2018/12/2
M3 - Conference contribution
T3 - Electronic Proceedings of the Neural Information Processing Systems Conference
BT - Advances in Neural Information Processing Systems 31 (NIPS 2018)
A2 - Bengio, S.
A2 - Wallach, H.
A2 - Larochelle, H.
A2 - Grauman, K.
A2 - Cesa-Bianchi, N.
A2 - Garnett, R.
PB - NIPS
T2 - 32nd Conference on Neural Information Processing Systems (NIPS 2018)
Y2 - 2 December 2018 through 8 December 2018
ER -

Du Y, Liu Z, Basevi H, Leonardis A, Freeman WT, Tenenbaum JT et al. Learning to exploit stability for 3D scene parsing. In Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, editors, Advances in Neural Information Processing Systems 31 (NIPS 2018). NIPS. 2018. 7444. (Electronic Proceedings of the Neural Information Processing Systems Conference). Epub 2018 Dec 2.
