Referring image segmentation by generative adversarial learning

Shuang Qiu, Yao Zhao*, Jianbo Jiao, Yunchao Wei, Shikui Wei

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

22 Citations (Scopus)

Abstract

A referring expression is a natural language expression used to refer to a particular object. In this paper, we focus on the problem of segmenting an image from a natural language referring expression. Existing works tackle this problem by augmenting convolutional semantic segmentation networks with an LSTM sentence encoder and optimizing the result with a pixel-wise classification loss. We argue that the distribution similarity between the inference and the ground truth plays an important role in referring image segmentation. Therefore, we introduce a complementary loss that considers the consistency between the two distributions. To this end, we propose to train the referring image segmentation model in a generative adversarial fashion, which addresses the distribution similarity problem. In particular, the proposed adversarial semantic guidance network (ASGN) offers the following advantages: a) more detailed visual information is incorporated by the detail enhancement; b) semantic information counteracts the word embedding impact; c) the proposed adversarial learning approach relieves the distribution inconsistencies. Experimental results on four standard datasets show significant improvements over all the compared baseline models, demonstrating the effectiveness of our method.
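The idea of pairing a pixel-wise classification loss with a complementary adversarial loss can be illustrated with a minimal sketch. This is not the ASGN implementation: it assumes the standard GAN objective, masks given as flat lists of per-pixel probabilities, and discriminator outputs given as plain probabilities. The names `pixelwise_bce`, `generator_loss`, `discriminator_loss`, and the weight `adv_weight` are hypothetical and do not come from the paper.

```python
import math

EPS = 1e-7  # keeps log() away from 0; an illustrative choice


def _clip(p):
    """Clamp a probability into [EPS, 1 - EPS]."""
    return min(max(p, EPS), 1.0 - EPS)


def pixelwise_bce(pred, target):
    """Mean binary cross-entropy over all pixels (the pixel-wise loss)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = _clip(p)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def generator_loss(pred, target, d_pred, adv_weight=0.1):
    """Pixel-wise loss plus an adversarial term.

    d_pred is the discriminator's probability that the predicted mask
    is a ground-truth mask; the adversarial term shrinks as d_pred -> 1.
    adv_weight is a hypothetical balancing weight.
    """
    adversarial = -math.log(_clip(d_pred))
    return pixelwise_bce(pred, target) + adv_weight * adversarial


def discriminator_loss(d_truth, d_pred):
    """Standard GAN discriminator loss: score truth high, predictions low."""
    return -math.log(_clip(d_truth)) - math.log(_clip(1.0 - d_pred))


if __name__ == "__main__":
    target = [1.0, 0.0, 1.0, 0.0]
    pred = [0.9, 0.1, 0.9, 0.1]
    print(generator_loss(pred, target, d_pred=0.5))
    print(discriminator_loss(d_truth=0.5, d_pred=0.5))
```

In this sketch the segmentation model is penalized both for per-pixel errors and for producing masks the discriminator can tell apart from ground truth, which is one common way to encourage the predicted and ground-truth distributions to match.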

Original language: English
Article number: 8845685
Pages (from-to): 1333-1344
Number of pages: 12
Journal: IEEE Transactions on Multimedia
Volume: 22
Issue number: 5
DOIs
Publication status: Published - May 2020

Bibliographical note

Funding Information:
Manuscript received July 26, 2018; revised January 18, 2019 and May 21, 2019; accepted September 4, 2019. Date of publication September 20, 2019; date of current version April 23, 2020. This work was supported in part by National Key Research and Development of China (2016YFB0800404), in part by National Natural Science Foundation of China (61532005 and 61972022), in part by Program of China Scholarships Council (201807095006), and in part by Fundamental Research Funds for the Central Universities (2018JBZ001). The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Mohammed Daoudi. (Corresponding author: Yao Zhao.) S. Qiu, Y. Zhao, and S. Wei are with the Institute of Information Science, Beijing Jiaotong University, Beijing 100044, China, and also with the Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing 100044, China (e-mail: 14120332@bjtu.edu.cn; yzhao@bjtu.edu.cn; shkwei@bjtu.edu.cn).

Publisher Copyright:
© 1999-2012 IEEE.

Keywords

  • Adversarial training
  • Image referring segmentation

ASJC Scopus subject areas

  • Signal Processing
  • Media Technology
  • Computer Science Applications
  • Electrical and Electronic Engineering
