Multi-scale adaptive feature fusion network for semantic segmentation in remote sensing images

Research output: Contribution to journal › Article › peer-review

Standard

Multi-scale adaptive feature fusion network for semantic segmentation in remote sensing images. / Shang, Ronghua; Zhang, Jiyu; Jiao, Licheng; Li, Yangyang; Marturi, Naresh; Stolkin, Rustam.

In: Remote Sensing, Vol. 12, No. 5, 872, 01.03.2020.

BibTeX

@article{b9d1e750f05c45b78f7a3d3498168e0b,
title = "Multi-scale adaptive feature fusion network for semantic segmentation in remote sensing images",
abstract = "Semantic segmentation of high-resolution remote sensing images is highly challenging due to the presence of a complicated background, irregular target shapes, and similarities in the appearance of multiple target categories. Most of the existing segmentation methods that rely only on simple fusion of the extracted multi-scale features often fail to provide satisfactory results when there is a large difference in the target sizes. Handling this problem through multi-scale context extraction and efficient fusion of multi-scale features, in this paper we present an end-to-end multi-scale adaptive feature fusion network (MANet) for semantic segmentation in remote sensing images. It is a coding and decoding structure that includes a multi-scale context extraction module (MCM) and an adaptive fusion module (AFM). The MCM employs two layers of atrous convolutions with different dilatation rates and global average pooling to extract context information at multiple scales in parallel. MANet embeds the channel attention mechanism to fuse semantic features. The high-and low-level semantic information are concatenated to generate global features via global average pooling. These global features are used as channel weights to acquire adaptive weight information of each channel by the fully connected layer. To accomplish an efficient fusion, these tuned weights are applied to the fused features. Performance of the proposed method has been evaluated by comparing it with six other state-of-the-art networks: fully convolutional networks (FCN), U-net, UZ1, Light-weight RefineNet, DeepLabv3+, and APPD. Experiments performed using the publicly available Potsdam and Vaihingen datasets show that the proposed MANet significantly outperforms the other existing networks, with overall accuracy reaching 89.4% and 88.2%, respectively and with average of F1 reaching 90.4% and 86.7% respectively.",
keywords = "Adaptive fusion, CNN, Deep learning, Multi-scale context, Remote sensing image, Semantic segmentation",
author = "Ronghua Shang and Jiyu Zhang and Licheng Jiao and Yangyang Li and Naresh Marturi and Rustam Stolkin",
note = "Funding Information: Funding: This research was funded by the National Natural Science Foundation of China under Grants Nos. 61773304, 61836009, 61871306, 61772399 and U1701267, the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) under Grants No. B07048, and the Program for Cheung Kong Scholars and Innovative Research Team in University under Grant IRT1170. Publisher Copyright: {\textcopyright} 2020 by the author. Licensee MDPI, Basel, Switzerland. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.",
year = "2020",
month = mar,
day = "1",
doi = "10.3390/rs12050872",
language = "English",
volume = "12",
journal = "Remote Sensing",
issn = "2072-4292",
publisher = "MDPI",
number = "5",

}
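
To make the architecture described in the abstract more concrete, here is a minimal PyTorch sketch of the multi-scale context extraction module (MCM): parallel branches of two stacked atrous convolutions with different dilation rates, plus a global-average-pooling branch, whose outputs are concatenated. The dilation rates, channel widths, and the final 1x1 projection are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleContextModule(nn.Module):
    # Sketch of the MCM: each branch applies two atrous (dilated)
    # convolutions at one dilation rate; a separate branch captures
    # image-level context with global average pooling. All outputs
    # are concatenated and projected back to out_ch channels.
    def __init__(self, in_ch, out_ch, dilations=(2, 4)):  # rates assumed
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
                nn.Conv2d(out_ch, out_ch, 3, padding=d, dilation=d),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        self.gap = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),          # image-level context
            nn.Conv2d(in_ch, out_ch, 1),
            nn.ReLU(inplace=True),
        )
        self.project = nn.Conv2d(out_ch * (len(dilations) + 1), out_ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [branch(x) for branch in self.branches]
        # Broadcast the pooled context back to the full spatial size
        g = F.interpolate(self.gap(x), size=(h, w), mode="bilinear",
                          align_corners=False)
        return self.project(torch.cat(feats + [g], dim=1))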

RIS

TY - JOUR

T1 - Multi-scale adaptive feature fusion network for semantic segmentation in remote sensing images

AU - Shang, Ronghua

AU - Zhang, Jiyu

AU - Jiao, Licheng

AU - Li, Yangyang

AU - Marturi, Naresh

AU - Stolkin, Rustam

N1 - Funding Information: This research was funded by the National Natural Science Foundation of China under Grant Nos. 61773304, 61836009, 61871306, 61772399 and U1701267, the Fund for Foreign Scholars in University Research and Teaching Programs (the 111 Project) under Grant No. B07048, and the Program for Cheung Kong Scholars and Innovative Research Team in University under Grant IRT1170. Publisher Copyright: © 2020 by the author. Licensee MDPI, Basel, Switzerland. Copyright: Copyright 2020 Elsevier B.V., All rights reserved.

PY - 2020/3/1

Y1 - 2020/3/1

N2 - Semantic segmentation of high-resolution remote sensing images is highly challenging due to the presence of a complicated background, irregular target shapes, and similarities in the appearance of multiple target categories. Most of the existing segmentation methods that rely only on simple fusion of the extracted multi-scale features often fail to provide satisfactory results when there is a large difference in the target sizes. To handle this problem through multi-scale context extraction and efficient fusion of multi-scale features, in this paper we present an end-to-end multi-scale adaptive feature fusion network (MANet) for semantic segmentation in remote sensing images. It is an encoder-decoder structure that includes a multi-scale context extraction module (MCM) and an adaptive fusion module (AFM). The MCM employs two layers of atrous convolutions with different dilation rates and global average pooling to extract context information at multiple scales in parallel. MANet embeds the channel attention mechanism to fuse semantic features. The high- and low-level semantic features are concatenated to generate global features via global average pooling. These global features are passed through a fully connected layer to acquire adaptive weights for each channel. To accomplish an efficient fusion, these tuned weights are applied to the fused features. The performance of the proposed method has been evaluated by comparing it with six other state-of-the-art networks: fully convolutional networks (FCN), U-net, UZ1, Light-weight RefineNet, DeepLabv3+, and APPD. Experiments performed using the publicly available Potsdam and Vaihingen datasets show that the proposed MANet significantly outperforms the other existing networks, with overall accuracy reaching 89.4% and 88.2%, respectively, and with average F1 scores reaching 90.4% and 86.7%, respectively.

AB - Semantic segmentation of high-resolution remote sensing images is highly challenging due to the presence of a complicated background, irregular target shapes, and similarities in the appearance of multiple target categories. Most of the existing segmentation methods that rely only on simple fusion of the extracted multi-scale features often fail to provide satisfactory results when there is a large difference in the target sizes. To handle this problem through multi-scale context extraction and efficient fusion of multi-scale features, in this paper we present an end-to-end multi-scale adaptive feature fusion network (MANet) for semantic segmentation in remote sensing images. It is an encoder-decoder structure that includes a multi-scale context extraction module (MCM) and an adaptive fusion module (AFM). The MCM employs two layers of atrous convolutions with different dilation rates and global average pooling to extract context information at multiple scales in parallel. MANet embeds the channel attention mechanism to fuse semantic features. The high- and low-level semantic features are concatenated to generate global features via global average pooling. These global features are passed through a fully connected layer to acquire adaptive weights for each channel. To accomplish an efficient fusion, these tuned weights are applied to the fused features. The performance of the proposed method has been evaluated by comparing it with six other state-of-the-art networks: fully convolutional networks (FCN), U-net, UZ1, Light-weight RefineNet, DeepLabv3+, and APPD. Experiments performed using the publicly available Potsdam and Vaihingen datasets show that the proposed MANet significantly outperforms the other existing networks, with overall accuracy reaching 89.4% and 88.2%, respectively, and with average F1 scores reaching 90.4% and 86.7%, respectively.

KW - Adaptive fusion

KW - CNN

KW - Deep learning

KW - Multi-scale context

KW - Remote sensing image

KW - Semantic segmentation

UR - http://www.scopus.com/inward/record.url?scp=85081923108&partnerID=8YFLogxK

U2 - 10.3390/rs12050872

DO - 10.3390/rs12050872

M3 - Article

AN - SCOPUS:85081923108

VL - 12

JO - Remote Sensing

JF - Remote Sensing

SN - 2072-4292

IS - 5

M1 - 872

ER -
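
Similarly, the adaptive fusion module (AFM) described in the abstract can be sketched as a channel-attention step: high- and low-level feature maps are concatenated, global average pooling produces a global descriptor, and a fully connected layer turns that descriptor into per-channel weights that re-scale the fused features. The reduction ratio and the sigmoid gate below are assumptions, not details from the paper.

import torch
import torch.nn as nn

class AdaptiveFusionModule(nn.Module):
    # Sketch of the AFM: concatenated high/low-level features are
    # re-weighted channel-wise using weights derived from their own
    # global-average-pooled descriptor.
    def __init__(self, high_ch, low_ch, reduction=4):  # ratio assumed
        super().__init__()
        fused_ch = high_ch + low_ch
        self.fc = nn.Sequential(
            nn.Linear(fused_ch, fused_ch // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(fused_ch // reduction, fused_ch),
            nn.Sigmoid(),  # gating nonlinearity (assumed)
        )

    def forward(self, high, low):
        # Assumes both inputs share the same spatial size; upsample the
        # high-level map beforehand if it is coarser.
        fused = torch.cat([high, low], dim=1)
        b, c, _, _ = fused.shape
        w = self.fc(fused.mean(dim=(2, 3))).view(b, c, 1, 1)
        return fused * w  # adaptively re-weighted fused features

# Example with hypothetical 256- and 64-channel feature maps:
high = torch.randn(2, 256, 32, 32)
low = torch.randn(2, 64, 32, 32)
out = AdaptiveFusionModule(256, 64)(high, low)  # shape (2, 320, 32, 32)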