Article
Peer-Review Record

Class-Wise Fully Convolutional Network for Semantic Segmentation of Remote Sensing Images

by Tian Tian 1, Zhengquan Chu 2, Qian Hu 2 and Li Ma 3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 29 June 2021 / Revised: 10 August 2021 / Accepted: 11 August 2021 / Published: 13 August 2021

Round 1

Reviewer 1 Report

This manuscript introduces a class-wise FCN (C-FCN) that follows the traditional encoder-decoder form, with class-wise transition (CT), class-wise up-sampling (CU), class-wise supervision (CS), and class-wise classification (CC) modules, in order to learn class-specific features for remote sensing image segmentation. The proposed method is evaluated on two benchmark datasets and demonstrates prominent results. The motivation is quite clear and the proposed method seems to work. However, there are several issues to be considered in the revision.

  1. The literature review of remote sensing data classification is not sufficient. For example, the following works are also related to the topic introduced in this paper.

[1] ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data

[2] Dense dilated convolutions’ merging network for land cover classification

[3] Deep metric learning based on scalable neighborhood components for remote sensing scene characterization

2. As the motivation of this paper points out, the proposed method aims to learn class-specific features. Thus, the traditional final CNN layer of the segmentation architecture is modified into several independent binary classification layers. However, the paper should present more experiments or results showing that such a modification is better; for example, that inter-class features can be better discriminated by the proposed method than by the traditional one.


Author Response

Thank you for your recognition of our work and for your valuable comments. The following revisions have been made in the revised paper:

Comment #1: The literature review of remote sensing data classification is not sufficient. For example, the following works are also related to the topic introduced in this paper…

Response:

We supplement the following references as representatives of segmentation networks in the remote sensing field:

[1] Diakogiannis, Foivos I., et al. "ResUNet-a: A deep learning framework for semantic segmentation of remotely sensed data." ISPRS Journal of Photogrammetry and Remote Sensing 162 (2020): 94-114.

[2] Liu, Qinghui, et al. "Dense dilated convolutions’ merging network for land cover classification." IEEE Transactions on Geoscience and Remote Sensing 58.9 (2020): 6309-6320.

[3] Yi, Yaning, et al. "Semantic segmentation of urban buildings from VHR remote sensing imagery using a deep convolutional neural network." Remote Sensing 11.15 (2019): 1774.

[4] Kang, Jian, et al. "Deep metric learning based on scalable neighborhood components for remote sensing scene characterization." IEEE Transactions on Geoscience and Remote Sensing 58.12 (2020): 8905-8918.

 

Comment #2: …However, the paper should present more experiments or results showing that such a modification is better. For example, the inter-class features can be better discriminated by the proposed method than by the traditional one.

Response:

As shown in Section 4.4.1, the experiments demonstrate the benefits of the class-wise design compared with the backbone Res-UNet. Results on all Potsdam categories and on most Vaihingen categories improve when the class-wise modules are added to the backbone segmentation network, which indicates that inter-class features are better discriminated. Moreover, results on the hard class "clutter" are enhanced as well, which indicates that within-class features are better extracted too.
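To make the class-wise head more concrete, the change to the final classification layer can be sketched as a simple parameter count. This is a hypothetical illustration, not the authors' implementation: the class count `C` and the per-class channel width `k` are assumed values, and the C independent binary classifiers are modeled as a grouped 1x1 convolution.

```python
# Hypothetical sketch (not the paper's code): parameter count of a standard
# C-way 1x1-conv classifier versus C independent per-class binary classifiers
# expressed as a grouped 1x1 convolution over C groups of k channels each.

def conv1x1_params(in_ch, out_ch, groups=1):
    """Parameters of a 1x1 convolution with bias: weights + biases."""
    assert in_ch % groups == 0 and out_ch % groups == 0
    return (in_ch // groups) * out_ch + out_ch

C = 6   # assumed number of classes (e.g. the ISPRS labels)
k = 32  # assumed per-class feature width (the hyper-parameter k)

# Traditional head: one shared feature map of C*k channels -> C logits.
shared = conv1x1_params(C * k, C)

# Class-wise head: C groups, each mapping its own k channels -> 1 binary logit.
class_wise = conv1x1_params(C * k, C, groups=C)

print(shared, class_wise)  # the grouped, class-wise head uses far fewer weights
```

The point of the sketch is that each binary classifier only sees its own class-specific channels, so the head both isolates per-class features and shrinks the weight count by roughly a factor of C.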

For the other ablation experiments, Section 4.4.2 gives the results with and without the CS module. The reason that only the CS module is examined in the ablation experiments is that the CT, CU, and CC modules cannot be replaced individually: they rely on the class-wise structure and can only exist, or be replaced, together. Thus, Section 4.4.1 is in effect an ablation experiment on the whole class-wise structure, and Section 4.4.2 is an ablation experiment on the only module that can be removed independently. Moreover, the results with the CS module validate its necessity.

Comparison experiments on the two datasets are shown in Sections 4.4.4 and 4.4.5. All involved methods are representative semantic segmentation methods. None of the reported results are cited from other papers; all methods were implemented by ourselves under the same settings and fair conditions. Though the results do not surpass the dataset challenge records, they are the actual results we can achieve on our equipment. Moreover, the backbone networks of the involved methods are all basic ones, and the experiments are done without bells and whistles: no post-processing is applied to enhance the quantitative or visualized results. We believe these results authentically reflect the effectiveness of the proposed framework.

Beyond the quantitative, visualized, ablation, and parameter experiments, we were unsure what further experiments could be supplemented, so we referred to related literature and to the comments of the other reviewers. We add one subsection, Section 4.4.6, to report parameter sizes and time consumption, in response to the concerns of one reviewer. This part shows the reasonable parameter size of our model as well as some shortcomings of our design, which can be adjusted through the selection of hyper-parameters and improved in future work. With these additional experiments, we hope the comparisons of the involved methods are comprehensive and thorough.

Reviewer 2 Report

The article deals with a very important issue in remote sensing, namely semantic segmentation.
The authors of the article present a modified version of the fully convolutional network for the problem of semantic label for every pixel in given image.
In the proposed solution, the decoder part has been improved to be class-wise, so that it processes class-specific features.
In this solution, each class possesses an individual decoder and binary classifier, instead of running the entire encoder-decoder path in parallel for each category. The research carried out by the authors shows that the proposed class-wise fully convolutional network significantly improves the efficiency of the segmentation process and has significant application potential. In my opinion, this article can be published in its present form.

 

Author Response

Thank you for your recognition of our work. We have further improved it according to the other reviewers' comments. We hope it can be accepted and published by Remote Sensing.

Reviewer 3 Report

Dear authors,

Congratulations for your work. 

I have found a very good paper, easy to read, with high quality figures and elegant formalism.

Three suggestions only:

(1) Abstract needs to be improved. It is not clear.

(2) English needs to be polished, not too much, but some checking is needed.

(3) The authors say that the new end-to-end algorithm is better than previous ones, as demonstrated on the two benchmarks of Vaihingen and Potsdam. However a comparison concerning the CPU time is hardly needed because the new segmentation algorithms need to be more efficient in classifying and in CPU time.

Good luck!

 

Author Response

Thank you for your recognition of our work and for your valuable comments. The following revisions have been made in the revised paper. Minor spelling and wording corrections are not highlighted individually, to keep the revised manuscript tidy.

Comment #1: Abstract needs to be improved. It is not clear.

Response: We have improved the abstract to make it clearer to understand.

 

Comment #2: English needs to be polished, not too much, but some checking is needed.

Response: We have carefully read the manuscript again and improved the writing to remove typos and grammar problems. Thank you very much for your meticulous reading.

 

Comment #3: The authors say that the new end-to-end algorithm is better than previous ones, as demonstrated on the two benchmarks of Vaihingen and Potsdam. However a comparison concerning the CPU time is hardly needed because the new segmentation algorithms need to be more efficient in classifying and in CPU time.

Response: We are sorry, but we are not entirely sure from the comment whether the time comparison is needed or not. Most related works do not compare efficiency unless they focus on lightweight models; however, some references do give a time analysis, such as [1]. Therefore, for comprehensiveness, we supplement additional experiments on parameter amounts and running times in Section 4.4.6 to respond to your concerns about time. As seen from the results, our proposed model has a small parameter size, larger only than that of UNet. However, due to the CT module, the selected hyper-parameter k, and the group convolutions, its inference time is slower than that of the other baselines. The forward/backward pass size can be greatly reduced, and the running time noticeably sped up, by choosing a smaller k (such as 8 instead of 32) at the cost of a slight decrease in performance. In future work, better and faster implementations of the class-wise idea will be a priority research topic.

[1] Liu, Qinghui, et al. "Dense dilated convolutions’ merging network for land cover classification." IEEE Transactions on Geoscience and Remote Sensing 58.9 (2020): 6309-6320.
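As a rough illustration of the k trade-off described in the response above (the grouped-convolution structure and the values of C and k are assumptions for this sketch, not the paper's exact configuration), the weight count of a grouped 3x3 convolution with one k-channel branch per class scales quadratically in k:

```python
# Illustrative sketch (assumed structure, not the authors' code): parameter
# count of a grouped 3x3 convolution where each of C classes owns a branch
# mapping k channels -> k channels, plus per-channel biases.

def grouped_conv3x3_params(C, k):
    # One 3x3 kernel per (in, out) channel pair within each group, plus biases.
    return C * (k * k * 3 * 3 + k)

C = 6  # assumed class count
for k in (32, 8):
    print(k, grouped_conv3x3_params(C, k))
```

Because the weight term grows with k squared, shrinking k from 32 to 8 cuts this layer's weight count by roughly a factor of 16, which is consistent with the memory and speed savings the response attributes to a smaller k.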

Round 2

Reviewer 1 Report

The authors have addressed all the comments, and I think the paper can be accepted for publication.

Reviewer 3 Report

Dear authors,

Thank you for your reply. I have found very satisfactory your comments and the solutions to my questions.

Congratulations
