Article
Peer-Review Record

Cloud and Snow Segmentation in Satellite Images Using an Encoder–Decoder Deep Convolutional Neural Networks

ISPRS Int. J. Geo-Inf. 2021, 10(7), 462; https://0-doi-org.brum.beds.ac.uk/10.3390/ijgi10070462
by Kai Zheng 1, Jiansheng Li 1,*, Lei Ding 2, Jianfeng Yang 1, Xucheng Zhang 1 and Xun Zhang 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 21 May 2021 / Revised: 23 June 2021 / Accepted: 1 July 2021 / Published: 6 July 2021
(This article belongs to the Special Issue Artificial Intelligence for Multisource Geospatial Information)

Round 1

Reviewer 1 Report

General comment:

This study presents an interesting and meaningful method, based on an improved Deep Convolutional Neural Network and applied to TH-1 images, to segment both clouds and snow, which is very useful for quickly and automatically analyzing the increasing amount of acquired remote sensing data. However, it is not clear how this method achieves better snow vs. cloud segmentation performance compared with traditional threshold-based methods and other neural networks. Overall, the general style could be improved, and some more details are necessary to provide all the information needed to fully appreciate the manuscript. The context could also be strengthened to reinforce the scientific aspect, particularly in the Discussion and Conclusions section.

Specific comments:

  1. In the scientific literature the terms "dataset", "upsampling" and "output" are generally used instead of "data set", "up sampling" and "out put"; I suggest making these terms uniform both in the main text and in the figures.
  2. Lines 96-98, could you please expand the description of the TH-1 sensors, their resolution, etc.?
  3. Lines 117-135, the point-by-point description makes the reading cumbersome; I would suggest eliminating this format and making the text more fluent.
  4. Lines 142-146, would it be possible to move this general description of the study area to the Introduction, and to give more details here, including a figure that summarizes the tiles considered?
  5. Section 2.2.2, please improve the text with more technical terms and, again, I suggest avoiding the point-by-point description. Moreover, how are the rough and the fine labelling performed? What are the DCNN outputs?
  6. Figures, please avoid repetitions, for instance through the use of vertical and horizontal headers. Also, enhance the captions with accurate descriptions.
  7. Line 182, could you give an explanation of this parameter?
  8. Line 194, please clarify what you mean by "we add skip connections in the decoder".
  9. Line 230, please change to "Exponential Linear Unit (ELU)".
  10. Line 239, I suggest using scientific notation.
  11. Table 1, the caption is not exhaustive; I suggest improving the description to guide the reader through the table, e.g., what does the "Number" column refer to?
  12. Lines 262, here you are talking about PA and MIoU. What do these acronyms refer to? I would suggest introducing a paragraph describing how the comparison/validation was conducted and the evaluation metrics used (a sketch of the usual definitions is given after this list). Moreover, how is the distinction between clouds and snow performed?
  13. Line 26, please change Tab. to Table (also in the main text; this journal requires the full words Table and Figure when referencing them in the main text). Could you give a name to your network instead of "Ours"?
  14. Lines 281-287, what are the other two/three networks? Please specify. Line 284, by "false alarm" do you mean the occurrence of false positives? The images in Fig. 9 are not labelled, so it is not possible to follow the text.
  15. Lines 289, does "from different time-phase" mean "acquired in different seasons of the year"?
  16. Line 298, I think it is not correct to say that traditional algorithms are not robust, as the literature proves otherwise; please revise the statement (also in line 35, Introduction section).
  17. Lines 303-304, where are these additional experiments presented?
  18. Lines 309-310, please provide supporting literature to reinforce this statement.
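
For reference, a minimal sketch of the metrics mentioned in point 12, assuming the usual definitions of Pixel Accuracy (PA) and Mean Intersection over Union (MIoU) computed from a per-class confusion matrix; the function name and the numbers in the example are hypothetical, not taken from the manuscript:

    import numpy as np

    def pa_and_miou(conf):
        """Pixel Accuracy (PA) and Mean IoU (MIoU) from a confusion matrix.

        conf[i, j] = number of pixels whose true class is i and predicted class is j,
        e.g. classes = (background, cloud, snow).
        """
        conf = conf.astype(np.float64)
        pa = np.trace(conf) / conf.sum()                  # correctly classified pixels / all pixels
        tp = np.diag(conf)                                # true positives per class
        union = conf.sum(axis=1) + conf.sum(axis=0) - tp  # TP + FP + FN per class
        iou = tp / np.maximum(union, 1e-12)               # per-class intersection over union
        return pa, iou.mean()

    # Hypothetical 3-class example (background, cloud, snow)
    conf = np.array([[900,  30,  10],
                     [ 20, 450,  40],
                     [ 15,  25, 510]])
    pa, miou = pa_and_miou(conf)
    print(f"PA = {pa:.3f}, MIoU = {miou:.3f}")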

Kind regards

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 2 Report

I found the paper "Cloud and Snow Segmentation in Satellite Images Using an Encoder-Decoder Deep Convolutional Neural Networks" interesting and worth publishing after the suggested revisions. The introduction provides sufficient background and includes all relevant references, the research design is appropriate, the methods are adequately described, the results are clearly presented, and the conclusions are supported by the results.

I suggest the following changes for further improvement of this paper:

Some acronyms in the paper are not presented with their full names (for example, PA, MIoU…), which may cause confusion. Therefore, I suggest that each term be given in full at first use, with the acronym in parentheses, and that only the acronym be used afterwards.

In lines 125-128 it is stated that the experiment would be conducted using TH-1 images of different temporal phases and Google Earth images. However, Google Earth images are not mentioned elsewhere in the paper except in the Conclusions, and only as an example for further testing of the proposed method.

It is not explained why the chosen criterion for the fine-labelled images (i.e., more accurate edge markings) is used in the paper (lines 163-168).

In Table 2 the method proposed in the paper is marked as "Ours". Could a specific name or abbreviation be used for the proposed method, since it is not always referred to as "Ours" in the text? What does "proposed network" in Table 1 and later in the text refer to?

The number of images used for the quantitative analyses is not clear (lines 280-288 and Fig. 9). Are only 4 images used? In the same paragraph, Fig. 9 is referred to as 9(a), 9(b), 9(c) or 9(d), but there are no such labels in Fig. 9.

Line 80: The reference number is missing after He et al. [??]

Line 160: Instead of Figure 1 it should be Figure 2

Line 166: Instead of Figure 2 it should be Figure 3

Line 173: Instead of Figure 3 it should be Figure 4

Line 208: The sentence (together with its symbol) "means element-wise sum in Figure 6" could be moved to the caption (description) of Figure 6.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

A nice job was done. The methodology is clear and interesting. The work shows good results.

However, I suggest considering more recent references from 2020 or even 2021 (so far none from 2021 is cited), at least concerning segmentation methods, so that you can compare your approach with them and show how your method is better and more practical.

The Conclusions section also needs to be extended to contain more information about the results and their comparison with other works and research.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Dear Authors,

I am satisfied with the revisions made and accept the paper in its current form.

Kind regards

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

The manuscript by Zheng et al. proposes an innovative method of cloud and snow segmentation of satellite images using encoder-decoder deep convolutional neural networks. Considering the potential roles of satellite imagery in science and policy measures, precise information from satellites is indispensable. However, the quality of satellite-based information is affected by path radiance and other factors, including cloud, snow, etc. In this context, advanced and improved methods of cloud and snow segmentation are much needed. Although the manuscript effectively presents the results and their evaluation, there are some flaws in its structure which need to be addressed. I recommend accepting the manuscript after the following major issues have been resolved.

  1. Line 54: Please replace 'With' by 'with'.
  2. Section 2 'Related Work' is not required separately; the literature review is conventionally kept in the introduction of research articles. I would suggest that the authors merge Section 2 into the Introduction without sub-sections 2.1 and 2.2.
  3. Also, in the Introduction, the authors have put points 1-4 in the last paragraph as the workflow, which does not belong in an introduction. In the last paragraph of the Introduction, please state the objective of the paper only.
  4. Please make Section 2 'Methodology', which will include: 2.1 Process (points 1-4 of the last paragraph of the Introduction), then Section 2.2 Datasets and Section 2.3 Methods (currently explained in Section 4).
  5. Line 147: Replace 'data' with 'Data'.
  6. Line 148: Please provide the full start and end dates (i.e., the exact dates in 2018 and 2019) when the data were acquired.
  7. Section 3.2, Step 1: Please use the passive voice. The entire Section 3 should be rewritten with English proofreading.
  8. Also, in line 98, please define the difference between rough-labeled and fine-labeled images.
  9. What is the difference between Sections 4 and 5? Section 5 also belongs under Methods. I would suggest grouping all methodology-related material under one section and presenting the results separately under the Results and Discussion section.
  10. Figure 6: Please improve the quality of the figure.
  11. The 'Comparison and Analysis' section should come under Results.
  12. Table 1 is missing from the manuscript. Please check.
  13. Since the authors present an innovative method, please include the further scope and suitability of such a method for different satellites in the conclusion section.

 

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The work is good enough for publication. 

The results are promising but still need to be compared with other works from 2020 to enhance the work with the most recent methodologies and algorithms.

The language is readable, but there are still some typos to correct and the wording could be improved.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

In this paper, the authors apply a deep CNN to the cloud and snow segmentation of RSIs. The proposed network combines DeepLabv3+ and FCN. Although the accuracy has increased slightly (according to Table 3), the number of parameters of the new network has increased considerably; therefore, the innovation of this work is limited. The discussion about label quality and quantity is interesting, but more experiments are needed to support the conclusion.
The detailed comments are as follows:
1. Give the full name of TH. In addition, provide more information about the characteristics of the TH-1 sensor: the spatial resolution, the image acquisition time, the location of the images, etc.
2. What is the relationship between Figures 5 and 6? It would also be better to mark the 'encoder' part and the 'decoder' part in Figure 5. And why is there an 'ASPP Module' in the yellow block in Figure 5?
3. In Section 4.1.1, the authors mention that snow and cloud segmentation 'puts forward a high demand for the detail extraction ability'. Why does ResNet50 have a better 'detail extraction ability' than Xception?
4. Is the "focal loss" in Section 4.1.3 proposed by the authors? If not, please provide the reference (the commonly used formulation is sketched after this list for reference).
5. What is the relation between "Stage 1, 2" in Table 2 and "Stage 1-5" in Figure 6?
6. The experimental comparison in Section 5.4 is too limited. For Table 3, besides Xception and ResNet50, more deep-learning-based cloud detection methods should be selected for comparison. Moreover, the authors should compare the accuracy for different cases, such as images containing thick cloud, thin cloud, and cloud+snow/ice.
7. The authors conclude that "the cloud snow segmentation performance is positively related to the label quantity more than the quality". This is doubtful, since too few experiments have been done. The testing samples used in Table 4 are the rough-labeled samples, so the comparison seems unfair to the results obtained with fine-labeled samples. The authors should present many more comparisons to support this conclusion.
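
For reference, a minimal sketch of the standard focal loss formulation mentioned in point 4 (commonly attributed to Lin et al.); this is not the manuscript's Formula 1, and the function name, parameters and example values below are hypothetical:

    import numpy as np

    def focal_loss(probs, targets, gamma=2.0, alpha=None):
        """Standard focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

        probs:   (N, C) predicted class probabilities per pixel (softmax output)
        targets: (N,)   integer class labels per pixel
        gamma:   focusing parameter; gamma = 0 reduces to ordinary cross-entropy
        alpha:   optional (C,) per-class weights to counter class imbalance
        """
        eps = 1e-12
        p_t = probs[np.arange(len(targets)), targets]  # probability assigned to the true class
        w = (1.0 - p_t) ** gamma                       # down-weight easy, well-classified pixels
        if alpha is not None:
            w = w * alpha[targets]
        return float(np.mean(-w * np.log(p_t + eps)))

    # Hypothetical example: 3 pixels, 3 classes (background, cloud, snow)
    probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1],
                      [0.3, 0.3, 0.4]])
    targets = np.array([0, 1, 2])
    print(focal_loss(probs, targets, gamma=2.0))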

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 4 Report

Title: Cloud and Snow Segmentation of TH-1 Satellite Images Using Encoder-Decoder Deep Convolutional Neural Networks

As requested, I have reviewed the above-titled paper for potential publication in the IJGI-MDPI Journal. I have divided my comments into the sections presented below.

Contribution

This paper applies an image classification approach (DCNN) to segment cloud and snow in TH-1 satellite images. Because cloud and snow share similarly high reflectivity in the visible bands, some classification methods have difficulty distinguishing them, which leads to misjudgment and hinders automatic processing. The procedure uses a DCNN with an encoder-decoder architecture: the concatenated feature maps are sent to a 3×3 convolution and 4× upsampling to obtain the segmentation result. TH-1 satellite images are rough-labeled and fine-labeled, including cloud, snow and background samples. The results are discussed through segmentation comparisons (Otsu, Xception, ResNet50, and the approach of this study) and in terms of label quality and quantity. The authors claim that the proposed method can accurately segment cloud and snow and has good generalization ability. Finally, the authors comment that the performance of cloud and snow segmentation is mainly positively related to the label quantity. The main motivation of the manuscript is the authors' belief that the proposed end-to-end cloud and snow segmentation network can avoid the shortcomings of traditional cloud detection algorithms, including manual threshold setting, time-consuming processing, and many preconditions.
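
For reference, a minimal, generic PyTorch sketch of the kind of decoder step summarized above (concatenating skip features with upsampled high-level features, a 3×3 convolution, then 4× upsampling to the segmentation output). The class name, channel sizes and tensor shapes are hypothetical; this is not the authors' actual implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DecoderHead(nn.Module):
        """Sketch of an encoder-decoder head: fuse skip (low-level) features with
        high-level features, apply a 3x3 convolution, then upsample by 4 to get
        per-pixel class scores (background / cloud / snow)."""

        def __init__(self, high_ch=256, skip_ch=48, num_classes=3):
            super().__init__()
            self.fuse = nn.Conv2d(high_ch + skip_ch, 256, kernel_size=3, padding=1)
            self.classifier = nn.Conv2d(256, num_classes, kernel_size=1)

        def forward(self, high, skip):
            # bring the high-level features to the skip-feature resolution
            high = F.interpolate(high, size=skip.shape[-2:], mode="bilinear", align_corners=False)
            x = torch.cat([high, skip], dim=1)   # skip connection: concatenate feature maps
            x = F.relu(self.fuse(x))
            x = self.classifier(x)
            # final 4x upsampling to the input image resolution
            return F.interpolate(x, scale_factor=4, mode="bilinear", align_corners=False)

    # Hypothetical shapes: high-level features at 1/16, skip features at 1/4 of a 512x512 input
    head = DecoderHead()
    logits = head(torch.randn(1, 256, 32, 32), torch.randn(1, 48, 128, 128))
    print(logits.shape)  # torch.Size([1, 3, 512, 512])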

Briefly some of my concerns with the paper include:

  1. Ln. 173 states that fine-labeled images are marked with an error of less than 5 pixels, but there is no explanation of the error of the rough-labeled images. The authors should clearly explain the difference in accuracy between the two types of labeled images.
  2. The legends of Fig. 3 are too small to read; please redraw and enlarge the layout.
  3. In Ln. 177 to Ln. 179, the authors mention that the dataset is augmented in order to mitigate overfitting. Is the entire dataset augmented, or is the data selected at random for augmentation?
  4. In Ln. 194, Fig. 4 must be corrected to Fig. 5.
  5. In Ln. 224, the authors use focal loss to address the problem of sample imbalance in the semantic segmentation task. Is Formula 1 taken from another reference? If so, the reference should be cited.
  6. In Ln. 200 to Ln. 222, the article states that "there are many kinds of clouds with different shapes. Generally, the proportion of thin clouds and cirrus clouds is less than that of thick clouds. The training data set in this paper also reflects the characteristics of less data of thin clouds, cirrus clouds and snow." However, the authors have not explained how thin clouds, cirrus clouds, thick clouds and snow are defined before classification. The reviewer strongly recommends that the authors add a section to qualitatively and quantitatively explain the characteristics of the classes.
  7. Fig. 7 and Fig. 8 are too small to judge the authors' experimental data.
  8. Fig. 9 shows the identification results of the various methods, but these results can only reflect the identification of properties. How can accuracies or uncertainties be provided for the results presented to the reader with such an approach? On the other hand, I have not noticed any further comments in the manuscript contrasting the results of the proposed methodology with the results of different techniques and statistical measurements explored in the literature review. Why have the authors not considered or explored statistical roughness measurements based on texture statistics derived from remotely sensed data? Did the authors explore accuracy assessment approaches such as the confusion matrix and Cohen's kappa coefficient (a small sketch is given after this list)? If not, why not?
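
For reference, a minimal sketch of the accuracy measures suggested in point 8: overall accuracy and Cohen's kappa computed from a confusion matrix. The function name and the example values are hypothetical, not taken from the manuscript:

    import numpy as np

    def cohens_kappa(conf):
        """Overall accuracy and Cohen's kappa from a confusion matrix.

        conf[i, j] = number of pixels with true class i and predicted class j.
        kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and
        p_e is the agreement expected by chance from the marginal distributions.
        """
        conf = conf.astype(np.float64)
        n = conf.sum()
        p_o = np.trace(conf) / n                                   # observed agreement (overall accuracy)
        p_e = np.sum(conf.sum(axis=0) * conf.sum(axis=1)) / n**2   # chance agreement
        return p_o, (p_o - p_e) / (1.0 - p_e)

    # Hypothetical 3-class example (background, cloud, snow)
    conf = np.array([[900,  30,  10],
                     [ 20, 450,  40],
                     [ 15,  25, 510]])
    acc, kappa = cohens_kappa(conf)
    print(f"Overall accuracy = {acc:.3f}, kappa = {kappa:.3f}")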

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Accept in present form

Reviewer 3 Report

See attached file.

Reviewer 4 Report

1. There are typesetting issues in Ln. 134 to Ln. 156, so it is hard to understand what the authors want to express.

2. The legend text in Figure 3 is still unclear.

3. The authors have not adequately responded to or corrected Point 6. Deep learning analysis relies on sample identification and training. However, the authors' reply indicated that deep learning analysis does not need to identify the types of clouds and snow in advance, which differs from deep learning analysis as understood by the reviewer. The reviewer believes that any image classification requires human-defined criteria and a basis for judgment; otherwise it is impossible to reasonably explain the classification results. The reviewer strongly requires that the authors add a section to qualitatively and quantitatively explain the characteristics of the classes.
