Review
Peer-Review Record

Deep Learning-Based Semantic Segmentation of Urban Features in Satellite Images: A Review and Meta-Analysis

by Bipul Neupane 1, Teerayut Horanont 2,* and Jagannath Aryal 3
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 18 January 2021 / Revised: 15 February 2021 / Accepted: 18 February 2021 / Published: 23 February 2021

Round 1

Reviewer 1 Report

This is a very in-depth and detailed review that will be very useful as a reference source.

A couple of typographical errors detected:

239 *cite more*

374: satellite image corrcetion

An interesting question for my research: Are there studies and conclusions about the advantage of using images rather than orthoimages as input data in this context? In the case of images captured with cameras embedded in drones, in urban or industrial areas, where the change in perspective is very strong, it would be logical to think that using all the images would provide three-dimensional information

Author Response

Dear Reviewer,

Thank you for your constructive suggestions and comments.

This letter responds to the points raised by the academic editor and reviewers on the previously submitted manuscript “Deep learning-based semantic segmentation of urban features in satellite images: A review and meta-analysis”.

There have been some changes to the manuscript, and we summarize them in this letter. We have also attached a highlighted file to track the changes.

Best Regards,

On behalf of the co-authors

 

Changes Summary

The major changes in the new version of the manuscript include:

  1. The change in the title of the paper from “Deep learning-based semantic segmentation of urban remote sensing images: A review and meta-analysis” to “Deep learning-based semantic segmentation of urban features in satellite images: A review and meta-analysis”.
  2. Grammatical corrections throughout the paper. These minor corrections are not highlighted in the “Track Changes” document, as they were made mostly to articles and prepositions.
  3. Change in the writing style from constructions such as “As reported by [4]” to “As reported by Cowen et al. [4]” throughout the paper.
  4. Added the spatial coverage of the 71 papers in the new Figure 1. The figure shows an overview of the first authors’ affiliations grouped by country and continent. Also, in line 376 of Section 3.2.2 (Data Sources), three new lines have been added to show the number of local-domain and global studies.
  5. All figures have been resized, and some rearranged, to make them clearer.
  6. Understanding the confusion caused by the previous diagram, we have split the previous Figure 3 into Figure 4 and Figure 5. More explanation has also been added to the captions. Figure 4 shows an overview of the DL architectures employed and Figure 5 shows an overview of the convolutional backbones employed.

 

Reply to individual reviewer:

Comments and Suggestions for Authors

This is a very in-depth and detailed review that will be very useful as a reference source

  1. A couple of typographical errors detected:

239 *cite more*

374: satellite image correction

Response: Please excuse us for the blunder. We have corrected them.

 

  2. An interesting question for my research: Are there studies and conclusions about the advantage of using images rather than orthoimages as input data in this context? In the case of images captured with cameras embedded in drones, in urban or industrial areas, where the change in perspective is very strong, it would be logical to think that using all the images would provide three-dimensional information.

 

Response: The studies we reviewed mention using orthoimages, usually split into tiles/patches with or without overlap. Some studies used scene images from Google Street View along with their training data. Also, many used a digital surface model (DSM) and a normalized DSM, possibly computed from an orthomosaic, as one of the channels of the input image to train their network. The papers that used DSMs and other image channels are listed in the “Pre-processing/preparation” column of Appendix-1 and in Section 3.2.3 (Data preparation).
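The tiling with or without overlap mentioned in this response can be sketched as follows. This is only an illustrative sketch, not the procedure of any reviewed paper: the function names `tile_origins` and `tile_windows`, the tile size, and the choice to shift the last tile back so that it fits inside the image are all assumptions.

```python
from typing import Iterator, List, Tuple


def tile_origins(size: int, tile: int, overlap: int) -> List[int]:
    """Start offsets of tiles along one axis (hypothetical helper).

    Consecutive tiles share `overlap` pixels. If the regular grid does not
    reach the image border, one extra tile is shifted back to end at the border.
    """
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if size <= tile:
        return [0]  # image smaller than one tile: a single, smaller window
    stride = tile - overlap
    origins = list(range(0, size - tile + 1, stride))
    if origins[-1] + tile < size:
        origins.append(size - tile)
    return origins


def tile_windows(height: int, width: int, tile: int,
                 overlap: int = 0) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (row, col, tile_height, tile_width) windows covering the image."""
    for r in tile_origins(height, tile, overlap):
        for c in tile_origins(width, tile, overlap):
            yield r, c, min(tile, height), min(tile, width)


# Example: windows for a 1000 x 1000 orthoimage, 256-pixel tiles, 32-pixel overlap
windows = list(tile_windows(1000, 1000, 256, overlap=32))
```

Each window can then be used to crop every input channel (for example RGB plus a DSM or normalized DSM band) at the same location, so that image and label patches stay aligned.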

Among the papers that used images collected from drones, such as ISPRS’s 2D Labeling Dataset, the 3D point cloud could have been used to prepare the orthomosaic and DSM using an algorithm such as structure from motion (SfM). However, in the papers we reviewed we have not found conclusions that suggest using 3D information, and it would make an interesting research problem if all the information obtained from drones could be used with deep learning.

 

Author Response File: Author Response.docx

Reviewer 2 Report

Synopsis

This paper extensively reviewed 71 published peer-review papers regarding the topic of semantic segmentation of remote sensing images using deep learning methods. It categorized and analyzed the statistics of those 71 papers based on several different aspects, including methods, data, research target, data preparation, training detail, and performance comparison. This review will benefit the researchers who are interested in the semantic segmentation of remote sensing images using deep learning methods. The analysis is reasonable, the discussion is informational, but the grammar needs a lot of work (if possible, suggest the authors find some native English speaker to do a technical proof-reading). I think this paper needs some minor revision before being accepted.

Could the authors help clarify the spatial coverage of those 71 papers? Are they all local-domain studies, or are some global studies?

It’s difficult to understand Fig. 3 and Fig. 6. How do the labels on the left connect to the ring chart on the right? The total counts seem to differ between these two figures.

I suggest re-ordering the appendices, because they currently appear in the order C, A, B.

Line 120-122: please rephrase the sentence “new ambiguities and problems…”.

Line 163: “The first proposed FCNs were FCN-8s, FCN-16s, and FCN-32s.” please add references

Line 239: please clarify “*cite more*”

Please spell out the abbreviations “SVM”, “ARTMAP”, “PBIA”, “ISPRS”, and “GRSS” when they first appear.

Below please find the detailed comments on some grammatical errors (there are so many errors that I had to give up at Section 3):

  1. Line 5: “in” should be “of”
  2. Line 7: “the” is missing in front of “urban”
  3. Line 19: “to be” should be “being”
  4. Line 20: “a” is missing in front of “greater”
  5. Line 29: “a” is missing in front of “minimum”
  6. Line 31: “the” is missing in front of “spectral”
  7. Line 32: “the” is missing in front of “urban” and “same”
  8. Line 34: “a” is missing in front of “similar”
  9. Line 39: “the” is missing in front of “generation”
  10. Line 44: “for” is redundant
  11. Line 52: “a” is missing in front of “large”
  12. Line 56: “the” is missing in front of “classification”
  13. Line 60: “an” is missing in front of “adequately”
  14. Line 61: “have” should be “has”, “a” is missing in front of “core”
  15. Line 74: “a” is missing in front of “meta-analysis”
  16. Line 75: “a” is missing in front of “few”
  17. Line 76: “the” is missing in front of “study”, “a” is missing in front of “brief”
  18. Line 89: “and concludes …in Section 5” should be “Section 5 concludes…”
  19. Line 92: “the” is missing in front of “available”
  20. Line 103: “the” is missing in front of “majority”
  21. Line 106: “a” is missing in front of “fraction”
  22. Line 113: missing “the” in front of “input”
  23. Line 126: “was” should be “were”, missing “the” in front of “spotlight”
  24. Line 135: “used” is missing in front of “to accelerate”
  25. Line 137: missing “the” in front of “output”
  26. Line 140: missing “the” in front of “input”
  27. Line 149: missing “the” in front of “required”
  28. Line 150: “have” should be “has”
  29. Line 152: “on” is redundant
  30. Line 157: missing “a” in front of “pooling”
  31. Line 161: “the” is missing in front of “same”
  32. Line 165: “in” is missing in front of “finer”
  33. Line 167: “in” is missing in front of “classification”
  34. Line 170: “less” should be “fewer”
  35. Line 172: “the” is missing in front of “first”
  36. Line 174: “layers” should be “layer”
  37. Line 208: “the” is missing in front of “encoder-decoder”
  38. Line 218: “layers” should be “layer”
  39. Line 228: “a” is missing in front of “longer”
  40. Line 247: “the” is missing in front of “encoder”
  41. Line 253: “a” is missing in front of “fully”
  42. Line 255: “a FCN” should be “an FCN”
  43. Line 257: “for” is redundant
  44. Line 261: “a” is missing in front of “new”
  45. Line 266: “an” is missing in front of “active”
  46. Line 267: “the” is missing in front of “original”
  47. Line 270: “encodes” should be “encode”
  48. Line 271: “the” is missing in front of “sigmoid”
  49. Line 272: “as” should be “to”
  50. Line 274-275: “translate image from source domain to target domain” should be “translate an image from the source domain to the target domain”
  51. Line 275: “the” is missing in front of “target domain using GAN”
  52. Line 276: “the” is missing in front of “target dataset”
  53. Line 279: “the the label”, “the” is redundant
  54. Line 290: “a” is missing in front of “global”
  55. Line 297: “the” is missing in front of “highest”
  56. Line 302: “the” is missing in front of “experiment”
  57. Line 303: “other” should be “others”

Author Response

Reply to individual reviewer:

Comments and Suggestions for Authors

  1. Could the authors help clarify the spatial coverage of those 71 papers? Are they all for local domain study or they are for global study?

 

Response: Figure 1 has been added to show an overview of the first authors’ affiliations grouped by country and continent. Also, in line 376 of Section 3.2.2 (Data Sources), three new lines have been added to show the number of local-domain and global studies.

  2. It’s difficult to understand Fig. 3 and Fig. 6. How to connect the labels on the left to the ring chart on the right? The summation of count seems to be different in these two figures.

Response: Understanding the confusion caused by the previous diagram, we have split the previous Figure 3 into Figure 4 and Figure 5. More explanation has also been added to the captions.

Figure 4 shows an overview of the DL architectures employed. Encoder-decoder models such as FCN, U-Net, SegNet, DeepLab, Hourglass and others are the most commonly employed. Many papers employed more than one of these architectures and later fused the output feature maps, so the total count exceeds the number of papers reviewed.
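The counting effect described here (a total that exceeds the number of papers) can be illustrated with a small tally. This is a sketch with made-up records: the paper identifiers and their architecture lists below are hypothetical and are not taken from the 71 reviewed papers.

```python
from collections import Counter

# Hypothetical records: each paper maps to the architectures it employed.
papers = {
    "paper_A": ["FCN"],
    "paper_B": ["U-Net", "SegNet"],   # fuses the outputs of two architectures
    "paper_C": ["FCN", "DeepLab"],
}


def architecture_counts(records):
    """Count, per architecture, how many papers employed it."""
    counts = Counter()
    for architectures in records.values():
        # set() ensures a paper is counted once per architecture
        counts.update(set(architectures))
    return counts


counts = architecture_counts(papers)
total_mentions = sum(counts.values())

# The chart total is the number of (paper, architecture) pairs,
# which is larger than the number of papers when papers use several models.
print(f"{len(papers)} papers, {total_mentions} architecture mentions")
```

The same reasoning applies to the backbone counts: a paper that combines several architectures may contribute more than one backbone to the tally.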

Figure 5 shows an overview of the convolutional backbones employed. Not all papers name the backbone used to build their DL model. Among the papers that do mention a backbone, ResNet and VGG are the most commonly employed. As many papers used multiple DL architectures, those papers use more than one backbone. Lines 410 to 416 have been added for more explanation.

All figures, including the previous Fig 6 (now Fig 8), have been resized to make the text clearer.

  3. Suggest to re-order Appendix, because currently the appearance order is Appendix C, A, and B.

Response: The three appendixes are now in order of A, B and C in the manuscript.

  4. Line 120-122: please rephrase the sentence “new ambiguities and problems…”.

Response: Rephrased to “problems”

  5. Line 163: “The first proposed FCNs were FCN-8s, FCN-16s, and FCN-32s.” please add references.

Response: Citation [29] has been added in Line 164. All FCNs come from the same paper.

  6. Line 239: please clarify “*cite more*”

Response: Please excuse us for the blunder. We have fixed the technical error in LaTeX (this was a note for ourselves) and added more citations in line 244 (previously line 239).

  7. Please spell out the abbreviation of “SVM”, “ARTMAP”, “PBIA”, “ISPRS”, “GRSS” when it first appears

Response: The abbreviations have been added as follows:

  • SVM in line 103
  • ARTMAP in line 109
  • PBIA in line 25
  • ISPRS in line 313
  • GRSS in line 340
  8. Below please find the detailed comments on some grammatical errors (there are so many errors that I had to give up at Section 3):

Response: We are extremely thankful to you for pointing out the grammatical mistakes. In the new iteration, we have used software to find the grammatical errors and have corrected them. We noticed that writing in LaTeX did not help us, as non-native English speakers, in terms of grammar suggestions from software. This was a lesson for us.

Reviewer 3 Report

  1. Please check the format of this manuscript. For example, Line 74 is a different format.
  2. The language should be checked.
  3. Fig.1 should include (a) and (b).
  4. Fig.3 is not clear
  5. Fig.4 and Fig.5 can be changed into a single figure. Fig.7 and Fig.8 have the same case as well.

Author Response

Reply to individual reviewer:

Comments and Suggestions for Authors

  1. Please check the format of this manuscript. For example, Line 74 is a different format.

Response: The manuscript is prepared in the Overleaf LaTeX editor using MDPI’s LaTeX template under the following document class definition: \documentclass[remotesensing, review, submit, moreauthors, pdftex]{Definitions/mdpi}

The format should not be a problem because we used the LaTeX template. Perhaps there was an error in the document that the system sent to you? Otherwise, we found the document format to be fine. Please let us know if you still see the same problem. We may contact the handling editor.

  2. The language should be checked.

Response: We are extremely thankful for the comment. In the new iteration, we have used software to find the grammatical errors and corrected them manually after reflecting and re-reading. Also, we have changed the writing style from constructions such as “As reported by [4]” to “As reported by Cowen et al. [4]” throughout the paper.

  3. Fig.1 should include (a) and (b).

Response: We have corrected the issue in previous Fig 1, now Fig 2 (a) and Fig 2 (b).

  4. Fig.3 is not clear

Response: We have corrected the issue in the previous Fig 3, now Fig 4 and Fig 5. Understanding the confusion caused by the previous diagram, we have split the previous Fig 3 into Fig 4 and Fig 5. More explanation has also been added to the captions.

Figure 4 shows an overview of the DL architectures employed. Many papers employed more than one of these architectures and later fused the output feature maps, so the total count exceeds the number of papers reviewed.

Figure 5 shows an overview of the convolutional backbones employed. Not all papers name the backbone used to build their DL model. Among the papers that do mention a backbone, ResNet and VGG are the most commonly employed. As many papers used multiple DL architectures, those papers use more than one backbone. Lines 410 to 416 have been added for more explanation.

All figures have been resized to make the labels and text clearer.

  5. Fig.4 and Fig.5 can be changed into a single figure. Fig.7 and Fig.8 have the same case as well.

Response: The previous Fig 4, 5, 7 and 8 (now Fig 6, 7, 9 and 10) have now been separated and each placed on its own line, as the previous figures were not clear.

Reviewer 4 Report

Lidar datasets are an important data source for urban land area classification and urban remote sensing. The authors only mention them at the end of the paper. I would like to suggest that the authors define the extent of the topic. The title is “Deep learning-based semantic segmentation of urban remote sensing images”. However, the paper is mostly about deep learning for urban land cover classification or urban feature extraction, and urban remote sensing is not only about classification. It is better to rethink the objectives of the paper and give a concise title.

Line 28-30: “As reported by [4]”. Please consider adding the authors’ names. Please check throughout the manuscript. For example, Line 154: When [23] first proposed FCN…. It does not read fluently.

Line 30-38: Citations are needed.

Line 39-50: Citations are needed.

Line 89-90: grammar issue.

Line 72: PASCAL VOC: full name needed.

Line 239: *cite more* error.

Author Response

Reply to individual reviewer:

Comments and Suggestions for Authors

  1. Lidar datasets are an important data source for urban land area classification and urban remote sensing. The authors only mention them at the end of the paper. I would like to suggest that the authors define the extent of the topic. The title is “Deep learning-based semantic segmentation of urban remote sensing images”. However, the paper is mostly about deep learning for urban land cover classification or urban feature extraction, and urban remote sensing is not only about classification. It is better to rethink the objectives of the paper and give a concise title.

Response: Understanding and acknowledging the problem, we have changed our title to “Deep learning-based semantic segmentation of urban features in satellite images: A review and meta-analysis”.

  2. Line 28-30: “As reported by [4]”. Please think about added the authors’ names. Please check throughout the manuscript. For example, Line 154: When [23] first proposed FCN…. . It is somewhat not fluent.

Response: We have now fixed the writing issues throughout the manuscript. Thank you.

  3. Line 30-38: Citations are needed.

Response: The statements are now supported with citations.

  4. Line 39-50: Citations are needed.

Response: The statements are now supported with citations.

  5. Line 89-90: grammar issue.

Response: Changed to “Section 5 concludes…”

  6. Line 72: PASCAL VOC: full name needed.

Response: Spelled out as “PASCAL Visual Object Classes (VOC)” in Line 73.

  7. Line 239: *cite more* error.

Response: Please excuse us for the blunder. We have fixed the technical error in LaTeX and added more citations in line 244 (previously line 239).

Round 2

Reviewer 2 Report

The authors did a good job responding to my previous comments. This article is recommended for publication after correcting the following grammatical errors:

  1. Line 17: “classes” may be changed to “objects”
  2. Line 45: change “For e.g.” to “E.g.”
  3. Line 67: missing “the” in front of “segmentation”
  4. Line 94: “list of available …” should be “a list of the available …”
  5. Line 107: “was” should be “were”
  6. Line 139: missing “the” in front of “output”
  7. Line 154: redundant “on” in front of “semantic”
  8. Line 171: missing “the” in front of “max pooling”
  9. Line 192: “bu” should be “by”
  10. Line 200: missing “a” or “the” in front of “pyramid scene….”
  11. Line 205: missing “the” in front of “concatenated”
  12. Line 208: missing “the” in front of “highest”
  13. Line 271: missing “a” or “the” in front of “generator”
  14. Line 273: missing “a” or “the” in front of “decoder”
  15. Line 276: missing “a” or “the” in front of “discriminator”
  16. Line 278: missing “the” in front of “generator”
  17. Line 282: missing “the” in front of “source”
  18. Line 286: redundant “the” in front of “the label”
  19. Line 327: missing “the” in front of “first”, and “is shown” should be “are shown”
  20. Line 328: missing “the” in front of “previous”
  21. Line 347: “for” should be “to”
  22. Line 357: missing “a” in front of “publicly”
  23. Line 361: “satellite image” should be “satellite images”
  24. Line 377: “local(global) domain study” should be “local(global) domain studies”, and missing “the” in front of “dataset”
  25. Line 382: “method” should be “methods”
  26. Line 399: “Majority” should be “The majority”
  27. Line 401: “are” should be “is”
  28. Line 404: “includes” should be “include”
  29. Line 413: “the” is missing in front of “reduction”
  30. Line 423: missing “a” or “the” in front of “special”
  31. Line 437: missing “the” in front of “learning rate”
  32. Line 447: missing “the” in front of “most”
  33. Line 452: missing “the” in front of “metric”
  34. Line 454: missing “the” in front of “ScasNet”
  35. Line 456, 483: missing “the” in front of “ISPRS”
  36. Line 462: missing “the” in front of “Massachusetts”
  37. Line 469: “dataset” should be “datasets”
  38. Line 495: missing “the” in front of “remaining”
  39. Line 512: “dataset” should be “datasets”
  40. Line 513: missing “an” in front of “ablation”
  41. Line 516: missing “the” in front of “boundary”
  42. Line 518: missing “of” in front of “papers”
  43. Line 520: “dataset” should be “datasets”
  44. Line 527: missing “a” in front of “few”
  45. Line 532, 560: “a” should be “an” in front of “RF”
  46. Line 536, 553: missing “a” in front of “slight”
  47. Line 573: missing “the” in front of “use”
  48. Line 575: missing “a” in front of “symmetrical”
  49. Line 581: missing “the” in front of “accuracy”
  50. Line 596, 601: missing “the” in front of “Potsdam”
  51. Line 624: “to” should be “of”
  52. Line 661: “fom” should be “from”
  53. Line 666: missing “a” in front of “high”
  54. Line 667: “have” should be “has”
  55. Line 669: “dataset” should be “datasets”
  56. Line 674: missing “a” in front of “few”
  57. Line 675: “colaboratory” should be “Collaboratory”
  58. Line 676: redundant “about” in front of “the GPU”
  59. Line 677: missing “the” in front of “rapid” and “latest”
  60. Line 688: missing “a” in front of “more”
  61. Line 690: missing “the” in front of “most”
  62. Line 694: missing “the” in front of “training”

Author Response

Dear Reviewer,

Thank you for your constructive suggestions and comments.

This letter responds to the points raised by the academic editor and reviewers on the previously submitted manuscript “Deep learning-based semantic segmentation of urban features in satellite images: A review and meta-analysis”.

There have been some minor changes to the manuscript, and we summarize them in this letter. We will upload the new iteration and the tracked changes after we receive the responses to the round 1 revision from all the reviewers.

Best Regards,

On behalf of the co-authors

 

Changes Summary

The changes in the new version of the manuscript include:

  1. Grammatical corrections as suggested by the reviewer.

Given Comments and Suggestions for Authors

The authors did a good job responding to my previous comments. This article is recommended for publication after correcting the following grammatic errors:

Response: Our deepest gratitude for the thorough review. We have made the changes you suggested for all of the comments except the following two, which we have handled as described below:

  1. Line 675: “colaboratory” should be “Collaboratory”

Response: It should be “Google Colaboratory”. We capitalized “C”.

  1. Line 688: missing “a” in front of “more”

Response: “availability of more public datasets including VHR imagery as well as coarser resolution imagery”

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

This manuscript surveys a large number of semantic segmentation approaches for RS images, especially DL-based work. The matter is of interest. I appreciate the authors' meticulous and comprehensive work in this review. However, the paper suffers from the following limitations:

(1) My biggest concern is whether the authors ran the 52 comparative experiments under consistent conditions (including experimental environment and benchmark datasets). In Section 4.9, the authors give a very detailed and efficient explanation of the performance comparison, but it seems to me that these specific metric results are taken from the original papers. Such a performance comparison is not convincing.

(2) I really like the layout of the authors' manuscript; it is very clear and easy to follow. However, in Section 3.2, a large amount of content covers the background and development of CNNs and FCNs. There is no doubt that these are the basis for the semantic segmentation of RS images. However, they are less relevant to the topic of this paper than the literature in Section 4. I advise the authors to cut this part down.

Reviewer 2 Report

  • Line 30. DL-based techniques. This is the first time the acronym is used in the paper. You need to clarify that it means Deep Learning.
  • The quality of written English should improve. Particularly note that the paper seems rather 'staccato' in presentation in some parts.
  • About the databases the authors mention that “IEEE Xplore and ScienceDirect, and from the web scientific indexing services Web of Science and Google Scholar”. Two issues: First, there are overlaps between these databases. Second, the statement “web scientific indexing services” is misleading. It is not web scientific indexing. Rather, for instance, web of science only indexes scientific journals.
  • How was the search done? Did you search in “titles, abstracts, and keywords”, for example? How did you do it in Google Scholar?
  • How many articles were returned from each database? How did you do the screening? How many overlaps? Other issues… please provide details…
  • The search string does not seem to be correct. You want to focus on studies that are focused on urban feature classification. Therefore, the string should be: [[“semantic segmentation”] OR [“pixel-level classification”]] AND [“urban feature classification”] AND [“satellite imagery”]]. In addition, “urban feature classification” is not representative of all studies that use these methods. A paper, for example, may use the methods for examining land use change in cities without referring to “urban feature classification”. Please address these major issues related to the methods. Otherwise the study results will not be accurate.
  • The research questions are too many.
  • Line 116: “…our survey of 52 papers that used DL-based methods”. This means that “deep learning” should have been included in the search string.
  • The results and discussions sections are lengthy and hard to follow. These should be presented in a much more concise form. Tables and figures should be used to summarize the contents and present them more concisely.
  • The Conclusions section is reductionist and does not provide what is expected to be presented in the Conclusions. It stands like a repetition of the Abstract. I recommend combining discussions and conclusions. In the new combined section, the authors must clearly explain how the 9 research questions mentioned in the Methods section (page 3) have been addressed. In other words, responses to each question should be clearly mentioned (for instance, as separate sub-sections). This also helps improve the presentation style of the paper. Currently, a lot of information has been presented in the results and discussions sections but it is lengthy and can even be presented as Appendix in some cases. Instead of such lengthy details that would make it challenging for the audience to follow the discussions, the authors must clearly provide responses to the research questions (which I believe could have been a smaller number of questions)
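The reviewer's point about the search string can be illustrated with a small filter. This is only a sketch: it assumes a plain case-insensitive substring match on titles, which is not how IEEE Xplore, ScienceDirect, Web of Science or Google Scholar actually evaluate queries, and the three record titles are hypothetical. The term groups follow the string proposed in the comments above.

```python
# (A OR B) AND C AND D, following the reviewer's proposed search string
ANY_OF = ["semantic segmentation", "pixel-level classification"]
ALL_OF = ["urban feature classification", "satellite imagery"]


def matches(text, all_of, any_of):
    """True if text contains every term in all_of and at least one in any_of."""
    lowered = text.lower()
    has_all = all(term.lower() in lowered for term in all_of)
    has_any = any(term.lower() in lowered for term in any_of)
    return has_all and has_any


# Hypothetical titles, used only to show the effect of the AND terms
records = [
    "Semantic segmentation for urban feature classification in satellite imagery",
    "Pixel-level classification of land use change in cities from satellite imagery",
    "Object detection in satellite imagery",
]

selected = [title for title in records if matches(title, ALL_OF, ANY_OF)]
```

In this toy example the second title uses one of the methods but never says “urban feature classification”, so the strict AND term drops it. This is the kind of omission the reviewer warns about.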

 

 

Reviewer 3 Report

see attached file, please.

Comments for author File: Comments.docx

Reviewer 4 Report

remotesensing-954405

 

The manuscript “Semantic segmentation of remote sensing images for urban feature classification: A survey” addresses an interesting and up-to-date subject, which adheres to the policies of the Remote Sensing journal.

 

In this research, which is a review, 52 papers that use deep learning methods for image classification were evaluated.

 

The manuscript contains interesting discussions and good bibliographic documentation, well fitted to the context. In addition, the work is well conceived, realized and written, so I did not identify any deficiencies or shortcomings.

 

The manuscript contains genuine work, and I am not sure whether it is the bibliographic part (first part) of the main author’s PhD thesis or a standalone review.
