Article
Peer-Review Record

Multi-Stage Semantic Segmentation Quantifies Fragmentation of Small Habitats at a Landscape Scale

by Thijs L. van der Plas 1,2, Simon T. Geikie 2, David G. Alexander 2,† and Daniel M. Simms 3,*,†
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 9 October 2023 / Revised: 3 November 2023 / Accepted: 4 November 2023 / Published: 7 November 2023

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

High-accuracy remote sensing classification of land cover has important ecological significance. This paper presents a Machine Learning approach, using Convolutional Neural Networks (CNNs), to classify Land Cover across the Peak District National Park in the UK, and then analyzes the degree and distribution of fragmentation of wet grassland and rush pasture at a landscape scale in the PDNP. The manuscript is generally well structured, with sound methods, results and analysis. There are some issues that need to be noted:

1) In the Introduction Section, it is suggested to introduce the research progress and existing problems of machine learning technology, as well as the objectives of this paper;

2) The contents of research methods, results or conclusions, such as Lines 55-87, should not be included in the Introduction Section, but rather in the Methods, Results and Conclusion Sections;

3) Because there are complex inclusions and cross-relationships among the training sample attributes, the labeling of training samples has a great impact on classification results. How did you deal with this problem?

4) There are 1,027 patches (64 m × 64 m) selected for training and testing the models. Is this enough? Did you consider augmenting the samples to a suitable number?

5) Pay attention to some details: for example, in Line 108, “for this task; the U-Net [11,33,34].”, change “;” to “,”.

Comments on the Quality of English Language

Pay attention to some details: for example, in Line 108, “for this task; the U-Net [11,33,34].”, change “;” to “,”.

Author Response

We thank the reviewer very much for their time and their positive evaluation of our manuscript. In the following, all reviewer questions are listed in black and our responses in blue. All changes are highlighted in the marked-up version of the manuscript.

1) In the Introduction Section, it is suggested to introduce the research progress and existing problems of machine learning technology, as well as the objectives of this paper;

Thank you for your suggestion. We had tried to focus the paper on the practical application of CNNs, which overcomes many of the limitations in ‘shallow’ ML approaches. We can see that in doing so we neglected an introduction to ML in LC mapping, and have now taken on board your (and Reviewer 3’s) comments. New paragraph added to the introduction with references (second paragraph).

The objectives are stated as challenges to overcome in the second to last paragraph of the introduction.

2) The contents of research methods, results or conclusions, such as Lines 55-87, should not be included in the Introduction Section, but rather in the Methods, Results and Conclusion Sections;

We have taken the decision to present an overview of our approach in the introduction as, because of the technical nature of the work, we feel the manuscript is more easily understood using this structure. However, we do agree that there is detail that can be moved to more appropriate sections without breaking the intended narrative. Introduction edited.

3) Because there are complex inclusions and cross-relationships among the training sample attributes, the labeling of training samples has a great impact on classification results. How did you deal with this problem?

The model considers RGB image patches of 64 m × 64 m, which are often composed of multiple LC classes; the examples in Figs 3 and 4 illustrate this. It is therefore not possible to give each image patch a single label, as such a label would contain many errors. Instead, we labelled each pixel individually (by drawing boundaries between LC classes), and let the model perform a semantic segmentation task rather than an image classification task (following the terminology of Kattenborn et al., 2021, section 3.2.2). Hence, our model performs end-to-end learning from RGB pixels to LC pixels. Notably, the convolutional networks that we used exploit the patterns between (nearby) pixels. We hope that this answers your question.
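For illustration, the difference in label structure can be sketched as follows (a simplified numpy example, not our published implementation; the array shapes follow the 64 m × 64 m patches at 12.5 cm resolution, and the class indices are arbitrary):

```python
import numpy as np

# One 64 m x 64 m RGB patch at 12.5 cm resolution: 512 x 512 pixels, 3 bands.
patch = np.zeros((512, 512, 3), dtype=np.float32)

# Image classification would assign ONE label to the whole patch, which is
# lossy whenever the patch contains multiple LC classes:
patch_label = 3  # a single (illustrative) class index

# Semantic segmentation instead assigns a label to EVERY pixel, so the class
# boundaries drawn by the annotator are preserved inside the patch:
pixel_labels = np.zeros((512, 512), dtype=np.int64)
pixel_labels[:, :256] = 3  # e.g. left half one class ...
pixel_labels[:, 256:] = 7  # ... right half another (indices are illustrative)
```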

4) There are 1,027 patches (64 m × 64 m) selected for training and testing the models. Is this enough? Did you consider augmenting the samples to a suitable number?

Thank you for this question. We consider this to be a sufficient amount of data, because the model was able to learn all LC classes. It is important to highlight that 1,027 patches does not equal 1,027 data points, because this is a semantic segmentation task, meaning that every pixel in each image (512² = 262,144 pixels per image patch) is classified (with, of course, strong correlation among nearby pixels), as opposed to image classification tasks where each image patch would constitute only a single data point. The example patches in Fig 4e illustrate this, showing that there can be many different LC elements within one image patch. Lastly, we used data augmentation (flipping horizontally and/or vertically) during training (Methods 2.7).
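For concreteness, the flip augmentation can be sketched as follows (a simplified example rather than our published implementation); the essential point is that the image and its label mask are flipped together:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

def flip_augment(image: np.ndarray, mask: np.ndarray):
    """Randomly flip a (H, W, 3) image and its (H, W) label mask together,
    so that every pixel label stays aligned with its pixel."""
    if rng.random() < 0.5:                      # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:                      # vertical flip
        image, mask = image[::-1], mask[::-1]
    return image.copy(), mask.copy()
```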

We trust that this answers your question. 

5) Pay attention to some details: for example, in Line 108, “for this task; the U-Net [11,33,34].”, change “;” to “,”.

Done, thank you. We have critically re-read the manuscript to avoid any other typos. 

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Remote sensing technology can play an important role in the analysis of ecological diversity, which is well reflected in this paper. We have some suggestions regarding the land cover classification methods.

(1) The APGB remote sensing images were collected multiple times within 2 years. Was any radiometric correction applied to images from different years to ensure uniformity of the image spectra? Please address this in the pre-processing or provide an explanation.

(2) The paper mentions five CNN models multiple times; please describe the role of each model and the processing flow involving all five models, and explain how each CNN model is trained with one labelled data set.

(3) The paper's treatment of the F3d class is relatively novel, and it is recommended to provide a comparison of experimental results between traditional and new methods.

Author Response

We thank the reviewer very much for their time and their positive evaluation of our manuscript. In the following, all reviewer questions are listed in black and our responses in blue. All changes are highlighted in the marked-up version of the manuscript.

(1) The APGB remote sensing images were collected multiple times within 2 years. Was any radiometric correction applied to images from different years to ensure uniformity of the image spectra? Please address this in the pre-processing or provide an explanation.

We thank you, as well as reviewer 3, for raising this important point. The variability in fly-date is inherent to this data set, and so we trained the model on data from all months and years to make the model predictions robust to seasonal/annual variability. To do so, we sampled data from across the study area (Figure 1c). No (radiometric) correction was applied except for z-scoring, as detailed in section 2.7 (Model training). We have added an explanation in the discussion (section 4.1) to further address this point.
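As an illustration of the z-scoring step, a simplified sketch, assuming per-band statistics computed on the training set (the numbers below are placeholders, not our actual statistics):

```python
import numpy as np

# Placeholder per-band (R, G, B) statistics from the training set:
band_mean = np.array([0.42, 0.45, 0.38], dtype=np.float32)
band_std = np.array([0.19, 0.18, 0.17], dtype=np.float32)

def z_score(patch: np.ndarray) -> np.ndarray:
    """Standardise a (H, W, 3) patch to zero mean and unit variance per band,
    making the model less sensitive to brightness differences between flights."""
    return (patch - band_mean) / band_std
```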

(2) The paper mentions five CNN models multiple times; please describe the role of each model and the processing flow involving all five models, and explain how each CNN model is trained with one labelled data set.

Fig 2a provides a flow chart of the methodology and different CNN models. We have now improved the references to this figure in Methods to clarify this. Indeed, we have one labeled data set (split into a train and test set). The single-stage classifier (used for control analysis) was trained directly on these labels. The main classifier was trained on the corresponding high-level labels (for example, C1, C2, C4 and C5 were all relabeled to C). The three detailed classifiers were only trained on their relevant classes (for example, the C classifier was only trained to distinguish C1, C2, C4 and C5). We have now clarified this in Methods. 
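In outline, the relabelling and two-stage inference can be sketched as follows (the class encodings and model interfaces are illustrative; see our published code for the actual implementation):

```python
import numpy as np

# Illustrative integer encoding: indices 0..3 stand for C1, C2, C4, C5 and
# 4..5 for two D sub-classes; the main classes here are 0 = C and 1 = D.
DETAILED_TO_MAIN = np.array([0, 0, 0, 0, 1, 1])

def to_main_labels(detailed_mask: np.ndarray) -> np.ndarray:
    """Collapse a per-pixel detailed-label mask to high-level labels
    (e.g. C1/C2/C4/C5 all become C) to train the main classifier."""
    return DETAILED_TO_MAIN[detailed_mask]

def two_stage_predict(patch, main_model, detailed_models):
    """The main model predicts high-level classes; each detailed model then
    refines only the pixels belonging to its own high-level class."""
    main_pred = main_model(patch)             # (H, W) of main-class indices
    final = main_pred.copy()
    for main_cls, model in detailed_models.items():
        sel = main_pred == main_cls
        final[sel] = model(patch)[sel]        # detailed labels for these pixels
    return final
```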

(3) The paper's treatment of the F3d class is relatively novel, and it is recommended to provide a comparison of experimental results between traditional and new methods.

The traditional method for mapping rush pasture (F3d) is through ground surveys or visual interpretation of aerial photography. Ground survey is not feasible at the scale of our study (1,439 km²), so we compared the CNN results to visual interpretation by experts (verification data had a total area of 1.26 km²). The resulting precision and sensitivity values are denoted in Tables 3 and 4. Crucially, the major advantage of our CNN method is that we can now apply it at scale, allowing us to map rush pasture in detail across the Peak District. We have now further motivated this in Methods 2.6.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

A comparison of different Deep Learning methods would be appropriate to see the model's efficiency and accuracy; currently, there are only two approaches one can see. Please include relevant literature and expand the Introduction. Include more screenshots from different areas of the park for the classified maps. Can using other variables such as PCA, ICA, and texture aid in more accurate classification?

Comments are mentioned in the PDF.

Comments for author File: Comments.pdf

Comments on the Quality of English Language

Needs minor changes to increase the readability for a broader audience.

Author Response

We thank the reviewer for their detailed, constructive comments. Below, we have addressed their major comments (in blue), while all other minor comments (in the PDF) have been resolved directly, unless further addressed in the below. All changes are highlighted in the marked-up version of the manuscript.

A comparison of different Deep Learning methods would be appropriate to see the model's efficiency and accuracy; currently, there are only two approaches one can see.

This is a good point, and we have indeed tried two different backbones, as well as two different loss functions (Methods 2.7). Other parameters were not varied for performance reasons: image input size and batch size were maximised for the available computational resources, and the standard, best available optimizer (Adam) and network architecture for semantic segmentation (U-Net) were used. For example, Lobo Torres et al. (2020) compared five architectures (U-Net, SegNet, FC-DenseNet and two DeepLab variants) on a remote sensing tree semantic segmentation task, and found that U-Nets provided the second-best performance and second-lowest training time (while no method ranked first in both). We have now added this reference to the manuscript for extra clarification. It is also worth noting that many other backbones, such as the Inception-ResNet-v2 suggested by the reviewer, have substantially more parameters (2.5× as many as the ResNet50 that we used). This would increase the computational load, which would in turn require a smaller batch size, etc.

Please include relevant literature and expand the Introduction. 

Thank you for the suggested papers, we have now included the most relevant ones in our revised introduction (and another one was already implemented in our paper elsewhere). We had tried to focus the paper on the practical application of CNNs, which overcomes many of the limitations in ‘shallow’ ML approaches. We can see that in doing so we neglected an introduction to ML in LC mapping, and have now taken on board your and Reviewer 1’s comments. New paragraph added to the introduction with references (second paragraph).

Include more screenshots from different areas of the park for the classified maps. 

Thank you for this suggestion. We have now also added an extra figure - Fig 8 - that shows the RGB image and LC classifications for the example of Fig 7a. 

Can using other variables such as PCA, ICA, and texture aid in more accurate classification?

The CNN is able to learn spatial patterns end-to-end from the raw input image data, effectively incorporating texture, shape and local context within the model without the need for complex preprocessing and feature selection. In this way, CNNs differ from ‘shallow’ ML approaches like Random Forests, and therefore don’t require feature selection by PCA etc. This is an important point, and has now been added to the introduction.

[From PDF, line 100] Is the dataset inherently 3 bands?

Yes, we have updated Methods 2.2 to make this clear.

[From PDF, line 260] What was the resolution of the soils data? Did the authors field-verify it? How was it incorporated with respect to the high resolution of the imagery?

These soil data - the only publicly available peat soil data set at this scale and location - are provided as a vector file (https://naturalengland-defra.opendata.arcgis.com/datasets/1e5a1cdb2ab64b1a94852fb982c42b52_0/about). To combine this with the model predictions, we vectorised the model predictions (at 12.5 cm resolution), and then computed the intersection between the .shp files. Given the difficulty of measuring soil composition, we did not field-verify it. We would like to highlight that these data only serve the purpose of splitting 3 classes (D1, D2 and D6) into their sub-classes (peaty and non-peaty variants), so it is up to the end-user whether they choose to use the model predictions split, or not split, by soil content. All accuracy metrics presented in the paper are for model results before post-processing with ancillary data.
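In outline, this post-processing can be sketched with rasterio and geopandas as follows (the file names are hypothetical, and this is a simplified sketch rather than our exact pipeline):

```python
import geopandas as gpd
import rasterio
from rasterio import features
from shapely.geometry import shape

# Vectorise the classified raster (12.5 cm resolution predictions):
with rasterio.open("lc_predictions.tif") as src:      # hypothetical file name
    pred = src.read(1)
    crs = src.crs
    records = [
        {"geometry": shape(geom), "lc_class": int(value)}
        for geom, value in features.shapes(pred, transform=src.transform)
    ]
pred_gdf = gpd.GeoDataFrame(records, geometry="geometry", crs=crs)

# Intersect with the peaty-soils vector layer, splitting the predicted
# polygons (classes D1, D2, D6) into peaty and non-peaty parts:
peat_gdf = gpd.read_file("peaty_soils.shp").to_crs(crs)  # hypothetical file name
split = gpd.overlay(pred_gdf, peat_gdf, how="intersection")
```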

[From PDF, Fig 4d-iv] What is the issue with image iv, classes D1, D3, D6? It does not look correctly classified. & [From PDF, Fig 5] Either use a different color scheme or show more snippets of classified areas; currently, these snippets visually appear to contain classification issues.

Thank you for your question. We have reviewed these figures and believe that these images are well classified (within the stated accuracy). Fig 4d-iv shows the correct classification of Bracken (D3) in the NE corner, with some (correct) spill-over in the NW corner, dense Heather (D1) in the centre of the image, and heather/grass mosaics (D6) in the SE corner. Fig 5 shows much larger areas, and indeed contains some minor misclassifications. For example, in 5b-i, some cut/burned heather (D1) patches are classified as heather/grass mosaic (D6) (NW corner, S), and some lead mining features are misclassified as broadleaved trees (C1) (centre of image). We believe that this is a realistic representation of the model performance, which includes small features that are misclassified.

[From PDF, line 325] How many training patches were used for each class? It would be good to know the training sample distribution.

Thank you for this suggestion. We have now added a second y-axis to figures 3b and 4b that shows the equivalent number of full patches. (In other words, the total area per class divided by the area per patch). 
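The conversion behind this second axis is simply the total annotated area divided by the 64 m × 64 m patch area (4,096 m²); a minimal matplotlib sketch with placeholder values:

```python
import matplotlib.pyplot as plt

PATCH_AREA_M2 = 64 * 64                       # 4,096 m^2 per annotation patch

class_codes = ["C1", "D1", "F3d"]             # placeholder class codes
area_m2 = [120_000, 80_000, 40_000]           # placeholder annotated areas

fig, ax = plt.subplots()
ax.bar(class_codes, area_m2)
ax.set_ylabel("Total annotated area (m$^2$)")

# Second y-axis expressing the same totals as equivalent full patches:
sec = ax.secondary_yaxis(
    "right",
    functions=(lambda a: a / PATCH_AREA_M2,   # m^2 -> number of patches
               lambda n: n * PATCH_AREA_M2),  # patches -> m^2
)
sec.set_ylabel("Equivalent number of 64 m × 64 m patches")
plt.show()
```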

[From PDF, line 411] Data DOI not found

We apologise for the confusion. Although we have already uploaded the data to the data repository, the DOI/data set will be published upon publication of this article, as the repository cannot be edited [during the review process] once the DOI is published. For your information, the repository contains all train/test data samples (RGB images + our LC annotations), in both Python format (directly compatible with our published code) and in TIF format (compatible with any data analysis software). Furthermore, the repository contains extensive data descriptions, and it is released with a CC-BY-4.0 licence.

[From PDF, line 413] The data was flown from 2019-2022, across different months, so how did you address the variability? 

We thank you, as well as reviewer 1, for raising this important point. The variability in fly-date is inherent to this data set, and so we trained the model on data from all months and years to make the model predictions robust to seasonal/annual variability. To do so, we sampled data from across the study area (Figure 1c). No (radiometric) correction was applied except for z-scoring, as detailed in section 2.7 (Model training). We have added an explanation in the discussion (section 4.1) to further address this point.

[From PDF, line 453] How much area was verified using ground truth?

30% of the 1,027 annotated patches were used for verification (with site visits in any areas of labelling uncertainty), which totals 1.26 km²; the model was trained on the remaining 70%, i.e., 2.94 km². (In total, 1,027 patches × 4,096 m² per patch ≈ 4.21 km².)

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

In this manuscript, the authors developed a Machine Learning approach to map land cover at high resolution and fine class detail, targeting smaller and more specific habitats.

The authors present very interesting work, and a huge amount of effort was made.

The authors have conducted thorough experiments, employed appropriate methodologies, and analyzed the data effectively. Overall, the materials and methods sections are well described. However, I suggest adding a flowchart of the different steps of the methodology.

The results are well presented.

Unfortunately, the discussion is very limited, mainly the subsection "Application of classifying fragmentation of patch habitats at a landscape scale". I suggest building specific research questions and hypotheses so that the discussion will be more structured.

I suggest adding a brief paragraph about the implications of this study to the conclusion section.

Author Response

We thank the reviewer very much for their time and their positive evaluation of our manuscript. In the following, all reviewer questions are listed in black and our responses in blue. All changes are highlighted in the marked-up version of the manuscript.

The authors have conducted thorough experiments, employed appropriate methodologies, and analyzed the data effectively. Overall, the materials and methods sections are well described. However, I suggest adding a flowchart of the different steps of the methodology.

Thank you for your positive evaluation of our work. We had already included a flowchart of the methodology, Fig 2a, but now realise that this wasn’t sufficiently referenced in Methods. We have now added more references to link each methodological step to this flow chart. 

Unfortunately, the discussion is very limited, mainly the subsection "Application of classifying fragmentation of patch habitats at a landscape scale". I suggest building specific research questions and hypotheses so that the discussion will be more structured.

We have now substantially added to the discussion section. This includes a broader discussion on habitat fragmentation in that subsection. Our introduction states our research question as challenges, which we reflect on in the discussion. The goal of the habitat fragmentation analysis was to develop and demonstrate the possibilities of high spatial and class resolution LC maps. We have now provided more context to this in the discussion.

I suggest adding a brief paragraph about the implications of this study to the conclusion section.

Thank you for your suggestion, we have added information on impact in the conclusion.



Author Response File: Author Response.pdf
