Article
Peer-Review Record

FARMSAR: Fixing AgRicultural Mislabels Using Sentinel-1 Time Series and AutoencodeRs

by Thomas Di Martino 1,2,*, Régis Guinvarc’h 1, Laetitia Thirion-Lefevre 1 and Elise Colin 2
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 24 October 2022 / Revised: 15 December 2022 / Accepted: 17 December 2022 / Published: 21 December 2022

Round 1

Reviewer 1 Report

The research designs a method to improve the quality of satellite image training data: crop-type mislabels and mis-split crops exist in agricultural crop-type surveys. But the MS is unclear on some major problems.

 

The major problem is that the MS does not show the necessity of label correction, especially in the Introduction. The literature you cite as [23] mentions that “The results show that both classifiers are little influenced for low random noise levels up to 25%–30%, but their performances drop down for higher noise levels.”

Chen et al. (2019) draw almost the same conclusion: the classification model is robust to mislabels with error rates within 10–15% (shown below).

Figure 1. Sample robustness to size reduction and errors in the sample. (a) As the sample size increases, the accuracy quickly reaches a plateau. (b) As the impurity percentage of the sample increases, the accuracy decreases (Chen et al. 2019).

Chen, B., Xu, B., Zhu, Z., Yuan, C., Suen, H. P., Guo, J., ... & Yang, J. (2019). Stable classification with limited sample: Transferring a 30-m resolution sample set collected in 2015 to mapping 10-m resolution global land cover in 2017. Sci. Bull. 64, 370–373.

 

But you mention that “the impact of errors in agricultural training labels and show a non-negligible impact of label noise on prediction performance, even at low noise levels.”, which is in contrast with the original statements. What is the error rate in the dataset?

 

Specific comments:

1. It is hard to understand what this MS wants to do. The first paragraph of the Introduction introduces the advantages of SAR data. Then you tell us that DL models can be used for anomaly detection. What is the connection? Could you give more of an introduction to sample denoising? The abstract has the same problem.

2. There is a lack of an overall framework/flowchart to make the whole research clear.

3. Directly use the example of mis-split and mislabeled parcels in your dataset. Figure 1 can be merged with Figure 4 or removed.

4. More detail about the “Convolutional autoencoder for SAR time series” is needed. What are the linear layers? Which framework did you use to build the CAEs, PyTorch or TensorFlow?

5. What is the potential usage of the proposed method? How accurate is it when applied to a larger area?

6. Equation 1 gives too little information.

7. If you use the mean-squared error loss, the loss will fluctuate dramatically; how do you determine the optimal model? Also, the training details of the CAE are missing.

8. Figure 7 is hard to read.

9. Line 263: how was the CAE trained? What training dataset and training classes were used? What are the training details? Which optimizer was used?

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

Thank you for your thorough review and comments. You will find in the attachment our response to your comments.

Author Response File: Author Response.pdf

Reviewer 2 Report

Very nicely written explorative study considering novel methods for anomaly detection in the case of (manual) labeling of field parcels corresponding to different crop types. I enjoyed reading the manuscript and agree with everything that was discussed and proposed in the text. Moreover, the main idea of exploring the feature space of SAR image time series in order to reduce the initial set of labeled pixels, i.e., parcels, to a more consistent subset of trustworthy samples is very similar to one of the most well-known robust estimation algorithms, RANSAC. Of course, the applications and context are totally different, but the idea of discarding the outliers is important and, I would say, inspiring.

I have a few suggestions that could further improve the study, but only in terms of the fidelity of the experiments, which is already at a high level. Namely, Otsu’s method as an automated and adaptive threshold-selection criterion is very good for bimodal distributions and in general can serve the purpose (as was confirmed by the results reported in the paper). However, it would also be interesting to investigate further the sensitivity of the iterative CAE outlier-filtering procedure, in terms of threshold-dependent performance. In other words, could you please also add an additional set of results that compares the reported Otsu-based thresholding with a simpler one based on the usual Neyman-Pearson detection criterion. For example, it would be interesting to compare the best-performing curves from Figure 13, the ones corresponding to the proposed method, with additionally generated curves corresponding to a set of Neyman-Pearson thresholds for each class, defined by false-positive rates (sizes of the hypothesis tests) of, e.g., 2%, 5%, 10%, 20%. This would give more insight into the advantages of the proposed threshold-selection criterion, but also shed some light on the system performance in the case that the user would like to explicitly control the sensitivity (discarding the Otsu strategy). For example, in the case of an unbalanced dataset, I might want to be more certain about the labels of some classes that were hard to collect, or that were scarce in the area and are therefore outnumbered by the remaining ones.

Besides this point, I would also suggest checking the label on the ordinate of the diagram in Figure 13a; it seems to be mistyped. In addition, I would suggest removing the attribute “linear” from the feature-dimension-reduction blocks in Figure 6, since, based on Table 2, it is clear that the elements of the layers are not linear but have some form of nonlinearity (max-pooling steps, and also the exponential linear units; the abbreviation ELU is not defined in the text). In other words, the dimensionality reduction is non-linear, although it would be interesting to have an autoencoder that could be replaced by a simple matrix (an alternative projection method to PCA).
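(For illustration only, a minimal sketch of how the two threshold-selection criteria mentioned above could be compared on per-class CAE reconstruction errors; the synthetic error values and variable names are assumptions for the example, not taken from the paper.)

```python
import numpy as np
from skimage.filters import threshold_otsu  # Otsu's method applied to the 1-D error distribution

def neyman_pearson_threshold(errors_trusted, fpr):
    """Threshold chosen so that a fraction `fpr` of trusted samples
    (the size of the hypothesis test) would be flagged as outliers."""
    return np.quantile(errors_trusted, 1.0 - fpr)

# Synthetic stand-in for the per-pixel reconstruction errors of one class-specific CAE
rng = np.random.default_rng(0)
errors = np.concatenate([rng.normal(2.0, 0.4, 9_000),   # well-reconstructed majority
                         rng.normal(5.0, 0.8, 1_000)])  # suspicious minority

t_otsu = threshold_otsu(errors)
for fpr in (0.02, 0.05, 0.10, 0.20):
    t_np = neyman_pearson_threshold(errors, fpr)
    kept = np.mean(errors <= t_np)
    print(f"FPR {fpr:.0%}: NP threshold {t_np:.2f} (keeps {kept:.1%}), Otsu threshold {t_otsu:.2f}")
```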
What struck me more than this “label” thing was that the output vector size in the embedding layer is only one, which is very interesting. Could you please provide some kind of statistics over this quantity (the real number corresponding to the embedding of the different classes)? I would be very interested to see how this value fluctuates within each class, and between the classes.
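(A small sketch of the kind of per-class statistics requested here over the scalar embedding; the embedding values and class names below are synthetic placeholders, not results from the paper.)

```python
import numpy as np
import pandas as pd

# Synthetic scalar embeddings for three hypothetical classes
rng = np.random.default_rng(1)
labels = rng.choice(["corn", "alfalfa", "carrot"], size=6_000)
class_center = {"corn": 0.3, "alfalfa": 1.2, "carrot": 2.1}
z = rng.normal([class_center[c] for c in labels], 0.15)

df = pd.DataFrame({"label": labels, "z": z})
per_class = df.groupby("label")["z"].agg(["mean", "std", "min", "max"])
print(per_class)                                           # within-class fluctuation
print("spread of class means:", per_class["mean"].std())   # between-class separation
```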
Also, did you try any experiments with a higher dimensionality of the embedding space? In any case, this is just a research question and does not influence the main results.
Finally, since in Section 2.2 you discuss the difficulty of building reliable datasets and of the data-collection campaign, I would also like to bring to your attention some related work that I did with colleagues back in 2014, when we tried to alleviate such problems with a crowdsourcing mobile application for farmers and the general public that provided the possibility of collecting geotagged images of different crop types.
When combined with other sensor readings, like the built-in compass, it made outlier detection a much easier task in the hard cases, similar to the one described at the end of your paper. If you are interested in citing similar works, please check the following links:

https://0-doi-org.brum.beds.ac.uk/10.1117/1.JRS.8.083512
https://www.researchgate.net/publication/305659618_SPACE4AGRI_-_WP4_IN_SITU
http://space4agri.irea.cnr.it/it/prodotti-della-ricerca/manuale-duso-app-space4agri/at_download/file
http://space4agri.irea.cnr.it/it/prodotti-della-ricerca/manuale-duso-app-space4agri/view

 

I would recommend that the Editors consider this paper a strong candidate for publication after the small additional revisions.

 

Nice work, good luck.
Author Response

Dear Reviewer,

 

Thank you for your enthusiasm and your comments; they helped us improve our manuscript. You will find attached our answers to your comments.
Author Response File: Author Response.pdf

Reviewer 3 Report

This paper presents an interesting approach to an important problem - mislabelled agricultural fields. I found the paper mostly easy to read, and the presented ideas practically useful and significant.

I have a number of detailed questions and suggestions that I hope will help to make the paper easier to read, and perhaps to provoke the authors to explain some of their algorithm choices a little more clearly:

 

The use of some words is unfamiliar, for example:

-permutation (abstract and elsewhere) - do you mean randomly introducing errors into labels? Permutation usually means reordering of a set doesn't it?

-Sensible (line 73) - do you mean sensitive?

-Wrapping (line 71) - do you mean warping?

-Apparition (line 130) - do you mean introduction of errors?

-Trustful (line 212 and elsewhere) - do you mean trustworthy?

-Atomicity (line 208) - accuracy? 

Would the authors consider choosing more familiar or commonly used words in this context, so it is easier for a reader to digest?

 

Line 189 - could you please explain "1.2M time series"? Is this the number of pixels in the area, so that there are that many per-pixel time series?

 

Line 190 - is each field only harvested once in the year? Is there any double cropping that could confuse the algorithm, which relies on a single label per field/pixel?

Fig 4(c) - if the labels are annual, do you need this subfigure? It looks identical to 4(a) apart from the very small areas between fields. I don't think it adds any value.

 

Line 214 - how were the crops split? Was the split spatial (top half/bottom half of area or similar), or random fields within the whole area? If random, could spatial autocorrelation between nearby areas lead to unrealistically high accuracy of relabelling?

 

It seems you are using a per-pixel methodology, rather than per-field (or object) methodology. I'd suggest making this very clear in the methods.

 

Line 266 - what does "embedding dimension of 1" mean? Does this mean each time series of 61x2 is given a single value in the 'embedding layer' of Table 2? How many values are possible for each per-class CAE?

 

Line 271 - after this 10-iteration process, it would be interesting to know what proportion of the time series are discarded, both from the original dataset and from your manually verified dataset.

 

Line 284 - again, if this is per-pixel, please make it clear - i.e., is this list of suspicious time series at the pixel level or the field level (I think pixel, but it would clarify if this were made explicit in the text)?

 

Figure 10 - split case. What happens if the split of the field is 30-70 or 20-80? Why did you set the minimum threshold to 40% of pixels within a field (assuming the field is roughly equally divided into 2 classes)?
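(For context, a sketch of one possible field-level decision rule of the kind this question refers to; the 40% value comes from the figure, but the rule, function name, and class names below are illustrative assumptions, not the authors' implementation.)

```python
from collections import Counter

def field_decision(pixel_labels, split_threshold=0.40):
    """Illustrative rule: flag the field as mis-split if the second most
    frequent pixel class covers at least `split_threshold` of the field;
    otherwise treat it as a single-label field with the majority class."""
    counts = Counter(pixel_labels).most_common(2)
    majority = counts[0]
    second = counts[1] if len(counts) > 1 else None
    if second and second[1] / len(pixel_labels) >= split_threshold:
        return "mis-split", (majority[0], second[0])
    return "single-label", majority[0]

# Under a 40% minimum, a 30-70 split is not flagged, while a 45-55 split is:
print(field_decision(["carrot"] * 70 + ["onion"] * 30))  # ('single-label', 'carrot')
print(field_decision(["carrot"] * 55 + ["onion"] * 45))  # ('mis-split', ('carrot', 'onion'))
```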

 

Line 302 - could you explain the S1 preprocessing in a little more detail earlier (around line 184)? What is the resolution of the boxcar despeckling?

 

Line 314 - I'm wondering if you could have used a fixed threshold on the autoencoder loss (like 4), rather than resorting to the Otsu method? Then you could vary this threshold to trade off the precision/recall of relabels.
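(A sketch of the precision/recall trade-off such a fixed, user-chosen threshold would expose; the ground-truth mislabel flags and loss values below are synthetic placeholders, not data from the study.)

```python
import numpy as np

rng = np.random.default_rng(2)
is_mislabeled = rng.random(5_000) < 0.10                  # hypothetical ground truth
loss = rng.normal(2.0, 0.5, 5_000) + 3.0 * is_mislabeled  # mislabels reconstruct worse

# Sweep a fixed autoencoder-loss threshold (e.g. around 4) instead of using Otsu
for t in (3.0, 3.5, 4.0, 4.5, 5.0):
    flagged = loss > t
    precision = (flagged & is_mislabeled).sum() / max(flagged.sum(), 1)
    recall = (flagged & is_mislabeled).sum() / is_mislabeled.sum()
    print(f"threshold {t:.1f}: precision {precision:.2f}, recall {recall:.2f}")
```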

 

Fig 11 - would you expect a bimodal distribution if there were a reasonable number of errors (like for onions (a))? Does the fact that carrots aren't bimodal indicate that there aren't many mislabels in that dataset, or that there is a lot of variance in the time series (perhaps due to diverse planting dates)?

 

Line 356 - as above, by 'permute' do you mean artificially introduce label errors?

 

Line 360 - please briefly explain the supervised methods. Are you training the algorithms using a flat 61x2 = 122 feature vector?

 

Line 367 - why is the supervised relabelling different from the CAE relabelling in Fig. 10 (i.e., a threshold of 40% with split crops for the CAE, but 75% assuming no split crops for the supervised methods)?

 

Fig. 13 - could you discuss the possibilities for adjusting thresholds to improve recall at the cost of precision, and in what situation this might be useful?

 

Table 4 - I found this hard to follow. For example, for alfalfa - "mislabelled as..." - would these be field edges that are mislabelled as alfalfa, which is why they are just a line of pixels? Maybe you could explain one row of the table in the text. What does the null symbol mean - that none were found for this class?

 

I'd suggest combining Figs. 18+19, 20+21 and 22+23 into single figures.

 

Fig 19 - could you discuss why it appears that the top half and bottom half of the field are split (it looks like the field is actually composed of 4 classes, rather than just 2)?

 

 

 

Author Response

Dear Reviewer,

 

Thank you for your comments; we considered them, and they significantly helped us improve our manuscript. You will find attached our answers to your comments.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

There is a lack of an overall framework/flowchart to make the whole research clear.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
