Article
Peer-Review Record

Automatic Detection of Track and Fields in China from High-Resolution Satellite Images Using Multi-Scale-Fused Single Shot MultiBox Detector

by Zhengchao Chen 1, Kaixuan Lu 1,2, Lianru Gao 3, Baipeng Li 1, Jianwei Gao 1, Xuan Yang 2,3, Mufeng Yao 1,2 and Bing Zhang 2,3,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 20 April 2019 / Revised: 28 May 2019 / Accepted: 3 June 2019 / Published: 10 June 2019
(This article belongs to the Special Issue Analysis of Big Data in Remote Sensing)

Round 1

Reviewer 1 Report

The paper presents a method of applying deep learning to high-resolution satellite images for detecting track and fields at large scales, such as the whole of China. The proposed method was examined in detail, with parameters tuned through consideration of the geometric characteristics of the track and fields in the satellite images. The results indicated considerably better performance than the existing approach. Besides, the methodology could be applied not only to track and fields but also to other ground objects. Therefore, I would recommend this paper for publication in Remote Sensing.

Before the publication process, I would request authors to address the following points for better soundness of the paper.

1. Figure 12 - Please indicate the FPR at the TPR of 99.8%. (It looks like 4-6%, though an exact number is helpful for better understanding.)

2. While the TPR is very high, the results would still have a few (or many) false positive errors. Please show some examples of false positive errors and extend the discussion of possible noises and errors.

3. L541-542 - If possible, please consider comparing the number of detected track and fields with official statistics to support the validity of the result. Although comparison with national statistics would be preferred, comparison with some province- or city-level statistics would still be fine.

4. Table 2 - To my understanding, the hyperparameters are associated with optimizer/solver algorithms. Please indicate what optimizer/solver or function in Caffe was applied.

5. Figure 1 - Please insert the figure in finer resolution or vector graphics.

6. Somewhat extensive English editing is needed. I found some points needing such editing, such as the following:

6-a. L391 - "FPR, TPR" should be spelled out at the first appearance.

6-b. Figure 9 - "True Positive Rate ()TPR" contains an unnecessary bracket.

6-c. L422 - "mAP" should be spelled out at the first appearance.

Author Response

Dear Reviewer:

Many thanks for your comments on our paper submitted to the journal Remote Sensing (ID 499699). Overall, the comments have been fair, encouraging, and constructive, and we have learned a lot from them.

After carefully studying your comments and advice, we have made corresponding changes to the paper. Our response to the comments is enclosed.

 

Yours sincerely,

Kaixuan Lu


Response to the Comments of Reviewer 1:

 

Q1: Figure 12 - Please indicate the FPR at the TPR of 99.8%. (It looks like 4-6%, though an exact number is helpful for better understanding.)

 

Response: Thanks a lot for this comment. Following your suggestion, we choose an FPR of 5.7% for better understanding; at this setting, the TPR of MSF_SSD is 98.8% and the confidence threshold is 0.77, which are not the same as before. We mainly changed three places, as follows:

 

Firstly, L416-L419, we change “In order to ensure that the recall rate is close to 1 when the precision rate is still relatively high, we set the threshold of confidence to be 0.6. At this time, our model can achieve a precision rate of 92.9% while keeping the recall rate in a high level (99.8%).” to “In order to ensure that the recall rate is high enough and the precision rate is not too low, we set the threshold of confidence to be 0.77. At this setting, our model can achieve a false positive rate of 5.7% (precision rate of 94.3%) while keeping the true positive rate (recall rate) in a high level (98.8%).”. And, Figure 9 is also modified accordingly.

 

Secondly, L459-L460, we change “achieves an accuracy of 92.9% while keeping the recall rate in a high level (99.8%).”  to “and when the false positive rate is 5.7%, the true positive rate of MSF_SSD and SSD can reach 98.8% and 96.1% respectively.” And, Figure 12 is also modified accordingly.

 

Thirdly, due to the change in the detection threshold, the number of track and fields extracted in China also changed from "83112" to "82519", as shown in L578-L579.
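For illustration only, the trade-off between these operating points can be sketched with a few lines of Python that compute the TPR and FPR at a given confidence threshold. The scores and labels below are hypothetical, not the paper's data:

```python
def rates_at_threshold(scores, labels, threshold):
    """Compute TPR (recall) and FPR for detections scored above a threshold.

    scores: detector confidences; labels: 1 for a true track and field, 0 otherwise.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

# Hypothetical detections: sweeping the threshold traces out the ROC curve.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40]
labels = [1, 1, 0, 1, 0, 0]
tpr, fpr = rates_at_threshold(scores, labels, 0.77)
```

Raising the threshold lowers both rates; the paper's choice of 0.77 is the point on this curve where the TPR stays high while the FPR remains acceptable.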

 

 

Q2: While the TPR is very high, the results would have still a few (or many) false positive errors. Please show some examples of false positive errors and extend discussions on possible noises and errors

 

Response: Many thanks for this good suggestion. Some pictures below show examples of false positive errors. The errors mainly fall into three types.

The first type is an elliptical object on which materials or plants are placed. Because its elliptical shape is the same as that of a track and field, the faint trail along the edge of the ellipse is mistaken for a runway. Besides, the colour of this kind of object is very similar to that of a non-standard track and field, so it is easily detected as one by the model. The second type is a ring-shaped pond: the pond has features similar to a football field and a shape similar to a track and field, which can confuse the model. The third type is oval bare land; because its shape and features are similar to those of a non-standard track and field made of soil, it is also easily mis-detected. To better illustrate this point, we add a paragraph at L488-L496 of the revised manuscript.

 

 

Q3: L541-542 - If possible, please consider comparing the number of detected track and fields with official statistics to support the validity of the result. Although comparison with national statistics would be preferred, comparison with some province- or city-level statistics would still be fine.

 

Response: Many thanks; it is indeed a good comment. However, we are sorry that we cannot compare the number of detected track and fields with official statistics, because to our knowledge there are no official statistics on the quantity and coordinates of outdoor track and fields in China.

Although we do not have official statistics, a sufficient number of outdoor track and fields were selected as testing samples to support the validity of the results. In Section 3, we use the ROC curve to verify the model's precision rate and recall rate (L451-L457). We think those precision and recall rates indicate the performance of the model in large scenes to a certain degree.

 

 

Q4: Table 2 - To my understanding, the hyperparameters are associated with optimizer/solver algorithms. Please indicate what optimizer/solver or function in Caffe was applied.

 

Response: Many thanks. In L339-L341, to address this question, we change "momentum is set to be 0.9, weight decay is set to be 0.0005, batch size is 8, and maximum number of iterations is 120K" to "We train the model by using the optimizer SGD with 0.9 momentum, 0.0005 weight decay, and batch size 8, and the maximum number of iterations is set to be 120K."
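As a schematic illustration of how these solver hyperparameters interact (not the actual Caffe training code), a single SGD update with momentum and L2 weight decay can be written in plain Python; the decay term is folded into the gradient, which is how Caffe's SGD solver applies it:

```python
def sgd_step(w, grad, v, lr=0.001, momentum=0.9, weight_decay=0.0005):
    """One SGD update with momentum and L2 weight decay (schematic, scalar form).

    v accumulates a decaying sum of past gradient steps, so momentum=0.9
    means roughly a 10x amplification of a persistent gradient direction.
    """
    g = grad + weight_decay * w     # L2 regularization added to the raw gradient
    v_new = momentum * v - lr * g   # velocity: momentum-weighted running update
    w_new = w + v_new
    return w_new, v_new

# Illustrative single step from w=1.0 with zero initial velocity.
w, v = sgd_step(w=1.0, grad=0.5, v=0.0)
```

This is only a sketch of the update rule the quoted hyperparameters control; in practice Caffe applies it element-wise to every weight blob.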

 

 

Q5: Figure 1- Please insert the figure in finer resolution or vector graphics.

 

Response: Thanks a lot for pointing out this question. According to your suggestions, we have changed Figure 1 in the revised manuscript.

 

 

Q6: Somewhat extensive English editing is needed. I found some points needing such editing, such as the following:

6-a. L391 - “FPR, TPR” should be spelled out at the first appearance.

6-b. Figure 9 – “True Positive Rate () TPR” is with an unnecessary bracket.

6-c. L422 – “mAP” should be spelled out at the first appearance.

 

Response: Thanks a lot for your careful reading and for pointing out these issues. Following your suggestions, we have carefully checked the manuscript and made considerable improvements to the English. The errors in L401, Figure 9, and L434 are also corrected in the revised manuscript.


PS: The pictures in this response box cannot be displayed; they can be viewed in the submitted PDF file (remote-sensing-499699-Response to the Comments of Reviewer 1).


Author Response File: Author Response.pdf

Reviewer 2 Report

1) In the introduction, several works dealing with deep learning and remote sensing are missing:

a) Maltezos, Evangelos, et al. "Deep convolutional neural networks for building extraction from orthoimages and dense image matching point clouds." Journal of Applied Remote Sensing 11.4 (2017): 042620.

b) Chen, Yushi, et al. "Deep learning-based classification of hyperspectral data." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7.6 (2014): 2094-2107.

c) Zhao, W., & Du, S. (2016). Spectral–spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach. IEEE Transactions on Geoscience and Remote Sensing, 54(8), 4544-4554.

d) Zhao, Wenzhi, et al. "On combining multiscale deep learning features for the classification of hyperspectral remote sensing imagery." International Journal of Remote Sensing 36.13 (2015): 3368-3379.

2) This paper deals with RGB data, but other types of non-RGB data can be useful as well. The previous comment should be addressed with respect to the facts.

3) RGB data are very sensitive to illumination. How do you address this issue?

4) How is the orientation of the camera taken into account?

5) How can the selected deep model be generalized?

6) How do the parameters of the network affect its performance?

7) Please give some results against non-deep-learning algorithms as well.

8) Can Lidar information improve the performance? Please check the IEEE paper.

9) How can the training data affect the performance?



Author Response

Dear Reviewer:

Many thanks for your comments on our paper submitted to the journal Remote Sensing (ID 499699). Overall, the comments have been fair, encouraging, and constructive, and we have learned a lot from them.

After carefully studying your comments and advice, we have made corresponding changes to the paper. Our response to the comments is enclosed.

 

Yours sincerely,

Kaixuan Lu


Response to the Comments of Reviewer 2:

 

Q1: In the introduction, several works dealing with deep learning and remote sensing are missing:

a) Maltezos, Evangelos, et al. "Deep convolutional neural networks for building extraction from orthoimages and dense image matching point clouds." Journal of Applied Remote Sensing 11.4 (2017): 042620.

b) Chen, Yushi, et al. "Deep learning-based classification of hyperspectral data." IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 7.6 (2014): 2094-2107.

c) Zhao, W., & Du, S. (2016). Spectral–spatial feature extraction for hyperspectral image classification: A dimension reduction and deep learning approach. IEEE Transactions on Geoscience and Remote Sensing, 54(8), 4544-4554.

d) Zhao, Wenzhi, et al. "On combining multiscale deep learning features for the classification of hyperspectral remote sensing imagery." International Journal of Remote Sensing 36.13 (2015): 3368-3379.

 

Response: Many thanks for providing these references, which are helpful for better introducing related works and our own work. We have added these references to the introduction, shown as [30][31][32][35].

 

 

Q2: This paper deals with RGB data, but other types of non-RGB data can be useful as well. The previous comment should be addressed with respect to the facts.

 

Response: We gratefully thank you for this good question. Other types of non-RGB data can surely be useful for the detection of track and fields. For example, Lidar data could increase detection accuracy, and high-precision DSM data can help remove some false objects. However, this research focuses on large-scale track and field detection in China, and there is no high-resolution Lidar data covering the national scale that can meet this requirement. In the future, if the conditions are met, we will carry out related research work. We add this discussion in the conclusion (L600-L605) of the revised manuscript.


Q3: RGB data are very sensitive to illumination. How do you address this issue?

 

Response: Many thanks. We reduce the sensitivity of the data to illumination by data augmentation. We wrote a function that randomly adjusts the brightness of the image, so that the training samples are randomly distributed throughout the entire brightness range; this improves the adaptability of the model to brightness variations. We add this discussion at L200-L204 of the revised manuscript.
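The brightness augmentation described above could look roughly like the following minimal Python sketch; the function name, the offset range, and the 8-bit pixel assumption are ours for illustration, not the paper's code:

```python
import random

def random_brightness(pixels, max_delta=60, seed=None):
    """Shift all pixel values by one random offset, clipped to [0, 255].

    Applying this independently to each training sample spreads the
    samples over the brightness range, which is the augmentation idea
    described in the response.
    """
    rng = random.Random(seed)
    delta = rng.uniform(-max_delta, max_delta)  # same shift for the whole image
    return [min(255, max(0, p + delta)) for p in pixels]

# A flat list stands in for an image's pixel values here.
augmented = random_brightness([0, 128, 255], max_delta=60, seed=42)
```

Real pipelines would operate on image arrays and often combine this with contrast and saturation jitter, but the clipping and random global offset are the core of the technique.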

 

 

Q4: How is the orientation of the camera taken into account?

 

Response: Many thanks. The orientation of the camera may indeed interfere with the features of track and fields: besides rotation, tall buildings or trees around a track and field may cause shadows or occlusion. Firstly, we add a rotation augmentation to reduce the impact of rotation. Then, to handle shadows, we deliberately chose some track and fields with shadows when making samples. To handle occlusion, we use a random cropping method to enhance the generalization ability of the samples. These three methods effectively mitigate the problems caused by the orientation of the camera. We add this discussion at L204-L210 of the revised manuscript.

The pictures below show examples of detected track and fields with shadows or occlusion.
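The random cropping mentioned above can be sketched as a toy Python function over a 2-D grid of pixel values; the paper's actual augmentation pipeline is not shown and this is an assumption about its shape:

```python
import random

def random_crop(image, crop_h, crop_w, seed=None):
    """Crop a random window of size crop_h x crop_w from a 2-D image.

    The image is a list of equal-length rows. Random cropping shifts the
    target within the frame and truncates context, which helps the model
    tolerate partial occlusion, as described in the response.
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop_h)      # random vertical offset
    left = rng.randint(0, w - crop_w)     # random horizontal offset
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]

# 4x4 toy image whose value encodes its (row, col) position.
img = [[r * 10 + c for c in range(4)] for r in range(4)]
patch = random_crop(img, 2, 2, seed=1)
```

Each call returns a different contiguous window, so over many epochs the network sees the same field under many partial views.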

                                         

 

Q5: How can the selected deep model be generalized?

 

Response: Many thanks. With the structure of the model fixed, the model is easy to generalize to other targets. Firstly, the generalization of MSF_SSD has been demonstrated by extracting the different track and fields across the large scene of China. Secondly, although the structure of MSF_SSD is mainly designed for small targets, it is also effective for large targets after optimizing some parameters of the network. Thirdly, for other targets, MSF_SSD can effectively extract the features of the targets and the semantic information of the background with the multi-scale-fused method. So no matter how the characteristics of the target change, MSF_SSD can achieve the detection task with high precision and recall by optimizing the parameters of the network. We add this discussion at L480-L487 of the revised manuscript.

 

 

Q6: How do the parameters of the network affect its performance?

 

Response: Many thanks. The parameters of the network include hyperparameters and geometric parameters, and both affect the model's performance. The hyperparameters include the learning rate, batch size, and so on; they determine the speed of weight updates, increase the efficiency of iteration, and prevent over-fitting of the network. The geometric parameters include the zoom-out factor, zoom-in factor, box parameters, and NMS threshold. The zoom-out/in factors in the data layer are mainly used for data augmentation to increase the diversity of samples, as discussed in L513-L525. The box parameters in the prediction layer are mainly used to improve the match between the ground-truth boxes and prior boxes, as discussed in L526-L535. The NMS parameters in the post-processing layer are mainly used to eliminate redundancy among the detected boxes, as discussed in L536-L541. Appropriate parameters increase the efficiency and precision of the model.
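The role of the NMS threshold in the post-processing layer can be illustrated with a minimal greedy non-maximum suppression sketch (a generic textbook version, not the paper's implementation):

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, suppress overlapping ones.

    A lower thresh removes more near-duplicate detections; a higher
    thresh keeps more boxes at the risk of redundant detections.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep

# Two heavily overlapping detections of one field, plus a distant one.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
kept = nms(boxes, scores=[0.9, 0.8, 0.7], thresh=0.5)
```

Here the second box overlaps the first with IoU above 0.5 and is suppressed, while the distant box survives; tuning `thresh` is exactly the redundancy-elimination trade-off described above.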

 

 

Q7: Please give some results against non-deep-learning algorithms as well.

 

Response: Many thanks. Following reference [40], we add comparisons of MSF_SSD with some non-deep-learning algorithms as measured by mAP, shown in Table 4. From Table 4, it can be seen that the proposed MSF_SSD obtains the best mAP of 79.3% among all the object detection methods on the track and field dataset of DOTA [41].

Table 4. The mAP of the object detection methods

Track and Field    BoW     SSC BoW    FDDL     MSF_SSD
mAP                7.8%    10.1%      20.1%    79.3%

 

We add this discussion at L463-L468 of the revised manuscript.

 

Q8: Can Lidar information improve the performance? Please check the IEEE paper.

 

Response: Many thanks. This research could use Lidar data to increase detection accuracy: because a track and field is almost flat in its local area, high-precision DSM data can help remove some false objects. However, this research focuses on large-scale track and field detection in China, and there is currently no Lidar data at the national scale that can meet this requirement. In the future, if the conditions are met, we will carry out related research work. We add this discussion in the conclusion (L600-L605) of the revised manuscript.

 

 

Q9: How can the training data affect the performance?

 

Response: Many thanks. Deep learning is essentially data-driven learning, so the training data are very important. Firstly, in this paper, the high complexity of track and fields, the natural background scenes, and the large differences among the targets themselves in remote sensing images make the semantic information and features of the targets complex and non-uniform. If the training data do not contain enough semantic information and features of track and fields, the model will lack generalization capability. Based on this view, we collected various images of track and fields to make training samples. In addition, the more training samples, the more effective the model. Although enormous numbers of samples reduce training efficiency, samples will never be sufficient in the field of remote sensing due to the complexity of remote sensing images. Therefore, provided the sample quality is guaranteed, the performance of the model increases with the number of training samples.

 

We trained the model with different numbers of training samples; as shown below, the mAP increases as the number of training samples increases.


PS: The pictures in this response box cannot be displayed; they can be viewed in the submitted PDF file (remote-sensing-499699-Response to the Comments of Reviewer 2).


Author Response File: Author Response.pdf

Reviewer 3 Report

This article presents a very interesting application of deep learning in remote sensing. The article is very well structured and well written. However, I recommend that the quality of the following figures be improved.
- Figure 1: A part of the text in the figure is unreadable. Also, remove the atmospheric correction that is not applied here, according to the text (lines 178 to 180).
- Figure 6: Improve the quality of the text in the boxes (too small).
- Figure 7: Some of the text is unreadable (too small).
- Figure 17: Increase the size of the figure (China maps and image subsets) to the entire width of the page.


Author Response

Dear Reviewer:

Many thanks for your comments on our paper submitted to the journal Remote Sensing (ID 499699). Overall, the comments have been fair, encouraging, and constructive, and we have learned a lot from them.

After carefully studying your comments and advice, we have made corresponding changes to the paper. Our response to the comments is enclosed.

 

Yours sincerely,

Kaixuan Lu

Response to the Comments of Reviewer 3:

 

Q1: Figure 1: A part of the text in the figure is unreadable. Also, remove the atmospheric correction that is not applied here, according to the text (lines 178 to 180)

 

Response: Many thanks. Following your suggestions, we have revised Figure 1 in the revised manuscript; in particular, we have changed "atmospheric correction" to "raw image enhancement".

 

 

Q2: Figure 6: Improve the quality of the text in the boxes (too small)

 

Response: Many thanks. According to your suggestions, we have changed Figure 6 in the revised manuscript.

 

 

Q3: Figure 7: Some of the text is unreadable (too small)

 

Response: Many thanks. According to your suggestions, we have changed Figure 7 in the revised manuscript.

 

 

Q4: Figure 17: Increase the size of the figure (China maps and image subsets) to the entire width of the page

 

Response: Many thanks. According to your suggestions, we have changed Figure 18 in the revised manuscript.




Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

All my comments have been addressed.
