Article
Peer-Review Record

Effective Airplane Detection in Remote Sensing Images Based on Multilayer Feature Fusion and Improved Nonmaximal Suppression Algorithm

by Mingming Zhu 1,*, Yuelei Xu 2, Shiping Ma 3, Shuai Li 1, Hongqiang Ma 4 and Yongsai Han 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 23 March 2019 / Revised: 30 April 2019 / Accepted: 2 May 2019 / Published: 5 May 2019
(This article belongs to the Special Issue Deep Transfer Learning for Remote Sensing)

Round 1

Reviewer 1 Report

Dear authors:

Your work is an interesting proposal for airplane detection in remote sensing images, and it has strong theoretical support and a well-designed experimental phase. However, I have some concerns that must be addressed before your work can be considered for publication.

Using a plagiarism check tool, it appears that you took text from the paper entitled End-to-End Airport Detection in Remote Sensing Images Combining Cascade Region Proposal Networks and Multi-Threshold Detection Networks, which was written by some of the authors of this paper. I encourage you to paraphrase this text.

Please explain the paragraph in Section 3.2 more clearly; there is some confusion about the sizes of the training, validation and test sets.

Please provide an explanation of why you chose to split the data into training, validation and test sets instead of using cross-validation methods.

I recommend highlighting the best results in the results tables.

Please improve the quality of the figures; some of them look blurred.

I hope you find my recommendations useful for improving the quality of your work.

Author Response

Point 1: Using a plagiarism check tool, it appears that you took text from the paper entitled End-to-End Airport Detection in Remote Sensing Images Combining Cascade Region Proposal Networks and Multi-Threshold Detection Networks, which was written by some of the authors of this paper. I encourage you to paraphrase this text.

 

Response 1: We have already paraphrased this text. (See the blue part of the manuscript for details)

 

Point 2: Please explain the paragraph in Section 3.2 more clearly; there is some confusion about the sizes of the training, validation and test sets.

 

Response 2: We have already re-described the sizes of the training, validation and test sets in Section 3.2. (See line 342-349 of the manuscript for details)

 

Point 3: Please provide an explanation of why you chose to split the data into training, validation and test sets instead of using cross-validation methods.

 

Response 3: As shown in the figure below, in object detection tasks based on deep learning, the original data set is usually divided into three parts: training, validation and test sets. The training set is used to train the model, the validation set is used to select and configure the model's parameters, and the test set consists of data unseen by the model, used to evaluate its generalization ability.

[Figure: diagram of the training/validation/test split (not reproduced in this record)]
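The division described in Response 3 can be sketched in a few lines of Python. This is an illustrative sketch only: the 60/20/20 fractions, the seed, and the toy dataset are assumptions, not the actual split used in the manuscript.

```python
import random

def split_dataset(samples, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle a dataset and split it into training, validation and test subsets.

    The fractions and seed are illustrative assumptions, not the
    proportions used in the manuscript.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]                # fits the model's weights
    val = shuffled[n_train:n_train + n_val]   # tunes hyperparameters
    test = shuffled[n_train + n_val:]         # held out for final evaluation
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))  # 60 20 20
```

A single held-out split like this is common for deep detectors because cross-validating a network that takes days to train is computationally expensive.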

 

Point 4: I recommend highlighting the best results in the results tables.

 

Response 4: We have already highlighted the best results in the results tables. (See Tables 3, 4 and 5 of the manuscript for details)

 

Point 5: Please improve the quality of the figures; some of them look blurred.

 

Response 5: We have already improved the quality of the figures.


Author Response File: Author Response.docx

Reviewer 2 Report

-Figure 1 entitled “The multilayer feature fusion is shown in the blue frame, and the non-maximum suppression based on soft decision is shown in the red frame” should be explained.

-Figure 2. “Faster R-CNN architecture” should be explained.

- Section 2.1.1, entitled “Region Proposal Networks”, has no references. The authors should mention the source of the theoretical background and of the presented equations.

-In section 2.1.1, “the results are outputted to classification layers and regression layers to simultaneously conduct object classification and position regression for the candidate boxes”. It is not clear why the authors have presented results only for the classification layers and not for the regression layer.

-It is not clear why the authors have not used ordinary metrics about the classification such as F1, precision, recall, confusion matrix, etc.

-There is no methodology section.

-There is no discussion section. The authors should compare their research with other similar studies and point out the new elements of their study in order to demonstrate its original contribution to new knowledge.

-The proposed methodology in the manuscript is similar to the one presented in the article https://0-www-mdpi-com.brum.beds.ac.uk/1424-8220/18/7/2335. The authors should discuss and explain the novelty of their proposed methodology.


Author Response

Point 1: Figure 1 entitled “The multilayer feature fusion is shown in the blue frame, and the non-maximum suppression based on soft decision is shown in the red frame” should be explained.

 

Response 1: We have already modified the title of Figure 1 and gave an explanation. (See line 105-112 of the manuscript for details)

 

Point 2: Figure 2. “Faster R-CNN architecture” should be explained.

 

Response 2: We have already added the explanation. (See line 114-118 of the manuscript for details)

 

Point 3: Section 2.1.1, entitled “Region Proposal Networks”, has no references. The authors should mention the source of the theoretical background and of the presented equations.

 

Response 3: We have already added references and an introduction to RPN in Section 2.1.1. (See line 122-124 of the manuscript for details)

 

Point 4: In section 2.1.1, “the results are outputted to classification layers and regression layers to simultaneously conduct object classification and position regression for the candidate boxes”. It is not clear why the authors have presented results only for the classification layers and not for the regression layer.

 

Response 4: The results are indeed output to two layers, the classification layer and the regression layer; this has been re-described in the text. (See line 127-128 of the manuscript for details)

 

Point 5: It is not clear why the authors have not used ordinary metrics about the classification such as F1, precision, recall, confusion matrix, etc.

 

Response 5: In previous airplane detection methods for remote sensing images, the detection rate, false alarm rate and average processing time are usually used to measure the performance of an algorithm. To maintain consistency and to facilitate comparison with these prior methods, the same three evaluation indicators are used.

 

Point 6: There is no methodology section.

 

Response 6: Section 2 is the methodology section. The proposed method includes three parts, namely the Faster R-CNN network architecture, multilayer feature fusion and non-maximum suppression based on soft decision. Faster R-CNN takes a single image as input and outputs the predicted probability value and object detection box of the desired object category. The aim of the multilayer feature fusion is to improve the representation of weak and small objects. The non-maximum suppression based on soft decision is used to reduce the missed detection rate.
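To illustrate the soft-decision idea mentioned above, the sketch below implements a generic linear Soft-NMS: instead of discarding boxes that overlap the highest-scoring box, their scores are decayed in proportion to the overlap. This is a common formulation in the literature and an assumption here; it is not necessarily the exact decay rule or thresholds used in the manuscript.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, iou_thresh=0.3, score_thresh=0.001):
    """Greedy NMS with linear score decay instead of hard suppression."""
    boxes, scores = list(boxes), list(scores)
    keep = []
    while boxes:
        # pick the current highest-scoring box
        i = max(range(len(scores)), key=scores.__getitem__)
        best_box, best_score = boxes.pop(i), scores.pop(i)
        keep.append((best_box, best_score))
        # soft decision: decay overlapping scores rather than deleting boxes
        for j in range(len(boxes)):
            overlap = iou(best_box, boxes[j])
            if overlap > iou_thresh:
                scores[j] *= 1.0 - overlap
        survivors = [(b, s) for b, s in zip(boxes, scores) if s >= score_thresh]
        boxes = [b for b, _ in survivors]
        scores = [s for _, s in survivors]
    return keep

dets = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
confs = [0.9, 0.8, 0.7]
for box, score in soft_nms(dets, confs):
    print(box, round(score, 3))
```

Because heavily overlapping boxes are down-weighted rather than removed, a true object hidden behind a stronger neighboring detection can still survive with a reduced score, which is what lowers the missed detection rate.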

 

Point 7: There is no discussion section. The authors should compare their research with other similar studies and point out the new elements of their study in order to demonstrate its original contribution to new knowledge.

 

Response 7: We have already added the discussion in Section 4.3. (See line 422-443 of the manuscript for details)

 

Point 8: The proposed methodology in the manuscript is similar to the one presented in the article https://0-www-mdpi-com.brum.beds.ac.uk/1424-8220/18/7/2335. The authors should discuss and explain the novelty of their proposed methodology.

 

Response 8: We have already added the discussion and explanation about the novelty of our proposed methodology. (See line 432-443 of the manuscript for details)

 

In addition, in order to improve the English language and style, this paper has undergone English language editing by American Journal Experts (AJE). AJE uses experienced, native English-speaking editors. (See the underlined part of the manuscript for details)


Author Response File: Author Response.docx

Reviewer 3 Report

In this paper, an extension of the Faster R-CNN approach is introduced. With the aim of detecting small objects in images, the authors propose to fuse the output of the intermediate convolutional layers of the original convolutional block (VGG16) with the output of the whole block. The rest of the pipeline follows the original Faster R-CNN proposal. The authors also introduce a soft decision method to perform the NMS step.

Despite the limited scientific soundness, the paper provides enough background and can be understood fairly well. The comparison with state-of-the-art methods is strong, and the authors also analyzed the impact of involving different layers in the fusion step.

Nonetheless, I have some suggestions to improve the paper:
- The methods used to compare the approach are a bit outdated. I suggest including YOLOv3 in the benchmark, which is the state of the art in object detection and recognition. This would definitely make the paper stronger. YOLOv3 is reportedly more accurate and computationally efficient than Faster R-CNN.
- I would also like to see the proposal applied to other datasets in order to check whether the method scales well to different problems.

Minor issues:
- Table 2 caption says "Results comprison on different methods" but that table shows the VGG16 network parameters for the convolutional layers
- Equation 7 says "ariports"


Author Response

Point 1: The methods used to compare the approach are a bit outdated. I suggest including YOLOv3 in the benchmark, which is the state of the art in object detection and recognition. This would definitely make the paper stronger. YOLOv3 is reportedly more accurate and computationally efficient than Faster R-CNN.

 

Response 1: To illustrate the superiority of the proposed method, existing representative airplane detection algorithms were selected for the comparison experiments; therefore, general deep learning object detectors such as YOLOv3 were not selected as comparison methods. In addition, due to the limitations of our experimental platform (NVIDIA GTX 1060), insufficient memory prevents YOLOv3 from being trained. Although reducing the batch size and disabling multi-scale training can alleviate this problem to a certain extent, doing so would inevitably degrade the detection accuracy of YOLOv3, which is the main reason YOLOv3 was not selected for comparison. As the performance of our experimental platform improves, more advanced object detection algorithms such as YOLOv3 will be the focus and direction of our next step.

 

Point 2: I would also like to see the proposal applied to other datasets in order to check whether the method scales well to different problems.

 

Response 2: We have applied the proposed method to an airport data set. (See Section 4.4 of the manuscript for details)

 

Point 3: Table 2 caption says "Results comprison on different methods" but that table shows the VGG16 network parameters for the convolutional layers.

 

Response 3: We have modified the title of Table 2. (See line 306 of the manuscript for details)

 

Point 4: Equation 7 says "ariports".

 

Response 4: We do not fully understand this comment; there may be an omission in the review.

 

In addition, in order to improve the English language and style, this paper has undergone English language editing by MDPI. MDPI uses experienced, native English-speaking editors. (See the underlined part of the manuscript for details)


Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Dear authors:


In my opinion, you have addressed all my previous concerns well.

Author Response

Thank you very much for your valuable advice.

Reviewer 2 Report

-Line 322 “DR= number of detected airports/ number of ariports”: the word “ariports” is wrong. Why do the authors measure the detected airports and not the detected airplanes? Are all the results presented in the paper with the metric DR only for detecting airports? The title of the manuscript and the whole methodology should be changed to airport detection.

-The metric Detection rate or recall is not always enough to evaluate aircraft detection. The authors should justify the selection of these metrics based on the literature. Generally, for the aircraft detection performance, four metrics are commonly used: the false positive rate (FPR), the missing ratio (MR), accuracy (AC), and error ratio (ER). Related articles:

Zhang, F., Du, B., Zhang, L., & Xu, M. (2016). Weakly supervised learning based on coupled convolutional neural networks for aircraft detection. IEEE Transactions on Geoscience and Remote Sensing, 54(9), 5553-5563.

Deng, Z., Sun, H., Zhou, S., Zhao, J., Lei, L., & Zou, H. (2018). Multi-scale object detection in remote sensing imagery with convolutional neural networks. ISPRS Journal of Photogrammetry and Remote Sensing, 145, 3-22. https://0-www-sciencedirect-com.brum.beds.ac.uk/science/article/pii/S0924271618301096

-Methodology 105-110: 6 lines are not enough for a research article to explain the proposed methodology. Sections 2.1, etc. explain already known techniques (Region Proposal Networks etc.) and not the methodology.

- 107-108: The phrase “The Faster R-CNN takes a single image as the input and outputs the prediction probability value and object detection box of the desired object category” does not correspond to Figure 1, which shows different flowchart boxes that still remain unexplained.

-Figure 2, “Faster R-CNN architecture”, is not explained enough in lines 114-118, e.g. there are no mentions in the text of the RoI feature vector, RoI pooling layer, etc.


Author Response

Point 1: Line 322 “DR= number of detected airports/ number of ariports”: the word “ariports” is wrong. Why do the authors measure the detected airports and not the detected airplanes? Are all the results presented in the paper with the metric DR only for detecting airports? The title of the manuscript and the whole methodology should be changed to airport detection.

 

Response 1: We have already modified related formula. (See line 344 of the manuscript for details)

 

Point 2: The metric Detection rate or recall is not always enough to evaluate aircraft detection. The authors should justify the selection of these metrics based on the literature. Generally, for the aircraft detection performance, four metrics are commonly used: the false positive rate (FPR), the missing ratio (MR), accuracy (AC), and error ratio (ER).

 

Response 2: We have already adopted the FPR, MR, AC and ER to evaluate detection performance. (See line 342-347 of the manuscript for details)
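For reference, metrics of this kind can be computed directly from true-positive, false-positive and false-negative counts, as in the sketch below. The exact formulas (especially for AC and ER) vary across the literature, so the definitions here are assumptions and may differ from those adopted in the manuscript.

```python
def detection_metrics(tp, fp, fn):
    """Detection metrics from true-positive, false-positive and false-negative counts.

    These definitions follow one common convention; the formulas used in the
    manuscript and the cited articles may differ.
    """
    detections = tp + fp   # everything the detector reported
    objects = tp + fn      # all ground-truth airplanes
    return {
        "detection_rate": tp / objects,       # a.k.a. recall
        "false_alarm_rate": fp / detections,  # fraction of detections that are wrong
        "missing_ratio": fn / objects,        # MR: fraction of objects missed
        "accuracy": tp / (tp + fp + fn),      # AC: one possible variant
    }

m = detection_metrics(tp=80, fp=10, fn=20)
print(m["detection_rate"], m["missing_ratio"])  # 0.8 0.2
```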

 

Point 3: Methodology 105-110: 6 lines are not enough for a research article to explain the proposed methodology. Sections 2.1, etc. explain already known techniques (Region Proposal Networks etc.) and not the methodology.

 

Response 3: We have reinterpreted the proposed method and adjusted the structure of the article. (See line 158-176 of the manuscript for details)

 

Point 4: 107-108: The phrase “The Faster R-CNN takes a single image as the input and outputs the prediction probability value and object detection box of the desired object category” does not correspond to Figure 1, which shows different flowchart boxes that still remain unexplained.

 

Response 4: We have reinterpreted the flowchart of this method. (See line 166-176 of the manuscript for details)

 

Point 5: Figure 2, “Faster R-CNN architecture”, is not explained enough in lines 114-118, e.g. there are no mentions in the text of the RoI feature vector, RoI pooling layer, etc.

 

Response 5: We have reinterpreted Figure 2. (See line 106-113 of the manuscript for details)


Reviewer 3 Report

Not having enough computational power to train and run YOLO, or any other deep learning-based approach, seems a weak reason.

The authors should perform these experiments involving YOLO (or any other state-of-the-art R-CNN or pixel-wise classification method) in order to demonstrate the superiority of their method. Alternatively, a theoretical analysis of YOLO could provide stronger reasons not to test it.

Author Response

Point 1: Not having enough computational power to train and run YOLO, or any other deep learning-based approach, seems a weak reason. The authors should perform these experiments involving YOLO (or any other state-of-the-art R-CNN or pixel-wise classification method) in order to demonstrate the superiority of their method. Alternatively, a theoretical analysis of YOLO could provide stronger reasons not to test it.

 

Response 1: We have added YOLOv2 as a comparison method and conducted an experimental analysis. (See line 371-390 of the manuscript for details)


Round 3

Reviewer 2 Report

The manuscript has been significantly improved.

-Line 344: the word “ariplane” is wrong and should be corrected. The whole manuscript should be checked for spelling and grammatical errors.


Author Response

Point 1: Line 344: the word “ariplane” is wrong and should be corrected. The whole manuscript should be checked for spelling and grammatical errors.

 

Response 1: We have already corrected the error in line 344 and checked the whole manuscript for spelling and grammatical errors. (See the text highlighted in yellow for details)

