Article
Peer-Review Record

Background Instance-Based Copy-Paste Data Augmentation for Object Detection

by Liuying Zhang, Zhiqiang Xing * and Xikun Wang
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Reviewer 4:
Reviewer 5:
Submission received: 8 August 2023 / Revised: 4 September 2023 / Accepted: 5 September 2023 / Published: 7 September 2023

Round 1

Reviewer 1 Report

See the attachment.

Comments for author File: Comments.pdf

Moderate editing of English language required

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The topic of the article is significant. Its strength is the proposed data augmentation method for object detection, and the results have practical value. However, the presentation of the method is weak and has to be improved.

My comments are as follows:

1. Section 2, "Related works", has to be improved. The authors mostly surveyed research published on arXiv.org, which accounts for 25 of the 27 references.

It is recommended to extend the review to other databases, for example Scopus and DBLP.

2. It is completely unclear what the authors mean by the concept "Feature of image". It is recommended to define it and connect it with the research of other authors.

3. It is recommended to add links for all datasets used.

4. In the description of formula (1), it would be better to add the dimensions of all vectors (see lines 198-202).

5. The meaning of "Vb's length" is unclear (see line 200). What is the length of the vector? Is it the standard vector length? Is it a magnitude? Why is it equal to the channel depth of the image feature? It is recommended to add more details.

6. It is recommended to add a description of "feature space" (see line 216).

7. The first sentence of the Conclusion, "This section is not mandatory but can be added to the manuscript if the discussion is unusually long or complex", belongs to the template instructions and must be removed.

8. All writing in the article has to be in a more scientific style, as is expected in a scientific journal. This means that all objects have to be described, and the reasoning and sentence construction must be clear and logical.
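
The ambiguity raised in comment 5 can be illustrated with a minimal sketch. It assumes, and the manuscript may define this differently, that the background vector is obtained by averaging a C x H x W feature map over a spatial region. Under that assumption its "length" in the sense of dimensionality equals the channel depth C, which is a different quantity from its Euclidean magnitude. The names `pool_region`, `feature_map`, and `v_b` are hypothetical and are not taken from the article.

```python
import math


def pool_region(feature_map, rows, cols):
    """Average a C x H x W feature map over a spatial region.

    Returns one value per channel, so the result has C entries.
    This pooling is an assumed reading, not the authors' confirmed method.
    """
    pooled = []
    for channel in feature_map:
        values = [channel[r][c] for r in rows for c in cols]
        pooled.append(sum(values) / len(values))
    return pooled


# Hypothetical feature map with 3 channels and a 2 x 2 spatial grid.
feature_map = [
    [[3.0, 3.0], [3.0, 3.0]],
    [[4.0, 4.0], [4.0, 4.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]

v_b = pool_region(feature_map, rows=range(2), cols=range(2))

# "Length" as dimensionality: the number of entries, equal to the channel depth.
dimension = len(v_b)

# "Length" as magnitude: the Euclidean norm of the same vector.
magnitude = math.sqrt(sum(x * x for x in v_b))

print(dimension, magnitude)
```

Here the two readings give different numbers (3 entries versus a norm of 5.0), which is why stating the dimensions of all vectors in the description of formula (1) would remove the ambiguity.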

After thorough rewriting, the article can be published.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

The main question addressed by the research is how to improve the performance of supervised deep learning models for object detection by enhancing the objectivity and continuity of contextual information between object and background through a novel data augmentation model called Background Instance-Based Copy-Paste (BIB-Copy-Paste).

The topic is both original and relevant in the field of object detection and deep learning. The authors introduce a novel approach to data augmentation that addresses the limitations of existing methods and provides experimental evidence of its effectiveness and transferability. One specific gap in the field that the authors address is the context mismatch caused by using copy-paste for data augmentation for object detection. They propose a method to generate pseudo-labels based on the features of object background instances and use the pseudo-labels to train the background classifier to guide copy-paste. This strategy solves the problem of severe occlusion between the pasted object and the original object, which affects annotation accuracy and is not conducive to model training and convergence. Overall, the authors' approach is a promising solution to the limitations of existing data augmentation methods for object detection and could potentially have significant implications for improving the accuracy and robustness of object detection models in various applications.

This study needs to provide a comprehensive comparison of the proposed BIB-Copy-Paste model with other published material in the subject area of object detection and deep learning. However, the authors briefly discuss some limitations of existing data augmentation methods, such as copy-paste, and how their proposed approach addresses them. Specifically, the authors note that copy-paste that does not take context into account can lead to severe occlusion between the pasted object and the original object, thus affecting annotation accuracy. They propose a method to generate pseudo-labels based on the features of object background instances and use the pseudo-labels to train the background classifier to guide copy-paste. This strategy solves the above problem well. Overall, the proposed BIB-Copy-Paste model complements object detection and deep learning by addressing some of the limitations of existing data augmentation methods and providing a novel approach to enhancing the objectivity and continuity of contextual information between objects and backgrounds. However, a more comprehensive comparison with other published material would be needed to evaluate its complementarity to the subject area fully.

The study needs to evaluate the proposed BIB-Copy-Paste model comprehensively, so it is difficult to determine specific improvements that the authors should consider. However, based on the information provided, there are a few potential areas for improvement and further controls that could be considered:

1. The authors could provide a more detailed comparison of their proposed approach with other data augmentation methods for object detection, such as cut-paste, mix-up, and random erasing. This would help better evaluate their approach's effectiveness and transferability and identify potential areas for improvement.

2. The authors could consider conducting more extensive experiments on larger datasets to validate the effectiveness and robustness of their proposed approach. This would help ensure their approach applies to a broader range of object detection tasks and scenarios.

3. The authors could consider conducting a more detailed analysis of their proposed approach's computational cost and efficiency compared to existing methods. This would help to identify potential trade-offs between accuracy and efficiency and provide insights into the practicality of their approach for real-world applications.

Overall, the proposed BIB-Copy-Paste model is a promising approach to data augmentation for object detection. However, further evaluation and controls would be needed to assess its effectiveness and potential for practical applications fully.

The conclusions presented in the study are consistent with the evidence and arguments presented. The authors provide experimental results on the PASCAL VOC 2012 and MS COCO datasets that demonstrate the effectiveness and transferability of their proposed data augmentation model for object detection. They also acknowledge the limitations of their approach and identify potential areas for improvement in the balance of their model for different classes of object augmentation. Overall, the conclusions address the main issues raised in the article, such as the limitations of existing data augmentation methods for object detection and the need for a novel approach that addresses these limitations. The authors' proposed BIB-Copy-Paste model is a promising solution to these issues, and the experimental results provide evidence to support their conclusions.

Based on the information provided in the text, the references are appropriate and relevant to the subject area of object detection and deep learning. The authors cite various sources, including academic papers, conference proceedings, and technical reports, to support their arguments and provide context for their proposed approach. Overall, the references are well-selected and offer a comprehensive overview of the current state of research in the subject area.

The study includes several figures that help to illustrate the proposed approach and experimental results. The figures are well-designed and clearly labeled, making it easy for readers to understand the key concepts and results presented in the article. Overall, the figures are a valuable addition to the article and help to enhance its readability and visual appeal.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

The article is devoted to increasing the efficiency of solving the object detection problem. The topic of the article is relevant. The structure of the article does not correspond to that adopted by MDPI for research articles (Introduction (including analysis of related work), Models and methods, Results, Discussion, Conclusions). The level of English is acceptable. The article is easy to read. The figures in the article are of acceptable quality, but the tables are too small. The article cites 27 sources, most of which are not relevant. The References section is sloppy.

The following comments and recommendations can be formulated on the material of the article:

1. The size of the images in the dataset is not so important in itself: before the data is fed to the model for training, it is resized to fit the network's input size, for example 300 by 300 pixels. When building a dataset, it is important to keep in mind the input dimensions of the network you will work with, and to match the image size to the model's input size as closely as possible. If this is not done, you may run into many surprises. How do the authors take this circumstance into account in their approach to data augmentation?

2. The main rule of any dataset is that images should be as close as possible to the real conditions in which the neural network model will work. Before collecting images, it is important to know which images the model will receive as input, where the camera will be, and the resolution of the camera. If the camera has a low resolution and will capture small images, then the photos in the dataset should also be small. Beginners in the development of machine learning systems often ignore this rule, assuming that the larger and more detailed the images given to the network during training, the better it will learn the subject and the higher its accuracy on real data will be. How do the authors take this circumstance into account in their approach to data augmentation?

3. In my experience, the dataset needs to be split from the outset so that there are images that the neural network has not yet seen. The training process remains the same: the network learns from the images in Train and checks its knowledge on the images in Validate. After training is completed, we add a further stage, testing. The neural network analyzes images from Test that it has never seen, and we can check whether it actually works. This stage reveals where the neural network is wrong and which classes cause it difficulty. How can the authors' concept of data augmentation be applied to images in the Test sample?

4. In my practice, I have come across situations where data leakage significantly affected detection quality. In one of my projects, the neural network consistently made mistakes when detecting objects of a certain class. Through manual testing, I found a data leak. After removing the duplicates, the error disappeared. How do the authors address the detection of data leaks?

5. It is possible and necessary to augment the dataset, but the key is not to get carried away with augmentations. When the dataset size is increased sevenfold, the network will not become seven times more effective, and it may even start to perform worse. It is worth using only those augmentations that really help, i.e., those that correspond to real conditions. If the camera is placed indoors, then a “snow” or “fog” augmentation will only hurt, because such distortions will not occur in real life. How do the authors take this question into account in their data augmentation?
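
The duplicate-based leak described in comment 4 can be checked with a short sketch. It assumes that a leak means byte-identical image files appearing in both the training and test splits; near-duplicates (resized or re-encoded copies, or augmented copies of a test image) would need a perceptual comparison that is not shown here. The function names and the in-memory `train` and `test` dictionaries are hypothetical stand-ins for real image files.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return the SHA-256 hex digest of an image's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_leaks(train, test):
    """Find test items whose bytes also occur in the training split.

    train, test: dicts mapping a file name to its raw bytes.
    Returns a list of (test_name, train_name) pairs with identical content.
    Only exact duplicates are detected.
    """
    seen = {}
    for name, data in train.items():
        # Keep the first training file observed for each distinct content hash.
        seen.setdefault(content_hash(data), name)

    leaks = []
    for name, data in test.items():
        digest = content_hash(data)
        if digest in seen:
            leaks.append((name, seen[digest]))
    return leaks


# Hypothetical splits: "c.jpg" in test has the same bytes as "b.jpg" in train.
train = {"a.jpg": b"\x01\x02", "b.jpg": b"\x03\x04"}
test = {"c.jpg": b"\x03\x04", "d.jpg": b"\x05"}

leaks = find_leaks(train, test)
print(leaks)
```

For a copy-paste augmentation pipeline, a check of this kind would be run on the source images before any pasting, since object instances copied from an image that also sits in the test split would carry the leak into the augmented training data.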

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 5 Report

Original Submission

Recommendation

Major (very major) revision or rejection

Comments to Author:

Title: Background instance-based copy-paste data augmentation for object detection

Overview and general recommendation.

Within this exhaustive Technical Note, a thorough investigation unfolds into the pragmatic execution and multifaceted utility of object detection. This indispensable methodology has demonstrated exceptional precision, rendering it indispensable across a spectrum of vital applications. Leveraging a reservoir of more than two decades steeped in image-related pursuits, I find myself compelled to convey my wholehearted admiration for the presented content. Yet, as an unwavering scientist, the responsibility rests upon me to meticulously evaluate and candidly analyze the material at my disposal.

The Abstract necessitates substantial enhancement, as its present form resembles an introduction more than a succinct encapsulation of the ideas and concepts investigated within the study. To guarantee lucidity and exactness, enlisting the expertise of a native English speaker is imperative. This measure is both non-negotiable and of paramount significance. Thus, I fervently advise enlisting their aid for a meticulous appraisal. If deemed essential, let us proceed with the revision of the Abstract, ensuring heightened clarity and faithful alignment with its intended purpose.

The introduction falls short of expectations, lacking clarity and a comprehensive explanation of the techniques involved. It is essential to expound upon the methodologies, showcasing the current state-of-the-art practices. Your responsibility entails guiding readers seamlessly toward the core essence of your forthcoming presentation. Upon review, I have observed omissions of significant references, particularly newer works, which require immediate rectification. I implore you to enhance the quality of the figures; their current state is notably subpar and demands improvement. Please write it again!

Furthermore, the composition is notably subpar in its construction and design. Although commendable experimentation has been conducted, the essential remedy lies in a concerted effort towards thorough revision and rectification. The accomplishment of the desired outcome hinges upon addressing these issues comprehensively. Failure to enhance these aspects will inevitably lead to my recommendation of rejection.

While I find merit in this paper, substantial refinement of its content is imperative. Presently, I am inclined towards rejecting this submission. However, I encourage your steadfast commitment to enhancing the material. Revisit and revise the pivotal components, as I believe that through conscientious revisions following the peer review process, a significantly improved iteration can be achieved.

Detailed comments:

The quality of the figures is very poor. I do not accept this!

Table 1 and the other tables are in very bad shape. Please fix them!

To guarantee lucidity and exactness, enlisting the expertise of a native English speaker is imperative. This measure is both non-negotiable and of paramount significance. Thus, I fervently advise enlisting their aid for a meticulous appraisal.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors have incorporated all the comments.

Minor editing of English language required

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 2 Report

The article is improved compared with the previous version.

My comments are as follows:

1. In Table 2, the Total row contains "5717(remove dupli". It would be better to present this as 5717* and to add "*duplicates removed" as a note to the table.

2. A comparison of the abstract and the conclusion shows that they are very similar. It is recommended to improve the conclusion.

3. The references to the image datasets are unclear. References 41 and 42 are articles that use a dataset, not direct links to the image datasets.

After these corrections, the article can be published.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 4 Report

I made the following comments on the original version of the article:

1. The size of the images in the dataset is not so important in itself: before the data is fed to the model for training, it is resized to fit the network's input size, for example 300 by 300 pixels. When building a dataset, it is important to keep in mind the input dimensions of the network you will work with, and to match the image size to the model's input size as closely as possible. If this is not done, you may run into many surprises. How do the authors take this circumstance into account in their approach to data augmentation?

2. The main rule of any dataset is that images should be as close as possible to the real conditions in which the neural network model will work. Before collecting images, it is important to know which images the model will receive as input, where the camera will be, and the resolution of the camera. If the camera has a low resolution and will capture small images, then the photos in the dataset should also be small. Beginners in the development of machine learning systems often ignore this rule, assuming that the larger and more detailed the images given to the network during training, the better it will learn the subject and the higher its accuracy on real data will be. How do the authors take this circumstance into account in their approach to data augmentation?

3. In my experience, the dataset needs to be split from the outset so that there are images that the neural network has not yet seen. The training process remains the same: the network learns from the images in Train and checks its knowledge on the images in Validate. After training is completed, we add a further stage, testing. The neural network analyzes images from Test that it has never seen, and we can check whether it actually works. This stage reveals where the neural network is wrong and which classes cause it difficulty. How can the authors' concept of data augmentation be applied to images in the Test sample?

4. In my practice, I have come across situations where data leakage significantly affected detection quality. In one of my projects, the neural network consistently made mistakes when detecting objects of a certain class. Through manual testing, I found a data leak. After removing the duplicates, the error disappeared. How do the authors address the detection of data leaks?

5. It is possible and necessary to augment the dataset, but the key is not to get carried away with augmentations. When the dataset size is increased sevenfold, the network will not become seven times more effective, and it may even start to perform worse. It is worth using only those augmentations that really help, i.e., those that correspond to real conditions. If the camera is placed indoors, then a “snow” or “fog” augmentation will only hurt, because such distortions will not occur in real life. How do the authors take this question into account in their data augmentation?

The authors consistently responded to my comments. I liked their answers. I support the publication of the current version of the article. I wish the authors creative success.

Author Response

Please see the attachment

Author Response File: Author Response.docx
