Article
Peer-Review Record

AI-Based Video Clipping of Soccer Events

Mach. Learn. Knowl. Extr. 2021, 3(4), 990-1008; https://0-doi-org.brum.beds.ac.uk/10.3390/make3040049
by Joakim Olav Valand 1,2,†, Haris Kadragic 1,2,†, Steven Alexander Hicks 1,3, Vajira Lasantha Thambawita 1,3, Cise Midoglu 1,*, Tomas Kupka 4, Dag Johansen 5, Michael Alexander Riegler 1,5 and Pål Halvorsen 1,3,4,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 5 November 2021 / Revised: 28 November 2021 / Accepted: 4 December 2021 / Published: 8 December 2021
(This article belongs to the Topic Applied Computer Vision and Pattern Recognition)

Round 1

Reviewer 1 Report

In my opinion, the quality of this work could be improved if:

  • the three metrics from Table 4 are defined;
  • Algorithm 1 is updated: line 5 refers to thresholds and line 17 refers to end thresholds; line 12 refers to a variable cutCrowd which is not introduced;
  • the evaluation is based on five randomly selected events for each human evaluator; there are 61 evaluators. It is not clear to me if this method provides trustworthy results; therefore, I propose to motivate these aspects of the evaluation process.

Author Response

Dear Reviewer,
thank you very much for your comments. Please find our responses below.

Comment #1:
"In my opinion, the quality of this work could be improved if:
the three metrics from Table 4 are defined;"

Author Response: 
In this study, we use three traditional machine learning metrics: precision, recall, and F1-score. We added a new section to the revised manuscript (Section 4.3) that defines each of these metrics, including the mathematical formulas for their calculation.
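For reference, all three metrics follow directly from true-positive, false-positive, and false-negative counts. The following is a minimal sketch; the function names and guard clauses are illustrative and not taken from the manuscript's code:

```python
# Minimal sketch of the three evaluation metrics from TP/FP/FN counts.
# Names and zero-division guards are illustrative, not the manuscript's code.
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that are retrieved."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0
```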

Comment #2:
"Algorithm 1 is updated: line 5 refers to thresholds and line 17 refers to end thresholds; line 12 refers to a variable cutCrowd which is not introduced;"

Author Response:
We updated Algorithm 1 in the revised manuscript to include a more detailed and comprehensive list of parameters. We also added a more elaborate description of our final pipeline, including the clipping protocol described in Algorithm 1, in Section 4.7.
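As a rough illustration of this kind of clipping protocol (this is not the actual Algorithm 1; the function name, parameters, and threshold values below are hypothetical), a clip window around an annotated event can be snapped to nearby detected scene boundaries:

```python
# Hypothetical sketch: choose clip start/end around an event timestamp,
# snapping to detected scene boundaries that fall within the thresholds.
# All names and default values are illustrative, not the paper's actual code.
def clip_event(event_time, scene_boundaries,
               start_threshold=15.0, end_threshold=10.0):
    """Return (start, end) of a clip around event_time (seconds)."""
    start = event_time - start_threshold
    end = event_time + end_threshold
    # Prefer a scene boundary inside the window, so cuts land on scene changes.
    before = [b for b in scene_boundaries if start <= b <= event_time]
    after = [b for b in scene_boundaries if event_time <= b <= end]
    if before:
        start = max(before)   # latest boundary before the event
    if after:
        end = min(after)      # earliest boundary after the event
    return start, end
```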

Comment #3:
"the evaluation is based on randomly selected five events for each human evaluator; there are 61 evaluators. It is not clear to me if this method provides trustful results; therefore, I propose to motivate these aspects of the evaluation process."

Author Response:
We apologize for the lack of clarity in the previous manuscript. In the revised manuscript, we updated Section 5 to explain our user study more clearly.
The same 5 events, randomly selected to represent different goal situations as listed in Table 6, were used for all participants. Each event was clipped using 2 alternative methods, and participants were asked to assess both clips for each event, answering 5 questions in total. Overall, each participant answered the same 5 questions and viewed a total of 10 clips.

Other notes:
In the revised manuscript, we have also gone over all sections to improve the writing quality, replaced Figure 7 with a grayscale version for increased printing compatibility, and adjusted other figure and table sizes to improve readability.

We thank the reviewer for their feedback. 

Reviewer 2 Report

The paper deals with the automatic processing of football match footage. A method for automatically trimming goal replays is devised and verified experimentally. The method is based on video feature detection methods found in recent papers.

Subjective verification of the results is provided.

Do the results presented finally take into account the mislabelling that you detected in the dataset?

Why is it that the logo detection is tested on different inputs for Eliteserien and Premier League (for example: 54x96x1 vs. 54x96x3)?

 

My suggestions are the following:

  • improve formatting of the references according to the rules
  • consider preparing the figures so that they are meaningful when printed B&W (I know, it is archaic, but in your paper it should be easy to do)
  • Is it possible to make fig 4 and 5 any larger?
  • Personally I find Fig. 6 and Algorithm 1 confusing - I would consider revising them to make them more helpful. A good Figure 6 instead of the Algorithm would be sufficient.

Author Response

Dear Reviewer,
thank you very much for your comments. Please find our responses below.

Comment #1:
"Do the results presented finally take into account the mislabelling that you detected in the dataset?"

Author Response:
Our final results are based on the full SoccerNet dataset without modification (i.e., including the potentially mislabeled samples we have identified). There are 2 reasons for our decision: Firstly, as discussed in Section 4.5, we see that the scene boundary detection component with the pre-trained model already has acceptable performance and can be integrated into our final pipeline. Secondly, and more importantly, we would like to remain consistent and comparable with other works in the literature using the same well-established and benchmarked dataset, by using it as it was publicly released [3]. An option would be to (a) remove samples or (b) change labels, and re-publish the dataset. However, this would defeat the purpose of using a somewhat standardized existing dataset, which is bigger and better-established than our smaller Eliteserien dataset, for presenting comparable benchmarks to the research community. We have made an explicit note of this issue in Section 4.5. We also plan to contact the authors of [3] with our findings.

Comment #2:
"Why is it that the logo detection is tested on different inputs for Eliteserien and Premier League (for example: 54x96x1 vs. 54x96x3)?"

Author Response:
For logo detection, we compared the performance of various ML models at several input resolutions, and also switched between RGB ("x3") and grayscale ("x1") images to examine the trade-off between computation time and accuracy. All model-resolution-color combinations were exhaustively tested on both the Eliteserien and Premier League datasets (over 20 model and input configurations for each). However, due to space limitations, we present only the 10 best-performing configurations for each dataset in Tables 1 and 2.
In the revised manuscript, we have updated Section 4.4 and the captions of Table 1 and Table 2 to clarify that we in fact make an exhaustive comparison, with only a selection of results presented in the manuscript.
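The exhaustive comparison described above can be sketched as enumerating the Cartesian product of the configuration axes and keeping only the top performers; the model names, resolutions, and result format below are purely illustrative, not the paper's actual setup:

```python
from itertools import product

# Illustrative configuration axes (not the actual model names from the paper).
models = ["model_a", "model_b"]
resolutions = [(54, 96), (108, 192)]
channels = [1, 3]  # 1 = grayscale ("x1"), 3 = RGB ("x3")

# Every model-resolution-color combination is evaluated on each dataset.
configs = list(product(models, resolutions, channels))

def top_k(results, k=10):
    """Keep only the k best-performing configurations by F1-score."""
    return sorted(results, key=lambda r: r["f1"], reverse=True)[:k]
```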

Comment #3:
"My suggestions are the following:
improve formatting of the references according to the rules"

Author Response:
We use the official LaTeX templates provided by MDPI under https://0-www-mdpi-com.brum.beds.ac.uk/authors/latex to generate our manuscript. The formatting of the references is based on the MDPI class file, which we have not modified from its original.
We have, however, gone through all our BibTeX entries in the revised manuscript and made sure that they contain all relevant and required information.

Comment #4:
"consider preparing the figures so that they are meaningful when printed B&W (I know, it is archaic, but in your paper it should be easy to do)"

Author Response:
In the revised manuscript, we updated Figure 7 to be in full grayscale, and confirmed that Figures 2 and 6 are readable when printed in B&W. Unfortunately, since Figures 1, 3, 4, and 5 are photographs and/or video frames, we believe they would be rendered less meaningful if included in grayscale in the manuscript. We hope that the manuscript is adequately comprehensible.

Comment #5:
"Is it possible to make fig 4 and 5 any larger?"

Author Response:
In the revised manuscript, we have increased the size of both of these figures (Figure 4 and Figure 5), as well as others. We now use the entire "page width" to display figures that previously appeared too small in the narrower "text width".

Comment #6:
"Personally I find Fig. 6 and Algorithm 1 confusing - I would consider revising them to make them more helpful. A good Figure 6 instead of the Algorithm would be sufficient."

Author Response:
Figure 6 provides an overview of our complete pipeline, while Algorithm 1 describes the clipping protocol used by what is referred to as the "video processing" module in Figure 6. We believe that both the high-level overview and the detailed description are useful for readers.
In the revised manuscript, we updated Algorithm 1 to include a more comprehensive list of parameters, and added a dedicated paragraph describing it in Section 4.7. We also updated the description of Figure 6 in this section, and related the two to each other. We hope that the manuscript is clearer now.

Other notes:
In the revised manuscript, we have also gone over all sections to improve the writing quality, adjusted figure and table sizes to improve readability, added a dedicated section for evaluation metrics, and clarified the execution of the subjective user study. 

We thank the reviewer for their feedback. 
