Communication
Peer-Review Record

Non-Invasive Monitoring of the Spatio-Temporal Dynamics of Vocalizations among Songbirds in a Semi Free-Flight Environment Using Robot Audition Techniques

by Shinji Sumitani 1,*, Reiji Suzuki 1, Takaya Arita 1, Kazuhiro Nakadai 2,3 and Hiroshi G. Okuno 4,5
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 5 March 2021 / Revised: 15 April 2021 / Accepted: 16 April 2021 / Published: 21 April 2021
(This article belongs to the Special Issue Feature Papers of Birds 2021)

Round 1

Reviewer 1 Report

A non-invasive monitoring of a spatio-temporal dynamics of vocalizations among songbirds in a semi free-flight environment using robot audition techniques

 

This study describes an initial analysis of spatial and temporal localization (as well as classification) of zebra finch vocalizations in a semi free-flight scenario using relatively new technologies associated with robot audition methods. I think that the concept is very cool, and could be useful to behavioural ecologists if the hardware and software are obtainable or at the very least accessible in the future. However, I think that because this study is essentially a proof-of-concept it really should be labelled more explicitly as a methods paper and should also therefore be written in that style. This means being much more clear and precise about what exactly was done, how it fits into (or adds onto) existing methodological practices (instead of simply referring to the other papers) and providing a lot more detail in these terms. That way, the paper becomes a road map for other researchers and not just a summary of a trial you completed. I would suggest also rewriting the introduction to put the study into a broader context as well. Why do we need this technology exactly? Or why is it useful, what ecological questions could it help us answer?

I also think that the authors should focus on the advantage of automation rather than the array (or alongside the array) as one of the most practical and novel features of their study. Microphone arrays have been used for quite a while now (over a decade) to spatially locate and define interactions of singing birds (albeit mostly at larger spatial scales where localization is easier). I think really defining how this particular method stands out and builds upon these previous methods is key – you do this a little bit but it seems a bit hidden within the paper. Just explicitly state the novelty and exactly how this advances the technology and techniques we already are familiar with and use, so that the impact of the research is crystal clear.

 

I provide quite a few comments below. Most deal with language, which was at times quite difficult to follow (if in doubt, keep the wording simple); however, there are a few other issues that I believe need addressing. I think the paper could do with a re-write overall as more of a guide, and suggest that the authors consider this.

 

 

Lines 19-23 – This sentence is very long, please break it up.

Line 27 – You state that this hypothesis (ANH) has been observed but you don’t define or describe it, meaning there is no context? Please elaborate.

The introduction needs to be a bit more cohesive at the beginning. At the moment it reads like just a series of facts stated one by one rather than trying to set the context for the rest of the paper.

Line 29 – More complex than what?

Line 29 – I would suggest rather than just talking about your own study, that you begin by broadly covering past literature (as you have done) central to a theme but not being explicit yet that this is what your study is about – leave that until the end of the introduction when you define your hypothesis and predictions/aims etc of your study. The first part of the introduction should define the problem and gaps in the literature that you intend to help fill.

 

Line 35 – Degree of what? This summary of the study is confusing.

Line 42 – This is an interesting question, but it would be great if we could understand why it matters a bit more…?

Paragraph at line 49 – Can you define the concept of Robot audition more thoroughly here? What does it entail? What, specifically, could it be used for (or what has it been used for)?

Line 69 – Should perhaps be “microphones”?

Line 72 – Suggest changing to “that the specific “stack call” was most frequently used…”

Line 73 – What defines a ‘newer pair’? Not having met before? Or one bird being new to the population? Do you mean pair as in any two individuals in the population or pair as in a mated pair? Some clarity needed.

Line 81 – Should be “natural” not “naturally”

Line 84 – Should be “outdoor”

Line 88 – Should be “that have applied”

Line 94 – Should be “the robot audition approach”

Line 95 – I’m still not 100% sure about the exact aim of this study. Could you concisely state this, and how it fits within the broader literature? Would it be more widely applicable for people doing these types of studies or is it more just a conceptual test? (I.e. would the hardware/software be available). Once you have a specific aim, you can be specific about your hypotheses/predictions – I know it’s not necessarily an experiment per se (which means it should probably be categorised as a methods paper above anything else) but it would still be more useful if these concepts were all clear to the reader.

Line 103 – Should be “songbirds to fly around”

Line 106 – How many individuals exactly? If you are just talking about general methods here perhaps remove this part (because you describe the release and how many ZFs are in the study later) and just include “We conducted trials from…”

Line 109 – What was the source of these finches? Were they from the same population/colony? Were they familiar with each other?


Line 111 – Remove “happen”

Line 112 – So there was only one trial? If so, then amend your previous wording in the paragraph to reflect this (you make it sound like there were multiple trials, so it’s a bit confusing).

Line 118 – Should be “corresponding”

 

Line 134 – Do you need to calibrate the arrays in the space by first playing a control sound at known sites? Or does the method not require this? I’m a bit confused as to how it all works, as the description tends to refer to previous studies quite a bit, when it would be nice to summarize some of the basics.

Figure 4 caption – Confusingly worded, perhaps try: “The spatial division of the tent area for 2D localization. For each area, we conducted 2D localization by using a pair of microphone arrays: array 4 (central) was always included, and was paired with one other array based on the target localization area (colour-matched above).”
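
For readers unfamiliar with pairwise 2D localization, the general idea of combining direction estimates from two microphone arrays can be sketched as a simple bearing intersection. This is only an illustrative sketch, not the authors' implementation: the function name `triangulate`, the coordinate convention (angles in radians measured from the +x axis), and the example array positions are all assumptions made for the illustration.

```python
import math

def triangulate(p1, theta1, p2, theta2, eps=1e-9):
    """Intersect two bearing rays (hypothetical helper, not from the paper).

    p1, p2: (x, y) positions of the two microphone arrays.
    theta1, theta2: estimated source directions in radians, from the +x axis.
    Returns the (x, y) intersection, or None if the bearings are parallel
    or the intersection lies behind either array.
    """
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1, t2 using Cramer's rule.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < eps:
        return None  # bearings are (nearly) parallel: no unique intersection
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (dx * d2[1] - dy * d2[0]) / denom
    t2 = (dx * d1[1] - dy * d1[0]) / denom
    if t1 < 0 or t2 < 0:
        return None  # intersection lies behind one of the arrays
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])

# Example with assumed positions: arrays 2 m apart, bearings of 45 and 135 degrees.
position = triangulate((0.0, 0.0), math.radians(45), (2.0, 0.0), math.radians(135))
```

In a sketch like this, small errors in either bearing shift the intersection point, and the shift grows when the two bearings are close to parallel, which is one reason the choice of array pair for each area matters.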

Line 146 – Should be “needed”

Line 149 – Replace “which have” with “with”

Line 159 – This sentence confuses me. Do you mean “from” a unique source?

Line 183 – Should be “did not”

Line 190 – Did the height of the bird (i.e. on the ground versus on a perch), especially relative to the microphone height, not affect the localization calculations? How was this dealt with? (Or am I missing something?)

Line 201 – Should be “localized outside the tent, as observable in the lower left…”

Line 202 – Should be “This can be”

Line 205 – I still don’t really understand this error and why it occurred. Can you explain this a bit more clearly?

Line 240 – remove “the” after “keeping”

Line 272 – Change to “outdoor”

Line 282 – Except for the vocalizations that were placed outside the tent? Is there a fix for this?

Table 1 – Requires a much more descriptive caption: the caption should enable the table to stand alone! What are these methods and what are they used for? Are there examples for each you can refer to? What does “Cost – Capture and fitting” mean? This does not seem to reflect any practical cost comparisons for those readers trying to decide on a methodology…?

Line 321 – Replace “The more” with “A more”

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

If I've understood this ms. properly, the study is based on just 10 min of recording of 5 ZFs of one sex brought together in a novel environment for the first time. Although it is primarily concerned with evaluating an automated, remote technique for spatio-temporal mapping of vocalizations, it would still seem to make sense to conduct the test over a longer duration with more subjects (of both sexes) and when the birds have had time to settle down. If this work is part of a much bigger study, perhaps that is where it belongs. It is stated that this kind of information about vocalizations is important in understanding social interactions, but the vocal interactions occurring so soon after introduction would hardly be typical of normal, stable sociality, so is this a meaningful test of the technique's utility? Moreover, how much does knowing when and where wild birds vocalize tell us about social organization without additional information about who is vocalizing, the social context and what the response is? Further, could this technique really be applied to truly wild birds? It is difficult to envisage how this set-up could be transposed practically into a real field situation. Apart from the suggestion that males may be counter-singing, which is unsurprising so soon after being introduced to one another, this study tells us little about social organization and space use that is not obvious (i.e. birds tend to vocalize close to perches and nests; captive ZFs mostly vocalize when perching [a bit on the ground], so this is unremarkable). Quite a lot is made of the system's ability to distinguish songs from calls. Whilst I can understand that this would be important if you are never going to directly observe your subjects, doesn't it just underline the point that this kind of remote approach to studying sociality requires ground-truthing by some direct observation? It is very easy to distinguish between song and calls through direct observation in ZFs and most other songbirds. My main reservations with the investigation are its limited scope and the debatable underlying rationale for conducting it.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Review of Revised Article Birds-1153204-V2

 

What an improvement! This reads much more smoothly now and I can see it being of good value to the bioacoustics and ornithological community.

 

I have a few minor suggestions mostly to do with grammar and clarity, etc., below:

 

Abstract Line 14 – I would switch the order of your last two sentences, so that your secondary description of methods comes before your concluding sentence (i.e. end the abstract with the sentence “Our proposed research…”.)

 

Line 29 – I suggest starting this sentence: “One approach for obtaining such complex vocalization and associated spatial data has been to attach…” just so the relevance is clear.

 

Line 44 – Suggest changing “aggressive relationships among them” to just “aggression”.

 

Line 64 – Change to “It is worth tackling the challenge of automatically capturing the spatial distribution…”

 

Line 118 – Change to “five male individual ZF singing”

 

Line 122-123 – Suggest changing to “The situation in which zebra finches are attempting to establish social relationships is a good example for testing the potential of our method to grasp various vocalization patterns.” (It’s just a more straightforward sentence structure here).

 

Line 154 – you have an upper-case letter in the middle of your sentence. I suggest: “A user can tweak these parameters: this helps to localize…”

 

Line 156 – This sentence is confusing. Suggestion: “This is important because the most appropriate settings depend strongly on the acoustic properties of the environment and the target sounds”.

 

Line 158 – Remove “is displayed in”

 

Line 160 – Remove “existence”

 

Line 165 – Suggest splitting the sentence, as follows: “…time, duration and direction. Annotations can also be made, and the user can check each localized sound…”

 

Line 176 – Capitalise “We” as sentence starts.

 

Line 226 – Can you verify that this doesn’t affect results too much somehow? I.e. perhaps movement in the vertical plane is likely much more limited than movement in the horizontal plane (because of available perches, etc?).

 

Line 272 – Sometimes you use the wording “mic. array #” and sometimes you spell it out. Please just keep consistent throughout the paper (I suggest spelling it out completely each time).

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The revised version is considerably improved, but I still believe that this should just be a much shorter technical communication. It is essentially a ms. about proof of a technique; it does not add anything much to scientific knowledge.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
