Article
Peer-Review Record

Understanding the Role of the Microbiome in Cancer Diagnostics and Therapeutics by Creating and Utilizing ML Models

by Miodrag Cekikj 1,*, Milena Jakimovska Özdemir 2,*, Slobodan Kalajdzhiski 1, Orhan Özcan 3 and Osman Uğur Sezerman 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 8 January 2022 / Revised: 11 February 2022 / Accepted: 8 March 2022 / Published: 19 April 2022
(This article belongs to the Section Applied Biosciences and Bioengineering)

Round 1

Reviewer 1 Report

Although it is clear that a lot of thought and work went into this study, I believe there are several issues that need to be addressed. Though blunt, my aim is to be as constructive as possible.

 

1 - metrics

Accuracy, which you report in Tables 1 and 2, is not by itself an appropriate metric for classification problems; the F1 score, which you supply in Table S1, would be preferable.

 

Also, please consider one or more of the following classification metrics: AUC, average precision (AUPRC), or the Matthews correlation coefficient, all of which are supported by scikit-learn.

Also, if k-fold cross-validation is being performed, please report the mean and standard deviation of each metric across all folds.
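
To illustrate the point about metrics, here is a minimal sketch using only the Python standard library. The confusion-matrix counts are hypothetical and are not taken from the manuscript; the helper name `classification_metrics` is likewise an assumption made for this example. It shows how a classifier that always predicts the majority class can reach high accuracy while F1 and the Matthews correlation coefficient are both zero, and how a per-fold metric can be summarised as mean and standard deviation.

```python
import math
import statistics


def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, F1 and MCC from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical imbalanced fold: 10 positives, 90 negatives,
# and a classifier that always predicts the negative class.
majority = classification_metrics(tp=0, fp=0, fn=10, tn=90)
print(majority)

# Hypothetical (tp, fp, fn, tn) counts for each of five folds.
fold_counts = [
    (8, 3, 2, 87),
    (7, 4, 3, 86),
    (9, 2, 1, 88),
    (6, 5, 4, 85),
    (8, 3, 2, 87),
]
fold_f1 = [classification_metrics(*counts)["f1"] for counts in fold_counts]
mean_f1 = statistics.mean(fold_f1)
std_f1 = statistics.stdev(fold_f1)  # sample standard deviation across folds
print(f"F1 = {mean_f1:.3f} ± {std_f1:.3f}")
```

The same summary can be produced for any other per-fold metric by swapping the dictionary key.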

 

2 - cross validation

You mention the use of 5-fold cross-validation and hyperparameter tuning for phase 2, but how exactly this is performed is not clear to me. If this is simple (i.e. not nested) cross-validation, then this introduces a bias in the results, as the classifier should be tested on a held-out set (i.e. one that is not used for training or tuning). If this is the case, please consider using nested cross-validation.

 

The above is also valid for the procedure in the screening phase, where you mention a split of the dataset into train (70%) and test (30%) (line 278). 
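
The structure of nested cross-validation can be sketched as follows. This is only an illustration of the index bookkeeping, written with the standard library; the sample count, the fold counts, and the helper `k_fold_indices` are assumptions for this example, and no shuffling or stratification is applied. The key property is that hyperparameters are tuned on inner splits drawn only from the outer training portion, so the outer test fold is never seen during tuning.

```python
def k_fold_indices(indices, k):
    """Split a list of sample indices into k (train, test) pairs."""
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


samples = list(range(20))  # hypothetical dataset of 20 samples

nested = []
for outer_train, outer_test in k_fold_indices(samples, k=5):
    # Inner loop: hyperparameter tuning uses only the outer training indices.
    inner_splits = k_fold_indices(outer_train, k=4)
    # A model refit with the selected hyperparameters on outer_train
    # would then be scored once on outer_test.
    nested.append((outer_test, inner_splits))

for outer_test, inner_splits in nested:
    for inner_train, inner_val in inner_splits:
        # The held-out outer fold never leaks into tuning.
        assert not set(outer_test) & set(inner_train)
        assert not set(outer_test) & set(inner_val)
```

In scikit-learn, the same structure is commonly obtained by passing a `GridSearchCV` estimator to `cross_val_score`, so that the search runs inside each outer training fold.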

 

The text is somewhat confusing in its description of the cross-validation procedures. While 5-fold cross-validation is mentioned in line 394, lines 399 and 404 mention “cross-validation value of 25% test data”. This appears to contradict what is said on line 394, as 25% test data would correspond to 4-fold cross-validation.
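
The arithmetic behind this remark is simple: with k equal folds, each test fold holds 1/k of the samples. A short check, with function names chosen only for this illustration:

```python
from fractions import Fraction


def held_out_fraction(k):
    """Fraction of samples in each test fold of k-fold cross-validation."""
    return Fraction(1, k)


def folds_implied_by(share):
    """Number of equal folds implied by a given test share."""
    return 1 / share


five_fold_share = held_out_fraction(5)              # 1/5, i.e. 20% per test fold
folds_for_quarter = folds_implied_by(Fraction(25, 100))  # 25% test data
print(five_fold_share, folds_for_quarter)
```

So 5-fold cross-validation implies 20% test data per fold, whereas 25% test data implies four folds.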

 

3 - ML pipelines

Data normalization and scaling (mentioned in section 2.5.) should be applied to the training and testing sets independently. It is not clear to me if this was done. The text appears to suggest scaling was done to the entire dataset, before training the models.   
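
One common way to satisfy this requirement, shown here as a sketch and not as a description of what the authors did, is to estimate the scaling parameters on the training set only and then reuse them unchanged on the test set. The values and helper names below are hypothetical.

```python
import statistics


def fit_standard_scaler(values):
    """Estimate mean and standard deviation from training values only."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return mean, std or 1.0  # guard against a constant feature


def apply_scaler(values, mean, std):
    """Standardise values with previously estimated parameters."""
    return [(v - mean) / std for v in values]


train = [2.0, 4.0, 6.0, 8.0]  # hypothetical training feature values
test = [5.0, 10.0]            # hypothetical test feature values

mean, std = fit_standard_scaler(train)           # fitted on training data only
train_scaled = apply_scaler(train, mean, std)
test_scaled = apply_scaler(test, mean, std)      # same parameters, no refit
print(train_scaled, test_scaled)
```

Scaling the full dataset before splitting would let test-set statistics influence the training features. In scikit-learn, placing a `StandardScaler` inside a `Pipeline` that is passed to the cross-validation routine achieves the same separation within every fold.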

 

4 - Feature importance and statistical analysis

 

The presentation of these results is highly confusing and should be split into separate sections to be clearer.

 

In line 427 you mention: “From the rest of the classified bacteria”. What classified bacteria, and how were they classified? Up to this point in the paper, all mentions of classification are those related to the ML classifiers, which, as far as I can tell, classify samples, not bacteria.

 

The presentation of the feature importance and statistical analysis is also not adequately explained. In lines 301 and 302, you mention: “importance of the Random Forest algorithm's built-in features, the Permutation method, and the technique of feature importance computed with SHAP values.” In Section 3.3 the feature selection results are presented, but it is never mentioned where the p-values come from. This is confusing, as the permutation method also generates p-values. While it is possible to infer that they come from the statistical analysis, this should be mentioned explicitly.

 

The same goes for the feature importances in Tables 3 and 4. The source of the values should be explicit in the text and/or the table captions.

 

In Section 3.4, you mention: “joint feature contributions were calculated and extracted from the same Python-based Random Forest classifier”. This should be detailed in the methods section.

Once again, it is possible to infer that the treeinterpreter package was used, but it should be explicit.

 

5 - Readability 

Most of the issues mentioned above are not necessarily a critique of the work per se (though methodological aspects need clarification), but of the structure of the paper. Overall I feel that the paper needs to be reorganized to make the work and results clearer to the reader.

 

6 - English language

Lastly, there’s the matter of the English language. The paper contains a large number of non-idiomatic sentences scattered throughout the text. In some cases it is clear what is meant, while in others the meaning is lost or the sentence just feels “weird”. Sometimes sentences are just plain wrong. The first sentence of the paper, “Cancer incidence and mortality estimates remain the leading cause of death worldwide”, is an example of this. Neither cancer incidence nor mortality estimates are the leading cause of death; cancer itself is. There are numerous other examples.

Author Response

Please see the attachment.

Thank you.

Author Response File: Author Response.pdf

Reviewer 2 Report

This study presents machine learning algorithms for identifying potential biomarker metabolites and symbiotic bacteria correlations in order to understand drug-resistance mechanisms in CRC patients. In this way, the authors identified and interpreted the most significant genera in the resistant groups. Having said the above, I recommend the manuscript for publication in Applied Sciences after a number of recommendations are addressed.

 

- Lines 47-48: “One of the causes for the high mortality rate is the unreliable treatment of patients with colorectal cancer due to the gut microbiota”. The original reference for this citation should be included.

- In general, the genera and species must be in italics.

- Line 60: The correct citation should be “[5,6]”, please check the instructions for authors.

- What is the purpose of Figure 1? Please consider moving it to the supplementary material.

- In general, the authors should define abbreviations when they are first mentioned in the text (examples: BMI, SVM, CEA, CIT and NDA).

- Lines 328-330: Check punctuation marks.

- In section “3.3. Highly Contributing Features and Statistical Analysis Results”, the authors mention key biomarkers and factors, but they are not discussed in detail.

- Lines 514-515: “The influence of metabolites and their impact on the cell cycle mechanisms in the case of CRC-prior treatment is presented in Figure 6”. This image does not imply a cell cycle mechanism (in biological terms). It seems to be an abundance histogram; the same goes for Figures 7 and 8. Can the authors comment on this? Otherwise, the authors should discuss the bacterial impact shown in Figures 6-8 and its correlation to one or several specific cell cycle mechanisms. This information should be included in the main document.

- The authors identified the previously reported Bacteroides genus. Could you mention another important genus that correlates with CRC within this analysis? What new findings can you discuss? All new discoveries are worth discussing.

- Footnote information should be moved to the references section and be properly cited. Please check.

- All references should be cited appropriately, according to the ACS style guide. References need to keep a single format. Check the Instructions for Authors.

Author Response

Please see the attachment.

Thank you.

Author Response File: Author Response.pdf

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

