Article
Peer-Review Record

Improving Genomic Prediction with Machine Learning Incorporating TPE for Hyperparameters Optimization

by Mang Liang, Bingxing An, Keanning Li, Lili Du, Tianyu Deng, Sheng Cao, Yueying Du, Lingyang Xu, Xue Gao, Lupei Zhang, Junya Li and Huijiang Gao *
Submission received: 28 September 2022 / Revised: 31 October 2022 / Accepted: 7 November 2022 / Published: 11 November 2022
(This article belongs to the Section Genetics and Genomics)

Round 1

Reviewer 1 Report (Previous Reviewer 2)

The authors have addressed my concerns.

Author Response

Thank you very much for your affirmation of our work. We have added the relevant content to the Introduction section. Please see the revised manuscript.

Reviewer 2 Report (New Reviewer)

This study is good and highly novel, but I see some grammar and word-choice problems. Please correct them.

Author Response

Thank you very much for your affirmation of our work. We have corrected the spelling and grammar errors in the manuscript. Please see the revised manuscript.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.


Round 1

Reviewer 1 Report

In their manuscript, the authors describe a method for tuning the hyperparameters of machine learning models based on the Tree-structured Parzen Estimator (TPE), and they demonstrate that this method promotes the application of machine learning in genomic prediction and can further accelerate breeding progress. However, there are some questions worth investigating that may help improve the impact of this paper.
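For context, here is a minimal sketch of TPE-based tuning using the hyperopt library, whose tpe.suggest implements the Tree-structured Parzen Estimator; the estimator, search space, and random placeholder data are assumptions for illustration, not the authors' actual pipeline.

```python
# A toy illustration of TPE-based hyperparameter tuning with hyperopt,
# whose tpe.suggest implements the Tree-structured Parzen Estimator.
# The estimator, search space, and random data are placeholders.
import numpy as np
from hyperopt import fmin, tpe, hp, Trials
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 50)), rng.normal(size=200)

space = {
    "alpha": hp.loguniform("alpha", np.log(1e-3), np.log(1e2)),
    "gamma": hp.loguniform("gamma", np.log(1e-4), np.log(1.0)),
}

def objective(params):
    model = KernelRidge(kernel="rbf", **params)
    # TPE minimizes the objective, so return the negative mean CV score.
    return -cross_val_score(model, X, y, cv=5).mean()

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=trials)
print(best)
```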

· The language needs major improvement.

· Inconsistent terminology (line 10 "hyper-parameters" vs. line 11 "hyperparameters").

· In line 26, the authors state "manual (MAN)" but do not offer any clear explanation in the main text of the manuscript.

· In lines 44 and 58, what is the purpose of the phrase "et.al"? Is a reference missing?

· It is noticeable that references are not cited directly after author names but at the end of the statement; see, for example, lines 62-65.

· In lines 71-73, the authors state "… in the portion dataset, …". What do they mean by that?

· It is widely known that grid search is an automatic technique, but the authors appear to describe it as manual in line 77. If not, what do they mean by "manual"? (See the grid-search sketch after this list.)

· In line 147, the authors state "Referring to the previously reported studies". Which studies?

· Regarding the statistics presented in Table 1, the authors do not clarify how they performed data preprocessing (missing values, normalization, etc.).

· The authors state that they used principal component analysis in the model-building process (Figure 1) but fail to mention it in the main text or to give any justification for this decision.

· The mathematical formulas in lines 178-182 need to be rewritten and expressed more clearly.

· In line 245, the authors write "G matrix………marker" but fail to give any proper explanation.

· The Pearson correlation coefficient (PCC) measures the linear correlation between two sets of data, and its value always lies between −1 and 1. In the "Assessing prediction performance" section (lines 246-251), the authors state, "In this study, the prediction accuracies were quantified with the Pearson correlation coefficients (PCC) between predicted GEBV and the phenotypes", but do not explain this choice. Furthermore, the authors never report these PCC values in the Results section, instead using an accuracy score to compare the models' performances. (See the PCC sketch after this list.)

· The presented models achieve near-decent performance on only 1 of the 5 datasets studied, with accuracies between 70% and 82.5%, and fail to reach even random-model performance (>=50%) on the remaining 4 datasets, which calls into question the utility of these models and of the presented study.

· Automatic hyperparameter tuning consumes considerable time and resources to deliver good results, but the authors provide no information in this regard for their experiments.
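A minimal sketch of the automatic grid search referred to above, using scikit-learn's GridSearchCV; the estimator, parameter grid, and random placeholder data are illustrative assumptions, not the manuscript's actual setup.

```python
# A toy illustration of exhaustive (automatic) grid search with scikit-learn.
# The estimator, grid, and random data are placeholders.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 50)), rng.normal(size=200)

param_grid = {"C": [0.1, 1, 10], "gamma": [1e-3, 1e-2, 1e-1]}
# GridSearchCV evaluates every combination with cross-validation, with no manual steps.
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```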
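And a minimal sketch of the Pearson-correlation accuracy metric questioned above, computed with scipy; the variable names and random placeholder data are assumptions for illustration.

```python
# A toy illustration: Pearson correlation between predicted GEBVs and
# observed phenotypes. Data here are random placeholders.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
predicted_gebv = rng.normal(size=100)      # model predictions on a validation fold
observed_phenotype = rng.normal(size=100)  # corresponding phenotypic records

pcc, p_value = pearsonr(predicted_gebv, observed_phenotype)
# PCC is bounded in [-1, 1]; it is not the same thing as a percent "accuracy score".
print(f"prediction accuracy (PCC) = {pcc:.3f}")
```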

Reviewer 2 Report

Liang et al. applied the Tree-structured Parzen Estimator (TPE) to tune the hyperparameters of machine learning (ML) methods and thereby improve ML performance. Through benchmark comparisons, they aimed to demonstrate that TPE outperforms other hyperparameter tuning algorithms. However, this study neither provides any useful information about the model training and benchmarking [see comment 1], nor makes available any code related to the approach [see comment 5]. Moreover, the benchmark results are not convincing [see comment 3]. Therefore, I do not think this study can be accepted.

 

Comments:

1. The goal of the "Method" section is to describe the key methods and steps the authors applied in their models, rather than to list a stack of existing published algorithms. The authors should at least provide the range of hyperparameters used in each model across all datasets (preferably in a table) and describe how the benchmark comparisons were carried out (software/languages). The preprocessing (normalization, scaling, etc.), input data types (matrix type, dimensions, input features, etc.), and output results are also missing. I cannot evaluate the model without this information.

2. Grid search is one of the most common methods used for hyperparameter optimization. I am curious why the authors did not use grid search as a benchmark comparison.

3. From Tables 2 and 3 and Figures 2 and 3, we can see that the performance of these methods is very similar. To assess prediction performance, statistical tests are needed to examine whether TPE is significantly better than all the other algorithms across all datasets. (See the significance-test sketch after this list.)

4. Space/memory usage and running time are two important metrics for measuring the performance of a model. These should be included and discussed in the study.

5. For computational and ML articles, a "Code Availability" statement should be provided, but I did not find one in the original article.

6. There are many errors in the Introduction; I list just some of them here. The authors should carefully address these errors.

a. [Line 64] "the results indicated that the prediction accuracy of ML methods was higher than BL and RF got the best performance": RF is itself an ML method.

b. [Line 68] "Montesinos-López et al. evaluated the prediction accuracy of multi-layer perceptron and support vector machine with seven real datasets, the predictions of ML methods were very competitive with the advantage that the SVM was the most efficient in terms of the computational time required": both multi-layer perceptron and support vector machine are ML methods, so I do not quite understand the meaning of the second clause.

7. This manuscript needs careful editing by someone with expertise in technical English editing, paying particular attention to grammar, spelling, and sentence structure, so that the methods and results of the study are clear to the reader.
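A minimal sketch of the kind of paired significance test comment 3 asks for, comparing per-fold accuracies of a TPE-tuned model against an alternative tuner with a Wilcoxon signed-rank test; the fold values below are illustrative placeholders, not results from the manuscript.

```python
# Hypothetical per-fold accuracies (e.g., PCC on each cross-validation fold);
# these numbers are placeholders, not results from the manuscript.
from scipy.stats import wilcoxon

tpe_acc   = [0.71, 0.68, 0.73, 0.70, 0.72]   # TPE-tuned model, one value per fold
other_acc = [0.69, 0.67, 0.71, 0.68, 0.70]   # alternative tuning algorithm, same folds

# Paired, non-parametric test: is the per-fold difference centered at zero?
stat, p = wilcoxon(tpe_acc, other_acc)
print(f"Wilcoxon statistic = {stat}, p-value = {p:.3f}")
```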
