
Open Access Communication

Peer-Review Record

The Prediction of the Tibetan Plateau Thermal Condition with Machine Learning and Shapley Additive Explanation

Remote Sens. 2022, 14(17), 4169; https://0-doi-org.brum.beds.ac.uk/10.3390/rs14174169

by Yuheng Tang^1,2, Anmin Duan^1,2,*, Chunyan Xiao^1,2 and Yue Xin^1,2

Reviewer 1: Venkatesh Kolluru

Reviewer 2: Anonymous

Submission received: 27 July 2022 / Revised: 11 August 2022 / Accepted: 19 August 2022 / Published: 25 August 2022

(This article belongs to the Special Issue Artificial Intelligence for Weather and Climate)

Round 1

Reviewer 1 Report

The comments can be found in the attached PDF.

Comments for author File: Comments.pdf

Author Response

1. Choose keywords that are not present in the title. This helps the article reach a wider audience in searches.

Thank you very much for your suggestions. We have rewritten the keywords (South Asian high; LightGBM; XGBoost; Climate prediction) (Line 21).

Cite some studies and state the advantages of boosted trees over other models in the Introduction. You can refer to a few of the attached studies (https://0-doi-org.brum.beds.ac.uk/10.3390/app10228083; https://0-doi-org.brum.beds.ac.uk/10.3390/rs12071135; https://0-doi-org.brum.beds.ac.uk/10.1080/15481603.2019.1650447).

Thank you very much for your suggestions. We have added the advantages of boosting tree models to the Introduction (Lines 57-61) as follows:

When dealing with regression problems, each decision tree in a boosting tree model is built on the errors left by the previous trees, and the model is updated iteratively through gradient descent to reduce these errors. Existing studies have confirmed that boosting tree models often outperform other machine learning models, such as random forest, in the geosciences [1-6].

1. Abdi, A.M. Land cover and land use classification performance of machine learning algorithms in a boreal landscape using Sentinel-2 data. GISci. Remote Sens. 2020, 57, 1-20.
2. Wagle, N.; Acharya, T.D.; Kolluru, V.; Huang, H.; Lee, D.H. Multi-Temporal Land Cover Change Mapping Using Google Earth Engine and Ensemble Learning Methods. Appl. Sci. 2020, 10.
3. Talukdar, S.; Singha, P.; Mahato, S.; Shahfahad; Pal, S.; Liou, Y.-A.; Rahman, A. Land-Use Land-Cover Classification by Machine Learning Classifiers for Satellite Observations—A Review. Remote Sens. 2020, 12.
4. Kang, Y.; Ozdogan, M.; Zhu, X.; Ye, Z.; Hain, C.; Anderson, M. Comparative assessment of environmental variables and machine learning algorithms for maize yield prediction in the US Midwest. Environ. Res. Lett. 2020, 15, 064005.
5. Zhong, J.; Zhang, X.; Gui, K.; Wang, Y.; Che, H.; Shen, X.; Zhang, L.; Zhang, Y.; Sun, J.; Zhang, W. Robust prediction of hourly PM2.5 from meteorological data using LightGBM. Natl. Sci. Rev. 2021, 8, nwaa307.
6. Lee, Y.; Han, D.; Ahn, M.-H.; Im, J.; Lee, S.J. Retrieval of Total Precipitable Water from Himawari-8 AHI Data: A Comparison of Random Forest, Extreme Gradient Boosting, and Deep Neural Network. Remote Sens. 2019, 11.
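The boosting mechanism summarized in the response above can be illustrated with a minimal from-scratch sketch. Each tree depends on the previous ones, and errors are reduced by gradient descent. The sketch assumes squared-error loss, a single input feature, and depth-one trees (stumps). The function names and toy data are hypothetical. This is not how XGBoost or LightGBM are implemented; both add regularization and many optimizations.

```python
"""Minimal gradient boosting for regression (squared-error loss, decision stumps)."""


def fit_stump(x, r):
    """Fit a one-split regression tree to residuals r on a single feature x.

    Returns (threshold, left_value, right_value) minimizing the squared error.
    """
    best = None
    for t in sorted(set(x))[:-1]:  # the largest value would leave the right leaf empty
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # constant feature: fall back to a single leaf
        m = sum(r) / len(r)
        return x[0], m, m
    return best[1], best[2], best[3]


def predict_stump(stump, xi):
    t, lv, rv = stump
    return lv if xi <= t else rv


def gradient_boost(x, y, n_trees=50, learning_rate=0.1):
    """Each new stump is fitted to the errors left by all previous stumps."""
    f0 = sum(y) / len(y)  # the mean is the best constant under squared error
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For L = 0.5 * (y - F)^2 the negative gradient with respect to F is y - F
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Gradient-descent step in function space, shrunk by the learning rate
        pred = [pi + learning_rate * predict_stump(stump, xi) for pi, xi in zip(pred, x)]
    return f0, learning_rate, stumps


def predict(model, xi):
    f0, lr, stumps = model
    return f0 + lr * sum(predict_stump(s, xi) for s in stumps)


def mse(model, x, y):
    return sum((predict(model, xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)


if __name__ == "__main__":
    x = [i / 10 for i in range(20)]
    y = [xi ** 2 for xi in x]  # hypothetical toy target
    for n in (1, 10, 50):
        print(n, "trees, training MSE:", round(mse(gradient_boost(x, y, n_trees=n), x, y), 4))
```

Training error shrinks as trees are added. On real data, the number of trees and the learning rate would be tuned on held-out data to avoid overfitting.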

Please define the abbreviations of all features in the footnote of Figure 2 or Figure 3.

Thank you very much for your suggestions. We have added the abbreviations to the footnote of Figure 2 (Lines 206-213).

The Conclusions should come after the Discussion. Please rearrange the sections.

Thank you very much for your suggestions. We have rearranged the sections (Lines 304-384).

Reviewer 2 Report

The paper has been substantially improved in response to the comments. It could be accepted for publication.

Author Response

Thank you for your affirmation of our work.

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

Round 1

Reviewer 1 Report

I feel the manuscript does not provide enough new understanding of the thermal condition of the continental surface. I have the following concerns about the article.

Major points:

1. The Introduction should convince readers that you clearly know why your work is useful. What is the problem? Are there any existing solutions? Which is the best? What is its main limitation? And what do you hope to achieve? What is the hypothesis of this experiment? And what are the objectives? This manuscript clearly misses all of these points.

2. It is hard to believe that there is so little discussion.

3. What’s the conclusion of this study?

Minor points:

1. Lines 25-33: delete this paragraph.

2. Line 40: what is the full name of SAH?

3. Figure 1: What is the full name of JJA? What is the definition of the TPTT index?

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Please find the comments from the PDF attached. The manuscript needs to be extensively revised and resubmitted.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

This is a very good paper with very interesting results. The application of machine learning is novel and very well explained. The organization is excellent, with rich discussion and comparison of the results. I have no further issues; therefore, I suggest acceptance. I only suggest adding some references for Equations 1 to 7.

Author Response

Thank you very much for your suggestions. We have added references for Equations 1 and 2 [1-3] and for Equations 3 to 7 [4]. The corresponding changes have been made in the paper (Lines 114 and 147).

References:

1. Yang, Y.; Yuan, Y.; Han, Z.; Liu, G. Interpretability analysis for thermal sensation machine learning models: An exploration based on the SHAP approach. Indoor Air 2022, 32, e12984.
2. Chang, I.; Park, H.; Hong, E.; Lee, J.; Kwon, N. Predicting effects of built environment on fatal pedestrian accidents at location-specific level: Application of XGBoost and SHAP. Accid. Anal. Prev. 2022, 166, 106545.
3. Barda, N.; Riesel, D.; Akriv, A.; Levy, J.; Finkel, U.; Yona, G.; Greenfeld, D.; Sheiba, S.; Somer, J.; Bachmat, E.; et al. Developing a COVID-19 mortality risk prediction model when individual-level data are not available. Nat. Commun. 2020, 11, 4439.
4. Li, R.; Zhong, W.; Zhu, L. Feature Screening via Distance Correlation Learning. J. Am. Stat. Assoc. 2012, 107, 1129-1139.
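Equations 3 to 7 are referenced to Li, Zhong and Zhu (2012), who screen features by their distance correlation with the response. The minimal sketch below assumes those equations define the sample distance correlation of Székely et al. That is an assumption, since the equations themselves are not shown here. The function names and toy data are hypothetical. Unlike Pearson correlation, population distance correlation is zero only when the two variables are independent, so it also detects nonlinear dependence.

```python
import math


def _double_centered(v):
    """Pairwise distances |v_i - v_j| with row, column and grand means removed."""
    n = len(v)
    d = [[abs(v[i] - v[j]) for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in d]  # d is symmetric, so column means equal row means
    grand = sum(row_mean) / n
    return [[d[i][j] - row_mean[i] - row_mean[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    """Sample (V-statistic) distance correlation between two 1-D samples."""
    n = len(x)
    a, b = _double_centered(x), _double_centered(y)

    def dcov2(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / n ** 2

    denom = math.sqrt(dcov2(a, a) * dcov2(b, b))
    if denom == 0:  # at least one sample is constant
        return 0.0
    return math.sqrt(max(dcov2(a, b), 0.0) / denom)


def screen_features(features, target):
    """Rank named features by distance correlation with the target, largest first."""
    scores = {name: distance_correlation(col, target) for name, col in features.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    x = [-3, -2, -1, 0, 1, 2, 3]
    # x and x^2 have zero Pearson correlation here, yet their distance correlation is positive
    feats = {"linear": [2 * v + 1 for v in x], "quadratic": [v ** 2 for v in x]}
    print(screen_features(feats, x))
```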

Round 2

Reviewer 2 Report

The authors' responses to the following questions are not convincing. I suggest that the authors read more about XGBoost and LightGBM and explain in the manuscript why they chose these models.

Point 1: Many studies have shown that random forests or neural networks perform best in hydrological/meteorological studies compared to LightGBM. Is there any specific reason why the authors chose XGBoost or LightGBM? The authors' response about reducing multicollinearity is not convincing. There are many other techniques to eliminate multicollinearity, so there is no need to implement XGBoost and LightGBM for this purpose.
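The variance inflation factor (VIF) is one standard alternative of the kind the reviewer alludes to. It measures how much a regression coefficient's variance is inflated by that predictor's correlation with the other predictors. The minimal sketch below assumes the predictors are given as equal-length columns. It uses the identity that the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix. Names and data are hypothetical, and this is not a method taken from the manuscript.

```python
import math


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), read off the diagonal of the inverse correlation matrix."""
    k = len(columns)
    corr = [[pearson(columns[i], columns[j]) for j in range(k)] for i in range(k)]
    inv = invert(corr)
    return [inv[j][j] for j in range(k)]


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0]
    b = [2.0, 1.0, 4.0, 3.0, 6.0]  # hypothetical predictor correlated with a
    print(vif([a, b]))
```

Uncorrelated predictors give VIF = 1. Large values flag predictors that could be dropped or combined before model fitting, independently of which regression model is used afterwards.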

Point 2: What is the difference between the variable/feature importance provided by decision tree models and SHAP? Variable/feature importance also indicates which variables are important in the prediction. Please discuss the SHAP values in the study in more detail.
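The distinction asked about in Point 2 can be made concrete. Split-based feature importance gives one global, unsigned number per feature. SHAP instead assigns each feature a signed contribution to each individual prediction, and these contributions add up to the difference between that prediction and a reference value. The brute-force sketch below computes exact Shapley values under simplifying assumptions: a single baseline vector stands in for "absent" features, and a hypothetical toy model stands in for the trained trees. The SHAP library's TreeExplainer computes SHAP values efficiently for tree ensembles instead of enumerating every subset.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, replacing 'absent' features by baseline values.

    Enumerates every feature subset, so the cost grows as 2^n; only for small n.
    """
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # size of the coalition that excludes feature i
            for s in combinations(others, k):
                s = set(s)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Hypothetical model: an additive effect of z[0] plus a z[1]*z[2] interaction."""
    return 2.0 * z[0] + z[1] * z[2]


if __name__ == "__main__":
    x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
    phi = shapley_values(toy_model, x, base)
    print(phi)  # approximately [2.0, 3.0, 3.0]
    # Local accuracy: contributions sum to f(x) - f(baseline)
    print(sum(phi), toy_model(x) - toy_model(base))
```

The interaction term's contribution is split equally between z[1] and z[2]. A split-based importance score would report only one number per feature for the whole dataset, with no sign and no per-sample attribution.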

Point 3: The authors mention that they used regional averages. Does this mean they took the average over the entire TP? Is the analysis performed on a monthly or weekly scale? What is the total sample size?

Point 4: The Discussion is still poorly written. It needs at least two complete paragraphs comparing your results with other studies. The current Discussion reads like a summary of the study and its limitations, which is not how a discussion should be framed. Please rewrite the entire Discussion, citing the relevant literature and comparing your results with previous national and international studies.

Point 5: As the authors have used many satellite-based datasets, how is the uncertainty in the study quantified? I could not find any section on uncertainty.
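One common way to attach uncertainty to a skill score is a percentile bootstrap over the verification samples, for example years. It is offered here only as an illustration of what such a section might contain. The minimal sketch below assumes paired observed and predicted values that are independent across samples; serial correlation would call for a block bootstrap instead. The metric (RMSE), names, and data are hypothetical.

```python
import math
import random


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def bootstrap_ci(obs, pred, metric=rmse, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap confidence interval for a paired metric."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(obs)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement, keeping pairs
        stats.append(metric([obs[i] for i in idx], [pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return metric(obs, pred), lo, hi


if __name__ == "__main__":
    obs = [0.5, -0.2, 1.1, -0.8, 0.3, 0.0, -1.2, 0.9]   # hypothetical observed anomalies
    pred = [0.4, -0.1, 0.8, -0.5, 0.6, -0.2, -0.9, 1.0]  # hypothetical predictions
    print(bootstrap_ci(obs, pred))
```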

Point 6: If the data are regionally averaged, much of the signal is lost in the averaging. What variations can be observed when averaging over the entire region? I suggest that the authors conduct this study at the pixel scale. If this is not feasible, they should at least divide the study area into multiple sub-regions based on districts, precipitation gradients, ecological biomes, or climatic classifications, run the models in each sub-region, and discuss the results.
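The concern in Point 6 can be illustrated numerically. The minimal sketch below assumes a regular latitude-longitude grid, with each pixel weighted by the cosine of its latitude as a common approximation to its area. The sub-region labels and anomaly values are hypothetical. Opposite anomalies in two sub-regions cancel in the plateau-wide mean but remain visible when each sub-region is averaged separately.

```python
import math
from collections import defaultdict


def area_weighted_mean(values, lats):
    """Mean of gridded values weighted by cos(latitude), a proxy for grid-cell area."""
    weights = [math.cos(math.radians(lat)) for lat in lats]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def subregion_means(values, lats, labels):
    """Area-weighted mean computed separately for each sub-region label."""
    groups = defaultdict(lambda: ([], []))
    for v, lat, lab in zip(values, lats, labels):
        groups[lab][0].append(v)
        groups[lab][1].append(lat)
    return {lab: area_weighted_mean(vs, ls) for lab, (vs, ls) in groups.items()}


if __name__ == "__main__":
    # Hypothetical anomalies: warming in the west, cooling of equal size in the east
    anomalies = [1.0, 1.0, -1.0, -1.0]
    lats = [32.0, 34.0, 32.0, 34.0]
    labels = ["west", "west", "east", "east"]
    print(area_weighted_mean(anomalies, lats))  # close to zero: the anomalies cancel
    print(subregion_means(anomalies, lats, labels))
```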
