Article
Peer-Review Record

A Clustering Framework to Reveal the Structural Effect Mechanisms of Natural and Social Factors on PM2.5 Concentrations in China

by Wentao Yang 1, Zhanjun He 2, Huikun Huang 3 and Jincai Huang 4,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 10 December 2020 / Revised: 22 January 2021 / Accepted: 22 January 2021 / Published: 29 January 2021
(This article belongs to the Section Environmental Sustainability and Applications)

Round 1

Reviewer 1 Report

The manuscript is devoted to solving a current and very important problem, a study of the structural effects of factors associated with PM2.5 concentrations in China using a spatiotemporal analysis technique. To my mind, the manuscript is currently crude and needs serious revision in terms of both English grammar and content. Below are some comments and remarks.

  1. The English grammar is not correct. The manuscript contains many grammatical mistakes and inaccuracies. For example: “statistical analysis models can construct random functions …”. Statistical analysis models cannot construct anything; we can construct a model based on statistical analysis methods. “Both can be applied to predict…”. Both of what? Etc.
  2. In my opinion, the manuscript would read better if all abbreviations were defined (DB: is it the Davies-Bouldin criterion? Sil? etc.).
  3. It is not clear from the manuscript how the authors determined the optimal parameters of the regression models. Did they do so?
  4. As can be seen from the charts in Figs. 8, 10, and 12, the regression coefficients were divided into two clusters. However, the charts of the criteria used show no extreme values for the two-cluster structure. How can you explain this?
  5. Why did you choose the DB and Sil clustering quality criteria? There are many other clustering quality criteria. Did you compare them?
  6. What does the term "Validity evaluation" in Fig. 3 mean? Do you evaluate the clustering results, or the validity of the obtained cluster structure?

Author Response

Dear Reviewer/ Reviewers,

 

We deeply appreciate the effort and time you’ve spent in reviewing our manuscript (ID: sustainability-1052887). Indeed, the comments are helpful for further improving the quality of the manuscript. Our responses are listed as follows, and the line numbers for the modification are given at the end of each response.

 

Comment 1 by Reviewer #1: The English grammar is not correct. The manuscript contains many grammatical mistakes and inaccuracies. For example: “statistical analysis models can construct random functions …”. Statistical analysis models cannot construct anything; we can construct a model based on statistical analysis methods. “Both can be applied to predict…”. Both of what? Etc.

Response: Following the reviewer's suggestion, we corrected the incorrect expressions (please see Lines 51-54). In addition, the revised manuscript has been edited for proper English language, grammar, punctuation, spelling, and overall style by highly qualified native English-speaking editors. (The editing certificate has been submitted as a supplementary file.)

 

Comment 2 by Reviewer #1: In my opinion, the manuscript would read better if all abbreviations were defined (DB: is it the Davies-Bouldin criterion? Sil? etc.).

Response: Admittedly, a basic requirement of scientific writing is that abbreviations be defined at first mention and used consistently thereafter. Following the reviewer’s suggestion, all abbreviations are now defined at first mention in the revised manuscript. (Please see Line 230)

 

Comment 3 by Reviewer #1: It is not clear from the manuscript how the authors determined the optimal parameters of the regression models. Did they do so?

Response: As mentioned in the manuscript, two types of parameters must be determined in the GTWR model: the regression parameters and the spatiotemporal bandwidth parameters. The regression parameters can be obtained directly from Equation (2), but the weight matrix in Equation (2), which depends on the spatiotemporal bandwidth parameters shown in Equation (3), must be determined first. The regression coefficients therefore differ under different spatiotemporal bandwidth parameters. The optimal bandwidth parameters and the corresponding regression coefficients can be determined by comparing the Akaike information criterion (AIC) or the cross-validation function value under different spatiotemporal bandwidth parameters (Huang et al., 2010). In practice, the process of determining the optimal parameters is integrated into the GTWR_Beta package used in the experiment.

Following the reviewer’s suggestion, we have clarified how the regression parameters are determined in the revised manuscript. (Please see Lines 189-197)
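
The bandwidth search described in this response can be illustrated with a minimal sketch. It is not the GTWR_Beta package and not the manuscript's Equations (2) and (3), which are not reproduced in this record. It assumes a Gaussian kernel on a combined space-time distance, a single predictor with an intercept, and a leave-one-out cross-validation score; the names `st_weight`, `local_fit`, `cv_score`, `select_bandwidth` and the scale factors `lam` and `mu` are hypothetical.

```python
import math

def st_weight(p, q, h, lam=1.0, mu=1.0):
    """Gaussian weight between two (u, v, t) points (assumed kernel form)."""
    d2 = lam * ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) + mu * (p[2] - q[2]) ** 2
    return math.exp(-d2 / h ** 2)

def local_fit(coords, x, y, target, h, skip=None):
    """Weighted least squares of y on (1, x), centred on `target`.

    `skip` optionally leaves one observation out (used for cross-validation).
    Returns (intercept, slope).
    """
    idx = [j for j in range(len(x)) if j != skip]
    w = [st_weight(coords[j], target, h) for j in idx]
    sw = sum(w)
    xbar = sum(wj * x[j] for wj, j in zip(w, idx)) / sw
    ybar = sum(wj * y[j] for wj, j in zip(w, idx)) / sw
    sxx = sum(wj * (x[j] - xbar) ** 2 for wj, j in zip(w, idx))
    sxy = sum(wj * (x[j] - xbar) * (y[j] - ybar) for wj, j in zip(w, idx))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def cv_score(coords, x, y, h):
    """Leave-one-out sum of squared prediction errors for bandwidth h."""
    total = 0.0
    for i in range(len(x)):
        b0, b1 = local_fit(coords, x, y, coords[i], h, skip=i)
        total += (y[i] - (b0 + b1 * x[i])) ** 2
    return total

def select_bandwidth(coords, x, y, candidates):
    """Return the candidate bandwidth with the smallest CV score."""
    return min(candidates, key=lambda h: cv_score(coords, x, y, h))

# Synthetic illustration only: the slope drifts with the u coordinate.
coords = [(float(i), 0.0, float(i % 3)) for i in range(8)]
x = [1.0, 4.0, 2.0, 7.0, 3.0, 8.0, 5.0, 6.0]
y = [(1.0 + 0.5 * c[0]) * xi for c, xi in zip(coords, x)]
best_h = select_bandwidth(coords, x, y, [1.0, 2.0, 4.0])
print("selected bandwidth:", best_h)
print("local coefficients at first point:", local_fit(coords, x, y, coords[0], best_h))
```

Because the weights change with the bandwidth, the fitted local coefficients change too, which is why the bandwidth has to be fixed before the coefficients are reported. An AIC-based search would follow the same loop with a different score.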

 

  • Huang, B.; Wu, B.; Barry, M. Geographically and temporally weighted regression for modeling spatio-temporal variation in house prices. Int. J. Geogr. Inf. Sci. 2010, 24(3), 383–401.

 

Comment 4 by Reviewer #1: As can be seen from the charts in Figs. 8, 10, and 12, the regression coefficients were divided into two clusters. However, the charts of the criteria used show no extreme values for the two-cluster structure. How can you explain this?

Response: We fully agree with this comment, which is also related to why we selected two different clustering quality indexes. In the initial experiment, only the Sil index was used to identify the cluster structures, and we found no extreme values. We suspected that this might be caused by an unsuitable choice of clustering quality criterion, so the DB index was added to evaluate the results. Although neither polyline has an extreme value (a maximum Sil value or a minimum DB value) within the interval, it is reasonable to identify the boundary point, which corresponds to a relatively large Sil value and a relatively small DB value (Kryszczuk and Hurley, 2010), as the optimal parameter of the clustering structure. Following the reviewer’s suggestion, we have added an explanation of this situation in the revised manuscript. (Please see Lines 281-285)

 

  • Kryszczuk, K., Hurley, P., 2010. Estimation of the number of clusters using multiple clustering validity indices. In: Multiple Classifier Systems. Springer Berlin Heidelberg, pp.114–123

 

Comment 5 by Reviewer #1: Why did you choose the DB and Sil clustering quality criteria? There are many other clustering quality criteria. Did you compare them?

Response: Indeed, we only mentioned these two indexes, and the reason why we chose them was not explained clearly in the manuscript.

Clustering evaluation is key to determining whether a high-quality clustering result has been obtained. As the reviewer notes, many clustering evaluation indicators have been proposed. They fall into two categories: external evaluation (Rand, 1971) and internal evaluation (Halkidi et al., 2001, 2002; Rendón et al., 2011). External evaluation is statistically complex and requires prior knowledge of the clustering results, whereas internal evaluation does not require prior knowledge. The best clustering is obtained by comparing the results of different algorithms or different clustering parameters (Halkidi et al., 2001). In this paper, we have no prior knowledge of the PM2.5 data, so we use internal evaluation indexes (the SIL and DB indexes).

Commonly used indexes include the silhouette (SIL) index, the Davies-Bouldin (DB) index, the Calinski-Harabasz (CH) index, and the DUNN index (Rendón et al., 2011). The DUNN index relies on the longest distance within clusters and the shortest distance between clusters, which gives it poor robustness (Dunn, 1974). The silhouette value (range [-1, 1]) measures how similar an object is to its own cluster (cohesion) compared to other clusters (separation), and it is computed for every point. The DB index measures the ratio of intra-cluster dispersion to inter-cluster separation; the smaller the value, the better the clustering result (Rendón et al., 2011). In general, the two indexes evaluate the clustering result from different aspects. The CH index is similar to the DB index, so it is not used in this paper to avoid duplication. We compared the algorithmic principles of the DUNN, SIL, DB, and CH indexes and selected the SIL and DB indexes as representative evaluation indexes for the PM2.5 clustering result; we think they generally performed well in this paper. (Please see Lines 224-230)
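
To make the two selected indexes concrete, the following is a minimal sketch of how a silhouette score and a Davies-Bouldin score can be computed for a labelled point set. It assumes Euclidean distance and centroid-based scatter for the DB index, which are the usual textbook definitions rather than anything confirmed for the manuscript's implementation; the function names are hypothetical, and at least two clusters are required.

```python
import math

def _dist(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def _groups(points, labels):
    """Map each cluster label to the list of its points."""
    groups = {}
    for p, k in zip(points, labels):
        groups.setdefault(k, []).append(p)
    return groups

def silhouette(points, labels):
    """Mean silhouette value over all points (higher is better)."""
    groups = _groups(points, labels)
    scores = []
    for p, k in zip(points, labels):
        own = groups[k]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(_dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(_dist(p, q) for q in g) / len(g)
                for other, g in groups.items() if other != k)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def davies_bouldin(points, labels):
    """Davies-Bouldin index (lower is better)."""
    groups = _groups(points, labels)
    keys = list(groups)
    cents = {k: [sum(c) / len(groups[k]) for c in zip(*groups[k])] for k in keys}
    scatter = {k: sum(_dist(p, cents[k]) for p in groups[k]) / len(groups[k])
               for k in keys}
    worst = [max((scatter[i] + scatter[j]) / _dist(cents[i], cents[j])
                 for j in keys if j != i)
             for i in keys]
    return sum(worst) / len(worst)

# Synthetic illustration: the same four points under two labelings.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
compact = [0, 0, 1, 1]   # groups nearby points together
mixed = [0, 1, 0, 1]     # groups distant points together
print(silhouette(pts, compact), davies_bouldin(pts, compact))
print(silhouette(pts, mixed), davies_bouldin(pts, mixed))
```

On this toy data the compact labeling gets the larger silhouette value and the smaller DB value, which is the direction of agreement the response relies on when reading the two curves together.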

 

  • Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical association, 66(336), 846-850.
  • Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2002). Cluster validity methods: part I. ACM Sigmod Record, 31(2), 40-45.
  • Rendón, E., Abundez, I. M., Gutierrez, C., Zagal, S. D., Arizmendi, A., Quiroz, E. M., & Arzate, H. E. (2011, January). A comparison of internal and external cluster validation indexes. In Proceedings of the 2011 American Conference, San Francisco, CA, USA (Vol. 29, pp. 1-10).
  • Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2001). On clustering validation techniques. Journal of intelligent information systems, 17(2-3), 107-145.
  • Dunn, J. C. (1974). Well-separated clusters and optimal fuzzy partitions. Journal of cybernetics, 4(1), 95-104.

 

Comment 6 by Reviewer #1: What does the term "Validity evaluation" in Fig. 3 mean? Do you evaluate the clustering results, or the validity of the obtained cluster structure?

Response: Admittedly, the term “Validity evaluation” appears in two places in Figure 3 with different meanings. We did not introduce them clearly, which may have made the figure difficult to understand. The first occurrence refers to evaluating the effectiveness of the GTWR model by comparing it with OLS and GWR. The second refers to identifying the optimal clustering results. Following the reviewer’s suggestion, the two occurrences of “validity evaluation” in Figure 3 were changed to “regression validity evaluation” and “clustering validity evaluation”, respectively, and we have further clarified these two processes. (Please see Figure 3, Lines 146-147, and 150-151)

 

The modifications mentioned above are marked in red in the revised manuscript. We hope that you will be satisfied with the revised manuscript. The revised manuscript has been edited for proper English language, grammar, punctuation, spelling, and overall style by highly qualified native English-speaking editors. Finally, we would like to thank you again for your comments, which not only improve the quality of this manuscript but also provide inspiration and guidance for our future research.

 

Thanks, and best wishes.

 

Dr. Jincai Huang

Address: No.3688, Nanhai South Road, Shenzhen University, Shenzhen, P.R. China

Email: [email protected].

Author Response File: Author Response.docx

Reviewer 2 Report

This manuscript describes the structural effect of social factors on PM2.5 concentrations in China. The methods and data used in this study are appropriate, and the results are well structured, with a logical conclusion.

The possible reasons why three variables (urban population, gross industrial output, and sulphur dioxide emission) have different geographical effects on PM2.5 should be discussed in more detail.

Author Response

Dear Reviewer/ Reviewers,

 

We deeply appreciate the effort and time you’ve spent in reviewing our manuscript (ID: sustainability-1052887). Indeed, the comments are helpful for further improving the quality of the manuscript. Our responses are listed as follows, and the line numbers for the modification are given at the end of each response.

 

Comment 1 by Reviewer #2: This manuscript describes the structural effect of social factors on PM2.5 concentrations in China. The methods and data used in this study are appropriate, and the results are well structured, with a logical conclusion. The possible reasons why three variables (urban population, gross industrial output, and sulphur dioxide emission) have different geographical effects on PM2.5 should be discussed in more detail.

Response: We are very grateful to the reviewer for the approval of our work. Following the reviewer's suggestion, we have added an explanation of the different geographical effects of the three variables on PM2.5 to the discussion.

Previous research has shown that PM2.5 pollution is greater in more populated cities, because living and production activities are linked to polluting gas emissions and higher population levels always lead to greater energy consumption and increased emissions (Lou et al., 2016). However, the relationship between population and gas emissions is not constant over space. For example, vehicle emissions are regarded as one of the major sources of PM2.5 pollution in China (Zhao & Xu, 2019); because consumption levels and habits differ across regions, the same increase in population may produce different increases in vehicle numbers, which leads to different changes in PM2.5 concentrations in different geographical areas. Similarly, the use of fossil fuels increases with industrial development in a region (Shao et al., 2016), which inevitably increases the emission of atmospheric pollutants. Nevertheless, the structure of industry varies across cities, so the same increase in gross industrial output may result in different changes in PM2.5 in different areas. Anthropogenic emissions of sulphur dioxide play a critical role in the formation of secondary fine particulate matter (Behera et al., 2011), and this secondary pollution depends on factors such as weather and other geographical conditions. The spatial heterogeneity of these factors causes sulphur dioxide emissions to have different geographical effects on PM2.5. (Please see Lines 360-376)

 

  • Lou, C.R., Liu, H.Y., Li, Y.F., Li, Y.L., 2016. Socioeconomic drivers of PM2.5 in the accumulation phase of air pollution episodes in the Yangtze River Delta of China. Int. J. Environ. Res. Public Health 13, 928.
  • Zhao, S, Xu, Y. Exploring the Spatial Variation Characteristics and Influencing Factors of PM2.5 Pollution in China: Evidence from 289 Chinese Cities. Sustainability, 2019, 11(17):4751.
  • Shao, X. Li, J.H. Cao, L.L. Yang Economic policy choice for haze pollution control in China: based on the spatial spillover effect. EC Res., 09 (2016), pp. 73-80
  • Behera, S.N.; Sharma, M. Degradation of SO2, NO2 and NH3 leading to formation of secondary inorganic aerosols: An environmental chamber study. Atmos. Environ. 2011, 45, 4015–4024.

 

The modifications mentioned above are marked in red in the revised manuscript. We hope that you will be satisfied with the revised manuscript. The revised manuscript has been edited for proper English language, grammar, punctuation, spelling, and overall style by highly qualified native English-speaking editors. Finally, we would like to thank you again for your comments, which not only improve the quality of this manuscript but also provide inspiration and guidance for our future research.

 

Thanks, and best wishes.

 

Dr. Jincai Huang

Address: No.3688, Nanhai South Road, Shenzhen University, Shenzhen, P.R. China

Email: [email protected].

Author Response File: Author Response.docx

Reviewer 3 Report

This study aims to uncover the structural effects associated with PM2.5 concentrations in China using spatiotemporal analysis. The analytical process seems appropriate, although the results are not clearly presented, neither for the first step (a geographically and temporally weighted regression used to identify the local effect mechanisms of natural and socio-economic factors on PM2.5 concentrations) nor for the second (a spatial clustering method with a dynamically constrained agglomerative clustering and partitioning algorithm).

Despite the success of GWR in addressing spatial variations, it still faces a great challenge when temporal heterogeneity is present in dynamic geographical data. It is not clear how to improve the GWR model’s ability to handle issues involving both temporal and spatial heterogeneity.

It is lacking: Figure 7. Spatial distribution of regression coefficient for sulphur dioxide emissions in 2004, 2010, and 2016.

Asserting that the production model can be converted from resource- to technology-intensive over time is pure guesswork and it needs further proving and elaboration.

Author Response

Dear Reviewer/ Reviewers,

 

We deeply appreciate the effort and time you’ve spent in reviewing our manuscript (ID: sustainability-1052887). Indeed, the comments are helpful for further improving the quality of the manuscript. Our responses are listed as follows, and the line numbers for the modification are given at the end of each response.

 

Comment 1 by Reviewer #3: Despite the success of GWR in addressing spatial variations, it still faces a great challenge when temporal heterogeneity is present in dynamic geographical data. It is not clear how to improve the GWR model’s ability to handle issues involving both temporal and spatial heterogeneity.

Response: It is possible that we did not clearly describe the difference between GWR and GTWR in the manuscript. Spatial heterogeneity means that the relationships between the input variables and output variables are not constant over the whole area, and a global model cannot reveal the spatial variation in the relationships among spatial data (Deng et al., 2017). The GWR model therefore builds a local regression model at each location. Because the regression coefficients of GWR vary over space, the GWR model can capture spatial heterogeneity well. For dynamic geographical data, however, the relationships among variables may change not only over space but also across timestamps, which is termed temporal heterogeneity. Hence, the GTWR model builds a series of local models whose parameters vary across space-time locations to handle both spatial and temporal heterogeneity (Huang et al., 2010). The detailed process has been introduced in the revised manuscript.

Following the suggestion, we have further explained how the GTWR model can be used to model spatial and temporal heterogeneity. (Please see Lines 155-157 and 161-165)
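
The contrast drawn in this response can be written out compactly. The block below follows the general form given in Huang et al. (2010); the manuscript's own equations are not reproduced in this record, so the notation, the Gaussian kernel, and the symbols $\lambda$, $\mu$, and $h$ are assumptions made for illustration.

```latex
% GWR: coefficients vary with spatial location (u_i, v_i) only
y_i = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i

% GTWR: coefficients vary with space-time location (u_i, v_i, t_i)
y_i = \beta_0(u_i, v_i, t_i) + \sum_{k} \beta_k(u_i, v_i, t_i)\, x_{ik} + \varepsilon_i

% Local estimator at observation i, with W_i the diagonal weight matrix
\hat{\beta}(u_i, v_i, t_i) = \left( X^{\top} W_i X \right)^{-1} X^{\top} W_i\, y

% Space-time distance and Gaussian weight (assumed kernel form)
\left( d^{ST}_{ij} \right)^2 = \lambda \left[ (u_i - u_j)^2 + (v_i - v_j)^2 \right] + \mu\, (t_i - t_j)^2,
\qquad
w_{ij} = \exp\!\left( - \frac{\left( d^{ST}_{ij} \right)^2}{h^2} \right)
```

Under these assumptions, setting $\mu = 0$ removes the temporal term from the distance, so the weights depend on spatial proximity alone and the model behaves like GWR applied to the pooled observations.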

 

  • Deng, M., Yang, W.T., Liu, Q.L., 2017. Geographically weighted extreme learning machine: a method for space–time prediction. Geogr. Anal. 49 (4), 433–450.
  • Huang, B.; Wu, B.; Barry, M. Geographically and temporally weighted regression for modeling spatio-temporal variation in house prices. Int. J. Geogr. Inf. Sci. 2010, 24(3), 383–401.

 

Comment 2 by Reviewer #3: It is lacking: Figure 7. Spatial distribution of regression coefficient for sulphur dioxide emissions in 2004, 2010, and 2016.

Response: Unfortunately, we do not fully understand this comment. Our guess is that the chemical symbol of sulphur dioxide, SO2, was used directly in Figure 7 but was not explained in the caption. In the revised manuscript, the chemical symbol SO2 was added. (Please see Line 122 and the caption of Figure 7)

 

Comment 3 by Reviewer #3: Asserting that the production model can be converted from resource- to technology-intensive over time is pure guesswork and it needs further proving and elaboration.

Response: Indeed, there is no effective material to support the assertion about the conversion from resource- to technology-intensive production over time. In the revised manuscript, the related content was therefore replaced by an explanation of the spatiotemporal variation of the effect mechanisms of socio-economic factors. (Please see Lines 360-376)


 

The modifications mentioned above are marked in red in the revised manuscript. We hope that you will be satisfied with the revised manuscript. The revised manuscript has been edited for proper English language, grammar, punctuation, spelling, and overall style by highly qualified native English-speaking editors. Finally, we would like to thank you again for your comments, which not only improve the quality of this manuscript but also provide inspiration and guidance for our future research.

 

Thanks, and best wishes.

 

Dr. Jincai Huang

Address: No.3688, Nanhai South Road, Shenzhen University, Shenzhen, P.R. China

Email: [email protected].

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

I have no further comments.
