Article

Research on PPP Enterprise Credit Dynamic Prediction Model

1 School of Civil Engineering, North China University of Technology, Beijing 100144, China
2 Department of Construction Management, Tsinghua University, Beijing 100084, China
3 Center for Public-Private Partnership, Tsinghua University, Beijing 100084, China
* Author to whom correspondence should be addressed.
Submission received: 6 September 2022 / Revised: 8 October 2022 / Accepted: 11 October 2022 / Published: 14 October 2022
(This article belongs to the Special Issue Machine/Deep Learning: Applications, Technologies and Algorithms)

Abstract

The debt default risk of local government financing vehicles (LGFVs) has become a potential trigger for systemic financial risks. How to effectively prevent hidden debt risk has long been a central issue in public-private partnership (PPP) financing management research. In recent years, machine learning has become increasingly popular in the study of enterprise credit evaluation. However, most scholars focus only on the output of the model and do not explain in detail the extent to which variables affect the model or the decision-making process of the model. In this paper, we aim to apply a better credit rating method to identify and analyze the key factors of LGFVs’ default risk, and to present the decision-making process of the model in a visual form. Firstly, this paper analyzes the financial data of LGFVs. Secondly, the XGBoost-logistic combination algorithm is introduced to integrate the typical characteristics of PPP projects and construct the credit evaluation model of LGFVs. Finally, we verify the feasibility of the model by K-fold cross-validation and performance evaluation. The results show that: (1) net worth, total assets, operating income, and return on equity are the most critical factors affecting the credit risk of LGFVs, while asset-liability ratio and tax revenue are also potentially important factors; (2) the XGBoost-logistic model can identify the key factors affecting the credit risk of LGFVs, and has better classification performance and predictive ability; (3) the influence of each characteristic variable on model decisions can be quantified by the SHAP value, and the classification decision visualization improves the interpretability of the model.

1. Introduction

Local government financing vehicles (LGFVs) are companies with independent legal status and investment-financing functions that are established by local governments and their departments and institutions through financial grants or the injection of assets such as equity, in order to help the local government perform certain functions. In recent years, the tightening of bank credit to cities in China has coincided with a high frequency of default events among local governments. In the debt expansion cycle, the solvency pressure on LGFVs in all regions has also gradually accumulated. Therefore, how to effectively guard against the hidden debt crisis of LGFVs has become an important problem for the high-quality development of public-private partnerships (PPP) [1].
A good credit relationship is the premise for the smooth implementation of the PPP model. For a long time, some local governments attached more importance to financing than to management, resulting in frequent debt performance disputes among some LGFVs [2]. Moreover, the risk of non-standard default is transmitted along the guarantee chain, resulting in the deterioration of the regional financing environment. In May 2022, Document No. 19 of the General Office of the State Council of China, Opinions on Further Revitalizing Stock Assets and Expanding Effective Investment, pointed out the need to improve the regulatory docking mechanism for social capital investment and financing cooperation and to prevent and resolve the potential debt risk of firms. This indicates that whether social capital parties have sufficient capacity for investment, construction, operation, management, innovation, etc., is a key prerequisite for the success of a PPP project as a whole [3,4].
At present, the academic research on the PPP model is mainly focused on the areas of benefit distribution, risk sharing, etc. Credit assessment studies are dominated by methods, including traditional expert empirical methods, discriminant analysis, and machine learning methods, such as the logistic model, SVM model, and BP neural network model, etc.
Xin [5], Chen [6], Su [7], and others applied algorithms such as logistic, IVHFSs-IFAHP, KMV, random forest, SVM, and XGBoost to identify, evaluate, and predict credit risks. However, models such as random forest, SVM, and XGBoost are less transparent and lack interpretability, which makes the plausibility of their results difficult to assess [8]. Additionally, most scholars only pursue the output results of the model, but do not explain in detail the impact of variables on the model, especially the process of model decision-making.
Due to the long duration of PPP projects, static credit evaluation results can hardly reflect the true credit situation of participating firms [9]. LGFVs are also affected by the regional economy to a greater extent than the local government [10]. Consequently, the need for scientific and rational analysis of risk factors and for interpretable machine learning models motivates our further research.
In this paper, our objectives are as follows: (1) Use the XGBoost-logistic evaluation model to evaluate the credit of listed LGFVs and determine which indicators have the greatest impact on the results. (2) Perform an in-depth analysis of the impact of characteristic variables on the model and the potential risk of default, and explain the model decision-making process. (3) Use K-fold cross-validation and lift and gain analysis to compare XGBoost-logistic and logistic in depth and in detail, and evaluate the comprehensive performance of the model to determine its feasibility and application value.

2. Materials and Methods

2.1. Comprehensive Evaluation Mechanism of Enterprise Credit Dynamics

The research mechanism of this paper is as follows: Firstly, on the basis of data feature derivative analysis, we found the key influencing factors and analyzed their risks. Secondly, we applied the XGBoost gradient boosting algorithm to bin the characteristics of the variables and calculated the WoE (weight of evidence) values and the IV (information value) values [11]. Thirdly, we constructed the XGBoost-logistic combination model in the form of a linear combination to calculate the credit risk of LGFVs and verify the feasibility and performance of the model. Finally, the influence of variables on the model and the decision-making process were displayed by visualization. The research mechanism is shown in Figure 1.

2.2. Data Processing and Characteristic Derivative Analysis

The data in this paper were derived from Wind datasets from 2019–2021 and the Chinese Statistical Year Book. Considering the external rating access restrictions of most financial institutions for LGFVs, firms were selected whose initial rating is above the A level and whose actual controller is the local government or relevant departments [12] (the State-owned Assets Supervision and Administration Commission of the State Council, Ministry of Water Resources, PRC, Ministry of Housing and Urban-Rural Development, PRC, Ministry of Finance, PRC, etc.), for a total of 959 firms. Fields with missing values were filtered using a sparsity threshold of 50%. In total, 23,733 valid data points were obtained.
In this paper, a new label column is added for the 959 LGFVs, and the data are merged using the enterprise name as the primary key to form a dataset that can be used for machine learning. The label column is the output variable predicted by the subsequent classification model, and its value is “0” or “1”: “0” indicates that the enterprise defaults, and “1” indicates that the enterprise does not default. Finally, the processed financial data of LGFVs mainly include the financial index values and the target, for a total of 959 valid samples, including 314 risk samples and 645 normal samples, with a ratio of 0.48.
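The two preparation steps described above (filtering sparse fields and attaching the 0/1 label) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names, the record layout, and the reading of the 50% rule as "drop a field that is missing in more than half of the rows" are assumptions, and the three toy records are invented.

```python
def drop_sparse_fields(rows, max_missing=0.5):
    """Drop every field that is missing (None) in more than max_missing of the rows.

    Assumption: this is one plausible reading of the 50% filter in the text.
    """
    fields = sorted({key for row in rows for key in row})
    keep = [
        f for f in fields
        if sum(row.get(f) is None for row in rows) / len(rows) <= max_missing
    ]
    return [{f: row.get(f) for f in keep} for row in rows]


def attach_labels(rows, defaulted_names, key="name"):
    """Add the label column: 0 = default, 1 = non-default (as in the text)."""
    return [{**row, "label": 0 if row[key] in defaulted_names else 1} for row in rows]


# Invented toy records, keyed by enterprise name.
rows = [
    {"name": "A", "net_worth": 120.0, "quick_ratio": None},
    {"name": "B", "net_worth": 80.0, "quick_ratio": None},
    {"name": "C", "net_worth": None, "quick_ratio": 1.2},
]
cleaned = drop_sparse_fields(rows)        # quick_ratio is missing in 2 of 3 rows
labelled = attach_labels(cleaned, {"B"})  # enterprise "B" is marked as defaulted
```

Keeping the enterprise name as the join key mirrors the merge described in the text; any real dataset would also need a rule for the remaining missing values, which the paper does not specify.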
Drawing on the resource review indicator system of China Chengxin (Asia Pacific) Credit Ratings [13] and the related research literature [14,15], which are collectively referred to as credit rating agencies (CRA) in this paper, we select 26 indicators (including the dependent variable) that are closely related to the financial situations of listed firms in the Chinese construction industry. These metrics describe the financial landscape of LGFVs from multiple perspectives, with strong explanatory power for the likelihood that credit risks exist for LGFVs. Indicators were selected with emphasis on data integrity, validity, comprehensiveness, and local economic dependency. This paper also incorporates regional economic indicators with growth capacity assessment [16]: public finance budget revenues, GDP per capita, tax revenue, local GDP, and local GDP growth, as shown in Table 1.
It can be explained as follows:
  • Capital Structure: It reflects the composition of corporate funding, the layout of assets, and the size of equity, and thus the external support and the firm’s own strength. The capital structure of a firm is related to its solvency pressure and has a decisive effect on its financial condition.
  • Operational Capability: It is an intrinsic determinant for predicting the future direction of firms and measures the quality and sustainability of corporate credit, which reflects the efficiency of LGFVs in managing and using economic resources [17].
  • Profitability: It results from a combination of factors, such as corporate operating model, asset quality, diversification, differentiation, technological progress, cost control ability, and management level. A firm with high profitability can effectively reduce the impact of income fluctuation caused by an economic cycle and enhance its economic resilience.
  • Regional Economy: It is a key measure of the size and development level of the economy in the region where the firm is located. Economic performance, economic stability, and growth potential affect the development of firms and even industries. The stronger the regional financial strength, the more stable the financial support for the construction of local urban infrastructure [18].
  • Debt Paying Ability: It is a key basis for judging the financial risk of LGFVs, and a corporate credit rating can predict the reliability of a firm’s payment of due debt for a certain period in the future. The pattern characteristics of PPP basically determine the industry’s greater operational leverage. However, LGFVs are characterized by large investment amounts, long investment recovery periods, and indirect investment returns, and generally show poor asset quality, poor liquidity, and weak profitability.

2.2.1. Multivariate Joint Distribution Analysis

$\rho_{LGFVs}$ (Spearman’s rank correlation coefficient) is an indicator of the correlation between two variables. In this paper, the correlation matrix of the 26 rating indicators is calculated using the Spearman analysis method, the variables with a correlation coefficient greater than 0.8 are retained [19], and their IV values are retained as the input variables.
The coefficient is calculated as shown in Formula (1), where $x_{vi}$ represents the rank value of Variable 1, $y_{vi}$ represents the rank value of Variable 2, $n$ is the number of samples, and $\rho_{LGFVs}$ is the correlation coefficient between the rank variables.
$$\rho_{LGFVs} = 1 - \frac{6 \sum (x_{vi} - y_{vi})^2}{n (n^2 - 1)} \quad (1)$$
Since the dataset samples do not follow a normal distribution, the Spearman analysis method is used so that the credit rating of LGFVs can be predicted objectively with the machine learning model: the correlation coefficient between each pair of indicators is calculated, and an influence coefficient matrix is constructed for the pairs whose coefficient is greater than 0.8 [20]. A total of 11 pairs of variables with coefficients greater than 0.8 were obtained, as shown in Table 2.
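As a minimal sketch of Formula (1), the coefficient can be computed directly from ranks. The helper names `ranks`, `spearman_rho`, and `high_corr_pairs` are illustrative and not taken from the paper, and the toy columns are invented. Note that Formula (1) is exact only when there are no tied values; the code assigns average ranks to ties, in which case the result is an approximation.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Formula (1): rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def high_corr_pairs(columns, threshold=0.8):
    """List (name_a, name_b, rho) for every pair with rho above the threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = spearman_rho(columns[a], columns[b])
            if rho > threshold:
                pairs.append((a, b, rho))
    return pairs


# Invented toy columns: "b" rises with "a", "c" falls with "a".
columns = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [5, 4, 3, 2, 1]}
pairs = high_corr_pairs(columns)
```

Because the coefficient depends only on ranks, it does not require the normality assumption mentioned in the text.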
A joint distribution can establish a pairwise comparison matrix from a heterogeneous group of factors in a multi-dimensional problem, so that key points can be identified from a matrix plot. The strength of the correlation coefficients in Table 2 was theoretically examined using the measure in [21]. The goodness of fit of the combined values of the indicator variables was examined by plotting multivariate joint distribution plots in Python for a total of seven groups of variables: net worth, total assets, equity ratio, asset-liability ratio, public finance budget revenues, tax revenue, and GDP per capita, as shown in Figure 2.
It can be seen that the sample scatter points corresponding to the seven groups of functions are well distributed. However, we observe the diagonal sample distribution and find that Asset-liability ratio, The public budget revenues, Tax revenue, and The local GDP have a large area of default sample distribution.
Among them, the asset-liability ratio is the most widely distributed variable among default samples. The higher the Asset-liability ratio, the higher the probability of default samples. Due to the limited liability effect of debt, firms with higher debt levels are more aggressive and have higher debt risk [22]. This shows that LGFVs should improve financial management and financial planning, prevent debt default risk, and improve credit evaluation levels.

2.2.2. Variable Correlation Matrix Analysis

Variable correlation matrix analysis is a key step to mining highly correlated variables and improving model interpretability. The deeper the color of the subgraph, the stronger the positive correlation of the feature. The lighter the color of the subgraph, the stronger the negative correlation between features. The line represents the level of correlation, and the earlier the line appears, the stronger the correlation is, as shown in Figure 3.
It can be seen that the characteristic variables wrapped by a yellow outer ring have a high correlation. Among them, “Return on equity, Assets net profit margin, and Return on total assets”, “Quick ratio, Current ratio, Net worth, and Total assets”, and “The local GDP, Tax revenue, and The public finance budget revenues” are the groups of variables with significant correlation.
In order to ensure the forecasting effect of the model, the variables with a correlation coefficient greater than 0.8 are selected, and return on equity, quick ratio, net worth, and tax revenue from the above variable groups are retained as the model variables.
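One way to turn a list of highly correlated pairs into a reduced variable set is sketched below. The text does not state the rule by which one variable per group was retained, so the choice here (keep the variable with the highest IV in each connected group) is an assumption, and the pairs and IV values are invented for illustration only.

```python
def correlated_groups(pairs):
    """Merge correlated pairs into connected groups with a small union-find."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for v in list(parent):
        groups.setdefault(find(v), set()).add(v)
    return list(groups.values())


def pick_representatives(groups, iv):
    """Assumed rule: keep the variable with the highest IV in each group."""
    return sorted(max(group, key=lambda v: iv[v]) for group in groups)


# Hypothetical pairs and IV values, not taken from Table 2 or Table 5.
pairs = [
    ("return_on_equity", "assets_net_profit_margin"),
    ("assets_net_profit_margin", "return_on_total_assets"),
    ("quick_ratio", "current_ratio"),
]
iv = {
    "return_on_equity": 0.40,
    "assets_net_profit_margin": 0.20,
    "return_on_total_assets": 0.15,
    "quick_ratio": 0.12,
    "current_ratio": 0.10,
}
kept = pick_representatives(correlated_groups(pairs), iv)
```

Grouping by connected components means that two variables can land in the same group without being directly correlated above 0.8, which is a design choice rather than a requirement of the method.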

2.2.3. Risk Analysis of Financial Indicators

This chapter selects return on equity, quick ratio, net worth, tax revenue and asset-liability ratio to draw a box chart of financial capacity indicators. As shown in Figure 4, “0” represents the default sample and “1” represents the non-default sample.
From the analysis of Figure 4, it is found that return on equity and net worth are the variables with the most outliers, which have a fluctuating impact on the output of the model. The quick ratio is the most average variable in the sample distribution, so the impact on the model is relatively low.
Further analysis shows that tax revenue and asset-liability ratios are the most widely distributed variables in default samples. The statistics show that among the default enterprises, asset-liability ratio is mainly concentrated in 50–65%, and tax revenue is mainly concentrated in 27 billion–85 billion RMB. Among them, the distribution span of default enterprises in asset-liability ratio is significantly higher than that of non-default enterprises, and the distribution span of default enterprises in tax revenue is relatively wide.
This shows that third-party rating agencies and governments need to pay extra attention to the two evaluation indicators of asset-liability ratio and tax revenue since they have strong uncertainty and are the key factors affecting the credit score of enterprises. Therefore, reasonable capital structure adjustment and the regional economic environment have important reference significance for LGFVs [23].

2.3. Credit Rating of LGFVs

According to the credit rating standard of Dagong Global Credit Rating [24,25], 10 levels are established to divide the enterprise credit rating. For companies participating in PPP projects, the credit evaluation level can be divided into five states: a score between 90 and 100 points corresponds to AAA credit, a score of 80 to 90 corresponds to AA credit, and so on by analogy. The specific credit rating division is shown in Table 3.

3. Prediction Model Construction for LGFVs

At present, enterprise credit evaluation models mainly include discriminant analysis, logistic regression analysis, data mining, artificial neural networks, SVM, and fuzzy comprehensive evaluation methods [26]. Among them, discriminant analysis has strong operability and predictive ability, but places higher requirements on indicators. It assumes a linear relationship between the variables, which actual indicators generally find difficult to satisfy [27]. Machine learning algorithms show better prediction than traditional models for large and uncertain data [28,29]. Logistic regression analysis is often used to handle qualitative indexes, but cannot handle non-linear quantitative indexes [30]. Data mining derives credit rules from the actual data rather than from rules established in advance, so the resulting credit evaluation forecast is more scientific. On comprehensive comparison, the XGBoost algorithm adapts well to sparse data and can handle non-convex, non-linear, and non-local high-dimensional data samples. It can effectively handle the binning of abnormal data, is more robust, and is suitable for large-scale model training [31,32].

3.1. Mechanism of Model Construction

Most of the traditional XGBoost-logistic combination models only calculate the final output results. However, the decision process of the model is not decomposed, and the influence of the characteristic variables on the model output is not explained.
Therefore, we combine two advantages: the classification performance of the XGBoost model and the interpretability of the logistic model. This paper constructs an LGFVs credit evaluation combination model based on XGBoost-logistic, and specifically decomposes the model structure to explain the classification decision process of the model for LGFVs. The detailed construction principle is shown in Figure 5.
The weight of evidence (WoE) is the logarithm of the ratio between the share of defaulted firms and the share of normal firms that fall within a given bin of the sample [33].
$B_i$ denotes the LGFVs with a higher risk of default within the interval, $B_t$ denotes all LGFVs with a higher risk of default within the sample, $G_i$ denotes the LGFVs with a lower risk within the interval, $G_t$ denotes all LGFVs with a lower risk within the sample, and the subscript $i$ represents the $i$th interval of a feature, as shown in Formula (2).
$$woe_i = \ln\left(\frac{B_i / B_t}{G_i / G_t}\right) \quad (2)$$
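Formula (2) can be evaluated bin by bin as in the following minimal sketch. The function name and the toy counts are illustrative; raising an error on an empty class is one possible convention (merging bins or smoothing the counts are common alternatives) and is not something the text prescribes.

```python
import math


def woe_per_bin(bad_counts, good_counts):
    """WoE_i = ln((B_i / B_t) / (G_i / G_t)) for each bin i, as in Formula (2)."""
    b_total, g_total = sum(bad_counts), sum(good_counts)
    values = []
    for b_i, g_i in zip(bad_counts, good_counts):
        if b_i == 0 or g_i == 0:
            # The logarithm is undefined when one class is absent from a bin.
            raise ValueError("bin has an empty class; merge bins or smooth counts")
        values.append(math.log((b_i / b_total) / (g_i / g_total)))
    return values


# Invented counts: the first bin holds 30 of 40 high-risk and 10 of 40 low-risk firms.
woe = woe_per_bin([30, 10], [10, 30])
```

With this sign convention a positive WoE marks a bin in which high-risk firms are over-represented, and a WoE of zero marks a bin whose mix equals that of the whole sample.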
The IV (information value) is a measure of the correlation between the target variable and the characteristic variable [34]. It is generally believed that: $IV < 0.1$, indicating that the predictive ability of a variable correlation is weak; $0.1 \le IV < 0.3$, indicating that the predictive ability of a variable correlation is medium; $IV \ge 0.3$, indicating strong predictive ability of variable correlation. The higher the IV value, the greater the difference in the distribution of the characteristics of the index, and the stronger the predictive ability of distinguishing whether LGFVs default or not. Measures of IV value are shown in Table 4.
The IV value is calculated as shown in Formula (3), where $P_{yi}$ denotes the ratio of the number of defaulted LGFVs in group $i$ to all defaulted LGFVs, $P_{ni}$ denotes the ratio of the number of non-defaulted LGFVs in group $i$ to all non-defaulted LGFVs, $y_i$ is the number of defaulted LGFVs in group $i$, $n_i$ is the number of non-defaulted LGFVs in group $i$, $y_s$ is the number of all defaulted LGFVs, and $n_s$ is the number of all non-defaulted LGFVs.
$$IV_i = (P_{yi} - P_{ni}) \cdot woe_i = (P_{yi} - P_{ni}) \ln\frac{P_{yi}}{P_{ni}} = \left(\frac{y_i}{y_s} - \frac{n_i}{n_s}\right) \ln\frac{y_i / y_s}{n_i / n_s}, \qquad IV = \sum_{i}^{n} IV_i \quad (3)$$
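Formula (3) and the thresholds quoted above can be sketched together as follows. The function names and toy counts are illustrative. The sketch takes $y$ as the defaulted counts and $n$ as the non-defaulted counts; the IV itself is unchanged if the two roles are swapped, because each term $(a - b)\ln(a/b)$ is symmetric in $a$ and $b$.

```python
import math


def information_value(default_counts, normal_counts):
    """IV = sum_i (y_i/y_s - n_i/n_s) * ln((y_i/y_s) / (n_i/n_s)), as in Formula (3).

    Every bin must contain at least one sample of each class.
    """
    y_s, n_s = sum(default_counts), sum(normal_counts)
    total = 0.0
    for y_i, n_i in zip(default_counts, normal_counts):
        p_y, p_n = y_i / y_s, n_i / n_s
        total += (p_y - p_n) * math.log(p_y / p_n)
    return total


def iv_strength(iv):
    """Thresholds from the text: below 0.1 weak, 0.1 to 0.3 medium, 0.3 and above strong."""
    if iv < 0.1:
        return "weak"
    if iv < 0.3:
        return "medium"
    return "strong"


# Invented counts for a two-bin variable.
iv = information_value([30, 10], [10, 30])
label = iv_strength(iv)
```

Each term of the sum is non-negative, so the IV can only grow as bins separate the two classes more sharply.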

3.2. Credit Rating Model and Scoring Card for LGFVs

3.2.1. WoE Binning Monotonicity Analysis

Highly correlated variables have a significant impact on the accuracy and robustness of credit scoring models [35]. Monotonicity analysis makes it possible to analyze the fluctuation characteristics of variables and determine the distribution of default samples [36]. In this section, we analyzed the binning monotonicity of the four indicators with the highest IV values and removed bins with less than 5% of default samples. Then, we drew a binning monotonicity analysis chart of net worth, total assets, operating income, and return on equity. As shown in Figure 6, the histogram represents the distribution interval of the sample, and the line represents the proportion of default enterprises in the sample.
As can be seen from Figure 6, net worth is divided into seven intervals, total assets into nine intervals, operating income into eight intervals, and return on equity into eight intervals. Net worth, total assets, and operating income all show an approximate monotonically increasing trend, which meets the fluctuation range and entry requirements of variables.
In Figure 6, we found that the binned return on equity shows a trend of first decreasing and then increasing. Among them, the curve for return on equity shows a gradual downward trend from the first to the third bin. The proportion of default samples in the first bin reached 51%. Moreover, approximately 21% of LGFVs have a return on equity of less than 1.08%, indicating that companies with weaker profitability are more likely to default. The business of LGFVs is public welfare or quasi-public welfare business, and the operation of their projects and the recovery of investment are affected by the financial strength of local governments.
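The monotonicity check described in this section can be sketched as follows. The function names are illustrative, the 5% rule is read here as "drop a bin that contains less than 5% of all default samples" (an assumption), and apart from the 51% first-bin rate quoted above, the numbers are invented.

```python
def drop_small_bins(default_counts, totals, min_share=0.05):
    """Keep (defaults, total) for bins holding at least min_share of all defaults."""
    all_defaults = sum(default_counts)
    return [
        (d, t) for d, t in zip(default_counts, totals)
        if d / all_defaults >= min_share
    ]


def default_rates(bins):
    """Proportion of default samples in each bin (the line in the chart)."""
    return [d / t for d, t in bins]


def trend(rates):
    """Classify the shape of the default-rate curve across ordered bins."""
    steps = [b - a for a, b in zip(rates, rates[1:])]
    if all(s >= 0 for s in steps):
        return "increasing"
    if all(s <= 0 for s in steps):
        return "decreasing"
    low = rates.index(min(rates))
    if all(s <= 0 for s in steps[:low]) and all(s >= 0 for s in steps[low:]):
        return "decreasing then increasing"
    return "non-monotonic"


# The first rate (0.51) is quoted in the text; the remaining rates are invented.
shape = trend([0.51, 0.30, 0.20, 0.25, 0.40])
```

A strict test like this one flags any reversal, whereas the text speaks of an approximately monotonic trend, so a tolerance on the step size may be more appropriate in practice.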

3.2.2. Selection of Explanatory Variable Characteristics

The original indexes were binned by feature engineering, and the WoE and IV values of each variable were calculated by using the XGBoost gradient boosting algorithm. Finally, variables with IV values greater than 0.1 were selected to enter the model, which ensured that the entered variables had better rating prediction ability [37], as seen in Table 5.
The indicator ranking comparison was plotted by ranking these variables by their IV values. The comparison with the logistic regression results and the index ranking data of the CRA is shown in Figure 7.
As can be seen in Figure 7, the XGBoost algorithm achieved a 95% fit to the CRA, while the logistic regression achieved only 75%, indicating that the XGBoost algorithm had better classification prediction ability and better feature learning on the data.
For the current liabilities ratio, the ranking given by the XGBoost algorithm was only 12th, which was lower than the 9th given by the CRA. This is due to the size of the due bonds accumulating to 70 billion RMB as of May 2022 for LGFVs of AA and lower credit classes. In the context of a stringent regulatory policy, professional rating bodies need to re-evaluate the solvency measures of firms to determine a reasonable corporate credit level and avoid credit risk [38].

3.3. Interpretability Analysis of Characteristic Variables

Shapley additive explanations (SHAP), as an explanatory tool for black box models, can be used to evaluate the importance of feature variables in a model [39]. The SHAP value of the validation dataset is calculated using the SHAP library in Python. In this chapter, the explanatory SHAP value is applied to the model of XGBoost-logistic, and the influence of each characteristic variable on model decision is presented. Finally, dtreeviz is used to visualize the data decision binning process [40].
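To make the quantity behind a SHAP value concrete, the sketch below computes exact Shapley values by enumerating feature subsets for a small toy model. It is not the SHAP library and not the algorithm that library uses for tree models; the function name, the single-baseline treatment of "absent" features, and the toy model are all assumptions chosen for illustration, and the enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x; absent features take baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(z):
    """Invented linear score; the coefficients carry no empirical meaning."""
    return 2 * z[0] + 3 * z[1] - z[2]


phi = shapley_values(toy_model, [1, 2, 3], [0, 0, 0])
```

For a linear model each value reduces to the coefficient times the feature's distance from the baseline, and the values always sum to the difference between the prediction at `x` and at the baseline, which is the additivity property that makes positive and negative contributions comparable.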

3.3.1. Macro Perspective Analysis

The calculation results of the SHAP value are shown in Figure 8. Among them, the darker the color, the greater the impact, and the lighter the color, the smaller the impact. SHAP values have positive and negative effects.
First, we found that net worth, total assets, and operating income are the key indicators that affect the output of the model, and sales net interest rate is the characteristic variable that has the greatest positive impact on the model. Asset net profit margin, quick ratio, operating profit/total assets, and inventory turnover are the characteristic variables that have the most negative impact on the model. Second, taken together with Figure 4, tax revenue has two characteristics at the same time: the distribution of default samples is wide, and the positive and negative impacts on the model are large. Therefore, we have reason to believe that tax revenue is potentially an important factor.

3.3.2. Micro-Perspective Analysis

Micro perspective analysis is an in-depth analysis of the impact of variables in the model. As shown in Figure 9, the deeper the color, the greater the impact on the model.
As shown in the figure, net worth and operating income have a greater positive impact on the model in the ranges of 0–50 billion RMB and 0–10 billion RMB, respectively, while total assets have a greater positive impact on the model in the range of 100–200 billion RMB. Total assets and return on equity, in the ranges of 50–100 billion RMB and 25–700 million RMB, respectively, have a large negative impact on the model. This shows that LGFVs with low profitability are more likely to default on their debt. Additionally, the lower the total assets, the higher the probability of default.

3.3.3. Visual Interpretation of Model Decision

This section uses dtreeviz to visualize XGBoost decisions. The algorithm model of dtreeviz is to sample the data points around the explained variables, use the XGBoost classifier to obtain the predicted results, and finally weight them by the proximity to the instance. Then, by optimizing the specific equation, the prediction path of the explanatory variable is found. The classification decision path of default and non-default enterprises is shown in Figure 10.
As shown in Figure 10, the decision order of the model is determined by the influence of the characteristic variables on the model output. Firstly, the classification results of net worth ≤ 141.89 and net worth > 141.89 are determined by the feature screening of the first layer. Secondly, through the second layer of feature screening, such as return on equity, further classification is performed. Finally, through the feature-matching degree of the target enterprise, it predicts whether it is a default enterprise. The decision visualization results of the model describe the whole process of feature classification in detail, which enables the evaluators to better understand the potential risk characteristics of the enterprise.
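The layer-by-layer screening described above can be illustrated by tracing one sample through a small hand-written tree. This is a sketch of the idea of a decision path, not of dtreeviz or of the fitted model: only the first-layer threshold (net worth 141.89) comes from the text, while the second-layer threshold and every leaf label are hypothetical placeholders.

```python
# Nested-dict tree; label 0 = default, 1 = non-default (as in the dataset).
TREE = {
    "feature": "net_worth",
    "threshold": 141.89,  # first-layer split quoted in the text
    "left": {  # net_worth <= 141.89
        "feature": "return_on_equity",
        "threshold": 1.0,  # hypothetical second-layer threshold
        "left": {"leaf": 0},  # hypothetical leaf
        "right": {"leaf": 1},  # hypothetical leaf
    },
    "right": {"leaf": 1},  # hypothetical leaf for net_worth > 141.89
}


def decision_path(tree, sample):
    """Return the list of tests applied to the sample and the predicted label."""
    path = []
    node = tree
    while "leaf" not in node:
        feature, threshold = node["feature"], node["threshold"]
        go_left = sample[feature] <= threshold
        path.append(f"{feature} {'<=' if go_left else '>'} {threshold}")
        node = node["left"] if go_left else node["right"]
    return path, node["leaf"]


# Invented enterprise record.
path, label = decision_path(TREE, {"net_worth": 100.0, "return_on_equity": 0.5})
```

Reading the returned path from first to last entry reproduces the order in which the features screen the enterprise, which is the information the visualization conveys graphically.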

4. Experiment

In the training process of the model, the problem of overfitting often occurs: the model matches the training datasets well but predicts data outside the training datasets poorly. K-fold cross-validation is a commonly used method for testing model accuracy. This section uses cross-validation over different training and validation subsets of the data to estimate the accuracy of the algorithm [41].
This section divides the datasets into a training dataset and a test dataset, where the test size is set to 0.3. A part of the dataset is taken as the test dataset, and the rest is used as the training dataset. The test dataset is used to evaluate the training effect of the model. The specific principle is shown in Figure 11.
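The two partitioning schemes mentioned in this section can be sketched with index lists as follows. The function names and the fixed random seed are assumptions; the test size of 0.3 is from the text, K = 10 is the value used later in Section 4.1, and 959 is the sample count reported in Section 2.2. Whether the authors stratified the split by label is not stated, so the sketch does not stratify.

```python
import random


def holdout_split(n_samples, test_size=0.3, seed=0):
    """Shuffle indices and return (train_indices, test_indices)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_size)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, validation


train_idx, test_idx = holdout_split(959)
folds = list(k_fold_indices(959, k=10))
```

Each sample serves as validation data exactly once across the K folds, so averaging a metric over the folds gives an estimate that depends less on any single split.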

4.1. Model Testing and Comparison

In order to highlight the advantages of the combined model, this section compares the classification effect of the XGBoost-logistic combined model with that of the logistic regression model, and performs 10-fold cross-validation (K = 10) on the two models. The ROC comparison curves are shown in Figure 12.
As shown in Figure 12, both models have good optimization effects. The average AUC value of the XGBoost-logistic model was 0.8359 (error value = ±0.04), and the optimal predicted AUC value reached 0.9023.
As shown in Table 6, the XGBoost-logistic model’s precision, recall, sensitivity, and F1 scores are all above 0.7. Among them, the Gini value is 0.7846, and the Ks value is 0.7198. Compared with the logistic regression model, the risk discrimination ability of the XGBoost-logistic algorithm model is improved by 16%, the prediction stability is improved by 56%, and the fitting effect of the model is better.
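The discrimination metrics named above can be computed from labels and scores as in this minimal sketch. The function names and toy data are illustrative, label 1 is treated as the positive class purely by convention, and the Gini coefficient uses the usual relation Gini = 2·AUC − 1, which may or may not be how the values in Table 6 were derived.

```python
def roc_auc(labels, scores):
    """AUC as the chance that a positive outscores a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in positives for q in negatives)
    return wins / (len(positives) * len(negatives))


def gini(auc):
    """Usual relation between the Gini coefficient and the AUC."""
    return 2 * auc - 1


def ks_statistic(labels, scores):
    """Largest gap between cumulative positive and negative rates over thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ordered = sorted(zip(scores, labels), reverse=True)
    true_pos = false_pos = 0
    best = 0.0
    i = 0
    while i < len(ordered):
        j = i
        # Consume all samples sharing the same score before measuring the gap.
        while j < len(ordered) and ordered[j][0] == ordered[i][0]:
            if ordered[j][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            j += 1
        best = max(best, abs(true_pos / n_pos - false_pos / n_neg))
        i = j
    return best


# Invented toy labels and scores.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
```

The pairwise AUC loop is quadratic in the sample size, which is acceptable for a dataset of about a thousand firms but would be replaced by a rank-based formula for larger data.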
Gain curve and lift curve are visual aids for measuring model performance. By comparing XGBoost-logistic and logistic with gain curve and lift curve, the application value of the model can be measured.
As shown in Figure 13a, the gain curve comparison, the XGBoost-logistic algorithm achieves optimal classification results when covering approximately 70% of the sample. In contrast, logistic needs to cover 90% of the samples to achieve the best classification results. As shown in Figure 13b, the lift curve comparison, the XGBoost-logistic algorithm has significant advantages in stability and computational efficiency. Compared with the logistic regression model, the computational performance of the XGBoost-logistic algorithm model is improved by 2.13 times. When covering 40% of the sample, it can maintain high computational efficiency and process multi-dimensional complex data.
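The points of a gain curve and a lift curve can be computed as in the sketch below. The function name, the use of ten equal coverage steps, and the toy data are assumptions; label 1 marks the target class by convention, and the number of steps should not exceed the number of samples.

```python
def gain_and_lift(labels, scores, steps=10):
    """Return (coverage, gain, lift) at each of `steps` equal coverage levels.

    gain = share of all targets captured within the top-scored coverage;
    lift = gain divided by coverage (1.0 means no better than random order).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    ordered = [label for _, label in ranked]
    total_targets = sum(ordered)
    n = len(ordered)
    rows = []
    for k in range(1, steps + 1):
        covered = round(n * k / steps)
        coverage = covered / n
        gain = sum(ordered[:covered]) / total_targets
        rows.append((coverage, gain, gain / coverage))
    return rows


# Invented toy data: both targets carry the two highest scores.
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
curve = gain_and_lift(labels, scores)
```

A model whose gain reaches 1.0 at a smaller coverage concentrates the target class more tightly at the top of its ranking, which is the comparison drawn between the two models above.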

4.2. Analysis of Results

The comparison analysis between the training datasets and the test datasets was performed by calculating the indicators from the average value, minimum, maximum, and standard deviation for the default and non-default samples for both models, as shown in Table 7.
As shown in Table 7, due to the defects of the traditional logistic model in binning classification performance, there is a large gap between the training datasets and test datasets in prediction accuracy. The XGBoost-logistic combined model showed a more stable accuracy, with a weighted prediction accuracy of 82.76%.
The following conclusion is drawn: the XGBoost-logistic algorithm model has a significant optimization effect on prediction accuracy and stability, which shows the feasibility and application value of the model.

5. Conclusions and Future Work

5.1. Conclusions

This paper focuses on the research of the local government financing vehicles credit prediction model and analyzes the importance of multiple variables in the model output. On this basis, we established an enterprise credit evaluation XGBoost-logistic model. Combined with the credit evaluation problem of LGFVs and the current situation of the public-private partnership industry, the classification performance and interpretability process of the model were verified. The main conclusions are as follows:
1. This paper proposes an innovative research method. The traditional credit evaluation index system for LGFVs is restructured. In view of the particularity of the PPP industry, the regional economy is introduced as an indicator. By identifying the actual controller of each company and screening the LGFVs that meet the standards, the method provides more accurate and reliable basic data for constructing the LGFV prediction model.
2. The key indicators affecting the model output were analyzed in depth. Net worth, total assets, operating income, and return on equity are the variables with the greatest impact on the model evaluation results. Moreover, asset-liability ratio and tax revenue are also potentially important risk factors in LGFV credit evaluation. Banks and governments need to pay extra attention to these indicators in the future.
3. The XGBoost-logistic model shows more intuitive interpretability and stronger application performance. First, this paper shows that the XGBoost-logistic credit evaluation model is significantly superior to the traditional logistic regression model in terms of discriminating ability, ranking ability, accuracy, and stability. Second, we explain how the characteristic variables affect the model output using SHAP values, and apply dtreeviz to portray the model's classification decision process for LGFVs with different characteristics. Finally, this paper provides an effective and scientific enterprise credit evaluation method for financial institutions and government departments, which can describe the credit risk status of LGFVs more accurately and greatly enhance the interpretability of the model. Based on the important indicators that affect LGFV credit default behavior, targeted risk prevention countermeasures can be formulated.

5.2. Future Work

Although the XGBoost-logistic model can effectively solve the credit evaluation problem of LGFVs, it still has some shortcomings. Owing to data limitations, this paper uses only financial data, which cannot fully reflect the complexity of LGFV credit risk. Additionally, the model does not consider macroeconomic policy regulation and control, nor does it adjust the index weights accordingly. Future research should address these limitations.

Author Contributions

L.Z.: Methodology, Validation, Formal analysis, Writing – original draft preparation, Project administration, Writing – review and editing, Supervision, and Funding acquisition; S.Y.: Data curation, Formal analysis, Visualization, Code writing, Model building, Writing – original draft preparation, and Writing – review and editing; S.W.: Writing – original draft preparation, Project administration, Writing – review and editing, Supervision, Investigation, and Funding acquisition; J.S.: Data curation and Visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Beijing Municipal Natural Science Foundation (No. 9202006) and the Excellent Talent Project of North China University of Technology in 2019 (No. 216051360020XN225/004). The authors gratefully acknowledge this funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available upon request from the corresponding author. Data can be made available upon request for collaboration.

Acknowledgments

The authors sincerely thank all the editors and reviewers for their support and help.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Du, J.; Mao, W. Analysis on the Inducing Factors and the Mechanism of Local Government Implicit Debt in PPP Mode. Constr. Econ. 2020, 41, 83–87. [Google Scholar] [CrossRef]
  2. Qi, X.; Lu, Z.; Zhang, Q. Research on Financial Institutions Participating in PPP Project and Its Strategy. Shandong Soc. Sci. 2018, 174–179. [Google Scholar] [CrossRef]
  3. Xue, S.; Zhang, Z. Identification and Analysis of Influencing Factors for PPP Project Collaborative Supervision Based on Fuzzy-DEMATEL. Soft Sci. 2021, 35, 104–109. [Google Scholar] [CrossRef]
  4. Ke, Y. Formation of optimal private involvement in urban rail public-private partnership projects in China. Int. J. Proj. Organ. Manag. 2018, 10, 268. [Google Scholar] [CrossRef]
  5. Xin, L.; Bao, X. Research on the Credit Risk Evaluation of Listed Companies in Construction Industry: Based on the Logistic Model. Constr. Econ. 2020, 41, 98–101. [Google Scholar] [CrossRef]
  6. Chen, J.; Wang, D. Government credit risk assessment of non-profit public-private partnership projects in China based on the IVHFSs-IFAHP model. Sci. Iran. 2021, 28, 38–48. [Google Scholar] [CrossRef] [Green Version]
  7. Su, Z.; Hu, W.; Zhang, W. Measurement of Conversion Rate of Local Government Implicit Debt and Debt Risk Identification. Oper. Res. Manag. Sci. 2022, 31, 191–197. [Google Scholar]
  8. Petropoulos, A.; Siakoulis, V.; Stavroulakis, E. Towards an early warning system for sovereign defaults leveraging on machine learning methodologies. Intell. Syst. Account. Financ. Manag. 2022, 29, 118–129. [Google Scholar] [CrossRef]
  9. Feng, J.; Wang, Y.; Feng, H.; Li, M.; Xue, S. Research on the Comprehensive Dynamic Evaluation Model of Enterprise Credit under PPP Mode. Soft Sci. 2019, 33, 49–53. [Google Scholar]
  10. Deng, J.; Zhang, N.; Ahmad, F.; Draz, M.U. Local government competition, environmental regulation intensity and regional innovation performance: An empirical investigation of Chinese provinces. J. Environ. Res. Public Health 2019, 16, 2130. [Google Scholar] [CrossRef] [Green Version]
  11. Raymaekers, J.; Verbeke, W.; Verdonck, T. Weight-of-evidence through shrinkage and spline binning for interpretable nonlinear classification. Appl. Soft Comput. 2022, 115, 108160. [Google Scholar] [CrossRef]
  12. Luo, H.; Chen, L. Bond yield and credit rating: Evidence of Chinese local government financing vehicles. Rev. Quant. Financ. Account. 2019, 52, 737–758. [Google Scholar] [CrossRef]
  13. China Chengxin (Asia Pacific) Credit Ratings Company Limited. Rating Methodologies. 2022. Available online: https://www.ccxap.com/en/rating_methodologies (accessed on 3 September 2022).
  14. Jiemei, Z.; Yupei, W.; Yuping, Z. Research on the Risk Management of Local Government Financing Platform Based on the Perspective of Stakeholders. Manag. Rev. 2019, 31, 61–70. [Google Scholar]
  15. Cho, S.J.; Chung, C.Y.; Young, J. Study on the Relationship between CSR and Financial Performance. Sustainability 2019, 11, 343. [Google Scholar] [CrossRef] [Green Version]
  16. Ameyaw, E.E.; Chan, A.P.C. Evaluation and ranking of risk factors in public–private partnership water supply projects in developing countries using fuzzy synthetic evaluation approach. Expert Syst. Appl. 2015, 42, 5102–5116. [Google Scholar] [CrossRef]
  17. Zhang, Q.; Wang, J.; Lu, A.; Wang, S.; Ma, J. An improved SMO algorithm for financial credit risk assessment—Evidence from China’s banking. Neurocomputing 2018, 272, 314–325. [Google Scholar] [CrossRef]
  18. Wang, S.; Zhang, B.; Cheng, J.; Niu, Y. Study on the Influence of Government Behavior on PPP Performance. Soft Sci. 2020, 34, 1–5. [Google Scholar] [CrossRef]
  19. Shrestha, N. Detecting multicollinearity in regression analysis. Am. J. Appl. Math. Stat. 2020, 8, 39–42. [Google Scholar] [CrossRef]
  20. Wang, Y.; Jin, X. Structural risk of diversified project financing of city investment company in China based on the best worst method. Eng. Constr. Archit. Manag. 2019, 28, 196–215. [Google Scholar] [CrossRef]
  21. Arora, N.; Kaur, P.D. A Bolasso based consistent feature selection enabled random forest classification algorithm: An application to credit risk assessment. Appl. Soft Comput. 2020, 86, 105936. [Google Scholar] [CrossRef]
  22. Wang, Y.; Li, Y.; Song, J.; Ma, Z. Study on Earning Management under the Corporate Life Cycle: Based on the Debt Covenants Theory. Manag. Rev. 2016, 28, 75–91. [Google Scholar] [CrossRef]
  23. Wang, B.; Zhang, S.; Wang, X.; Han, L. The optimal capital structure of PPP projects:Based on the realoption method. J. Manag. Sci. China 2019, 22, 73–85. [Google Scholar]
  24. Dagong Global Credit Rating. Local Government Credit Rating Method. 2022. Available online: https://www.dagongcredit.com (accessed on 1 September 2022).
  25. Bush, C. The Rise of Dagong Global Credit Rating Agency and the International Credit Rating Agencies Reforms; University of Surrey: London, UK, 2021. [Google Scholar]
  26. Zhang, L.; Song, Q. Multimodel Integrated Enterprise Credit Evaluation Method Based on Attention Mechanism. Comput. Intell. Neurosci. 2022, 2022, 8612759. [Google Scholar] [CrossRef]
  27. Yu, L. Forecasting and Decision Optimization Theory and Methods Based on Artificial Intelligence. J. Manag. Sci. 2022, 35, 60–66. [Google Scholar]
  28. Lappas, P.Z.; Yannacopoulos, A.N. A machine learning approach combining expert knowledge with genetic algorithms in feature selection for credit risk assessment. Appl. Soft Comput. 2021, 107, 107391. [Google Scholar] [CrossRef]
  29. Zhou, Y.; Su, X. Credit Risk Prediction of Company Based on Optimal Feature Set. J. Syst. Manag. 2021, 30, 817–838. [Google Scholar]
  30. Luo, J.; Yan, X.; Tian, Y. Unsupervised quadratic surface support vector machine with application to credit risk assessment. Eur. J. Oper. Res. 2020, 280, 1008–1017. [Google Scholar] [CrossRef]
  31. Pigini, C. Penalized maximum likelihood estimation of logit-based early warning systems. Int. J. Forecast. 2021, 37, 1156–1172. [Google Scholar] [CrossRef]
  32. Dawood, M.; Horsewood, N.; Strobel, F. Predicting sovereign debt crises: An Early Warning System approach. J. Financ. Stab. 2017, 28, 16–28. [Google Scholar] [CrossRef]
  33. Sun, Y. Customer Stickiness Evaluation Model Research Based on Machine Learning. In Artificial Intelligence in China; Springer: Berlin/Heidelberg, Germany, 2022; pp. 522–527. [Google Scholar]
  34. Liu, C.; Xie, J.; Zhao, Q.; Xie, Q.; Liu, C. Novel evolutionary multi-objective soft subspace clustering algorithm for credit risk assessment. Expert Syst. Appl. 2019, 138, 112827. [Google Scholar] [CrossRef]
  35. Lin, J.C.; Shao, Y.; Djenouri, Y.; Yun, U. ASRNN: A recurrent neural network with an attention model for sequence labeling. Knowl. -Based Syst. 2021, 212, 106548. [Google Scholar] [CrossRef]
  36. Lee, C.Y.; Koh, S.K.; Lee, M.C.; Pan, W.Y. Application of Machine Learning in Credit Risk Scorecard. In Proceedings of the International Conference on Soft Computing in Data Science, Virtual. 2–3 November 2021; pp. 395–410. [Google Scholar]
  37. Bai, Y.; Zha, D. Commercial Bank Credit Grading Model Using Genetic Optimization Neural Network and Cluster Analysis. Comput. Intell. Neurosci. 2022, 2022, 4796075. [Google Scholar] [CrossRef] [PubMed]
  38. Wang, W.; Xiao, Y.; Li, X. Study on the Influencing Factors of Credit Spread of Municipal Bond Based on Random Forest Regression Model. Math. Pract. Theory 2020, 50, 311–320. [Google Scholar]
  39. Szczepański, M.; Choraś, M.; Pawlicki, M.; Kozik, R. Achieving Explainability of Intrusion Detection System by Hybrid Oracle-Explainer Approach. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8. [Google Scholar]
  40. Parr, T.; Grover, P. dtreeviz: Decision Tree Visualization. 2020. Available online: https://github.com/parrt/dtreeviz (accessed on 8 September 2022).
  41. Lyu, Z.; Yu, Y.; Samali, B.; Rashidi, M.; Mohammadi, M.; Nguyen, T.N.; Nguyen, A. Back-propagation neural network optimized by K-fold cross-validation for prediction of torsional strength of reinforced Concrete beam. Materials 2022, 15, 1477. [Google Scholar] [CrossRef]
Figure 1. Mechanism of the dynamic comprehensive evaluation of corporate credit.
Figure 2. Multivariate joint distribution.
Figure 3. Correlation matrix analysis of variables.
Figure 4. Analysis of financial indicators.
Figure 5. Schematic of XGBoost-logistic combined model construction.
Figure 6. Monotonicity analysis of LGFV index bins. (a) Net worth; (b) total assets; (c) operating income; (d) return on equity.
Figure 7. Index ranking comparison of LGFVs.
Figure 8. Analysis of influence degree of characteristic variables.
Figure 9. Impact analysis of the major variables. (a) Net worth; (b) total assets; (c) operating income; (d) return on equity.
Figure 10. Visualization case of the XGBoost decision training model.
Figure 11. K-fold cross-validation principle.
Figure 12. Comparative analysis of the K-fold cross-validation. (a) XGBoost-logistic combined model; (b) logistic regression model.
Figure 13. Performance comparison and analysis of the model. (a) Gain curve comparison; (b) lift curve comparison.
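Table 1. City investment company characteristic index variables.
| First-Grade Indexes | Second-Grade Indexes |
|---|---|
| Capital Structure | Net worth |
| Capital Structure | Total assets |
| Capital Structure | Operating income |
| Operational Capability | Total asset turnover |
| Operational Capability | Inventory turnover |
| Operational Capability | Business activities cash flow/total assets |
| Operational Capability | Current assets turnover |
| Profitability | Return on equity |
| Profitability | Main business revenue growth |
| Profitability | Sales net interest rates |
| Profitability | Operating profit/total assets |
| Profitability | Assets net profit margin |
| Profitability | Return on total assets |
| Regional Economy | The public finance budget revenues |
| Regional Economy | GDP per capita |
| Regional Economy | Tax Revenue |
| Regional Economy | The local GDP |
| Regional Economy | The local GDP growth |
| Debt Paying Ability | Current liabilities ratio |
| Debt Paying Ability | Equity ratio |
| Debt Paying Ability | Financing activities cash flow/total liabilities |
| Debt Paying Ability | Quick ratio |
| Debt Paying Ability | Asset-liability ratio |
| Debt Paying Ability | A coupon multiples |
| Debt Paying Ability | Current ratio |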
Table 2. Characteristic variables correlation analysis.
| Variable 1 ($x_{vi}$) | Variable 2 ($y_{vi}$) | Correlation Coefficient |
|---|---|---|
| Net worth | Total assets | 0.9368 |
| Return on equity | Assets net profit margin | 0.9165 |
| Operating profit/Total assets | Assets net profit margin | 0.8540 |
| Asset-liability ratio | Equity ratio | 0.9924 |
| Total assets turnover | Current assets turnover | 0.8287 |
| Tax Revenue | The public finance budget revenues | 0.8219 |
| The local GDP | The public finance budget revenues | 0.8523 |
| The local GDP | Tax Revenue | 0.9934 |
| GDP per capita | The public finance budget revenues | 0.8019 |
| GDP per capita | Tax Revenue | 0.8536 |
| GDP per capita | The local GDP | 0.8631 |
Table 3. Credit Rating of LGFVs.
| Credit Grade | Score Interval | Implication |
|---|---|---|
| AAA | x ≥ 95 | Strong performance capability, good reputation, and low credit risk |
| AA | 90 ≤ x < 95 | Strong performance ability, good reputation, and low credit risk |
| A | 85 ≤ x < 90 | Strong performance ability, good reputation, and low credit risk |
| BBB | 80 ≤ x < 85 | Performance ability medium to excellent, reputation medium to excellent, and medium to small credit risk |
| BB | 75 ≤ x < 80 | Medium performance, medium reputation, and medium credit risk |
| B | 70 ≤ x < 75 | Lower-middle performing ability, lower-middle credibility, and lower-middle credit risk |
| CCC | 65 ≤ x < 70 | Weak ability to perform, general credibility, and credit risk |
| CC | 60 ≤ x < 65 | Weak performance, credit deviation, and high credit risk |
| C | 55 ≤ x < 60 | Weak performance, poor reputation, and high credit risk |
| D | x < 55 | Great credit risk |
Table 4. Measurement of IV value.
| IV Value Scope | Predictive Ability |
|---|---|
| (−∞, 0.02) | No predictive ability |
| [0.02, 0.1) | Weak predictive ability |
| [0.1, 0.3) | Medium predictive ability |
| [0.3, +∞) | Strong predictive ability |
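As a hedged illustration of how the thresholds in Table 4 are applied, the sketch below computes the information value (IV) of a binned variable using the standard weight-of-evidence definition, IV = Σ (p_good − p_bad) · ln(p_good / p_bad), and then maps it to a predictive-ability band. The function names and bin counts are hypothetical, and it is an assumption that the paper's IV follows this standard form.

```python
import math


def information_value(bins):
    """IV of one binned variable.

    bins: list of (n_non_default, n_default) counts, one pair per bin.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        if good == 0 or bad == 0:
            # WOE is undefined for an empty class; real pipelines usually
            # merge such bins or apply smoothing instead.
            raise ValueError("each bin needs at least one sample of each class")
        p_good = good / total_good
        p_bad = bad / total_bad
        woe = math.log(p_good / p_bad)   # weight of evidence of the bin
        iv += (p_good - p_bad) * woe
    return iv


def predictive_ability(iv):
    """Map an IV value to the bands of Table 4."""
    if iv < 0.02:
        return "No predictive ability"
    if iv < 0.1:
        return "Weak predictive ability"
    if iv < 0.3:
        return "Medium predictive ability"
    return "Strong predictive ability"


# Hypothetical two-bin variable: (non-default count, default count).
toy_bins = [(80, 20), (20, 80)]
toy_iv = information_value(toy_bins)
print(round(toy_iv, 4), predictive_ability(toy_iv))
```

With these toy counts the two bins separate the classes sharply, so the IV is well above 0.3 and the variable falls in the strong band.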
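Table 5. Explanatory variable importance ranking.
| First-Grade Indexes | Second-Grade Indexes | IV Value (XGBoost) | IV Value (Logistic) | CRA Rank |
|---|---|---|---|---|
| Capital Structure | Net worth | 2.2982 | 2.0912 | 1 |
| Capital Structure | Total assets | 2.0101 | 1.9013 | 2 |
| Capital Structure | Operating income | 1.2843 | 1.261 | 3 |
| Operational Capability | Total asset turnover | 0.2429 | 0.3102 | 11 |
| Operational Capability | Inventory turnover | 0.2984 | 0.3193 | 10 |
| Operational Capability | Business activities cash flow/total assets | 0.3357 | 0.3509 | 7 |
| Operational Capability | Current assets turnover | 0.1721 | 0.2216 | 17 |
| Profitability | Return on equity | 0.4564 | 0.4463 | 4 |
| Profitability | Main business revenue growth | 0.1204 | 0.202 | 19 |
| Profitability | Sales net interest rates | 0.2679 | 0.2696 | 20 |
| Profitability | Operating profit/total assets | 0.3141 | 0.358 | 8 |
| Profitability | Asset net profit margin | 0.3896 | 0.4319 | 5 |
| Profitability | Return on total assets | 0.2343 | 0.2606 | 12 |
| Regional Economy | The public finance budget revenues | 0.1914 | 0.4915 | None |
| Regional Economy | GDP per capita | 0.3326 | 0.4628 | None |
| Regional Economy | Tax Revenue | 0.319 | 0.4907 | None |
| Regional Economy | The local GDP | 0.319 | 0.5031 | None |
| Regional Economy | The local GDP growth | 0.0217 | 0.3805 | None |
| Debt Paying Ability | Current liabilities ratio | 0.1374 | 0.1572 | 9 |
| Debt Paying Ability | Equity ratio | 0.2285 | 0.2824 | 14 |
| Debt Paying Ability | Financing activities cash flow/total liabilities | 0.2411 | 0.2888 | 13 |
| Debt Paying Ability | Quick ratio | 0.3405 | 0.3584 | 6 |
| Debt Paying Ability | Asset-liability ratio | 0.2278 | 0.2719 | 15 |
| Debt Paying Ability | A coupon multiple | 0.154 | 0.2584 | 18 |
| Debt Paying Ability | Current ratio | 0.2117 | 0.275 | 16 |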
Table 6. Model evaluation analysis.
| Model | AUC | Precision | Recall | Sensitivity | F1 Score | Gini | KS |
|---|---|---|---|---|---|---|---|
| Logistic | 0.7453 | 0.7812 | 0.7538 | 0.5104 | 0.2831 | 0.7095 | 0.5515 |
| XGBoost-logistic | 0.8359 | 0.8715 | 0.8391 | 0.7604 | 0.8456 | 0.7846 | 0.7198 |
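For readers unfamiliar with the KS column of Table 6, the following minimal sketch shows the usual definition of the Kolmogorov-Smirnov statistic for a binary classifier: the largest gap between the cumulative capture rates of default and non-default samples as the score threshold is lowered. The function name `ks_statistic` and the toy data are assumptions, and the paper does not confirm that its KS values were computed exactly this way.

```python
def ks_statistic(scores, labels):
    """Maximum gap between cumulative default and non-default capture rates.

    scores: predicted default probabilities (hypothetical model output)
    labels: 1 = default, 0 = non-default
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    captured_pos = captured_neg = 0
    ks = 0.0
    i = 0
    while i < len(ranked):
        # Consume every sample sharing the current score so ties are
        # evaluated at a single threshold.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                captured_pos += 1
            else:
                captured_neg += 1
            j += 1
        ks = max(ks, abs(captured_pos / n_pos - captured_neg / n_neg))
        i = j
    return ks


# Toy data, assumed for illustration only.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_ks = ks_statistic(toy_scores, toy_labels)
print(round(toy_ks, 4))
```

A KS of 1 corresponds to perfect separation of the two classes, and a value near 0 means the score distributions of defaults and non-defaults are nearly indistinguishable.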
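Table 7. Datasets comparison and model validation.
| Dataset | Type | Coefficient | Logistic | XGBoost-Logistic |
|---|---|---|---|---|
| Accuracy of training set | Default | Average value | 61.008 | 83.648 |
| Accuracy of training set | Default | Minimum | 59.988 | 83.497 |
| Accuracy of training set | Default | Maximum | 62.787 | 83.813 |
| Accuracy of training set | Default | Standard deviation | 0.002 | 0.000007 |
| Accuracy of training set | Non-default | Average value | 62.189 | 83.365 |
| Accuracy of training set | Non-default | Minimum | 61.259 | 82.437 |
| Accuracy of training set | Non-default | Maximum | 63.673 | 83.836 |
| Accuracy of training set | Non-default | Standard deviation | 0.008 | 0.005 |
| Accuracy of training set | Total | Average value | 61.563 | 83.156 |
| Accuracy of test set | Default | Average value | 62.614 | 82.107 |
| Accuracy of test set | Default | Minimum | 61.018 | 81.729 |
| Accuracy of test set | Default | Maximum | 63.452 | 82.46 |
| Accuracy of test set | Default | Standard deviation | 0.013 | 0.0009 |
| Accuracy of test set | Non-default | Average value | 52.971 | 83.286 |
| Accuracy of test set | Non-default | Minimum | 52.167 | 82.532 |
| Accuracy of test set | Non-default | Maximum | 53.875 | 83.627 |
| Accuracy of test set | Non-default | Standard deviation | 0.009 | 0.003 |
| Accuracy of test set | Total | Average value | 57.792 | 82.688 |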
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Zhao, L.; Yang, S.; Wang, S.; Shen, J. Research on PPP Enterprise Credit Dynamic Prediction Model. Appl. Sci. 2022, 12, 10362. https://0-doi-org.brum.beds.ac.uk/10.3390/app122010362
