Research on PPP Enterprise Credit Dynamic Prediction Model

Zhao, Likun; Yang, Shaotang; Wang, Shouqing; Shen, Jianxiong

doi:10.3390/app122010362

by Likun Zhao ¹, Shaotang Yang ¹,*, Shouqing Wang ²,³ and Jianxiong Shen

¹ School of Civil Engineering, North China University of Technology, Beijing 100144, China

² Department of Construction Management, Tsinghua University, Beijing 100084, China

³ Center for Public-Private Partnership, Tsinghua University, Beijing 100084, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(20), 10362; https://0-doi-org.brum.beds.ac.uk/10.3390/app122010362

Submission received: 6 September 2022 / Revised: 8 October 2022 / Accepted: 11 October 2022 / Published: 14 October 2022

(This article belongs to the Special Issue Machine/Deep Learning: Applications, Technologies and Algorithms)

Abstract: The debt default risk of local government financing vehicles (LGFVs) has become a potential trigger for systemic financial risk, and how to prevent hidden debt risk effectively is a long-standing issue in public-private partnership (PPP) financing management research. In recent years, machine learning has become increasingly popular in the study of enterprise credit evaluation. However, most scholars focus only on the output of the model and do not explain in detail how strongly each variable affects the model or how the model reaches its decisions. In this paper, we apply an improved credit rating method to identify and analyze the key factors of LGFVs' default risk, and we present the decision-making process of the model in visual form. Firstly, this paper analyzes the financial data of LGFVs. Secondly, the XGBoost-logistic combination algorithm is introduced to integrate the typical characteristics of PPP projects and construct the credit evaluation model of LGFVs. Finally, we verify the feasibility of the model by K-fold cross-validation and performance evaluation. The results show that: (1) net worth, total assets, operating income, and return on equity are the most critical factors affecting the credit risk of LGFVs, and asset-liability ratio and tax revenue are also potentially important factors; (2) the XGBoost-logistic model can identify the key factors affecting the credit risk of LGFVs and has better classification performance and predictive ability; (3) the influence of each characteristic variable on the model decision can be quantified by the SHAP value, and visualizing the classification decisions improves the interpretability of the model.

Keywords: public-private partnerships; gradient promotion algorithm; credit rating; combination model; machine-learning

1. Introduction

Local government financing vehicles (LGFVs) are companies with independent legal status that local governments and their departments and institutions establish through fiscal appropriations or the injection of assets such as equity; they carry out investment and financing functions on behalf of the local government. In recent years, the tightening of bank credit to cities in China has coincided with frequent default events among local governments. Over the debt expansion cycle, the solvency pressure on LGFVs in all regions has also gradually accumulated. Therefore, how to guard effectively against the hidden debt crisis of LGFVs has become an important problem for the high-quality development of public-private partnerships (PPP) [1].

A good credit relationship is the premise of a smooth pursuit of the PPP model. For a long time, some local governments attached more importance to financing than management, resulting in frequent debt performance disputes among some LGFVs [2]. Moreover, the risk of non-standard default is transmitted along the guarantee chain, resulting in the deterioration of the regional financing environment. China’s General Office of the State Council in No. 19, May 2022, Opinions on Further Revitalizing Stock Assets and Expanding Effective Investment, pointed out the need to improve the regulatory docking mechanism of social capital investment and financing cooperation and to prevent resolving the risk of potential debt of firms. This indicates that whether social capital parties have sufficient capacity for investment, construction, operation, management, innovation, etc., is a key prerequisite for the success of the PPP project as a whole [3,4].

At present, the academic research on the PPP model is mainly focused on the areas of benefit distribution, risk sharing, etc. Credit assessment studies are dominated by methods, including traditional expert empirical methods, discriminant analysis, and machine learning methods, such as the logistic model, SVM model, and BP neural network model, etc.

Xin [5], Chen [6], and Su [7] et al. applied machine learning algorithms, such as logistic, IVHFSs-IFAHP, KMV, random forest, SVM, and XGBoost, to identify, evaluate, and predict credit risks. However, models such as random forest, SVM, and XGBoost are less transparent and harder to interpret, which makes the plausibility of their results difficult to judge [8]. Additionally, most scholars pursue only the output of the model and do not explain in detail the impact of the variables on the model, especially the process of model decision-making.

Because PPP projects run over long periods, static credit evaluation results rarely match the actual credit situation of the participating firms [9]. LGFVs are also affected by the regional economy to a greater extent than the local government [10]. Consequently, the need for a scientific and rational analysis of risk factors and for interpretable machine learning models motivates our further research.

In this paper, our objectives are as follows: (1) Use the XGBoost-logistic evaluation model to evaluate the credit of listed LGFVs and determine which indicators have the greatest impact on the results. (2) Perform an in-depth analysis of the impact of characteristic variables on the model and the potential risk of default, and explain the model decision-making process. (3) Use K-fold cross-validation together with lift and gain analysis to compare the XGBoost-logistic and logistic models in depth, and evaluate the comprehensive performance of the model to determine its feasibility and application value.

2. Materials and Methods

2.1. Comprehensive Evaluation Mechanism of Enterprise Credit Dynamics

The research mechanism of this paper is as follows: Firstly, on the basis of data feature derivative analysis, we found the key influencing factors and analyzed their risks. Secondly, we applied the XGBoost gradient boosting algorithm to bin the characteristic variables and calculated the WoE (weight of evidence) and IV (information value) values [11]. Thirdly, we constructed the XGBoost-logistic combination model in the form of a linear combination to calculate the credit risk of LGFVs and verify the feasibility and performance of the model. Finally, the influence of variables on the model and the decision-making process were displayed by visualization. The research mechanism is shown in Figure 1.

2.2. Data Processing and Characteristic Derivative Analysis

The data in this paper were derived from Wind datasets from 2019–2021 and the Chinese Statistical Year Book. Considering the rating thresholds that most financial institutions apply to LGFVs, firms were selected whose initial rating is above A level and whose actual controller is the local government or a relevant department [12] (the State-owned Assets Supervision and Administration Commission of the State Council; the Ministry of Water Resources, PRC; the Ministry of Housing and Urban-Rural Development, PRC; the Ministry of Finance, PRC; etc.), giving 959 firms in total. Missing fields were handled by sparse-matrix filtering with a threshold of 50%. In total, 23,733 valid data points were obtained.

A label column was added for the 959 LGFVs, and the data were merged using the enterprise name as the primary key to form a dataset suitable for machine learning. The label column is the output variable predicted by the subsequent classification model and takes the value “0” or “1”: “0” indicates that the enterprise defaults, and “1” indicates that it does not. The processed financial data of the LGFVs thus consist of the financial index values and the target, with 959 valid samples in total: 314 risk samples and 645 normal samples, a ratio of 0.48.

This paper draws on the resource review indicator system of China Chengxin (Asia Pacific) Credit Ratings [13] and the related research literature [14,15], referred to collectively below as credit rating agencies (CRA). It selects 26 indicators (including the dependent variable) that are closely related to the financial situation of listed firms in the Chinese construction industry. These metrics describe the financial landscape of LGFVs from multiple perspectives and have strong explanatory power for the likelihood that LGFVs carry credit risk. Indicators were selected with emphasis on data integrity, validity, comprehensiveness, and local economic dependency, and this paper also incorporated regional economic indicators that assess growth capacity [16]: public finance budget revenues, GDP per capita, tax revenue, local GDP, and local GDP growth, as shown in Table 1.

It can be explained as follows:

Capital Structure: It reflects the proportion of corporate funding composition, the layout of assets, and the size of equity, and reflects the external support and its own strength in the enterprise. The capital structure of a firm is related to its solvency pressure of the firm and has a decisive effect on the financial condition of the firm.
Operational Capability: It is an intrinsic determinant for predicting the future direction of firms and measures the quality and sustainability of corporate credit, which reflects the high efficiency of LGFVs for economic resource management and use [17].
Profitability: It reflects a combination of factors, such as the corporate operating model, asset quality, diversification, differentiation, technological progress, cost control ability, and management level. A firm with high profitability can effectively reduce the impact of income fluctuation caused by the economic cycle and enhance its economic resilience.
Regional Economy: It is a key measure of the size and development level of the economy in the region it is located in. Economic performance, economic stability, and growth potential affect the development of firms and even industries. The stronger the regional financial strength, the more stable the financial support for the construction of local urban infrastructure [18].
Debt Paying Ability: It is a key basis for judging the financial risk of LGFVs, and a corporate credit rating can predict the reliability of a firm’s payment of due debt for a certain period in the future. The pattern characteristics of PPP basically determine the industry’s greater operational leverage. However, LGFVs are characterized by large investment amounts, long investment recovery periods, and indirectness of investment returns, generally showing poor quality of assets, liquidity, and not profit for money.

2.2.1. Multivariate Joint Distribution Analysis

$\rho_{LGFVs}$ (Spearman’s rank correlation coefficient) is an indicator of the correlation between two variables. In this paper, the correlation matrix of the 26 rating indicators is calculated using the Spearman analysis method, and the variables with a correlation coefficient greater than 0.8 are retained [19], and their IV values are retained as the input variables.

The coefficient is given by Formula (1), where $x_{vi}$ represents the rank value of Variable 1, $y_{vi}$ represents the rank value of Variable 2, $n$ is the number of samples, and $\rho_{LGFVs}$ is the correlation coefficient between the rank variables.

$$\rho_{LGFVs} = 1 - \frac{6 \sum_{i=1}^{n} (x_{vi} - y_{vi})^{2}}{n (n^{2} - 1)} \quad (1)$$

Since the samples in the dataset do not follow a normal distribution, the Spearman analysis method is used to construct the correlation coefficient matrix of the credit indicators and to calculate the correlation coefficient between each pair of indicators [20], so that the machine learning model can predict the credit rating of LGFVs objectively. A total of 11 pairs of variables with correlation coefficients greater than 0.8 were obtained, as shown in Table 2.
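The screening step can be illustrated with a minimal, standard-library sketch of Formula (1). The function names (`ranks`, `spearman`, `high_correlation_pairs`), the toy columns, and the use of the absolute value of the coefficient are assumptions made for illustration; they are not taken from the paper, and the closed form of Formula (1) is exact only when there are no tied ranks.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Formula (1): 1 - 6 * sum(d^2) / (n * (n^2 - 1)), with d the rank difference."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def high_correlation_pairs(columns, threshold=0.8):
    """List variable pairs whose |rho| exceeds the threshold (assumed screening rule)."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            rho = spearman(columns[a], columns[b])
            if abs(rho) > threshold:
                pairs.append((a, b, round(rho, 3)))
    return pairs


# Hypothetical toy columns, not the paper's data.
toy = {
    "net_worth": [10, 20, 30, 40, 50],
    "total_assets": [12, 25, 31, 47, 60],
    "quick_ratio": [3, 1, 5, 2, 4],
}
print(high_correlation_pairs(toy))
```

In this toy example only the net worth and total assets pair passes the 0.8 threshold; the quick ratio has a coefficient of 0.3 with both.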

Joint distribution analysis establishes a pairwise comparison matrix over a heterogeneous group of factors in a multi-dimensional problem and identifies key points from a matrix plot. The strength of the correlation coefficients in Table 2 was examined theoretically using the measure in [21]. The goodness of fit of the combined values of the indicator variables was examined by plotting multivariate joint distribution plots in Python for a total of seven variables: net worth, total assets, equity ratio, asset-liability ratio, public finance budget revenues, tax revenue, and GDP per capita, as shown in Figure 2.

It can be seen that the sample scatter points corresponding to the seven groups of functions are well distributed. However, we observe the diagonal sample distribution and find that Asset-liability ratio, The public budget revenues, Tax revenue, and The local GDP have a large area of default sample distribution.

Among them, the asset-liability ratio is the most widely distributed variable among default samples. The higher the Asset-liability ratio, the higher the probability of default samples. Due to the limited liability effect of debt, firms with higher debt levels are more aggressive and have higher debt risk [22]. This shows that LGFVs should improve financial management and financial planning, prevent debt default risk, and improve credit evaluation levels.

2.2.2. Variable Correlation Matrix Analysis

Variable correlation matrix analysis is a key step to mining highly correlated variables and improving model interpretability. The deeper the color of the subgraph, the stronger the positive correlation of the feature. The lighter the color of the subgraph, the stronger the negative correlation between features. The line represents the level of correlation, and the earlier the line appears, the stronger the correlation is, as shown in Figure 3.

It can be seen that the characteristic variables wrapped in a yellow outer ring are highly correlated. Among them, “Return on equity, Assets net profit margin, and Return on total assets”, “Quick ratio, Current ratio, Net worth, and Total assets”, and “The local GDP, Tax revenue, and The public finance budget revenues” are the groups of variables with significant correlation.

To ensure the forecasting effect of the model, the variable groups with correlation coefficients greater than 0.8 are selected, and return on equity, quick ratio, net worth, and tax revenue are retained from these groups as the model variables.

2.2.3. Risk Analysis of Financial Indicators

This section selects return on equity, quick ratio, net worth, tax revenue, and asset-liability ratio to draw box plots of the financial capacity indicators. As shown in Figure 4, “0” represents the default sample and “1” represents the non-default sample.

From the analysis of Figure 4, it is found that return on equity and net worth are the variables with the most outliers, which makes their impact on the output of the model fluctuate. The quick ratio is the most evenly distributed variable in the sample, so its impact on the model is relatively low.

Further analysis shows that tax revenue and asset-liability ratios are the most widely distributed variables in default samples. The statistics show that among the default enterprises, asset-liability ratio is mainly concentrated in 50–65%, and tax revenue is mainly concentrated in 27 billion–85 billion RMB. Among them, the distribution span of default enterprises in asset-liability ratio is significantly higher than that of non-default enterprises, and the distribution span of default enterprises in tax revenue is relatively wide.

This shows that third-party rating agencies and governments need to pay extra attention to the two evaluation indicators of asset-liability ratio and tax revenue since they have strong uncertainty and are the key factors affecting the credit score of enterprises. Therefore, reasonable capital structure adjustment and the regional economic environment have important reference significance for LGFVs [23].

2.3. Credit Rating of LGFVs

According to the credit rating standard of Dagong Global Credit Rating [24,25], 10 levels are established to divide the enterprise credit rating. For companies participating in PPP projects, the credit evaluation level can be divided into five states: the credit rating between 90 and 100 points is AAA credit; scores of 80 to 90 for AA credit; and by analogy in turn, the specific credit rating division is shown in Table 3.

3. Prediction Model Construction for LGFVs

At present, enterprise credit evaluation models mainly include discriminant analysis, logistic regression analysis, data mining, artificial neural networks, SVM, and fuzzy comprehensive evaluation methods [26]. Among them, discriminant analysis has strong operability and predictive ability, but higher requirements for indicators. It needs to assume the linear relationship between the variables, and the actual indicators are generally difficult to meet in this situation [27]. Machine learning algorithms show better prediction than traditional models for large and uncertain data [28,29]; logistic regression analysis is often used to solve the problem of qualitative index, but cannot solve the problem of non-linear quantitative index [30]; data mining goes through the actual data to find credit rules, rather than the establishment of credit rules, so the new credit evaluation forecast is more scientific. Under comprehensive comparison, the XGBoost algorithm has good adaptability to sparse data and can handle non-convex, non-linear, and non-local high-dimensional data samples. It can effectively solve the binning characteristics of abnormal data, with stronger robustness, suitable for a large-scale training model [31,32].

3.1. Mechanism of Model Construction

Most of the traditional XGBoost-logistic combination models only calculate the final output results. However, the decision process of the model is not decomposed, and the influence of the characteristic variables on the model output is not explained.

Therefore, we combine two advantages: the classification performance of the XGBoost model and the interpretability of the logistic model. This paper constructs an LGFVs credit evaluation combination model based on XGBoost-logistic, and specifically decomposes the model structure to explain the classification decision process of the model for LGFVs. The detailed construction principle is shown in Figure 5.

The weight of evidence (WoE) is the logarithm of the ratio between the share of higher-risk (defaulted) firms and the share of lower-risk (normal) firms that fall into a given interval of a feature [33]. As shown in Formula (2), $B_{i}$ denotes the LGFVs with a higher risk of default within the interval, $B_{t}$ denotes all LGFVs with a higher risk of default within the sample, $G_{i}$ denotes the LGFVs with a lower risk within the interval, $G_{t}$ denotes all LGFVs with a lower risk within the sample, and the subscript $i$ represents the $i$th interval of a feature.

$$woe_{i} = \ln \left( \frac{B_{i} / B_{t}}{G_{i} / G_{t}} \right) \quad (2)$$
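As a small worked illustration of Formula (2), the sketch below computes the WoE of each bin from counts of higher-risk and lower-risk firms. The function name `woe_per_bin` and the counts are hypothetical. The sketch assumes every bin contains at least one firm of each kind, since the logarithm is undefined otherwise.

```python
import math


def woe_per_bin(bad_counts, good_counts):
    """Formula (2): woe_i = ln((B_i / B_t) / (G_i / G_t)) for each bin i."""
    b_total = sum(bad_counts)
    g_total = sum(good_counts)
    return [
        math.log((b / b_total) / (g / g_total))
        for b, g in zip(bad_counts, good_counts)
    ]


# Hypothetical counts for three bins of one feature (not the paper's data).
bad = [30, 15, 5]     # higher-risk firms per bin (B_i)
good = [20, 30, 50]   # lower-risk firms per bin (G_i)
woe = woe_per_bin(bad, good)
for i, w in enumerate(woe, start=1):
    print(f"bin {i}: WoE = {w:+.3f}")
```

With this sign convention, a positive WoE marks a bin in which higher-risk firms are over-represented relative to the whole sample, and a negative WoE marks a bin in which they are under-represented.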

The IV (information value) is a measure of the correlation between the target variable and the characteristic variable [34]. It is generally believed that $IV < 0.1$ indicates that the predictive ability of a variable is weak, $0.1 \leq IV < 0.3$ indicates that its predictive ability is medium, and $IV \geq 0.3$ indicates that its predictive ability is strong. The higher the IV value, the greater the difference in the distribution of the characteristics of the index, and the stronger the predictive ability of distinguishing whether LGFVs default or not. Measures of IV value are shown in Table 4.

The IV value is calculated as shown in Formula (3), where $P_{yi}$ denotes the ratio of defaulted LGFVs in group $i$ to all defaulted LGFVs, $P_{ni}$ denotes the ratio of non-defaulted LGFVs in group $i$ to all non-defaulted LGFVs, $y_{i}$ is the number of defaulted LGFVs in group $i$, $n_{i}$ is the number of non-defaulted LGFVs in group $i$, $y_{s}$ is the number of all defaulted LGFVs, and $n_{s}$ is the number of all non-defaulted LGFVs.

$$\begin{aligned} IV_{i} &= (P_{yi} - P_{ni}) \cdot woe_{i} \\ &= (P_{yi} - P_{ni}) \ln \frac{P_{yi}}{P_{ni}} \\ &= \left( \frac{y_{i}}{y_{s}} - \frac{n_{i}}{n_{s}} \right) \ln \frac{y_{i} / y_{s}}{n_{i} / n_{s}}, \\ IV &= \sum_{i=1}^{n} IV_{i} \end{aligned} \quad (3)$$
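Formula (3) and the IV thresholds quoted above can be sketched as follows. The function names, the feature names, and all bin counts are hypothetical and chosen only to produce one example in each strength band.

```python
import math


def information_value(default_counts, normal_counts):
    """Formula (3): IV = sum_i (y_i/y_s - n_i/n_s) * ln((y_i/y_s) / (n_i/n_s))."""
    y_s = sum(default_counts)
    n_s = sum(normal_counts)
    iv = 0.0
    for y_i, n_i in zip(default_counts, normal_counts):
        p_y, p_n = y_i / y_s, n_i / n_s
        iv += (p_y - p_n) * math.log(p_y / p_n)
    return iv


def predictive_strength(iv):
    """Thresholds from the text: < 0.1 weak, [0.1, 0.3) medium, >= 0.3 strong."""
    if iv < 0.1:
        return "weak"
    if iv < 0.3:
        return "medium"
    return "strong"


# Hypothetical (defaulted, non-defaulted) counts per bin for three features.
features = {
    "feature_a": ([30, 15, 5], [20, 30, 50]),
    "feature_b": ([30, 20], [40, 60]),
    "feature_c": ([17, 16, 17], [33, 34, 33]),
}
ivs = {name: information_value(y, n) for name, (y, n) in features.items()}
for name, iv in ivs.items():
    print(f"{name}: IV = {iv:.3f} ({predictive_strength(iv)})")
```

Each term of the sum is non-negative, because the difference of the two shares and the logarithm of their ratio always have the same sign, so IV grows as the two distributions over the bins move apart.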

3.2. Credit Rating Model and Scoring Card for LGFVs

3.2.1. WoE Binning Monotonicity Analysis

Highly correlated variables have a significant impact on the accuracy and robustness of credit scoring models [35]. Monotonicity analysis makes it possible to analyze the fluctuation characteristics of variables and determine the distribution of default samples [36]. In this section, we analyzed the binning monotonicity of the four indicators with the highest IV values and removed bins with less than 5% of default samples. Then, we drew a binning monotonicity analysis chart of net worth, total assets, operating income, and return on equity. As shown in Figure 6, the histogram represents the distribution interval of the sample, and the line represents the proportion of default enterprises in the sample.

As can be seen from Figure 6, net worth is divided into seven intervals, total assets into nine intervals, operating income into eight intervals, and return on equity into eight intervals. Net worth, total assets, and operating income all show an approximate monotonically increasing trend, which meets the fluctuation range and entry requirements of variables.

In Figure 6, we found that the box characteristics of return on equity show a trend of decreasing first and then increasing. Among them, return on equity shows a gradual downward trend from the first to the third box. The proportion of default samples in the first bin reached 51%. Moreover, approximately 21% of LGFVs have a return on equity of less than 1.08%, indicating that companies with weaker profitability are more likely to possess default behavior. The business of LGFVs is public welfare or quasi-public welfare business. The operation of its projects and the recovery of investment are affected by the financial strength of local governments.

3.2.2. Selection of Explanatory Variable Characteristics

The original indexes were binned by feature engineering, and the WoE and IV values of each variable were calculated using the XGBoost gradient boosting algorithm. Finally, variables with IV values greater than 0.1 were selected to enter the model, which ensured that the entered variables had better rating prediction ability [37], as seen in Table 5.

The indicator ranking contrasts were plotted by ranking these variants by their IV values. The results of the comparative logistic regression calculations and the index ranking data of the CRA are shown in Figure 7.

As can be seen in Figure 7, the XGBoost algorithm achieved a 95% fit to the CRA, while the logistic regression achieved only 75%, indicating that the XGBoost algorithm had better classification prediction ability and better feature learning on the data.

For the current liabilities ratio, the XGBoost algorithm ranked the indicator only 12th, lower than the 9th place given by the CRA. This is due to the accrual of the size of the due bonds to 70 billion RMB as of May 2022 for LGFVs of AA and the following credit classes. In the context of a stringent regulatory policy, professional rating bodies need to re-evaluate the solvency measures of firms to determine a reasonable corporate credit level and avoid some of the credit risk [38].

3.3. Interpretability Analysis of Characteristic Variables

Shapley additive explanations (SHAP), as an explanatory tool for black box models, can be used to evaluate the importance of feature variables in a model [39]. The SHAP value of the validation dataset is calculated using the SHAP library in Python. In this section, SHAP values are applied to the XGBoost-logistic model, and the influence of each characteristic variable on the model decision is presented. Finally, dtreeviz is used to visualize the data decision binning process [40].

3.3.1. Macro Perspective Analysis

The calculation results of the SHAP value are shown in Figure 8. Among them, the darker the color, the greater the impact, and the lighter the color, the smaller the impact. SHAP values have positive and negative effects.
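To make the idea behind a SHAP value concrete, the following sketch computes exact Shapley values for a tiny two-feature scoring function by averaging marginal contributions over all feature orders. This is a brute-force illustration, not the SHAP library, which relies on much more efficient algorithms for tree models. The scoring function `toy_score`, its coefficients, and the feature values are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over all feature orders."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)          # start from the reference point
        previous = predict(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to its actual value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orders) for name, total in totals.items()}


def toy_score(f):
    """Hypothetical risk score with an interaction term (not the paper's model)."""
    return 0.5 * f["leverage"] - 0.3 * f["roe"] + 0.2 * f["leverage"] * f["roe"]


instance = {"leverage": 1.0, "roe": 1.0}
baseline = {"leverage": 0.0, "roe": 0.0}
phi = shapley_values(toy_score, instance, baseline)
print(phi)
```

The contributions add up to the difference between the score of the instance and the score of the baseline, which is why positive and negative SHAP values can be read as pushing the model output up or down.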

First, we found that net worth, total assets, and operating income are the key indicators that affect the output of the model, and the sales net interest rate is the characteristic variable with the greatest positive impact on the model. Asset net profit margin, quick ratio, operating profit/total assets, and inventory turnover are the characteristic variables with the most negative impact on the model. Second, taken together with Figure 4, tax revenue has two characteristics at the same time: its default samples are widely distributed, and its positive and negative impacts on the model are both large. Therefore, we have reason to believe that tax revenue is potentially an important factor.

3.3.2. Micro-Perspective Analysis

Micro perspective analysis is an in-depth analysis of the impact of variables in the model. As shown in Figure 9, the deeper the color, the greater the impact on the model.

As shown in the figure, net worth and operating income have a greater positive impact on the model in the range of 0–50 billion RMB and 0–10 billion RMB, respectively, while total assets have a greater positive impact on the model in the range of 100–200 billion RMB; total assets and return on equity, ranging between 50–100 billion RMB and 25–700 million RMB, had a large negative impact on the model, respectively. This shows that LGFVs with low profitability are more likely to default on their debt. Additionally, the lower the total assets, the higher the probability of default enterprises.

3.3.3. Visual Interpretation of Model Decision

This section uses dtreeviz to visualize XGBoost decisions. The algorithm model of dtreeviz is to sample the data points around the explained variables, use the XGBoost classifier to obtain the predicted results, and finally weight them by the proximity to the instance. Then, by optimizing the specific equation, the prediction path of the explanatory variable is found. The classification decision path of default and non-default enterprises is shown in Figure 10.

As shown in Figure 10, the decision order of the model is determined by the influence of the characteristic variables on the model output. Firstly, the classification results of net worth ≤ 141.89 and net worth > 141.89 are determined by the feature screening of the first layer. Secondly, through the second layer of feature screening, such as return on equity, further classification is performed. Finally, through the feature-matching degree of the target enterprise, it predicts whether it is a default enterprise. The decision visualization results of the model describe the whole process of feature classification in detail, which enables the evaluators to better understand the potential risk characteristics of the enterprise.

4. Experiment

Overfitting often occurs during model training: the model matches the training dataset well but predicts data outside the training dataset poorly. K-fold cross-validation is a commonly used method for testing model accuracy. This section cross-validates on different training and validation subsets of the data to obtain a sound estimate of the accuracy of the algorithm [41].

This section divides the dataset into a training dataset and a test dataset, with the test size set to 0.3: 30% of the data are held out as the test dataset, and the rest are used as the training dataset. The test dataset is used to evaluate the training effect of the model. The specific principle is shown in Figure 11.
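A hold-out split with a test size of 0.3 can be sketched with the standard library alone. Stratifying by label, the random seed, and the function name `stratified_split` are assumptions made for this illustration; the text does not say whether the split was stratified. The label counts (314 default samples labeled 0 and 645 non-default samples labeled 1) are those reported in Section 2.2.

```python
import random


def stratified_split(labels, test_size=0.3, seed=42):
    """Split sample indices into train/test while keeping class shares similar."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        n_test = round(len(members) * test_size)
        test_idx.extend(members[:n_test])
        train_idx.extend(members[n_test:])
    return sorted(train_idx), sorted(test_idx)


# 0 = default, 1 = non-default, using the sample counts reported in the paper.
labels = [0] * 314 + [1] * 645
train_idx, test_idx = stratified_split(labels)
default_share = sum(1 for i in test_idx if labels[i] == 0) / len(test_idx)
print(len(train_idx), len(test_idx), round(default_share, 3))
```

Keeping the share of default samples roughly equal in both parts avoids a test set that, by chance, contains very few defaults.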

4.1. Model Testing and Comparison

To highlight the advantages of the combined model, this section compares the classification performance of the XGBoost-logistic combined model with that of the logistic regression model and performs 10-fold cross-validation (K = 10) on both models. The ROC comparison curves are shown in Figure 12.

As shown in Figure 12, both models have good optimization effects. The average AUC value of the XGBoost-logistic model was 0.8359 (error value = ±0.04), and the optimal predicted AUC value reached 0.9023.
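The way a mean AUC and its spread arise from 10-fold cross-validation can be sketched as follows. The data are synthetic, and the "model" is a deliberately trivial stand-in that only learns the direction in which a single feature separates the classes; neither is the paper's XGBoost-logistic model. All function names are hypothetical.

```python
import random
import statistics


def stratified_folds(labels, k=10, seed=0):
    """Deal the shuffled indices of each class round-robin into k folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(x, y, k=10):
    """Fit the stand-in model on k-1 folds and score the held-out fold, k times."""
    results = []
    for test in stratified_folds(y, k):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        mean_pos = statistics.mean(x[i] for i in train if y[i] == 1)
        mean_neg = statistics.mean(x[i] for i in train if y[i] == 0)
        sign = 1.0 if mean_pos >= mean_neg else -1.0
        results.append(auc([y[i] for i in test], [sign * x[i] for i in test]))
    return results


# Synthetic one-feature data (hypothetical, not the paper's dataset).
rng = random.Random(1)
y = [1] * 100 + [0] * 100
x = [rng.gauss(1.0, 1.0) if label == 1 else rng.gauss(0.0, 1.0) for label in y]
aucs = cross_validated_auc(x, y, k=10)
print(f"mean AUC = {statistics.mean(aucs):.3f} +/- {statistics.stdev(aucs):.3f}")
```

Every sample is used for testing exactly once, so the average over the ten folds is a less optimistic estimate than the score on the training data.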

As shown in Table 6, the XGBoost-logistic model’s precision, recall, sensitivity, and F1 scores are all above 0.7. Among them, the Gini value is 0.7846, and the Ks value is 0.7198. Compared with the logistic regression model, the risk discrimination ability of the XGBoost-logistic algorithm model is improved by 16%, the prediction stability is improved by 56%, and the fitting effect of the model is better.
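For readers unfamiliar with the Gini and KS statistics, the sketch below derives both from the ROC curve under commonly used definitions: Gini = 2·AUC − 1 and KS = max(TPR − FPR). The text does not state which definitions were used for Table 6, so this is an assumption, and the labels and scores are hypothetical.

```python
def roc_points(labels, scores):
    """Sweep thresholds from the highest score down; return (FPR, TPR) points."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc_gini_ks(labels, scores):
    """AUC by the trapezoidal rule, Gini = 2 * AUC - 1, KS = max(TPR - FPR)."""
    pts = roc_points(labels, scores)
    auc = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(pts, pts[1:]))
    gini = 2 * auc - 1
    ks = max(tpr - fpr for fpr, tpr in pts)
    return auc, gini, ks


# Hypothetical labels (1 = positive class) and model scores.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
auc, gini, ks = auc_gini_ks(labels, scores)
print(auc, gini, ks)
```

For these eight samples the AUC is 0.75, the Gini coefficient is 0.5, and the KS statistic is 0.5.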

Gain curve and lift curve are visual aids for measuring model performance. By comparing XGBoost-logistic and logistic with gain curve and lift curve, the application value of the model can be measured.

As shown in Figure 13, the XGBoost-logistic algorithm has significant advantages in stability and computational efficiency. The gain curve comparison in Figure 13a shows that the XGBoost-logistic algorithm achieves its best classification results when covering approximately 70% of the sample, whereas the logistic model needs to cover 90% of the samples to achieve its best classification results. The lift curve comparison in Figure 13b shows that, compared with the logistic regression model, the computational performance of the XGBoost-logistic algorithm model is improved by 2.13 times. At 40% sample coverage, it still maintains high computational efficiency and can process multi-dimensional complex data.
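The quantities plotted in gain and lift curves can be computed as in the following sketch: samples are ranked by model score, and for each covered fraction the cumulative gain is the share of all positives captured, while the lift is that gain divided by the covered fraction. The function name, the bin count, and the toy labels and scores are assumptions for illustration.

```python
def gain_and_lift(labels, scores, n_bins=10):
    """Return (covered fraction, cumulative gain, lift) after each bin of ranked samples."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)]
    total_pos = sum(ranked)
    n = len(ranked)
    rows = []
    for b in range(1, n_bins + 1):
        cutoff = round(n * b / n_bins)
        covered = cutoff / n
        gain = sum(ranked[:cutoff]) / total_pos   # share of positives captured so far
        lift = gain / covered                     # improvement over random targeting
        rows.append((covered, gain, lift))
    return rows


# Hypothetical scores and labels (1 = the class being targeted).
scores = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
rows = gain_and_lift(labels, scores, n_bins=5)
for covered, gain, lift in rows:
    print(f"top {covered:.0%}: gain = {gain:.2f}, lift = {lift:.2f}")
```

A model whose gain reaches 1.0 at a smaller covered fraction identifies all targeted samples earlier, which is the sense in which the 70% and 90% coverage figures above are compared.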

4.2. Analysis of Results

The comparison analysis between the training datasets and the test datasets was performed by calculating the indicators from the average value, minimum, maximum, and standard deviation for the default and non-default samples for both models, as shown in Table 7.

As shown in Table 7, due to the defects of the traditional logistic model in binning classification performance, there is a large gap between the training datasets and test datasets in prediction accuracy. The XGBoost-logistic combined model showed a more stable accuracy, with a weighted prediction accuracy of 82.76%.

The following conclusions are drawn: XGBoost-logistic algorithm model has a significant optimization effect in prediction accuracy and stability, which shows the feasibility and application value of the model.

5. Conclusions and Future Work

5.1. Conclusions

This paper focuses on the research of the local government financing vehicles credit prediction model and analyzes the importance of multiple variables in the model output. On this basis, we established an enterprise credit evaluation XGBoost-logistic model. Combined with the credit evaluation problem of LGFVs and the current situation of the public-private partnership industry, the classification performance and interpretability process of the model were verified. The main conclusions are as follows:

1. This paper proposes an innovative research method. The traditional credit evaluation index system for LGFVs is restructured. Given the particularities of the PPP industry, the regional economy is introduced as an indicator. By identifying the actual controller of each company and screening for LGFVs that meet the standards, the method provides more accurate and reliable basic data for constructing the LGFV prediction model.
2. The key indicators affecting the model output were analyzed in depth. Net worth, total assets, operating income, and return on equity are the variables with the greatest impact on the model evaluation results. Moreover, asset-liability ratio and tax revenue are also potentially important risk factors in the credit evaluation of LGFVs. Banks and governments need to pay extra attention to these indicators in the future.
3. The XGBoost-logistic model offers more intuitive interpretability and stronger application performance. First, this paper shows that the XGBoost-logistic credit evaluation model is significantly superior to the traditional logistic regression model in discriminating ability, ranking ability, accuracy, and stability. Second, we explain how the characteristic variables affect the model output using SHAP values, and apply dtreeviz to portray the classification decision process of the model for LGFVs with different characteristics. Finally, this paper provides financial institutions and government departments with an effective and scientific enterprise credit evaluation method that describes the credit risk status of LGFVs more accurately and greatly enhances the interpretability of the model. Based on the important indicators that affect LGFV credit default behavior, targeted risk prevention countermeasures can be formulated.
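
The idea behind SHAP values can be illustrated with a small sketch that computes exact Shapley values by enumerating all feature coalitions. This is the textbook definition that SHAP approximates, not the tree-specific algorithm a SHAP library would use on an XGBoost model. The scoring function `toy_score`, its coefficients, and the feature values below are hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    names = list(x)
    n = len(names)
    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for r in range(n):                       # coalition sizes 0 .. n-1
            weight = factorial(r) * factorial(n - r - 1) / factorial(n)
            for subset in combinations(others, r):
                with_f = {g: x[g] if (g in subset or g == f) else baseline[g]
                          for g in names}
                without_f = {g: x[g] if g in subset else baseline[g]
                             for g in names}
                total += weight * (model(with_f) - model(without_f))
        phi[f] = total
    return phi


def toy_score(v):
    # Hypothetical model with one interaction term (not the paper's model).
    return (2.0 * v["net_worth"] + 1.0 * v["total_assets"]
            + v["net_worth"] * v["return_on_equity"])


firm = {"net_worth": 1.0, "total_assets": 2.0, "return_on_equity": 3.0}
reference = {"net_worth": 0.0, "total_assets": 0.0, "return_on_equity": 0.0}
contributions = shapley_values(toy_score, firm, reference)
print(contributions)
```

The contributions sum to the difference between the score of the firm and the score of the reference point, which is the additivity property that makes per-feature attributions of this kind comparable across samples.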

5.2. Future Work

Although the XGBoost-logistic model can effectively address the credit evaluation problem of LGFVs, it still has shortcomings. Because of data limitations, this paper uses only financial data, which cannot fully reflect the complexity of LGFV credit risk. Additionally, the model does not account for macroeconomic policy regulation and control, nor does it adjust the index weights accordingly. Future research should address these limitations.

Author Contributions

L.Z.: Methodology, Validation, Formal analysis, Writing—original draft preparation, Project administration, Writing—review and editing, Supervision, and Funding acquisition; S.Y.: Data curation, Formal analysis, Visualization, Code writing, Model building, Writing—original draft preparation, and Writing—review and editing; S.W.: Writing—original draft preparation, Project administration, Writing—review and editing, Supervision, Investigation, and Funding acquisition; J.S.: Data curation and Visualization. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Beijing Municipal Natural Science Foundation (No. 9202006) and the Excellent Talent Project of North China University of Technology in 2019 (No. 216051360020XN225/004). The authors gratefully acknowledge this funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon request for collaboration.

Acknowledgments

The authors sincerely thank all the editors and reviewers for their support and help.

Conflicts of Interest

The authors declare no conflict of interest.


Figure 1. Mechanism of the dynamic comprehensive evaluation of corporate credit.

Figure 2. Multivariate joint distribution.

Figure 3. Correlation matrix analysis of variables.

Figure 4. Analysis of financial indicators.

Figure 5. Schematic of XGBoost-logistic combined model construction.

Figure 6. Monotonicity analysis of LGFV index bins. (a) Net worth; (b) total assets; (c) operating income; (d) return on equity.

Figure 7. Index ranking comparison of LGFVs.

Figure 8. Analysis of influence degree of characteristic variables.

Figure 9. Impact analysis of the major variables. (a) Net worth; (b) total assets; (c) operating income; (d) return on equity.

Figure 10. Visualization case of the XGBoost decision training model.

Figure 11. K-fold cross-validation principle.

Figure 12. Comparative analysis of the K-fold cross-validation. (a) XGBoost-logistic combined model; (b) logistic regression model.

Figure 13. Performance comparison and analysis of the model. (a) Gain curve comparison; (b) lift curve comparison.

Table 1. City investment company characteristic index variables.

First-Grade Indexes	Second-Grade Indexes
Capital Structure	Net worth
	Total assets
	Operating income
Operational Capability	Total asset turnover
	Inventory turnover
	Business activities cash flow/total assets
	Current assets turnover
Profitability	Return on equity
	Main business revenue growth
	Sales net interest rates
	Operating profit/total assets
	Assets net profit margin
	Return on total assets
Regional Economy	The public finance budget revenues
	GDP per capita
	Tax Revenue
	The local GDP
	The local GDP growth
Debt Paying Ability	Current liabilities ratio
	Equity ratio
	Financing activities cash flow/total liabilities
	Quick ratio
	Asset-liability ratio
	A coupon multiples
	Current ratio

Table 2. Characteristic variables correlation analysis.

Variable 1 ($x_{vi}$)	Variable 2 ($y_{vi}$)	Correlation Coefficient
Net worth	Total assets	0.9368
Return on equity	Assets net profit margin	0.9165
Operating profit/Total assets	Assets net profit margin	0.8540
Asset-liability ratio	Equity ratio	0.9924
Total assets turnover	Current assets turnover	0.8287
Tax Revenue	The public finance budget revenues	0.8219
The local GDP	The public finance budget revenues	0.8523
The local GDP	Tax Revenue	0.9934
GDP per capita	The public finance budget revenues	0.8019
	Tax Revenue	0.8536
	The local GDP	0.8631
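
As a rough illustration of how pairs such as those in Table 2 can be identified, the sketch below computes Pearson correlation coefficients and lists the variable pairs above a cutoff. The 0.8 cutoff is an assumption inferred from the smallest coefficient listed in the table, and the column values are invented toy numbers rather than the study's data.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / sqrt(var_x * var_y)


def highly_correlated(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 4)))
    return pairs


# Toy columns (hypothetical values for five firms).
data = {
    "net_worth": [1, 2, 3, 4, 5],
    "total_assets": [2, 4, 6, 8, 11],
    "quick_ratio": [3, 1, 4, 1, 5],
}
flagged = highly_correlated(data)
print(flagged)
```

Pairs flagged in this way carry largely overlapping information, so one variable of each pair is a candidate for removal or merging before fitting a logistic model.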

Table 3. Credit Rating of LGFVs.

Credit Grade	Score Interval	Implication
AAA	$x \geq 95$	Strong performance capability, good reputation, and low credit risk
AA	$90 \leq x < 95$	Strong performance ability, good reputation, and low credit risk
A	$85 \leq x < 90$	Strong performance ability, good reputation, and low credit risk
BBB	$80 \leq x < 85$	Performance ability medium to excellent, reputation medium to excellent, and medium to small credit risk
BB	$75 \leq x < 80$	Medium performance, medium reputation, and medium credit risk
B	$70 \leq x < 75$	Lower-middle performing ability, lower-middle credibility, and lower-middle credit risk
CCC	$65 \leq x < 70$	Weak ability to perform, general credibility, and credit risk
CC	$60 \leq x < 65$	Weak performance, credit deviation, and high credit risk
C	$55 \leq x < 60$	Weak performance, poor reputation, and high credit risk
D	$x < 55$	Great credit risk
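
The score intervals in Table 3 amount to a simple lookup, sketched below. The function name `credit_grade` is hypothetical, and the C band is read as $55 \leq x < 60$ on the assumption that it should not overlap the CC band, which already covers 60 to 65.

```python
# Lower bound of each grade, from best to worst (Table 3).
# Assumption: grade C covers 55 <= x < 60 so that bands do not overlap.
GRADE_FLOORS = [
    (95, "AAA"), (90, "AA"), (85, "A"),
    (80, "BBB"), (75, "BB"), (70, "B"),
    (65, "CCC"), (60, "CC"), (55, "C"),
]


def credit_grade(score):
    """Map a credit score to its grade; anything below 55 is grade D."""
    for floor, grade in GRADE_FLOORS:
        if score >= floor:
            return grade
    return "D"


for s in (97.2, 94.9, 80.0, 62.0, 57.0, 41.5):
    print(s, credit_grade(s))
```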

Table 4. Measurement of IV value.

IV Value Scope	Predictive Ability
(−∞, 0.02)	No predictive ability
[0.02, 0.1)	Weak predictive ability
[0.1, 0.3)	Medium predictive ability
[0.3, +∞)	Strong predictive ability
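
For readers unfamiliar with the information value (IV), the sketch below computes the weight of evidence (WoE) of each bin and the resulting IV using the commonly used definitions, then applies the thresholds of Table 4. The sign convention (non-default share in the numerator), the function names, and the bin counts are assumptions for illustration; the paper's exact binning procedure is not reproduced here.

```python
from math import log


def woe_iv(bins):
    """WoE per bin and total IV for one variable.

    bins: list of (non_default_count, default_count), one tuple per bin.
    Every count must be positive, otherwise the logarithm is undefined.
    """
    total_good = sum(g for g, _ in bins)
    total_bad = sum(b for _, b in bins)
    woe_values, iv = [], 0.0
    for good, bad in bins:
        share_good = good / total_good   # share of all non-defaults in this bin
        share_bad = bad / total_bad      # share of all defaults in this bin
        woe = log(share_good / share_bad)
        woe_values.append(woe)
        iv += (share_good - share_bad) * woe
    return woe_values, iv


def predictive_ability(iv):
    """Classify an IV value according to the ranges in Table 4."""
    if iv < 0.02:
        return "No predictive ability"
    if iv < 0.1:
        return "Weak predictive ability"
    if iv < 0.3:
        return "Medium predictive ability"
    return "Strong predictive ability"


# Toy bins (hypothetical counts): low, middle, and high values of a variable.
toy_bins = [(80, 20), (60, 40), (40, 60)]
woe_values, iv = woe_iv(toy_bins)
print([round(w, 4) for w in woe_values], round(iv, 4), predictive_ability(iv))
```

Each term of the sum is non-negative, so IV grows with every bin whose default share differs from its non-default share. A monotonic WoE sequence across ordered bins, as in the toy example, is what a monotonicity analysis of the bins looks for.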

Table 5. Explanatory variable importance ranking.

First-Grade Indexes	Second-Grade Indexes	IV Value (XGBoost)	IV Value (Logistic)	CRA Rank
Capital Structure	Net worth	2.2982	2.0912	1
	Total assets	2.0101	1.9013	2
	Operating income	1.2843	1.261	3
Operational Capability	Total asset turnover	0.2429	0.3102	11
	Inventory turnover	0.2984	0.3193	10
	Business activities cash flow/total assets	0.3357	0.3509	7
	Current assets turnover	0.1721	0.2216	17
Profitability	Return on equity	0.4564	0.4463	4
	Main business revenue growth	0.1204	0.202	19
	Sales net interest rates	0.2679	0.2696	20
	Operating profit/total assets	0.3141	0.358	8
	Asset net profit margin	0.3896	0.4319	5
	Return on total assets	0.2343	0.2606	12
Regional Economy	The public finance budget revenues	0.1914	0.4915	None
	GDP per capita	0.3326	0.4628
	Tax Revenue	0.319	0.4907
	The local GDP	0.319	0.5031
	The local GDP growth	0.0217	0.3805
Debt Paying Ability	Current liabilities ratio	0.1374	0.1572	9
	Equity ratio	0.2285	0.2824	14
	Financing activities cash flow/total liabilities	0.2411	0.2888	13
	Quick ratio	0.3405	0.3584	6
	Asset-liability ratio	0.2278	0.2719	15
	A coupon multiple	0.154	0.2584	18
	Current ratio	0.2117	0.275	16

Table 6. Model evaluation analysis.

Model	AUC	Precision	Recall	Sensitivity	F1 Score	Gini	KS
Logistic	0.7453	0.7812	0.7538	0.5104	0.2831	0.7095	0.5515
XGBoost-logistic	0.8359	0.8715	0.8391	0.7604	0.8456	0.7846	0.7198
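
Two of the indicators in Table 6, AUC and KS, can be computed directly from their definitions, as in the minimal sketch below. The function names and the toy labels and scores are hypothetical, and the code is not the evaluation pipeline used in the study.

```python
def auc(y_true, y_score):
    """AUC as the probability that a random default outranks a random
    non-default (ties count one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true, y_score):
    """KS as the largest gap between the true positive rate and the
    false positive rate over all score thresholds."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for t in sorted(set(y_score)):
        tpr = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t) / n_pos
        fpr = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t) / n_neg
        best = max(best, abs(tpr - fpr))
    return best


# Toy data (hypothetical): 1 = default, scores are predicted default probabilities.
labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_auc = auc(labels, scores)
toy_ks = ks_statistic(labels, scores)
print(round(toy_auc, 4), round(toy_ks, 4))
```

Both measures depend only on how the model ranks samples, not on a particular classification cutoff, which is why they are reported alongside threshold-dependent measures such as precision and recall.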

Table 7. Datasets comparison and model validation.

Dataset	Type	Statistic	Logistic	XGBoost-Logistic
Accuracy of training set	Default	Average value	61.008	83.648
		Minimum	59.988	83.497
		Maximum	62.787	83.813
		Standard deviation	0.002	0.000007
	Non-default	Average value	62.189	83.365
		Minimum	61.259	82.437
		Maximum	63.673	83.836
		Standard deviation	0.008	0.005
	Total	Average value	61.563	83.156
Accuracy of test set	Default	Average value	62.614	82.107
		Minimum	61.018	81.729
		Maximum	63.452	82.46
		Standard deviation	0.013	0.0009
	Non-default	Average value	52.971	83.286
		Minimum	52.167	82.532
		Maximum	53.875	83.627
		Standard deviation	0.009	0.003
	Total	Average value	57.792	82.688

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
