Article

Meta-Learning Approaches for Recovery Rate Prediction

by Paolo Gambetti 1, Francesco Roccazzella 2,* and Frédéric Vrins 2,3

1 CRIF S.p.A., via M. Fantin, 1-3, 40131 Bologna, Italy
2 LFIN/LIDAM, UCLouvain, Voie du Roman Pays 34, B-1348 Louvain-la-Neuve, Belgium
3 Department of Decision Science, HEC Montréal, Montréal, QC H3T 2A7, Canada
* Author to whom correspondence should be addressed.
Submission received: 29 April 2022 / Revised: 2 June 2022 / Accepted: 4 June 2022 / Published: 12 June 2022

Abstract:
While previous academic research highlights the potential of machine learning and big data for predicting corporate bond recovery rates, the operations management challenge is to identify the relevant predictive variables and the appropriate model. In this paper, we use meta-learning to combine the predictions of 20 candidate algorithms (linear, nonlinear and rule-based), and we exploit a data set of predictors including security-specific factors, macro-financial indicators and measures of economic uncertainty. We find that the most promising approach consists of model combinations trained on security-specific characteristics and a limited number of well-identified, theoretically sound recovery rate determinants, including uncertainty measures. Our research provides useful indications for practitioners and regulators targeting more reliable risk measures in designing micro- and macro-prudential policies.

1. Introduction

Credit risk is the main concern for financial institutions. It is governed by three factors: the exposure at default (EAD), the default likelihood (that is, the probability of default, PD) and the loss given default (LGD). The latter can also be expressed as $(1 - RR)$, where the recovery rate $RR$ represents the percentage of EAD that can be recovered in the event of default of the reference entity. For a long time, financial institutions focused on the modeling of PDs. Ratings, for example, essentially quantify the risk of a default event, with loss severity receiving little attention. However, EAD and $RR$ are key drivers of credit risk. For instance, they act as scaling constants when computing banks' regulatory capital requirements (BCBS 2017; Loterman et al. 2012; Zhang and Thomas 2012) in the Basel II (BCBS 2006) and Basel III (BCBS 2011) frameworks, which banks can estimate using internal models under the advanced (A-IRB) approach. But $RR$ also influences the value of non-performing loans (Bellotti et al. 2021), corporate bonds and credit derivatives (Andersen and Sidenius 2004; Berd 2005; Gambetti et al. 2018; Pykhtin 2003), as well as the price of mainstream OTC derivative products such as equity options and interest rate swaps due to counterparty risk (Gregory 2012). This explains why recovery rate modeling and forecasting receive increasing attention.
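As a simple numerical illustration of how EAD and the recovery rate scale credit losses, the textbook expected loss decomposition $EL = EAD \times PD \times (1 - RR)$ can be sketched as follows (the figures below are hypothetical):

```python
# Illustrative only: the expected loss decomposition EL = EAD * PD * LGD,
# with LGD = 1 - RR. All numbers are hypothetical.

def expected_loss(ead: float, pd_: float, rr: float) -> float:
    """Expected credit loss given exposure, default probability and recovery rate."""
    lgd = 1.0 - rr  # loss given default
    return ead * pd_ * lgd

# A USD 10M exposure with a 2% default probability and a 40% expected recovery:
el = expected_loss(10_000_000, 0.02, 0.40)
print(round(el, 2))  # 120000.0
```

Doubling the assumed recovery rate halves the loss severity, which is why even modest improvements in $RR$ forecasts translate into materially different capital figures.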
Academic research highlights that nonlinear techniques (Loterman et al. 2012; Nazemi and Fabozzi 2018; Nazemi et al. 2018; Qi and Zhao 2011; Tobback et al. 2014; Yao et al. 2015) and ensemble methods (Bastos 2014; Bellotti et al. 2021; Hartmann-Wendels et al. 2014; Nazemi et al. 2017) should be preferred to the traditional parametric regressions used in earlier studies on recovery rate determinants. This finding is not confined to RR forecasting; it also holds in credit scoring (Liu et al. 2022; Wang et al. 2022). However, the possibility of combining predictions from a large number of different algorithms remains widely unexplored. This is surprising given that it is now well recognized that combining models might be preferable (Atiya 2020) to selecting a single model. In this paper, we confirm this conclusion by showing that meta-learning techniques exhibit better forecasting performance and lower model risk than the methods used in previous research when dealing with RR.1
Meta-learning can be defined as the act of (a) selecting and (b) optimally combining the outputs of multiple learning machines (first-level learners), according to some combination scheme that is learned from the data (meta-learner or second-level learner) (Santos et al. 2017). Meta-learning is very flexible: the models to be combined can be represented by expert judgment, individual learners (e.g., lasso, MARS) or ensemble learners (e.g., random forests, boosted trees), whose predictions themselves are already combinations of the outputs of different models. Furthermore, the models to combine can also be trained with different predictors. This contrasts with homogeneous ensemble methods, which, despite also involving model combination, generally combine weak learners2 of the same type, e.g., regression trees combined into a random forest.
The main advantage of meta-learning is, hence, the ability to exploit a wide spectrum of functional forms. Its promising applicability for recovery rate prediction was first reported in Nazemi et al. (2017) using a fuzzy decision fusion approach. However, several other alternatives are worth exploring. For example, Roccazzella et al. (2021) propose a robust and sparse combination method that mitigates estimation uncertainty and implicitly features forecast selection in a single step. This approach has the additional advantage of relying on a closed-form solution for the estimation of the optimal combination strategy.
The quality of the forecasts depends also on the predictive variables we consider. Prior studies on defaulted bonds highlight the importance of security-specific characteristics (Altman and Kishore 1996; Bris et al. 2006; Jankowitsch et al. 2014; Schuermann 2004), economic conditions and the credit cycle (Acharya et al. 2007; Altman et al. 2005; Betz et al. 2018, 2020; Bruche and González-Aguado 2010).3 Models based on big data and variable selection techniques have been proposed by Nazemi et al. (2017, 2018) and Nazemi and Fabozzi (2018). Gambetti et al. (2019) show that economic uncertainty is the most important systematic determinant of recovery rate distributions. However, the latter study is based on an ex post analysis. Given that uncertainty shapes the economic outlook (ECB 2009, 2016; Gieseck and Largent 2016; Kose and Terrones 2012) and its proxies are particularly capable of anticipating economic downturns (Ludvigson et al. 2019), it remains to be explored whether uncertainty proxies can also improve predictive performance.
In this paper, we extend the literature on recovery rate modeling studying the forecasting performance of linear, nonlinear, rule-based and meta-learning algorithms across different specifications of the predictors set.
We start our analysis by studying whether a larger set of macro-financial indicators, uncertainty measures and idiosyncratic features in bond recovery rate forecasting models offers a significant advantage compared to a more parsimonious but economically grounded framework. Specifically, we extend the set of macroeconomic predictors used in Nazemi et al. (2017) and Nazemi et al. (2018) with 55 pricing factors and industry portfolio returns. We also enlarge the spectrum of uncertainty proxies considered by Gambetti et al. (2019) with 11 additional measures of economic uncertainty derived from text-analysis techniques. This offers two advantages. First, as Jurado et al. (2015) point out, text-based indexes can show significant independent variation from other popular uncertainty proxies, suggesting that much of the variation in those proxies is not driven by uncertainty itself. Second, textual data are more granular, allowing for the identification of the categories of uncertainty, e.g., government spending uncertainty or monetary policy uncertainty, that better predict recovery rates. In contrast to previous studies (Nazemi and Fabozzi 2018; Nazemi et al. 2017, 2018), we find that more parsimonious models relying on well-documented recovery rate determinants outperform those making use of the entire set of predictors and those relying on data-driven variable selection. Among these determinants, uncertainty measures are particularly relevant for recovery rate prediction. A limited number of economically grounded predictors also makes the model easier to implement and manage, and it makes the results more transparent and, hence, more amenable to regulatory validation. This is consistent with the regulatory requirements of IRB approaches (BCBS 2006, 2017).
Second, we provide the largest benchmark study of machine learning methods in the context of bond recovery rate prediction. We consider a total of 20 predictive algorithms, and we obtain similar conclusions to those of Bellotti et al. (2021) in the case of recovery rates for nonperforming loans: random forests, quantile random forests, boosted trees and cubist display the most promising performance and seem to be the best class of models. Bagged MARS and model-averaged neural networks also seem to be competitive.
Third, we empirically show that meta-learning can be used to improve recovery rate predictions compared to traditional machine learning methods while considerably reducing model risk. This evidence is preserved both when looking at the predictive performance within a chosen predictor set and when jointly considering predictions across all specifications of the predictor set. The proposed methods outperform competitive combining methods such as the equally weighted forecast and the hill-climbing algorithm of Caruana et al. (2004), which showed promising results in the field of credit risk scoring (for more details, see Lessmann et al. 2015). Furthermore, although the main concern of this paper is recovery rate prediction, the idea of combining heterogeneous models trained on diverse predictor sets is not specific to credit risk modeling, and it can be used to hedge model uncertainty across the whole field of management science.
The remainder of the paper is structured as follows. Section 2 contains an overview of the machine learning algorithms involved in our meta-learning approach and explains the latter. Section 3 provides a description of the data. Section 4 describes the main results of our benchmark study. Section 5 discusses the practical implications implied by our results. Section 6 concludes the paper.

2. Methodology

An overview of our predictive strategy can be found in Figure 1. After specifying the bond recovery rate predictors, first-level learners are trained to minimize the mean square error (MSE). Subsequently, the fitted residuals of the first-level learners (meta-data) become the input of second-level learners (meta-learners), which combine the original models with the goal of making the aggregate forecast error variance as small as possible. Finally, we evaluate the predictive performance of the various classes of linear, nonlinear, rule-based and meta-learning methods.
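The two-level pipeline just described can be sketched in a few lines. This is a minimal illustration on synthetic data, not the authors' implementation: two simple least-squares learners stand in for the 20 first-level algorithms, and an ordinary least-squares combination stands in for the second-level schemes of Section 2.2.

```python
# Minimal sketch of the Figure 1 pipeline (synthetic data, illustrative
# learners): first-level models are fit to minimize MSE, then a second-level
# learner combines their predictions, again by minimizing in-sample MSE.
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 4))
y = X[:, 0] + np.abs(X[:, 1]) + 0.1 * rng.normal(size=300)

# First-level learners: a linear model and a "nonlinear" model on |X| features
beta1, *_ = np.linalg.lstsq(X, y, rcond=None)
beta2, *_ = np.linalg.lstsq(np.abs(X), y, rcond=None)
P = np.column_stack([X @ beta1, np.abs(X) @ beta2])   # meta-data

# Second-level learner: least-squares combination of the two forecasts
w, *_ = np.linalg.lstsq(P, y, rcond=None)
combined = P @ w

mse = lambda e: float(np.mean(e ** 2))
print(mse(y - combined) <= min(mse(y - P[:, 0]), mse(y - P[:, 1])))  # True
```

In-sample, the combined forecast can never do worse than either input forecast here, since each single forecast lies in the span of the columns of the meta-data matrix.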
We briefly review the predictive algorithms and combination strategies that will serve as first- and second-level learners, respectively.

2.1. First-Level Learners

To undertake an unbiased benchmark study, the spectrum of competing models should be as rich as possible. This requirement has rarely been respected in the context of recovery rate modeling, except for the large-scale benchmark studies of Loterman et al. (2012) and Bellotti et al. (2021).
For comparison purposes, we used a similar set of techniques as Bellotti et al. (2021), who employ 20 algorithms belonging to three different classes: linear, nonlinear and rule-based.
We provide the list in Table 1 together with the corresponding R implementations. For more details on nonlinear and rule-based methods, we refer to Appendix A.4

2.1.1. Linear Models

We consider seven linear models with and without penalization. Following Bellotti et al. (2021), they can be framed as the following minimization problem:
$$\arg\min_{\beta_0 \in \mathbb{R},\, \beta \in \mathbb{R}^p} \; \| y - X\beta - \beta_0 \mathbf{1} \|_2^2 + \lambda \left[ (1-\alpha) \| \beta \|_2^2 + \alpha \| \beta \|_1 \right],$$
where $y$ denotes the vector of the observations in the sample, $X$ is the $N$-by-$p$ matrix of predictors and $\beta$ stands for the vector of regression coefficients. Different specifications of the penalty parameter $\lambda \geq 0$ and the mixing factor $0 \leq \alpha \leq 1$ yield the standard linear regression model (with and without backward selection), ridge, lasso and elastic net. Models of these types can only reproduce linear relationships such as those reproduced in Figure 2.
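As a numerical sanity check of this formulation, setting $\alpha = 0$ reduces the penalty to the ridge case, whose closed-form solution $(X^T X + \lambda I)^{-1} X^T y$ can be verified directly (intercept omitted for brevity; the data below are synthetic):

```python
# Sketch: with alpha = 0 the penalized objective above is ridge regression,
# which admits the closed form (X'X + lambda*I)^{-1} X'y. Synthetic data.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.normal(size=100)

lam = 0.1
p = X.shape[1]
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
print(np.round(beta_ridge, 2))
```

With this small penalty and low noise, the ridge estimate lands close to the data-generating coefficients; increasing `lam` shrinks all coefficients toward zero.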

2.1.2. Nonlinear Models

We dealt with six types of nonlinear models. Among the kernel methods, we considered support vector regression (SVR), relevance vector machines (RVM) regression and Gaussian processes. We further considered multivariate adaptive regression splines (MARS) and two nonlinear ensembles: model-averaged neural networks and bagged MARS. We refer the reader to Bellotti et al. (2021) for an extended treatment of each of them. These models are naturally suited to capture nonlinear relationships, but they can be prone to overfit. An example of nonlinear fit is included in Figure 2.

2.1.3. Rule-Based Models

According to Gambetti et al. (2019), rule-based methods are able to identify clusters of data with similar properties and to reproduce step-like relationships (with or without slopes depending on the particular algorithm). We included seven types of rule-based methods. Individual models include regression trees and conditional inference trees, while ensembles are represented by cubist, random forests, quantile random forests and boosted trees with and without stochastic gradient boosting. An example of rule-based model fit is included in Figure 2.

2.2. Second-Level Learners

Starting from $m$ predictions $\hat{y}_1, \dots, \hat{y}_m$ of the random variable $y$ returned by $m$ first-level learners, two strategies can be adopted. The standard procedure consists of selecting the best forecast, say $\hat{y} := \hat{y}_i$, where model $i$ is selected according to some criterion. Alternatively, these predictions can be aggregated into a single prediction $\hat{y} := \phi(\hat{y}_1, \dots, \hat{y}_m)$ using some function $\phi$. Model combination offers diversification gains that make it attractive when we cannot identify ex ante the best single model.5 In addition, in the rare cases where the best model can be identified, meta-learning techniques could still take full advantage of the available information when the first-level learners rely on various data sources or cover a wide spectrum of modeling assumptions.
Meta-learning learns the combination strategy $\phi$ directly from the data with the explicit goal of minimizing a loss function. Precisely, let $y$ be an $n$-dimensional column vector containing $n$ observations of the target variable and $\hat{Y}$ be an $n$-by-$m$ matrix of $m$ unbiased candidate forecasts of $y$. The optimal combination strategy consists of estimating the function $\phi(\hat{Y})$ that solves
$$\phi^* := \arg\min_{\phi \in \Phi} \| y - \phi(\hat{Y}) \|_2^2,$$
where $\Phi$ is some conformable space of functions mapping $\mathbb{R}^{n \times m}$ to $\mathbb{R}^n$.
Nevertheless, its success depends on how accurately the combination strategy can be determined. The use of in-sample data both to train the individual models and to consecutively combine them on the basis of their respective fitted residuals can significantly amplify the initial estimation error and consequently produce poor out-of-sample predictions.
Meta-learning relies on validation techniques to assess how well the combination weights will generalize to the out-of-sample predictive exercise. We opt to combine by minimizing the variance of the aggregate prediction error and include robust combination methods to hedge against potential instability.6

2.2.1. Linear Meta-Learners

Restricting ourselves to the class of linear combinations with weights summing to one, i.e., taking $\Phi$ to be the set of functions of the form $\phi(x_1, \dots, x_m) := \sum_{i=1}^m w_i x_i$ with $\sum_{i=1}^m w_i = 1$, leads to a combined prediction of the form $\hat{y} = \sum_{i=1}^m w_i \hat{y}_i$. In this case, the optimization problem becomes
$$w^* := \arg\min_{w \in W} \| y - \hat{Y} w \|_2^2, \quad \text{where } W := \{ w \in \mathbb{R}^m \mid \mathbf{1}^T w = 1 \}.$$
Constraining the weights to sum to 1 keeps the aggregate prediction unbiased provided that the candidate forecasts are also unbiased (which is the case in this paper). By additionally constraining the weights to be nonnegative, this approach corresponds to Breiman’s stacked regressions (Breiman 1996), where linear combinations of different predictors are considered to further improve prediction accuracy. Granger and Ramanathan (1984) show that this approach is equivalent to selecting the weights w * that minimize the variance of the meta-learner’s prediction error.
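The variance-minimizing sum-to-one combination admits the well-known closed form $w^* = \Sigma^{-1}\mathbf{1} / (\mathbf{1}^T \Sigma^{-1} \mathbf{1})$, where $\Sigma$ is the covariance matrix of prediction errors. A small numerical sketch with an illustrative (made-up) error covariance:

```python
# Sketch of the minimum-variance sum-to-one combination:
# w* = Sigma^{-1} 1 / (1' Sigma^{-1} 1). The covariance below is illustrative.
import numpy as np

# Sample covariance of three learners' prediction errors (hypothetical values)
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
ones = np.ones(3)
w = np.linalg.solve(sigma, ones)   # Sigma^{-1} 1
w /= ones @ w                      # normalize so the weights sum to one
var_combined = float(w @ sigma @ w)
print(np.round(w, 3), round(var_combined, 4))
```

By construction, the combined error variance is no larger than that of the best single learner (here 0.04), since each unit vector is itself a feasible weight vector.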
Nevertheless, this combination strategy can lack robustness when the covariance matrix of prediction errors is poorly estimated. This often occurs because of sample size limitations, considerable background noise or when first-level forecasts are highly collinear (Claeskens et al. 2016). We can tackle the potential instability of $w^*$ by using the linear shrinkage estimator of the covariance matrix of prediction errors, which we denote by $\Sigma_\lambda$. The latter consists of combining the sample covariance matrix (which is easy to compute, asymptotically unbiased but prone to estimation errors) with an estimator that is misspecified and biased but more robust to estimation errors (Ledoit and Wolf 2004). This approach, initially derived in a portfolio optimization context, was recently transposed in Roccazzella et al. (2021) to the forecast combination problem, showing that constrained optimization with shrinkage (COS) of $\Sigma$ can provide a single-step, fast and robust optimal forecast combination strategy. Here we adapt the COS to act as a linear meta-learner, which combines the first-level learners only on the basis of in-sample information. This leads to the following optimization problem:
$$w_\lambda^* := \arg\min_{w \in W^+} w^T \Sigma_\lambda w, \quad \text{where } \Sigma_\lambda := (1-\lambda) \Sigma + \lambda \bar{\Sigma}$$
and $W^+ := \{ w \in \mathbb{R}^m \mid \mathbf{1}^T w = 1,\ w \geq 0 \}$. Here, $\lambda \in [0,1]$ is the shrinkage intensity, $\Sigma$ is the sample covariance matrix of prediction errors and $\bar{\Sigma}$ is a predetermined reference covariance matrix. We estimate the optimal shrinkage intensity by minimizing the expected Frobenius norm of the difference between $\Sigma_\lambda$ and the population covariance matrix of prediction errors $S$ (Ledoit and Wolf 2004):7
$$\lambda^* := \arg\min_{\lambda \in [0,1]} \mathbb{E} \, \| \Sigma_\lambda - S \|_F^2.$$
We consider two shrinkage directions for Σ ¯ .
Definition 1 (COS-E—Constrained Optimization with Shrinkage towards Equal weights).
The target covariance matrix $\bar{\Sigma}_{COS\text{-}E}$ corresponds to the case where first-level learners are assumed to have identical prediction error variance $\bar{\sigma}^2$ and identical pairwise correlation coefficient $\bar{\rho}$:
$$\bar{\Sigma}_{COS\text{-}E} := \bar{\sigma}^2 \begin{pmatrix} 1 & \bar{\rho} & \cdots & \bar{\rho} \\ \bar{\rho} & 1 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \bar{\rho} \\ \bar{\rho} & \cdots & \bar{\rho} & 1 \end{pmatrix}.$$
In practice, we set $\bar{\sigma}^2 = \frac{1}{m} \sum_{i=1}^m \sigma_i^2$, where $\sigma_i^2$ is the sample prediction error variance of model $i$ (the average prediction error variance in the set of first-level learners), and $\bar{\rho} = \frac{2}{m(m-1)} \sum_{i=1}^{m-1} \sum_{j=i+1}^m \rho_{i,j}$ (the average correlation coefficient of the first-level learners' in-sample prediction errors).
Definition 2 (COS-IL—Constrained Optimization with Shrinkage towards Inverse Loss-based weights).
The target covariance matrix $\bar{\Sigma}_{COS\text{-}IL}$ corresponds to the case where first-level learners have an identical pairwise correlation coefficient $\bar{\rho}$, but their respective prediction error variances are estimated using sample data:
$$\bar{\Sigma}_{COS\text{-}IL} := \begin{pmatrix} \sigma_1^2 & \bar{\rho}\sigma_1\sigma_2 & \cdots & \bar{\rho}\sigma_1\sigma_m \\ \bar{\rho}\sigma_2\sigma_1 & \sigma_2^2 & \ddots & \vdots \\ \vdots & \ddots & \sigma_{m-1}^2 & \bar{\rho}\sigma_{m-1}\sigma_m \\ \bar{\rho}\sigma_m\sigma_1 & \cdots & \bar{\rho}\sigma_m\sigma_{m-1} & \sigma_m^2 \end{pmatrix}.$$
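The two shrinkage targets of Definitions 1 and 2 can be constructed directly from a sample covariance matrix of prediction errors; the sketch below uses an illustrative $3 \times 3$ matrix and an arbitrary shrinkage intensity:

```python
# Sketch of the COS-E and COS-IL shrinkage targets built from an illustrative
# sample covariance of prediction errors Sigma (values are hypothetical).
import numpy as np

Sigma = np.array([[0.04, 0.012, 0.008],
                  [0.012, 0.09, 0.024],
                  [0.008, 0.024, 0.16]])
m = Sigma.shape[0]
sig = np.sqrt(np.diag(Sigma))                 # sample error std. deviations
corr = Sigma / np.outer(sig, sig)
iu = np.triu_indices(m, k=1)
rho_bar = corr[iu].mean()                     # average pairwise correlation
sig2_bar = np.mean(sig ** 2)                  # average error variance

# COS-E target: common variance, common correlation
target_e = sig2_bar * ((1 - rho_bar) * np.eye(m) + rho_bar * np.ones((m, m)))

# COS-IL target: sample variances on the diagonal, common correlation off it
target_il = rho_bar * np.outer(sig, sig)
np.fill_diagonal(target_il, sig ** 2)

# Shrunk estimator for an arbitrary intensity lambda (the paper estimates it)
lam = 0.3
Sigma_lam = (1 - lam) * Sigma + lam * target_e
print(np.round(Sigma_lam, 4))
```

COS-IL keeps each learner's own error variance and only shrinks the correlations, whereas COS-E shrinks both, pulling the weights toward the equally weighted combination.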
As benchmarks, we also include the equally weighted forecast, i.e., $\phi(x_1, \dots, x_m) := \frac{1}{m} \sum_{i=1}^m x_i$, and the hill-climbing algorithm of Caruana et al. (2004). In the latter case, forward step-wise selection is used to identify the equally weighted combination that minimizes the root mean square error (RMSE) in a validation fold sampled from the training data.
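A minimal sketch of the hill-climbing benchmark (greedy forward selection with replacement, in the spirit of Caruana et al. 2004) on synthetic forecasts; the step count and data-generating choices below are arbitrary illustrations:

```python
# Sketch of hill-climbing ensemble selection: at each step, greedily add
# (with replacement) the forecast that most reduces the RMSE of the equally
# weighted ensemble of all picks so far. Synthetic data.
import numpy as np

def hill_climb(preds, y, n_steps=10):
    """preds: n-by-m matrix of candidate forecasts; returns (ensemble, picks)."""
    chosen = []
    for _ in range(n_steps):
        best_rmse, best_j = np.inf, None
        for j in range(preds.shape[1]):
            trial = np.mean(preds[:, chosen + [j]], axis=1)
            rmse = np.sqrt(np.mean((y - trial) ** 2))
            if rmse < best_rmse:
                best_rmse, best_j = rmse, j
        chosen.append(best_j)
    return np.mean(preds[:, chosen], axis=1), chosen

rng = np.random.default_rng(2)
y = rng.normal(size=200)
# Three candidate forecasts with increasing noise levels
preds = y[:, None] + rng.normal(scale=[0.2, 0.5, 1.0], size=(200, 3))
ens, chosen = hill_climb(preds, y, n_steps=10)
print(sorted(set(chosen)))
```

Selection with replacement lets the algorithm up-weight accurate forecasts by picking them repeatedly, so the final ensemble is equally weighted over picks but not over models.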

2.2.2. Nonlinear Meta-Learners

Thus far, we have described linear weighting schemes. Nevertheless, meta-learning is more general. For example, in the field of image classification, meta-learning techniques typically involve nonlinear methods such as deep learning or recurrent models (Santoro et al. 2016), metric learning (Koch et al. 2015) and learning optimizers (Ravi and Larochelle 2017). Despite being more flexible than linear methods, nonlinear meta-learners are also more complex and prone to overfitting, especially on a relatively small data set.8 For these reasons, we opt for ensembles of shallow (one hidden layer) feed-forward neural networks (NNs), which can still approximate any measurable function to any desired degree of accuracy provided that sufficiently many hidden units are available (Hornik et al. 1989). In this case, we consider $\Phi$ to be the set of functions of the form $\phi(x_1, \dots, x_m) := \sum_{i=1}^k \frac{w_i}{1 + \exp\{-\sum_{j=1}^m \alpha_{ij} x_j\}}$, where $k$ is the number of neurons in the hidden layer.
As with the linear combination schemes, we train the NNs to minimize the in-sample MSE using the first-level predictions as input. However, in this case, the output of the meta-learner is not a linear function of the inputs, and the combining weights are not constrained to sum to one. Therefore, studying the marginal contribution of each first-level prediction to the aggregate prediction is not straightforward. In our empirical exercise, we consider two feed-forward neural networks with 1 (NN-1) and 2 (NN-2) hidden layers and 8 and 16-8 neurons, respectively.
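The functional form of this nonlinear combiner amounts to a single forward pass through $k$ sigmoid units; the sketch below evaluates it with arbitrary, untrained weights (in the paper the weights are fit to minimize in-sample MSE):

```python
# Sketch of the one-hidden-layer combiner phi: a forward pass through k
# sigmoid units applied to m first-level predictions. Weights are arbitrary
# here, purely for illustration.
import numpy as np

def phi(x, alpha, w):
    """x: (m,) first-level predictions; alpha: (k, m) input weights; w: (k,) output weights."""
    hidden = 1.0 / (1.0 + np.exp(-(alpha @ x)))   # k sigmoid activations
    return float(w @ hidden)

rng = np.random.default_rng(3)
m, k = 5, 8                      # 5 first-level learners, 8 hidden neurons
alpha = rng.normal(size=(k, m))
w = rng.normal(size=k)
x = rng.uniform(0, 1, size=m)    # e.g., candidate recovery rate forecasts
print(phi(x, alpha, w))
```

Because each hidden activation lies in (0, 1), the output is bounded by the sum of the absolute output weights, but unlike the linear schemes it is not a convex combination of the inputs.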

3. Data

3.1. Recovery Rates and Security-Specific Characteristics

The main references on bond recovery rate modeling are generally based on the Moody’s Default & Recovery Database (Moody’s DRD) or the Standard & Poor’s Capital IQ Database. We employ Moody’s DRD in this paper.
Following the standard market convention (Mora 2015; Schuermann 2004), the recovery rate of each bond is computed as the bond price observed 30 days after the default date declared by the rating agency, divided by the face value. To make our analyses more comparable with earlier studies such as Nazemi et al. (2017), Nazemi et al. (2018) and Nazemi and Fabozzi (2018), we apply a similar filtering strategy. We select dollar-denominated bonds issued by U.S. companies and with at least USD 5 million in face value. To replicate the same economic conditions, we also filter for default issues in the period 2002–2012. We only retain the observations associated with known values for the following security-specific characteristics: debt seniority, issuer's industrial sector, default type, coupon level, maturity, presence of additional guarantees different from the issuer's assets and default date. We are thus left with 768 observations. Notice that machine learning methods provide useful insights even with a sample of this size. In particular, the considered predictive models mitigate the risk of overfitting, can identify the most relevant predictive variables and explore the existence of potential nonlinearities in the data (nonlinear and rule-based methods).
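The market-convention recovery rate used here reduces to a simple ratio; the bond in the example below is hypothetical:

```python
# Sketch of the market-convention recovery rate described above: the bond's
# trading price 30 days after the default date, divided by its face value.
# The bond below is hypothetical.

def recovery_rate(price_30d_after_default: float, face_value: float) -> float:
    return price_30d_after_default / face_value

# A defaulted bond with USD 100 face value trading at 41.5 one month later:
print(recovery_rate(41.5, 100.0))  # 0.415
```

Note that distressed bonds can occasionally trade above par (e.g., on expected accrued claims), so empirical recovery rates are not mechanically capped at one.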
Figure 3 includes a histogram of our recovery rate sample. We provide the corresponding summary statistics in Table 2.
Summary statistics of recovery rates conditional on the seniority of the defaulted bond, issuer’s industrial sector and default type are provided in Table 3, Table 4 and Table 5, respectively. They are consistent with previous findings on recovery rate determinants. For instance, bonds with higher seniority are associated with higher recovery rates on average, and those of senior secured bonds are the most dispersed (Altman and Kalotay 2014; Altman and Kishore 1996). Recovery rates are also higher, on average, when the issuer is operating in an industrial sector featuring higher asset tangibility and in the utility sector in particular (Acharya et al. 2007; Altman and Kishore 1996; Schuermann 2004). Similarly, milder default procedures lead to higher recovery rates (Altman and Karlin 2009; Bris et al. 2006; Davydenko and Franks 2008; Franks and Torous 1994). Defaults on security cash flows are expected to recover more than company reorganizations or liquidations. Controlled reorganizations (prepackaged Chapter 11) also display higher recovery than uncontrolled reorganizations and asset liquidations (receivership and procedures included in the “others” category). Table 6, Table 7 and Table 8 show that our sample of recovery rates is also consistent with prior findings about the role of coupons, maturity and backing guarantees. Recovery rates are higher in the presence of backing guarantees, increase with coupons and decrease with maturity (Jankowitsch et al. 2014). All these security-specific characteristics have been used as control variables in recent articles on bond recovery rate determinants (François 2019; Gambetti et al. 2019) and are used as predictors in our study.

3.2. Systematic Factors

We extract industry default rates from Moody’s DRD and the remaining systematic factors from the databases managed by the Federal Reserve Bank of St. Louis: FRED and FRED-MD (McCracken and Ng 2015). The latter include a large set of time series referring to output and income, labor market, housing, consumption, orders and inventories, money and credit, interest and exchange rates, prices and the stock market.9
We then extend this data set with a large number of economic uncertainty measures from three different classes: survey-based, news-based and volatility-based.10 All measures are retrieved from the websites of the authors (Baker et al. 2016; Jurado et al. 2015; Ludvigson et al. 2019). We consider all five measures employed in the study by Gambetti et al. (2019) plus 11 additional measures of categorical economic policy uncertainty. Table 9 gives an overview. Factors relating to the market price of risk and industry portfolio returns are instead retrieved from the Fama-French database. Predictors are measured one month before the default date.
We now proceed to present the results (Section 4) and to discuss their relevance with respect to practical implementation in the industry (Section 5).

4. Results

We compare the performance of the considered algorithms across nine specifications of the predictor set. All model specifications feature security-specific characteristics but differ in the systematic variables that are included. We use two approaches to determine the latter.
(a)
Statistical approach: the algorithms can access either the full data set of systematic variables or a set created with variable selection techniques. For variable selection, we consider a model based on lasso-selected systematic variables and one based on lasso with stability selection (Meinshausen and Bühlmann 2010). While the latter has been used in Nazemi and Fabozzi (2018) to check the reliability of their lasso-selected macroeconomic variables, those retained by lasso with stability selection have never been used to feed predictive algorithms.11
(b)
Economic approach: we create models by relying exclusively on well-identified factors based on the results of Gambetti et al. (2019) and prior studies on recovery rate determinants. Table 10 includes a summary of the model specifications.
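The stability selection used in approach (a) can be sketched as follows: the lasso is refit on many random half-size subsamples, and only the variables selected in a high fraction of them are retained (Meinshausen and Bühlmann 2010). The snippet below is a minimal hand-rolled illustration on synthetic data, not the authors' implementation, using a small coordinate-descent lasso:

```python
# Minimal sketch of lasso with stability selection: refit a coordinate-descent
# lasso on random half-size subsamples and keep variables whose selection
# frequency exceeds a threshold. Synthetic data, illustrative tuning values.
import numpy as np

def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate-descent lasso (no intercept); returns the coefficient vector."""
    n, p = X.shape
    beta = np.zeros(p)
    col_ss = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]   # partial residual
            z = X[:, j] @ r
            beta[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_ss[j]
    return beta

def stability_selection(X, y, lam, threshold=0.8, n_sub=50, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros(p)
    for _ in range(n_sub):
        idx = rng.choice(n, size=n // 2, replace=False)  # half-size subsample
        freq += lasso_cd(X[idx], y[idx], lam) != 0
    freq /= n_sub
    return freq >= threshold, freq

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.5 * rng.normal(size=200)
selected, freq = stability_selection(X, y, lam=20.0)
print(np.where(selected)[0])
```

Averaging selection over subsamples makes the retained set far less sensitive to the penalty level than a single lasso fit, which is the reliability argument invoked in the text.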
We compare the considered predictive strategies by analyzing the root mean square forecast error (RMFE), and we also assess the significance of a performance difference via a model confidence set test (hereafter MCS, Hansen et al. 2011).12

4.1. Predictive Models vs. Historical Averages

The performance of our predictive algorithms across model specifications is summarized in Figure 4; the detailed out-of-sample RMSEs are reported in Table 11. Each line corresponds to a different algorithm. The solid horizontal line (in red) corresponds to the model where we use the in-sample mean recovery rate as a prediction; it can be likened to the regulatory standard approach and foundation IRB approach (BCBS 2006, 2011), which are largely based on historical LGD values. Figure 4 highlights that forecasting recovery rates using an algorithm always leads to more precise estimates than using the average of previously observed values. The only exception is the linear regression, which, as expected, suffers from the curse of dimensionality when trained on the full set of predictors (specification 1).

4.2. Models Based on Systematic Variables

Table 11 and Figure 4 also make clear that, in line with the literature, models exploiting systematic variables (specifications 1 to 8) always yield better forecasting performance than a model based on security-specific factors alone (specification 9).
For the selection of macroeconomic variables with data-driven methods, we observe from Table 11 that lasso selection seems to bring more benefit to linear models than to nonlinear or rule-based models. In particular, it appears that ensemble methods (neural networks, bagged MARS, boosted trees, random forests, quantile random forests, cubist) are deprived of useful predictive information when they are trained on lasso-selected variables (specification 2). The performance of these latter algorithms improves if we instead implement the lasso procedure with stability selection (specification 3). We obtain particularly competitive RMSE values in this latter case; the best performance of first-level learners reaches an RMSE of 0.174 with boosted trees. From an aggregate perspective, it is clear that models relying on lasso selection (with or without stability control) reduce the embedded model risk compared to using the complete set of predictors.
We find that lasso with stability selection retains financial uncertainty, uncertainty related to government spending and uncertainty related to sovereign debt and currency crises (Table 12). Financial uncertainty is associated with the highest probability of being selected (95%). The second and third factors, an index of consumption expenditures and a housing market indicator, are associated with probabilities of 88% and 86%, respectively.13 Given these results, we confirm the importance of uncertainty measures for forecasting bond recovery rates.
Moreover, weighing data-driven selection methods against economically motivated models, we should prioritize the latter. Although both approaches exhibit comparable predictive performance, the economically motivated models offer several advantages. By being based on a low-dimensional set of well-identified economic factors, they are easier to implement and monitor and yield more interpretable results. In contrast, data-driven selection methods can fail to correctly select the best subset of predictors when the latter are highly correlated, and they require the specification of additional hyperparameters. This increases the complexity and the computational burden, with no guarantee of perfectly selecting the full set of significant predictors.

4.3. First-Level Learners

We now discuss the findings regarding the out-of-sample performance of our algorithms across model structures. In Figure 4, we highlight (in black) the performance of ensemble methods: model-averaged NNs, bagged MARS, boosted trees (with and without stochastic gradient boosting), cubist, random forests and quantile random forests.
In this respect, our results point in the same direction as what Bellotti et al. (2021) discover for nonperforming loans. Rule-based ensemble methods are always associated with the most promising performance compared to both individual learners (linear or nonlinear) and the two other ensembles (bagged MARS and model-averaged NNs). Nonetheless, even if they are less competitive than rule-based ensembles, bagged MARS and model-averaged neural networks outperform individual learners in almost all model specifications.
This can be explained as follows. First, the prediction errors of ensemble methods generally have a lower variance than those of individual learners. This effect is actually a consequence of the aggregation of several base learners; it is particularly visible when comparing the performance of MARS against that of its bagged version. Second, rule-based ensembles are better suited to capture subgroups of data with similar properties and to build a separate model for each group. Recovery rates are particularly prone to behave in this manner, as explained in Yao et al. (2015), Nazemi et al. (2018) and Bellotti et al. (2021). Third, rule-based methods are better suited to reproduce predictor–response relationships that are defined on a closed interval, as is generally the case for recovery rates.
It appears that the only individual learner that can compete with rule-based ensembles is the SVR. However, this method yields good performance only when trained using the full set of predictors or when systematic predictors are selected using lasso without stability selection (which corresponds to the framework adopted in Nazemi and Fabozzi (2018)).

4.4. Meta-Learning: Within and across Predictor Sets

Let us now analyze the performance of linear meta-learning algorithms within each specification of the predictors set (highlighted in Figure 4 and Table 11). We notice a sharp drop in the average RMSE for almost all specifications compared to the other models. Indeed, the architecture of these algorithms creates more flexible functional forms by combining various first-level learners, merging their different strengths to yield more accurate forecasts. The performance of linear meta-learning algorithms is comparable to that of the best rule-based ensembles. Indeed, Table 11 shows that Opt+, COS-E and COS-IL always join the superior set of models at the 10% confidence level, outperforming the equally weighted forecast and the hill-climbing algorithm, which nevertheless remain competitive, especially in specifications 3, 4 and 5.
The variation of the aggregate errors within linear meta-learners is much lower than within the class of individual or ensemble models, and the same holds true for the maximum average loss. Therefore, model risk is considerably reduced when relying on meta-learners rather than on individual learners or ensembles. This feature can be observed in Figure 4 and holds across all model specifications. We also find that nonlinear meta-learners (NNs) should generally be avoided: they usually display larger RMSE values than linear meta-learners across all specifications of the predictors set. Moreover, their partial advantage over ensemble methods and first-level learners in some specifications is offset by considerably higher errors in others (i.e., specifications 1, 8 and 9). Hence, in terms of model risk, ensembles and linear meta-learners should be preferred.14
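A minimal sketch of a constrained linear combiner with nonnegative weights summing to one, in the spirit of Opt+: the toy forecasts, the projected-gradient solver and all parameter values below are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u > (css - 1.0) / k)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def combine(F, y, iters=2000, lr=0.01):
    """Nonnegative, sum-to-one combining weights by projected gradient descent
    on the in-sample mean squared error of the combined forecast."""
    T, M = F.shape
    w = np.full(M, 1.0 / M)
    for _ in range(iters):
        grad = 2.0 * F.T @ (F @ w - y) / T
        w = project_to_simplex(w - lr * grad)
    return w

# Toy forecasts: model 0 is accurate, model 1 is biased, model 2 is pure noise.
T = 500
y = rng.normal(size=T)
F = np.column_stack([
    y + 0.1 * rng.normal(size=T),
    y + 1.0 + 0.1 * rng.normal(size=T),
    rng.normal(size=T),
])
w = combine(F, y)   # weight concentrates on the accurate model
```

The simplex constraint is what makes such combiners robust in high dimensions: poor or redundant base learners are simply driven to zero weight.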
Nevertheless, in practice, we do not know ex ante which among the nine specifications of the predictors set will offer the best performance. While we have previously employed meta-learning to learn the predictive setup within each specification, we now use the same tools to combine all models across all specifications. In other words, we now jointly consider 180 base learners (20 models times 9 predictors specifications). Table 13 displays the RMSE, mean absolute error (MAE) and R 2 of the best 20 predictive strategies in this exercise. Boosted trees (specifications 2, 4, 6 and 7), quantile random forests (specification 3) and random forests (specification 3) are the only base learners that join the superior set of models. However, their performance clearly depends on the specification of the predictors set. Among meta-learners, while Opt suffers from the increased dimensionality of the problem and performs poorly, meta-learners with nonnegative combining weights (Opt+, COS-E and COS-IL) occupy three of the top four places and join the superior set of models, hence mitigating the uncertainty related to the choice of both the predictors set and the modeling framework, i.e., individual vs. ensemble methods, linear vs. nonlinear vs. rule-based methods. The results do not significantly differ when analyzing R 2 instead of RMSE; this is not surprising given the close relationship between the two metrics. The ranking remains coherent with previous results, especially among the top-performing methods: linear meta-learning techniques are still at the top, beaten only by boosted trees in specification 7, i.e., when inflation uncertainty and federal/state/local expenditures uncertainty are considered together with industry default rate, commercial and industrial loans delinquency rates, industrial production, market index returns, PMI and individual characteristics.
Nevertheless, this difference in performance is not statistically significant: COS-E, COS-IL and Opt+ join the superior set of models at the 10% confidence level. Conversely, the equally weighted forecast and the hill-climbing algorithm perform poorly and are both excluded from the superior set of models. The main difference that emerges when considering the MAE is that quantile random forests (QRF) outperform their competitors, on average. This should not come as a surprise: QRF are trained to minimize the MAE to mitigate the impact of outliers on estimation error, while the loss functions of the other predictive strategies are instead related to the RMSE. Overall, COS-E, COS-IL and Opt+ still provide a very competitive MAE while, at the same time, mitigating the uncertainty related to the choice of both the predictors set and the modeling framework.

5. Discussion and Practical Considerations

The results of the previous subsections convey important practical considerations that we now summarize.
First, financial institutions should accelerate the implementation of LGD internal models instead of maintaining the use of historical averages. We show in Section 4.1 that the predictive model does not need to be complicated to be effective. All model specifications outperform the historical average approach, which, however, is the method underlying the standardized and foundation IRB frameworks. Unfortunately, recent regulatory guidelines (BCBS 2017) are now pointing in the opposite direction, favoring fixed recovery rate approaches. Our results suggest that this is perhaps not the best route to follow, as LGD internal models can produce more reliable risk figures.
Second, in Section 4.2, we confirm the necessity of including economic factors that can capture systematic fluctuations in recovery rates. This result is in line with current regulatory prescriptions. Models relying on well-identified economic determinants provide remarkable results, while big data and variable selection procedures do not necessarily translate into better performance because results strongly depend on the considered predictive algorithms. We find that financial uncertainty and text-based economic uncertainty measures are relevant predictors of the recovery rate ex ante. This extends the finding of Gambetti et al. (2019), where financial uncertainty and the Economic Policy Uncertainty Index were found to explain ex post recovery rates. Similarly, we also confirm the finding of Nazemi and Fabozzi (2018): inflation measures, inflation expectations, industrial production, corporate profits after tax, housing market indicators and stock market volatility are not only systematic determinants of recovery rates but also important predictors. This is also confirmed when using lasso with stability selection (Table 12): the probability of including financial uncertainty and text-based economic policy uncertainty measures is particularly high. Specifically, financial uncertainty, economic policy uncertainty/government spending, economic policy uncertainty/sovereign debt currency crises and economic policy uncertainty/trade policy have 95%, 50%, 40% and 29% probability of being selected, respectively. Including such predictors in more parsimonious and theoretically justified models makes the predictive setting easier to implement, interpret and backtest, and hence more likely to be validated by regulators.
Third, in a context where increasing amounts of data become publicly available, the common practice of using standard linear regression to forecast recovery rates should be abandoned. In Section 4.1, we confirm that it is incompatible with exploiting the information contained in a large data set of candidate predictors and embeds high model risk. This result is in line with the recent findings of Dong et al. (2020) in a similar financial context.
Fourth, we show in Section 4.3 that ensemble methods and meta-learning techniques should be fostered in the industry. The choice of the A-IRB model should be directed toward ensemble methods rather than individual learners, as the former always yield better performance. However, an incorrect choice among ensembles may still lead to worse results than an incorrect choice among meta-learning techniques. Hence, meta-learning considerably reduces model risk compared to ensembles; it should be considered by the regulator for a new set of A-IRB guidelines. This is particularly true for linear meta-learning techniques, which also have the advantage of providing interpretable insights about the contribution of each individual model. As our results show, the benefits of diversification can already be appreciated by combining a limited number of high-performing algorithms.

6. Conclusions

In this paper, we explore for the first time the applicability and potential of meta-learning for forecasting bond recovery rates. Meta-learning consists of combining the predictions arising from several first-level algorithms into a single aggregated forecast. More specifically, our purpose is to test the performance of a wide set of machine learning methods for predicting recovery rates on defaulted corporate bonds using a data set of predictors of unprecedented size, and to examine whether combining them displays superior performance. We consider as first-level algorithms both individual and ensemble methods belonging to the classes of linear, nonlinear and rule-based models.
We contribute to the literature on bond recovery rate prediction in three ways. Our first contribution deals with the set of predictors to use. We find that including a limited number of well-identified recovery rate determinants and economic uncertainty measures yields better predictive performance than using the full set of macroeconomic predictors, even when the latter is combined with variable selection techniques such as lasso. This finding prevails across all considered algorithms and contrasts with the current big data trend. The use of a limited number of economically grounded predictors offers the additional advantages of making the model easier to implement and backtest and therefore, when applicable, more likely to be validated by regulators, since it mitigates the risk of overfitting and data mining. Interestingly, we confirm the central importance of uncertainty measures, which are always retained by the considered variable selection methods.
The second and third contributions relate to the predictive method to use. Random forests, quantile random forests, boosted trees and cubist display the most promising results, but their performance is unstable across the considered specifications of the predictors set: it seems difficult to identify ex ante the right pair of model and predictors set to use. By contrast, we empirically show that meta-learning can meet this challenge. Indeed, meta-learning can improve recovery rate predictions compared to traditional individual learning machines while, at the same time, considerably reducing model risk. This evidence is preserved both when looking at the predictive performance within a chosen predictors set and when jointly considering predictions across all specifications of the predictors set.
In all specifications, the historical average approach performs significantly worse. Yet, this is the method underlying the standardized approach and foundation IRB framework used to compute regulatory capital reserves, the primary figure used worldwide to monitor the creditworthiness of banks. Therefore, our findings suggest that regulators and policy makers promoting the use of more reliable risk figures should foster the implementation of LGD internal models that use meta-learning techniques, while the use of traditional linear regressions or historical averages should be challenged. Finally, our findings are of practical interest for designing new tools for internal economic capital calculations as well as for stress testing purposes.

Author Contributions

Data collection, P.G.; Methodology, data analysis, writing, reviewing and editing, P.G., F.R., F.V.; Supervision, Project supervision, Funding acquisition, F.V. All authors have read and agreed to the published version of the manuscript.

Funding

Francesco Roccazzella gratefully acknowledges financial support from the Economics School of Louvain. Paolo Gambetti’s research is funded by the Fonds de la Recherche Scientifique (F.S.R.-FNRS) under Grant ASP FC23545. The work of Frédéric Vrins is supported by the Fonds de la Recherche Scientifique—(F.S.R.-FNRS) grant CDR J.0037.18 and by the grant ARC 18-23/089 of the Belgian Federal Science Policy Office.

Acknowledgments

Francesco Roccazzella gratefully acknowledges the hospitality of Universitat Pompeu Fabra where part of this project was completed.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Details on Nonlinear and Rule-Based Methods

Following Bellotti et al. (2021), we give more details on the nonlinear and rule-based methods we have considered in this study. We refer to Hastie et al. (2009) for a textbook treatment of the predictive algorithms.

Appendix A.1. Multivariate Adaptive Regression Splines (MARS)

The multivariate adaptive regression splines method of Friedman (1991) features piecewise transformations of the original predictive variables. Specifically, given a set of cut points t, each predictor X j is transformed into a set of reflected pairs obtained by:
$(X_j - t)_+ = \begin{cases} X_j - t, & \text{if } X_j > t \\ 0, & \text{otherwise} \end{cases}$ and $(t - X_j)_+ = \begin{cases} t - X_j, & \text{if } X_j < t \\ 0, & \text{otherwise} \end{cases}$
where $j = 1, 2, \ldots, p$ and $t \in \{ x_{1j}, x_{2j}, \ldots, x_{Nj} \}$. A linear regression model is then estimated following a greedy procedure on the reflected pairs selected for each predictor. The number of selected pairs for each predictor defines the degree of the MARS algorithm. A backward pruning procedure handles overfitting by dropping the features whose exclusion leads to the smallest increase in the error rate. In this study, we consider MARS of degree one, and the number of terms retained in the final model is estimated via 10-fold cross-validation.
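The reflected pairs defined above are simple hinge functions. A minimal sketch (with illustrative data of our own) shows how a degree-one, single-knot MARS-type fit reproduces a piecewise-linear target exactly:

```python
import numpy as np

def reflected_pair(x, t):
    """The hinge features (x - t)_+ and (t - x)_+ at cut point t."""
    return np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)

# A degree-one fit with a single knot: regress y on an intercept and one
# reflected pair. The target |x| is piecewise linear with a kink at 0.
x = np.linspace(-2.0, 2.0, 81)
y = np.abs(x)
h_plus, h_minus = reflected_pair(x, t=0.0)
design = np.column_stack([np.ones_like(x), h_plus, h_minus])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
fit = design @ coef   # exact recovery: coefficients approach [0, 1, 1]
```

In the actual algorithm, the knot t and the retained pairs are chosen greedily over the observed data values, as described above.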

Appendix A.2. K-Nearest Neighbors

K-nearest neighbors is a non-parametric method that forecasts the target variable by using the average of the K training observations that are closest in the covariate space. Closeness between observations is defined by the Euclidean distance. In this paper, we tune the size of the neighborhood K via 10-fold cross-validation.
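A minimal sketch of this prediction rule (toy data; `knn_predict` is our own illustrative helper, not from the paper's code):

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Average the targets of the k nearest training points (Euclidean distance)."""
    d = np.linalg.norm(X_train[None, :, :] - X_query[:, None, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

X_train = np.array([[0.0], [1.0], [2.0], [10.0]])
y_train = np.array([0.0, 1.0, 2.0, 10.0])
pred = knn_predict(X_train, y_train, np.array([[1.1]]), k=3)
# The 3 nearest points to 1.1 are at x = 0, 1, 2, so pred[0] = (0+1+2)/3 = 1.0
```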

Appendix A.3. Model-Averaged Neural Networks

Model-averaged neural networks (Ripley 1996) is an ensemble of feed-forward neural networks whose base learners use different initial values for the optimization procedure that estimates the parameters. In addition to reducing model risk by averaging the predictions of the base learners, the random initialization of the parameters mitigates the risk of converging to local optima when training with a backpropagation algorithm. Individual neural networks can also include a weight decay that penalizes large coefficients, further mitigating the risk of overfitting. In this paper, we employ an ensemble of five neural networks, and we tune the number of hidden units as well as the weight decay via 10-fold cross-validation.

Appendix A.4. Support Vector Regression, Relevance Vector Machine and Gaussian Processes

Support vector regression (SVR) (Vapnik 1995) is a kernel-based model estimated from the following minimization problem:
$\arg\min_{\beta_0, \beta} \; C \sum_{i=1}^{N} L_\epsilon\left( y_i - f(x_i) \right) + \sum_{j=1}^{p} \beta_j^2,$
where $L_\epsilon$ is an $\epsilon$-insensitive loss function and $C$ is the penalty assigned to residuals greater than or equal to $\epsilon$. The solution to problem (A2) can be expressed in terms of a set of weights $w_i$ and a positive definite kernel function $K(\cdot, \cdot)$ evaluated at the training data points:
$f(x) = w_0 + \sum_{i=1}^{N} w_i K(x, x_i).$
Training observations associated with non-zero weights are called support vectors, and they are the only ones used to form the model predictions. We consider the radial basis kernel $K(x, x') = \exp(-\sigma \| x - x' \|^2)$, and we estimate the penalty $C$ via 10-fold cross-validation, while the scaling parameter $\sigma$ is set following Caputo et al. (2002).
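The support-vector form of the prediction can be sketched as follows. For brevity, the weights below come from a kernel ridge fit rather than from the ε-insensitive SVR problem above, so this only illustrates the radial-basis prediction form, not the SVR estimator itself; all data and parameter values are illustrative.

```python
import numpy as np

def rbf_kernel(X, Z, sigma):
    """Radial basis kernel K(x, z) = exp(-sigma * ||x - z||^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sigma * d2)

def kernel_predict(w0, w, X_sv, X_new, sigma):
    """Prediction in the form f(x) = w0 + sum_i w_i K(x, x_i)."""
    return w0 + rbf_kernel(X_new, X_sv, sigma) @ w

rng = np.random.default_rng(3)
X = rng.uniform(-2.0, 2.0, size=(40, 1))
y = np.sin(X[:, 0])

sigma = 1.0
K = rbf_kernel(X, X, sigma)
w = np.linalg.solve(K + 1e-3 * np.eye(len(X)), y)  # ridge-regularized weights
f_hat = kernel_predict(0.0, w, X, X, sigma)        # close in-sample fit
```

Unlike this dense ridge solution, the SVR solution is sparse: most $w_i$ are exactly zero, and only the support vectors enter the sum.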
Relevance vector machine (RVM) (Tipping 2001) is a kernel method similar to SVR, but the weights are now computed using a Bayesian framework. This allows for a probabilistic interpretation of the model predictions, and it offers the additional advantage of producing sparser models than standard SVR.
Williams and Rasmussen (1996) introduce Gaussian processes, a non-sparse, non-parametric generalization of the RVM framework. Gaussian processes impose a prior distribution directly on the function values, namely a Gaussian distribution with zero mean and covariance matrix equal to the kernel matrix $K_{ij} = K(x_i, x_j)$. In this work, we implement Gaussian processes using a radial basis kernel.

Appendix A.5. Regression Trees

Regression trees (Breiman et al. 1984) partition the predictors’ space into a set of non-overlapping regions and fit a model in each of them. In their simplest version, predictions are formed using the average of the target variables associated with each region. The algorithm employs a top-down partitioning to estimate the regions, starting with the full dataset and sequentially splitting it into groups according to the predictors and cut points that achieve the largest decrease in the residual sum of squared errors. The splitting procedure stops when a certain criterion is met, e.g., a minimum number of observations in the terminal nodes. To limit overfitting, the tree is then pruned back using a cost-complexity criterion, and the amount of regularization can be tuned via cross-validation.
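The split-search step can be sketched as an exhaustive scan for the cut point that most reduces the residual sum of squares (toy data, single predictor; `best_split` is our own illustrative helper):

```python
import numpy as np

def best_split(x, y):
    """Return the cut point that minimizes the residual sum of squares,
    the criterion used at each step of top-down tree growing."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = (np.inf, None)
    for i in range(1, len(xs)):
        left, right = ys[:i], ys[i:]
        rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if rss < best[0]:
            best = (rss, (xs[i - 1] + xs[i]) / 2.0)
    return best[1]

# Two clusters with clearly different means: the split lands between them.
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([0.0, 0.1, -0.1, 5.0, 5.1, 4.9])
t = best_split(x, y)   # midpoint between 3 and 10, i.e., 6.5
```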

Appendix A.6. Conditional Inference Trees

The conditional inference trees algorithm (Hothorn et al. 2006) was proposed to overcome the potential selection bias of regression trees. In fact, in the standard regression trees algorithm, there is a greater chance of selecting the predictors featuring a higher number of candidate cut points during the tree growing step. For each predictor, conditional inference trees use statistical hypothesis testing to evaluate the difference between the means of the two samples created by a candidate split. Multiple comparison corrections reduce the selection bias for highly granular predictors (Westfall and Young 1993). In this work, we estimate the p-value threshold, determining whether to consider a new split, and the tree maximum depth via 10-fold cross-validation.
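A simplified stand-in for the split-testing idea: a large-sample z-test on the difference in means induced by a candidate split. The actual algorithm uses a permutation-test framework with multiplicity corrections, so this sketch (with illustrative data) only conveys the intuition.

```python
import numpy as np
from math import erf, sqrt

def split_pvalue(y_left, y_right):
    """Two-sided large-sample z-test p-value for the difference in means of
    the two samples created by a candidate split (a simplified stand-in for
    the permutation tests used by conditional inference trees)."""
    n1, n2 = len(y_left), len(y_right)
    se = sqrt(y_left.var(ddof=1) / n1 + y_right.var(ddof=1) / n2)
    z = abs(y_left.mean() - y_right.mean()) / se
    return 2.0 * (1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0))))

rng = np.random.default_rng(5)
# A split separating two different means yields an essentially zero p-value,
# so it is accepted; a split with no mean difference tends to yield a large
# p-value and is rejected at any usual threshold.
informative = split_pvalue(rng.normal(0.0, 1.0, 200), rng.normal(1.0, 1.0, 200))
uninformative = split_pvalue(rng.normal(0.0, 1.0, 200), rng.normal(0.0, 1.0, 200))
```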

Appendix A.7. Bagged Trees and Random Forests

Bagged trees (Breiman 1996) is an ensemble method resulting from the aggregation of the predictions of multiple regression trees estimated on bootstrapped samples of the training set. In this work, we tune the number of trees composing the ensemble by 10-fold cross-validation.
Random forests (Breiman 2001) is an evolution of the bagged trees algorithm that was motivated by overcoming the problem of high correlation between individual trees. In fact, for bagged trees, all predictors are considered at each split during the tree-growing process, which results in trees with very similar structures. The random forests algorithm instead considers a subset of randomly selected predictors at each split. This reduces the correlation between the trees and the variance of the ensemble prediction. We tune the number of randomly selected predictors via 10-fold cross-validation.
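The predictor-subsampling device can be sketched as follows (illustrative data; `mtry` denotes the number of randomly drawn predictors, and both helper functions are our own):

```python
import numpy as np

rng = np.random.default_rng(7)

def split_rss(x, y, t):
    """Residual sum of squares of the candidate split x <= t."""
    left, right = y[x <= t], y[x > t]
    return ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()

def random_subspace_split(X, y, mtry):
    """Search the best split over only `mtry` randomly drawn predictors: the
    device by which random forests decorrelate their trees."""
    best = (np.inf, None, None)  # (rss, predictor index, cut point)
    for j in rng.choice(X.shape[1], size=mtry, replace=False):
        for t in np.unique(X[:, j])[:-1]:
            rss = split_rss(X[:, j], y, t)
            if rss < best[0]:
                best = (rss, j, t)
    return best

# The target depends only on predictor 2; searching all p = 5 predictors
# (mtry = p) recovers it. With mtry < p, some trees are forced to split on
# other predictors, which is exactly what decorrelates the ensemble.
X = rng.normal(size=(100, 5))
y = np.where(X[:, 2] > 0.0, 1.0, 0.0) + 0.01 * rng.normal(size=100)
rss, j, t = random_subspace_split(X, y, mtry=5)
```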

Appendix A.8. Boosted Trees

Boosted trees is an ensemble algorithm in which individual trees are sequentially fitted as weak learners: each tree is fit to the residuals of the current ensemble and, to avoid overfitting, only a fraction of each fitted value (called the learning rate) is subtracted from the residual of the previous learner. The number of boosting iterations (i.e., the number of trees), the learning rate and the individual trees’ depth are the main hyperparameters of this model. Stochastic gradient boosting is a version of the algorithm that randomly samples the training data at each iteration to improve computational efficiency and further mitigate the risk of overfitting. In this work, we tune the hyperparameters via 10-fold cross-validation.
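A minimal least-squares boosting loop with one-split stumps as weak learners (illustrative data and hyperparameter values of our own; the paper's implementation tunes these by cross-validation):

```python
import numpy as np

def fit_stump(x, resid):
    """One-split regression stump fitted to the current residuals, chosen by RSS."""
    best = (np.inf, None, None, None)
    for t in np.unique(x)[:-1]:
        lm, rm = resid[x <= t].mean(), resid[x > t].mean()
        rss = ((resid - np.where(x <= t, lm, rm)) ** 2).sum()
        if rss < best[0]:
            best = (rss, t, lm, rm)
    return best[1:]

def boost(x, y, n_trees=100, learning_rate=0.1):
    """Least-squares boosting: each stump is fit to the current residuals and
    only a fraction of its fitted values (the learning rate) enters the ensemble."""
    pred = np.full_like(y, y.mean())
    for _ in range(n_trees):
        t, lm, rm = fit_stump(x, y - pred)
        pred = pred + learning_rate * np.where(x <= t, lm, rm)
    return pred

rng = np.random.default_rng(11)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x)
pred = boost(x, y)
mse = np.mean((pred - y) ** 2)   # far below the variance of y (about 0.5)
```

Stochastic gradient boosting would additionally draw a random subsample of the training rows before each `fit_stump` call.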

Appendix A.9. Cubist

Cubist (Quinlan 1993) combines the M5 model tree of Quinlan (1992) with some features from boosting and K-nearest neighbors algorithms. Cubist has a tree structure where each node contains a linear regression model whose covariates are the same variables that satisfy the rule defining a specific node. After the model is estimated, the predictions in each node are recursively smoothed by using the fitted values from the respective parent node. Smoothing consists of linearly combining the two models, with the one that has the smallest RMSE having the largest weight. To limit overfitting, the rules can be pruned using the adjusted error rate criterion as in M5. Cubist can also adjust the forecast using a weighted average of sample neighbors. Committees, i.e., ensembles, can also be created using multiple model trees in a boosting-like framework. In this work, we tune the number of neighbors and the number of committees via 10-fold cross-validation.

Appendix B. A Closer Look at Uncertainty Measures

Following Gambetti et al. (2019), we provide more details on the financial and economic uncertainty measures employed in this study.
We consider two measures of financial uncertainty: (a) stock market volatility and (b) the financial uncertainty index of Jurado et al. (2015). On the one hand, the stock market volatility-based measure of financial uncertainty included in our data set is the stock market implied volatility index VIX, retrieved from the Chicago Board Options Exchange database. It has monthly frequency and is also included in the FRED monthly database. On the other hand, we select the one-month horizon financial uncertainty indicator of Jurado et al. (2015) and Ludvigson et al. (2019), which is instead based on forecast error volatility. This approach identifies uncertainty with unpredictability: the less predictable the macroeconomic series become, the more uncertainty agents face. Specifically, Jurado et al. (2015) define h-period ahead uncertainty in the variable Y as the conditional volatility of the purely unforecastable (at horizon h) component of the future value of the series. Data were retrieved from the authors’ website. Uncertainty measures can also be based on text analysis. This is the case of the economic policy uncertainty index and its sub-components from Baker et al. (2016). The series, available monthly, are retrieved from https://www.policyuncertainty.com/ (accessed on 28 April 2022) and include a range of sub-indexes based solely on news data. Specifically, these are extracted from the Access World News database of over 2000 US newspapers and measure coverage frequency. For example, each sub-index measures the coverage frequency of economic, uncertainty and policy terms, as well as a set of categorical policy terms, e.g., monetary policy, taxes, government spending or trade policy (see the news-based measures in Table 9 for the complete list), in the pool of articles under analysis.
For instance, articles that fulfill the requirements to be coded as economic policy uncertainty and contain the term ’federal reserve’ would be included in the monetary policy uncertainty sub-index. For more details, please refer to Appendix B of Baker et al. (2016). We also consider survey-based measures that are available at https://www.policyuncertainty.com/ (accessed on 28 April 2022). The first, drawing on reports by the Congressional Budget Office, reflects the number of federal tax code provisions set to expire over the next 10 years. The second, relying on the Federal Reserve Bank of Philadelphia’s Survey of Professional Forecasters, measures the dispersion of individual forecasters’ predictions about future levels of the Consumer Price Index, federal expenditures and state and local expenditures, as a proxy of uncertainty about policy-related macroeconomic variables.

Appendix C. Economic Data

The list of the variables considered in this study is provided below. We indicate with $\Delta$ the differencing operator and with $\log$ the natural logarithm. The column Tcode denotes the following data transformation for a series $x$: (1) no transformation; (2) $\Delta x_t$; (3) $\Delta^2 x_t$; (4) $\log(x_t)$; (5) $\Delta \log(x_t)$; (6) $\Delta^2 \log(x_t)$; (7) $\Delta(x_t / x_{t-1} - 1)$. For more details on the calculation of the variables denoted with (*), please refer to the data appendix of Jurado et al. (2015).
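These transformation codes can be applied with a small helper (our own illustrative function, mirroring the code definitions above):

```python
import numpy as np

def transform(x, tcode):
    """Apply a transformation code: 1 none, 2 Δx, 3 Δ²x, 4 log(x),
    5 Δlog(x), 6 Δ²log(x), 7 Δ(x_t / x_{t-1} - 1)."""
    x = np.asarray(x, dtype=float)
    if tcode == 1:
        return x
    if tcode == 2:
        return np.diff(x)
    if tcode == 3:
        return np.diff(x, n=2)
    if tcode == 4:
        return np.log(x)
    if tcode == 5:
        return np.diff(np.log(x))
    if tcode == 6:
        return np.diff(np.log(x), n=2)
    if tcode == 7:
        return np.diff(x[1:] / x[:-1] - 1.0)
    raise ValueError("tcode must be an integer in 1..7")

prices = np.array([100.0, 110.0, 121.0])
growth = transform(prices, 5)   # log-growth rates: both entries equal log(1.1)
```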
Variable | Description | Tcode
COUP_RATE | Coupon Rate | 1
BACK_F | Presence of additional backing guarantees different from the issuer’s assets | 1
DEF_DEBT_SENR.Senior.Subordinated | Seniority Status | 1
DEF_DEBT_SENR.Senior.Unsecured | Seniority Status | 1
DEF_DEBT_SENR.Subordinated | Seniority Status | 1
MOODYS_11_CODE.Capital.Industries | Industry Code | 1
MOODYS_11_CODE.Consumer.Industries | Industry Code | 1
MOODYS_11_CODE.Energy...Environment | Industry Code | 1
MOODYS_11_CODE.FIRE | Industry Code | 1
MOODYS_11_CODE.Media...Publishing | Industry Code | 1
MOODYS_11_CODE.Retail...Distribution | Industry Code | 1
MOODYS_11_CODE.Technology | Industry Code | 1
MOODYS_11_CODE.Transportation | Industry Code | 1
MOODYS_11_CODE.Utilities | Industry Code | 1
DEF_TYP_CD.Missed.interest.payment | Default Type | 1
DEF_TYP_CD.Missed.principal.and.interest.payments | Default Type | 1
DEF_TYP_CD.Missed.principal.payment | Default Type | 1
DEF_TYP_CD.Others | Default Type | 1
DEF_TYP_CD.Prepackaged.Chapter.11 | Default Type | 1
DEF_TYP_CD.Receivership | Default Type | 1
DEF_TYP_CD.Suspension.of.payments | Default Type | 1
FinUnc_h.1 | Financial Uncertainty | 1
Baseline_overall_index | Economic Policy Uncertainty Index | 1
News_Based_Policy_Uncert_Index | Newspaper-Based Policy Uncertainty | 1
FedStateLocal_Ex_disagreement | Federal Tax Code Uncertainty | 1
CPI_disagreement | CPI Survey Disagreement | 1
X1..Economic.Policy.Uncertainty | 1. Economic Policy Uncertainty | 1
X2..Monetary.policy | 2. Monetary Policy | 1
Fiscal.Policy..Taxes.OR.Spending. | Fiscal Policy (Taxes OR Spending) | 1
X4..Government.spending | 4. Government Spending | 1
X5..Health.care | 5. Healthcare | 1
X6..National.security | 6. National Security | 1
X7..Entitlement.programs | 7. Entitlement Programs | 1
X8..Regulation | 8. Regulation | 1
Financial.Regulation | Financial Regulation | 1
X9..Trade.policy | 9. Trade Policy | 1
X10..Sovereign.debt..currency.crises | 10. Sovereign Debt, Currency Crises | 1
RPI | Real Personal Income | 5
W875RX1 | Real Personal Income excluding current transfer receipts | 5
DPCERA3M086SBEA | Real Personal Consumption Expenditures | 5
CMRMTSPLx | Real Manu. and Trade Industries Sales | 5
RETAILx | Retail and Food Services Sales | 5
IPFINAL | IP: Final Products (Market Group) | 5
IPCONGD | IP: Consumer Goods | 5
IPDCONGD | IP: Durable Consumer Goods | 5
IPNCONGD | IP: Nondurable Consumer Goods | 5
IPBUSEQ | IP: Business Equipment | 5
IPMAT | IP: Materials | 5
IPDMAT | IP: Durable Materials | 5
IPNMAT | IP: Nondurable Materials | 5
IPB51222S | IP: Residential Utilities | 5
IPFUELS | IP: Fuels | 5
CUMFNS | Capacity Utilization: Manufacturing | 2
HWI | Help-Wanted Index for U.S. | 2
HWIURATIO | Ratio of Help Wanted/No. Unemployed | 2
CLF16OV | Civilian Labor Force | 5
CE16OV | Civilian Employment | 5
UNRATE | Civilian Unemployment Rate | 2
UEMPMEAN | Average Duration of Unemployment (Weeks) | 2
UEMPLT5 | Civilians Unemployed for Less Than 5 Weeks | 5
UEMP5TO14 | Civilians Unemployed for 5-14 Weeks | 5
UEMP15OV | Civilians Unemployed for 15 Weeks and Over | 5
UEMP15T26 | Civilians Unemployed for 15-26 Weeks | 5
UEMP27OV | Civilians Unemployed for 27 Weeks and Over | 5
CLAIMSx | Initial Claims | 5
PAYEMS | All Employees: Total Nonfarm | 5
CES1021000001 | All Employees: Mining and Logging: Mining | 5
USCONS | All Employees: Construction | 5
DMANEMP | All Employees: Manufacturing | 5
NDMANEMP | All Employees: Nondurable Goods | 5
USWTRADE | All Employees: Wholesale Trade | 5
USTRADE | All Employees: Retail Trade | 5
USFIRE | All Employees: Financial Activities | 5
USGOVT | All Employees: Government | 5
CES0600000007 | Avg Weekly Hours: Goods-Producing | 1
AWOTMAN | Avg Weekly Overtime Hours: Manufacturing | 2
M1SL | M1 Money Stock | 6
M2REAL | Real M2 Money Stock | 5
AMBSL | St. Louis Adjusted Monetary Base | 6
TOTRESNS | Total Reserves of Depository Institutions | 6
NONBORRES | Reserves of Depository Institutions | 7
BUSLOANS | Commercial and Industrial Loans | 6
REALLN | Real Estate Loans at All Commercial Banks | 6
NONREVSL | Total Nonrevolving Credit | 6
CONSPI | Nonrevolving Consumer Credit to Personal Income | 2
S.P..indust | S&P’s Common Stock Price Index: Industrials | 5
S.P.div.yield | S&P’s Composite Common Stock: Dividend Yield | 5
S.P.PE.ratio | S&P’s Composite Common Stock: Price–Earnings Ratio | 5
FEDFUNDS | Effective Federal Funds Rate | 2
CP3Mx | 3-Month AA Financial Commercial Paper Rate | 2
TB3MS | 3-Month Treasury Bill | 2
TB6MS | 6-Month Treasury Bill | 2
GS1 | 1-Year Treasury Rate | 2
GS5 | 5-Year Treasury Rate | 2
AAA | Moody’s Seasoned Aaa Corporate Bond Yield | 2
BAA | Moody’s Seasoned Baa Corporate Bond Yield | 2
COMPAPFFx | 3-Month Commercial Paper Minus FEDFUNDS | 1
TB3SMFFM | 3-Month Treasury C Minus FEDFUNDS | 1
TB6SMFFM | 6-Month Treasury C Minus FEDFUNDS | 1
T1YFFM | 1-Year Treasury C Minus FEDFUNDS | 1
T10YFFM | 10-Year Treasury C Minus FEDFUNDS | 1
AAAFFM | Moody’s Aaa Corporate Bond Minus FEDFUNDS | 1
BAAFFM | Moody’s Baa Corporate Bond Minus FEDFUNDS | 1
TWEXMMTH | Trade Weighted U.S. Dollar Index: Major Currencies, Goods | 5
EXSZUSx | Switzerland/U.S. Foreign Exchange Rate | 5
EXJPUSx | Japan/U.S. Foreign Exchange Rate | 5
EXUSUKx | U.S./U.K. Foreign Exchange Rate | 5
EXCAUSx | Canada/U.S. Foreign Exchange Rate | 5
WPSFD49207 | PPI: Finished Goods | 6
WPSID62 | PPI: Crude Materials | 6
OILPRICEx | Crude Oil, Spliced WTI and Cushing | 6
PPICMM | PPI: Metals and Metal Products | 6
CPIAPPSL | CPI: Apparel | 6
CPIMEDSL | CPI: Medical Care | 6
CUSR0000SAD | CPI: Durables | 6
CUSR0000SAS | CPI: Services | 6
DDURRG3M086SBEA | Personal Cons. Exp: Durable Goods | 6
DNDGRG3M086SBEA | Personal Cons. Exp: Nondurable Goods | 6
DSERRG3M086SBEA | Personal Cons. Exp: Services | 6
CES0600000008 | Avg Hourly Earnings: Goods-Producing | 6
CES2000000008 | Avg Hourly Earnings: Construction | 6
CES3000000008 | Avg Hourly Earnings: Manufacturing | 6
UMCSENTx | Consumer Sentiment Index | 2
MZMSL | MZM Money Stock | 6
DTCOLNVHFNM | Consumer Motor Vehicle Loans Outstanding | 6
DTCTHFNM | Total Consumer Loans and Leases Outstanding | 6
INVEST | Securities in Bank Credit at All Commercial Banks | 6
DelRate.ConsumerLoans | Delinquency Rates on Consumer Loans | 1
DelRate.CreditCardLoans | Delinquency Rates on Credit Card Loans | 1
DelRate.CommIndLoans | Delinquency Rates on Commercial and Industrial Loans | 1
AMR.Def.Rate | American Default Rate | 1
D_log.DIV | CRSP - Dividends * | 5
D_Preinvested | CRSP - Price Under Reinvestment * | 5
d.p | CRSP - Dividend to Price * | 5
R15.R11 | FF Factor (Small, High) Minus (Small, Low), Sorted on (Size, Book-to-Market) | 1
CP.factor | FF Factor - Cash Profitability | 1
SMB | FF Factor - Small Minus Big | 1
UMD | FF Factor - Momentum | 1
Agric | Portfolio Return | 1
Food | Portfolio Return | 1
Beer & Liquor | Portfolio Return | 1
Smoke | Portfolio Return | 1
Toys - Recreation | Portfolio Return | 1
Fun - Entertainment | Portfolio Return | 1
Books - Printing and Publishing | Portfolio Return | 1
Hshld - Consumer Goods | Portfolio Return | 1
Clths - Apparel | Portfolio Return | 1
MedEq - Medical Equipment | Portfolio Return | 1
Drugs - Pharmaceutical Products | Portfolio Return | 1
Chems - Chemicals | Portfolio Return | 1
Rubbr - Rubber and Plastic Products | Portfolio Return | 1
Txtls - Textiles | Portfolio Return | 1
BldMt - Construction Materials | Portfolio Return | 1
Construction | Portfolio Return | 1
Steel | Portfolio Return | 1
Machinery | Portfolio Return | 1
Electrical Equipment | Portfolio Return | 1
Autos - Automobiles and Trucks | Portfolio Return | 1
Aero - Aircraft | Portfolio Return | 1
Ships | Portfolio Return | 1
Mines - Non-Metallic and Industrial Metal Mining | Portfolio Return | 1
Coal | Portfolio Return | 1
Oil | Portfolio Return | 1
Util - Utilities | Portfolio Return | 1
Telcm - Communication | Portfolio Return | 1
PerSv - Personal Services | Portfolio Return | 1
BusSv - Business Services | Portfolio Return | 1
Hardw - Computers | Portfolio Return | 1
Chips - Electronic Equipment | Portfolio Return | 1
LabEq - Measuring and Control Equipment | Portfolio Return | 1
Paper - Business Supplies | Portfolio Return | 1
Boxes - Transportation | Portfolio Return | 1
Trans - Transportation | Portfolio Return | 1
Whlsl - Wholesale | Portfolio Return | 1
Rtail - Retail | Portfolio Return | 1
Meals - Restaurants, Hotels, Motels | Portfolio Return | 1
Banks - Banking | Portfolio Return | 1
Insur - Insurance | Portfolio Return | 1
RlEst - Real Estate | Portfolio Return | 1
Fin - Trading | Portfolio Return | 1
Other | Portfolio Return | 1
A032RC1A027NBEA | National Income | 5
HOUSTNE | Housing Starts, Northeast | 4
HOUSTW | Housing Starts, West | 4
ACOGNO | New Orders for Consumer Goods | 5
AMDMNOx | New Orders for Durable Goods | 5
ANDENOx | New Orders for Nondefense Capital Goods | 5
AMDMUOx | Unfilled Orders for Durable Goods | 5
BUSINVx | Total Business Inventories | 5
ISRATIOx | Total Business: Inventories to Sales Ratio | 2

Notes

1. We define model risk from three perspectives: (i) maximum average loss across model specifications and model classes, (ii) average loss and (iii) its variability within each model class.
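Under one natural reading of these definitions, all three metrics can be computed directly from a matrix of out-of-sample losses. A minimal numpy sketch with hypothetical loss values (not the paper's results):

```python
import numpy as np

# Hypothetical out-of-sample losses: rows = model specifications,
# columns = models belonging to one model class.
losses = np.array([
    [0.291, 0.204, 0.211],
    [0.221, 0.205, 0.205],
    [0.203, 0.203, 0.206],
])

# (i) maximum average loss: worst model after averaging across specifications
max_avg_loss = losses.mean(axis=0).max()
# (ii) average loss within the model class
avg_loss = losses.mean()
# (iii) variability of the loss within the model class
loss_std = losses.std(ddof=1)
```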
2. A weak learner is any machine learning algorithm that provides an accuracy slightly better than random guessing.
3. Models based on economic principles approximate the latter using industry default rates, loan delinquency rates, market and industrial production returns and recession indicators (Altman et al. 2005; Gambetti et al. 2019; Jankowitsch et al. 2014; Mora 2015).
4. Hyperparameters for first-level learners are tuned using 10-fold cross-validation in the training sample. Folds are created using stratified sampling based on seniority type, as in Nazemi et al. (2017), Nazemi and Fabozzi (2018) and Nazemi et al. (2018). The same applies when generating the training and test sets, with proportions of 70% and 30%, respectively.
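The stratified sampling described in this note can be sketched in a few lines. A numpy-only illustration with hypothetical seniority labels (the paper's actual pipeline is in R; group sizes and the 70/30 split fraction follow the note):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical labels standing in for bond seniority classes
seniority = np.array(["secured"] * 50 + ["unsecured"] * 30 + ["subordinated"] * 20)

def stratified_split(labels, train_frac=0.7, rng=rng):
    """Split indices so each stratum keeps ~train_frac of its members in training."""
    train_parts = []
    for s in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == s))
        n_train = int(round(train_frac * idx.size))
        train_parts.append(idx[:n_train])
    train = np.concatenate(train_parts)
    test = np.setdiff1d(np.arange(labels.size), train)
    return train, test

train, test = stratified_split(seniority)
```

The same per-stratum shuffling can be reused to build the 10 cross-validation folds, assigning each stratum's indices to folds in turn.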
5. Forecast selection outperforms forecast combination only in very specific situations that are rarely encountered in practice, for instance when the variance of the prediction errors of one model is lower than those of the others by several orders of magnitude; see, e.g., Roccazzella et al. (2021).
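Both regimes in this note are easy to reproduce numerically. A small simulation sketch (illustrative only, not from the paper): with two unbiased forecasts of comparable error variance, the equal-weight combination beats either one, whereas a forecast whose error variance is orders of magnitude smaller should simply be selected.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
truth = np.zeros(n)  # target series; zero without loss of generality

rmse = lambda f: np.sqrt(np.mean((f - truth) ** 2))

# Two unbiased forecasts with independent errors of equal variance:
# equal-weight combination roughly halves the error variance.
f1 = truth + rng.normal(0.0, 1.0, n)
f2 = truth + rng.normal(0.0, 1.0, n)
combo = 0.5 * (f1 + f2)

# The exception: one model's error variance is orders of magnitude
# smaller, so selecting it dominates combining it with a noisy rival.
f3 = truth + rng.normal(0.0, 1e-3, n)
combo_bad = 0.5 * (f1 + f3)
```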
6. Another strategy consists of using an additional validation fold (Wolpert 1992). This has the drawback of extending the original data with potentially informative observations, which would unevenly boost the performance of meta-learning techniques with respect to those of individual models and ensemble methods. In this paper, we show empirically that combining schemes that do not rely on additional sample splitting perform remarkably well compared to the a posteriori best predictive framework. This is surprising, especially for combination schemes whose weights are estimated using the same in-sample information.
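A minimal illustration of combination weights estimated on the same in-sample information, in the spirit of Granger and Ramanathan (1984) and with no additional validation fold (toy data and learners, not the paper's Opt or COS estimators):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2.0, 2.0, 300)
y = np.sin(x) + rng.normal(0.0, 0.2, 300)

# Two hypothetical first-level learners, fitted in sample
f_lin = np.polyval(np.polyfit(x, y, 1), x)   # linear fit
f_cub = np.polyval(np.polyfit(x, y, 3), x)   # cubic fit

# Granger-Ramanathan regression: combination weights obtained by
# regressing the target on the in-sample candidate forecasts.
F = np.column_stack([f_lin, f_cub])
w, *_ = np.linalg.lstsq(F, y, rcond=None)
combo = F @ w

rmse = lambda f: np.sqrt(np.mean((f - y) ** 2))
```

By construction the in-sample RMSE of the combination cannot exceed that of any single candidate; the open question the footnote raises is whether such weights also perform well out of sample.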
7. For further details on the COS methodology and for the explicit formula to estimate the optimal shrinkage intensity, we refer to Roccazzella et al. (2021).
8. For example, Dodge and Karam (2016) document that deep learning methods are particularly sensitive to noise levels in image classification tasks.
9. We refer to Appendix C for the full list of the variables and the transformations performed on the raw data.
10. The reader can refer to Appendix B for more details and to Gambetti et al. (2019) for a detailed literature review on the topic.
11. We apply lasso with stability selection based on the R implementation stabs by Hofner and Hothorn (2017). We determine the dimension of the bootstrapped lasso models using pointwise control (Meinshausen and Bühlmann 2010). Moreover, we specify a threshold of 0.6 for the selection probability, as in Nazemi and Fabozzi (2018).
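The mechanics of bootstrapped-lasso stability selection can be illustrated with a self-contained Python sketch: refit the lasso on bootstrap resamples, record each predictor's selection frequency, and keep those above the 0.6 threshold. Everything here (the toy coordinate-descent lasso, data and penalty) is a hypothetical stand-in for the stabs-based R pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = (2.0, -1.5)          # only the first 2 predictors are informative
y = X @ beta_true + rng.normal(0.0, 1.0, n)

def lasso_cd(X, y, lam, n_sweeps=50):
    """Minimal lasso via cyclic coordinate descent (soft-thresholding)."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_sweeps):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]      # partial residual
            rho = X[:, j] @ r / n
            z = X[:, j] @ X[:, j] / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return beta

# Selection frequency of each predictor over bootstrap resamples
B, lam = 100, 0.3
freq = np.zeros(p)
for _ in range(B):
    idx = rng.integers(0, n, n)                       # bootstrap resample
    freq += lasso_cd(X[idx], y[idx], lam) != 0
freq /= B

stable = np.flatnonzero(freq >= 0.6)                  # 0.6 threshold as in the paper
```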
12. The MCS tests whether a subset of methods enters jointly in the superior set of models by repeatedly testing the null hypothesis of equal predictive performance at significance level α. Let M_0 be the set of all forecasting models (both individual candidates and forecast combinations), and let M* be the superior set of models. Formally, the MCS tests H_0: E[d_{i,j}] = 0 for all i, j, where d_{i,j} denotes the loss differential between models i and j. If the null hypothesis is rejected, the procedure eliminates the model with the greatest relative loss from the set M. This procedure is repeated until the null hypothesis is no longer rejected at the chosen probability level α. We compute the MCS p-values via bootstrapping (10,000 replications) using the Oxford MFE Toolbox, publicly available at https://www.kevinsheppard.com/code/matlab/mfe-toolbox/ (accessed on 28 April 2022).
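The elimination loop at the heart of the MCS can be sketched as follows. This toy version uses a naive pairwise t-statistic in place of the bootstrap p-values of Hansen et al. (2011), and the loss series are simulated, so it illustrates the sequential logic only:

```python
import numpy as np

rng = np.random.default_rng(3)
T, names = 500, ["A", "B", "C", "D"]
# Hypothetical per-period losses; model D is clearly inferior
losses = np.column_stack([
    rng.normal(1.00, 0.1, T),
    rng.normal(1.00, 0.1, T),
    rng.normal(1.02, 0.1, T),
    rng.normal(1.50, 0.1, T),
])

def naive_mcs(losses, names, t_crit=2.0):
    """Sequentially drop the worst model while some pairwise loss
    differential is significant (naive t-test, not the bootstrap MCS)."""
    keep = list(range(losses.shape[1]))
    while len(keep) > 1:
        t_max = 0.0
        for a in keep:
            for b in keep:
                if a == b:
                    continue
                d = losses[:, a] - losses[:, b]
                t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
                t_max = max(t_max, abs(t))
        if t_max < t_crit:
            break                                   # equal predictive ability not rejected
        worst = max(keep, key=lambda k: losses[:, k].mean())
        keep.remove(worst)                          # eliminate greatest relative loss
    return [names[k] for k in keep]

superior = naive_mcs(losses, names)
```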
13. We find our list of selected variables to be largely consistent with those highlighted in Nazemi and Fabozzi (2018). A table of predictor selection probabilities is included in Appendix C.
14. We find this conclusion to be robust to different specifications of the nonlinear meta-learner’s architecture (i.e., the number of hidden units in the artificial neural networks). The results are available upon request.

References

  1. Acharya, Viral V., Sreedhar T. Bharath, and Anand Srinivasan. 2007. Does industry-wide distress affect defaulted firms? Evidence from creditor recoveries. Journal of Financial Economics 85: 787–821. [Google Scholar] [CrossRef]
  2. Alexopoulos, Michelle, and Jon Cohen. 2015. The power of print: Uncertainty shocks, markets, and the economy. International Review of Economics & Finance 40: 8–28. [Google Scholar]
  3. Altman, Edward, Brooks Brady, Andrea Resti, and Andrea Sironi. 2005. The link between default and recovery rates: Theory, empirical evidence, and implications. The Journal of Business 78: 2203–27. [Google Scholar] [CrossRef]
  4. Altman, Edward I., and Egon A. Kalotay. 2014. Ultimate recovery mixtures. Journal of Banking & Finance 40: 116–129. [Google Scholar]
  5. Altman, Edward I., and Brenda Karlin. 2009. The re-emergence of distressed exchanges in corporate restructurings. Journal of Credit Risk 5: 43–56. [Google Scholar] [CrossRef]
  6. Altman, Edward I., and Vellore M. Kishore. 1996. Almost everything you wanted to know about recoveries on defaulted bonds. Financial Analysts Journal 52: 57–64. [Google Scholar] [CrossRef]
  7. Andersen, Leif, and Jakob Sidenius. 2004. Extensions to the Gaussian copula: Random recovery and random factor loadings. Journal of Credit Risk 1: 29–70. [Google Scholar] [CrossRef]
  8. Atiya, Amir F. 2020. Why does forecast combination work so well? International Journal of Forecasting 36: 197–200. [Google Scholar] [CrossRef]
  9. Bachmann, Rüdiger, Steffen Elstner, and Eric R. Sims. 2013. Uncertainty and economic activity: Evidence from business survey data. American Economic Journal: Macroeconomics 5: 217–49. [Google Scholar] [CrossRef]
  10. Baker, Scott R., Nicholas Bloom, and Steven J. Davis. 2016. Measuring Economic Policy Uncertainty. The Quarterly Journal of Economics 131: 1593–636. [Google Scholar] [CrossRef]
  11. Basel Committee on Banking Supervision (BCBS). 2006. Basel II: International Convergence of Capital Measurement and Capital Standards: A Revised Framework—Comprehensive Version. Available online: https://www.bis.org/publ/bcbs128.pdf (accessed on 28 April 2022).
  12. Basel Committee on Banking Supervision (BCBS). 2011. Basel III: A Global Regulatory Framework for More Resilient Banks and Banking Systems—Revised Version June 2011. Available online: https://www.bis.org/publ/bcbs189.pdf (accessed on 28 April 2022).
  13. Basel Committee on Banking Supervision (BCBS). 2017. Basel III: Finalising Post-Crisis Reforms. Available online: https://www.bis.org/bcbs/publ/d424.pdf (accessed on 28 April 2022).
  14. Bastos, João. 2014. Ensemble predictions of recovery rates. Journal of Financial Services Research 46: 177–93. [Google Scholar] [CrossRef]
  15. Bekaert, Geert, Marie Hoerova, and Marco Lo Duca. 2013. Risk, uncertainty and monetary policy. Journal of Monetary Economics 60: 771–88. [Google Scholar] [CrossRef]
  16. Bellotti, Anthony, Damiano Brigo, Paolo Gambetti, and Frédéric Vrins. 2021. Forecasting recovery rates on non-performing loans with machine learning. International Journal of Forecasting 37: 428–44. [Google Scholar] [CrossRef]
  17. Berd, Arthur. 2005. Recovery swaps. Journal of Credit Risk 1: 61–70. [Google Scholar] [CrossRef]
  18. Betz, Jennifer, Ralf Kellner, and Daniel Rösch. 2018. Systematic effects among loss given defaults and their implications on downturn estimation. European Journal of Operational Research 271: 1113–44. [Google Scholar] [CrossRef]
  19. Betz, Jennifer, Steffen Krüger, Ralf Kellner, and Daniel Rösch. 2020. Macroeconomic effects and frailties in the resolution of non-performing loans. Journal of Banking & Finance 112: 105212. [Google Scholar]
  20. Bloom, Nicholas. 2009. The impact of uncertainty shocks. Econometrica 77: 623–85. [Google Scholar]
  21. Breiman, Leo. 1996. Bagging predictors. Machine Learning 24: 123–40. [Google Scholar] [CrossRef]
  22. Breiman, Leo. 2001. Random forests. Machine Learning 45: 5–32. [Google Scholar] [CrossRef]
  23. Breiman, Leo, Jerome Friedman, Richard Olshen, and Charles Stone. 1984. Classification and Regression Trees. Monterey: Wadsworth and Brooks. [Google Scholar]
  24. Bris, Arturo, Ivo Welch, and Ning Zhu. 2006. The costs of bankruptcy: Chapter 7 liquidation versus chapter 11 reorganization. The Journal of Finance 61: 1253–303. [Google Scholar] [CrossRef]
  25. Bruche, Max, and Carlos González-Aguado. 2010. Recovery rates, default probabilities and the credit cycle. Journal of Banking & Finance 34: 754–64. [Google Scholar]
  26. Caputo, Barbara, Kim Lan Sim, Fredrik Furesjo, and Alexander J. Smola. 2002. Appearance-based object recognition using SVMs: Which kernel should I use? Paper presented at the NIPS Workshop on Statistical Methods for Computational Experiments in Visual Processing and Computer Vision, Vancouver, BC, Canada, 3–8 December 2001. [Google Scholar]
  27. Caruana, Rich, Alexandru Niculescu-Mizil, Geoff Crew, and Alex Ksikes. 2004. Ensemble selection from libraries of models. Paper presented at the Twenty-First International Conference on Machine Learning, ICML ’04, New York, NY, USA, July 4–8; New York: Association for Computing Machinery, p. 18. [Google Scholar]
  28. Claeskens, Gerda, Jan R. Magnus, Andrey L. Vasnev, and Wendun Wang. 2016. The forecast combination puzzle: A simple theoretical explanation. International Journal of Forecasting 32: 754–62. [Google Scholar] [CrossRef]
  29. Davydenko, Sergei A., and Julian R. Franks. 2008. Do bankruptcy codes matter? a study of defaults in france, germany, and the UK. Journal of Finance 63: 565–608. [Google Scholar] [CrossRef]
  30. Dodge, Samuel, and Lina Karam. 2016. Understanding how image quality affects deep neural networks. Paper presented at the 2016 Eighth International Conference on Quality of Multimedia Experience (QoMEX), Lisbon, Portugal, June 6–7; pp. 1–6. [Google Scholar]
  31. Dong, Xi, Li Yan, David E. Rapach, and Guofu Zhou. 2020. Anomalies and the expected market return. The Journal of Finance 77: 639–81. [Google Scholar] [CrossRef]
  32. ECB. 2009. Uncertainty and the economic prospects for the euro area. ECB Economic Bulletin, August. 58–61. [Google Scholar]
  33. ECB. 2016. The impact of uncertainty on activity in the euro area. ECB Economic Bulletin, August. 55–74. [Google Scholar]
  34. François, Pascal. 2019. The determinants of market-implied recovery rates. Risks 7: 57. [Google Scholar] [CrossRef]
  35. Franks, Julian R., and Walter N. Torous. 1994. A comparison of financial recontracting in distressed exchanges and Chapter 11 reorganizations. Journal of Financial Economics 35: 349–70. [Google Scholar] [CrossRef]
  36. Friedman, Jerome, Trevor Hastie, Rob Tibshirani, Narasimhan Balasubramanian, and Simon Noah. 2019. Glmnet: Lasso and Elastic-Net Regularized Generalized Linear Models. R Package Version 3.0-2. Vienna: R Foundation for Statistical Computing. [Google Scholar]
  37. Friedman, Jerome H. 1991. Multivariate adaptive regression splines. The Annals of Statistics 19: 1–67. [Google Scholar] [CrossRef]
  38. Gambetti, Paolo, Geneviève Gauthier, and Frédéric Vrins. 2018. Stochastic recovery rate: Impact of pricing measure’s choice and financial consequences on single-name products. In New Methods in Fixed Income Analysis. Edited by Mehdi Mili, Reyes Samaniego Medina and Filippo Di Pietro. Cham: Springer, pp. 181–203. [Google Scholar]
  39. Gambetti, Paolo, Geneviève Gauthier, and Frédéric Vrins. 2019. Recovery rates: Uncertainty certainly matters. Journal of Banking & Finance 106: 371–83. [Google Scholar]
  40. Gieseck, Arne, and Yannis Largent. 2016. The impact of macroeconomic uncertainty on activity in the euro area. Review of Economics 67: 25–52. [Google Scholar] [CrossRef]
  41. Granger, Clive W. J., and Ramu Ramanathan. 1984. Improved methods of combining forecasts. Journal of Forecasting 3: 197–204. [Google Scholar] [CrossRef]
  42. Greenwell, Brandon, Bradley Boehmke, Jay Cunningham, and Gbm Developers. 2018. GBM: Generalized Boosted Regression Models. R Package Version 2.1.4. Vienna: R Foundation for Statistical Computing. [Google Scholar]
  43. Gregory, Jon. 2012. Counterparty Credit Risk and Credit Value Adjustment: A Continuing Challenge for Global Financial Markets. Hoboken: Wiley. [Google Scholar]
  44. Hansen, Peter R., Asger Lunde, and James M. Nason. 2011. The model confidence set. Econometrica 79: 453–97. [Google Scholar] [CrossRef]
  45. Hartmann-Wendels, Thomas, Patrick Miller, and Eugen Töws. 2014. Loss given default for leasing: Parametric and nonparametric estimations. Journal of Banking & Finance 40: 364–375. [Google Scholar]
  46. Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning. New York: Springer. [Google Scholar]
  47. Hofner, Benjamin, and Torsten Hothorn. 2017. Stabs: Stability Selection with Error Control. R Package Version 0.6-3. Vienna, Austria: R Foundation for Statistical Computing. [Google Scholar]
  48. Hornik, Kurt, Maxwell Stinchcombe, and Halbert White. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2: 359–366. [Google Scholar] [CrossRef]
  49. Hothorn, Torsten, Kurt Hornik, and Achim Zeileis. 2006. Unbiased recursive partitioning: A conditional inference framework. Journal of Computational and Graphical Statistics 15: 651–74. [Google Scholar] [CrossRef]
  50. Jankowitsch, Rainer, Florian Nagler, and Marti G. Subrahmanyam. 2014. The determinants of recovery rates in the US corporate bond market. Journal of Financial Economics 114: 155–77. [Google Scholar] [CrossRef]
  51. Jurado, Kyle, Sydney C. Ludvigson, and Serena Ng. 2015. Measuring uncertainty. American Economic Review 105: 1177–216. [Google Scholar] [CrossRef]
  52. Karatzoglou, Alexandros, Alex Smola, Kurt Hornik, and Achim Zeileis. 2004. Kernlab—An S4 package for kernel methods in R. Journal of Statistical Software 11: 1–20. [Google Scholar] [CrossRef]
  53. Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. 2015. Siamese Neural Networks for One-Shot Image Recognition. Paper presented at the 32nd International Conference on Machine Learning, Lille, France, July 6–11; vol. 37. [Google Scholar]
  54. Kose, M. A., and M. Terrones. 2012. How does uncertainty affect economic performance? IMF World Economic Outlook, October. 49–53. [Google Scholar]
  55. Kuhn, Max, and Ross Quinlan. 2018. Cubist: Rule- And Instance-Based Regression Modeling. R Package Version 0.2.2. Vienna: R Foundation for Statistical Computing. [Google Scholar]
  56. Ledoit, Olivier, and Michael Wolf. 2004. Honey, i shrunk the sample covariance matrix. The Journal of Portfolio Management 30: 110–19. [Google Scholar] [CrossRef]
  57. Lessmann, Stefan, Bart Baesens, Hsin-Vonn Seow, and Lyn C. Thomas. 2015. Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research 247: 124–36. [Google Scholar] [CrossRef]
  58. Liaw, Andy, and Matthew Wiener. 2002. Classification and regression by randomforest. R News 2: 18–22. [Google Scholar]
  59. Liu, Jiaming, Sicheng Zhang, and Haoyue Fan. 2022. A two-stage hybrid credit risk prediction model based on xgboost and graph-based deep neural network. Expert Systems with Applications 195: 116624. [Google Scholar] [CrossRef]
  60. Loterman, Gert, Iain Brown, David Martens, Christophe Mues, and Bart Baesens. 2012. Benchmarking regression algorithms for loss given default modeling. International Journal of Forecasting 28: 161–70. [Google Scholar] [CrossRef]
  61. Ludvigson, Sydney C., Sai Ma, and Serena Ng. 2019. Uncertainty and business cycles: Exogenous impulse or endogenous response? American Economic Journal: Macroeconomics 13: 369–410. [Google Scholar]
  62. Lumley, Thomas. 2017. Leaps: Regression Subset Selection. R Package Version 3.0. Vienna: R Foundation for Statistical Computing. [Google Scholar]
  63. McCracken, Michael W., and Serena Ng. 2015. FRED-MD: A Monthly Database for Macroeconomic Research. Working Paper 2015-12. St. Louis: Federal Reserve Bank of St. Louis. [Google Scholar]
  64. Meinshausen, Nicolai. 2017. quantregForest: Quantile Regression Forests. R Package Version 1.3-7. Vienna: R Foundation for Statistical Computing. [Google Scholar]
  65. Meinshausen, Nicolai, and Peter Bühlmann. 2010. Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 72: 417–73. [Google Scholar] [CrossRef]
  66. Milborrow, Stephen. 2018. Earth: Multivariate Adaptive Regression Splines. R Package Version 4.6.3. Vienna: R Foundation for Statistical Computing. [Google Scholar]
  67. Mora, Nada. 2015. Creditor recovery: The macroeconomic dependence of industry equilibrium. Journal of Financial Stability 18: 172–86. [Google Scholar] [CrossRef]
  68. Nazemi, Abdolreza, and Frank J. Fabozzi. 2018. Macroeconomic variable selection for creditor recovery rates. Journal of Banking & Finance 89: 14–25. [Google Scholar]
  69. Nazemi, Abdolreza, Farnoosh Fatemi Pour, Konstantin Heidenreich, and Frank J. Fabozzi. 2017. Fuzzy decision fusion approach for loss-given-default modeling. European Journal of Operational Research 262: 780–91. [Google Scholar] [CrossRef]
  70. Nazemi, Abdolreza, Konstantin Heidenreich, and Frank J. Fabozzi. 2018. Improving corporate bond recovery rate prediction using multi-factor support vector regressions. European Journal of Operational Research 271: 664–75. [Google Scholar] [CrossRef]
  71. Pykthin, Michael. 2003. Unexpected recovery risk. Risk 16: 74–78. [Google Scholar]
  72. Qi, Min, and Xinlei Zhao. 2011. Comparison of modeling methods for loss given default. Journal of Banking & Finance 35: 2842–55. [Google Scholar]
  73. Quinlan, J. Ross. 1993. Combining instance-based and model-based learning. Paper presented at the Tenth International Conference on International Conference on Machine Learning, ICML’93, Amherst, MA, USA, July 27–29; San Francisco: Morgan Kaufmann Publishers Inc., pp. 236–43. [Google Scholar]
  74. Quinlan, Ross. 1992. Learning with continuous classes. Paper presented at the 5th Australian Joint Conference on Artificial Intelligence, Hobart, Tasmania, November 16–18. [Google Scholar]
  75. R Core Team. 2017. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing. [Google Scholar]
  76. Ravi, Sachin, and Hugo Larochelle. 2017. Optimization as a model for few-shot learning. Paper presented at 5th International Conference on Learning Representations, Toulon, France, April 24–26. [Google Scholar]
  77. Ripley, Brian D. 1996. Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press. [Google Scholar]
  78. Ripley, Brian D., and William N. Venables. 2016. Nnet: Feed-Forward Neural Networks and Multinomial Log-Linear Models. R Package Version 7.3-12. Vienna: R Foundation for Statistical Computing. [Google Scholar]
  79. Roccazzella, Francesco, Paolo Gambetti, and Frédéric Vrins. 2021. Optimal and robust combination of forecasts via constrained optimization and shrinkage. International Journal of Forecasting 38: 97–116. [Google Scholar] [CrossRef]
  80. Santoro, Adam, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. 2016. Meta-learning with memory-augmented neural networks. Paper presented at 33rd International Conference on Machine Learning, New York, NY, USA, June 19–24; vol. 48, pp. 1842–50. [Google Scholar]
  81. Santos, Andrey Bicalho, Arnaldo de Albuquerque Araújo, Jefersson A. dos Santos, William Robson Schwartz, and David Menotti. 2017. Combination techniques for hyperspectral image interpretation. Paper presented at the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), Fort Worth, TX, USA, July 23–28; pp. 3648–51. [Google Scholar]
  82. Schuermann, Til. 2004. What do we know about loss given default? In Wharton Financial Institutions Center Working Paper. Working Paper No. 04-01. Philadelphia: The University of Pennsylvania’s Wharton Financial Institutions Center. [Google Scholar]
  83. Therneau, Terry, Beth Atkinson, and Brian Ripley. 2017. rpart: Recursive Partitioning and Regression Trees. R Package Version 4.1-11. Vienna: R Foundation for Statistical Computing. [Google Scholar]
  84. Tipping, Michael E. 2001. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1: 211–44. [Google Scholar]
  85. Tobback, Ellen, David Martens, Tony Van Gestel, and Bart Baesens. 2014. Forecasting loss given default models: Impact of account characteristics and the macroeconomic state. Journal of the Operational Research Society 65: 376–92. [Google Scholar] [CrossRef]
  86. Vapnik, Vladimir N. 1995. The Nature of Statistical Learning Theory. New York: Springer. [Google Scholar]
  87. Wang, Tianhui, Renjing Liu, and Guohua Qi. 2022. Multi-classification assessment of bank personal credit risk based on multi-source information fusion. Expert Systems with Applications 191: 116236. [Google Scholar] [CrossRef]
  88. Wang, Zhu. 2018. bst: Gradient Boosting. R Package Version 0.3-15. Vienna: R Foundation for Statistical Computing. [Google Scholar]
  89. Westfall, Peter H., and S. Stanley Young. 1993. Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment. Wiley Series in Probability and Statistics; Hoboken: Wiley. [Google Scholar]
  90. Williams, Christopher K. I., and Carl Edward Rasmussen. 1996. Gaussian processes for regression. In Advances in Neural Information Processing Systems 8. Cambridge: MIT Press, pp. 514–20. [Google Scholar]
  91. Wolpert, David H. 1992. Stacked generalization. Neural Networks 5: 241–59. [Google Scholar] [CrossRef]
  92. Yao, Xiao, Jonathan Crook, and Galina Andreeva. 2015. Support vector regression for loss given default modelling. European Journal of Operational Research 240: 528–38. [Google Scholar] [CrossRef]
  93. Zarnowitz, Victor, and Louis A. Lambros. 1987. Consensus and uncertainty in economic prediction. Journal of Political Economy 95: 591–621. [Google Scholar] [CrossRef]
  94. Zhang, Jie, and Lyn C. Thomas. 2012. Comparisons of linear regression and survival analysis using single and mixture distributions approaches in modelling LGD. International Journal of Forecasting 28: 204–15. [Google Scholar] [CrossRef]
Figure 1. Predictive strategy with meta-learning techniques.
Figure 2. Examples of fitted relationships for linear, nonlinear and rule-based models. The true data-generating process (dots) follows a shifted sine wave with Gaussian noise. Panel (a) shows the fit obtained by linear regression, panel (b) the fit obtained by support vector regression and panel (c) the fit obtained by boosted trees with stochastic gradient boosting.
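The toy example of Figure 2 can be reproduced with numpy alone: a linear regression as in panel (a), with kernel ridge regression and boosted regression stumps standing in for the paper's support vector regression and stochastic-gradient-boosted trees (a sketch under those substitutions, not the article's actual code):

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.sort(rng.uniform(0.0, 4.0 * np.pi, 200))
y = 2.0 + np.sin(x) + rng.normal(0.0, 0.25, 200)   # shifted sine wave with Gaussian noise

# Panel (a): linear regression
lin = np.polyval(np.polyfit(x, y, 1), x)

# Panel (b) stand-in: kernel ridge regression with an RBF kernel
def rbf(a, b, gamma=1.0):
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

K = rbf(x, x)
alpha = np.linalg.solve(K + 0.1 * np.eye(len(x)), y)
krr = K @ alpha

# Panel (c) stand-in: boosted regression stumps (one split per round)
def boost_stumps(x, y, n_rounds=100, lr=0.1):
    pred = np.full_like(y, y.mean())
    for _ in range(n_rounds):
        resid = y - pred
        best = None
        for s in np.quantile(x, np.linspace(0.05, 0.95, 19)):
            left = x <= s
            fit = np.where(left, resid[left].mean(), resid[~left].mean())
            sse = ((resid - fit) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, fit)
        pred = pred + lr * best[1]                 # shrunken stump update
    return pred

stumps = boost_stumps(x, y)

rmse = lambda f: np.sqrt(np.mean((f - y) ** 2))
```

As in the figure, the linear fit cannot track the sine wave, while both the kernel method and the boosted stumps recover its shape.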
Figure 3. Histogram of recovery rates in our sample. The dashed line represents the sample mean. The average recovery rate is 30.98%, while the standard deviation equals 27.58%, a large value compared to the mean. The distribution is also highly skewed, a typical feature of recovery rate data.
Figure 4. Illustration of forecasting performance across model specifications. The horizontal red line is the model based on the in-sample mean recovery rate. Panel (a) highlights the performance of linear models. Panel (b) highlights the performance of nonlinear models. Panel (c) highlights the performance of rule-based models. Panel (d) illustrates the performance of linear and nonlinear meta-learning methods.
Table 1. List of first-level learners and corresponding R algorithms. The three blocks of the table identify linear (top), nonlinear (middle) and rule-based (bottom) models. Note that we consider two versions of ridge and lasso: (i) the standard one, involving the penalty term associated with the best in-sample performance, and (ii) one based on the one-standard-error rule of Hastie et al. (2009).
Description | Acronym | R Algorithm | Reference
Linear regression | lm | lm | R Core Team (2017)
Backward step-wise selection | lm_bs | leaps | Lumley (2017)
Ridge regression | ridge | glmnet | Friedman et al. (2019)
Lasso regression | lasso | glmnet | Friedman et al. (2019)
Elastic net regression | elnet | glmnet | Friedman et al. (2019)
MARS | mars | earth | Milborrow (2018)
Bagged MARS | bmars | earth | Milborrow (2018)
Model-averaged neural networks | avnnet | nnet | Ripley and Venables (2016)
Support vector regression | svr | ksvm | Karatzoglou et al. (2004)
Relevance vector regression | rvm | rvm | Karatzoglou et al. (2004)
Gaussian processes | gauss | gausspr | Karatzoglou et al. (2004)
Regression trees | cart | rpart | Therneau et al. (2017)
Conditional inference trees | cit | ctree | Hothorn et al. (2006)
Boosted tree | bst | bst | Wang (2018)
Stochastic gradient boosting | gbm | gbm | Greenwell et al. (2018)
Random forests | rf | randomForest | Liaw and Wiener (2002)
Quantile random forests | qrf | quantregForest | Meinshausen (2017)
Cubist | cubist | cubist | Kuhn and Quinlan (2018)
Table 2. Summary statistics of our recovery rate sample.
 | N | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. | St. Dev.
Recovery rate | 768 | 0.01% | 10.00% | 20.00% | 30.98% | 51.41% | 118.00% | 27.58%
Table 3. Summary statistics of recovery rates according to the seniority of the defaulted bond.
Debt Seniority | N | Median | Mean | St. Dev. | Skewness
Senior Secured | 85 | 63.00% | 60.92% | 32.27% | 0.29
Senior Unsecured | 533 | 19.00% | 28.03% | 23.90% | 0.98
Senior Subordinated | 129 | 19.13% | 25.77% | 26.87% | 1.33
Subordinated | 21 | 9.13% | 16.71% | 23.37% | 2.20
Table 4. Summary statistics of recovery rates according to the industrial sector of the bond issuer.
Industrial Sector | N | Median | Mean | St. Dev. | Skewness
Banking | 18 | 18.00% | 23.47% | 24.44% | 0.46
Capital Industries | 189 | 29.00% | 36.44% | 28.55% | 0.72
Consumer Industries | 88 | 30.25% | 39.86% | 28.68% | 0.49
Energy & Environment | 45 | 40.00% | 44.10% | 25.76% | 0.64
FIRE | 166 | 10.00% | 11.95% | 10.32% | 3.88
Media & Publishing | 90 | 43.62% | 40.72% | 31.45% | 0.001
Retail & Distribution | 32 | 36.25% | 34.92% | 25.89% | 0.58
Technology | 72 | 15.00% | 19.89% | 16.99% | 2.45
Transportation | 57 | 22.25% | 31.32% | 23.81% | 1.16
Utilities | 11 | 92.50% | 91.89% | 6.54% | 0.32
Table 5. Summary statistics of recovery rates according to the default type.
Default Type | N | Median | Mean | St. Dev. | Skewness
Chapter 11 | 371 | 10.50% | 25.68% | 25.97% | 1.53
Missed interest payment | 281 | 28.50% | 37.55% | 26.76% | 0.53
Missed principal and interest payments | 14 | 58.12% | 56.14% | 18.85% | 0.05
Missed principal payment | 8 | 23.04% | 29.76% | 29.12% | 0.99
Others | 6 | 11.00% | 15.59% | 17.47% | 1.50
Prepackaged Chapter 11 | 71 | 12.00% | 31.95% | 33.01% | 0.71
Receivership | 7 | 0.50% | 4.57% | 9.70% | 2.01
Suspension of payments | 10 | 18.50% | 30.05% | 26.75% | 2.01
Table 6. Average recovery rate by coupon level.
Coupon | [0%, 2.5%) | [2.5%, 5%) | [5%, 7.5%) | [7.5%, 10%) | ≥10%
Average RR | 14.82% | 24.22% | 23.01% | 33.11% | 36.95%
Table 7. Average recovery rate by maturity level.
Maturity (Years) | [0, 5) | [5, 10) | [10, 15) | [15, 20) | ≥20
Average RR | 43.30% | 37.88% | 31.92% | 19.39% | 18.90%
Table 8. Average recovery rate for bonds with and without backing.
Backing | Yes | No
Average RR | 40.02% | 29.33%
Table 9. List of uncertainty measures considered in this study.
Name | Type | Methodology | References
Inflation uncertainty; Federal/State/Local expenditures uncertainty | Survey-based | Dispersion of forecasts from the Federal Reserve Bank of Philadelphia’s Survey of Professional Forecasters. | Zarnowitz and Lambros (1987); Bachmann et al. (2013); Baker et al. (2016)
Economic policy uncertainty; Monetary policy uncertainty; Fiscal policy (taxes or spending) uncertainty; Tax uncertainty; Government spending uncertainty; Healthcare uncertainty; National security uncertainty; Entitlement programs uncertainty; Regulation uncertainty; Financial regulation uncertainty; Trade policy uncertainty; Sovereign debt, currency crises uncertainty | News-based | Normalized volume of newspaper articles published in a given month containing expressions referring to specific types of economic uncertainty. | Baker et al. (2016); Alexopoulos and Cohen (2015)
VIX | Volatility-based (stock market) | Stock market implied volatility index from the Chicago Board Options Exchange. | Bloom (2009); Bekaert et al. (2013)
Financial uncertainty | Volatility-based (forecast error) | Conditional volatility of the purely unforecastable prediction error of financial time series. | Jurado et al. (2015); Ludvigson et al. (2019)
Table 10. Description of the different model specifications.
Specification ID | Systematic Variables | Reference
1 | Full data set | -
2 | Lasso-selected macroeconomic variables | Nazemi and Fabozzi (2018)
3 | Lasso-selected variables with stability control | -
4 | Industry default rate, commercial and industrial loans delinquency rates, industrial production, market index returns, PMI | M3+ in Gambetti et al. (2019)
5 | As in 4, with industry default rates replaced by financial uncertainty | M4+ in Gambetti et al. (2019)
6 | As in 4, with industry default rates replaced by the VIX | M5+ in Gambetti et al. (2019)
7 | As in 4, plus financial uncertainty, news-based economic policy uncertainty, inflation uncertainty and federal/state/local expenditures uncertainty | M4++ in Gambetti et al. (2019)
8 | As in 4, plus all uncertainty measures of Table 9 | -
9 | No systematic variables | M2 in Gambetti et al. (2019); Nazemi and Fabozzi (2018); Nazemi et al. (2018)
Table 11. Out-of-sample root mean square error (RMSE) across model specifications. We indicate in bold the RMSE of the models that join the superior set of models at the 10% confidence level.
| Model | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Mean |
|---|---|---|---|---|---|---|---|---|---|---|
| Linear Models | 0.291 | 0.204 | 0.211 | 0.211 | 0.211 | 0.210 | 0.212 | 0.213 | 0.236 | 0.222 |
| Lin. regression | 0.774 | 0.206 | 0.205 | 0.206 | 0.206 | 0.206 | 0.206 | 0.209 | 0.233 | 0.272 |
| Lin. reg. back. sel. | 0.221 | 0.205 | 0.205 | 0.207 | 0.207 | 0.206 | 0.209 | 0.214 | 0.235 | 0.212 |
| Lasso 1 | 0.203 | 0.203 | 0.206 | 0.207 | 0.207 | 0.207 | 0.208 | 0.209 | 0.233 | 0.209 |
| Lasso 2 | 0.218 | 0.203 | 0.221 | 0.221 | 0.220 | 0.218 | 0.220 | 0.221 | 0.241 | 0.220 |
| Ridge 1 | 0.202 | 0.203 | 0.208 | 0.207 | 0.207 | 0.207 | 0.208 | 0.209 | 0.233 | 0.209 |
| Ridge 2 | 0.222 | 0.211 | 0.226 | 0.220 | 0.221 | 0.220 | 0.222 | 0.223 | 0.242 | 0.223 |
| Elastic net | 0.201 | 0.202 | 0.206 | 0.207 | 0.208 | 0.207 | 0.209 | 0.209 | 0.233 | 0.209 |
| Nonlinear Models | 0.211 | 0.203 | 0.207 | 0.206 | 0.211 | 0.208 | 0.207 | 0.213 | 0.237 | 0.211 |
| MARS | 0.207 | 0.199 | 0.214 | 0.197 | 0.202 | 0.198 | 0.194 | 0.216 | 0.237 | 0.207 |
| Gaussian processes | 0.211 | 0.204 | 0.205 | 0.207 | 0.207 | 0.208 | 0.207 | 0.211 | 0.234 | 0.210 |
| RVM | 0.236 | 0.212 | 0.200 | 0.207 | 0.223 | 0.214 | 0.216 | 0.212 | 0.230 | 0.217 |
| SVM | 0.191 | 0.197 | 0.212 | 0.212 | 0.212 | 0.212 | 0.211 | 0.213 | 0.245 | 0.212 |
| Rule-based Models | 0.232 | 0.225 | 0.233 | 0.216 | 0.217 | 0.213 | 0.221 | 0.216 | 0.240 | 0.224 |
| Regression tree | 0.230 | 0.223 | 0.244 | 0.218 | 0.215 | 0.215 | 0.222 | 0.211 | 0.256 | 0.226 |
| Conditional inference tree | 0.234 | 0.226 | 0.222 | 0.214 | 0.219 | 0.211 | 0.220 | 0.221 | 0.225 | 0.221 |
| Nonlinear Ensembles | 0.195 | 0.197 | 0.197 | 0.196 | 0.200 | 0.200 | 0.206 | 0.206 | 0.235 | 0.203 |
| Neural networks | 0.202 | 0.201 | 0.202 | 0.199 | 0.204 | 0.207 | 0.209 | 0.208 | 0.238 | 0.208 |
| Bagged MARS | 0.189 | 0.194 | 0.191 | 0.193 | 0.195 | 0.193 | 0.203 | 0.204 | 0.233 | 0.199 |
| Rule-based Ensembles | 0.189 | 0.191 | 0.186 | 0.186 | 0.187 | 0.185 | 0.183 | 0.185 | 0.229 | 0.191 |
| Cubist | 0.184 | 0.182 | 0.184 | 0.187 | 0.194 | 0.186 | 0.188 | 0.183 | 0.229 | 0.191 |
| Boosted trees s.g.b. | 0.194 | 0.198 | 0.195 | 0.189 | 0.187 | 0.188 | 0.189 | 0.187 | 0.238 | 0.196 |
| Boosted trees | 0.182 | 0.191 | 0.174 | 0.177 | 0.185 | 0.179 | 0.177 | 0.182 | 0.223 | 0.185 |
| Quantile random forests | 0.193 | 0.195 | 0.192 | 0.192 | 0.185 | 0.187 | 0.180 | 0.187 | 0.236 | 0.194 |
| Random forests | 0.190 | 0.187 | 0.187 | 0.185 | 0.185 | 0.187 | 0.182 | 0.184 | 0.220 | 0.190 |
| Linear Meta-Learning | 0.187 | 0.191 | 0.180 | 0.185 | 0.179 | 0.184 | 0.179 | 0.179 | 0.228 | 0.188 |
| Opt | 0.188 | 0.193 | 0.181 | 0.183 | 0.180 | 0.182 | 0.177 | 0.181 | 0.227 | 0.188 |
| Opt+ | 0.187 | 0.191 | 0.180 | 0.186 | 0.179 | 0.184 | 0.180 | 0.178 | 0.228 | 0.188 |
| COS-E | 0.187 | 0.191 | 0.180 | 0.186 | 0.179 | 0.184 | 0.180 | 0.178 | 0.228 | 0.188 |
| COS-IL | 0.187 | 0.191 | 0.180 | 0.186 | 0.179 | 0.184 | 0.180 | 0.178 | 0.228 | 0.188 |
| Equally weighted for. | 0.186 | 0.187 | 0.188 | 0.187 | 0.188 | 0.188 | 0.187 | 0.188 | 0.224 | 0.192 |
| Hill climbing | 0.193 | 0.195 | 0.180 | 0.189 | 0.178 | 0.188 | 0.189 | 0.187 | 0.225 | 0.192 |
| NonLinear Meta-Learning | 0.224 | 0.200 | 0.194 | 0.194 | 0.192 | 0.191 | 0.195 | 0.213 | 0.247 | 0.205 |
| NN - 1 | 0.241 | 0.198 | 0.193 | 0.187 | 0.187 | 0.180 | 0.189 | 0.234 | 0.241 | 0.205 |
| NN - 2 | 0.208 | 0.203 | 0.195 | 0.201 | 0.196 | 0.201 | 0.201 | 0.192 | 0.253 | 0.206 |
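The good showing of the simple equally weighted forecast in the table above reflects a well-known property of forecast combination: averaging imperfectly correlated forecast errors reduces their variance. A minimal numpy sketch on synthetic data (not the paper's bond sample; the three base models and their noise levels are illustrative) makes the effect concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

def rmse(pred, actual):
    """Root mean square error of a forecast."""
    return float(np.sqrt(np.mean((pred - actual) ** 2)))

# Synthetic recovery rates in [0, 1] and three noisy base forecasts.
actual = rng.uniform(0.0, 1.0, size=500)
base_preds = np.column_stack([
    actual + rng.normal(0.0, s, size=actual.size)  # each model = truth + noise
    for s in (0.10, 0.15, 0.20)
])

# Equally weighted combination: simple average of the base forecasts.
combo = base_preds.mean(axis=1)

# With independent errors, the combination's error variance is the average
# variance divided by the number of models, so it typically beats the
# average base model.
base_rmses = [rmse(base_preds[:, j], actual) for j in range(base_preds.shape[1])]
print(base_rmses, rmse(combo, actual))
```

The "Opt" style combiners in the table go one step further and estimate the weights (possibly constrained) instead of fixing them at 1/n.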
Table 12. Predictor selection probabilities from lasso with stability selection.
| Variable | Selection Probability |
|---|---|
| Financial Uncertainty | 0.95 |
| Consumer Price Index for All Urban Consumers: Apparel | 0.88 |
| New One-Family Houses Sold: United States | 0.86 |
| Industrial Production: Fuels | 0.82 |
| Number Unemployed for 5–14 Weeks | 0.77 |
| Continued Claims (Insured Unemployment) | 0.73 |
| ISM Manufacturing: Supplier Deliveries Index | 0.72 |
| Securities in Bank Credit, All Commercial Banks | 0.71 |
| Industry Returns: Agricultural | 0.70 |
| Money Zero Maturity: Money Stock | 0.69 |
| Total Consumer Loans and Leases Owned and Securitized by Finance Companies | 0.66 |
| Industrial Production: Residential Utilities | 0.62 |
| Employment Cost Index: Benefits: Private Industry Workers | 0.61 |
| Reserves of Depository Institutions, Nonborrowed | 0.58 |
| Number Unemployed for Less than 5 Weeks | 0.53 |
| Total Borrowings of Depository Institutions from the Federal Reserve | 0.53 |
| Gross Saving | 0.53 |
| Economic Policy Uncertainty: Government Spending | 0.50 |
| M1 Money Stock | 0.46 |
| Light Weight Vehicle Sales: Autos and Light Trucks | 0.44 |
| Industry Portfolio Returns: Other | 0.43 |
| Economic Policy Uncertainty: Sovereign Debt Currency Crises | 0.40 |
| Fama-French Factor: Momentum | 0.38 |
| All Employees, Government | 0.36 |
| Civilian Employment Level | 0.35 |
| Change in Private Inventories | 0.35 |
| Industry Portfolio Returns: Drugs | 0.34 |
| Nonperforming Commercial Loans | 0.34 |
| Industry Portfolio Returns: Smoke | 0.33 |
| CBOE NASDAQ 100 Volatility Index | 0.33 |
| Consumer Sentiment Index | 0.30 |
| Economic Policy Uncertainty: Trade Policy | 0.29 |
| Consumer Price Index for All Urban Consumers: Medical Care | 0.29 |
| Industrial Production: Nondurable Consumer Goods | 0.28 |
| National Income | 0.28 |
| Number Unemployed for 15–26 Weeks | 0.27 |
| Civilian Labor Force Level | 0.26 |
| University of Michigan: Inflation Expectation | 0.26 |
| Corporate Profits after Tax with IVA and CCAdj: Net Dividends | 0.26 |
| Industrial Production: Materials | 0.25 |
| Excess Reserves of Depository Institutions | 0.25 |
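The selection probabilities in Table 12 follow the stability-selection recipe: refit the lasso on many random subsamples of the data and record, for each predictor, the fraction of subsamples in which its coefficient is nonzero. The sketch below is a minimal, self-contained illustration on toy data, using a basic proximal-gradient (ISTA) lasso rather than the paper's actual implementation; the data, the `alpha` penalty and the subsampling fraction are all illustrative:

```python
import numpy as np

def lasso_ista(X, y, alpha, n_iter=500):
    """Basic lasso via proximal gradient (ISTA); returns the coefficient vector."""
    n, p = X.shape
    beta = np.zeros(p)
    # Step size = 1 / Lipschitz constant of the gradient of (1/2n)||Xb - y||^2.
    step = n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        beta = np.sign(z) * np.maximum(np.abs(z) - step * alpha, 0.0)  # soft-threshold
    return beta

def stability_selection(X, y, alpha, n_sub=100, frac=0.5, seed=0):
    """Fraction of random subsamples in which each predictor gets a nonzero lasso coefficient."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_sub):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        counts += lasso_ista(X[idx], y[idx], alpha) != 0
    return counts / n_sub

# Toy data: only the first two of six predictors actually drive y.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=200)
probs = stability_selection(X, y, alpha=0.2)
```

On this toy design the two informative predictors are selected in essentially every subsample while the noise predictors rarely survive the penalty, mirroring the gap between the top and bottom of Table 12.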
Table 13. Ranking of the top 20 predictive frameworks according to out-of-sample root mean square error (RMSE) metrics across all specifications of the predictor set. The symbols *, ** and *** mark the 1%, 5% and 10% significance levels of the model confidence set test with L2 loss, respectively. For the sake of comparison, we also display the results corresponding to the equally weighted forecast and the hill-climbing algorithm, although they rank 24th and 37th, respectively.
| Model | Specification | RMSE | MAE | R² | MCS |
|---|---|---|---|---|---|
| Boosted trees | 7 | 0.1735 | 0.1076 | 0.6298 | *** |
| Opt+ | . | 0.1752 | 0.1041 | 0.6175 | *** |
| COS-E | . | 0.1752 | 0.1041 | 0.6175 | *** |
| COS-IL | . | 0.1752 | 0.1041 | 0.6175 | *** |
| Boosted trees | 3 | 0.1765 | 0.1123 | 0.6142 | *** |
| Boosted trees | 6 | 0.1771 | 0.1140 | 0.6115 | *** |
| Boosted trees | 4 | 0.1789 | 0.1118 | 0.6048 | *** |
| Quantile random forests | 3 | 0.1796 | 0.0963 | 0.6080 | *** |
| Boosted trees | 2 | 0.1819 | 0.1117 | 0.5920 | *** |
| Random forests | 3 | 0.1820 | 0.1140 | 0.5927 | *** |
| Boosted trees | 9 | 0.1823 | 0.1140 | 0.5886 | |
| Cubist | 8 | 0.1824 | 0.1053 | 0.5878 | |
| Cubist | 2 | 0.1831 | 0.1053 | 0.5858 | |
| Cubist | 7 | 0.1836 | 0.1076 | 0.5841 | |
| Cubist | 9 | 0.1836 | 0.1029 | 0.5843 | |
| Random forests | 2 | 0.1842 | 0.1136 | 0.5818 | |
| Opt | . | 0.1843 | 0.1078 | 0.5852 | |
| Boosted trees | 5 | 0.1845 | 0.1205 | 0.5781 | |
| Random forests | 6 | 0.1849 | 0.1164 | 0.5869 | |
| Quantile random forests | 5 | 0.1853 | 0.0964 | 0.5887 | |
| Equally weighted for. | . | 0.1862 | 0.1289 | 0.5962 | |
| Hill climbing | . | 0.1890 | 0.1213 | 0.5622 | |
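The three error metrics reported in Table 13 are standard and can be computed as below. This is a generic sketch on synthetic data, not the paper's evaluation code; in particular, computing R² against the mean of the realized values is the usual convention but is our assumption here:

```python
import numpy as np

def forecast_metrics(pred, actual):
    """RMSE, MAE and R^2 of a forecast against realized values."""
    err = pred - actual
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    ss_res = np.sum(err ** 2)                          # residual sum of squares
    ss_tot = np.sum((actual - actual.mean()) ** 2)     # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return float(rmse), float(mae), float(r2)

# Synthetic recovery rates and a forecast with roughly Table-13-sized errors.
rng = np.random.default_rng(2)
actual = rng.uniform(0.0, 1.0, size=1000)
pred = actual + rng.normal(scale=0.18, size=actual.size)
rmse, mae, r2 = forecast_metrics(pred, actual)
```

Note that MAE is always at most RMSE, which is why the two columns of Table 13 can rank models differently: RMSE penalizes the occasional large miss more heavily.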
Gambetti, P.; Roccazzella, F.; Vrins, F. Meta-Learning Approaches for Recovery Rate Prediction. Risks 2022, 10, 124. https://0-doi-org.brum.beds.ac.uk/10.3390/risks10060124
