Article

Modelling the Time to Write-Off of Non-Performing Loans Using a Promotion Time Cure Model with Parametric Frailty

by Janette Larney 1,*,†, James Samuel Allison 2,†, Gerrit Lodewicus Grobler 2,† and Marius Smuts 2,†
1 Centre for Business Mathematics and Informatics, North-West University, Potchefstroom 2531, South Africa
2 School of Mathematical and Statistical Sciences, North-West University, Potchefstroom 2531, South Africa
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Submission received: 21 March 2023 / Revised: 21 April 2023 / Accepted: 5 May 2023 / Published: 9 May 2023
(This article belongs to the Special Issue Application of Survival Analysis in Economics, Finance and Insurance)

Abstract

Modelling the outcome after loan default is receiving increasing attention, and survival analysis is particularly suitable for this purpose due to the likely presence of censoring in the data. In this study, we suggest that the time to loan write-off may be influenced by latent competing risks, as well as by common, unobservable drivers, such as the state of the economy. We therefore expand on the promotion time cure model and include a parametric frailty parameter to account for common, unobservable factors and for possible observable covariates not included in the model. We opt for a parametric model due to its interpretability and analytical tractability, which are desirable properties in bank risk management. Both gamma and inverse Gaussian frailty parameters are considered for the univariate case, and we also consider a shared frailty model. A Monte Carlo study demonstrates that the parameter estimation of the models is reliable, after which they are fitted to a real-world dataset of large corporate loans in the US. The results show that a more flexible hazard function is possible by including a frailty parameter. Furthermore, the shared frailty model shows potential to capture dependence in write-off times within industry groups.

1. Introduction

Risk management in the credit risk environment has advanced considerably over the last few decades, and during the same time, statistical models developed for this purpose have grown in variety and sophistication. Initially, statistical modelling focused on the assessment of a counterparty’s creditworthiness (i.e., default risk), but the modelling of the recovery process and of the outcome after default is receiving increasing attention. Two main outcomes or resolutions are possible after default: either the outstanding amount is fully recovered or the loan is written off. Modelling the time to write-off of a defaulted loan is valuable to a bank, as the time on book of such loans affects not only its operational decisions but also its profitability [1]. Survival analysis has enjoyed increasing popularity in the credit risk environment for modelling the time to default and, more recently, the time to resolution of a defaulted loan (see, e.g., [2,3,4,5]). Since write-offs are typically associated with longer resolution times than full recoveries, write-offs are usually under-represented in samples of defaulted loans [6]. This makes survival analysis particularly suitable for modelling the time to write-off, because the status at the end of the observation period is often recorded as “not resolved”, i.e., censoring is present.
With the fast-growing body of literature on the use of survival analysis (and specifically the Cox proportional hazard (CPH) model) to model default risk, exploring the use of the methodology to describe the post-default outcome of a loan is a natural next step. Several authors have made use of Cox regression to demonstrate that loan and obligor characteristics, as well as macroeconomic factors, drive the resolution time of defaulted loans (see e.g., [3,4,7,8]).
Recently, both mixture cure and promotion time cure (PTC) models have found applications in the credit risk environment (see e.g., [9,10,11,12]). Recognising that there is a proportion of defaulted loans that is not “susceptible” to a full recovery, de Oliveira and Louzada [5] modelled the recovery process of defaulted loans using a PTC model. The model considers the full repayment of a defaulted loan as the event of interest and allows for latent factors that may impact the post-default resolution outcome.
The time to write-off or full recovery may, in addition to latent competing risks, also be influenced by common, unobservable drivers, such as the state of the economy [3]. In this study, we propose an expansion of the PTC model by including a frailty parameter to control for unobserved heterogeneity within our population of defaulted loans. Underlying the post-default recovery process is the macroeconomic environment, which has a systematic effect on all borrowers’ ability to resolve their defaulted obligations. Although some of these macroeconomic factors are observable and can be included as covariates, we propose a frailty parameter that can account for common, unobservable factors, as well as for possible observable covariates not included in the model. We consider both univariate and shared frailty parameters. Global regulatory standards, specifically those set by the Basel Accords, require banks to have a comprehensive framework for model risk management, which includes ongoing monitoring and assessment of model performance. We demonstrate how the inclusion of a frailty parameter in the PTC model improves model performance when estimating the time to write-off of a non-performing loan.
Our paper offers the following contributions. First, we formulate the PTC model with gamma frailty and derive its likelihood function. Second, we compare the performance of this model to that of the standard PTC model, as well as to a PTC model that includes an inverse Gaussian frailty parameter, as proposed by Barriga et al. [13]. For the comparison, we first perform a simulation study, then fit the models to both a simulated and a real-world dataset relating to the resolution outcomes of defaulted corporate loans. Third, we extend the PTC model to include a shared frailty parameter. We derive the likelihood function of this model and evaluate its performance on the same real-world dataset.
The probability of write-off (or recovery) is a key component within two-step loss given default (LGD) models, where the post-default outcome probability and loss severity are characterised separately (see, e.g., [14]). Improving the accuracy of LGD models should therefore enable a bank to attain more accurate estimates of this risk parameter for the determination of credit loss provisions, which should, in turn, result in more stable earnings. It will also enable the bank to more accurately assess its regulatory and/or economic capital requirements, which are crucial for a bank’s financial stability. Obtaining a more accurate estimate of the time to write-off will, furthermore, enable the bank to better manage its cash flow requirements and reduce the liquidity risk to which it is exposed.
The remainder of this article is structured as follows. In the next section, an overview of survival modelling is provided, including cure models and their application in the credit risk environment. The formulation of the proposed model, namely the PTC model with parametric frailty, follows in Section 3. In Section 4, the likelihood function of this model is derived, and the finite sample performance of maximum likelihood estimators (MLEs) of the frailty models is analysed through a Monte Carlo simulation study. We apply our model to both a simulated dataset and a real-world loan loss dataset in Section 5. In Section 6, we describe the extension of the PTC model with a shared frailty parameter, including an application to a real-world dataset. We provide some concluding remarks and avenues for future research in Section 7.

2. Background

Survival modelling is a statistical method used to analyse and predict the time until an event of interest occurs. The event of interest is often referred to as the “failure” or “death” event, and the time until this event is referred to as the “survival time”. The event of interest in our context is the write-off of a defaulted loan. In general, the survival function is defined as $S(t) = P(T > t)$, where $T$ is the random variable representing the time to the event, and $t$ is a specific point in time.
Within the resolution process of defaulted loans, random right censoring is normally present due to the fixed duration of the observation period: not all subjects experience the event of interest, i.e., write-off, within that period. Under right censoring, there are two latent random variables, $T$ and $C$, where $T$ is the time of failure and $C$ is the censoring time. The observed random variable is $(Y, \delta)$, where
$$Y = \min(T, C)$$
and
$$\delta = \begin{cases} 1 & \text{if } T \le C, \\ 0 & \text{if } T > C. \end{cases}$$
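A minimal Python sketch (standard library only) of how the observed pair $(Y, \delta)$ is formed from latent event and censoring times. The Weibull and uniform distributions used to generate the illustrative times are assumptions for demonstration, not choices made in this study.

```python
import random


def censor(event_times, censoring_times):
    """Return observed pairs (Y, delta) under random right censoring.

    Y = min(T, C); delta = 1 if the event is observed (T <= C), else 0.
    A cured subject can be represented by T = float("inf"): it is always censored.
    """
    observed = []
    for t, c in zip(event_times, censoring_times):
        observed.append((min(t, c), 1 if t <= c else 0))
    return observed


# Hypothetical illustration: Weibull event times (scale 2, shape 1.5) and
# uniform censoring on [0, 3]; neither distribution comes from the article.
rng = random.Random(1)
T = [rng.weibullvariate(2.0, 1.5) for _ in range(5)]
C = [rng.uniform(0.0, 3.0) for _ in range(5)]
for y, delta in censor(T, C):
    print(f"Y = {y:.3f}, delta = {delta}")
```

Representing a cured loan by an infinite event time makes it automatically censored, which is exactly how cure models treat non-susceptible subjects.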
The simplest way of modelling the time to write-off is to assign a distribution with positive support to $T$, for example, the gamma or Weibull distribution.
However, for credit risk modelling, specifying $S(t)$ unconditionally, without taking covariates into account, is too simplistic, and the CPH model has therefore become the dominant approach.
The CPH model, which was proposed in the seminal paper of Cox [15], assumes that the hazard function is a function of a set of covariates and that the effect of the covariates on the hazard function is proportional over time.
The hazard function $h(t) = f(t)/S(t)$ of a survival process represents the instantaneous risk of the event of interest at a given time, where $f$ is the density of $T$. For the CPH model, the hazard function, conditional on a vector of covariates $\mathbf{x} = (x_1, x_2, \ldots, x_m)^\top$, has the form
$$h(t \mid \mathbf{x}) = h_0(t)\, e^{\boldsymbol{\beta}^\top \mathbf{x}},$$
where $h_0(t)$ is the baseline hazard function, and $\boldsymbol{\beta} = (\beta_1, \beta_2, \ldots, \beta_m)^\top$ is the vector of unknown regression parameters associated with $\mathbf{x}$. The conditional survival function is then given by
$$S(t \mid \mathbf{x}) = P(T > t \mid \mathbf{x}) = S_0(t)^{\exp(\boldsymbol{\beta}^\top \mathbf{x})},$$
where $S_0(t) = S(t \mid \mathbf{x} = \mathbf{0})$ is the baseline survival function.
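To illustrate the proportionality assumption, the sketch below uses a hypothetical Weibull baseline hazard (scale 2, shape 1.5) and made-up coefficients; none of these values come from the models in this study. The ratio of the hazards of two covariate profiles stays fixed at $e^{\boldsymbol{\beta}^\top(\mathbf{x}_a - \mathbf{x}_b)}$ over time, and $S(t \mid \mathbf{x})$ is obtained by raising $S_0(t)$ to the power $e^{\boldsymbol{\beta}^\top \mathbf{x}}$.

```python
import math


def baseline_survival(t, scale=2.0, shape=1.5):
    # Hypothetical Weibull baseline: S0(t) = exp(-(t / scale) ** shape)
    return math.exp(-((t / scale) ** shape))


def baseline_hazard(t, scale=2.0, shape=1.5):
    # Matching baseline hazard: h0(t) = (shape / scale) * (t / scale) ** (shape - 1)
    return (shape / scale) * (t / scale) ** (shape - 1)


def cph_hazard(t, x, beta):
    # h(t | x) = h0(t) * exp(beta^T x)
    return baseline_hazard(t) * math.exp(sum(b * xi for b, xi in zip(beta, x)))


def cph_survival(t, x, beta):
    # S(t | x) = S0(t) ** exp(beta^T x)
    return baseline_survival(t) ** math.exp(sum(b * xi for b, xi in zip(beta, x)))


beta = [0.8, -0.5]                      # hypothetical coefficients
x_a, x_b = [1.0, 0.0], [0.0, 1.0]
for t in (0.5, 1.0, 3.0):
    # The hazard ratio equals exp(beta^T (x_a - x_b)) = exp(1.3) at every t
    print(t, cph_hazard(t, x_a, beta) / cph_hazard(t, x_b, beta))
```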
The earliest studies to make use of the CPH model to model default risk include those of Banasik et al. [16], Stepanova and Thomas [17] and Cao et al. [18]; subsequently, its application to credit risk problems has increased markedly. Several authors, including Betz et al. [3], Malwandla et al. [4] and Joubert et al. [8], have since made use of Cox regression to model the post-default outcome of non-performing loans.
Notwithstanding the popularity of the CPH model, cure models have recently enjoyed increasing attention within the credit risk domain. This is particularly true for the modelling of the probability of default, which, in many high-quality loan portfolios, represents a rare event (see, e.g., [9,10,11]). Cure models, like survival models in general, originated in medicine, where Boag [19] first suggested that there exists a certain proportion of cancer patients who will never experience the event of interest, i.e., they can be considered cured. If the cure proportion is not accounted for, the estimates of the survival of susceptible subjects may be biased [20].
In early work by Berkson and Gage [21], the following simple model was proposed:
$$S(t) = \pi_0 S_c(t) + (1 - \pi_0) S_s(t), \tag{2}$$
where $S_c$ and $S_s$ are the survival functions for non-susceptible and susceptible individuals, respectively, and $\pi_0$ is the proportion of individuals considered non-susceptible, also known as the cure proportion. Note that $S_c(t) = 1$ for all $t \ge 0$, as the individuals being modelled are not susceptible to the event of interest and therefore have a probability of survival equal to one.
The model in (2) was extended by Farewell [22] to allow for covariates. Suppose that a further set of covariates $\mathbf{y}$, which may coincide with $\mathbf{x}$, is also observed. The conditional survival function for a mixture cure model is then defined as
$$S(t \mid \mathbf{x}, \mathbf{y}) = \pi_0(\mathbf{x}) + \{1 - \pi_0(\mathbf{x})\} S_s(t \mid \mathbf{y}), \tag{3}$$
where $\pi_0(\mathbf{x})$ is commonly modelled using a logistic regression model, and $S_s(t \mid \mathbf{y})$ is modelled by a CPH or accelerated failure time model. For a detailed overview of mixture cure models, the reader is referred to the review paper by Amico and van Keilegom [23].
The second type of cure model, known as the promotion time cure (PTC) model or, alternatively, the bounded cumulative hazard model, was first introduced by Yakovlev et al. [24]. The PTC model has an intuitive biological interpretation: an individual cancer patient has a number of carcinogenic cells (denoted by N) left active after initial treatment, which have the potential to activate over time and produce a significant relapse of cancer. Chen et al. [25] further proposed that an individual is at risk of relapse if he or she is exposed to at least one of the so-called latent factors, and relapse (or failure) is observed when one (or more) of these latent factors becomes activated. If the individual is not considered to be at risk, he or she is considered cured.
Formally, let $N$ be the unobservable number of causes of the event of interest (write-off in our case) for a given subject in the population. For a subject with $N \ge 1$, the event time is defined as
$$T = \min\{Z_1, Z_2, \ldots, Z_N\},$$
where $Z_j$ represents the time required for the $j$th unobservable cause to be realised, $j = 1, \ldots, N$. A subject is cured if $N = 0$, which implies that $T = \infty$, i.e., write-off will not happen. The set of observed covariates is denoted by $\mathbf{x} = (1, x_1, x_2, \ldots, x_m)^\top$. Rodrigues et al. [26] demonstrate that the long-term survival function satisfies the proportional hazards property if and only if the number of risks follows a Poisson distribution. Assume that $N$ follows a Poisson distribution with mean $\theta(\mathbf{x}) > 0$, where $\theta(\mathbf{x}) = g(\boldsymbol{\beta}^\top \mathbf{x})$ for a strictly increasing function $g$ (see, e.g., Zhang and Peng [27] and Portier et al. [28]). Furthermore, assume that $Z_1, Z_2, \ldots, Z_N$ are i.i.d. from a distribution $F_\eta$ with parameter $\eta$ (which may be vector-valued) and are independent of $N$. We then arrive at the well-known PTC model,
$$S_\eta(t \mid \mathbf{x}) = P(T > t \mid \mathbf{x}) = e^{-\theta(\mathbf{x}) F_\eta(t)}. \tag{4}$$
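The latent-cause construction behind (4) can be checked by simulation. The sketch below is an illustrative assumption rather than the authors' code: it draws $N \sim \text{Poisson}(\theta)$ with a hypothetical $\theta(\mathbf{x}) = 1.2$, takes the minimum of $N$ Weibull promotion times (scale 2, shape 1), and compares the empirical survival and cured fraction with $e^{-\theta F_\eta(t)}$ and $e^{-\theta}$. Python's standard library has no Poisson sampler, so Knuth's multiplication method is used.

```python
import math
import random


def poisson(rng, mean):
    # Knuth's multiplication method; adequate for the small means used here
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def simulate_ptc_time(rng, theta, scale=2.0, shape=1.0):
    """Draw T = min(Z_1, ..., Z_N) with N ~ Poisson(theta) and Z_j i.i.d. Weibull.

    Returns math.inf when N = 0, i.e., a cured subject that is never written off.
    """
    n_causes = poisson(rng, theta)
    if n_causes == 0:
        return math.inf
    return min(rng.weibullvariate(scale, shape) for _ in range(n_causes))


def ptc_survival(t, theta, scale=2.0, shape=1.0):
    # S(t | x) = exp(-theta * F(t)) with F(t) = 1 - exp(-(t / scale) ** shape)
    F = 1.0 - math.exp(-((t / scale) ** shape))
    return math.exp(-theta * F)


rng = random.Random(2023)
theta = 1.2                     # hypothetical value of theta(x)
draws = [simulate_ptc_time(rng, theta) for _ in range(100_000)]
for t in (0.5, 2.0, 10.0):
    empirical = sum(d > t for d in draws) / len(draws)
    print(f"t = {t}: empirical {empirical:.3f}, formula {ptc_survival(t, theta):.3f}")
cured = sum(math.isinf(d) for d in draws) / len(draws)
print(f"cured fraction {cured:.3f}, exp(-theta) = {math.exp(-theta):.3f}")
```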
For a short derivation of (4), the interested reader is referred to the work of Chen et al. [25]. It is clear from (4) that the cure fraction (i.e., the proportion of subjects not susceptible to write-off) is given by
$$\lim_{t \to \infty} S_\eta(t \mid \mathbf{x}) = e^{-\theta(\mathbf{x})} = \pi_0(\mathbf{x}).$$
To obtain an expression similar to (3), the survival function in (4) can be decomposed as
$$S_\eta(t \mid \mathbf{x}) = \pi_0(\mathbf{x}) + \{1 - \pi_0(\mathbf{x})\} S^{s}_\eta(t \mid \mathbf{x}),$$
where
$$S^{s}_\eta(t \mid \mathbf{x}) = \frac{e^{-\theta(\mathbf{x}) F_\eta(t)} - \pi_0(\mathbf{x})}{1 - \pi_0(\mathbf{x})}$$
represents the survival function of those who are susceptible to write-off. Note that $S^{s}_\eta(t \mid \mathbf{x})$ is a proper survival function in that $\lim_{t \to \infty} S^{s}_\eta(t \mid \mathbf{x}) = 0$.
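A short numerical sketch of this decomposition, with a hypothetical Weibull $F_\eta$ (scale 2, shape 1) and $\theta(\mathbf{x}) = 1.2$: the mixture form $\pi_0 + (1 - \pi_0) S^{s}_\eta$ reproduces $S_\eta$, while $S^{s}_\eta$ starts at 1 and tends to 0 as $t$ grows.

```python
import math


def weibull_cdf(t, scale=2.0, shape=1.0):
    # Hypothetical choice of F_eta: F(t) = 1 - exp(-(t / scale) ** shape)
    return 1.0 - math.exp(-((t / scale) ** shape))


def ptc_components(t, theta):
    """Return (S, pi0, S_s) for the PTC model without frailty."""
    S = math.exp(-theta * weibull_cdf(t))   # population survival, equation (4)
    pi0 = math.exp(-theta)                  # cure fraction, limit of S as t grows
    S_s = (S - pi0) / (1.0 - pi0)           # survival of the susceptible subjects
    return S, pi0, S_s


theta = 1.2   # hypothetical value of theta(x)
for t in (0.0, 1.0, 5.0, 50.0):
    S, pi0, S_s = ptc_components(t, theta)
    # The mixture form pi0 + (1 - pi0) * S_s reproduces S
    print(f"t = {t}: S = {S:.4f}, mixture = {pi0 + (1 - pi0) * S_s:.4f}, S_s = {S_s:.4f}")
```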
The distribution function F η can be specified parametrically, e.g., by considering an exponential, Weibull or gamma distribution or, alternatively, left completely unspecified, resulting in a semiparametric PTC model. For an in-depth discussion relating to the estimation and inference of (4), the reader is referred to the review paper by Amico and van Keilegom [23].
We argue that the PTC model is also interpretable in the post-default loan resolution context. There exist several obligor- and loan-specific factors that may influence a loan’s probability of being written off. Some of these factors are observable and can be included in a model as covariates, but some factors, for example, an individual’s level of discretionary expenditure and undisclosed debt, are latent and therefore analogous to the carcinogenic cells modelled by Chen et al. [25]. There also exists a proportion of defaulted loans for which the outstanding amounts will be fully recovered and which are therefore not exposed to write-off. In our context, these loans can be considered “cured”.
de Oliveira and Louzada [5] made use of a PTC model to model the recovery process of defaulted loans within the context of latent competing risks. The event of interest they consider is full loan recovery, and the cure proportion represents the proportion of obligors that are not susceptible to fully repaying the outstanding loan amount.
The time to write-off or full recovery may, in addition to latent competing risks, also be influenced by common, unobservable drivers, such as the state of the economy. Within the credit risk environment and specifically in modelling the time to default, several authors have accounted for these unobservable factors by including a frailty parameter within the CPH framework (see e.g., [29,30,31]). An advantageous feature of the CPH model is the ease with which a frailty parameter can be included by multiplication with the baseline hazard function. A frailty parameter is a random variable that can account for unobserved heterogeneity in a population [32]. With the univariate frailty model, the frailty term is assumed to be a random variable that affects the failure times of individuals independently. In a shared frailty model, on the other hand, the frailty term is assumed to be shared among the individuals within a cluster or group [33]. For the CPH model, the frailty parameter is incorporated as follows:
$$h(t \mid \mathbf{x}, W) = W h_0(t)\, e^{\boldsymbol{\beta}^\top \mathbf{x}},$$
where the frailty term $W$ is a random variable with non-negative support, for example, a gamma, inverse Gaussian (IG) or log-normal random variable. The conditional survival function of the CPH model, given the frailty $W = w$, is
$$S(t \mid \mathbf{x}, w) = P(T > t \mid \mathbf{x}, W = w) = S_0(t)^{w \exp(\boldsymbol{\beta}^\top \mathbf{x})}.$$
In the context of the post-default loan resolution process, Betz et al. [3] accounted for unobservable systematic effects in the comovement of the resolution times of defaulted loans by including a random frailty parameter in the CPH model.
We propose including a frailty component in the PTC model of Chen et al. [25] to model the time to write-off. The frailty parameter, which is assumed to have a parametric form, can control for common unobservable factors, as well as for possible observable covariates not included in the model. The PTC model with an IG frailty was used by Barriga et al. [13] in a survival study of colorectal cancer patients, which is, to the best of our knowledge, the only study to include a frailty effect within the PTC framework.

3. Proposed Model: The Promotion Time Cure Model with Parametric Frailty

To account for the heterogeneity in the time to write-off of individual obligors, we propose an extension of the model in (4) by including a parametric frailty component. We opt for a parametric model due to its interpretability and analytical tractability. These characteristics are highly valued in the bank risk management environment, where most models are subjected to intense regulatory scrutiny. A further benefit of parametric models is the ease with which they can be subjected to stress and sensitivity testing.
We first consider the univariate case and include a gamma distributed frailty parameter as follows. Assume that $N$ has a Poisson distribution with parameter $W\theta(\mathbf{x})$, where $W$ is an unobserved gamma distributed random variable with parameter $\sigma^2 > 0$ and density function
$$f_{\sigma^2}(w) = \frac{(1/\sigma^2)^{1/\sigma^2}}{\Gamma(1/\sigma^2)}\, w^{1/\sigma^2 - 1} e^{-w/\sigma^2}, \quad w > 0.$$
Li et al. [34] showed that the model is identifiable if the distribution function of $W$ is specified. In this case, $E[W] = 1$, which ensures that the model is identifiable, and $\mathrm{Var}[W] = \sigma^2$.
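From (4), we derive the conditional survival function, given $W = w$, as
$$\tilde{S}_\eta(t \mid \mathbf{x}, w) = P(T > t \mid \mathbf{x}, W = w) = e^{-w\theta(\mathbf{x}) F_\eta(t)}. \tag{8}$$
The unconditional survival function is then
$$\begin{aligned} \tilde{S}_\eta(t \mid \mathbf{x}) &= \int_0^\infty \tilde{S}_\eta(t \mid \mathbf{x}, w) f_W(w)\, dw = \int_0^\infty e^{-w\theta(\mathbf{x}) F_\eta(t)} f_W(w)\, dw \\ &= \varphi_W\{\theta(\mathbf{x}) F_\eta(t)\} = \{1 + \sigma^2 \theta(\mathbf{x}) F_\eta(t)\}^{-1/\sigma^2}, \end{aligned} \tag{9}$$
where $\varphi_W(s) = E[e^{-sW}]$ is the Laplace transform of the frailty $W$.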
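The closed form in (9) rests on the gamma Laplace transform $\varphi_W(s) = (1 + \sigma^2 s)^{-1/\sigma^2}$. The Monte Carlo sketch below (an illustration, with $\sigma^2 = 0.5$ chosen arbitrarily) draws $W$ from a gamma distribution with shape $1/\sigma^2$ and scale $\sigma^2$ using Python's `random.gammavariate(alpha, beta)` (shape, scale), confirms $E[W] \approx 1$ and $\mathrm{Var}[W] \approx \sigma^2$, and compares the empirical $E[e^{-sW}]$ with the closed form; $s$ plays the role of $\theta(\mathbf{x}) F_\eta(t)$.

```python
import math
import random


def gamma_frailty_sample(rng, sigma2):
    # Gamma with shape 1/sigma2 and scale sigma2, so E[W] = 1 and Var[W] = sigma2
    return rng.gammavariate(1.0 / sigma2, sigma2)


def gamma_laplace(s, sigma2):
    # Closed-form Laplace transform: E[exp(-s W)] = (1 + sigma2 * s) ** (-1 / sigma2)
    return (1.0 + sigma2 * s) ** (-1.0 / sigma2)


rng = random.Random(7)
sigma2 = 0.5
w = [gamma_frailty_sample(rng, sigma2) for _ in range(200_000)]
mean = sum(w) / len(w)
var = sum((wi - mean) ** 2 for wi in w) / (len(w) - 1)
print(f"mean {mean:.3f} (target 1), variance {var:.3f} (target {sigma2})")

for s in (0.5, 2.0, 5.0):   # s plays the role of theta(x) * F_eta(t)
    mc = sum(math.exp(-s * wi) for wi in w) / len(w)
    print(f"s = {s}: Monte Carlo {mc:.4f}, closed form {gamma_laplace(s, sigma2):.4f}")
```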
It is clear from (9) that
$$\lim_{t \to \infty} \tilde{S}_\eta(t \mid \mathbf{x}) = \{1 + \sigma^2 \theta(\mathbf{x})\}^{-1/\sigma^2} = \pi_0(\mathbf{x}), \tag{10}$$
where $\pi_0$ is the cure fraction of the PTC model with gamma frailty, and $\sigma^2$ can be seen as a heterogeneity parameter.
After some algebraic manipulation, (9) can be expressed as
$$\tilde{S}_\eta(t \mid \mathbf{x}) = \pi_0(\mathbf{x}) + \{1 - \pi_0(\mathbf{x})\} \tilde{S}^{s}_\eta(t \mid \mathbf{x}), \tag{11}$$
where
$$\tilde{S}^{s}_\eta(t \mid \mathbf{x}) = \frac{\{1 + \sigma^2 \theta(\mathbf{x}) F_\eta(t)\}^{-1/\sigma^2} - \pi_0(\mathbf{x})}{1 - \pi_0(\mathbf{x})} \tag{12}$$
represents the survival function of those who are susceptible to write-off. Note that $\tilde{S}^{s}_\eta(t \mid \mathbf{x})$ is a proper survival function in the sense that $\lim_{t \to \infty} \tilde{S}^{s}_\eta(t \mid \mathbf{x}) = 0$ and $\lim_{t \to 0} \tilde{S}^{s}_\eta(t \mid \mathbf{x}) = 1$. From (11), it is clear that the survival function of the non-susceptible individuals is 1, whereas the proper survival function of those who are susceptible is a function of the frailty parameter $\sigma^2$. Moreover, from (10), the cure fraction itself also depends on the frailty parameter.
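The dependence of the cure fraction on $\sigma^2$ can be made concrete. For a fixed $\theta(\mathbf{x})$, $\log \pi_0 = -\sigma^{-2}\log(1 + \sigma^2\theta)$ increases with $\sigma^2$, and $\pi_0 \to e^{-\theta(\mathbf{x})}$ as $\sigma^2 \to 0$, recovering the cure fraction of the PTC model without frailty. The sketch below evaluates (10) and (12) for a hypothetical $\theta(\mathbf{x}) = 2$ and a hypothetical Weibull $F_\eta$ with scale 2 and shape 1.

```python
import math


def weibull_cdf(t, scale=2.0, shape=1.0):
    # Hypothetical F_eta: F(t) = 1 - exp(-(t / scale) ** shape)
    return 1.0 - math.exp(-((t / scale) ** shape))


def gamma_ptc_cure(theta, sigma2):
    # Cure fraction (10): pi0 = (1 + sigma2 * theta) ** (-1 / sigma2)
    return (1.0 + sigma2 * theta) ** (-1.0 / sigma2)


def gamma_ptc_susceptible_survival(t, theta, sigma2, cdf=weibull_cdf):
    # Susceptible survival (12): [(1 + sigma2*theta*F(t))**(-1/sigma2) - pi0] / (1 - pi0)
    pi0 = gamma_ptc_cure(theta, sigma2)
    S = (1.0 + sigma2 * theta * cdf(t)) ** (-1.0 / sigma2)
    return (S - pi0) / (1.0 - pi0)


theta = 2.0   # hypothetical value of theta(x)
for sigma2 in (1e-6, 0.25, 0.5, 1.0):
    print(f"sigma2 = {sigma2:g}: cure fraction {gamma_ptc_cure(theta, sigma2):.4f}")
print(f"PTC without frailty, exp(-theta) = {math.exp(-theta):.4f}")
```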

4. Estimation and Simulation

In this section, we discuss estimation of the parametric PTC model with a gamma frailty parameter using maximum likelihood estimation (MLE). We illustrate the small sample performance of the MLEs through a simulation study.

4.1. Maximum Likelihood Estimation of the PTC Model with Gamma Frailty

For the $j$th individual, $j = 1, 2, \ldots, n$, we observe the triplet $(Y_j, \delta_j, \mathbf{x}_j)$, where $Y_j = \min(T_j, C_j)$ is the $j$th observed time, $\delta_j = I(T_j \le C_j)$ is the corresponding event indicator, $T_j$ is the time to write-off, $C_j$ is the censoring time and $\mathbf{x}_j = (1, x_{1j}, x_{2j}, \ldots, x_{mj})^\top$ is the vector of $m$ covariates (together with an intercept term) associated with the $j$th individual. Throughout, we assume that the model specified in (8) is a parametric model, i.e., we assume that $F_\eta$ has a parametric form with an unknown parameter $\eta$ and that $\theta$ is related to the covariates by the relationship $\theta_{\boldsymbol{\beta}}(\mathbf{x}_j) = e^{\boldsymbol{\beta}^\top \mathbf{x}_j}$, where $\boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_m)^\top$ is a vector of unknown regression parameters.
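The likelihood function is given by
$$L = \prod_{j=1}^n \tilde{S}_\eta(Y_j \mid \mathbf{x}_j)\, h(Y_j \mid \mathbf{x}_j)^{\delta_j} = \prod_{j=1}^n \tilde{S}_\eta(Y_j \mid \mathbf{x}_j) \left\{-\frac{d}{dt} \log \tilde{S}_\eta(t \mid \mathbf{x}_j)\Big|_{t = Y_j}\right\}^{\delta_j},$$
where $\tilde{S}_\eta(t \mid \mathbf{x}_j)$ is given in (9). The log-likelihood of the PTC model with gamma frailty, with $f_\eta$ the density corresponding to $F_\eta$, is therefore
$$\ell(\boldsymbol{\beta}, \eta, \sigma^2) = -\sum_{j=1}^n \frac{1}{\sigma^2} \log\{1 + \sigma^2 \theta_{\boldsymbol{\beta}}(\mathbf{x}_j) F_\eta(Y_j)\} + \sum_{j=1}^n \delta_j \log\left\{\frac{\theta_{\boldsymbol{\beta}}(\mathbf{x}_j) f_\eta(Y_j)}{1 + \sigma^2 \theta_{\boldsymbol{\beta}}(\mathbf{x}_j) F_\eta(Y_j)}\right\}. \tag{13}$$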
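A direct translation of (13) into Python, assuming a Weibull $F_\eta$ with the $(t/\lambda)^k$ parameterisation used in the simulation study and $\theta_{\boldsymbol{\beta}}(\mathbf{x}) = e^{\boldsymbol{\beta}^\top \mathbf{x}}$. It is a sketch for exposition, not the authors' R implementation; the toy data and parameter values are hypothetical. Each observation contributes $\log \tilde{S}_\eta(Y_j \mid \mathbf{x}_j)$, plus the log hazard $\log\{\theta f_\eta / (1 + \sigma^2 \theta F_\eta)\}$ when the write-off is observed.

```python
import math


def weibull_cdf(t, lam, k):
    # F(t) = 1 - exp(-(t / lam) ** k)
    return 1.0 - math.exp(-((t / lam) ** k))


def weibull_pdf(t, lam, k):
    # f(t) = (k / lam) * (t / lam) ** (k - 1) * exp(-(t / lam) ** k)
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))


def loglik_gamma_ptc(beta, lam, k, sigma2, data):
    """Log-likelihood (13): PTC model with gamma frailty and Weibull F_eta.

    data: iterable of (y, delta, x), where x = (1, x_1, ..., x_m) includes
    the intercept term, so that theta = exp(beta^T x).
    """
    ll = 0.0
    for y, delta, x in data:
        theta = math.exp(sum(b * xi for b, xi in zip(beta, x)))
        g = 1.0 + sigma2 * theta * weibull_cdf(y, lam, k)
        ll -= math.log(g) / sigma2                  # log S~(y | x)
        if delta:                                   # observed write-off: add log hazard
            ll += math.log(theta * weibull_pdf(y, lam, k) / g)
    return ll


# Hypothetical toy data (y, delta, x) and parameter values, for illustration only
data = [(0.3, 1, (1.0, 0.5, 1.0)), (0.9, 0, (1.0, 2.0, 0.0)), (0.6, 1, (1.0, 1.0, 1.0))]
print(loglik_gamma_ptc((0.5, -1.0, 2.0), 2.0, 1.3, 0.5, data))
```

In practice this function would be maximised over all parameters (the study uses Nelder–Mead through R's `optim`). Optimising over the logarithms of $\lambda$, $k$ and $\sigma^2$ is a common way to keep them positive, although the study does not state how the constraints were handled.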
For mixture cure models, a popular approach to estimate model parameters is to apply the EM algorithm. To this end, the mixture cure representation of the model in (11) is used. Recall that in a mixture cure model,
$$S_s(t \mid \mathbf{x}) = P(T > t \mid B = 1, \mathbf{x}),$$
where $B$ is the cure status (or susceptibility) indicator: $B = 1$ indicates that an individual is susceptible to the event of interest (i.e., uncured) and $B = 0$ corresponds to individuals who are non-susceptible (i.e., cured). For the proposed model, (10) indicates that
$$P(B = 0 \mid \mathbf{x}) = \pi_0(\mathbf{x}) = \{1 + \sigma^2 \theta(\mathbf{x})\}^{-1/\sigma^2}.$$
Suppose that for the $j$th individual, $j = 1, 2, \ldots, n$, we observe not only the triplet $(Y_j, \delta_j, \mathbf{x}_j)$ but also $B_j$, the indicator showing whether an observation is susceptible or not. It is then possible to define the complete-data likelihood (see [35]) as
$$L = \prod_{j=1}^n \left[\{1 - \pi_0(\mathbf{x}_j)\} \tilde{h}^{s}_\eta(Y_j \mid \mathbf{x}_j)\, \tilde{S}^{s}_\eta(Y_j \mid \mathbf{x}_j)\right]^{\delta_j B_j} \times \prod_{j=1}^n \pi_0(\mathbf{x}_j)^{1 - B_j} \left[\{1 - \pi_0(\mathbf{x}_j)\} \tilde{S}^{s}_\eta(Y_j \mid \mathbf{x}_j)\right]^{B_j (1 - \delta_j)},$$
where $\pi_0(\mathbf{x}_j)$ and $\tilde{S}^{s}_\eta$ are given in (10) and (12), respectively, while
$$\tilde{h}^{s}_\eta(t \mid \mathbf{x}_j) = -\frac{d}{dt} \log \tilde{S}^{s}_\eta(t \mid \mathbf{x}_j)$$
represents the hazard function of those who are susceptible to write-off.
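This likelihood can be rewritten as
$$L = \prod_{j=1}^n \{1 - \pi_0(\mathbf{x}_j)\}^{\delta_j B_j}\, \pi_0(\mathbf{x}_j)^{(1 - \delta_j)(1 - B_j)} \{1 - \pi_0(\mathbf{x}_j)\}^{(1 - \delta_j) B_j} \times \prod_{j=1}^n \left\{\tilde{h}^{s}_\eta(Y_j \mid \mathbf{x}_j)\, \tilde{S}^{s}_\eta(Y_j \mid \mathbf{x}_j)\right\}^{\delta_j B_j} \tilde{S}^{s}_\eta(Y_j \mid \mathbf{x}_j)^{(1 - \delta_j) B_j}.$$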
For the mixture cure model in (3), the survival function of the susceptible individuals, $S_s(t \mid \mathbf{y})$, is not a direct function of the cure fraction $\pi_0$. Unfortunately, this is not the case for the survival function $\tilde{S}^{s}_\eta(t \mid \mathbf{x})$ when the PTC model is written in its mixture cure form, as can be seen from (12). Consequently, unlike for the “traditional” mixture cure model, the likelihood presented above cannot be split into two separate components, namely a latency part and an incidence part, which could then be maximised separately (see, e.g., [9,36]). We therefore settle for the conventional method to obtain maximum likelihood estimators for the parameters of the PTC model with gamma frailty, i.e., we find the set of parameter values ($\boldsymbol{\beta}$, $\eta$ and $\sigma^2$) that maximises the log-likelihood function in (13). The required optimisation was performed by applying the Nelder–Mead algorithm, which is available through the optim function of the R statistical software package [37].
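Following Barriga et al. [13], we also consider a PTC model with IG frailty. In this case, the log-likelihood function is given by
$$\ell(\boldsymbol{\beta}, \lambda, k, \zeta) = \frac{n}{\zeta} + \sum_{j=1}^n \delta_j \log \theta_{\boldsymbol{\beta}}(\mathbf{x}_j) + \sum_{j=1}^n \delta_j \log f_\eta(Y_j) - \frac{1}{2} \sum_{j=1}^n \delta_j \log\{1 + 2\zeta \theta_{\boldsymbol{\beta}}(\mathbf{x}_j) F_\eta(Y_j)\} - \frac{1}{\zeta} \sum_{j=1}^n \sqrt{1 + 2\zeta \theta_{\boldsymbol{\beta}}(\mathbf{x}_j) F_\eta(Y_j)}, \tag{14}$$
where $\zeta$ is the frailty parameter and $\eta = (\lambda, k)^\top$ are the parameters of the Weibull distribution defined below.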
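A matching sketch of the IG-frailty log-likelihood. It assumes that the IG frailty has mean 1 and Laplace transform $\varphi(s) = \exp\{(1 - \sqrt{1 + 2\zeta s})/\zeta\}$, which is the parameterisation that yields the terms of (14); this is our reading rather than something the text states. Under it, $\log \tilde{S}_\eta = \{1 - \sqrt{1 + 2\zeta\theta F_\eta}\}/\zeta$ and the hazard is $\theta f_\eta / \sqrt{1 + 2\zeta\theta F_\eta}$, so the density $f_\eta$ enters the event terms. The Weibull $F_\eta$ and the toy inputs are, as before, illustrative.

```python
import math


def weibull_cdf(t, lam, k):
    # F(t) = 1 - exp(-(t / lam) ** k)
    return 1.0 - math.exp(-((t / lam) ** k))


def weibull_pdf(t, lam, k):
    # f(t) = (k / lam) * (t / lam) ** (k - 1) * exp(-(t / lam) ** k)
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))


def loglik_ig_ptc(beta, lam, k, zeta, data):
    """Log-likelihood (14): PTC model with (assumed mean-1) IG frailty, Weibull F_eta.

    data: iterable of (y, delta, x) with x = (1, x_1, ..., x_m).
    """
    n, ll = 0, 0.0
    for y, delta, x in data:
        n += 1
        theta = math.exp(sum(b * xi for b, xi in zip(beta, x)))
        g = 1.0 + 2.0 * zeta * theta * weibull_cdf(y, lam, k)
        ll -= math.sqrt(g) / zeta                   # together with n / zeta: log S~(y | x)
        if delta:                                   # log hazard: log(theta f / sqrt(g))
            ll += math.log(theta) + math.log(weibull_pdf(y, lam, k)) - 0.5 * math.log(g)
    return n / zeta + ll


# Hypothetical toy data (y, delta, x) and parameter values, for illustration only
data = [(0.3, 1, (1.0, 0.5, 1.0)), (0.9, 0, (1.0, 2.0, 0.0)), (0.6, 1, (1.0, 1.0, 1.0))]
print(loglik_ig_ptc((0.5, -1.0, 2.0), 2.0, 1.3, 0.7, data))
```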
In the simulation study, we take $F_\eta$ to be either the Weibull or the linear failure rate (LFR) distribution. In the case of the Weibull distribution, $\eta = (\lambda, k)^\top$, where $\lambda > 0$ is a scale parameter and $k > 0$ is a shape parameter, with distribution function
$$F_{\lambda,k}(t) = 1 - e^{-(t/\lambda)^k}, \quad t \ge 0,$$
and density function
$$f_{\lambda,k}(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1} e^{-(t/\lambda)^k}, \quad t \ge 0.$$
In the case of the LFR distribution, $\eta = (a, b)^\top$, where $a > 0$ and $b > 0$ are shape parameters, and the distribution function is given by
$$F_{a,b}(t) = 1 - e^{-at - \frac{b}{2}t^2}, \quad t \ge 0,$$
with density function
$$f_{a,b}(t) = (a + bt)\, e^{-at - \frac{b}{2}t^2}, \quad t \ge 0.$$
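The two choices of $F_\eta$ can be coded directly from the formulas above. The inverse functions are what an inverse-transform simulation (such as the one in Section 4.2) needs; for the LFR distribution, $F_{a,b}^{-1}(u)$ is the positive root of $\frac{b}{2}t^2 + at + \log(1 - u) = 0$. This is an illustrative sketch in standard-library Python.

```python
import math


def weibull_cdf(t, lam, k):
    return 1.0 - math.exp(-((t / lam) ** k))


def weibull_pdf(t, lam, k):
    return (k / lam) * (t / lam) ** (k - 1) * math.exp(-((t / lam) ** k))


def weibull_ppf(u, lam, k):
    # Solve 1 - exp(-(t / lam) ** k) = u for t
    return lam * (-math.log1p(-u)) ** (1.0 / k)


def lfr_cdf(t, a, b):
    return 1.0 - math.exp(-a * t - 0.5 * b * t * t)


def lfr_pdf(t, a, b):
    # Hazard a + b*t multiplied by the survival function
    return (a + b * t) * math.exp(-a * t - 0.5 * b * t * t)


def lfr_ppf(u, a, b):
    # Positive root of (b/2) t^2 + a t - c = 0 with c = -log(1 - u)
    c = -math.log1p(-u)
    return (-a + math.sqrt(a * a + 2.0 * b * c)) / b


# Round trip at the simulation-study values (lambda, k) = (2, 1) and (a, b) = (1.5, 2)
for u in (0.1, 0.5, 0.9):
    print(u, weibull_cdf(weibull_ppf(u, 2.0, 1.0), 2.0, 1.0), lfr_cdf(lfr_ppf(u, 1.5, 2.0), 1.5, 2.0))
```

The LFR hazard is $a + bt$, so with $b > 0$ the risk of write-off rises linearly over time, whereas the Weibull with $k = 1$ reduces to an exponential distribution with a constant hazard.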

4.2. Simulation Study

To evaluate the performance of the MLEs based on the log-likelihood given in (13) (and, for comparison purposes, the log-likelihood given in (14)), we conducted a simulation study as follows. Two covariates are used: the first, $x_{j1}$, $j = 1, \ldots, n$, is generated from a standard exponential distribution, whereas the second, $x_{j2}$, $j = 1, \ldots, n$, is generated from a Bernoulli distribution with a probability of success of 0.6. These covariates remain fixed throughout the Monte Carlo simulation study. The cure proportion is given by
$$p_j = \{1 + \sigma^2 \theta_{\boldsymbol{\beta}}(\mathbf{x}_j)\}^{-1/\sigma^2},$$
where
$$\theta_{\boldsymbol{\beta}}(\mathbf{x}_j) = e^{\boldsymbol{\beta}^\top \mathbf{x}_j} = e^{\beta_0 + \beta_1 x_{j1} + \beta_2 x_{j2}}.$$
As mentioned earlier, we take $F_\eta$ to be either the Weibull or the LFR distribution. In the former case, $\eta = (\lambda, k)^\top = (2, 1)^\top$, and in the latter case, $\eta = (a, b)^\top = (1.5, 2)^\top$. Censoring times $C_j$, $j = 1, \ldots, n$, are sampled from a standard uniform distribution, and the frailty parameter, $\sigma^2$, is set to 0.5. In order to control the cured and censoring proportions, we consider two different values for the regression coefficient vector, namely $\boldsymbol{\beta} = (2, -1, 2)^\top$ and $\boldsymbol{\beta} = (0.5, -1, 2)^\top$. The first calibration (in conjunction with the parameter choices for $F_\eta$) results in an average cure proportion of 10% and a 20% censoring proportion, while the second results in an average cure proportion of 30%, with a 45% censoring proportion. The $j$th resolution time $T_j$ from a PTC model with gamma frailty is generated as follows:
  • Generate $B_j$ from a Bernoulli distribution:
    $$B_j = \begin{cases} 1 & \text{with probability } 1 - p_j, \\ 0 & \text{with probability } p_j; \end{cases}$$
  • If $B_j = 0$, then set $T_j = \infty$;
  • If $B_j = 1$, generate $U_j \sim U[0, 1]$, and calculate
    $$T_j = F_\eta^{-1}\left(\frac{\{U_j(1 - p_j) + p_j\}^{-\sigma^2} - 1}{\sigma^2 \theta_{\boldsymbol{\beta}}(\mathbf{x}_j)}\right),$$
    where $F_\eta^{-1}$ is the inverse cumulative distribution function;
  • Generate censoring times $C_j \sim U[0, 1]$;
  • The simulated data are then $(Y_j, \delta_j, x_{j1}, x_{j2})$, $j = 1, \ldots, n$, where $Y_j = \min(T_j, C_j)$ and $\delta_j = I(T_j \le C_j)$.
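The generation steps above can be sketched in standard-library Python as follows, with a Weibull $F_\eta$, $\eta = (2, 1)^\top$, $\sigma^2 = 0.5$ and $\boldsymbol{\beta} = (2, -1, 2)^\top$. It is an illustration, not the authors' code; for simplicity it redraws the covariates on each call, whereas the study keeps them fixed across replications. With $\beta_1 = -1$ the expected cure proportion is about 0.106 (about 0.304 for $\boldsymbol{\beta} = (0.5, -1, 2)^\top$), in line with the 10% and 30% quoted above. The censoring proportion also depends on how $\lambda$ enters $F_\eta$, so the sketch prints it rather than checking it.

```python
import math
import random


def weibull_ppf(u, lam, k):
    # Inverse of F(t) = 1 - exp(-(t / lam) ** k)
    return lam * (-math.log1p(-u)) ** (1.0 / k)


def simulate_gamma_ptc(n, beta, sigma2, lam, k, seed=0):
    """Simulate rows (Y, delta, x1, x2) from the PTC model with gamma frailty.

    Returns the rows and the realised proportion of cured subjects (B_j = 0).
    """
    rng = random.Random(seed)
    rows, n_cured = [], 0
    for _ in range(n):
        x1 = rng.expovariate(1.0)                      # standard exponential
        x2 = 1 if rng.random() < 0.6 else 0            # Bernoulli(0.6)
        theta = math.exp(beta[0] + beta[1] * x1 + beta[2] * x2)
        p = (1.0 + sigma2 * theta) ** (-1.0 / sigma2)  # cure probability p_j
        if rng.random() < p:                           # B_j = 0: cured, T_j = infinity
            t = math.inf
            n_cured += 1
        else:                                          # B_j = 1: susceptible
            u = 1.0 - rng.random()                     # U_j in (0, 1], avoids u = 0
            arg = ((u * (1.0 - p) + p) ** (-sigma2) - 1.0) / (sigma2 * theta)
            t = math.inf if arg >= 1.0 else weibull_ppf(arg, lam, k)
        c = rng.random()                               # C_j ~ U[0, 1]
        rows.append((min(t, c), 1 if t <= c else 0, x1, x2))
    return rows, n_cured / n


rows, cure = simulate_gamma_ptc(50_000, (2.0, -1.0, 2.0), 0.5, 2.0, 1.0, seed=1)
censored = sum(1 - d for _, d, _, _ in rows) / len(rows)
print(f"cured: {cure:.3f}, censored: {censored:.3f}")
```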
All calculated values are based on 1000 independent Monte Carlo replications for samples of size n = 200 , 300 , 500 and 1000.
Table 1 contains the approximate average values of the MLEs, as well as the corresponding standard errors (given in parentheses), for the PTC model with either the gamma or IG frailty and $F_\eta$ as either a Weibull or LFR distribution. The average cure fraction is 10%, and the censoring proportion is 20%. Table 2 shows the results for a 30% cure fraction and a 45% censoring proportion, with all other aspects identical to the set-up of Table 1.
Table 1 and Table 2 show the properties that we expect from MLEs, namely that they are asymptotically unbiased and consistent: the average estimates approach the true values of the parameters, and the standard errors decrease, as the sample size increases. Table 1 shows that for the IG frailty parameter $\zeta$, the bias and the standard errors of the estimates are greater for smaller samples when $F_\eta$ is the Weibull distribution than when it is the LFR distribution. We also observe a slower rate of convergence in this case. The same holds for the rate of convergence in the case of a higher cure fraction and censoring percentage, as observed in Table 2. When $F_\eta$ is the LFR distribution, larger standard errors are observed for smaller sample sizes ($n = 200$ and $n = 300$). Overall, the results demonstrate that the MLEs based on the log-likelihood for the PTC model with gamma frailty are accurate in small samples, especially when compared with those of the PTC model with IG frailty. Although not shown in this paper, confidence intervals for the parameters can be estimated using the following bootstrap algorithm, which makes use of the “cases bootstrap” approach discussed in [38]:
  • Resample $(Y_1^*, \delta_1^*, \mathbf{x}_1^*), \ldots, (Y_n^*, \delta_n^*, \mathbf{x}_n^*)$ i.i.d. from the empirical distribution function of $(Y_1, \delta_1, \mathbf{x}_1), \ldots, (Y_n, \delta_n, \mathbf{x}_n)$;
  • Fit the model in (9) on the resampled data to obtain estimates $\hat{\vartheta}^* = (\hat{\boldsymbol{\beta}}^*, \hat{\eta}^*, \hat{\sigma}^{2*})^\top$. Denote this first set of estimates by $\hat{\vartheta}^{*(1)}$;
  • Repeat steps 1 and 2 $B$ times to obtain $\hat{\vartheta}^{*(1)}, \hat{\vartheta}^{*(2)}, \ldots, \hat{\vartheta}^{*(B)}$.
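Now, suppose that we want to construct a confidence interval for the parameter $\beta_1$. From $\hat{\vartheta}^{*(1)}, \hat{\vartheta}^{*(2)}, \ldots, \hat{\vartheta}^{*(B)}$, we have the bootstrap estimates $\hat{\beta}_1^{*(1)}, \ldots, \hat{\beta}_1^{*(B)}$, from which a confidence interval can be obtained, for example, the percentile interval whose end points are the empirical $\alpha/2$ and $1 - \alpha/2$ quantiles of these estimates. Alternatively, one can make use of the large-sample properties of MLEs to construct a confidence interval (see, e.g., [13]).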
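A sketch of the cases bootstrap with a percentile interval, in standard-library Python. Refitting (9) on every resample is beyond a short example, so the `estimator` argument is a stand-in: in the usage lines it is simply the observed write-off proportion on hypothetical data, a placeholder for a function that would refit the model and return, say, $\hat{\beta}_1$.

```python
import random


def cases_bootstrap_ci(data, estimator, n_boot=1000, alpha=0.05, seed=0):
    """Percentile confidence interval from the cases bootstrap.

    data: list of observed cases, e.g. tuples (y, delta, x)
    estimator: callable mapping a list of cases to a scalar estimate
    """
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Step 1: draw n whole cases with replacement from the empirical distribution
        resample = [data[rng.randrange(n)] for _ in range(n)]
        # Step 2: re-estimate on the resampled data
        estimates.append(estimator(resample))
    # Percentile interval: empirical alpha/2 and 1 - alpha/2 quantiles
    estimates.sort()
    lower = estimates[int(alpha / 2 * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical data (y, delta) and a placeholder estimator (not the PTC MLE)
cases = [(1.0, 1 if i % 5 < 3 else 0) for i in range(200)]
event_rate = lambda d: sum(delta for _, delta in d) / len(d)
print(cases_bootstrap_ci(cases, event_rate))
```

Resampling whole cases keeps each $(Y_j, \delta_j, \mathbf{x}_j)$ together, so the dependence between the observed time, the censoring indicator and the covariates is preserved in every bootstrap sample.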

5. Application

In this section, we analyse whether the proposed models, as described in Section 3, are able to account for common unobservable factors by fitting them to two datasets. Before fitting the frailty models to a real-world dataset, we apply the models to a simulated dataset, where we simulate an “unobservable” variable not directly accounted for in our frailty models. We then proceed to apply the proposed models to a real-world loss dataset obtained from Global Credit Data (GCD).

5.1. Simulated Dataset

In order to further investigate the performance of the PTC model with gamma frailty, we simulate a single dataset $(Y_j, \delta_j, x_{j1}, x_{j2})$, $j = 1, \ldots, n$, of size $n = 1000$ from a PTC model without frailty, i.e., from the model in (4). The two covariates are generated as in Section 4.2, whereas the cure proportion is now given by $p_j = e^{-\theta_{\boldsymbol{\beta}}(\mathbf{x}_j)}$, where
$$\theta_{\boldsymbol{\beta}}(\mathbf{x}_j) = e^{\boldsymbol{\beta}^\top \mathbf{x}_j} = e^{\beta_0 + \beta_1 x_{j1} + \beta_2 x_{j2}}$$
with $\boldsymbol{\beta} = (2, -1, 2)^\top$. The distribution $F_\eta$ is taken to be the Weibull distribution with the same parameters as in Section 4.2, and censoring times are again assumed to follow a standard uniform distribution. With this configuration, our simulated dataset has a non-susceptible percentage of 6.4%, and 16.2% of the resolution times are censored.
The second covariate, $x_{j2}$, is now dropped from the simulated dataset so that we only “observe” the set $(Y_j, \delta_j, x_{j1})$, $j = 1, \ldots, n$. On this observed dataset, we fit three different models:
  • The PTC model with no frailty in (4);
  • The PTC model in (8) with gamma frailty and frailty parameter $\sigma^2$;
  • The PTC model in (8) with IG frailty and frailty parameter $\zeta$.
For all three models, we have
$$\theta_\beta(x_j) = e^{\beta^T x_j} = e^{\beta_0 + \beta_1 x_{j1}}.$$
In Table 3, the estimates of all three models are shown, as well as their respective calculated log likelihoods, Akaike information criteria (AIC) and Bayesian information criteria (BIC).
Based on both the AIC and BIC, the two frailty models outperform the PTC model, with the IG frailty model performing best. The estimates of the $\beta_0$ and $\beta_1$ parameters for the two frailty models are very similar to each other but differ notably from those of the PTC model. This shows that the frailty term changes the fitted covariate effects. Together with the lower AIC and BIC values, it suggests that the frailty models capture the effect of the unobserved covariate better than the PTC model does.
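As a quick consistency check, the criteria in Table 3 follow the standard definitions $\text{AIC} = 2p - 2\ell$ and $\text{BIC} = p\ln n - 2\ell$. This uses $n = 1000$ and, on our reading of Table 3, $p = 4$ parameters for the PTC model ($\beta_0, \beta_1, \lambda, k$) and $p = 5$ for each frailty model. Because the simulated resolution times lie in $(0,1)$, densities can exceed 1, so the maximised log-likelihood can be positive and both criteria can be negative. In every case, lower values are better. The Python sketch below recomputes the criteria from the log-likelihoods:

```python
import math


def aic(loglik, p):
    """Akaike information criterion: 2p - 2*loglik."""
    return 2 * p - 2 * loglik


def bic(loglik, p, n):
    """Bayesian information criterion: p*ln(n) - 2*loglik."""
    return p * math.log(n) - 2 * loglik


# (log-likelihood, number of parameters) as read from Table 3
table3 = {
    "PTC": (1485.493, 4),
    "PTC + gamma frailty": (1493.001, 5),
    "PTC + IG frailty": (1496.808, 5),
}
n = 1000
results = {m: (aic(ll, p), bic(ll, p, n)) for m, (ll, p) in table3.items()}
for model, (a, b) in results.items():
    print(f"{model:20s} AIC = {a:9.3f}  BIC = {b:9.3f}")
best = min(results, key=lambda m: results[m][0])
print("lowest AIC:", best)
```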

5.2. Real-World Dataset

Our real-world loss dataset was provided by GCD, formerly known as the Pan-European Credit Data Consortium (PECDC). GCD is a non-profit organisation that collects and analyses historical loss data from member banks [40].
We use a subsample of the loss database consisting of large corporations (LCs) from the United States. All loans that defaulted in the period 1 January 2005 to 31 December 2019 are included. For materiality purposes, we further filter the data to only include loans for which the exposure amount is larger than USD 500. The dataset includes information on when the loan went into default, whether (and when) the loan was fully settled, whether (and when) the loan was written off and whether the case has been resolved.
We include the industry of the obligor as a categorical covariate in the model based on the study by Schuermann [41], which demonstrates that the industry of an obligor has an impact on the recovery rate. In industries in which tangible assets represent a substantial portion of a company’s assets, such as manufacturing or transportation, the bankruptcy process may be resolved more quickly, since the assets can be more easily sold to repay creditors.
Obligor industries are aggregated into five industry groups, as indicated in Table 4, with a sixth group (denoted by −1) for loans for which the industry is unknown. The rightmost column of Table 4 also shows the proportion of censored cases for each industry group. We treat two kinds of defaulted loans as censored: those that were fully repaid, and those for which the resolution process had not yet been completed at the end of the observation period.
In Figure 1, the Kaplan–Meier estimate of the survival probability of all data is shown on the left, and that of the data stratified by industry group ($-1, 1, \ldots, 5$) is shown on the right. From the left pane graph, it is evident that a cure proportion is present in the aggregated dataset, as the estimated survival curve levels off at a non-zero value. The narrow confidence interval can be attributed to the large sample size ($n = 1558$). From the right pane graph, notable differences in the cure proportions of the different industry groups can be observed. This suggests that the industry of a borrower has a marked impact on the probability of recovering the full loan amount after default.
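For reference, the product-limit (Kaplan–Meier) estimator behind Figure 1 is $\hat S(t) = \prod_{t_{(i)} \le t} (1 - d_i / r_i)$. Here, $d_i$ is the number of write-offs at the distinct event time $t_{(i)}$, and $r_i$ is the number of loans still at risk just before $t_{(i)}$. The following minimal pure-Python sketch omits confidence bands and is not the software used to produce the figure:

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S_hat(t))] at each distinct event time.

    times  : observed times Y_j = min(T_j, C_j)
    events : delta_j = 1 if a write-off is observed, 0 if censored
    """
    deaths = Counter(t for t, d in zip(times, events) if d == 1)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        # loans still unresolved just before t (O(n) per time; fine for a sketch)
        at_risk = sum(1 for y in times if y >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

A loan censored at the same time as a write-off is still counted as at risk at that time, which is the usual convention.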
Besides industry groups, which are included as dummy variables with corresponding coefficients ($\beta_2, \ldots, \beta_6$), we include the annualised seasonally adjusted GDP growth in the month of default as a covariate, with the associated coefficient $\beta_1$. Our choice of GDP growth is motivated by the study of Khieu et al. [42], in which the authors report a positive relationship between annual GDP growth and recovery rates of defaulted corporate debt in the US. Furthermore, Betz et al. [43] found positive dependencies between default resolution times and final loan loss rates.
The MLEs of the model parameters are shown in the first three columns of Table 5. For both frailty models, the frailty parameters ($\sigma^2$ and $\zeta$, respectively) are significant at the 5% level based on the adjusted likelihood ratio test. In line with the findings of Khieu et al. [42] and Betz et al. [43], the estimated GDP growth coefficient $\beta_1$ is negative in all models. Higher GDP growth is thus associated with a lower write-off intensity $\theta_\beta$. This means that higher GDP growth at the time of default is associated with a higher probability of full loan repayment. This makes intuitive sense, especially when considering the price at which collateral can be sold or the prospects for a firm to recover from financial distress and settle outstanding loan obligations. Table 5 also shows the calculated log-likelihoods, AIC and BIC. Based on all three goodness-of-fit measures, the PTC model with gamma frailty performs best of the three models.
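Testing $H_0: \sigma^2 = 0$ places the null value on the boundary of the parameter space, so the usual $\chi^2_1$ reference distribution is not appropriate. A common adjustment refers the likelihood ratio statistic to a 50:50 mixture of $\chi^2_0$ and $\chi^2_1$, which halves the naive $\chi^2_1$ p-value. We assume this is in the spirit of the test of Claeskens et al. [39]. The Python sketch below applies the adjustment to the gamma frailty model versus the PTC model, using the log-likelihoods in Tables 5 and 3. These two models are nested because they share the same covariates and the frailty model reduces to the PTC model as $\sigma^2 \to 0$. The sketch uses $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math


def boundary_lrt(loglik_null, loglik_alt):
    """LR test for one variance parameter on the boundary (H0: sigma^2 = 0).

    Reference distribution: 0.5 * chi2_0 + 0.5 * chi2_1, so the p-value is
    half the usual chi-square(1) tail probability.
    """
    lr = max(0.0, 2.0 * (loglik_alt - loglik_null))
    p_chi2_1 = math.erfc(math.sqrt(lr / 2.0))  # P(chi2_1 > lr)
    return lr, 0.5 * p_chi2_1


# Log-likelihoods from Table 5 (PTC vs PTC with gamma frailty)
lr, p = boundary_lrt(-8813.626, -8735.423)
print(f"real data: LR = {lr:.3f}, p-value = {p:.3g}")

# Same comparison for the simulated data in Table 3
lr_sim, p_sim = boundary_lrt(1485.493, 1493.001)
print(f"simulated: LR = {lr_sim:.3f}, p-value = {p_sim:.3g}")
```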
In Figure 2, the survival curves (left pane) and hazard rates (right pane) are plotted for the PTC model and for the PTC models with gamma and IG frailties. From the left pane graph, it is evident that the fitted PTC model exhibits a much smaller cure fraction than the two frailty models, for which the frailty parameters were found to be significant. This further motivates the inclusion of a frailty parameter within the PTC model to capture residual heterogeneity. We observe that the hazard functions of the frailty models have very different shapes from that of the standard PTC model. Since the hazard function determines the survival function, this difference strongly affects the fitted survival curves, especially the cure rate, as can be seen in the left pane. The hazard functions of the two frailty models both exhibit a non-monotone shape. Including a frailty parameter thus makes the hazard function more flexible, so that it may better capture non-standard patterns. Consequently, less reliance may need to be placed on the choice of $F_\eta$.
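The mechanism behind the larger cure fraction can be illustrated directly. We assume, consistent with the log-likelihood in Section 6, that the gamma frailty in (7) has mean 1 and variance $\sigma^2$. Integrating $e^{-w\theta F_\eta(t)}$ over this frailty gives:

* population survival $S(t) = \{1 + \sigma^2\theta F_\eta(t)\}^{-1/\sigma^2}$;
* cure fraction $(1 + \sigma^2\theta)^{-1/\sigma^2}$;
* hazard $\theta f_\eta(t)/\{1 + \sigma^2\theta F_\eta(t)\}$.

Without frailty, the corresponding quantities are $e^{-\theta F_\eta(t)}$, $e^{-\theta}$ and $\theta f_\eta(t)$. The Python sketch below uses the assumed Weibull form $F_\eta(t) = 1 - e^{-\lambda t^k}$ and hypothetical values of $\theta$, $\lambda$, $k$ and $\sigma^2$, not the estimates in Table 5.

```python
import math

# Hypothetical illustrative values (not estimates from Table 5)
THETA, LAM, K, SIGMA2 = 2.0, 1.0, 1.5, 2.0


def weibull_cdf(t):
    # Assumed Weibull form: F(t) = 1 - exp(-LAM * t**K)
    return 1.0 - math.exp(-LAM * t ** K)


def weibull_pdf(t):
    return LAM * K * t ** (K - 1) * math.exp(-LAM * t ** K)


def surv_ptc(t):             # PTC without frailty
    return math.exp(-THETA * weibull_cdf(t))


def surv_gamma(t):           # PTC with mean-one gamma frailty, variance SIGMA2
    return (1.0 + SIGMA2 * THETA * weibull_cdf(t)) ** (-1.0 / SIGMA2)


def hazard_ptc(t):
    return THETA * weibull_pdf(t)


def hazard_gamma(t):
    return THETA * weibull_pdf(t) / (1.0 + SIGMA2 * THETA * weibull_cdf(t))


cure_ptc = math.exp(-THETA)
cure_gamma = (1.0 + SIGMA2 * THETA) ** (-1.0 / SIGMA2)
print(f"cure fraction: PTC {cure_ptc:.3f}, gamma frailty {cure_gamma:.3f}")
for t in (0.25, 0.5, 1.0, 2.0, 4.0):
    print(f"t={t:4.2f}  S_ptc={surv_ptc(t):.3f}  S_gam={surv_gamma(t):.3f}  "
          f"h_ptc={hazard_ptc(t):.3f}  h_gam={hazard_gamma(t):.3f}")
```

For the same $\theta$, the gamma frailty yields a larger cure fraction, because $E(e^{-W\theta}) \ge e^{-\theta}$ when $E(W) = 1$ (Jensen's inequality). The denominator $1 + \sigma^2\theta F_\eta(t)$ also bends the hazard downwards over time.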

6. Extension to the Shared Gamma Frailty Model

In many survival studies, the failure times of individuals are not independent but dependent within groups or clusters. This can be due to shared environmental factors, such as exposure to air pollution in the case of cancer data or the state of the economy in the case of credit risk. By including a shared frailty term in a survival model, the dependence between the failure times of individual subjects within a cluster can be allowed for, leading to more accurate estimates of the hazard rates and the survival probabilities.
Suppose that we have $n$ clusters of size $n_i$, $i = 1, 2, \ldots, n$, with each cluster sharing a common risk, assumed to be random, and modelled by an i.i.d. gamma-distributed frailty $W_i$, $i = 1, \ldots, n$, each with the density $f_{\sigma^2}$ given in (7). The multivariate conditional survival function, given $W_i = w_i$, is then
$$\tilde{S}_\eta(t_{i1}, \ldots, t_{in_i} \mid x_i, w_i) = P(T_{i1} > t_{i1}, \ldots, T_{in_i} > t_{in_i} \mid x_i, W_i = w_i) = P(T_{i1} > t_{i1} \mid x_i, W_i = w_i) \cdots P(T_{in_i} > t_{in_i} \mid x_i, W_i = w_i) = e^{-w_i \sum_{j=1}^{n_i} \theta_\beta(x_{ij}) F_\eta(t_{ij})},$$
where $x_i = (x_{i1}, \ldots, x_{in_i})$ is the covariate matrix of the individuals in the $i$th cluster, and $F_\eta$ is, as in the univariate case, some specified distribution with parameter vector $\eta$.
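Integrating the frailty out of this conditional survival function shows how the shared frailty induces dependence. We assume that $f_{\sigma^2}$ in (7) is the gamma density with shape and rate $1/\sigma^2$ (mean 1, variance $\sigma^2$); this is our assumption, although it is the form implied by the closed-form log-likelihood given later in this section. The gamma Laplace transform $E(e^{-sW_i}) = (1 + \sigma^2 s)^{-1/\sigma^2}$ then gives

```latex
\begin{aligned}
S_\eta(t_{i1},\ldots,t_{in_i}\mid x_i)
  &= \int_0^\infty e^{-w\sum_{j=1}^{n_i}\theta_\beta(x_{ij})F_\eta(t_{ij})}\,
     f_{\sigma^2}(w)\,dw \\
  &= \Bigl(1+\sigma^2\sum_{j=1}^{n_i}\theta_\beta(x_{ij})F_\eta(t_{ij})\Bigr)^{-1/\sigma^2}.
\end{aligned}
```

Because the sum sits inside a power, this joint survival function does not factorise into the product of the marginal survival functions $\{1+\sigma^2\theta_\beta(x_{ij})F_\eta(t_{ij})\}^{-1/\sigma^2}$. As $\sigma^2 \to 0$, it reduces to the independent PTC form $e^{-\sum_j \theta_\beta(x_{ij})F_\eta(t_{ij})}$.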
For the $j$th individual in the $i$th cluster, $j = 1, 2, \ldots, n_i$, we observe the triplet $(Y_{ij}, \delta_{ij}, x_{ij})$, where $Y_{ij} = \min(T_{ij}, C_{ij})$, $\delta_{ij} = I(T_{ij} \le C_{ij})$, and $x_{ij} = (1, x_{1ij}, x_{2ij}, \ldots, x_{mij})^T$ is the vector of $m$ covariates associated with the $j$th individual in the $i$th cluster. Similar to the univariate case, $\theta_\beta(x_{ij}) = e^{\beta^T x_{ij}}$, where $\beta = (\beta_0, \beta_1, \ldots, \beta_m)^T$ is a vector of unknown regression parameters. Furthermore, we denote the number of observed events in the $i$th cluster by $d_i = \sum_{j=1}^{n_i} \delta_{ij}$. The likelihood is then given by
$$L = \prod_{i=1}^{n} \int_0^\infty \prod_{j=1}^{n_i} \left[w_i\, \theta_\beta(x_{ij}) f_\eta(Y_{ij})\right]^{\delta_{ij}} e^{-w_i \theta_\beta(x_{ij}) F_\eta(Y_{ij})}\, f_{\sigma^2}(w_i)\, dw_i,$$
where $f_\eta$ is the density associated with the distribution $F_\eta$.
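For each cluster, the integrand is proportional to a gamma kernel $w_i^{1/\sigma^2 + d_i - 1} e^{-w_i\{1/\sigma^2 + \sum_j \theta_\beta(x_{ij}) F_\eta(Y_{ij})\}}$, so each integral can be evaluated in closed form. Taking the natural logarithm, the log-likelihood function is given by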
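$$\ell(\beta, \sigma, \eta) = \sum_{i=1}^{n} \sum_{j=1}^{n_i} \delta_{ij} \log\left[\theta_\beta(x_{ij}) f_\eta(Y_{ij})\right] + \sum_{i=1}^{n} \log \Gamma\left(\frac{1}{\sigma^2} + d_i\right) - \sum_{i=1}^{n} \left(\frac{1}{\sigma^2} + d_i\right) \log\left(\frac{1}{\sigma^2} + \sum_{j=1}^{n_i} \theta_\beta(x_{ij}) F_\eta(Y_{ij})\right) - \frac{2n}{\sigma^2} \log \sigma - n \log \Gamma\left(\frac{1}{\sigma^2}\right).$$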
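The closed-form log-likelihood can be coded directly. This minimal Python sketch makes two assumptions:

* the Weibull form $F_\eta(t) = 1 - e^{-\lambda t^k}$ for (15);
* each cluster is represented as a list of hypothetical `(y, delta, x)` tuples, where `x` includes the leading 1.

The constant term uses the identity $-(2n/\sigma^2)\log\sigma = n\,\sigma^{-2}\log\sigma^{-2}$. In practice, the function would be maximised numerically over $(\beta, \lambda, k, \sigma^2)$ with a general-purpose optimiser.

```python
import math


def weibull_cdf(t, lam, k):
    # Assumed Weibull parameterisation: F(t) = 1 - exp(-lam * t**k)
    return 1.0 - math.exp(-lam * t ** k)


def weibull_pdf(t, lam, k):
    return lam * k * t ** (k - 1) * math.exp(-lam * t ** k)


def shared_gamma_ptc_loglik(clusters, beta, lam, k, sigma2):
    """Log-likelihood of the PTC model with a shared gamma frailty.

    clusters : list of clusters; each cluster is a list of (y, delta, x),
               with x = (1, x_1, ..., x_m) matching beta = (beta_0, ..., beta_m)
    """
    a = 1.0 / sigma2                      # gamma shape = rate = 1/sigma^2
    total = len(clusters) * (a * math.log(a) - math.lgamma(a))
    for cluster in clusters:
        d_i, cum = 0, 0.0                 # events and sum of theta*F in cluster i
        for y, delta, x in cluster:
            theta = math.exp(sum(b * xv for b, xv in zip(beta, x)))
            if delta:
                total += math.log(theta * weibull_pdf(y, lam, k))
                d_i += 1
            cum += theta * weibull_cdf(y, lam, k)
        total += math.lgamma(a + d_i) - (a + d_i) * math.log(a + cum)
    return total


# Two small hypothetical clusters (e.g. two industry groups)
clusters = [
    [(0.3, 1, (1.0, 0.5)), (0.8, 0, (1.0, -1.0)), (0.5, 1, (1.0, 0.2))],
    [(1.2, 1, (1.0, 0.0)), (2.0, 0, (1.0, 1.0))],
]
print(shared_gamma_ptc_loglik(clusters, beta=(0.2, 0.7), lam=1.5, k=1.2, sigma2=0.5))
```

A useful sanity check is to compare this value with a direct numerical integration of the likelihood over each $w_i$. The two should agree up to quadrature error.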
For the application of the shared frailty model to our real-world dataset described in Section 5.2, we cluster our data by industry group. The intuitive reasoning is that obligors within the same industry group are subject to similar forces in the economy, which is a source of dependency within these groups. Instead of treating the industry group as a covariate, a shared frailty parameter is fitted as a means to account for the dependence of write-off times within industry groups. Again, we assume $F_\eta$ to be the Weibull distribution function with $\eta = (\lambda, k)$ as defined in (15).
The final column of Table 5 lists the MLEs of the PTC model with shared frailty. Note that the interpretation of $\sigma^2$ differs between the two gamma frailty models. In the univariate gamma frailty model, a significant frailty variance parameter $\sigma^2$ indicates that the model captures unobserved heterogeneity between individual loans. In the shared frailty model, by contrast, a significant $\sigma^2$ indicates dependence within each industry cluster, and its value can be interpreted as a measure of the dependence between loan write-off times within a cluster (see the discussion in [32] (p. 144)).
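One way to make this dependence interpretation concrete is through a copula. Write $u_j = \{1+\sigma^2\theta_j F_\eta(t_j)\}^{-1/\sigma^2}$ for the marginal survival functions of two loans in the same cluster. The joint survival function of the shared gamma frailty model, $\{1+\sigma^2(\theta_1F_\eta(t_1)+\theta_2F_\eta(t_2))\}^{-1/\sigma^2}$, then equals the Clayton copula $C(u_1,u_2) = (u_1^{-\sigma^2} + u_2^{-\sigma^2} - 1)^{-1/\sigma^2}$, whose Kendall's tau is $\sigma^2/(\sigma^2+2)$. The Python sketch below checks this identity numerically for hypothetical values of $\theta_jF_\eta(t_j)$. It also evaluates the copula-level tau at $\hat\sigma^2 = 3.926$ from Table 5, although that estimate is not flagged as significant there. With a cure fraction present, the copula tau is only a summary of the dependence structure and is not necessarily the Kendall's tau of the observed write-off times.

```python
import math


def joint_survival(theta_f1, theta_f2, sigma2):
    """Shared gamma frailty: joint survival of two loans in one cluster.
    theta_fj = theta_beta(x_ij) * F_eta(t_ij)."""
    return (1.0 + sigma2 * (theta_f1 + theta_f2)) ** (-1.0 / sigma2)


def marginal_survival(theta_f, sigma2):
    return (1.0 + sigma2 * theta_f) ** (-1.0 / sigma2)


def clayton(u1, u2, alpha):
    """Clayton copula C(u1, u2) with parameter alpha > 0."""
    return (u1 ** -alpha + u2 ** -alpha - 1.0) ** (-1.0 / alpha)


def clayton_tau(alpha):
    return alpha / (alpha + 2.0)


s2 = 3.926                       # shared-frailty estimate from Table 5
tf1, tf2 = 0.4, 1.3              # hypothetical theta*F values for two loans
u1, u2 = marginal_survival(tf1, s2), marginal_survival(tf2, s2)
print(joint_survival(tf1, tf2, s2), clayton(u1, u2, s2))
print(f"copula-level Kendall's tau: {clayton_tau(s2):.3f}")
```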
According to the values of the calculated log likelihood, AIC and BIC, the shared frailty model performs favourably compared to the standard PTC model, despite the shared frailty model being a more parsimonious model. This also suggests that the shared frailty component does succeed in capturing some of the dependence between loan write-off times attributable to the industry of the obligor.

7. Concluding Remarks

In this paper, a PTC model with gamma frailty was developed and applied to a real-world loss dataset. The applicability of a cure proportion in the context of the post-default loan resolution process was demonstrated. We also showed how the inclusion of a frailty parameter in the PTC model allows the hazard function to take a more flexible shape. This may improve the accuracy of the survival model and, equally importantly, provide a better understanding of the underlying processes that drive the occurrence of write-off over time. Furthermore, the flexibility offered by the frailty parameter may reduce the reliance placed on the selection of a “baseline” distribution $F_\eta$ in the PTC model.
The proportional hazards assumption of the popular CPH model implies that the ratio of the hazard rates for any two covariate levels remains constant over time. By including a parametric frailty term in the PTC model, a more flexible relationship between potential (latent) predictors and the time to write-off can be accommodated, even with fewer observed covariates. Through a Monte Carlo study, we demonstrated that the parameter estimates of the PTC model with gamma frailty are reliable.
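The point about proportional hazards can be checked in a few lines. Without frailty, the PTC population hazard is $\theta_\beta(x) f_\eta(t)$, so the hazard ratio between two covariate profiles is the constant $\theta_\beta(x_1)/\theta_\beta(x_2)$. With a mean-one gamma frailty of variance $\sigma^2$, the hazard becomes $\theta_\beta(x)f_\eta(t)/\{1+\sigma^2\theta_\beta(x)F_\eta(t)\}$, and the ratio drifts over time. The Python sketch below uses hypothetical values for $\theta$ and $\sigma^2$ and an assumed exponential $F_\eta$ (Weibull with $k = 1$). Since $f_\eta$ cancels in the ratio, only $F_\eta$ is needed.

```python
import math


def exp_cdf(t, lam=1.0):
    # Assumed F_eta: exponential, i.e. Weibull with shape k = 1
    return 1.0 - math.exp(-lam * t)


def hazard_ratio(t, theta1, theta2, sigma2):
    """Ratio of population hazards for two covariate profiles.

    sigma2 = 0 gives the PTC model without frailty; f_eta cancels."""
    F = exp_cdf(t)
    return (theta1 / (1.0 + sigma2 * theta1 * F)) / (theta2 / (1.0 + sigma2 * theta2 * F))


theta_high, theta_low = 3.0, 1.0          # hypothetical write-off intensities
for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"t={t:3.1f}  HR(no frailty)={hazard_ratio(t, theta_high, theta_low, 0.0):.3f}  "
          f"HR(gamma, s2=1)={hazard_ratio(t, theta_high, theta_low, 1.0):.3f}")
```

As $F_\eta(t) \to 1$, the gamma-frailty ratio shrinks towards $\theta_1(1+\sigma^2\theta_2)/\{\theta_2(1+\sigma^2\theta_1)\}$. This reflects the fact that high-risk loans which have not yet been written off increasingly tend to be those with low frailty.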
We also showed that, by including a shared gamma frailty term in a PTC model, the dependence between the failure times of individual subjects within a cluster can be accounted for, which, in turn, may lead to more accurate estimates of the hazard rate and the write-off times. Our initial exploration of the use of a shared frailty model warrants future investigation, with specific focus on the small sample performance of shared frailty models. Other PTC models with shared frailties can also be considered, such as a log-normal shared frailty model or a compound Poisson shared frailty model that includes two important special cases, namely gamma and IG distributions.
A shared frailty model takes the dependency of the event times within clusters into account, but other forms of dependency may also exist that were not explored in this study. The models we considered assume independence between the censoring time and write-off time of individual loans. A model that allows for possible dependence between these times may be considered in a future study. This is referred to as dependent censoring and is rapidly gaining traction in contemporary research (see, e.g., [44,45,46]).
In a highly regulated industry such as banking, parametric models are often preferred to non-parametric models due to their analytical tractability and interpretability. Parametric models also easily allow for sensitivity analysis and stress testing by varying parameter values. As demonstrated by this study, parametric frailty models offer the potential to characterise unexplained heterogeneity in the write-off times of individual loans. Besides the time to write-off, which can be predicted more accurately, these models also offer the prospect of modelling the probability of write-off and full repayment in the context of censored data. As such, the model offers a viable alternative for modelling the resolution outcome in two-step loss given default (LGD) modelling, while making use of the most recent (and therefore most relevant) loan resolution data. The reliance of these models on accurate and trustworthy data requires banks to make substantial investments in systems, expertise and validation efforts. Modellers should take care to avoid spurious accuracy and to understand the economic and operational drivers of credit risk before and after loan default. Inclusion of a frailty parameter to capture residual heterogeneity should therefore not be regarded as a substitute for including known risk drivers as covariates within a prediction model.

Author Contributions

Conceptualization, J.L., J.S.A., G.L.G. and M.S.; Methodology, J.L., J.S.A., G.L.G. and M.S.; Software, M.S.; Validation, J.L., J.S.A. and G.L.G.; Formal analysis, J.S.A. and G.L.G.; Investigation, J.L.; Resources, J.L.; Writing—original draft, J.L.; Writing—review & editing, J.L., J.S.A., G.L.G. and M.S.; Visualization, J.S.A. and G.L.G.; Supervision, J.S.A., G.L.G. and M.S.; Project administration, J.L. The authors all contributed equally. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset presented in this study is available on application from Global Credit Data. https://globalcreditdata.org/.

Acknowledgments

The work of J.S. Allison is based on research supported by the National Research Foundation (NRF). Any opinion, finding, conclusion or recommendation expressed in this material is that of the authors, and the NRF does not accept any liability in this regard.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Botha, A.; Beyers, C.; De Villiers, P. Simulation-based optimisation of the timing of loan recovery across different portfolios. Expert Syst. Appl. 2021, 177, 114878.
  2. Fenech, J.P.; Yap, Y.K.; Shafik, S. Modelling the recovery outcomes for defaulted loans: A survival analysis approach. Econ. Lett. 2016, 145, 79–82.
  3. Betz, J.; Krüger, S.; Kellner, R.; Rösch, D. Macroeconomic effects and frailties in the resolution of non-performing loans. J. Bank. Financ. 2017, 112, 105212.
  4. Malwandla, M.C.; Marimo, M.; Breed, D.G. A cross-sectional survival analysis regression model with applications to consumer credit risk. S. Afr. Stat. J. 2017, 51, 217–234.
  5. de Oliveira, M., Jr.; Louzada, F. Recovery Risk: Application of the Latent Competing Risks Model to Non-Performing Loans. arXiv 2014, arXiv:1408.4380.
  6. Hibbeln, M.; Gürtler, M. Pitfalls in Modeling Loss Given Default of Bank Loans. J. Bank. Financ. 2013, 37, 2354–2366.
  7. Allen, L.N.; Rose, L.C. Financial survival analysis of defaulted debtors. J. Oper. Res. Soc. 2006, 57, 630–636.
  8. Joubert, M.; Verster, T.; Raubenheimer, H.G. Making use of survival analysis to indirectly model loss given default. ORiON 2019, 34, 107–132.
  9. Tong, E.N.C.; Mues, C.; Thomas, L.C. Mixture cure models in credit scoring: If and when borrowers default. Eur. J. Oper. Res. 2012, 218, 132–139.
  10. Smuts, M.; Allison, J.S. An overview of survival analysis with an application in the credit risk environment. ORiON 2021, 36, 89–110.
  11. Dirick, L.; Bellotti, T.; Claeskens, G.; Baesens, B. Macro-Economic Factors in Credit Risk Calculations: Including Time-Varying Covariates in Mixture Cure Models. J. Bus. Econ. Stat. 2019, 37, 40–53.
  12. de Oliveira, M., Jr.; Moreira, F.; Louzada, F. The zero-inflated promotion cure rate model applied to financial data on time-to-default. Cogent Econ. Financ. 2017, 5, 1395950.
  13. Barriga, G.D.C.; Cancho, V.G.; Garibay, D.V.; Cordeiro, G.M.; Ortega, E.M. A new survival model with surviving fraction: An application to colorectal cancer data. Stat. Methods Med. Res. 2018, 28, 2665–2680.
  14. Leow, M.; Mues, C. Predicting loss given default (LGD) for residential mortgage loans: A two-stage model and empirical evidence for UK bank data. Int. J. Forecast. 2012, 28, 183–195.
  15. Cox, D.R. Regression models and life-tables. J. R. Stat. Soc. Ser. B (Methodol.) 1972, 34, 187–202.
  16. Banasik, J.; Crook, J.N.; Thomas, L.C. Not if but when will borrowers default. J. Oper. Res. Soc. 1999, 50, 1185–1190.
  17. Stepanova, M.; Thomas, L. Survival analysis methods for personal loan data. Oper. Res. 2002, 50, 277–289.
  18. Cao, R.; Vilar, J.M.; Devia, A. Modelling consumer credit risk via survival analysis. Sort-Stat. Oper. Res. Trans. 2009, 33, 3–30.
  19. Boag, J.W. Maximum likelihood estimates of the proportion of patients cured by cancer therapy. J. R. Stat. Soc. Ser. B (Methodol.) 1949, 11, 15–53.
  20. Peng, Y.; Yu, B. Cure Models: Methods, Applications, and Implementation; Chapman and Hall/CRC: Boca Raton, FL, USA, 2021.
  21. Berkson, J.; Gage, R.P. Survival curve for cancer patients following treatment. J. Am. Stat. Assoc. 1952, 47, 501–515.
  22. Farewell, V.T. The use of mixture models for the analysis of survival data with long-term survivors. Biometrics 1982, 38, 1041–1046.
  23. Amico, M.; van Keilegom, I. Cure models in survival analysis. Annu. Rev. Stat. Appl. 2018, 5, 311–342.
  24. Yakovlev, A.Y.; Asselain, B.; Bardou, V.; Fourquet, A.; Hoang, T.; Rochefediere, A.; Tsodikov, A. A simple stochastic model of tumor recurrence and its application to data on premenopausal breast cancer. Biom. Anal. Donnees Spatio-Temporelles 1993, 12, 66–82.
  25. Chen, M.H.; Ibrahim, J.G.; Sinha, D. A New Bayesian Model For Survival Data with a Surviving Fraction. J. Am. Stat. Assoc. 1999, 94, 909–919.
  26. Rodrigues, J.; Cancho, V.G.; de Castro, M.; Louzada-Neto, F. On the unification of long-term survival models. Stat. Probab. Lett. 2009, 79, 753–759.
  27. Zhang, J.; Peng, Y. A new estimation method for the semiparametric accelerated failure time mixture cure model. Stat. Med. 2007, 26, 3157–3171.
  28. Portier, F.; El Ghouch, A.; van Keilegom, I. Efficiency and bootstrap in the promotion time cure model. Bernoulli 2017, 23, 3437–3468.
  29. Delloye, M.; Fermanian, J.D.; Sbai, M. Dynamic frailties and credit portfolio modelling. Risk 2006, 19, 100.
  30. Chih-Wei, L.; Chang, M.J. A credit risk model with dynamic frailties for default intensity estimation. Asia Pac. Manag. Rev. 2008, 13, 557–566.
  31. Chamboko, R.; Bravo, J.M. Frailty correlated default on retail consumer loans in Zimbabwe. Int. J. Appl. Decis. Sci. 2019, 12, 257–270.
  32. Wienke, A. Frailty Models in Survival Analysis; Chapman and Hall/CRC: Boca Raton, FL, USA, 2010; pp. 57–63.
  33. Balan, T.A.; Putter, H. A tutorial on frailty models. Stat. Methods Med. Res. 2020, 29, 3424–3454.
  34. Li, C.S.; Taylor, J.M.; Sy, J.P. Identifiability of cure models. Stat. Probab. Lett. 2001, 54, 389–395.
  35. Legrand, C. Advanced Survival Models; CRC Press: Boca Raton, FL, USA, 2021.
  36. Dirick, L.; Claeskens, G.; Baesens, B. An Akaike information criterion for multiple event mixture cure models. Eur. J. Oper. Res. 2015, 241, 449–457.
  37. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2022.
  38. Efron, B.; Tibshirani, R. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Stat. Sci. 1986, 1, 54–75.
  39. Claeskens, G.; Nguti, R.; Janssen, P. One-sided tests in shared frailty models. Test 2008, 17, 69–82.
  40. Brumma, N.; Winckle, P. LGD Report 2018-Large Corporate Borrowers. SSRN 2018.
  41. Schuermann, T. What do we know about Loss Given Default? Wharton Financial Institutions Center Working Paper. SSRN 2004.
  42. Khieu, H.D.; Mullineaux, D.J.; Yi, H.C. The determinants of bank loan recovery rates. J. Bank. Financ. 2012, 36, 923–933.
  43. Betz, J.; Kellner, R.; Rösch, D. Time matters: How default resolution times impact final loss rates. J. R. Stat. Soc. Ser. C 2021, 70, 619–644.
  44. Emura, T.; Chen, Y.H. Analysis of Survival Data with Dependent Censoring: Copula-Based Approaches; Springer: Berlin/Heidelberg, Germany, 2018.
  45. Deresa, N.W.; van Keilegom, I. A multivariate normal regression model for survival data subject to different types of dependent censoring. Comput. Stat. Data Anal. 2020, 144, 106879.
  46. Deresa, N.W.; van Keilegom, I.; Antonio, K. Copula-based inference for bivariate survival data with left truncation and dependent censoring. Insur. Math. Econ. 2022, 107, 1–21.
Figure 1. The Kaplan–Meier estimate of the survival probability of all data with 95% confidence intervals (left pane) and that of the data grouped by industry (right pane).
Figure 2. The estimated survival probability (left pane) and hazard rates (right pane) for PTC (black), PTC with gamma frailty (blue) and PTC with IG frailty (grey) models.
Table 1. Simulation results with 10 % cure fraction and 20 % censoring.
| Parameter | n = 200 | n = 300 | n = 500 | n = 1000 |
|---|---|---|---|---|
| PTC with gamma frailty and $F_\eta$ a Weibull distribution | | | | |
| $\beta_0 = 2$ | 2.314 (0.988) | 2.165 (0.706) | 2.120 (0.512) | 2.049 (0.330) |
| $\beta_1 = -1$ | −1.035 (0.191) | −1.016 (0.154) | −1.016 (0.116) | −1.004 (0.085) |
| $\beta_2 = 2$ | 2.047 (0.347) | 2.025 (0.276) | 2.017 (0.216) | 2.011 (0.152) |
| $\lambda = 2$ | 2.070 (1.217) | 2.064 (0.988) | 2.027 (0.782) | 1.998 (0.560) |
| $\nu = 1$ | 1.018 (0.113) | 1.012 (0.090) | 1.009 (0.069) | 1.002 (0.048) |
| $\sigma^2 = 0.5$ | 0.517 (0.302) | 0.506 (0.247) | 0.516 (0.179) | 0.502 (0.127) |
| PTC with gamma frailty and $F_\eta$ a LFR distribution | | | | |
| $\beta_0 = 2$ | 2.318 (0.833) | 2.199 (0.651) | 2.170 (0.493) | 2.110 (0.353) |
| $\beta_1 = -1$ | −1.059 (0.195) | −1.028 (0.152) | −1.022 (0.108) | −1.009 (0.076) |
| $\beta_2 = 2$ | 2.103 (0.361) | 2.050 (0.262) | 2.029 (0.197) | 2.023 (0.135) |
| $a = 1.5$ | 1.338 (0.804) | 1.391 (0.682) | 1.381 (0.551) | 1.398 (0.415) |
| $b = 2$ | 3.337 (4.018) | 2.820 (2.966) | 2.531 (2.510) | 2.257 (1.882) |
| $\sigma^2 = 0.5$ | 0.569 (0.308) | 0.529 (0.238) | 0.528 (0.161) | 0.513 (0.107) |
| PTC with IG frailty and $F_\eta$ a Weibull distribution | | | | |
| $\beta_0 = 2$ | 2.556 (1.461) | 2.385 (1.227) | 2.312 (1.018) | 2.107 (0.496) |
| $\beta_1 = -1$ | −1.079 (0.234) | −1.057 (0.210) | −1.050 (0.167) | −1.018 (0.122) |
| $\beta_2 = 2$ | 2.134 (0.447) | 2.110 (0.407) | 2.089 (0.330) | 2.040 (0.231) |
| $\lambda = 2$ | 2.228 (1.346) | 2.181 (1.044) | 2.096 (0.806) | 2.046 (0.558) |
| $\nu = 1$ | 1.060 (0.168) | 1.051 (0.159) | 1.042 (0.130) | 1.016 (0.090) |
| $\zeta = 0.5$ | 1.638 (3.506) | 1.554 (3.553) | 1.213 (2.284) | 0.768 (1.261) |
| PTC with IG frailty and $F_\eta$ a LFR distribution | | | | |
| $\beta_0 = 2$ | 2.344 (0.973) | 2.226 (0.703) | 2.183 (0.497) | 2.106 (0.332) |
| $\beta_1 = -1$ | −1.058 (0.184) | −1.035 (0.150) | −1.023 (0.105) | −1.012 (0.077) |
| $\beta_2 = 2$ | 2.095 (0.337) | 2.069 (0.278) | 2.037 (0.196) | 2.028 (0.139) |
| $a = 1.5$ | 1.362 (0.816) | 1.382 (0.673) | 1.374 (0.551) | 1.404 (0.402) |
| $b = 2$ | 3.459 (4.147) | 3.052 (3.301) | 2.585 (2.561) | 2.350 (2.054) |
| $\zeta = 0.5$ | 0.927 (1.549) | 0.886 (2.742) | 0.637 (0.503) | 0.567 (0.293) |
Table 2. Simulation results with 30 % cure fraction and 45 % censoring.
| Parameter | n = 200 | n = 300 | n = 500 | n = 1000 |
|---|---|---|---|---|
| PTC with gamma frailty and $F_\eta$ a Weibull distribution | | | | |
| $\beta_0 = 0.5$ | 0.789 (0.887) | 0.686 (0.755) | 0.581 (0.409) | 0.546 (0.254) |
| $\beta_1 = -1$ | −1.067 (0.245) | −1.023 (0.189) | −1.022 (0.141) | −1.010 (0.102) |
| $\beta_2 = 2$ | 2.097 (0.426) | 2.038 (0.337) | 2.020 (0.247) | 2.021 (0.179) |
| $\lambda = 2$ | 2.153 (1.188) | 2.078 (0.982) | 2.077 (0.751) | 1.999 (0.533) |
| $\nu = 1$ | 1.039 (0.135) | 1.020 (0.109) | 1.013 (0.079) | 1.005 (0.055) |
| $\sigma^2 = 0.5$ | 0.595 (0.480) | 0.530 (0.375) | 0.525 (0.273) | 0.517 (0.182) |
| PTC with gamma frailty and $F_\eta$ a LFR distribution | | | | |
| $\beta_0 = 0.5$ | 0.812 (0.874) | 0.695 (0.591) | 0.618 (0.395) | 0.579 (0.262) |
| $\beta_1 = -1$ | −1.092 (0.264) | −1.039 (0.200) | −1.030 (0.142) | −1.014 (0.098) |
| $\beta_2 = 2$ | 2.158 (0.494) | 2.073 (0.361) | 2.038 (0.252) | 2.029 (0.175) |
| $a = 1.5$ | 1.328 (0.830) | 1.344 (0.669) | 1.399 (0.518) | 1.409 (0.370) |
| $b = 2$ | 3.045 (3.104) | 2.705 (2.683) | 2.476 (2.226) | 2.145 (1.645) |
| $\sigma^2 = 0.5$ | 0.671 (0.578) | 0.572 (0.401) | 0.548 (0.280) | 0.526 (0.174) |
| PTC with IG frailty and $F_\eta$ a Weibull distribution | | | | |
| $\beta_0 = 0.5$ | 0.958 (1.203) | 0.802 (0.950) | 0.746 (0.791) | 0.623 (0.433) |
| $\beta_1 = -1$ | −1.095 (0.271) | −1.062 (0.241) | −1.054 (0.205) | −1.033 (0.146) |
| $\beta_2 = 2$ | 2.164 (0.51) | 2.135 (0.466) | 2.108 (0.383) | 2.069 (0.286) |
| $\lambda = 2$ | 2.202 (1.254) | 2.206 (1.014) | 2.131 (0.771) | 2.018 (0.521) |
| $\nu = 1$ | 1.070 (0.179) | 1.060 (0.166) | 1.050 (0.137) | 1.023 (0.100) |
| $\zeta = 0.5$ | 2.139 (5.047) | 1.988 (4.589) | 1.544 (3.243) | 1.045 (2.389) |
| PTC with IG frailty and $F_\eta$ a LFR distribution | | | | |
| $\beta_0 = 0.5$ | 0.932 (0.970) | 0.829 (0.970) | 0.690 (0.488) | 0.628 (0.341) |
| $\beta_1 = -1$ | −1.098 (0.266) | −1.068 (0.235) | −1.038 (0.170) | −1.028 (0.120) |
| $\beta_2 = 2$ | 2.168 (0.483) | 2.142 (0.426) | 2.080 (0.318) | 2.058 (0.238) |
| $a = 1.5$ | 1.256 (0.795) | 1.298 (0.684) | 1.350 (0.558) | 1.374 (0.412) |
| $b = 2$ | 3.138 (3.305) | 2.888 (2.833) | 2.584 (2.381) | 2.248 (1.888) |
| $\zeta = 0.5$ | 5.661 (70.567) | 2.762 (20.101) | 1.108 (2.504) | 0.781 (1.191) |
Table 3. Parameter estimates and goodness-of-fit measures for all three models fitted to the simulated dataset.
| Parameter | PTC | PTC with Gamma Frailty | PTC with IG Frailty |
|---|---|---|---|
| $\beta_0$ | 1.724 | 3.695 | 3.603 |
| $\beta_1$ | −0.629 | −0.954 | −0.978 |
| $\lambda$ | 3.513 | 1.589 | 2.778 |
| $k$ | 0.790 | 0.967 | 1.025 |
| $\sigma^2$ | – | 0.825 * | – |
| $\zeta$ | – | – | 2.244 * |
| log-likelihood | 1485.493 | 1493.001 | 1496.808 |
| AIC | −2962.986 | −2976.002 | −2983.616 |
| BIC | −2943.355 | −2951.463 | −2959.077 |

* These parameters are significant at the 5% level according to the adjusted likelihood ratio test (see Claeskens et al. [39]).
Table 4. Summary of industry groupings, industry frequency and censored proportion of US large corporation dataset.
| Industry group | Industry description | Frequency | Censored |
|---|---|---|---|
| 1: Primary Industries | Agriculture, Hunting and Forestry; Fishing and Fishing Products; Mining | 14.5% | 28.8% |
| 2: Manufacturing, Utilities and Construction | Manufacturing; Utilities; Construction | 33.3% | 33.5% |
| 3: Trade Industries | Wholesale and Retail Trade; Transportation and Storage | 24% | 28.1% |
| 4: Service Industries | Hotels and Restaurants; Finance and Insurance; Real Estate Rental and Leasing; Professional, Scientific and Technical Services; Private Sector Services (Household) | 17% | 27.9% |
| 5: Other | Communications; Public Administration and Defence; Education; Health and Social Services; Other Community, Social and Personal Services | 4.7% | 35.6% |
| −1: Unknown | Unknown | 6.5% | 30.7% |
Table 5. Estimation results of different models on the US large corporation dataset.
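| Parameter | PTC | PTC with Gamma Frailty | PTC with IG Frailty | PTC with Shared Frailty |
|---|---|---|---|---|
| $\beta_0$ | 3.920 | 6.968 | 2.141 | 0.843 |
| $\beta_1$ | −0.025 | −0.021 | −0.029 | −0.023 |
| $\beta_2$ | 0.187 | 0.094 | 0.134 | – |
| $\beta_3$ | 0.037 | 0.254 | 0.083 | – |
| $\beta_4$ | 0.175 | 0.501 | 0.297 | – |
| $\beta_5$ | 0.050 | 0.064 | 0.093 | – |
| $\beta_6$ | 0.076 | 0.168 | 0.117 | – |
| $\lambda$ | $4.462 \times 10^{-4}$ | $4.644 \times 10^{-8}$ | $2.523 \times 10^{-5}$ | $3.343 \times 10^{-4}$ |
| $k$ | 0.852 | 1.579 | 1.327 | 1.073 |
| $\sigma^2$ | – | 2.157 * | – | 3.926 |
| $\zeta$ | – | – | 2.360 * | – |
| log-likelihood | −8813.626 | −8735.423 | −8753.194 | −8787.344 |
| AIC | 17,645.200 | 17,490.850 | 17,526.390 | 17,584.690 |
| BIC | 17,693.412 | 17,544.358 | 17,579.900 | 17,611.444 |

* These parameters are significant at the 5% level according to the adjusted likelihood ratio test.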
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Larney, J.; Allison, J.S.; Grobler, G.L.; Smuts, M. Modelling the Time to Write-Off of Non-Performing Loans Using a Promotion Time Cure Model with Parametric Frailty. Mathematics 2023, 11, 2228. https://doi.org/10.3390/math11102228

AMA Style

Larney J, Allison JS, Grobler GL, Smuts M. Modelling the Time to Write-Off of Non-Performing Loans Using a Promotion Time Cure Model with Parametric Frailty. Mathematics. 2023; 11(10):2228. https://doi.org/10.3390/math11102228

Chicago/Turabian Style

Larney, Janette, James Samuel Allison, Gerrit Lodewicus Grobler, and Marius Smuts. 2023. "Modelling the Time to Write-Off of Non-Performing Loans Using a Promotion Time Cure Model with Parametric Frailty" Mathematics 11, no. 10: 2228. https://doi.org/10.3390/math11102228
