Article

Density Regression Based on Proportional Hazards Family

Wei Dang 1,2 and Keming Yu 3,*
1 Business School, Shihezi University, Xinjiang 831300, China
2 School of Management, Hefei University of Technology, Hefei 230009, China
3 Department of Mathematics, College of Engineering, Design and Physical Sciences, Brunel University London, Uxbridge UB8 3PH, UK
* Author to whom correspondence should be addressed.
Entropy 2015, 17(6), 3679-3691; https://0-doi-org.brum.beds.ac.uk/10.3390/e17063679
Submission received: 16 March 2015 / Revised: 21 May 2015 / Accepted: 29 May 2015 / Published: 4 June 2015
(This article belongs to the Special Issue Inductive Statistical Methods)

Abstract: This paper develops a class of density regression models based on the proportional hazards family, namely the Gamma transformation proportional hazards (Gt-PH) model. Exact inference for the regression parameters and the hazard ratio is derived. These estimators enjoy good properties, such as unbiasedness, which may not be shared by other inference methods such as maximum likelihood estimation (MLE). Generalised confidence intervals and hypothesis tests for the regression parameters are also provided. The method itself is easy to implement in practice, and the regression method extends to Lasso-based variable selection.

1. Introduction

Many regression models can be derived by allowing probability distribution parameter(s) to depend on covariates; for example, the generalised linear model (GLM) [1] typically assumes that the mean of a distribution depends on covariates via a link function. Examples include scale parameter regression [2–6] and variance function regression [7]. In particular, the scale parameter is often well defined even when the variance is infinite under heavy tails, and the scale function provides a more natural dispersion measure than the variance in the non-Gaussian case [8]. Moreover, the scale function is a robust measure [2,9] of quantities such as loss amounts paid by an insurance company or the demand for a product as depicted by its sales, as modelled via the popular SEVERITY procedure in the SAS software (www.sas.com).
In this paper we propose a class of density regression models based on the proportional hazards family, introduced via the scale parameter of the proportional hazards distribution. The distribution family is defined as follows: assume that the response variable Y has a probability distribution belonging to the proportional hazards family F(y; λ, θ) or the proportional reverse hazards family Fr(y; λ, θ) below:
$$ F(y; \lambda, \theta) = 1 - [1 - G(y; \lambda)]^{\theta}, \qquad F_r(y; \lambda, \theta) = [G(y; \lambda)]^{\theta}, \tag{1} $$
where θ is usually a scale parameter and G(·; λ) is a distribution function depending only on λ. The family of distributions {F(y; λ, θ), θ > 0} is discussed in [10] (Section 7.E ff.). In general, we call θ the proportional parameter and λ the G-parameter. G(y; λ) could even belong to a one-parameter exponential family, with density function a(y) exp(λᵀT(y) − A(λ)). The family (1) is therefore a large family of distributions; examples include the Weibull, Gompertz, Lomax, Exponential, Burr type XII and Kumaraswamy distributions. For example, when G(y; λ) = 1 − exp{−y^λ} in family (1), we obtain the two-parameter Weibull distribution.
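To make the construction concrete, here is a minimal R sketch (ours, not from the paper) that builds the CDF of family (1) from a base distribution G and checks the Weibull special case:

```r
## Proportional hazards family: F(y; lambda, theta) = 1 - (1 - G(y; lambda))^theta.
ph_cdf <- function(y, theta, G, ...) 1 - (1 - G(y, ...))^theta

## Base distribution G(y; lambda) = 1 - exp(-y^lambda)
G_base <- function(y, lambda) 1 - exp(-y^lambda)

y <- seq(0.1, 5, by = 0.1)
## Family (1) with this G is the two-parameter Weibull 1 - exp(-theta * y^lambda):
stopifnot(all.equal(ph_cdf(y, theta = 2, G = G_base, lambda = 0.5),
                    1 - exp(-2 * y^0.5)))
```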
The proportional hazards family (1) has been used extensively to model failure times and carry out survival analysis. For example, based on the proportional hazards family F(y; λ, θ) = 1 − [1 − G(y; λ)]^θ, the hazard rate of F equals θ g(y; λ)/Ḡ(y; λ), which is proportional to the hazard rate g(y; λ)/Ḡ(y; λ) of G, where g is the density function of G and Ḡ(y; λ) = 1 − G(y; λ) is the survival (reliability) function of G(y; λ).
Like GLM, the effect of covariates x on Y can be set up by modelling the parameter θ as a function of x. In this paper we model log(θ) as a linear regression function of the covariates x, that is,
$$ \log(\theta(x)) = x^T \beta, \tag{2} $$
where β stands for the regression coefficients and xᵀ is the transpose of x. Clearly, model (2) includes the methods in the SAS SEVERITY procedure, which can model the effect of exogenous or regressor variables on any probability distribution with a scale parameter.
Next we develop a method to infer β in Equation (2) via a combination of a Gamma random variable-based transformation and ordinary least squares (OLS) estimation. This method is particularly suitable for small samples, without resorting to the bootstrap or Bayesian inference, and is quite different from the existing methods used to fit parametric regression models, such as maximum likelihood estimation (MLE). Section 2 details the exact inference with known parameter λ or G(y; λ), including the derivation of an unbiased estimator for β and for the hazard ratio of family (1); generalised confidence interval estimation and hypothesis testing for the regression model are also provided. Section 3 extends the method to unknown λ. Section 4 introduces an adaptive-Lasso-based method for the proposed Gamma transformation proportional hazards (Gt-PH) regression when variable selection is required for a large number of covariates x. The numerical performance of the proposed regression model estimation, in particular the algorithm and the comparison with MLE and asymptotic confidence intervals, as well as a real data analysis, is illustrated in Section 5. Finally, a brief conclusion is presented in Section 6.

2. Inference Method for the Proportional Hazard Model

From now on we focus on the proportional hazards family F(y; λ, θ) = 1 − [1 − G(y; λ)]^θ, but the method applies to the proportional reverse hazards family without much change; see the brief discussion in Section 6.
First assume that λ or G(y; λ) is known for the proportional hazards family (1). Given independent observations {x_i, Y_i}, i = 1, ⋯, n, of (x, Y), the MLE of the regression coefficients β in Equation (2) is usually derived from the likelihood function L(β) given by
$$ L(\beta) = \prod_{i=1}^n \exp(x_i^T \beta)\, [\bar G(Y_i; \lambda)]^{\exp(x_i^T \beta) - 1}\, g(Y_i; \lambda). $$
The MLEs of β have no explicit form and are only asymptotically unbiased under some regularity conditions.
In this section we aim at a simple explicit estimator of β and derive its best linear unbiased estimator (BLUE). Here "best" means the lowest variance among linear unbiased estimates.
Note that Y ~ F(y; λ, θ), so F(Y; λ, θ) ~ U[0, 1) and −log[1 − F(Y; λ, θ)] ~ Exp(1). That is, $-\theta \log(\bar G(Y; \lambda)) \sim \mathrm{Exp}(1)$.
Given {x_i, Y_i}, i = 1, ⋯, n, let θ_i ≡ θ(x_i) and $S_i = -\log(\bar G(Y_i; \lambda))$; then 2θ_i S_i ~ χ²(2). The χ²(2) distribution is a special case of the Gamma distribution, and the Gamma distribution is the maximum entropy distribution of a Gamma random variable Γ for which E(log(Γ)) = ψ(shape parameter) + log(scale parameter) is fixed, where ψ(t) = d log Γ(t)/dt is the digamma function. Therefore we have
$$ E[\log(S_i) + \log(\theta_i)] = \psi(1), \qquad \mathrm{Var}[\log(S_i) + \log(\theta_i)] = \psi'(1), \tag{3} $$
where ψ′(t) = d² log Γ(t)/dt², ψ(1) = −γ with the Euler–Mascheroni constant γ ≈ 0.5772, and ψ′(1) = π²/6.
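These two moment facts are easy to verify numerically; a small R check (ours) of ψ(1) = −γ and ψ′(1) = π²/6 for the logarithm of an Exp(1) variable:

```r
## For xi ~ Exp(1): E[log(xi)] = psi(1) = -gamma, Var[log(xi)] = psi'(1) = pi^2/6.
set.seed(1)
xi <- rexp(1e6)
c(mean(log(xi)), digamma(1))              # both approx -0.5772
c(var(log(xi)), trigamma(1), pi^2 / 6)    # all approx 1.6449
```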
Let
$$ U_i = -\log(S_i) - \gamma, \tag{4} $$
then Equation (3) can be re-written as a standard GLM:
$$ E(U_i) = x_i^T \beta, \qquad \mathrm{Var}(U_i) = \psi'(1). $$
Or, in vector and matrix terms, the regression parameter vector β satisfies the GLM
$$ E(U) = X\beta, \qquad \mathrm{Var}(U) = \psi'(1)\, I, \tag{5} $$
with known model variance ψ′(1), where X is the design matrix of covariate observations, with 1 as its first column, and U is the vector of the U_i. We call this method the Gt-PH model.
Therefore, by the Gauss–Markov theorem, an explicit form of the BLUE of the regression parameter vector β, and the variance of this estimator under the Gt-PH model, can be obtained as follows:
Theorem 1. Under the distribution family (1) and Gt-PH Model (2), the BLUE of β is given by
$$ \tilde\beta = (X^T X)^{-1} X^T U, $$
and the variance of this estimator is given by
$$ \mathrm{Var}(\tilde\beta) = \psi'(1)\, (X^T X)^{-1}. $$
In particular, under the simple linear version of (2), log(θ) = β₀ + β₁x, we have
$$ \hat\beta_0 = \bar U - \bar x\,\hat\beta_1 = -\frac{1}{n}\sum_i \log(S_i) - \gamma - \bar x\,\hat\beta_1, \qquad \hat\beta_1 = \frac{\sum_{j=1}^n (x_j - \bar x)\, U_j}{\sum_{j=1}^n (x_j - \bar x)^2} = -\frac{\sum_j (x_j - \bar x)\log S_j}{\sum_j (x_j - \bar x)^2}, $$
with
$$ \mathrm{var}(\hat\beta_0) = \frac{\psi'(1)\, \overline{x^2}}{\sum_{j=1}^n (x_j - \bar x)^2}, \qquad \mathrm{var}(\hat\beta_1) = \frac{\psi'(1)}{\sum_{j=1}^n (x_j - \bar x)^2}, \qquad \mathrm{cov}(\hat\beta_0, \hat\beta_1) = -\frac{\psi'(1)\, \bar x}{\sum_{j=1}^n (x_j - \bar x)^2}. $$
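The estimator reduces to a single ordinary least squares fit; a minimal R sketch (ours), simulating the Weibull case with known λ:

```r
## Gt-PH BLUE via OLS for F(y; lambda, theta) = 1 - exp(-theta * y^lambda)
## with log(theta(x)) = beta0 + beta1 * x (true values -1, -1; lambda = 0.5).
set.seed(123)
n <- 50; lambda <- 0.5
x <- runif(n)
theta <- exp(-1 - 1 * x)
Y <- (rexp(n) / theta)^(1 / lambda)   # inverse CDF: theta * Y^lambda ~ Exp(1)

S <- Y^lambda                         # S_i = -log(Gbar(Y_i; lambda)) for this G
U <- -log(S) - (-digamma(1))          # U_i = -log(S_i) - gamma
fit <- lm(U ~ x)                      # OLS = BLUE (X'X)^{-1} X'U
coef(fit)                             # approx (-1, -1)
```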
While Theorem 1 provides the BLUE of the regression parameter β, we should also discuss the estimation of θ(x₀) = exp(x₀ᵀβ) at covariates x = x₀ and the hazard ratio HR = exp((x_a − x_b)ᵀβ) at covariates x_a and x_b, which are often of practical interest. Theorem 2 below gives unbiased estimators of θ(x₀) and HR.
Theorem 2. Under the assumptions of Theorem 1, θ(x₀) and the hazard ratio HR can be estimated by θ̃(x₀) = exp(x₀ᵀβ̃) and H̃R = exp((x_a − x_b)ᵀβ̃), respectively. Their unbiased versions are as follows.
For i = 1, ⋯, n, let the indicator vector e_i = (0, ⋯, 0, 1, 0, ⋯, 0)ᵀ, whose only non-zero component is the ith, and let the constants $c_i = -x_0^T [X^T X]^{-1} X^T e_i$ and $d_i = -(x_a - x_b)^T [X^T X]^{-1} X^T e_i$; then
  • if all 1 + c_i > 0, the unbiased estimator of θ(x₀) is given by
    $$ \tilde\theta_u(x_0) = \frac{\exp(-\gamma \sum_{i=1}^n c_i)}{\prod_{j=1}^n \Gamma(1 + c_j)}\; \tilde\theta(x_0), $$
    with variance (if all 1 + 2c_i > 0):
    $$ \mathrm{Var}(\tilde\theta_u(x_0)) = \left[ \prod_{j=1}^n \frac{\Gamma(1 + 2c_j)}{\Gamma^2(1 + c_j)} - 1 \right] \theta(x_0)^2; $$
  • if all 1 + d_i > 0, the unbiased estimator of HR is given by
    $$ \widetilde{HR}_u = \frac{\exp(-\gamma \sum_{i=1}^n d_i)}{\prod_{i=1}^n \Gamma(1 + d_i)}\; \widetilde{HR}, $$
    with variance (if all 1 + 2d_i > 0):
    $$ \mathrm{Var}(\widetilde{HR}_u) = \left[ \prod_{j=1}^n \frac{\Gamma(1 + 2d_j)}{\Gamma^2(1 + d_j)} - 1 \right] HR^2. $$
Proof. When λ is known, we derive the expectation of θ̃(x₀) at the covariate vector x₀ = (1, x₀₁, ⋯, x₀p)ᵀ.
From
$$ \log\tilde\theta(x_0) - \log\theta(x_0) = x_0^T(\tilde\beta - \beta) = x_0^T [X^T X]^{-1} X^T [U - E(U)], $$
and note that
$$ U = -(\log S_1, \cdots, \log S_n)^T - \gamma \mathbf{1}, $$
and
$$ E(U) = (\log\theta_1, \cdots, \log\theta_n)^T, $$
where each Si ~ Exp(θi).
Let the constant $c_i = -x_0^T [X^T X]^{-1} X^T e_i$; then
$$ \log\tilde\theta(x_0) - \log\theta(x_0) = \gamma\sum_{i=1}^n c_i + \log\Big( \prod_{j=1}^n (\theta_j S_j)^{c_j} \Big). $$
Note that each θjSj ~ Exp(1) and all θjSj are independent, so
$$ \frac{\tilde\theta(x_0)}{\theta(x_0)} = \exp\Big(\gamma\sum_{i=1}^n c_i\Big) \prod_{j=1}^n (\theta_j S_j)^{c_j}, \qquad E\Big( \frac{\tilde\theta(x_0)}{\theta(x_0)} \Big) = \exp\Big(\gamma\sum_{i=1}^n c_i\Big) \prod_{j=1}^n \Gamma(1 + c_j), $$
subject to all 1 + cj > 0.
$$ \mathrm{Var}\Big( \frac{\tilde\theta(x_0)}{\theta(x_0)} \Big) = \exp\Big(2\gamma\sum_{j=1}^n c_j\Big) \Big[ \prod_{j=1}^n \Gamma(1 + 2c_j) - \prod_{j=1}^n \Gamma^2(1 + c_j) \Big], $$
subject to all 1 + 2c_j > 0. Therefore, the unbiased estimator of θ(x₀) in Theorem 2 is obtained by a simple modification of the estimator θ̃(x₀). Along exactly the same lines, the unbiased estimator of HR in Theorem 2 is obtained by a simple modification of the estimator H̃R = exp((x_a − x_b)ᵀβ̃). □
Clearly, these estimators provide exact estimation, and exact statistical inference is preferable for many reasons, particularly when the sample size is small or not large enough.
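A small R sketch (ours, continuing the simulation after Theorem 1 and using the sign convention c_i = −x₀ᵀ(XᵀX)⁻¹Xᵀe_i adopted above) of the bias-corrected estimator of θ(x₀):

```r
## Unbiased estimator of theta(x0) from Theorem 2, continuing the earlier sketch.
X  <- cbind(1, x)
x0 <- c(1, 0.5)
ci <- -as.vector(t(x0) %*% solve(t(X) %*% X) %*% t(X))   # c_1, ..., c_n
gam <- -digamma(1)                                       # Euler-Mascheroni constant

theta_tilde <- exp(sum(x0 * coef(fit)))                  # plug-in estimator
stopifnot(all(1 + ci > 0))
theta_unb <- exp(-gam * sum(ci)) * theta_tilde / prod(gamma(1 + ci))
```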

2.1. Confidence Intervals

Confidence intervals (or prediction intervals) for the fitted regression function xᵀβ at x = x₀ are useful and often required in practice. Recall that an asymptotic normality-based confidence interval has the form: point estimate ± (critical value) × (standard error of the estimate). Such intervals can also be derived for x₀ᵀβ and θ(x₀) = exp(x₀ᵀβ). In fact, as the BLUE x₀ᵀβ̃ of the regression function x₀ᵀβ has known variance (π²/6) x₀ᵀ(XᵀX)⁻¹x₀, given a significance level 0 < α < 1 the approximate 100(1 − α)% confidence interval of the regression function x₀ᵀβ is given by
$$ \Big( x_0^T\tilde\beta + Z_{\alpha/2}\, \pi\sqrt{x_0^T(X^TX)^{-1}x_0/6},\;\; x_0^T\tilde\beta + Z_{1-\alpha/2}\, \pi\sqrt{x_0^T(X^TX)^{-1}x_0/6} \Big). $$
Similarly, the approximate 100(1 − α)% confidence interval of θ(x₀) is given by
$$ \Big( \tilde\theta(x_0)\, e^{Z_{\alpha/2}\, \pi\sqrt{x_0^T(X^TX)^{-1}x_0/6}},\;\; \tilde\theta(x_0)\, e^{Z_{1-\alpha/2}\, \pi\sqrt{x_0^T(X^TX)^{-1}x_0/6}} \Big), $$
where Z_α is the α quantile of the standard normal distribution; this is because θ̃(x₀) = exp(x₀ᵀβ̃) is approximately log-normal while x₀ᵀβ̃ is approximately normal.
However, under the proposed approach, generalised confidence intervals for the regression function x₀ᵀβ and θ(x₀) are available. In fact, under the assumptions and conclusions of Theorems 1 and 2, note that $x_0^T\tilde\beta - x_0^T\beta - \gamma\sum_{i=1}^n c_i = \sum_{j=1}^n c_j \log(\xi_j)$, with ξ_j = θ_j S_j independently following Exp(1). For a given dataset (n, x₀, β̃), consider the pivotal quantity for β: $\eta = x_0^T\tilde\beta - x_0^T\beta - \gamma\sum_{i=1}^n c_i$, which is a log-linear function of independent Exp(1) variables. If η_α denotes the upper α percentile of η, then the values η_{1−α/2} and η_{α/2} can be obtained using Monte Carlo simulation, that is, by repeatedly generating values of η (m times) via sampling from the log-exponential vector log ξ for the fixed values of (n, x₀, β̃); η_{1−α/2} and η_{α/2} then yield the 1 − α generalised lower and upper confidence limits. Therefore, the 100(1 − α)% confidence intervals for the regression function x₀ᵀβ and for θ(x₀) are given respectively by
$$ \Big( x_0^T\tilde\beta - \gamma\sum_{i=1}^n c_i - \eta_{\alpha/2},\;\; x_0^T\tilde\beta - \gamma\sum_{i=1}^n c_i - \eta_{1-\alpha/2} \Big), $$
and
$$ \Big( \tilde\theta(x_0)\, \exp\Big(-\gamma\sum_{i=1}^n c_i - \eta_{\alpha/2}\Big),\;\; \tilde\theta(x_0)\, \exp\Big(-\gamma\sum_{i=1}^n c_i - \eta_{1-\alpha/2}\Big) \Big). $$
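A sketch (ours) of the Monte Carlo step, reusing the objects X, x0, ci, gam and fit from the earlier sketches:

```r
## Generalised 95% confidence intervals for x0' beta and theta(x0).
set.seed(7)
eta <- replicate(10000, sum(ci * log(rexp(length(ci)))))  # pivot eta = sum_j c_j log(xi_j)
q_hi <- quantile(eta, 0.975)   # eta_{alpha/2}:   upper alpha/2 percentile
q_lo <- quantile(eta, 0.025)   # eta_{1-alpha/2}: upper 1-alpha/2 percentile
est <- sum(x0 * coef(fit))
ci_reg   <- c(est - gam * sum(ci) - q_hi, est - gam * sum(ci) - q_lo)
ci_theta <- exp(ci_reg)        # corresponding interval for theta(x0)
```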
Section 5 will illustrate the performance of asymptotic confidence intervals and exact ones numerically.

2.2. Regression Parameter Testing

Theorems 1 and 2 give the estimators of the regression parameter and the hazard ratio. In this section we discuss hypothesis testing for the regression parameters.
In practice we may test a single regression parameter or a subset of the vector β. Without loss of generality, consider a single-parameter test: whether the kth regression coefficient β_k (k = 0, 1, ⋯, p) equals a known value β_{k0}:
$$ H_0: \beta_k = \beta_{k0} \quad \text{vs.} \quad H_1: \beta_k \neq \beta_{k0}. $$
Note that
$$ \beta_k = (0, \cdots, 0, 1, 0, \cdots, 0)\, \beta, $$
and let β⁰ = β except that its kth component is β_{k0}.
Let xk = (0,⋯, 0, 1, 0, ⋯, 0)T, then the test is equivalent to
$$ H_0: \beta = \beta^0 \quad \text{vs.} \quad H_1: \beta \neq \beta^0. $$
Consider the test statistic T of the form:
$$ T = x_k^T\tilde\beta - x_k^T\beta^0. $$
For i = 1, ⋯, n, let $h_i = -x_k^T (X^T X)^{-1} X^T e_i$. Note that, when H₀ is true,
$$ T = \gamma\sum_{i=1}^n h_i + \sum_{i=1}^n h_i \log \xi_i, $$
where ξi are independent Exp(1) variables.
For a given dataset (n, X, U), consider the statistic $\eta = T - \gamma\sum_{i=1}^n h_i$. If η_α denotes the upper α percentile of η, then given a significance level 0 < α < 1, reject H₀ when η > η_{α/2} or η < η_{1−α/2}.
The values η_α can be obtained using Monte Carlo simulation, that is, by repeatedly generating values of η (m times) via sampling from the log-exponential vector log(ξ) for the fixed values of (n, X, U).
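A sketch (ours) of this test for H₀: β₁ = −1 in the earlier simulation, with the same sign convention for h_i:

```r
## Monte Carlo test of H0: beta_1 = -1, with h_i = -xk'(X'X)^{-1} X' e_i.
xk <- c(0, 1)
hi <- -as.vector(t(xk) %*% solve(t(X) %*% X) %*% t(X))
T_obs   <- sum(xk * coef(fit)) - (-1)      # T = xk' beta_tilde - xk' beta^0
eta_obs <- T_obs - gam * sum(hi)

eta_sim <- replicate(10000, sum(hi * log(rexp(length(hi)))))
reject  <- (eta_obs > quantile(eta_sim, 0.975)) ||
           (eta_obs < quantile(eta_sim, 0.025))
```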

3. Estimation of G-Parameter λ

When λ or G is unknown, whatever method is used to obtain λ or the survival function Ḡ of G, as long as Ḡ, and hence the data U, are available, Theorems 1 and 2 can be applied to estimate the regression parameter β and the hazard ratio HR, respectively. Below we provide a method to estimate β and λ simultaneously.
Let $V(\lambda, \beta) \equiv 2\sum_{i=1}^n \exp(x_i^T\beta)\,(-\log(\bar G(Y_i; \lambda)))$; then, conditional on x,
  • V (λ, β) ~ χ2(2n) and
  • V (λ, β) is a monotone function of λ.
This is because, conditional on x,
  • $2\exp(x_i^T\beta)\,(-\log(\bar G(Y_i; \lambda))) = 2\theta_i S_i \sim \chi^2(2)$ from Section 2, and
  • $\partial V(\lambda, \beta)/\partial\lambda = 2\sum_{i=1}^n \theta_i\, G'_\lambda(Y_i; \lambda)/\bar G(Y_i; \lambda)$, where $G'_\lambda(y; \lambda) = \partial G(y; \lambda)/\partial\lambda$ is assumed positive.
Therefore, we may combine Theorem 1 and
$$ V(\lambda, \beta) = 2(n - 1) \tag{6} $$
to make inference on both β and λ simultaneously. The numerical studies in Section 5 give the details of an algorithm for obtaining λ̂ and β̃.
At the same time, the conditional interval estimate of λ can be given by
$$ \Big( V_\lambda^{-1}\big(\chi^2_{\alpha/2}(2n)\big),\; V_\lambda^{-1}\big(\chi^2_{1-\alpha/2}(2n)\big) \Big). $$
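For the Weibull case Ḡ(y; λ) = exp(−y^λ), solving Equation (6) in λ at a fixed β is a one-dimensional root-finding problem; a sketch (ours), reusing x, Y and fit from the earlier sketches:

```r
## Solve V(lambda, beta) = 2(n - 1) for lambda at fixed beta (Weibull case).
V <- function(lam, beta, x, Y) {
  theta <- exp(beta[1] + beta[2] * x)
  2 * sum(theta * Y^lam)                  # -log(Gbar(Y; lambda)) = Y^lambda here
}
lam_hat <- uniroot(function(l) V(l, coef(fit), x, Y) - 2 * (length(Y) - 1),
                   interval = c(0.01, 10))$root
```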

4. Regression Variable Selection Via Adaptive Lasso

If variable selection is required for the proposed Gt-PH model, any modern lasso-type estimator for ordinary linear regression can be implemented here straightaway. We use the adaptive lasso [11] to outline variable selection in the regression model.
The lasso estimates for a regression model E(y) = xTβ are defined as
$$ \hat\beta(\mathrm{lasso}) = \operatorname*{argmin}_\beta \Big\| y - \sum_{j=1}^p x_j\beta_j \Big\|^2 + \delta\sum_{j=1}^p |\beta_j|, \tag{7} $$
where δ is a nonnegative regularization parameter. The second term in (7) is the so-called L1 penalty, which is crucial for the success of the lasso.
An ideal lasso procedure should be able to identify the true model with probability one, and provide consistent and efficient estimators for the relevant regression coefficients. We use a convex adaptive lasso penalty to illustrate its suitability for our regression model, although many other penalties can be applied with regression model (2). The adaptive penalty attaches a weight to each coefficient to reflect the importance of the corresponding covariate, which is equivalent to using different tuning parameters for different coefficients. The coefficients of unimportant covariates are assigned larger weights so that they are shrunk to zero more easily, leading to the oracle property [11].
When β in regression function (2) satisfies regression model (5), an adaptive lasso for estimating β could be derived via
$$ \hat\beta(\mathrm{adaptive\ lasso}) = \operatorname*{argmin}_\beta \Big\| U - \sum_{j=0}^p x_j\beta_j \Big\|^2 + \delta_n \sum_{j=0}^p w_j|\beta_j|, $$
where w = (w₀, …, w_p)ᵀ is a known weight vector. If the weights are data-dependent, such as $w_j = 1/|\hat\beta_j(\mathrm{initial})|^{\nu}$ for some initial estimator β̂_j(initial) and ν > 0, then the weighted lasso has the oracle properties. The reciprocal of any consistent estimator of β can be used for the adaptive weights; here we suggest the maximum likelihood estimator of β.
Specifically, let A = {j : β_j ≠ 0} and assume that |A| = q < p, so that the true regression model depends only on a subset of x. By Theorem 1, $\sqrt{n}(\tilde\beta - \beta)$ has zero bias and variance $\psi'(1)(X^TX/n)^{-1}$, so $\sqrt{n}(\tilde\beta - \beta) = O_p(1)$ as n → ∞. By Theorem 2 of [11] we then have:
Theorem 3. Suppose that $\delta_n = o(\sqrt{n})$ and $\delta_n n^{(\nu-1)/2} \to \infty$; then
  • β̂(adaptive lasso) identifies the right subset model A;
  • β̂(adaptive lasso) has the optimal estimation rate,
    $$ \sqrt{n}\,\big( \hat\beta_A(\mathrm{adaptive\ lasso}) - \beta_A \big) \to_d N(0, \Sigma), $$
    where $\Sigma = \psi'(1)\,(X_A^T X_A/n)^{-1}$, with X_A the sub-matrix of X corresponding to the true subset model. Clearly, X_AᵀX_A is a q × q matrix.
Finally, existing algorithms and software for the adaptive lasso, such as the R package "parcor" [12], can be used for the proposed Gt-PH model straightaway.
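Besides "parcor", the penalised fit can also be run with the "glmnet" package, whose penalty.factor argument supplies per-coefficient weights; a sketch (ours), where X_mat (the covariate matrix without intercept) and U (the transformed responses) are assumed available:

```r
## Adaptive lasso for the Gt-PH model via glmnet (alternative to parcor).
library(glmnet)
beta_init <- coef(lm(U ~ X_mat))[-1]                # initial consistent estimator
w <- 1 / abs(beta_init)                             # adaptive weights w_j
fit_al <- cv.glmnet(X_mat, U, penalty.factor = w)   # delta_n chosen by cross-validation
coef(fit_al, s = "lambda.min")
```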

5. Numerical Analysis

In this section we first carry out some numerical analysis to illustrate the performance of the proposed Gt-PH regression method and compare it with MLE-based inference, focusing on the finite-sample performance of the estimators of the regression parameter β and the G-parameter λ. We then apply the proposed Gt-PH model to a real data analysis, modelling the effect of age, gender and body mass index (BMI) on the length of stay (LOS) of heart attack patients.
Our algorithm for estimating λ and β in the proposed Gt-PH model is Algorithm 1, with a sketch implementation after the listed steps:
Algorithm 1: Algorithm for estimating λ and β in the proposed Gt-PH model.
  (I) Given data (x, Y), use Y only to fit F(y; λ, θ) by a method such as MLE with the R function fitdistr, to obtain an initial estimate of λ;
  (II) given λ, obtain the observed vector U from $U_i = -\log(-\log(\bar G(Y_i; \lambda))) - \gamma$;
  (III) obtain the estimators of β via the linear regression model (5) with data (x, U), using the R function lm;
  (IV) plug the estimators of β from step (III) into V(λ, β) and estimate λ via Equation (6): V(λ, β) = 2(n − 1);
  (V) repeat steps (II), (III) and (IV) until convergence.
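A sketch implementation (ours) of Algorithm 1 for the Weibull case Ḡ(y; λ) = exp(−y^λ):

```r
## Algorithm 1 for F(y; lambda, theta) = 1 - exp(-theta * y^lambda).
gt_ph <- function(x, Y, max_iter = 50, tol = 1e-4) {
  lam <- MASS::fitdistr(Y, "weibull")$estimate["shape"]  # (I) initial lambda
  n <- length(Y)
  for (it in 1:max_iter) {
    U   <- -lam * log(Y) + digamma(1)     # (II) U_i = -log(Y_i^lambda) - gamma
    fit <- lm(U ~ x)                      # (III) OLS step, regression model (5)
    theta <- exp(fitted(fit))             # theta_i = exp(x_i' beta_tilde)
    lam_new <- uniroot(function(l) sum(theta * Y^l) - (n - 1),  # (IV) Equation (6)
                       interval = c(1e-3, 20))$root
    if (abs(lam_new - lam) < tol) break   # (V) iterate to convergence
    lam <- lam_new
  }
  list(beta = coef(fit), lambda = lam)
}
```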
In contrast, given data (x, Y), the MLE-based algorithm for fitting regression model (2) is based on the log-likelihood function of (β, λ), which is given by
$$ l(\beta, \lambda) = \sum_{i=1}^n \sum_{j=1}^q \beta_j x_{ij} + \sum_{i=1}^n \log g(Y_i; \lambda) + \sum_{i=1}^n \Big( \exp\Big( \sum_{j=1}^q \beta_j x_{ij} \Big) - 1 \Big) \log(\bar G(Y_i; \lambda)). $$
Then MLEs of β and λ can be derived via the partial derivatives of l(β, λ).
For example, consider a simple linear regression model via the parameter θ = exp(β₀ + β₁x) for a Weibull distribution with Y ~ F(y; λ, θ) = 1 − exp(−θy^λ); then G(y; λ) = 1 − exp(−y^λ) and the (conditional) likelihood function is given by
$$ L((x, y); \beta_0, \beta_1, \lambda) = \lambda^n \prod_{i=1}^n \theta_i\, Y_i^{\lambda - 1} \exp(-\theta_i Y_i^\lambda), $$
and then the log-likelihood function is given by
$$ l(\beta_0, \beta_1, \lambda) = n\log(\lambda) + \sum_i \log(\theta_i) + (\lambda - 1)\sum_i \log(Y_i) - \sum_i \theta_i Y_i^\lambda = n\log(\lambda) + n\beta_0 + \beta_1\sum_i x_i + (\lambda - 1)\sum_i \log Y_i - \exp(\beta_0)\sum_i \exp(\beta_1 x_i)\, Y_i^\lambda. $$
Then the MLEs for (λ, β0, β1) satisfy
$$ 0 = \frac{n}{\lambda} + \sum_i \log(Y_i) - \exp(\beta_0)\sum_i \exp(\beta_1 x_i)\, Y_i^\lambda \log(Y_i), \qquad \beta_0 = -\log\Big( \frac{1}{n}\sum_i \exp(\beta_1 x_i)\, Y_i^\lambda \Big), \qquad \exp(-\beta_0)\sum_i x_i = \sum_i x_i Y_i^\lambda \exp(\beta_1 x_i). $$
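Equivalently, the MLE can be obtained by direct numerical maximisation of this log-likelihood; a sketch (ours) using optim:

```r
## MLE of (beta0, beta1, lambda) by maximising l(beta0, beta1, lambda).
negll <- function(par, x, Y) {
  b0 <- par[1]; b1 <- par[2]; lam <- par[3]
  if (lam <= 0) return(Inf)              # keep lambda in the valid region
  theta <- exp(b0 + b1 * x)
  -(length(Y) * log(lam) + sum(log(theta)) +
      (lam - 1) * sum(log(Y)) - sum(theta * Y^lam))
}
mle <- optim(c(0, 0, 1), negll, x = x, Y = Y)$par
```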
Now let the true values be (β₀, β₁, λ) = (−1, −1, 0.5). We assess the performance of the proposed Gt-PH algorithm and the MLE in experiments with three different sample sizes, n = 20, 50, 100, for data (x_i, y_i) (i = 1, ⋯, n). Table 1 summarises the biases and mean square errors (MSEs) of each parameter estimator from both the Gt-PH model and the MLE method over 2000 replications.
Clearly, the Gt-PH model is promising for sample sizes below 100, but it does not always outperform the MLE for sample sizes of 100 or more.
We have also checked the performance of the new method with other values of λ (1 and 1.5). The G-parameter appears to have little impact on the regression estimation; that is, given regression model (2), whether λ = 1, λ > 1 or λ < 1 has very little impact on the estimate of β.
In terms of confidence intervals for the regression function and the hazard function, under the linear regression log(θ(x)) = β₀ + β₁x we have
$$ x^T (X^T X)^{-1} x = \frac{\overline{X^2} - 2x\bar X + x^2}{\sum_{i=1}^n (X_i - \bar X)^2}. $$
Figure 1 at the bottom of this paper plots the 95% confidence intervals for both the fitted regression line and the hazard function. Clearly, the generalised confidence intervals have good coverage properties and much shorter interval lengths than the approximate intervals.

LOS of Worcester Heart Attack Study

Based on the Worcester Heart Attack Study [13], we aim to investigate how age (years), gender (female = 1, male = 0) and BMI affect LOS. As the distribution of LOS is typically skewed, we use a Weibull distribution to model LOS and then fit a regression of LOS on gender, age, the gender-age interaction and BMI via the proposed Gt-PH regression, based on the WHAS100 data set [13], of size 100.
The regression model (2) for this special case and aim is introduced as
$$ \log(\theta) = \beta_0 + \beta_1\,\mathrm{Gender} + \beta_2\,\mathrm{Age} + \beta_3\,\mathrm{Age}\times\mathrm{Gender} + \beta_4\,\mathrm{BMI}. $$
We can then check and compare the effects of these factors on the distribution of LOS via the estimators of the regression parameters.
Following our algorithm, we first fit a Weibull distribution F(y; λ, θ) = 1 − exp(−θy^λ) to obtain an initial value λ₀ = 1.421, then replace θ by the regression above and run the Gt-PH algorithm. After around 10 iterations we observe convergence and obtain the fitted regression model
$$ \log(\hat\theta) = 2.745 + 25.825\times\mathrm{Gender} - 0.104\times\mathrm{Age} - 0.461\times\mathrm{Age}\times\mathrm{Gender} - 0.224\times\mathrm{BMI}, $$
with λ̂ = 1.432. The analysis can then be interpreted in several different ways. For example, we can interpret it via the median of LOS: the logarithm of the median of LOS under the Weibull assumption and the fitted Gt-PH model is given by
$$ \log(\mathrm{median}) = \frac{\log(\log 2) - \log(\theta)}{\lambda} = 2.610 - 18.034\times\mathrm{Gender} + 0.0727\times\mathrm{Age} + 0.322\times\mathrm{Gender}\times\mathrm{Age} + 0.156\times\mathrm{BMI}. $$
Clearly, in terms of the logarithm of the median of LOS, the value for female patients is about 18.034 lower than for male patients (before accounting for the age-gender interaction), and it increases by 0.0727 when patient age increases by one year. Finally, the interaction of gender and age, as well as BMI, have positive effects on LOS.
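For reference, a sketch (ours) of how such a fit could be reproduced with the gt_ph() function from the Algorithm 1 sketch, assuming a data frame whas100 with columns los, age, gender (1 = female) and bmi (these column names are our assumption; the data set accompanies [13]):

```r
## WHAS100 fit with the gt_ph() sketch from Algorithm 1.
X_design <- with(whas100, cbind(gender, age, age * gender, bmi))
res <- gt_ph(X_design, whas100$los)
res$beta     # (intercept, Gender, Age, Age x Gender, BMI)
res$lambda
```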

6. Discussion

Regression analysis is one of the most important methods in statistics and is widely used in almost all science and social science research. The proposed Gt-PH (proportional hazards family-based) regression models and their inference methods are suitable not only for small-data analysis but also for big-data-based variable selection. The method and algorithm are easy to implement and have good interpretability in practice.
The method also applies straightaway to the proportional reverse hazards family Fr(y; λ, θ) defined in (1). In fact, if a random variable Y belongs to this family, then −log(Fr(Y; λ, θ)) ~ Exp(1), so θ(−log(G(Y; λ))) ~ Exp(1). Letting S = −log(G(Y; λ)), Equation (3) holds.
However, the new method is only suitable for the proportional hazards and reverse hazards families; extension to more general families of distributions will be discussed in another paper.

Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grant No. 71490725, 71071087 and 11261048).

Author Contributions

Under the idea and detailed guidance of the corresponding author, the first author carried out all numerical calculations and produced a first draft. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Nelder, J.A.; Wedderburn, R.W.M. Generalized linear models. J. R. Stat. Soc. A 1972, 135, 370–384. [Google Scholar]
  2. Carroll, R.J.; Ruppert, D. Transformation and Weighting in Regression; Chapman & Hall: London, UK, 1988. [Google Scholar]
  3. Smyth, G.K. Generalized linear models with varying dispersion. J. R. Stat. Soc. B 1989, 51, 47–60. [Google Scholar]
  4. Smyth, G.K. An efficient algorithm for REML in heteroscedastic regression. J. Comput. Graph. Stat. 2002, 11, 836–847. [Google Scholar]
  5. Engel, J.; Huele, A.F. A generalized linear modeling approach to robust design. Technometrics 1996, 38, 365–373. [Google Scholar]
  6. Lee, Y.; Nelder, J.A. Double hierarchical generalized linear models. Appl. Stat. 2006, 55, 139–185. [Google Scholar]
  7. Western, B.; Bloome, D. Variance function regression for studying inequality. Sociol. Methodol. 2009, 39, 293–325. [Google Scholar]
  8. Bickel, P.J.; Lehmann, E. Descriptive statistics for nonparametric models. III: dispersion. Ann. Stat. 1976, 4, 1139–1158. [Google Scholar]
  9. Newey, W.K.; Powell, J.L. Asymmetric Least Squares Estimation and Testing. Econometrica 1987, 55, 819–847. [Google Scholar]
  10. Marshall, A.W.; Olkin, I. Life Distributions; Structure of Nonparametric, Semiparametric, and Parametric Families; Springer: New York, NY, USA, 2007. [Google Scholar]
  11. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429. [Google Scholar]
  12. Package "parcor". Available online: www.cran.r-project.org/web/packages/parcor/parcor.pdf (accessed on 2 June 2015).
  13. Hosmer, D.; Lemeshow, S.; May, S. Applied Survival Analysis: Regression Modeling of Time to Event Data, 2nd ed; Wiley: New York, NY, USA, 2008. [Google Scholar]
Figure 1. (Left): Fitted regression line (black line) for log θ(x) = −1 − x and its 95% generalised confidence intervals by the proposed method (blue lines) and normal approximate method (red lines). (Right): Fitted hazard function θ(x) (black curve) and its 95% generalised confidence intervals by the proposed method (blue curves) and normal approximate method (red curves).
Table 1. The biases and mean square error (MSE)s of the estimators of the parameters (λ, β0, β1).
n     Parameter   Method   Bias       MSE
20    λ           Gt-PH    −0.0081    0.0905
      λ           MLE       0.0522    0.1083
      β0          Gt-PH     0.0036    0.0904
      β0          MLE       0.0152    0.0989
      β1          Gt-PH    −0.0014    0.0989
      β1          MLE       0.01632   0.1231
50    λ           Gt-PH    −0.0032    0.0712
      λ           MLE       0.0374    0.0923
      β0          Gt-PH     0.0020    0.0336
      β0          MLE      −0.0145    0.0892
      β1          Gt-PH    −0.0054    0.0359
      β1          MLE       0.0095    0.0892
100   λ           Gt-PH     0.0018    0.0523
      λ           MLE       0.0075    0.0801
      β0          Gt-PH     0.0001    0.0173
      β0          MLE      −0.0023    0.0154
      β1          Gt-PH     0.0001    0.0174
      β1          MLE       0.0015    0.0452
