Article

Multivariate Gamma Regression: Parameter Estimation, Hypothesis Testing, and Its Application

1
Department of Statistics, Institut Teknologi Sepuluh Nopember, Surabaya 60111, Indonesia
2
Department of Statistics, Bina Nusantara University, Jakarta 11480, Indonesia
*
Author to whom correspondence should be addressed.
Submission received: 17 April 2020 / Revised: 3 May 2020 / Accepted: 7 May 2020 / Published: 14 May 2020

Abstract

The gamma distribution is a general type of statistical distribution that can be applied in various fields, mainly when the distribution of the data is not symmetrical. When predictor variables also affect a positive outcome, gamma regression is appropriate. In many cases, the predictor variables affect several responses simultaneously. In this article, we develop a multivariate gamma regression (MGR), a type of non-linear regression whose response variables follow a multivariate gamma (MG) distribution. This work also provides the parameter estimation procedure, the test statistics, and the hypothesis tests for the significance of the parameters, both partially and simultaneously. The parameter estimators are obtained by maximum likelihood estimation (MLE), optimized by numerical iteration using the Berndt–Hall–Hall–Hausman (BHHH) algorithm. The simultaneous test for the model’s significance is derived using the maximum likelihood ratio test (MLRT), whereas the partial test uses the Wald test. The proposed MGR model is applied to the three dimensions of the human development index (HDI) with five predictor variables. The unit of observation is the regency/municipality in Java, Indonesia, in 2018. The empirical results show that a model with multiple predictors makes more sense than a model that employs only a single predictor.

1. Introduction

Gamma distribution is one family of continuous probability distributions and generalizations of exponential distributions [1]. Nagar, Correa, and Gupta [2] mentioned that the gamma distribution function was first introduced by Swiss mathematician Leonhard Euler (1729). Because this function is considered important, many researchers have studied and developed it. Bhattacharya [3], among others, conducted a study on testing the homogeneity of the parameters (shape and scale) of the gamma distribution. Chen and Kotz [4] conducted a study on the probability density function (pdf) of gamma distribution with three parameters (shape, scale, and location). Many researchers also study and develop bivariate gamma distribution; among others are Schickedanz and Krause [5], who conducted a study on testing scale parameters from two gamma-distributed data using the generalized likelihood ratio (GLR). Nadarajah [6] studied the types of bivariate gamma distribution. Next, Nadarajah and Gupta [7] developed two new bivariate gamma distributions based on gamma and beta random variables. In addition, Mathai and Moschopoulos [8] discussed joint densities, product moments, conditional densities, and conditional moments that were developed from two bivariate gamma distributions.
One statistical method that can be applied to analyze the data that follow gamma distribution and its predictor variables is gamma regression. Gamma regression is a type of non-linear regression. A non-linear regression contains at least one parameter with a non-linear form [9,10]. The gamma regression with multiple responses is the so-called multivariate gamma regression (MGR).
The MGR model proposed in this article is the extension of the trivariate gamma regression (TGR) proposed by Rahayu, Purhadi, Sutikno, and Prastyo [11], which describes the theory of parameter estimation and its hypothesis testing. The MGR is developed based on multivariate gamma distribution with three parameters (shape, scale, and location). The supporting references about multivariate gamma distribution were written by Mathai and Moschopoulos [12], and Vaidyanathan and Lakshmi [13]. The parameter estimation method for MGR in this study uses maximum likelihood estimation (MLE). However, the solution cannot be obtained in the closed form. Therefore, a numerical method is needed to achieve the parameter estimator value. The numerical optimization used in this study is the Berndt–Hall–Hall–Hausman (BHHH).
Based on this background, the aims of this study are (i) to construct the MGR model, (ii) to estimate its parameters, and (iii) to test the significance of the model as well as of the individual parameters. The final objective is to apply the proposed MGR model to real data. The case study examines the factors that affect the life expectancy index (first response), education index (second response), and expenditure index (third response), the three indexes that make up the human development dimensions. The unit of observation is the regency/municipality in Java, Indonesia, in 2018. The predictor variables are the percentage of households that have a private toilet, the net enrollment rate of schooling, population density, the percentage of poor people, and the unemployment rate.
The rest of the article is organized as follows. Section 2 introduces the detail of the proposed MGR model. Section 3 and Section 4 explore the data and application, respectively. The last section contains conclusions and further research.

2. Multivariate Gamma Regression Model

Suppose $y_l = (y_{l1}, y_{l2}, \ldots, y_{lk})$ is the vector of response variables, which follows a multivariate gamma distribution, and $x_l = (x_{l1}, x_{l2}, \ldots, x_{ls})$ is the corresponding vector of predictor variables, for a sample of $n$ observations ($l = 1, 2, \ldots, n$). In this section, we discuss the construction of the MGR model, its parameter estimation, and hypothesis testing. A short explanation of univariate gamma regression is given first to make a smooth transition into the MGR model.
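According to Balakrishnan and Wang [14], a random variable $Y$ follows a univariate gamma distribution with three parameters $(\alpha, \gamma, \lambda)$, denoted by $Y \sim \text{Gamma}(\alpha, \gamma, \lambda)$, with the pdf formulated in Equation (1).
$$f(y) = \begin{cases} \dfrac{1}{\gamma^{\alpha}\,\Gamma(\alpha)}\,(y-\lambda)^{\alpha-1}\,e^{-\frac{y-\lambda}{\gamma}}, & \alpha, \gamma, \lambda > 0,\ \lambda < y < \infty, \\ 0, & \text{otherwise}. \end{cases} \tag{1}$$
If $Y \sim \text{Gamma}(\alpha, \gamma, \lambda)$, then the statistics are as follows [15,16].
$\mu = E(y) = \gamma\alpha + \lambda$, $Var(y) = \alpha\gamma^{2}$, $Stdev(y) = \sqrt{\alpha}\,\gamma$, and the skewness is $\gamma_1 = 2/\sqrt{\alpha}$.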
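As a quick numerical illustration of Equation (1) and the moments above, the following minimal sketch evaluates the three-parameter gamma density and compares the closed-form mean with a numerically integrated one. The function names, the example parameter values, and the integration settings are assumptions made for this illustration, not part of the original study.

```python
import math

def gamma3_pdf(y, alpha, gam, lam):
    """Density of Gamma(alpha, gam, lam): shape alpha, scale gam, location lam."""
    if y <= lam:
        return 0.0
    z = y - lam
    log_pdf = ((alpha - 1.0) * math.log(z) - z / gam
               - alpha * math.log(gam) - math.lgamma(alpha))
    return math.exp(log_pdf)

def gamma3_moments(alpha, gam, lam):
    """Closed-form mean, variance, standard deviation, and skewness."""
    return {
        "mean": gam * alpha + lam,
        "var": alpha * gam ** 2,
        "sd": math.sqrt(alpha) * gam,
        "skew": 2.0 / math.sqrt(alpha),
    }

def numeric_mean(alpha, gam, lam, upper=30.0, steps=20000):
    """Midpoint-rule approximation of E(Y) on (lam, upper)."""
    h = (upper - lam) / steps
    total = 0.0
    for i in range(steps):
        y = lam + (i + 0.5) * h
        total += y * gamma3_pdf(y, alpha, gam, lam) * h
    return total

# Hypothetical parameter values, chosen only for illustration.
moments = gamma3_moments(alpha=3.0, gam=0.5, lam=1.0)
print(moments["mean"], numeric_mean(3.0, 0.5, 1.0))
```

The two printed means should agree closely, which is a simple way to confirm that a hand-coded density matches the parameterization in Equation (1).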
Mathai and Moschopoulos (1992) defined the pdf in Equation (2) for a pair of random variables $(Y_1, Y_2)$ that follows a bivariate gamma distribution:
$$f(y_1, y_2) = \frac{(y_1-\lambda_1)^{\alpha_1-1}\,(y_2-y_1-\lambda_2)^{\alpha_2-1}\,e^{-\frac{y_2-\sum_{i=1}^{2}\lambda_i}{\gamma}}}{\gamma^{\alpha_2^{*}}\,\prod_{i=1}^{2}\Gamma(\alpha_i)}, \tag{2}$$
with $\alpha_i > 0$, $\gamma > 0$, $\lambda_i \in R$, $\lambda_1 < y_1 < \infty$, $\lambda_2 < y_2 < \infty$, $\alpha_2^{*} = \alpha_1 + \alpha_2$, $i = 1, 2$, and $f(y_1, y_2) = 0$ otherwise.
The means of $Y_1$ and $Y_2$ are $E(Y_1) = \gamma\alpha_1 + \lambda_1$ and $E(Y_2) = \gamma(\alpha_1+\alpha_2) + \lambda_1 + \lambda_2$, while the variances are $Var(Y_1) = \gamma^{2}\alpha_1$ and $Var(Y_2) = \gamma^{2}(\alpha_1+\alpha_2)$.
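Suppose there are $k$ response variables; the pdf for random variables $(Y_1, Y_2, \ldots, Y_k)$ that follow a multivariate gamma distribution (Mathai and Moschopoulos, 1992) is:
$$f(y_1, y_2, \ldots, y_k) = \frac{(y_1-\lambda_1)^{\alpha_1-1}\,(y_2-y_1-\lambda_2)^{\alpha_2-1}\cdots(y_k-y_{k-1}-\lambda_k)^{\alpha_k-1}\,e^{-\frac{y_k-\sum_{i=1}^{k}\lambda_i}{\gamma}}}{\gamma^{\alpha_k^{*}}\,\prod_{i=1}^{k}\Gamma(\alpha_i)}, \tag{3}$$
with $\alpha_i > 0$, $\gamma > 0$, $\lambda_i \in R$, $\lambda_1 < y_1 < \infty$, $\lambda_2 < y_2 < \infty$, $\ldots$, $\lambda_k < y_k < \infty$, $\alpha_k^{*} = \alpha_1 + \alpha_2 + \cdots + \alpha_k$, $i = 1, 2, \ldots, k$, and $f(y_1, y_2, \ldots, y_k) = 0$ otherwise.
The mean and variance of $Y_i$ are $E(Y_i) = \gamma\alpha_i^{*} + \lambda_i^{*}$ and $Var(Y_i) = \gamma^{2}\alpha_i^{*}$, with $\alpha_i^{*} = \alpha_1 + \alpha_2 + \cdots + \alpha_i$ and $\lambda_i^{*} = \lambda_1 + \lambda_2 + \cdots + \lambda_i$. The MGR model can be stated as in Equation (4).
$$E(Y_i) = \gamma\alpha_i^{*} + \lambda_i^{*} = e^{x^{T}\beta_i}, \quad i = 1, 2, \ldots, k, \tag{4}$$
with $\alpha_i^{*} = \alpha_1 + \alpha_2 + \cdots + \alpha_i$ and $\lambda_i^{*} = \lambda_1 + \lambda_2 + \cdots + \lambda_i$.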
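The pdf for the $l$th observation is formulated in Equation (5), which will be used to compose the likelihood function in Equation (6).
$$f(y_{l1}, y_{l2}, \ldots, y_{lk}) = \frac{(y_{l1}-\lambda_1)^{\alpha_1-1}\,(y_{l2}-y_{l1}-\lambda_2)^{\alpha_2-1}\cdots(y_{lk}-y_{l(k-1)}-\lambda_k)^{\alpha_k-1}\,e^{-\frac{y_{lk}-\sum_{i=1}^{k}\lambda_i}{\gamma}}}{\gamma^{\alpha_k^{*}}\,\Gamma(\alpha_1)\,\Gamma(\alpha_2)\cdots\Gamma(\alpha_k)}, \tag{5}$$
with $\alpha_i > 0$, $\gamma > 0$, $\lambda_i \in R$, $\lambda_1 < y_{l1} < \infty$, $\lambda_2 < y_{l2} < \infty$, $\ldots$, $\lambda_k < y_{lk} < \infty$,
  • $\alpha_1 = \dfrac{e^{x_l^{T}\beta_1}-\lambda_1}{\gamma}$, $\alpha_2 = \dfrac{e^{x_l^{T}\beta_2}-e^{x_l^{T}\beta_1}-\lambda_2}{\gamma}$, $\ldots$, $\alpha_k = \dfrac{e^{x_l^{T}\beta_k}-e^{x_l^{T}\beta_{k-1}}-\lambda_k}{\gamma}$,
  • $\alpha_k^{*} = \alpha_1 + \alpha_2 + \cdots + \alpha_k = \dfrac{e^{x_l^{T}\beta_k}-\lambda_1-\lambda_2-\cdots-\lambda_k}{\gamma}$, and $f(y_{l1}, y_{l2}, \ldots, y_{lk}) = 0$ otherwise.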
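The mapping from regression coefficients to shape parameters in Equation (5) can be sketched in a few lines. This is a minimal illustration only: the function name `mgr_shapes` and the numeric inputs are hypothetical, and the positivity check simply enforces the constraint $\alpha_i > 0$ stated above.

```python
import math

def mgr_shapes(x, betas, lams, gam):
    """Return [alpha_1, ..., alpha_k] for one observation.

    x     : predictor vector (include a leading 1.0 for the intercept)
    betas : list of k coefficient vectors, one per response
    lams  : list of k location parameters
    gam   : common scale parameter (> 0)
    """
    mus = [math.exp(sum(xj * bj for xj, bj in zip(x, beta))) for beta in betas]
    alphas = []
    prev_mu = 0.0
    for mu, lam in zip(mus, lams):
        # alpha_i = (exp(x'beta_i) - exp(x'beta_{i-1}) - lambda_i) / gamma
        alpha = (mu - prev_mu - lam) / gam
        if alpha <= 0.0:
            raise ValueError("shape parameters must be positive")
        alphas.append(alpha)
        prev_mu = mu
    return alphas

# Hypothetical inputs for k = 3 responses and one predictor plus intercept.
x = [1.0, 2.0]
betas = [[0.1, 0.2], [0.5, 0.3], [0.9, 0.35]]
lams = [0.2, 0.1, 0.3]
gam = 0.5
print(mgr_shapes(x, betas, lams, gam))
```

By construction, the cumulative sums satisfy $\gamma\alpha_i^{*} + \lambda_i^{*} = e^{x^{T}\beta_i}$, which is the mean structure in Equation (4).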
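Next, we discuss parameter estimation for the MGR using MLE. The likelihood function constructed from Equation (5) is:
$$L(\gamma, \lambda_1, \lambda_2, \ldots, \lambda_k, \beta_1, \beta_2, \ldots, \beta_k) = \prod_{l=1}^{n} f(y_{l1}, y_{l2}, \ldots, y_{lk}) = \prod_{l=1}^{n}\left(\frac{(y_{l1}-\lambda_1)^{\alpha_1-1}\,(y_{l2}-y_{l1}-\lambda_2)^{\alpha_2-1}\cdots(y_{lk}-y_{l(k-1)}-\lambda_k)^{\alpha_k-1}\,e^{-\frac{y_{lk}-\sum_{i=1}^{k}\lambda_i}{\gamma}}}{\gamma^{\alpha_k^{*}}\,\Gamma(\alpha_1)\,\Gamma(\alpha_2)\cdots\Gamma(\alpha_k)}\right), \tag{6}$$
with the values of $\alpha_1, \alpha_2, \ldots, \alpha_k$, and $\alpha_k^{*}$ as given in Equation (5).
The log-likelihood function from Equation (6) is:
$$\log L(\gamma, \lambda_1, \lambda_2, \ldots, \lambda_k, \beta_1, \beta_2, \ldots, \beta_k) = \sum_{l=1}^{n}\log\left[\frac{(y_{l1}-\lambda_1)^{\alpha_1-1}\,(y_{l2}-y_{l1}-\lambda_2)^{\alpha_2-1}\cdots(y_{lk}-y_{l(k-1)}-\lambda_k)^{\alpha_k-1}\,e^{-\frac{y_{lk}-\sum_{i=1}^{k}\lambda_i}{\gamma}}}{\gamma^{\alpha_k^{*}}\,\Gamma(\alpha_1)\,\Gamma(\alpha_2)\cdots\Gamma(\alpha_k)}\right]. \tag{7}$$
By substituting the values of $\alpha_1, \alpha_2, \ldots, \alpha_k$, and $\alpha_k^{*}$ according to Equation (5), the log-likelihood function becomes:
$$\begin{aligned}\log L(\gamma, \lambda_1, \lambda_2, \ldots, \lambda_k, \beta_1, \beta_2, \ldots, \beta_k) ={}& \sum_{l=1}^{n}\frac{e^{x_l^{T}\beta_1}-\lambda_1-\gamma}{\gamma}\log(y_{l1}-\lambda_1) + \sum_{l=1}^{n}\frac{e^{x_l^{T}\beta_2}-e^{x_l^{T}\beta_1}-\lambda_2-\gamma}{\gamma}\log(y_{l2}-y_{l1}-\lambda_2) + \cdots \\ &+ \sum_{l=1}^{n}\frac{e^{x_l^{T}\beta_k}-e^{x_l^{T}\beta_{k-1}}-\lambda_k-\gamma}{\gamma}\log(y_{lk}-y_{l(k-1)}-\lambda_k) - \sum_{l=1}^{n}\frac{y_{lk}-\lambda_1-\lambda_2-\cdots-\lambda_k}{\gamma} \\ &- \sum_{l=1}^{n}\frac{e^{x_l^{T}\beta_k}-\lambda_1-\lambda_2-\cdots-\lambda_k}{\gamma}\log\gamma - \sum_{l=1}^{n}\log\Gamma\!\left(\frac{e^{x_l^{T}\beta_1}-\lambda_1}{\gamma}\right) \\ &- \sum_{l=1}^{n}\log\Gamma\!\left(\frac{e^{x_l^{T}\beta_2}-e^{x_l^{T}\beta_1}-\lambda_2}{\gamma}\right) - \cdots - \sum_{l=1}^{n}\log\Gamma\!\left(\frac{e^{x_l^{T}\beta_k}-e^{x_l^{T}\beta_{k-1}}-\lambda_k}{\gamma}\right).\end{aligned} \tag{8}$$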
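The log-likelihood above can be evaluated directly once the shape parameters are written in terms of the regression coefficients. The sketch below is one possible implementation under stated assumptions: the function name `mgr_loglik`, the argument layout, and the convention of returning negative infinity outside the parameter space are choices made for this illustration, not taken from the original study.

```python
import math

def mgr_loglik(Y, X, betas, lams, gam):
    """Log-likelihood of the MGR model for n observations.

    Y     : list of n response vectors (length k each)
    X     : list of n predictor vectors (leading 1.0 for the intercept)
    betas : list of k coefficient vectors
    lams  : list of k location parameters
    gam   : common scale parameter
    """
    if gam <= 0.0:
        return float("-inf")
    total = 0.0
    for y, x in zip(Y, X):
        mus = [math.exp(sum(xj * bj for xj, bj in zip(x, b))) for b in betas]
        prev_mu, prev_y = 0.0, 0.0
        for i, (mu, lam) in enumerate(zip(mus, lams)):
            alpha = (mu - prev_mu - lam) / gam   # shape of the i-th increment
            d = y[i] - prev_y - lam              # y_li - y_l(i-1) - lambda_i
            if alpha <= 0.0 or d <= 0.0:
                return float("-inf")             # outside the parameter space
            total += ((alpha - 1.0) * math.log(d)
                      - alpha * math.log(gam) - math.lgamma(alpha))
            prev_mu, prev_y = mu, y[i]
        total -= (y[-1] - sum(lams)) / gam
    return total

# Hypothetical single observation with k = 1, which reduces to Equation (1).
print(mgr_loglik([[2.0]], [[1.0]], [[math.log(2.5)]], [1.0], 0.5))
```

Summing $-\alpha_i\log\gamma$ over the increments reproduces the $-\alpha_k^{*}\log\gamma$ term, so the loop matches the expression term by term.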
In this article, the log value is based on e or natural logarithm. The first derivatives of the log-likelihood function for each parameter are as follows.
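$$\begin{aligned}\frac{\partial \log L(\gamma, \lambda_1, \lambda_2, \ldots, \lambda_k, \beta_1, \beta_2, \ldots, \beta_k)}{\partial\gamma} ={}& \sum_{l=1}^{n}\left(\frac{\lambda_1-e^{x_l^{T}\beta_1}}{\gamma^{2}}\log(y_{l1}-\lambda_1)\right) + \sum_{l=1}^{n}\left(\frac{e^{x_l^{T}\beta_1}-e^{x_l^{T}\beta_2}+\lambda_2}{\gamma^{2}}\log(y_{l2}-y_{l1}-\lambda_2)\right) + \cdots \\ &+ \sum_{l=1}^{n}\left(\frac{e^{x_l^{T}\beta_{k-1}}-e^{x_l^{T}\beta_k}+\lambda_k}{\gamma^{2}}\log(y_{lk}-y_{l(k-1)}-\lambda_k)\right) + \sum_{l=1}^{n}\frac{y_{lk}}{\gamma^{2}} - \frac{n\lambda_1}{\gamma^{2}} - \frac{n\lambda_2}{\gamma^{2}} - \cdots - \frac{n\lambda_k}{\gamma^{2}} \\ &- \left(\frac{n(\log\gamma)\lambda_1}{\gamma^{2}} - \frac{n\lambda_1}{\gamma^{2}} + \frac{n(\log\gamma)\lambda_2}{\gamma^{2}} - \frac{n\lambda_2}{\gamma^{2}} + \cdots + \frac{n(\log\gamma)\lambda_k}{\gamma^{2}} - \frac{n\lambda_k}{\gamma^{2}} + \sum_{l=1}^{n}\left(-\frac{(\log\gamma)\,e^{x_l^{T}\beta_k}}{\gamma^{2}} + \frac{e^{x_l^{T}\beta_k}}{\gamma^{2}}\right)\right) \\ &+ \sum_{l=1}^{n}\frac{1}{\gamma^{2}}\,\Psi\!\left(\frac{e^{x_l^{T}\beta_1}-\lambda_1}{\gamma}\right)\left(e^{x_l^{T}\beta_1}-\lambda_1\right) + \sum_{l=1}^{n}\frac{1}{\gamma^{2}}\,\Psi\!\left(\frac{e^{x_l^{T}\beta_2}-e^{x_l^{T}\beta_1}-\lambda_2}{\gamma}\right)\left(e^{x_l^{T}\beta_2}-e^{x_l^{T}\beta_1}-\lambda_2\right) + \cdots \\ &+ \sum_{l=1}^{n}\frac{1}{\gamma^{2}}\,\Psi\!\left(\frac{e^{x_l^{T}\beta_k}-e^{x_l^{T}\beta_{k-1}}-\lambda_k}{\gamma}\right)\left(e^{x_l^{T}\beta_k}-e^{x_l^{T}\beta_{k-1}}-\lambda_k\right),\end{aligned} \tag{9}$$
$$\frac{\partial \log L(\gamma, \lambda_1, \lambda_2, \ldots, \lambda_k, \beta_1, \beta_2, \ldots, \beta_k)}{\partial\lambda_1} = \sum_{l=1}^{n}\left(-\frac{\log(y_{l1}-\lambda_1)}{\gamma} - \frac{e^{x_l^{T}\beta_1}-\lambda_1-\gamma}{\gamma\,(y_{l1}-\lambda_1)}\right) + \frac{n}{\gamma} + \frac{n(\log\gamma)}{\gamma} + \sum_{l=1}^{n}\frac{1}{\gamma}\,\Psi\!\left(\frac{e^{x_l^{T}\beta_1}-\lambda_1}{\gamma}\right), \tag{10}$$
$$\frac{\partial \log L(\gamma, \lambda_1, \lambda_2, \ldots, \lambda_k, \beta_1, \beta_2, \ldots, \beta_k)}{\partial\lambda_2} = \sum_{l=1}^{n}\left(-\frac{\log(y_{l2}-y_{l1}-\lambda_2)}{\gamma} - \frac{e^{x_l^{T}\beta_2}-e^{x_l^{T}\beta_1}-\lambda_2-\gamma}{\gamma\,(y_{l2}-y_{l1}-\lambda_2)}\right) + \frac{n}{\gamma} + \frac{n(\log\gamma)}{\gamma} + \sum_{l=1}^{n}\frac{1}{\gamma}\,\Psi\!\left(\frac{e^{x_l^{T}\beta_2}-e^{x_l^{T}\beta_1}-\lambda_2}{\gamma}\right), \tag{11}$$
$$\frac{\partial \log L(\gamma, \lambda_1, \lambda_2, \ldots, \lambda_k, \beta_1, \beta_2, \ldots, \beta_k)}{\partial\lambda_k} = \sum_{l=1}^{n}\left(-\frac{\log(y_{lk}-y_{l(k-1)}-\lambda_k)}{\gamma} - \frac{e^{x_l^{T}\beta_k}-e^{x_l^{T}\beta_{k-1}}-\lambda_k-\gamma}{\gamma\,(y_{lk}-y_{l(k-1)}-\lambda_k)}\right) + \frac{n}{\gamma} + \frac{n(\log\gamma)}{\gamma} + \sum_{l=1}^{n}\frac{1}{\gamma}\,\Psi\!\left(\frac{e^{x_l^{T}\beta_k}-e^{x_l^{T}\beta_{k-1}}-\lambda_k}{\gamma}\right), \tag{12}$$
$$\begin{aligned}\frac{\partial \log L(\gamma, \lambda_1, \lambda_2, \ldots, \lambda_k, \beta_1, \beta_2, \ldots, \beta_k)}{\partial\beta_1} ={}& \sum_{l=1}^{n}\left(\log(y_{l1}-\lambda_1)\,\frac{x_l^{T}e^{x_l^{T}\beta_1}}{\gamma}\right) - \sum_{l=1}^{n}\left(\log(y_{l2}-y_{l1}-\lambda_2)\,\frac{x_l^{T}e^{x_l^{T}\beta_1}}{\gamma}\right) \\ &- \sum_{l=1}^{n}\left(\Psi\!\left(\frac{e^{x_l^{T}\beta_1}-\lambda_1}{\gamma}\right)\frac{x_l^{T}e^{x_l^{T}\beta_1}}{\gamma}\right) + \sum_{l=1}^{n}\left(\Psi\!\left(\frac{e^{x_l^{T}\beta_2}-e^{x_l^{T}\beta_1}-\lambda_2}{\gamma}\right)\frac{x_l^{T}e^{x_l^{T}\beta_1}}{\gamma}\right),\end{aligned} \tag{13}$$
$$\begin{aligned}\frac{\partial \log L(\gamma, \lambda_1, \lambda_2, \ldots, \lambda_k, \beta_1, \beta_2, \ldots, \beta_k)}{\partial\beta_2} ={}& \sum_{l=1}^{n}\left(\log(y_{l2}-y_{l1}-\lambda_2)\,\frac{x_l^{T}e^{x_l^{T}\beta_2}}{\gamma}\right) - \sum_{l=1}^{n}\left(\log(y_{l3}-y_{l2}-\lambda_3)\,\frac{x_l^{T}e^{x_l^{T}\beta_2}}{\gamma}\right) \\ &- \sum_{l=1}^{n}\left(\Psi\!\left(\frac{e^{x_l^{T}\beta_2}-e^{x_l^{T}\beta_1}-\lambda_2}{\gamma}\right)\frac{x_l^{T}e^{x_l^{T}\beta_2}}{\gamma}\right) + \sum_{l=1}^{n}\left(\Psi\!\left(\frac{e^{x_l^{T}\beta_3}-e^{x_l^{T}\beta_2}-\lambda_3}{\gamma}\right)\frac{x_l^{T}e^{x_l^{T}\beta_2}}{\gamma}\right),\end{aligned} \tag{14}$$
$$\frac{\partial \log L(\gamma, \lambda_1, \lambda_2, \ldots, \lambda_k, \beta_1, \beta_2, \ldots, \beta_k)}{\partial\beta_k} = \sum_{l=1}^{n}\left(\log(y_{lk}-y_{l(k-1)}-\lambda_k)\,\frac{x_l^{T}e^{x_l^{T}\beta_k}}{\gamma}\right) - \sum_{l=1}^{n}\left((\log\gamma)\,\frac{x_l^{T}e^{x_l^{T}\beta_k}}{\gamma}\right) - \sum_{l=1}^{n}\left(\Psi\!\left(\frac{e^{x_l^{T}\beta_k}-e^{x_l^{T}\beta_{k-1}}-\lambda_k}{\gamma}\right)\frac{x_l^{T}e^{x_l^{T}\beta_k}}{\gamma}\right), \tag{15}$$
where $\Psi(z)$ is the digamma function, i.e., the first derivative of the logarithm of the gamma function, $\Psi(z) = \dfrac{d\,[\log\Gamma(z)]}{dz} = \dfrac{\Gamma'(z)}{\Gamma(z)}$. In Equation (14), the second and fourth terms arise because $\beta_2$ also enters the shape parameter of the third response; they are present whenever $k \geq 3$.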
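Analytic scores of this kind are easy to get wrong by a sign, so a finite-difference check is a useful safeguard. The sketch below does this for the derivative with respect to $\lambda_1$ in the single-response case ($k = 1$). Everything here is an illustrative assumption: the helper names, the toy data, and the digamma approximation (recurrence followed by a standard asymptotic series), which the standard library does not provide.

```python
import math

def digamma(z):
    """Digamma for z > 0: upward recurrence, then asymptotic series."""
    acc = 0.0
    while z < 6.0:
        acc -= 1.0 / z
        z += 1.0
    inv2 = 1.0 / (z * z)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return acc + math.log(z) - 0.5 / z - series

def loglik_k1(y, mu, lam, gam):
    """Log-likelihood for k = 1; mu[l] plays the role of exp(x_l' beta_1)."""
    total = 0.0
    for yl, ml in zip(y, mu):
        a = (ml - lam) / gam
        total += ((a - 1.0) * math.log(yl - lam) - (yl - lam) / gam
                  - a * math.log(gam) - math.lgamma(a))
    return total

def score_lambda_k1(y, mu, lam, gam):
    """Analytic derivative of loglik_k1 with respect to lam."""
    n = len(y)
    s = n / gam + n * math.log(gam) / gam
    for yl, ml in zip(y, mu):
        a = (ml - lam) / gam
        s += (-math.log(yl - lam) / gam
              - (ml - lam - gam) / (gam * (yl - lam))
              + digamma(a) / gam)
    return s

# Hypothetical toy data.
y = [2.0, 3.1, 2.6]
mu = [2.5, 3.0, 2.8]
lam, gam, h = 1.0, 0.5, 1e-6
numeric = (loglik_k1(y, mu, lam + h, gam) - loglik_k1(y, mu, lam - h, gam)) / (2 * h)
print(score_lambda_k1(y, mu, lam, gam), numeric)
```

The two printed values should agree to several decimal places; the same pattern can be applied to the other derivatives before they are passed to an optimizer.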
The maximum likelihood (ML) estimate can be found by setting all the derivatives above to zero and solving the system. No closed-form solution to that system exists, so a numerical method is needed to obtain the parameter estimates $\hat\gamma, \hat\lambda_1, \hat\lambda_2, \ldots, \hat\lambda_k, \hat\beta_1, \hat\beta_2, \ldots, \hat\beta_k$. One numerical technique that can be employed is the BHHH algorithm, as follows.
  • Step 1. Determine the initial value $\hat\theta^{(0)} = [\hat\gamma^{(0)}\ \hat\lambda_1^{(0)}\ \hat\lambda_2^{(0)}\ \cdots\ \hat\lambda_k^{(0)}\ \hat\beta_1^{T(0)}\ \hat\beta_2^{T(0)}\ \cdots\ \hat\beta_k^{T(0)}]^{T}$, where $\hat\gamma^{(0)} > 0$, $\hat\lambda_1^{(0)}, \hat\lambda_2^{(0)}, \ldots, \hat\lambda_k^{(0)} \in R$ satisfy the constraints in Equation (5), and $\hat\beta_1^{T(0)}, \hat\beta_2^{T(0)}, \ldots, \hat\beta_k^{T(0)}$ are obtained from the estimates of univariate gamma regression. The Hessian $H(\hat\theta)$ in BHHH is approximated as the negative of the sum of the outer products of the gradients of the individual observations. The gradient vector $g(\hat\theta)$ is a vector whose elements are the first derivatives of the log-likelihood function with respect to each of the estimated parameters.
  • Step 2. Determine the tolerance limit at which the BHHH iteration process stops. In this study, the tolerance value used is $\varepsilon = 10^{-8}$.
  • Step 3. Start the BHHH iteration using the following formula.
    $$\hat\theta^{(p+1)} = \hat\theta^{(p)} - H^{-1}(\hat\theta^{(p)})\,g(\hat\theta^{(p)}),$$
    with $p = 0, 1, 2, \ldots, p^{*}$.
  • Step 4. The iteration stops at the $p^{*}$th iteration if it satisfies $\|\hat\theta^{(p^{*}+1)} - \hat\theta^{(p^{*})}\| \leq \varepsilon$. Upon convergence, the last iteration produces an estimate for each parameter.
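The four steps above can be sketched on a deliberately simple problem. The example below is not the MGR likelihood: it applies the same update rule to a one-parameter exponential model, $\log f(y) = \log\theta - \theta y$, so that the result can be checked against the known closed-form estimate $1/\bar{y}$. The function name, the starting value, and the synthetic data (exponential quantiles) are assumptions for illustration. Note that the plain update has no step-size control, so convergence is not guaranteed for arbitrary data.

```python
import math

def bhhh_exponential(y, theta0, tol=1e-8, max_iter=200):
    """BHHH iterations for the rate of an exponential model (scalar case)."""
    theta = theta0                                   # Step 1: initial value
    for p in range(max_iter):                        # Step 3: iterate
        scores = [1.0 / theta - yl for yl in y]      # per-observation gradients
        g = sum(scores)                              # gradient of log-likelihood
        H = -sum(s * s for s in scores)              # BHHH Hessian approximation
        theta_new = theta - g / H                    # theta - H^{-1} g
        if abs(theta_new - theta) <= tol:            # Step 4: stopping rule
            return theta_new, p + 1
        theta = theta_new
    raise RuntimeError("BHHH did not converge")

# Synthetic data: quantiles of a unit-rate exponential distribution.
y = [-math.log(1.0 - (i + 0.5) / 20.0) for i in range(20)]
theta_hat, n_iter = bhhh_exponential(y, theta0=0.5)  # Step 2: tol = 1e-8
print(theta_hat, len(y) / sum(y), n_iter)
```

In the multi-parameter MGR setting the scalar division becomes a linear solve with the matrix of summed outer products, but the structure of the loop is the same.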
The null hypothesis for the MGR model is $H_0: \beta_{11} = \beta_{21} = \cdots = \beta_{s1} = \beta_{12} = \beta_{22} = \cdots = \beta_{s2} = \cdots = \beta_{1k} = \beta_{2k} = \cdots = \beta_{sk} = 0$, and the alternative hypothesis is $H_1$: at least one $\beta_{qi} \neq 0$, with $q = 1, 2, \ldots, s$ and $i = 1, 2, \ldots, k$. $\Omega = \{\gamma, \lambda_1, \ldots, \lambda_k, \beta_1, \beta_2, \ldots, \beta_k\}$ is the set of parameters under the population, and $\omega = \{\gamma, \lambda_1, \ldots, \lambda_k, \beta_{01}, \beta_{02}, \ldots, \beta_{0k}\}$ is the set of parameters under the null hypothesis. The first derivatives of the log-likelihood function with respect to each parameter under the null hypothesis are provided in Appendix A.
Proposition 1.
If $\Omega$ is the set of parameters under the population, $\omega$ is the set of parameters under the null hypothesis, and the hypothesis being tested is that of the simultaneous test of the MGR model, then the test statistic is $G^{2} = -2\log\Lambda = 2\log L(\hat\Omega) - 2\log L(\hat\omega)$.
A Corollary of Proposition 1:
The hypothesis used in the simultaneous test of the MGR model in Section 2 can be stated in the following form: the null hypothesis is $\beta^{*} = 0_{(sk \times 1)}$ and the alternative hypothesis is $\beta^{*} \neq 0_{(sk \times 1)}$, with $\beta^{*} = [\beta_1^{*T}\ \beta_2^{*T}\ \cdots\ \beta_i^{*T}\ \cdots\ \beta_k^{*T}]^{T}$ and $\beta_i^{*} = [\beta_{1i}\ \beta_{2i}\ \cdots\ \beta_{si}]^{T}$ for $i = 1, 2, \ldots, k$.
Note that $\hat\theta_{\Omega}$ and $\hat\theta_{\omega}$ are the estimators that maximize the likelihood and log-likelihood functions under the population and under the null hypothesis, respectively. The principle of the MLE method is to maximize the likelihood function [17]. The test statistic for the simultaneous test of the MGR model in Section 2 is as follows.
$$\Lambda = \frac{L(\hat\omega)}{L(\hat\Omega)} < \Lambda_0, \tag{16}$$
where $\Lambda_0$ is a constant with $0 < \Lambda_0 \leq 1$.
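$L(\hat\omega)$ and $L(\hat\Omega)$ in Equation (16) are:
$$L(\hat\omega) = \prod_{l=1}^{n}\left(\frac{(y_{l1}-\hat\lambda_1)^{\hat\alpha_{11}-1}\,(y_{l2}-y_{l1}-\hat\lambda_2)^{\hat\alpha_{22}-1}\cdots(y_{lk}-y_{l(k-1)}-\hat\lambda_k)^{\hat\alpha_{kk}-1}\,e^{-\frac{y_{lk}-\sum_{i=1}^{k}\hat\lambda_i}{\hat\gamma}}}{\hat\gamma^{\hat\alpha_{kk}^{*}}\,\Gamma(\hat\alpha_{11})\,\Gamma(\hat\alpha_{22})\cdots\Gamma(\hat\alpha_{kk})}\right),$$
and
$$L(\hat\Omega) = \prod_{l=1}^{n}\left(\frac{(y_{l1}-\hat\lambda_1)^{\hat\alpha_1-1}\,(y_{l2}-y_{l1}-\hat\lambda_2)^{\hat\alpha_2-1}\cdots(y_{lk}-y_{l(k-1)}-\hat\lambda_k)^{\hat\alpha_k-1}\,e^{-\frac{y_{lk}-\sum_{i=1}^{k}\hat\lambda_i}{\hat\gamma}}}{\hat\gamma^{\hat\alpha_k^{*}}\,\Gamma(\hat\alpha_1)\,\Gamma(\hat\alpha_2)\cdots\Gamma(\hat\alpha_k)}\right), \tag{17}$$
with $\hat\alpha_{11} = \dfrac{e^{\hat\beta_{01}}-\hat\lambda_1}{\hat\gamma}$, $\hat\alpha_{22} = \dfrac{e^{\hat\beta_{02}}-e^{\hat\beta_{01}}-\hat\lambda_2}{\hat\gamma}$, $\ldots$, $\hat\alpha_{kk} = \dfrac{e^{\hat\beta_{0k}}-e^{\hat\beta_{0(k-1)}}-\hat\lambda_k}{\hat\gamma}$,
  • $\hat\alpha_{kk}^{*} = \hat\alpha_{11} + \hat\alpha_{22} + \cdots + \hat\alpha_{kk} = \dfrac{e^{\hat\beta_{0k}}-\hat\lambda_1-\hat\lambda_2-\cdots-\hat\lambda_k}{\hat\gamma}$,
  • $\hat\alpha_1 = \dfrac{e^{x_l^{T}\hat\beta_1}-\hat\lambda_1}{\hat\gamma}$, $\hat\alpha_2 = \dfrac{e^{x_l^{T}\hat\beta_2}-e^{x_l^{T}\hat\beta_1}-\hat\lambda_2}{\hat\gamma}$, $\ldots$, $\hat\alpha_k = \dfrac{e^{x_l^{T}\hat\beta_k}-e^{x_l^{T}\hat\beta_{k-1}}-\hat\lambda_k}{\hat\gamma}$, and $\hat\alpha_k^{*} = \hat\alpha_1 + \hat\alpha_2 + \cdots + \hat\alpha_k = \dfrac{e^{x_l^{T}\hat\beta_k}-\hat\lambda_1-\hat\lambda_2-\cdots-\hat\lambda_k}{\hat\gamma}$.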
Based on Equation (17), the ratio $L(\hat\omega)/L(\hat\Omega)$ is difficult to simplify. To simplify the calculation, the test statistic in Equation (16) is expressed in the equivalent form:
$$(\Lambda)^{-2} = \left(\frac{L(\hat\omega)}{L(\hat\Omega)}\right)^{-2} = \left(\frac{L(\hat\Omega)}{L(\hat\omega)}\right)^{2}. \tag{18}$$
Taking the natural logarithm of Equation (18) gives the following test statistic.
$$G^{2} = -2\log\Lambda = -2\log\left(\frac{L(\hat\omega)}{L(\hat\Omega)}\right) = 2\log\left(\frac{L(\hat\Omega)}{L(\hat\omega)}\right) = 2\log L(\hat\Omega) - 2\log L(\hat\omega), \tag{19}$$
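with $\log L(\hat\Omega_{MGR}) = \sum_{l=1}^{n}\log\left(f(y_{l1}, y_{l2}, \ldots, y_{lk} \mid \hat\Omega_{MGR})\right)$,
$$\begin{aligned}\log L(\hat\Omega_{MGR}) ={}& \sum_{l=1}^{n}\frac{e^{x_l^{T}\hat\beta_1}-\hat\lambda_1-\hat\gamma}{\hat\gamma}\log(y_{l1}-\hat\lambda_1) + \sum_{l=1}^{n}\frac{e^{x_l^{T}\hat\beta_2}-e^{x_l^{T}\hat\beta_1}-\hat\lambda_2-\hat\gamma}{\hat\gamma}\log(y_{l2}-y_{l1}-\hat\lambda_2) + \cdots \\ &+ \sum_{l=1}^{n}\frac{e^{x_l^{T}\hat\beta_k}-e^{x_l^{T}\hat\beta_{k-1}}-\hat\lambda_k-\hat\gamma}{\hat\gamma}\log(y_{lk}-y_{l(k-1)}-\hat\lambda_k) - \sum_{l=1}^{n}\frac{y_{lk}-\hat\lambda_1-\hat\lambda_2-\cdots-\hat\lambda_k}{\hat\gamma} \\ &- \sum_{l=1}^{n}\frac{e^{x_l^{T}\hat\beta_k}-\hat\lambda_1-\hat\lambda_2-\cdots-\hat\lambda_k}{\hat\gamma}\log\hat\gamma - \sum_{l=1}^{n}\log\Gamma\!\left(\frac{e^{x_l^{T}\hat\beta_1}-\hat\lambda_1}{\hat\gamma}\right) \\ &- \sum_{l=1}^{n}\log\Gamma\!\left(\frac{e^{x_l^{T}\hat\beta_2}-e^{x_l^{T}\hat\beta_1}-\hat\lambda_2}{\hat\gamma}\right) - \cdots - \sum_{l=1}^{n}\log\Gamma\!\left(\frac{e^{x_l^{T}\hat\beta_k}-e^{x_l^{T}\hat\beta_{k-1}}-\hat\lambda_k}{\hat\gamma}\right),\end{aligned}$$
and $\log L(\hat\omega_{MGR}) = \sum_{l=1}^{n}\log\left(f(y_{l1}, y_{l2}, \ldots, y_{lk} \mid \hat\omega_{MGR})\right)$,
$$\begin{aligned}\log L(\hat\omega_{MGR}) ={}& \sum_{l=1}^{n}\frac{e^{\hat\beta_{01}}-\hat\lambda_1-\hat\gamma}{\hat\gamma}\log(y_{l1}-\hat\lambda_1) + \sum_{l=1}^{n}\frac{e^{\hat\beta_{02}}-e^{\hat\beta_{01}}-\hat\lambda_2-\hat\gamma}{\hat\gamma}\log(y_{l2}-y_{l1}-\hat\lambda_2) + \cdots \\ &+ \sum_{l=1}^{n}\frac{e^{\hat\beta_{0k}}-e^{\hat\beta_{0(k-1)}}-\hat\lambda_k-\hat\gamma}{\hat\gamma}\log(y_{lk}-y_{l(k-1)}-\hat\lambda_k) - \sum_{l=1}^{n}\frac{y_{lk}-\hat\lambda_1-\hat\lambda_2-\cdots-\hat\lambda_k}{\hat\gamma} \\ &- n\,\frac{e^{\hat\beta_{0k}}-\hat\lambda_1-\hat\lambda_2-\cdots-\hat\lambda_k}{\hat\gamma}\log\hat\gamma - n\log\Gamma\!\left(\frac{e^{\hat\beta_{01}}-\hat\lambda_1}{\hat\gamma}\right) - n\log\Gamma\!\left(\frac{e^{\hat\beta_{02}}-e^{\hat\beta_{01}}-\hat\lambda_2}{\hat\gamma}\right) - \cdots \\ &- n\log\Gamma\!\left(\frac{e^{\hat\beta_{0k}}-e^{\hat\beta_{0(k-1)}}-\hat\lambda_k}{\hat\gamma}\right).\end{aligned}$$
Under the null hypothesis the last terms do not depend on the observation index $l$, so summing over the $n$ observations yields the factor $n$.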
Proposition 2.
Based on Proposition 1, the test statistic $G^{2}$ is asymptotically Chi-square distributed with $sk$ degrees of freedom, which can be written as follows.
$$G^{2} = 2\log L(\hat\Omega) - 2\log L(\hat\omega) \xrightarrow{d} \chi^{2}_{sk}, \quad n \to \infty.$$
A Corollary of Proposition 2:
If $\hat\theta_{\Omega}$ is the estimator that maximizes the likelihood and log-likelihood functions under the population and $\hat\theta_{\omega}$ is the estimator that maximizes them under the null hypothesis, then, based on Equation (19):
$$G^{2} = 2\log L(\hat\theta_{\Omega}) - 2\log L(\hat\theta_{\omega}) = 2\left(\log L(\hat\theta_{\Omega}) - \log L(\theta_{\omega})\right) - 2\left(\log L(\hat\theta_{\omega}) - \log L(\theta_{\omega})\right). \tag{20}$$
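The function $\log L(\theta_{\omega})$ can be approximated by a second-degree Taylor expansion around $\hat\theta_{\Omega}$ as follows.
$$\log L(\theta_{\omega}) \approx \log L(\hat\theta_{\Omega}) + g(\hat\theta_{\Omega})(\theta_{\omega}-\hat\theta_{\Omega}) - \frac{1}{2}(\theta_{\omega}-\hat\theta_{\Omega})^{T}\,[I(\hat\theta_{\Omega})]\,(\theta_{\omega}-\hat\theta_{\Omega}), \tag{21}$$
with $g(\hat\theta_{\Omega}) = \left.\dfrac{\partial\log L(\theta_{\Omega})}{\partial\theta_{\Omega}}\right|_{\theta_{\Omega}=\hat\theta_{\Omega}} = 0$ and $I(\hat\theta_{\Omega}) = -\left.\dfrac{\partial^{2}\log L(\theta_{\Omega})}{\partial\theta_{\Omega}\,\partial(\theta_{\Omega})^{T}}\right|_{\theta_{\Omega}=\hat\theta_{\Omega}}$.
Because $g(\hat\theta_{\Omega}) = 0$, Equation (21) becomes:
$$\begin{aligned}\log L(\theta_{\omega}) &\approx \log L(\hat\theta_{\Omega}) - \frac{1}{2}(\theta_{\omega}-\hat\theta_{\Omega})^{T}\,[I(\hat\theta_{\Omega})]\,(\theta_{\omega}-\hat\theta_{\Omega}), \\ 2\left(\log L(\hat\theta_{\Omega}) - \log L(\theta_{\omega})\right) &\approx (\hat\theta_{\Omega}-\theta_{\omega})^{T}\,[I(\hat\theta_{\Omega})]\,(\hat\theta_{\Omega}-\theta_{\omega}).\end{aligned} \tag{22}$$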
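The function $\log L(\theta_{\omega})$ can likewise be approximated by a second-degree Taylor expansion around $\hat\theta_{\omega}$ as follows.
$$\log L(\theta_{\omega}) \approx \log L(\hat\theta_{\omega}) + g(\hat\theta_{\Omega})(\theta_{\omega}-\hat\theta_{\omega}) - \frac{1}{2}(\theta_{\omega}-\hat\theta_{\omega})^{T}\,[I(\hat\theta_{\Omega})]\,(\theta_{\omega}-\hat\theta_{\omega}). \tag{23}$$
Because $g(\hat\theta_{\Omega}) = 0$, Equation (23) becomes:
$$\begin{aligned}\log L(\theta_{\omega}) &\approx \log L(\hat\theta_{\omega}) - \frac{1}{2}(\theta_{\omega}-\hat\theta_{\omega})^{T}\,[I(\hat\theta_{\Omega})]\,(\theta_{\omega}-\hat\theta_{\omega}), \\ 2\left(\log L(\hat\theta_{\omega}) - \log L(\theta_{\omega})\right) &\approx (\hat\theta_{\omega}-\theta_{\omega})^{T}\,[I(\hat\theta_{\Omega})]\,(\hat\theta_{\omega}-\theta_{\omega}).\end{aligned} \tag{24}$$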
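Based on Equations (22) and (24), the test statistic in Equation (20) can be stated as follows.
$$\begin{aligned}G^{2} &= 2\left(\log L(\hat\theta_{\Omega}) - \log L(\theta_{\omega})\right) - 2\left(\log L(\hat\theta_{\omega}) - \log L(\theta_{\omega})\right), \\ G^{2} &\approx (\hat\theta_{\Omega}-\theta_{\omega})^{T}\,[I(\hat\theta_{\Omega})]\,(\hat\theta_{\Omega}-\theta_{\omega}) - (\hat\theta_{\omega}-\theta_{\omega})^{T}\,[I(\hat\theta_{\Omega})]\,(\hat\theta_{\omega}-\theta_{\omega}).\end{aligned} \tag{25}$$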
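Equation (25) can be simplified by expanding the quadratic form $(\hat\theta_{\Omega}-\theta_{\omega})^{T}\,[I(\hat\theta_{\Omega})]\,(\hat\theta_{\Omega}-\theta_{\omega})$, which gives:
$$2\left(\log L(\hat\theta_{\Omega}) - \log L(\hat\theta_{\omega})\right) \approx \hat\beta^{*T}\left([I_{11}] - [I_{12}][I_{22}]^{-1}[I_{21}]\right)\hat\beta^{*} \approx \hat\beta^{*T}\,[I^{11}]^{-1}\,\hat\beta^{*}. \tag{26}$$
From Equation (26), the following can be obtained:
$$\hat\beta^{*} \xrightarrow{d} N\!\left(0, [I^{11}]_{(sk \times sk)}\right), \quad n \to \infty, \tag{27}$$
$$[I^{11}]^{-\frac{1}{2}}\,\hat\beta^{*} \xrightarrow{d} N(0, I_{sk}). \tag{28}$$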
Based on Equation (28), the quadratic form in Equation (26) is Chi-square distributed with $sk$ degrees of freedom:
$$2\left(\log L(\hat\theta_{\Omega}) - \log L(\hat\theta_{\omega})\right) \approx \left[[I^{11}]^{-\frac{1}{2}}\hat\beta^{*}\right]^{T}\left[[I^{11}]^{-\frac{1}{2}}\hat\beta^{*}\right] = z^{T}z \xrightarrow{d} \chi^{2}_{sk}, \quad n \to \infty, \tag{29}$$
with $z = [I^{11}]^{-\frac{1}{2}}\hat\beta^{*} \xrightarrow{d} N(0, I_{sk})$, $n \to \infty$.
Here $sk$ is the dimension of the vector $\beta^{*}$, i.e., the difference between the number of parameters under the population and the number of parameters under the null hypothesis, denoted by $n(\Omega) - n(\omega)$.
Proposition 3.
The critical region for simultaneously testing the regression parameters of the MGR model, with regard to Equation (16), is:
$$\begin{aligned}\alpha &= P(\Lambda < \Lambda_0) = P(-2\log\Lambda > -2\log\Lambda_0) = P(G^{2} > c_1), \quad \text{with } c_1 = -2\log\Lambda_0, \\ &= P\left(G^{2} > \chi^{2}_{\alpha, sk}\right) = P\left(G^{2} > \chi^{2}_{\alpha, n(\Omega)-n(\omega)}\right).\end{aligned} \tag{30}$$
Based on Proposition 2 and Proposition 3, the null hypothesis is rejected if $G^{2} > \chi^{2}_{\alpha; df}$, with $df = n(\Omega) - n(\omega)$, where $n(\Omega)$ is the number of parameters under the population and $n(\omega)$ is the number of parameters under the null hypothesis.
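The decision rule above reduces to a few arithmetic steps. The following sketch counts parameters under each hypothesis, forms $G^{2}$ from two log-likelihood values, and compares it with a Chi-square critical value taken from a table. The function names are hypothetical, and the parameter count (one scale, $k$ locations, $k$ intercepts, and $sk$ slopes) is an assumption inferred from the definitions of $\Omega$ and $\omega$; the numeric values in the usage lines are those reported later in Section 4.

```python
def n_params(k, s, null=False):
    """Number of free parameters: 1 scale + k locations + regression terms."""
    per_response = 1 if null else s + 1   # intercept only under H0
    return 1 + k + k * per_response

def g2_statistic(loglik_full, loglik_null):
    """Likelihood ratio statistic G^2 = 2 log L(full) - 2 log L(null)."""
    return 2.0 * loglik_full - 2.0 * loglik_null

def reject_null(g2, critical_value):
    """Reject H0 when G^2 exceeds the tabulated Chi-square critical value."""
    return g2 > critical_value

# k = 3 responses and s = 5 predictors, as in the application.
df = n_params(3, 5) - n_params(3, 5, null=True)
print(df)                           # degrees of freedom, equal to s * k
print(reject_null(46.682, 22.307))  # reported G^2 versus the 10% critical value
```

Keeping the critical value as an explicit input avoids any dependence on a statistics library; it can be replaced by a quantile function where one is available.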
The null hypothesis for the partial test is $H_0: \beta_{qi} = 0$, whereas the alternative is $H_1: \beta_{qi} \neq 0$, with $q = 1, 2, \ldots, s$ and $i = 1, 2, \ldots, k$. According to Pawitan [18], the test statistic is stated in Equation (31).
$$Z = \frac{\hat\beta_{qi}}{SE(\hat\beta_{qi})}, \tag{31}$$
with $SE(\hat\beta_{qi}) = \sqrt{\widehat{var}(\hat\beta_{qi})}$. The $\widehat{var}(\hat\beta_{qi})$ is the corresponding diagonal element of the $-H^{-1}(\hat\theta)$ matrix. The null hypothesis is rejected if $|Z| > Z_{\alpha/2}$.
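As a small illustration of the partial test, the sketch below computes the Wald statistic and its two-sided p-value from an estimate and its standard error, using the identity $2(1-\Phi(|z|)) = \operatorname{erfc}(|z|/\sqrt{2})$ for the standard normal distribution. The function name is hypothetical. The example inputs are the location-parameter estimates and standard errors reported later in Section 4, and rejecting when the p-value is below $\alpha$ is equivalent to the rule $|Z| > Z_{\alpha/2}$.

```python
import math

def wald_test(estimate, std_error, alpha=0.10):
    """Return (z, two-sided p-value, reject H0 at level alpha)."""
    z = estimate / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value, p_value < alpha

# Location parameters for Y1 and Y3 with their reported standard errors.
print(wald_test(0.670845, 0.006884))
print(wald_test(0.000468, 0.006530))
```

With these inputs the first parameter is declared significant and the second is not, in line with the discussion of the location parameters in Section 4.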

3. Data and Method

The parameter estimation and hypothesis testing for the MGR were carried out in the following steps. The MGR model was specified based on the pdf in Equation (5) for n observations, l = 1, 2, …, n, to construct the likelihood and log-likelihood functions. The first derivative of the log-likelihood function with respect to each parameter was computed and set to zero. If the resulting system had a closed-form solution, the parameter estimators would be obtained directly; otherwise, numerical optimization is needed. As shown in the previous section, the system has no closed-form solution, so the BHHH algorithm was employed in this work.
The overall test for MGR’s significance was done using the maximum likelihood ratio test (MLRT). The test statistic was formulated in Equation (19). Meanwhile, the partial test for individual parameter significance in MGR was done using the Wald test [18]. Its test statistics are provided in Equation (31). The proposed MGR model, along with its parameter estimation and hypothesis testing, was applied on real data as an application of this study.
This study used secondary data obtained from Statistics Indonesia. The data comprise three response variables, i.e., the life expectancy index, education index, and expenditure index, and five predictor variables: the percentage of households that have a private toilet, the net enrollment rate of schooling, population density, the percentage of poor people, and the unemployment rate. The data were observed for 119 regencies/municipalities in Java, Indonesia, in the year 2018.

4. Application on Human Development Dimensions Data

First, the gamma distribution assumption was tested using the Kolmogorov–Smirnov (KS) test. The null hypothesis is that the data follow the gamma distribution, against the alternative hypothesis that the data do not follow the gamma distribution. The KS test statistic for each response variable is presented in Table 1. In this paper, the goodness of fit is assessed univariately because a test for the multivariate gamma distribution is not yet available; developing such a test is extensive work beyond the scope of this paper. Once each response is found to follow a gamma distribution, we assume that the multiresponse data follow a multivariate gamma distribution. This assumption is a limitation of this work, adopted so that the proposed model can be applied to real data.
Each response variable has $D_n < D_{(0.05)}$ and p-value > α. The test therefore does not reject the null hypothesis, meaning that the life expectancy index (Y1), the education index (Y2), and the expenditure index (Y3) follow the gamma distribution. Consequently, under the limitation mentioned previously, the three response variables are assumed to follow the MG distribution.
To support our assumption, we calculated the correlation between each pair of response variables to show that there are dependencies among the responses. The correlation coefficients are as follows: (i) Y1 and Y2, 0.398; (ii) Y1 and Y3, 0.324; (iii) Y2 and Y3, 0.818, each with a p-value close to zero. The correlation between the education index (Y2) and the expenditure index (Y3) is stronger than that of the other pairs. To check whether the response variables are dependent, so that the data are suitable for multivariate analysis, one can use Bartlett’s test of sphericity. The null hypothesis is that the Pearson correlation matrix equals the identity matrix. The test statistic is $\chi^{2} = 148.735$ with p-value $= 2.22 \times 10^{-16}$. Because $\chi^{2} > \chi^{2}_{3;0.05} = 7.815$ and the p-value is less than α = 0.05, the null hypothesis is rejected: the correlation matrix is not an identity matrix, which means the correlation among the response variables is significant in the multivariate sense. Therefore, the data analysis needs to be done in a multivariate way using the MGR model.
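The reported sphericity statistic can be roughly reproduced from the three pairwise correlations. The sketch below assumes the usual form of Bartlett's statistic, $-\left(n - 1 - \frac{2p+5}{6}\right)\log|R|$ with $p(p-1)/2$ degrees of freedom; the paper does not state which formula was used, so this is an assumption, as is the function name. Because the correlations are rounded to three decimals, only approximate agreement with 148.735 should be expected.

```python
import math

def bartlett_sphericity_3(n, r12, r13, r23):
    """Bartlett's sphericity statistic for a 3 x 3 correlation matrix."""
    p = 3
    # Determinant of [[1, r12, r13], [r12, 1, r23], [r13, r23, 1]].
    det = 1.0 + 2.0 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2
    stat = -(n - 1 - (2 * p + 5) / 6.0) * math.log(det)
    df = p * (p - 1) // 2
    return stat, df

stat, df = bartlett_sphericity_3(n=119, r12=0.398, r13=0.324, r23=0.818)
print(round(stat, 1), df)   # roughly 148.5 with 3 degrees of freedom
```

The result is close to the reported value and the degrees of freedom match the critical value $\chi^{2}_{3;0.05}$ used in the text.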
We also tested the multicollinearity among the predictor variables. The variance inflation factor (VIF) value for each predictor variable is 1.358 (for X1), 1.350 (X2), 1.560 (X3), 1.849 (X4), and 1.211 (X5). The VIF value for each of the predictor variables is less than ten which shows there is no multicollinearity among the predictor variables.
In Table 2, the mean values for response variables Y1, Y2, and Y3 are 0.806, 0.632, and 0.735. Although Y1, Y2, and Y3 have mean values that do not differ greatly, they are not necessarily of the same quality; it depends on the size of the spread of the data. One measure of data distribution that can be used is the coefficient of variation (CoV). The CoV for Y1, Y2, and Y3 are 5.200, 12.210, and 9.140, respectively. The CoV for education index (Y2) is the highest among others, which means that the variable is more heterogeneous. The CoV for predictor variables X1, X2, X3, X4, and X5 are, respectively, 12.100, 16.750, 136.250, 43.750, and 45.530. The CoV for population density (X3) is the highest among other predictor variables as its range is also the biggest one.
The dependency between response and predictor variables can be shown visually by the matrix plot in Figure 1. The correlation between X3 and X4 (−0.585) is the strongest among all pairs of predictor variables, whereas the correlation between X1 and X5 (−0.017) is the weakest. There are indications that the relationship between X3 and the response variables, and between X3 and the other predictor variables, is non-linear. Regarding the correlation between response and predictor variables, log(Y1) has its strongest correlation with X1 (0.434). Both log(Y2) and log(Y3) have their strongest correlation with X3 (0.705 and 0.744, respectively). The latter value shows that the log expenditure index and population density have the strongest relationship among all response–predictor pairs.
To find out which predictor variables significantly predict the response variables, we employed the MGR model. Table 3 presents the ML estimates of the MGR model with a single predictor and their corresponding standard errors, z scores, and p-values. No single predictor significantly affects any response variable. Only the intercepts of the MGR model that employs X3 as a single predictor are significant.
The MGR model with a single predictor (for example, the X5) for the life expectancy index, education index, and expenditure index is obtained as follows.
$$\begin{aligned}\hat\mu_{l1} &= \exp(-0.168892 - 0.006666\,X_{l5}), \\ \hat\mu_{l2} &= \exp(-0.460389 + 0.006020\,X_{l5}), \\ \hat\mu_{l3} &= \exp(-0.279763 + 0.003974\,X_{l5}).\end{aligned}$$
As summarized in Table 3, none of the predictor variables is significant. For comparison, we also fitted the MGR model with multiple predictors. Table 4 presents the ML estimates of the MGR model with multiple predictors along with their corresponding standard errors, z scores, and p-values.
The estimate of the scale parameter $\gamma$ is 0.649423, with standard error 0.000028. The estimate of $\lambda_1$, the location parameter for Y1, is 0.670845 (standard error 0.006884); the estimate of $\lambda_2$ is −0.309362 (standard error 0.006507), and the estimate of $\lambda_3$ is 0.000468 (standard error 0.006530). The significant parameters are the scale parameter $\gamma$ and the location parameters $\lambda_1$ and $\lambda_2$ for Y1 and Y2, respectively, as their p-values are less than $\alpha = 10\%$. The estimate of each parameter corresponding to each predictor is summarized in Table 4. Therefore, the MGR model for the life expectancy index, education index, and expenditure index is obtained as follows.
$$\begin{aligned}\hat\mu_{l1} &= \exp(-0.353421 + 0.002005\,X_{l1} + 0.000653\,X_{l2} + 0.000004\,X_{l3} - 0.000469\,X_{l4} - 0.009502\,X_{l5}), \\ \hat\mu_{l2} &= \exp(-0.606408 + 0.000207\,X_{l1} + 0.004706\,X_{l2} + 0.000012\,X_{l3} - 0.011729\,X_{l4} - 0.011194\,X_{l5}), \\ \hat\mu_{l3} &= \exp(-0.274026 + 0.000779\,X_{l1} + 0.000298\,X_{l2} + 0.000013\,X_{l3} - 0.006542\,X_{l4} - 0.007994\,X_{l5}).\end{aligned}$$
The Akaike information criterion (AIC) value is −63.903, and the corrected Akaike information criterion (AICc) value is −53.361. To measure the average squared difference between the estimated and the actual values, one can use the mean square error (MSE). The MSEs for the life expectancy index, education index, and expenditure index are 0.001, 0.002, and 0.003, respectively. As the MSE is an unbiased estimator of variance, the MSE value is expected not to differ much from the variance of each response variable, i.e., 0.002 (life expectancy index), 0.006 (education index), and 0.004 (expenditure index).
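The relationship between the two reported criteria can be checked with the usual small-sample correction, AICc = AIC + 2p(p + 1)/(n − p − 1). In the sketch below, the parameter count p = 22 is an assumption (one scale, three location parameters, and eighteen regression coefficients), n = 119 is the number of regencies/municipalities, and the function name is hypothetical.

```python
def aicc_from_aic(aic, n_par, n_obs):
    """Small-sample corrected AIC computed from AIC."""
    return aic + 2.0 * n_par * (n_par + 1) / (n_obs - n_par - 1)

# Reported AIC, assumed parameter count, and sample size from the application.
print(round(aicc_from_aic(-63.903, 22, 119), 3))   # -53.361
```

Under these assumptions the correction term is 1012/96, and the result agrees with the reported AICc to three decimals.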
We can perform the simultaneous test for the model’s significance using Wilks’ likelihood ratio statistic derived from the MLRT. The test statistic value is 46.682, and the Chi-square table value with 15 degrees of freedom and $\alpha = 10\%$ is 22.307. The test statistic is larger than the table value; therefore, the decision is to reject the null hypothesis. This means that the five predictor variables jointly have a significant effect on the response variables. To find out which predictor variables partially affect each response variable, one can use the test statistic in Equation (31). From Table 4, it can be seen that the only predictor variable that significantly influences the life expectancy index is the unemployment rate (X5), whereas the education index and expenditure index are significantly affected by the percentage of poor people (X4) and the unemployment rate (X5).
Comparing the results of MGR modeling with a single predictor (Table 3) and with multiple predictors (Table 4), the coefficient signs differ only for X5 in the equations for Y2 and Y3, as shown in Table 5. The matrix plot in Figure 1 supports this: individually, X5 has a negative relationship with Y1 and positive relationships with Y2 and Y3. On the other hand, X4 has a stronger negative individual relationship with all responses. Therefore, when X4 and X5 are used together as predictors in the MGR model, the sign of X5 changes, because X4 and X5 are significantly correlated (−0.391 with p-value < 0.05) and X4 affects Y2 and Y3 more strongly than X5 does.
Recall that the VIF value for X5 is 1.211, which is small. This value means that X5 is only weakly related to X1, X2, X3, and X4 taken jointly. However, the pairwise correlation between X4 and X5 is significant. In the MGR with multiple predictors, the positive sign of X5 would not change if X4 also had a positive sign for responses Y2 and Y3. Unfortunately, that is not the case. The correlation between X4 and Y2 has a different sign from the correlation between X5 and Y2. The signs of X4 and X5 in the MGR with multiple predictors can therefore change depending on their correlations with the response variable. The same explanation pertains to response Y3.
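The mechanism behind such a sign change can be illustrated with a small synthetic example. The sketch below deliberately uses ordinary least squares rather than the MGR model, and simulated data rather than the HDI data; the variable names `x4`, `x5`, and `y` and all numerical settings are assumptions chosen for illustration. The two predictors are negatively correlated, and both have negative effects on the response when used together, yet the weaker predictor shows a positive slope when used alone.

```python
import random

random.seed(0)
n = 2000

# Synthetic predictors: x5 is negatively correlated with x4 (assumed setup).
x4 = [random.gauss(0.0, 1.0) for _ in range(n)]
x5 = [-0.5 * a + random.gauss(0.0, 1.0) for a in x4]
# Both true effects are negative; x4 is the stronger predictor.
y = [-1.0 * a - 0.2 * b + random.gauss(0.0, 0.1) for a, b in zip(x4, x5)]


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Population covariance of two equally long sequences."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


# Single-predictor slope of y on x5.
slope_single = cov(x5, y) / cov(x5, x5)

# Two-predictor slopes from the 2x2 normal equations on centered data.
s44, s55, s45 = cov(x4, x4), cov(x5, x5), cov(x4, x5)
s4y, s5y = cov(x4, y), cov(x5, y)
det = s44 * s55 - s45 ** 2
b4 = (s55 * s4y - s45 * s5y) / det
b5 = (s44 * s5y - s45 * s4y) / det

print(f"x5 alone:         slope = {slope_single:+.3f}")
print(f"x4 and x5 jointly: b4 = {b4:+.3f}, b5 = {b5:+.3f}")
```

When x5 is used alone, it partly stands in for the omitted x4, so its slope absorbs the strong negative effect of x4 through their negative correlation and turns positive. Once x4 enters the model, the slope of x5 reflects only its own contribution.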
The life expectancy index (Y1) has a negative association with the percentage of poor people (X4), but this association is not significant for the regencies/municipalities in Java. This finding means that the life expectancy index is not significantly affected by the percentage of poor people in Java. The education index (Y2) and expenditure index (Y3) have a significant negative dependency on the percentage of poor people in Java.
The predictions from the MGR model are expected to be close to the actual values. The closer the two values, the narrower the spread, as displayed in Figure 2. The fitted values for Y2 and Y3 are better than those for Y1. This result is consistent with the significant predictors reported in Table 4: the life expectancy index has one significant predictor, while the other two responses have two significant predictors, which increases their coefficients of determination.

5. Conclusions

The proposed MGR model has been developed along with its parameter estimation and hypothesis testing procedures. The MLE of the parameters has no closed-form solution, so it is obtained numerically using the BHHH algorithm. The MLRT and the Wald test are employed for testing the model’s overall significance and the individual parameters, respectively. The proposed MGR model is applied to model the three dimensions of the human development index (HDI) with five predictor variables. The empirical results show that modeling using multiple predictors makes more sense than modeling with only a single predictor. When multiple predictors are used in the MGR model, the sign of a particular parameter may change compared with when the predictor is employed alone. This is a common problem in modeling, caused by collinearity among predictors, and it can be addressed in future work.

Author Contributions

Conceptualization, A.R., P., S., and D.D.P.; methodology, P. and S.; software, A.R. and D.D.P.; validation, P. and D.D.P.; formal analysis, A.R. and D.D.P.; investigation, A.R.; data curation, A.R.; writing—original draft preparation, A.R.; writing—review and editing, D.D.P.; visualization, S.; supervision, P. and S.; project administration, P. and S. All authors have read and agreed to the published version of the manuscript.

Funding

The first author thanks the Kemendikbud, the Republic of Indonesia, which has given the BPPDN scholarship, and Bina Nusantara University. All authors thank LPPM (Research center) of the Institut Teknologi Sepuluh Nopember that funded this study via the Postgraduate Research Scheme in 2019 with grant number: 1153/PKS/ITS/2019.

Acknowledgments

The authors thank the editor and the reviewers for their constructive and helpful comments.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

| Abbreviation | Meaning |
|---|---|
| AIC | Akaike information criterion |
| AICc | Corrected Akaike information criterion |
| BHHH | Berndt–Hall–Hall–Hausman |
| CoV | Coefficient of variation |
| GLR | Generalized likelihood ratio |
| HDI | Human development index |
| KS | Kolmogorov–Smirnov |
| MG | Multivariate gamma |
| MGR | Multivariate gamma regression |
| ML | Maximum likelihood |
| MLE | Maximum likelihood estimation |
| MLRT | Maximum likelihood ratio test |
| MSE | Mean square error |
| pdf | Probability density function |
| TGR | Trivariate gamma regression |
| VIF | Variance inflation factor |

Appendix A

The first derivatives of the log-likelihood function for each parameter under the null hypothesis.
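$$
\begin{aligned}
\frac{\partial \log L(\gamma,\lambda_1,\lambda_2,\ldots,\lambda_k,\beta_{01},\beta_{02},\ldots,\beta_{0k})}{\partial \gamma}
={}& \sum_{l=1}^{n}\left(\frac{\lambda_1-e^{\beta_{01}}}{\gamma^{2}}\log\left(y_{l1}-\lambda_1\right)\right)
+\sum_{l=1}^{n}\left(\frac{e^{\beta_{01}}-e^{\beta_{02}}+\lambda_2}{\gamma^{2}}\log\left(y_{l2}-y_{l1}-\lambda_2\right)\right)+\cdots \\
&+\sum_{l=1}^{n}\left(\frac{e^{\beta_{0(k-1)}}-e^{\beta_{0k}}+\lambda_k}{\gamma^{2}}\log\left(y_{lk}-y_{l(k-1)}-\lambda_k\right)\right)
+\sum_{l=1}^{n}\frac{y_{lk}}{\gamma^{2}}-\frac{n\lambda_1}{\gamma^{2}}-\frac{n\lambda_2}{\gamma^{2}}-\frac{n\lambda_3}{\gamma^{2}} \\
&-\left(\frac{n(\log\gamma)\lambda_1}{\gamma^{2}}-\frac{n\lambda_1}{\gamma^{2}}+\frac{n(\log\gamma)\lambda_2}{\gamma^{2}}-\frac{n\lambda_2}{\gamma^{2}}+\frac{n(\log\gamma)\lambda_3}{\gamma^{2}}-\frac{n\lambda_3}{\gamma^{2}}+\left(-\frac{(\log\gamma)e^{\beta_{0k}}}{\gamma^{2}}+\frac{e^{\beta_{0k}}}{\gamma^{2}}\right)\right) \\
&-\left(-\frac{1}{\gamma^{2}}\,\Psi\!\left(\frac{e^{\beta_{01}}-\lambda_1}{\gamma}\right)\left(e^{\beta_{01}}-\lambda_1\right)\right)
-\left(-\frac{1}{\gamma^{2}}\,\Psi\!\left(\frac{e^{\beta_{02}}-e^{\beta_{01}}-\lambda_2}{\gamma}\right)\left(e^{\beta_{02}}-e^{\beta_{01}}-\lambda_2\right)\right)-\cdots \\
&-\left(-\frac{1}{\gamma^{2}}\,\Psi\!\left(\frac{e^{\beta_{0k}}-e^{\beta_{0(k-1)}}-\lambda_k}{\gamma}\right)\left(e^{\beta_{0k}}-e^{\beta_{0(k-1)}}-\lambda_k\right)\right).
\end{aligned}
$$

$$
\frac{\partial \log L(\gamma,\lambda_1,\lambda_2,\ldots,\lambda_k,\beta_{01},\beta_{02},\ldots,\beta_{0k})}{\partial \lambda_1}
=\sum_{l=1}^{n}\left(-\frac{\log\left(y_{l1}-\lambda_1\right)}{\gamma}-\frac{e^{\beta_{01}}-\lambda_1-\gamma}{\gamma\left(y_{l1}-\lambda_1\right)}\right)+\frac{n}{\gamma}+\frac{n(\log\gamma)}{\gamma}-\left(-\frac{1}{\gamma}\,\Psi\!\left(\frac{e^{\beta_{01}}-\lambda_1}{\gamma}\right)\right).
$$

$$
\frac{\partial \log L(\gamma,\lambda_1,\lambda_2,\ldots,\lambda_k,\beta_{01},\beta_{02},\ldots,\beta_{0k})}{\partial \lambda_2}
=\sum_{l=1}^{n}\left(-\frac{\log\left(y_{l2}-y_{l1}-\lambda_2\right)}{\gamma}-\frac{e^{\beta_{02}}-e^{\beta_{01}}-\lambda_2-\gamma}{\gamma\left(y_{l2}-y_{l1}-\lambda_2\right)}\right)+\frac{n}{\gamma}+\frac{n(\log\gamma)}{\gamma}-\left(-\frac{1}{\gamma}\,\Psi\!\left(\frac{e^{\beta_{02}}-e^{\beta_{01}}-\lambda_2}{\gamma}\right)\right).
$$

$$
\vdots
$$

$$
\frac{\partial \log L(\gamma,\lambda_1,\lambda_2,\ldots,\lambda_k,\beta_{01},\beta_{02},\ldots,\beta_{0k})}{\partial \lambda_k}
=\sum_{l=1}^{n}\left(-\frac{\log\left(y_{lk}-y_{l(k-1)}-\lambda_k\right)}{\gamma}-\frac{e^{\beta_{0k}}-e^{\beta_{0(k-1)}}-\lambda_k-\gamma}{\gamma\left(y_{lk}-y_{l(k-1)}-\lambda_k\right)}\right)+\frac{n}{\gamma}+\frac{n(\log\gamma)}{\gamma}-\left(-\frac{1}{\gamma}\,\Psi\!\left(\frac{e^{\beta_{0k}}-e^{\beta_{0(k-1)}}-\lambda_k}{\gamma}\right)\right).
$$

$$
\begin{aligned}
\frac{\partial \log L(\gamma,\lambda_1,\lambda_2,\ldots,\lambda_k,\beta_{01},\beta_{02},\ldots,\beta_{0k})}{\partial \beta_{01}}
={}& \sum_{l=1}^{n}\left(\left(\log\left(y_{l1}-\lambda_1\right)\right)\frac{e^{\beta_{01}}}{\gamma}\right)
-\sum_{l=1}^{n}\left(\left(\log\left(y_{l2}-y_{l1}-\lambda_2\right)\right)\frac{e^{\beta_{01}}}{\gamma}\right) \\
&-\left(\Psi\!\left(\frac{e^{\beta_{01}}-\lambda_1}{\gamma}\right)\right)\frac{e^{\beta_{01}}}{\gamma}
-\left(-\left(\Psi\!\left(\frac{e^{\beta_{02}}-e^{\beta_{01}}-\lambda_2}{\gamma}\right)\right)\frac{e^{\beta_{01}}}{\gamma}\right).
\end{aligned}
$$

$$
\begin{aligned}
\frac{\partial \log L(\gamma,\lambda_1,\lambda_2,\ldots,\lambda_k,\beta_{01},\beta_{02},\ldots,\beta_{0k})}{\partial \beta_{02}}
={}& \sum_{l=1}^{n}\left(\left(\log\left(y_{l2}-y_{l1}-\lambda_2\right)\right)\frac{e^{\beta_{02}}}{\gamma}\right)
-\sum_{l=1}^{n}\left(\left(\log\left(y_{l3}-y_{l2}-\lambda_3\right)\right)\frac{e^{\beta_{02}}}{\gamma}\right) \\
&-\left(\Psi\!\left(\frac{e^{\beta_{02}}-e^{\beta_{01}}-\lambda_2}{\gamma}\right)\right)\frac{e^{\beta_{02}}}{\gamma}
-\left(-\left(\Psi\!\left(\frac{e^{\beta_{03}}-e^{\beta_{02}}-\lambda_3}{\gamma}\right)\right)\frac{e^{\beta_{02}}}{\gamma}\right).
\end{aligned}
$$

$$
\vdots
$$

$$
\frac{\partial \log L(\gamma,\lambda_1,\lambda_2,\ldots,\lambda_k,\beta_{01},\beta_{02},\ldots,\beta_{0k})}{\partial \beta_{0k}}
=\sum_{l=1}^{n}\left(\left(\log\left(y_{lk}-y_{l(k-1)}-\lambda_k\right)\right)\frac{e^{\beta_{0k}}}{\gamma}\right)-\frac{(\log\gamma)\,e^{\beta_{0k}}}{\gamma}-\left(\Psi\!\left(\frac{e^{\beta_{0k}}-e^{\beta_{0(k-1)}}-\lambda_k}{\gamma}\right)\right)\frac{e^{\beta_{0k}}}{\gamma}.
$$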
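First derivatives such as those above are the only ingredient a BHHH iteration needs, because BHHH replaces the negative Hessian with the sum of outer products of the per-observation scores. The sketch below shows that update on a deliberately simple stand-in: a one-parameter exponential model (a gamma with shape 1 and no location parameter) with log-rate `theta`, fitted to a deterministic synthetic sample. The model, the data, and the function names `scores` and `bhhh` are assumptions made for illustration; this is not the MGR likelihood, and it omits the step-size search that practical implementations often add.

```python
import math

# Deterministic synthetic sample: mid-point quantiles of an Exp(1) distribution.
n = 200
y = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]


def scores(theta, data):
    """Per-observation scores of log f(y) = theta - exp(theta) * y."""
    rate = math.exp(theta)
    return [1.0 - rate * yi for yi in data]


def bhhh(theta, data, tol=1e-10, max_iter=100):
    """BHHH update: theta <- theta + (sum g_i^2)^(-1) * sum g_i (scalar case)."""
    for _ in range(max_iter):
        g = scores(theta, data)
        gradient = sum(g)                 # total score
        opg = sum(gi * gi for gi in g)    # outer product of gradients
        step = gradient / opg
        theta += step
        if abs(step) < tol:
            break
    return theta


theta_hat = bhhh(-1.0, y)
closed_form = -math.log(sum(y) / len(y))  # exact MLE of the log-rate

print(f"BHHH estimate:   {theta_hat:.8f}")
print(f"Closed-form MLE: {closed_form:.8f}")
```

In this toy model the MLE is available in closed form, which makes it possible to verify that the iteration stops at the right place. For the MGR model no such closed form exists, so the score vector has several components and the scalar division becomes the solution of a linear system.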

References

  1. Tripathi, R.C.; Gupta, C.R.; Pair, K.P. Statistical test involving several independent gamma distribution. J. Ann. Inst. Stat. Math 1993, 773–786. [Google Scholar] [CrossRef]
  2. Nagar, D.K.; Correa, A.R.; Gupta, A.K. Extended matrix variate gamma and beta functions. J. Multivar. Anal. 2013, 122, 53–69. [Google Scholar] [CrossRef]
  3. Bhattacharya, B. Tests of parameters of several gamma distributions with inequality restrictions. J. Ann. Inst. Stat. Math 2002, 54, 565–576. [Google Scholar] [CrossRef]
  4. Chen, W.W.S.; Kotz, S. The riemannian structure of the three parameter gamma distribution. J. Appl. Math. 2013, 4, 514–522. [Google Scholar] [CrossRef] [Green Version]
  5. Schickedanz, P.T.; Krause, G.F.A. Test for the scale parameters of two gamma distributions using the generalized likelihood ratio. J. Appl. Meteorol. 1970, 9, 13–16. [Google Scholar] [CrossRef] [Green Version]
  6. Nadarajah, S. Reliability for some bivariate gamma distributions. Math. Probl. Eng. 2005, 2, 151–163. [Google Scholar] [CrossRef] [Green Version]
  7. Nadarajah, S.; Gupta, A.K. Some bivariate gamma distributions. Appl. Math. Lett. 2006, 19, 767–774. [Google Scholar] [CrossRef] [Green Version]
  8. Mathai, A.M.; Moschopoulos, P.G. A Form of multivariate gamma distribution. J. Ann. Inst. Stat. Math 1992, 44, 97–106. [Google Scholar] [CrossRef]
  9. Bates, D.M.; Watts, D.G. Nonlinear Regression Analysis and Its Applications, 2nd ed.; John Wiley & Sons, Inc.: New York, NY, USA, 1988; ISBN: 9780470316757 (online), ISBN: 9780471816430 (print). [Google Scholar] [CrossRef] [Green Version]
  10. Pan, J.; Mahmoudi, M.R.; Baleanu, D.; Maleki, M. On comparing and classifying several independent linear and non-linear regression models with symmetric errors. Symmetry 2019, 11, 820. [Google Scholar] [CrossRef] [Green Version]
  11. Rahayu, A.; Purhadi; Sutikno; Prastyo, D.D. Trivariate gamma regression. IOP Conf. Ser. Mater. Sci. Eng. 2019, 546, 052062. [Google Scholar] [CrossRef]
  12. Mathai, A.M.; Moschopoulos, P.G. On a multivariate gamma. J. Multivar. Anal. 1991, 39, 135–153. [Google Scholar] [CrossRef] [Green Version]
  13. Vaidyanathan, V.S.; Lakshmi, R.V. Parameter estimation in multivariate gamma distribution. Stat. Optim. Inf. Comput. 2015, 3. [Google Scholar] [CrossRef]
  14. Balakrishnan, N.; Wang, J. Simple efficient estimation for the three-parameter gamma distribution. J. Stat. Plan. Inference 2000, 85, 115–126. [Google Scholar] [CrossRef]
  15. Ewemoje, T.A.; Ewemooje, O.S. Best distribution and plotting positions of daily maximum flood estimation at ona river in Ogun-Oshun river Basin, Nigeria. Agric. Eng. Int. 2011, 13, 1–13, EID: 2-s2.0-84877825735. [Google Scholar]
  16. Bono, R.; Arnau, J.; Alarcon, R.; Blanca, M.J. Bias, precision, and accuracy of skewness and kurtosis estimators for frequently used continuous distributions. Symmetry 2020, 12, 19. [Google Scholar] [CrossRef] [Green Version]
  17. Usman, M.; Zubair, M.; Shiblee, M.; Rodrigues, P.; Jaffar, S. Probabilistic modeling of speech in spectral domain using maximum likelihood estimation. Symmetry 2018, 10, 750. [Google Scholar] [CrossRef] [Green Version]
  18. Pawitan, Y. All Likelihood: Statistical Modelling and Inference Using Likelihood, 1st ed.; Clarendon Press: Oxford, UK, 2001; pp. 41–42. ISBN 9780199671229. [Google Scholar]
Figure 1. The matrix plot of the response and predictor variables.
Figure 2. The actual values and the estimated values.
Table 1. Gamma distribution test with Kolmogorov–Smirnov (KS) for α = 0.05.

| Response | D_n | D_(0.05) | p-Value |
|---|---|---|---|
| Y1 | 0.118 | 0.124 | 0.066 |
| Y2 | 0.107 | 0.124 | 0.123 |
| Y3 | 0.065 | 0.124 | 0.667 |
Table 2. Description of data.

| Variables | Mean | SD | Coefficient of Variation (%) | Min | Max |
|---|---|---|---|---|---|
| Life expectancy index (Y1) | 0.806 | 0.042 | 5.200 | 0.680 | 0.890 |
| Education index (Y2) | 0.632 | 0.077 | 12.210 | 0.470 | 0.850 |
| Expenditure index (Y3) | 0.735 | 0.067 | 9.140 | 0.620 | 0.960 |
| Percentage of households that have a private toilet (X1) | 80.215 | 9.710 | 12.100 | 37.820 | 98.010 |
| Net enrollment rate of schooling (X2) | 63.579 | 10.650 | 16.750 | 34.220 | 89.460 |
| Population density (X3) | 3298 | 4493 | 136.250 | 278 | 19757 |
| Percentage of poor people (X4) | 9.623 | 4.211 | 43.750 | 1.680 | 21.210 |
| Unemployment rate (X5) | 5.337 | 2.430 | 45.530 | 1.430 | 12.770 |
Table 3. Parameter estimation of multivariate gamma regression (MGR) model with a single predictor.

| Parameter | Estimate | Standard Error | z | p-Value |
|---|---|---|---|---|
| (Y1, Y2, Y3) ~ X1 | | | | |
| β01 | −0.422284 | 0.646166 | −0.654 | 0.513 |
| β11 | 0.002758 | 0.007970 | 0.346 | 0.729 |
| β02 | −0.762038 | 1.120002 | −0.680 | 0.496 |
| β12 | 0.004297 | 0.016611 | 0.259 | 0.796 |
| β03 | −0.437765 | 0.760203 | −0.576 | 0.565 |
| β13 | 0.002339 | 0.011250 | 0.208 | 0.835 |
| (Y1, Y2, Y3) ~ X2 | | | | |
| β01 | −0.305102 | 0.518863 | −0.588 | 0.557 |
| β21 | 0.001896 | 0.008076 | 0.235 | 0.814 |
| β02 | −0.881184 | 0.575235 | −1.532 | 0.126 |
| β22 | 0.007363 | 0.008563 | 0.860 | 0.390 |
| β03 | −0.430093 | 0.857503 | −0.502 | 0.616 |
| β23 | 0.002789 | 0.014685 | 0.190 | 0.849 |
| (Y1, Y2, Y3) ~ X3 | | | | |
| β01 | −0.205148 | 0.000421 | −486.716 | 0.000 * |
| β31 | 0.000002 | 0.000054 | 0.029 | 0.977 |
| β02 | −0.483898 | 0.000262 | −1846.353 | 0.000 * |
| β32 | 0.000020 | 0.000021 | 0.930 | 0.352 |
| β03 | −0.306820 | 0.000209 | −1465.698 | 0.000 * |
| β33 | 0.000016 | 0.000014 | 1.132 | 0.258 |
| (Y1, Y2, Y3) ~ X4 | | | | |
| β01 | −0.166290 | 0.581543 | −0.286 | 0.775 |
| β41 | −0.002307 | 0.019731 | −0.117 | 0.907 |
| β02 | −0.238049 | 1.200557 | −0.198 | 0.843 |
| β42 | −0.019448 | 0.046895 | −0.415 | 0.678 |
| β03 | −0.136018 | 1.044931 | −0.130 | 0.896 |
| β43 | −0.012480 | 0.035650 | −0.350 | 0.726 |
| (Y1, Y2, Y3) ~ X5 | | | | |
| β01 | −0.168892 | 0.555574 | −0.304 | 0.761 |
| β51 | −0.006666 | 0.057555 | −0.116 | 0.908 |
| β02 | −0.460389 | 0.862713 | −0.534 | 0.594 |
| β52 | 0.006020 | 0.092371 | 0.065 | 0.948 |
| β03 | −0.279763 | 0.831607 | −0.336 | 0.737 |
| β53 | 0.003974 | 0.040046 | 0.099 | 0.921 |

* Significant at α = 10%.
Table 4. Parameter estimation of MGR model with multiple predictors.

| Parameter | Estimate | Standard Error | z | p-Value |
|---|---|---|---|---|
| Life expectancy index (Y1) | | | | |
| β01 | −0.353421 | 0.000119 | −2969.383 | 0.000 ** |
| β11 | 0.002005 | 0.002987 | 0.671 | 0.502 |
| β21 | 0.000653 | 0.003494 | 0.187 | 0.852 |
| β31 | 0.000004 | 0.000029 | 0.124 | 0.902 |
| β41 | −0.000469 | 0.002831 | −0.166 | 0.868 |
| β51 | −0.009502 | 0.005252 | −1.809 | 0.070 ** |
| Education index (Y2) | | | | |
| β02 | −0.606408 | 0.000120 | −5069.156 | 0.000 ** |
| β12 | 0.000207 | 0.003800 | 0.055 | 0.956 |
| β22 | 0.004706 | 0.004952 | 0.950 | 0.342 |
| β32 | 0.000012 | 0.000018 | 0.677 | 0.498 |
| β42 | −0.011729 | 0.004125 | −2.843 | 0.004 ** |
| β52 | −0.011194 | 0.006636 | −1.687 | 0.092 ** |
| Expenditure index (Y3) | | | | |
| β03 | −0.274026 | 0.000101 | −2723.533 | 0.000 ** |
| β13 | 0.000779 | 0.002337 | 0.333 | 0.739 |
| β23 | 0.000298 | 0.002621 | 0.114 | 0.910 |
| β33 | 0.000013 | 0.000014 | 0.943 | 0.346 |
| β43 | −0.006542 | 0.003125 | −2.093 | 0.036 ** |
| β53 | −0.007994 | 0.003900 | −2.050 | 0.040 ** |

** Significant at α = 10%.
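The z and p-value columns of Table 4 appear consistent with a Wald statistic formed as the estimate divided by its standard error and referred to a standard normal distribution. The sketch below checks this for three rows of the table. That reading of the partial test is an assumption made here, since Equation (31) is not reproduced in this part of the article, and `wald_test` is a hypothetical helper name.

```python
import math


def wald_test(estimate: float, std_error: float):
    """Return the Wald z statistic and its two-sided standard normal p-value."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p


# (estimate, standard error) pairs copied from Table 4.
rows = {
    "beta_51": (-0.009502, 0.005252),
    "beta_42": (-0.011729, 0.004125),
    "beta_53": (-0.007994, 0.003900),
}

ALPHA = 0.10
results = {name: wald_test(est, se) for name, (est, se) in rows.items()}
for name, (z, p) in results.items():
    print(f"{name}: z = {z:.3f}, p = {p:.3f}, significant at 10%: {p < ALPHA}")
```

Working from the rounded estimates and standard errors printed in the table, the recomputed values agree with the reported z and p-value columns to about three decimals.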
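Table 5. The difference in coefficient signs and significance of the parameters in the MGR model.

| Response Variables | Predictor Variables | Multiple Predictors | Single Predictor: X1 | Single Predictor: X2 | Single Predictor: X3 | Single Predictor: X4 | Single Predictor: X5 |
|---|---|---|---|---|---|---|---|
| Y1 | X1 | + | + | | | | |
| | X2 | + | | + | | | |
| | X3 | + | | | + | | |
| | X4 | − | | | | − | |
| | X5 | − *** | | | | | − |
| Y2 | X1 | + | + | | | | |
| | X2 | + | | + | | | |
| | X3 | + | | | + | | |
| | X4 | − *** | | | | − | |
| | X5 | − *** | | | | | + |
| Y3 | X1 | + | + | | | | |
| | X2 | + | | + | | | |
| | X3 | + | | | + | | |
| | X4 | − *** | | | | − | |
| | X5 | − *** | | | | | + |

*** Significant at α = 10%.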

Share and Cite

MDPI and ACS Style

Rahayu, A.; Purhadi; Sutikno; Prastyo, D.D. Multivariate Gamma Regression: Parameter Estimation, Hypothesis Testing, and Its Application. Symmetry 2020, 12, 813. https://0-doi-org.brum.beds.ac.uk/10.3390/sym12050813
