Article

A Two-Stage Maximum Entropy Prior of Location Parameter with a Stochastic Multivariate Interval Constraint and Its Properties

Department of Statistics, Dongguk University-Seoul, Seoul 100-715, Korea
Submission received: 25 January 2016 / Revised: 4 May 2016 / Accepted: 10 May 2016 / Published: 20 May 2016
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract
This paper proposes a two-stage maximum entropy prior to elicit uncertainty regarding a multivariate interval constraint of the location parameter of a scale mixture of normal model. Using Shannon’s entropy, this study demonstrates how the prior, obtained by using two stages of a prior hierarchy, appropriately accounts for the information regarding the stochastic constraint and suggests an objective measure of the degree of belief in the stochastic constraint. The study also verifies that the proposed prior plays the role of bridging the gap between the canonical maximum entropy prior of the parameter with no interval constraint and that with a certain multivariate interval constraint. It is shown that the two-stage maximum entropy prior belongs to the family of rectangle screened normal distributions that is conjugate for samples from a normal distribution. Some properties of the prior density, useful for developing a Bayesian inference of the parameter with the stochastic constraint, are provided. We also propose a hierarchical constrained scale mixture of normal model (HCSMN), which uses the prior density to estimate the constrained location parameter of a scale mixture of normal model and demonstrates the scope of its applicability.

1. Introduction

Suppose the $y_i$'s are independent observations from a scale mixture of a $p$-variate normal distribution with $p \times 1$ location parameter $\theta$ and known scale matrix. Then, a simple location model for the $p$-variate observations $y_i \in \mathbb{R}^p$ is:
$$y_i = \theta + \epsilon_i, \quad i = 1, \ldots, n, \qquad (1)$$
where the distribution of the $p \times 1$ vector variable $\epsilon_i$ is $F \in \mathcal{F}$ with:
$$\mathcal{F} = \left\{ F:\ N_p\big(\mathbf{0},\, \kappa(\eta)\Lambda\big),\ \eta \sim G(\eta),\ \text{with}\ \kappa(\eta) > 0\ \text{and}\ \eta > 0 \right\}, \qquad (2)$$
where $\eta$ is a mixing variable with cdf $G(\eta)$ and $\kappa(\eta)$ is a suitably-chosen weight function.
Bayesian analysis of the model (1) begins with the specification of a prior distribution, which represents the information about the uncertain parameter $\theta$ and is combined with the joint probability distribution of the $y_i$'s to yield the posterior distribution. When there are no constraints on the location parameter, the usual priors (e.g., Jeffreys' invariant prior or an informative normal conjugate prior) can be used, and posterior inference can be performed without any difficulty. In some practical situations, however, we may have prior information that $\theta$ has a multivariate interval constraint, so that the value of $\theta$ needs to be located in a restricted space $C \subset \mathbb{R}^p$, where $C = (a, b)$ is a $p$-variate interval with $a = (a_1, \ldots, a_p)'$ and $b = (b_1, \ldots, b_p)'$. For the remainder of this paper, we use $\theta \in C$ to denote the multivariate interval constraint:
$$\left\{\theta;\ a_i \leq \theta_i \leq b_i,\ i = 1, \ldots, p\right\}, \quad \text{where } \theta = (\theta_1, \ldots, \theta_p)'. \qquad (3)$$
When we have sufficient evidence that the constraint condition on the model (1) is true, a suitable restriction on the parameter space, such as a truncated prior distribution, is expected. See, e.g., [1,2,3,4] for various applications of the truncated prior distribution in Bayesian inference. However, it is often the case that prior information about the constraint is not certain. Further, even the observations from the assumed model (1) often do not provide strong evidence that the constraint is true and, therefore, may appear to contradict the assumption of the model associated with the constraint. In this case, the uncertainty about the constraint should be taken into account in eliciting a prior distribution of $\theta$. When the parameter constraint is not certain for Bayesian estimation in the univariate normal location model, the seminal work of [5] proposed the use of a two-stage hierarchical prior distribution, constructing a family of skew densities based on the positively-truncated normal prior distribution. Generalizing the framework of the prior hierarchy proposed by [5], various priors have been considered by [6,7,8,9,10], among others, for the Bayesian estimation of normal and scale mixture of normal models with uncertain interval constraints. In particular, [7] obtained the prior of $\theta$ as a normal selection distribution (see, e.g., [11]) and thus exploited the class of weighted normal distributions of [12] to reflect the uncertain prior belief about $\theta$. On the other hand, there are situations where a prior density of $\theta$ must be set up on the basis of information regarding the moments of the density, such as the mean and covariance matrix. A useful method of dealing with this situation is through the concept of entropy [13,14]. Other general references where moment inequality constraints have been considered include [15,16]. To the best of our knowledge, however, a formal method to set up a prior density of $\theta$ consistent with information regarding the moments of the density, as well as with the uncertain prior belief about the location parameter, has not previously been investigated in the literature. Such practical considerations motivate the prior density of $\theta$ developed in this paper.
As discussed in [17,18,19,20], entropy has a direct relationship to information theory and measures the amount of uncertainty inherent in a probability distribution. Using this property, we propose a two-stage hierarchical method for setting up the two-stage maximum entropy prior density of $\theta$. The method enables us to elicit information regarding the moments of the prior distribution, as well as the degree of belief in the constraint $\theta \in C$. Furthermore, this paper suggests an objective method to measure the degree of belief regarding the multivariate interval constraint accounted for by the prior. We also propose a simple way of controlling the degree of belief regarding the constraint of $\theta$ in Bayesian inference, by investigating the relation between the degree of belief and the enrichment of the hyper-parameters of the prior density. In this respect, the study of the two-stage maximum entropy prior is interesting from both a theoretical and an applied point of view. On the theoretical side, it develops yet another conjugate prior of the constrained $\theta$ based on the maximum entropy approach, and it provides several properties of the proposed prior that advocate the idea of two stages of a prior hierarchy for eliciting information regarding the moments of the prior and the stochastic constraint of $\theta$. From the applied viewpoint, the prior is especially useful for a Bayesian subjective methodology for inequality-constrained multivariate linear models.
The remainder of this paper is arranged as follows. In Section 2, we propose the two-stage maximum entropy prior of $\theta$ by applying Boltzmann's maximum entropy theorem (see, e.g., [21,22]) to the frame of the two-stage prior hierarchy of [5]. We also suggest an objective measure of uncertainty regarding the stochastic constraint of $\theta$ that is accounted for by the two-stage maximum entropy prior. In Section 3, we briefly discuss the properties of the proposed prior of $\theta$, which are useful for the Bayesian analysis of $\theta$ subject to uncertainty regarding the multivariate interval constraint $\theta \in C$. Section 4 provides a hierarchical scale mixture of normal model of Equation (1) using the two-stage prior, referred to as the hierarchical constrained scale mixture of normal model (HCSMN); it explores the Bayesian estimation of the model (1) by deriving the posterior distributions of the unknown parameters under the HCSMN and discusses the properties of the proposed measure of uncertainty in the context of the HCSMN. In Section 5, we compare the empirical performance of the proposed prior based on synthetic data and real data applications with the HCSMN models for the estimation of $\theta$ with a stochastic multivariate interval constraint. Finally, concluding remarks along with a discussion are provided in Section 6.

2. Two-Stage Maximum Entropy Prior

2.1. Maximum Entropy Prior

Sometimes, we have a situation in which partial prior information is available, outside of which it is desirable to use a prior that is as non-informative as possible. Assume that we can specify the partial information concerning $\theta$ in Equation (1), whose parameter space $\Theta$ is continuous. That is:
$$E[t_j(\theta)] = \int_\Theta t_j(\theta)\, \pi(\theta)\, d\theta = t_j, \quad j = 1, \ldots, k. \qquad (4)$$
The maximum entropy prior can be obtained by choosing the $\pi(\theta)$ that maximizes the entropy:
$$\xi(\pi) = -\int_\Theta \pi(\theta) \log \pi(\theta)\, d\theta,$$
in the presence of the partial information in the form of Equation (4). A straightforward application of the calculus of variations leads us to the following theorem.
Lemma 1. 
(Boltzmann's maximum entropy theorem): The density $\pi(\theta)$ that maximizes $\xi(\pi)$, subject to the constraints $E[t_j(\theta)] = t_j$, $j = 1, \ldots, k$, takes the $k$-parameter exponential family form:
$$\pi_{max}(\theta) \propto \exp\left\{\lambda_1 t_1(\theta) + \lambda_2 t_2(\theta) + \cdots + \lambda_k t_k(\theta)\right\}, \quad \theta \in \Theta,$$
where $\lambda_1, \lambda_2, \ldots, \lambda_k$ can be determined, via the $k$ constraints, in terms of $t_1, \ldots, t_k$.
Proof. 
See [22] for the proof. ☐
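Although [22] contains the full argument, the key variational step can be sketched in one display. Introducing Lagrange multipliers for the $k$ moment constraints and the normalization $\int_\Theta \pi(\theta)\, d\theta = 1$, one maximizes:
$$\mathcal{L}(\pi) = -\int_\Theta \pi \log \pi \, d\theta + \sum_{j=1}^{k} \lambda_j \left( \int_\Theta t_j(\theta)\, \pi \, d\theta - t_j \right) + \lambda_0 \left( \int_\Theta \pi \, d\theta - 1 \right).$$
Setting the functional derivative with respect to $\pi(\theta)$ to zero gives $-\log \pi(\theta) - 1 + \sum_{j=1}^{k} \lambda_j t_j(\theta) + \lambda_0 = 0$, so that $\pi(\theta) \propto \exp\{\sum_{j=1}^{k} \lambda_j t_j(\theta)\}$, the exponential family form of Lemma 1.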
When the partial information is about the mean and covariance matrix of θ , outside of which it is desired to use a prior that is as non-informative as possible, then the theorem yields the following result.
Corollary 1. 
As partial prior information, let the parameter $\theta = (\theta_1, \ldots, \theta_p)'$ have a probability distribution on $\mathbb{R}^p$ with mean vector $\theta_0 = (\theta_{01}, \ldots, \theta_{0p})'$ and covariance matrix $\Sigma$; then, the maximum entropy prior of $\theta$ is:
$$\pi_{max}(\theta) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left\{-\frac{1}{2}(\theta - \theta_0)'\Sigma^{-1}(\theta - \theta_0)\right\}, \quad \theta \in \mathbb{R}^p, \qquad (6)$$
a density of the $N_p(\theta_0, \Sigma)$ distribution.
Proof. 
According to Lemma 1, the partial information gives $t_j(\theta) = \theta_j$ and $t_j = \theta_{0j}$ for $j = 1, \ldots, p$, together with $t_{p+1}(\theta) = \mathrm{tr}\{\Sigma^{-1}(\theta - \theta_0)(\theta - \theta_0)'\}$ and $t_{p+1} = p$. The requirement $\int_{\mathbb{R}^p} \pi_{max}(\theta)\, d\theta = 1$ forces $\lambda_1 = \cdots = \lambda_p = 0$ and $\lambda_{p+1} < 0$. Thus, the density $\pi_{max}(\theta)$ is proportional to $\exp\left\{\lambda_{p+1}\, \mathrm{tr}\{\Sigma^{-1}(\theta - \theta_0)(\theta - \theta_0)'\}\right\}$. Setting $\lambda_{p+1} = -t_{p+1}/2p = -1/2$ and obtaining the normalizing constant, we see that the maximum entropy prior of the parameter in Equation (1) is Equation (6). ☐
In practical situations, we sometimes have partial information about a multivariate interval constraint (i.e., $\theta \in C$) in addition to the first two moments given in Corollary 1.
Corollary 2. 
Assume that the prior distribution of $\theta = (\theta_1, \ldots, \theta_p)'$ has the mean vector $\theta_0 = (\theta_{01}, \ldots, \theta_{0p})'$ and covariance matrix $\Sigma$. Further assume, a priori, that the space of $\theta$ is constrained to the multivariate interval $\{\theta;\ \theta \in C\}$ given in Equation (3). Then, a constrained maximum entropy prior of $\theta$ is given by:
$$\pi_{const}(\theta) = \frac{(2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left\{-\frac{1}{2}(\theta - \theta_0)'\Sigma^{-1}(\theta - \theta_0)\right\}}{Pr(\theta \in C)}, \quad \theta \in C, \qquad (7)$$
a density of the $N_p(\theta_0, \Sigma) I(\theta \in C)$ distribution, i.e., the $p$-dimensional $N_p(\theta_0, \Sigma)$ distribution truncated to the space $C$.
Proof. 
The certain multivariate interval constraint, $Pr(\theta \in C) = 1$, can be expressed in terms of a moment, $E[I(\theta \in C)] = 1$. Upon applying Lemma 1 with $t_j(\theta) = \theta_j$ and $t_j = \theta_{0j}$ for $j = 1, \ldots, p$; $t_{p+1}(\theta) = \mathrm{tr}\{\Sigma^{-1}(\theta - \theta_0)(\theta - \theta_0)'\}$ and $t_{p+1} = p$; $t_{p+2}(\theta) = I(\theta \in C)$ and $t_{p+2} = 1$; and the condition $\int_C \pi_{const}(\theta)\, d\theta = 1$, we see that $\lambda_1 = \cdots = \lambda_p = 0$, $\lambda_{p+1} < 0$, $\lambda_{p+2} = 1$, and $\pi_{const}(\theta) \propto \exp\left\{\lambda_{p+1}\, \mathrm{tr}\{\Sigma^{-1}(\theta - \theta_0)(\theta - \theta_0)'\}\right\}$ on $C$. Setting $\lambda_{p+1} = -1/2$ and obtaining the normalizing constant, we obtain Equation (7). ☐

2.2. Two-Stage Maximum Entropy Prior

This subsection considers the case where the maximum entropy prior of $\theta$ has a stochastic constraint in the form of a multivariate interval, i.e., $Pr(\theta \in C) = \gamma$, where $C$ is defined by Equation (3) and $\gamma \in [\gamma_{max}, 1]$. Here, $\gamma_{max}$ is $Pr(\theta \in C)$ calculated under the maximum entropy prior distribution in Equation (6). We develop a two-stage prior of $\theta$, denoted by $\pi_{two}(\theta)$, whose formula differs according to the degree of belief, $\gamma$, regarding the constraint.
Suppose we have only partial information about the covariance matrix, $\Omega_2$, of the parameter $\theta$ in the first stage of a prior elicitation. Then, for a given mean vector $\mu_0$, we may construct the maximum entropy prior, Equation (6), so that the first-stage maximum entropy prior is $\pi_{max}(\theta \mid \mu_0)$, the density of the $N_p(\mu_0, \Omega_2)$ distribution. In addition to this information, suppose we have collected prior information about the unknown $\mu_0$, which gives a value of the mean vector $\theta_0$ and covariance matrix $\Omega_1$, as well as a stochastic (or certain) constraint, indicating $Pr(\mu_0 \in C) = 1$. Then, in the second stage of the prior elicitation, one can elicit this additional partial prior information by using the constrained maximum entropy prior in Equation (7).
Analogous to the work of [5], we can specify all of the partial information about $\theta$ by the following two stages of the maximum entropy prior hierarchy over $\theta \in \mathbb{R}^p$:
$$\pi_{max}(\theta \mid \mu_0) = \phi_p(\theta;\ \mu_0, \Omega_2), \qquad (8)$$
$$\pi_{const}(\mu_0) \propto \phi_p(\mu_0;\ \theta_0, \Omega_1)\, I(\mu_0 \in C), \qquad (9)$$
where $\phi_p(\cdot;\ c, A)$ denotes the $N_p(c, A)$ density, so that Equation (9) is a truncated normal density, i.e., the density of the $N_p(\theta_0, \Omega_1) I(\mu_0 \in C)$ variate, and $\Omega_1 + \Omega_2 = \Sigma$. Thus, the two stages of the prior hierarchy are as follows. In the first stage, given $\mu_0$, $\theta$ has a maximum entropy prior that is the $N_p(\mu_0, \Omega_2)$ distribution, as in Equation (6). In the second stage, $\mu_0$ has a distribution obtained by truncating the maximum entropy prior distribution, so as to elicit uncertainty about the prior information that $\theta \in C$. It may be sensible to assume that the value of $\theta_0$ is located in the multivariate interval $C$ or at the centroid of the interval.
Definition 1. 
The marginal prior density of $\theta$, obtained from the two stages of the maximum entropy prior hierarchy in Equations (8) and (9), is called a two-stage maximum entropy prior of $\theta$.
Since $\Omega_1 + \Omega_2 = \Sigma$, if the constraint is completely certain (i.e., $\gamma = 1$), we may set $\Omega_2 \rightarrow O$ to obtain $\pi_{const}(\theta)$ from the two stages of the maximum entropy prior, while the two-stage prior yields $\pi_{max}(\theta)$ with $\gamma = \gamma_{max}$ for the case where $\Omega_1 = O$. Thus, the hyper-parameters $\Omega_1$ and $\Omega_2$ may need to be assessed to achieve the degree of belief $\gamma$ in the stochastic constraint. When $\Omega_1 \neq O$ and $\Omega_2 \neq O$, the above hierarchy of priors yields the following marginal prior of $\theta$.
Lemma 2. 
The two stages of the prior hierarchy in Equations (8) and (9) yield the two-stage maximum entropy prior distribution of $\theta$ given by:
$$\pi_{two}(\theta) = \phi_p(\theta;\ \theta_0, \Sigma)\, \frac{\bar{\Phi}_p(C;\ \mu, Q)}{\bar{\Phi}_p(C;\ \theta_0, \Omega_1)}, \quad \theta \in \mathbb{R}^p, \qquad (10)$$
where $\phi_p(x;\ c, A)$ denotes the pdf of $X \sim N_p(c, A)$, $\bar{\Phi}_p(C;\ c, A)$ denotes the $p$-dimensional rectangle probability $P(X \in C)$ of that distribution, $\mu = \theta_0 + \Omega_1 \Sigma^{-1}(\theta - \theta_0)$, $\Sigma = \Omega_1 + \Omega_2$ and $Q = (\Omega_1^{-1} + \Omega_2^{-1})^{-1}$.
Proof. 
$$\pi_{two}(\theta) = \frac{\int_{\mu_0 \in C} \phi_p(\theta;\ \mu_0, \Omega_2)\, \phi_p(\mu_0;\ \theta_0, \Omega_1)\, d\mu_0}{Pr(\mu_0 \in C)} = \phi_p(\theta;\ \theta_0, \Sigma)\, \frac{\int_{\mu_0 \in C} \phi_p(\mu_0;\ \mu, Q)\, d\mu_0}{\bar{\Phi}_p(C;\ \theta_0, \Omega_1)},$$
because $\mu = \theta_0 + \Omega_1 \Sigma^{-1}(\theta - \theta_0) = \theta + \Omega_2 \Sigma^{-1}(\theta_0 - \theta)$ and $\theta' \Omega_2^{-1} \theta + \theta_0' \Omega_1^{-1} \theta_0 - \mu'\big(\Omega_2^{-1}\theta + \Omega_1^{-1}\theta_0\big) = (\theta - \theta_0)'\Sigma^{-1}(\theta - \theta_0)$. ☐
In fact, the density $\pi_{two}(\theta)$ belongs to the family of rectangle screened multivariate normal ($RSN$) distributions studied by [23].
Corollary 3. 
The distribution law of $\theta$ with the density in Equation (10) is:
$$\theta \overset{d}{=} [X_2 \mid X_1 \in C] \sim RSN_p(C;\ \tau, \Psi), \qquad (11)$$
a $p$-dimensional $RSN$ distribution with respective location and scale parameters $\tau$ and $\Psi$ and the rectangle screening space $C$. Here, the joint distribution of $X_1$ and $X_2$ is $N_{2p}(\tau, \Psi)$, where $\tau = (\theta_0', \theta_0')'$ and
$$\Psi = \begin{pmatrix} \Omega_1 & \Omega_1 \\ \Omega_1 & \Sigma \end{pmatrix}.$$
Proof. 
The density of $[X_2 \mid X_1 \in C]$ is:
$$\pi_{two}(x_2) = \frac{\phi_p(x_2;\ \theta_0, \Sigma) \int_C \phi_p(x_1;\ \mu_{x_1|x_2}, \Omega_{x_1|x_2})\, dx_1}{\int_C \phi_p(x_1;\ \theta_0, \Omega_1)\, dx_1} = \phi_p(x_2;\ \theta_0, \Sigma)\, \frac{\bar{\Phi}_p(C;\ \mu_{x_1|x_2}, \Omega_{x_1|x_2})}{\bar{\Phi}_p(C;\ \theta_0, \Omega_1)},$$
where $\mu_{x_1|x_2} = \theta_0 + \Omega_1 \Sigma^{-1}(x_2 - \theta_0)$ and $\Omega_{x_1|x_2} = \Omega_1 - \Omega_1 \Sigma^{-1} \Omega_1$. By use of the binomial inverse theorem (see, e.g., [24], p. 23), one can easily see that $\mu_{x_1|x_2}$ and $\Omega_{x_1|x_2}$ are respectively equivalent to $\mu$ and $Q$ in Equation (10), provided that $x_2$ is changed to $\theta$. ☐
According to [23], the stochastic representation of the $RSN$ vector $\theta \sim RSN_p(C;\ \tau, \Psi)$ is:
$$\theta \overset{d}{=} \theta_0 + Y_1^{(\alpha, \beta)} + Y_2, \qquad (12)$$
where $Y_1 \sim N_p(0, \Omega_1)$ and $Y_2 \sim N_p(0, \Omega_2)$ are independent random vectors. Here, $Y_1^{(\alpha, \beta)}$ denotes a doubly-truncated multivariate normal random vector whose distribution is defined by $Y_1^{(\alpha, \beta)} \overset{d}{=} [Y_1 \mid Y_1 \in (\alpha, \beta)]$ with $\alpha = a - \theta_0$ and $\beta = b - \theta_0$. This representation enables us to implement a one-for-one method for generating a random vector with the $RSN_p(C;\ \tau, \Psi)$ distribution. For generating the doubly-truncated multivariate normal vector $Y_1^{(\alpha, \beta)}$, the R package tmvtnorm by [25] can be used, where R is a computer language and environment for statistical computing and graphics.
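For illustration, the following R sketch implements this one-for-one method; the parameter values ($p$, $\theta_0$, $\Sigma$, $\delta$, $C$) are hypothetical choices made only for the example.

library(mvtnorm)
library(tmvtnorm)

# Sketch: draw from RSN_p(C; tau, Psi) via Equation (12):
# theta = theta0 + Y1^(alpha,beta) + Y2, with Y1 ~ N_p(0, Omega1) doubly truncated
# to (alpha, beta) and Y2 ~ N_p(0, Omega2) independent of Y1.
p      <- 3
theta0 <- rep(0, p)
Sigma  <- 0.5 * diag(p) + 0.5            # intra-class covariance: sigma^2 = 1, rho = 0.5
delta  <- 0.85
Omega1 <- delta * Sigma                  # second-stage covariance
Omega2 <- (1 - delta) * Sigma            # first-stage covariance
a <- rep(-0.3, p); b <- rep(1.7, p)      # multivariate interval C = (a, b)
alpha <- a - theta0; beta <- b - theta0

rrsn <- function(n) {
  Y1 <- rtmvnorm(n, mean = rep(0, p), sigma = Omega1,
                 lower = alpha, upper = beta)      # doubly-truncated component
  Y2 <- rmvnorm(n, mean = rep(0, p), sigma = Omega2)
  sweep(Y1 + Y2, 2, theta0, "+")                   # add back the location theta0
}
theta.draws <- rrsn(10000)                         # rows are draws of theta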

2.3. Entropy of a Maximum Entropy Prior

Suppose we have partial a priori information with which we can specify values for the covariance matrices $\Omega_1$ and $\Omega_2$, where $\Sigma = \Omega_1 + \Omega_2$.

2.3.1. Case 1: Two-Stage Maximum Entropy Prior

When the two-stage maximum entropy prior $\pi_{two}(\theta)$ is assumed for the prior distribution of $\theta$, its entropy is given by:
$$Ent(\pi_{two}(\theta)) = -\int_{\mathbb{R}^p} \pi_{two}(\theta) \log \pi_{two}(\theta)\, d\theta = \frac{p}{2}\log(2\pi) + \log \bar{\Phi}_p(C;\ \theta_0, \Omega_1) + \frac{1}{2}\mathrm{tr}\left\{\Sigma^{-1} E_{two}\big[(\theta - \theta_0)(\theta - \theta_0)'\big]\right\} + \frac{1}{2}\log|\Sigma| - E_{two}\big[\log h(\theta)\big], \qquad (13)$$
where $\Sigma = \Omega_1 + \Omega_2$, $h(\theta) = \bar{\Phi}_p\big(C;\ \theta_0 + \Omega_1\Sigma^{-1}(\theta - \theta_0),\ \Omega_1 - \Omega_1\Sigma^{-1}\Omega_1\big)$, and $E_{two}$ denotes the expectation with respect to the $RSN$ distribution with density $\pi_{two}(\theta)$. Equation (12) shows that $E[\theta] = \theta_0 + \xi$ and $Cov(\theta) = \Omega_2 + H$. Here, $\xi = (\xi_1, \ldots, \xi_p)'$ and $H = \{h_{ij}\}$, $i, j = 1, \ldots, p$, are the mean vector and covariance matrix of the doubly-truncated multivariate normal random vector $Y_1 \sim N_p(0, \Omega_1) I\big(y_1 \in (\alpha, \beta)\big)$. Readers are referred to [25] (the R package tmvtnorm) and [26] (the R package mvtnorm) for the respective calculations of doubly-truncated moments and rectangle probabilities. As seen in Equation (13), an analytic calculation of $E_{two}[\log h(\theta)]$ involves a complicated integration; instead, it may be calculated approximately by Monte Carlo integration. By Equation (12), the stochastic representation of the prior distribution $\theta \sim RSN_p(C;\ \tau, \Psi)$ with density $\pi_{two}(\theta)$ is useful for generating $\theta$'s from the prior distribution by using the R packages mvtnorm and tmvtnorm and, hence, for implementing the Monte Carlo integration.
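As a minimal sketch of this Monte Carlo approximation (reusing the hypothetical quantities and the draws theta.draws from the sketch in Section 2.2), Equation (13) might be evaluated as follows.

# Sketch: Monte Carlo evaluation of E_two[log h(theta)] and of Ent(pi_two), Equation (13).
Sinv <- solve(Sigma)
Q1   <- Omega1 - Omega1 %*% Sinv %*% Omega1        # conditional covariance inside h(theta)
log.h <- apply(theta.draws, 1, function(th) {
  mu.th <- theta0 + as.vector(Omega1 %*% Sinv %*% (th - theta0))
  log(pmvnorm(lower = a, upper = b, mean = mu.th, sigma = Q1))
})
E.log.h <- mean(log.h)                             # Monte Carlo estimate of E_two[log h(theta)]

# Remaining terms of Equation (13): mtmvnorm() returns the doubly-truncated moments
# xi (tmean) and H (tvar), so E_two[(theta - theta0)(theta - theta0)'] = Omega2 + H + xi xi'.
mom  <- mtmvnorm(mean = rep(0, p), sigma = Omega1, lower = alpha, upper = beta)
E.sq <- Omega2 + mom$tvar + mom$tmean %*% t(mom$tmean)
ent.two <- p/2 * log(2 * pi) +
  log(pmvnorm(lower = a, upper = b, mean = theta0, sigma = Omega1)) +
  0.5 * sum(diag(Sinv %*% E.sq)) + 0.5 * log(det(Sigma)) - E.log.h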

2.3.2. Case 2: Constrained Maximum Entropy Prior

When the constrained maximum entropy prior $\pi_{const}(\theta)$ in Equation (7) is assumed for the prior distribution of $\theta$, its entropy is given by:
$$Ent(\pi_{const}(\theta)) = -\int_C \pi_{const}(\theta) \log \pi_{const}(\theta)\, d\theta = \frac{p}{2}\log(2\pi) + \log \bar{\Phi}_p(C;\ \theta_0, \Sigma) + \frac{1}{2}\log|\Sigma| + \frac{1}{2}\mathrm{tr}\left\{\Sigma^{-1} E_{const}\big[(\theta - \theta_0)(\theta - \theta_0)'\big]\right\}.$$
Here, $E_{const}$ denotes the expectation with respect to the doubly-truncated multivariate normal distribution with density $\pi_{const}(\theta)$; its analytic calculation is not possible, but the R packages tmvtnorm and mvtnorm are available for calculating the respective moment and integration in the expression of $Ent(\pi_{const})$.
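A corresponding sketch for $Ent(\pi_{const}(\theta))$, under the same illustrative parameter values, uses the truncated moments returned by tmvtnorm::mtmvnorm:

# Sketch: Ent(pi_const) from the moments of N_p(theta0, Sigma) truncated to C = (a, b).
mom.c  <- mtmvnorm(mean = theta0, sigma = Sigma, lower = a, upper = b)
E.sq.c <- mom.c$tvar + (mom.c$tmean - theta0) %*% t(mom.c$tmean - theta0)
ent.const <- p/2 * log(2 * pi) +
  log(pmvnorm(lower = a, upper = b, mean = theta0, sigma = Sigma)) +
  0.5 * log(det(Sigma)) + 0.5 * sum(diag(solve(Sigma) %*% E.sq.c))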

2.3.3. Case 3: Maximum Entropy Prior

On the other hand, if the maximum entropy prior $\pi_{max}(\theta)$ is assumed for the prior distribution of the location parameter $\theta$, its entropy is given by:
$$Ent(\pi_{max}(\theta)) = -\int_{\mathbb{R}^p} \pi_{max}(\theta) \log \pi_{max}(\theta)\, d\theta = \frac{p}{2} + \frac{p}{2}\log(2\pi) + \frac{1}{2}\log|\Sigma|.$$
The following theorem asserts the relationship among the degrees of belief in the a priori uncertain constraint $\{\theta;\ \theta \in C\}$ accounted for by the three priors.
Theorem 1. 
The degrees of belief $\gamma_{max}$, $\gamma_{two}$ and $\gamma_{const}$ in the a priori constraint $\{\theta;\ \theta \in C\}$, accounted for by $\pi_{max}(\theta)$, $\pi_{two}(\theta)$ and $\pi_{const}(\theta)$, respectively, have the following relation:
$$\gamma_{max} \leq \gamma_{two} \leq \gamma_{const},$$
provided that the parameters of $\pi_{two}(\theta)$ in Equation (10) satisfy:
$$\bar{\Phi}_{2p}(C^*;\ \tau, \Psi) \geq \bar{\Phi}_p(C;\ \theta_0, \Omega_1)\, \bar{\Phi}_p(C;\ \theta_0, \Sigma),$$
where $C^* = \{x;\ x_1 \in C,\ x_2 \in C\}$ denotes the $2p$-variate interval of the random vector $X = (X_1', X_2')'$. The equality $\gamma_{max} = \gamma_{two}$ holds for $\Omega_1 = O$, $\gamma_{two} = \gamma_{const}$ holds for $\Omega_2 = O$, and $\gamma_{max} = \gamma_{two} = \gamma_{const}$ holds for $C = \mathbb{R}^p$.
Proof. 
The conditions for the equalities are straightforward from the stochastic representation in Equation (12). Under $\pi_{max}(\theta)$ in Equation (6),
$$\gamma_{max} = Pr(\theta \in C) = \bar{\Phi}_p(C;\ \theta_0, \Sigma), \qquad \gamma_{two} = \int_{\theta \in C} \pi_{two}(\theta)\, d\theta = \frac{\bar{\Phi}_{2p}(C^*;\ \tau, \Psi)}{\bar{\Phi}_p(C;\ \theta_0, \Omega_1)},$$
because $\pi_{two}(\theta)$ is the density of $\theta \sim RSN_p(C;\ \tau, \Psi)$, and $\gamma_{const} = \int_{\theta \in C} \pi_{const}(\theta)\, d\theta = 1$. Therefore, the condition $\bar{\Phi}_{2p}(C^*;\ \tau, \Psi) \geq \bar{\Phi}_p(C;\ \theta_0, \Omega_1)\, \bar{\Phi}_p(C;\ \theta_0, \Sigma)$ gives the inequality relation. ☐
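Both degrees of belief in the proof are rectangle probabilities and can therefore be evaluated directly with mvtnorm::pmvnorm; a sketch, reusing the illustrative quantities defined in the earlier sketches, is:

# Sketch: gamma_max and gamma_two as rectangle probabilities (proof of Theorem 1).
tau <- c(theta0, theta0)                        # 2p-dimensional location vector
Psi <- rbind(cbind(Omega1, Omega1),             # Psi = [Omega1 Omega1; Omega1 Sigma]
             cbind(Omega1, Sigma))
gamma.max <- pmvnorm(lower = a, upper = b, mean = theta0, sigma = Sigma)
gamma.two <- pmvnorm(lower = c(a, a), upper = c(b, b), mean = tau, sigma = Psi) /
             pmvnorm(lower = a, upper = b, mean = theta0, sigma = Omega1)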

3. Properties

3.1. Objective Measure of Uncertainty

In constructing the two stages of the prior hierarchy over $\theta \in \mathbb{R}^p$, the usual practice is to set the value of $\theta_0$ at the centroid of the uncertain constrained multivariate interval $C = (a, b)$. In this case, we have the following result.
Corollary 4. 
In the case where the value of $\theta_0$ in $\pi_{two}(\theta)$ is the centroid of the multivariate interval $C$,
$$\gamma_{max} \leq \gamma_{two} \leq \gamma_{const}. \qquad (15)$$
Proof. 
Equation (12) indicates that:
$$\gamma_{two} = Pr\big(Y_1 + Y_2 \in (\alpha, \beta) \mid Y_1 \in (\alpha, \beta)\big) \quad \text{and} \quad \gamma_{max} = Pr\big(Y_1 + Y_2 \in (\alpha, \beta)\big),$$
where $Y_1 \sim N_p(0, \Omega_1)$ and $Y_2 \sim N_p(0, \Omega_2)$ are independent random vectors, $\alpha = a - \theta_0$, and $\beta = b - \theta_0$. When $\theta_0$ is the centroid of $C$, $\alpha = -\beta$, and hence:
$$Pr\big(Y_1 + Y_2 \in (\alpha, \beta),\ Y_1 \in (\alpha, \beta)\big) \geq Pr\big(Y_1 + Y_2 \in (\alpha, \beta)\big)\, Pr\big(Y_1 \in (\alpha, \beta)\big)$$
by the theorem of [27]. This leads to the first inequality, $\gamma_{max} \leq \gamma_{two}$. Since $\gamma_{const} = 1$, we see that the second inequality in Equation (15) holds. ☐
The following are immediate from Theorem 1 and Corollary 4: (i) the two-stage maximum entropy prior achieves $\gamma_{two}$ as the degree of belief in the uncertain multivariate interval constraint $\{\theta;\ \theta \in C\}$, and its value satisfies $\gamma_{two} \in [\gamma_{max}, 1]$ if the condition in the theorem is satisfied; note that the equality $\gamma_{two} = 1$ holds for $\Omega_2 = O$; (ii) the degree of belief in the multivariate interval constraint is a function of the covariance matrices $\Omega_1$ and $\Omega_2$. Thus, if we have partial a priori information that specifies values of the covariance matrices $\Omega_1$ and $\Omega_2$, the degree of belief $\gamma_{two}$ associated with $\pi_{two}(\theta)$ can be assessed.
Figure 1 compares the degrees of belief in the uncertain multivariate interval constraint $\{\theta;\ \theta \in C\}$ accounted for by the three priors of $\theta$. The figure is obtained in terms of $\delta \in [0, 1]$ with $\Omega_1 = \delta\Sigma$ and $\Omega_2 = (1 - \delta)\Sigma$, $p = 3$, $C = (a,\ 2\cdot\mathbf{1}_p + a)$ and $\theta_0 = 0$, where $a = (-0.1 \times p)\mathbf{1}_p$, $\Sigma = \sigma^2(1 - \rho)I_p + \sigma^2\rho\mathbf{1}_p\mathbf{1}_p'$ is an intra-class covariance matrix, and $\mathbf{1}_p$ denotes the $p \times 1$ summing vector whose every element is unity. When the constraint is changed to $C = (-(2\cdot\mathbf{1}_p + a),\ -a)$ in this comparison, one can easily check that the degrees of belief do not change and give the same results seen in Figure 1. The figure depicts exactly the inequality relationship given in Theorem 1. Comparing $\gamma_{two}$ with $\gamma_{const} = 1$, we see that the degree of belief in the uncertain constraint accounted for by $\pi_{two}(\theta)$ becomes large as $\Omega_2 \rightarrow O$ (or, equivalently, $\Omega_1 \rightarrow \Sigma$); this tendency is more evident for small $\sigma^2$ and large $\rho$ values. Moreover, the difference between $\gamma_{two}$ and $\gamma_{max}$ in the right panel becomes large as $\Omega_2$ tends to $O$; for fixed values of $\delta$ and $\rho$, the difference increases as the value of $\sigma^2$ decreases, while it decreases as the value of $\rho$ increases for fixed values of $\delta$ and $\sigma^2$. Therefore, the figure confirms that the two-stage maximum entropy prior $\pi_{two}(\theta)$ accounts for the a priori uncertain constraint $\{\theta;\ \theta \in C\}$ with a degree of belief $\gamma_{two} \in [\gamma_{max}, 1]$. It also shows that the magnitude of $\gamma_{two}$ depends on both the first-stage covariance $\Omega_2$ and the second-stage covariance $\Omega_1$ in the two stages of the prior hierarchy in Equations (8) and (9). All other choices of the values of $p$, $\rho$ and $C$ satisfying the condition in Theorem 1 produced graphics similar to those depicted in Figure 1, with the exception of the magnitude of the differences among the degrees of belief.

3.2. Properties of the Entropy

The expected uncertainty in the multivariate interval constraint of the location parameter $\theta$, $\{\theta;\ \theta \in C\}$, accounted for by the two-stage prior $\pi_{two}(\theta)$, is measured by its entropy $Ent(\pi_{two}(\theta))$, and the information about the constraint is defined by $-Ent(\pi_{two}(\theta))$. Thus, as considered by [20,28], the difference between the Shannon measures of information before and after applying the uncertain constraint $\{\theta;\ \theta \in C\}$ can be explained by the following property.
Corollary 5. 
When $\theta_0$ is the centroid of the multivariate interval $C$,
$$Ent(\pi_{max}(\theta)) \geq Ent(\pi_{two}(\theta)) \geq Ent(\pi_{const}(\theta)),$$
where $Ent(\pi_{two}(\theta))$ reduces to $Ent(\pi_{max}(\theta))$ for $\Omega_1 = O$, while $Ent(\pi_{two}(\theta))$ is equal to $Ent(\pi_{const}(\theta))$ for $\Omega_2 = O$. All of the equalities hold for $C = \mathbb{R}^p$.
Proof. It is straightforward to check the equalities by using the stochastic representation in Equation (12). Since $\pi_{max}(\theta)$ is the maximum entropy prior, it is sufficient to show that $Ent(\pi_{two}(\theta)) \geq Ent(\pi_{const}(\theta))$. First, $\Sigma - \Omega_1 = \Omega_2 > 0$ implies that $\bar{\Phi}_p(C;\ \theta_0, \Omega_1) \geq \bar{\Phi}_p(C;\ \theta_0, \Sigma)$ by the lemma of [27]. Second, $\gamma_{const} = Pr\big(Y_1 + Y_2 \in (\alpha, \beta) \mid Y_1 + Y_2 \in (\alpha, \beta)\big) \geq Pr\big(Y_1 + Y_2 \in (\alpha, \beta) \mid Y_1 \in (\alpha, \beta)\big) = \gamma_{two}$ by Corollary 4. This and the lemma of [27] indicate that $Cov\big(Y_1 + Y_2 \mid Y_1 \in (\alpha, \beta)\big) - Cov\big(Y_1 + Y_2 \mid Y_1 + Y_2 \in (\alpha, \beta)\big)$ is positive semi-definite and, hence, $\mathrm{tr}\{\Sigma^{-1} E_{two}[(\theta - \theta_0)(\theta - \theta_0)']\} \geq \mathrm{tr}\{\Sigma^{-1} E_{const}[(\theta - \theta_0)(\theta - \theta_0)']\}$ by ([29], p. 54), where $E_{two}[(\theta - \theta_0)(\theta - \theta_0)'] = Cov\big(Y_1 + Y_2 \mid Y_1 \in (\alpha, \beta)\big)$ and $E_{const}[(\theta - \theta_0)(\theta - \theta_0)'] = Cov\big(Y_1 + Y_2 \mid Y_1 + Y_2 \in (\alpha, \beta)\big)$ for $\alpha = -\beta$. These two results give the inequality $Ent(\pi_{two}(\theta)) \geq Ent(\pi_{const}(\theta))$, because $E_{two}[\log h(\theta)] \leq 0$. ☐
Figure 2 depicts the differences between $Ent(\pi_{max}(\theta))$, $Ent(\pi_{two}(\theta))$ and $Ent(\pi_{const}(\theta))$ using the same parameter values as in Figure 1. Figure 2 coincides with the inequality relation given in Corollary 5 and indicates the following: (i) even though $\theta_0$ is not the centroid of the multivariate interval $C$, we see that $Ent(\pi_{max}(\theta)) > Ent(\pi_{two}(\theta)) > Ent(\pi_{const}(\theta))$ for $\delta \in (0, 1)$; (ii) the difference $Ent(\pi_{two}(\theta)) - Ent(\pi_{const}(\theta))$ is a monotone decreasing function of $\delta$, while $Ent(\pi_{max}(\theta)) - Ent(\pi_{two}(\theta))$ is a monotone increasing function; (iii) the differences get bigger as $\sigma^2$ becomes larger for $\delta \in (0, 1)$, indicating that the entropy of $\pi_{two}(\theta)$ is associated not only with the covariance of the first-stage prior $\Omega_2$, but also with that of the second-stage prior $\Omega_1$ in Equations (8) and (9), respectively; (iv) upon comparing Figure 1 and Figure 2, the entropy $Ent(\pi_{two}(\theta))$ is closely related to the degree of belief $\gamma_{two}$, such that:
$$Ent(\pi_{two}(\theta)) = c_{two}(1 - \gamma_{two}),$$
where $c_{two} > 0$ is obtained by using Equations (13) and (16), and $1 - \gamma_{two}$ denotes the degree of uncertainty in the a priori information regarding the multivariate interval constraint $\{\theta;\ \theta \in C\}$ elicited by $\pi_{two}(\theta)$. These consequences and Corollary 5 indicate that $1 - \gamma_{two}$ stands between $1 - \gamma_{const}$ and $1 - \gamma_{max}$. Thus, the two-stage prior $\pi_{two}(\theta)$ is useful for eliciting uncertain information about the multivariate interval constraint. Theorem 1 and the above statements produce an objective method for eliciting the stochastic constraint $\{\theta;\ \theta \in C\}$ via $\pi_{two}(\theta)$.
Corollary 6. 
Suppose the degree $(1 - \gamma_{two})$ of uncertainty associated with the stochastic constraint $\{\theta;\ \theta \in C\}$ is given. An objective way of eliciting the prior information by using $\pi_{two}(\theta)$ is to choose the covariance matrices $\Omega_1$ and $\Omega_2$ in $\pi_{two}(\theta)$ such that $\gamma_{two} = \bar{\Phi}_{2p}(C^*;\ \tau, \Psi)\, /\, \bar{\Phi}_p(C;\ \theta_0, \Omega_1)$, where $\Sigma = \Omega_1 + \Omega_2$ is known and $\Omega_1 = \delta\Sigma$ with $\delta \in [0, 1]$.
Since $\gamma_{const} = 1$, the degree of uncertainty $(1 - \gamma_{two})$ is equal to $\gamma_{const} - \gamma_{two}$. The left panel of Figure 1 plots a graph of $1 - \gamma_{two}$ against $\delta$. The graph indicates that a $\delta$ value for $\pi_{two}(\theta)$ can easily be determined for given $\Sigma$, and the value is in inverse proportion to the degree of uncertainty regardless of $\Sigma$.
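Since $\gamma_{two}$ is available as a ratio of rectangle probabilities, Corollary 6 can be operationalized with a one-dimensional root search over $\delta$. A sketch, assuming the quantities defined in the earlier sketches and that $\gamma_{two}$ is monotone in $\delta$ over the search interval, is:

# Sketch: choose delta so that gamma_two(delta) attains a target degree of belief.
gamma.two.of <- function(delta, target = 0) {
  O1  <- delta * Sigma                          # Omega1 = delta * Sigma
  tau <- c(theta0, theta0)
  Psi <- rbind(cbind(O1, O1), cbind(O1, Sigma))
  g <- pmvnorm(lower = c(a, a), upper = c(b, b), mean = tau, sigma = Psi) /
       pmvnorm(lower = a, upper = b, mean = theta0, sigma = O1)
  as.numeric(g) - target
}
delta.star <- uniroot(gamma.two.of, interval = c(0.01, 0.99), target = 0.5)$root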

3.3. Posterior Distribution

Suppose the distribution of the error vector in the model (1) belongs to the family of scale mixtures of normal distributions defined in Equation (2); then, the conditional distribution of the data information from $n = 1$ is $[y \mid \eta] \sim N_p(\theta, \kappa(\eta)\Lambda)$. It is well known that the priors $\pi_{max}(\theta)$ and $\pi_{const}(\theta)$ are conjugate priors for the location vector $\theta$, provided that $\eta$ and $\Lambda$ are known. That is, conditional on $\eta$, each prior satisfies the conjugate property that the prior and posterior distributions of $\theta$ belong to the same family of distributions. The following corollary shows that the conditional conjugate property also applies to $\pi_{two}(\theta)$.
Corollary 7. 
Let $y \mid \eta \sim N_p(\theta, \kappa(\eta)\Lambda)$ with known $\Lambda$. Then, the two-stage maximum entropy prior $\pi_{two}(\theta)$ in Equation (10) yields the conditional posterior distribution of $\theta$ given by:
$$\theta \mid y, \eta \sim RSN_p(C;\ \tau_\eta^*, \Psi_\eta^*), \qquad (17)$$
where $\Omega_1 = \delta\Sigma$, $\delta \in (0, 1)$, $\Sigma = \Omega_1 + \Omega_2$, $\Sigma_{1\eta}^* = \delta(1 - \delta)\Sigma + \delta^2 \Sigma_\eta^*$, $\Sigma_\eta^* = \big(\kappa(\eta)^{-1}\Lambda^{-1} + \Sigma^{-1}\big)^{-1}$,
$$\tau_\eta^* = \begin{pmatrix} \theta_{0\eta}^* \\ \theta_\eta^* \end{pmatrix}, \quad \Psi_\eta^* = \begin{pmatrix} \Sigma_{1\eta}^* & \delta\Sigma_\eta^* \\ \delta\Sigma_\eta^* & \Sigma_\eta^* \end{pmatrix}, \quad \theta_{0\eta}^* = (1 - \delta)\theta_0 + \delta\theta_\eta^*, \quad \text{and} \quad \theta_\eta^* = \Sigma_\eta^*\big(\kappa(\eta)^{-1}\Lambda^{-1} y + \Sigma^{-1}\theta_0\big).$$
Proof. 
When the two-stage prior $\pi_{two}(\theta)$ in Equation (10) is used, the conditional posterior density of $\theta$ given $\eta$ is:
$$p(\theta \mid y, \eta) \propto \phi_p(y;\ \theta, \kappa(\eta)\Lambda)\, \phi_p(\theta;\ \theta_0, \Sigma)\, \bar{\Phi}_p(C;\ \mu, Q)\, /\, \bar{\Phi}_p(C;\ \theta_0, \Omega_1) \propto \phi_p(\theta;\ \theta_\eta^*, \Sigma_\eta^*)\, \bar{\Phi}_p(C;\ \mu_\eta^*, Q_\eta^*),$$
in that $\bar{\Phi}_p(C;\ \mu, Q) = \bar{\Phi}_p(C;\ \mu_\eta^*, Q_\eta^*)$, where $\mu = \theta_0 + \Omega_1\Sigma^{-1}(\theta - \theta_0)$, $\mu_\eta^* = \theta_{0\eta}^* + \delta(\theta - \theta_\eta^*)$ and $Q_\eta^* = \Sigma_{1\eta}^* - \delta^2\Sigma_\eta^*$. The last term of the proportionality relations is a kernel of the $RSN_p(C;\ \tau_\eta^*, \Psi_\eta^*)$ density defined by Corollary 3. ☐
Corollaries 3 and 7 establish the conditional conjugate property of $\pi_{two}(\theta)$: if the location parameter $\theta$ is the normal mean vector, then the $RSN$ prior distribution, i.e., $\pi_{two}(\theta)$, yields a conditional posterior distribution that belongs to the class of $RSN$ distributions, as given in Corollary 7. In the particular case where the distribution of $\eta$ degenerates at $\kappa(\eta) = 1$, i.e., the model (1) is a normal model, the conditional conjugate property of $\pi_{two}(\theta)$ reduces to the unconditional conjugate property.
Using the relation between the distribution of Equation (11) and that of Equation (12), we can obtain the stochastic representation for the conditional posterior R S N distribution in Equation (17) as follows.
Corollary 8. 
Conditional on the mixing variable $\eta$, the stochastic representation of $\theta \mid y, \eta \sim RSN_p(C;\ \tau_\eta^*, \Psi_\eta^*)$ is:
$$[\theta \mid y, \eta] \overset{d}{=} \theta_\eta^* + \delta\Sigma_\eta^*\Sigma_{1\eta}^{*-1} W_1^{(\alpha_\eta^*, \beta_\eta^*)} + \big(\Sigma_\eta^* - \delta^2\Sigma_\eta^*\Sigma_{1\eta}^{*-1}\Sigma_\eta^*\big)^{1/2} W_2,$$
where $W_1 \sim N_p(0, \Sigma_{1\eta}^*)$ and $W_2 \sim N_p(0, I_p)$ are independent and $W_1^{(\alpha_\eta^*, \beta_\eta^*)} \overset{d}{=} [W_1 \mid W_1 \in (\alpha_\eta^*, \beta_\eta^*)]$, with $\alpha_\eta^* = a - \theta_{0\eta}^*$ and $\beta_\eta^* = b - \theta_{0\eta}^*$.
Proof. 
Suppose the distributions of $X_1$ and $X_2$ in Equation (11) are changed to $X_1 \overset{d}{=} \theta_{0\eta}^* + W_1$ and $X_2 \overset{d}{=} \theta_\eta^* + \delta\Sigma_\eta^*\Sigma_{1\eta}^{*-1} W_1 + \big(\Sigma_\eta^* - \delta^2\Sigma_\eta^*\Sigma_{1\eta}^{*-1}\Sigma_\eta^*\big)^{1/2} W_2$. Then, the stochastic representation in Equation (12) associated with the distribution $[X_2 \mid X_1 \in C]$ in Equation (11) gives the result. ☐

4. Hierarchical Constrained Scale Mixture of Normal Model

For the model (1), if we are completely sure about a multivariate interval constraint on $\theta$, a suitable restriction on the parameter space $\theta \in \mathbb{R}^p$, such as a truncated normal prior distribution, is expected for eliciting the information. However, there are cases where we have a priori information that the location parameter $\theta$ is highly likely, but not certain, to satisfy a multivariate interval constraint, so that the value of $\theta$ is located, with uncertainty, in a restricted space $\{\theta \in C\}$ with $C = (a, b)$; the constraint then becomes stochastic (or uncertain), as in our problem of interest. In this case, the uncertainty about the constraint must be taken into account in the estimation procedure of the model (1). This section considers a hierarchical Bayesian estimation of the scale mixture of normal models reflecting the uncertain prior belief about $\theta$.

4.1. The Hierarchical Model

Let us consider a hierarchical constrained scale mixture of normal model (HCSMN) that uses the hierarchy of the scale mixture of normal model (1) and includes the two stages of the prior hierarchy in the following way:
$$\begin{aligned} y_i \mid \eta_i, \Lambda &= \theta + \epsilon_i, \quad \epsilon_i \sim N_p(0, \kappa(\eta_i)\Lambda), \quad i = 1, \ldots, n, \\ \theta \mid \mu_0 &\sim N_p(\mu_0, \Omega_2), \quad \text{independent of } (\epsilon_1, \ldots, \epsilon_n), \\ \mu_0 &\sim N_p(\theta_0, \Omega_1)\, I(\mu_0 \in C), \\ \Lambda &\sim W_p^{-1}(D, d), \quad d > 2p, \\ \eta_i &\overset{iid}{\sim} g(\eta), \quad i = 1, \ldots, n, \end{aligned} \qquad (19)$$
where $W_p^{-1}(D, d)$ denotes the inverted Wishart distribution with positive definite scale matrix $D$ and $d$ degrees of freedom, whose pdf $W_p^{-1}(\Lambda;\ D, d)$ satisfies:
$$W_p^{-1}(\Lambda;\ D, d) \propto |\Lambda|^{-d/2} \exp\left\{-\frac{1}{2}\mathrm{tr}(\Lambda^{-1}D)\right\},$$
and $\Omega_1 + \Omega_2 = \Sigma$ with $\Omega_1 = \delta\Sigma$ and $\delta \in [0, 1]$.

4.2. The Gibbs Sampler

Based on the HCSMN model structure with the likelihood and the prior distributions in Equation (19), the joint posterior distribution of $\theta$, $\Lambda$ and $\eta = (\eta_1, \ldots, \eta_n)'$ given the data $\{y_1, \ldots, y_n\}$ is:
$$p(\theta, \Lambda, \eta \mid Data) \propto \prod_{i=1}^n |\kappa(\eta_i)\Lambda|^{-1/2} \exp\left\{-\frac{1}{2}\mathrm{tr}\big[\Lambda^{-1}\kappa(\eta_i)^{-1}(y_i - \theta)(y_i - \theta)'\big]\right\} \times \phi_p(\theta;\ \mu_0, \Omega_2)\, \phi_p(\mu_0;\ \theta_0, \Omega_1)\, I(\mu_0 \in C) \times |\Lambda|^{-d/2} \exp\left\{-\frac{1}{2}\mathrm{tr}(\Lambda^{-1}D)\right\} \prod_{i=1}^n g_i(\eta_i), \qquad (20)$$
where the $g_i(\eta_i)$'s denote the densities of the mixing variables $\eta_i$'s. Note that the joint posterior in Equation (20) does not simplify to an analytic form of a known density and is thus intractable for posterior inference. Instead, we use the Gibbs sampler; see [30] for a reference. To run the Gibbs sampler, we need the following full conditional posterior distributions:
(i)
The full conditional posterior densities of the $\eta_i$'s are given by:
$$p(\eta_i \mid \theta, \Lambda, y_i) \propto \kappa(\eta_i)^{-p/2} \exp\left\{-\frac{(y_i - \theta)'\Lambda^{-1}(y_i - \theta)}{2\kappa(\eta_i)}\right\} g(\eta_i), \quad i = 1, \ldots, n; \qquad (21)$$
(ii)
The full conditional distribution of $\theta$ is obtained in a way analogous to the proof of Corollary 7. It is:
$$\theta \mid \Lambda, \eta, Data \sim RSN_p(C;\ \tau_{pos}, \Psi_{pos}), \qquad (22)$$
where:
$$\tau_{pos} = \begin{pmatrix} \tau_0 \\ \tau_1 \end{pmatrix}, \quad \Psi_{pos} = \begin{pmatrix} \Omega_0^* & \delta\Omega^* \\ \delta\Omega^* & \Omega^* \end{pmatrix}, \quad \tau_0 = (1 - \delta)\theta_0 + \delta\tau_1,$$
$$\tau_1 = \Omega^*\Big(\Sigma^{-1}\theta_0 + \sum_{i=1}^n (\kappa(\eta_i)\Lambda)^{-1} y_i\Big), \quad \Omega^* = \Big(\Sigma^{-1} + \sum_{i=1}^n (\kappa(\eta_i)\Lambda)^{-1}\Big)^{-1}, \quad \text{and} \quad \Omega_0^* = \delta(1 - \delta)\Sigma + \delta^2\Omega^*;$$
(iii)
The full conditional posterior distribution of $\Lambda$ is an inverse Wishart distribution:
$$\Lambda \mid \theta, \eta, Data \sim W_p^{-1}(V, m), \quad m > 2p, \qquad (23)$$
where $V = D + \sum_{i=1}^n \kappa(\eta_i)^{-1}(y_i - \theta)(y_i - \theta)'$ and $m = n + d$.
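To make the sampler concrete, the following is a minimal sketch of one Gibbs scan for the HCN case ($\kappa(\eta_i) \equiv 1$, known hyper-parameters). It draws $\theta$ from the $RSN$ full conditional in Equation (22) through the stochastic representation of Corollary 8, and draws $\Lambda$ from the inverse Wishart in Equation (23) by inverting a Wishart draw; the degrees-of-freedom mapping in that last step is an assumption that must be matched to one's inverse Wishart convention.

# Sketch: one Gibbs scan for the HCN model; the n observations are the rows of Y.
library(tmvtnorm)   # also loads mvtnorm
gibbs.scan <- function(Y, Lambda, theta0, Sigma, delta, a, b, D, d) {
  n <- nrow(Y); p <- ncol(Y)
  # (ii) theta | Lambda, Data ~ RSN_p(C; tau_pos, Psi_pos), Equation (22),
  #      sampled via the stochastic representation of Corollary 8.
  Li     <- solve(Lambda)
  Ostar  <- solve(solve(Sigma) + n * Li)                     # Omega*
  tau1   <- as.vector(Ostar %*% (solve(Sigma) %*% theta0 + Li %*% colSums(Y)))
  tau0   <- (1 - delta) * theta0 + delta * tau1
  O0star <- delta * (1 - delta) * Sigma + delta^2 * Ostar    # Omega_0*
  W1 <- as.vector(rtmvnorm(1, mean = rep(0, p), sigma = O0star,
                           lower = a - tau0, upper = b - tau0))
  Cw <- Ostar - delta^2 * Ostar %*% solve(O0star) %*% Ostar  # residual covariance (p.d.)
  theta <- tau1 + delta * as.vector(Ostar %*% solve(O0star) %*% W1) +
           as.vector(t(chol(Cw)) %*% rnorm(p))
  # (iii) Lambda | theta, Data ~ W_p^{-1}(V, m), Equation (23): invert a Wishart draw
  #       (df = n + d - p - 1 matches the pdf kernel |Lambda|^{-m/2}; an assumption).
  V <- D + crossprod(sweep(Y, 2, theta))
  Lambda <- solve(rWishart(1, df = n + d - p - 1, Sigma = solve(V))[, , 1])
  list(theta = theta, Lambda = Lambda)
}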

4.3. Markov Chain Monte Carlo Sampling Scheme

When conducting posterior inference for the HCSMN model using the Gibbs sampling algorithm with the full conditional posterior distributions of the $\eta_i$'s, $\theta$ and $\Lambda$, the following points should be noted.
  • note 1: When the distribution of $\eta_i$ degenerates at $\kappa(\eta_i) = 1$, i.e., under the HCN (hierarchical constrained normal) model with $\epsilon_i \overset{iid}{\sim} N_p(0, \Lambda)$, $i = 1, \ldots, n$, the Gibbs sampler consists of the two conditional distributions $[\theta \mid \Lambda, Data]$ and $[\Lambda \mid \theta, Data]$. To sample from the first full conditional posterior distribution, we can utilize the stochastic representation of the $RSN$ distribution in Corollary 8. The R packages tmvtnorm and mvtnorm can be used to sample from the $RSN$ distribution in Equation (22).
  • note 2: According to the choice of the distribution of $\eta_i$ and the mixing function $\kappa(\eta_i)$, the HCSMN model may produce models other than the HCN model, such as the hierarchical constrained multivariate $t_\nu$ (HC$t_\nu$), hierarchical constrained multivariate logit, hierarchical constrained multivariate stable and hierarchical constrained multivariate exponential power models. See, e.g., [31,32], for various distributions of $\eta_i$ and corresponding functions $\kappa(\eta_i)$ that can be used to construct the HCSMN model.
  • note 3: When the hierarchical constrained multivariate $t_\nu$ (HC$t_\nu$) model is considered, the hierarchy of the model in Equation (19) consists of $\epsilon_i \overset{iid}{\sim} N_p(0, \kappa(\eta_i)\Lambda)$ with $\kappa(\eta_i) = \eta_i^{-1}$ and $\eta_i \sim Gamma(\nu/2, \nu/2)$, $i = 1, \ldots, n$. Thus, the Gibbs sampler comprises the conditional posteriors in Equations (21)–(23). Under the HC$t_\nu$ model, the distribution in Equation (21) reduces to:
$$[\eta_i \mid \theta, \Lambda, y_i] \sim Gamma(\nu^*/2,\ h/2),$$
    where $\nu^* = p + \nu$ and $h = \nu + (y_i - \theta)'\Lambda^{-1}(y_i - \theta)$ (a sketch of this draw is given after these notes). To limit model complexity, we consider only fixed $\nu$, so that we can investigate different HC$t_\nu$ models. As suggested by [32], a uniform prior on $1/\nu$ ($0 < 1/\nu < 1$) could be considered; however, this would bring an additional computational burden.
  • note 4: Except for the HCN and HC$t_\nu$ models, a Metropolis–Hastings step within the Gibbs sampler is used for estimating the HCSMN models, because the conditional posterior densities in Equation (20) do not have the explicit forms of known distributions seen in Equations (21) and (22). See, e.g., [22], for algorithms for sampling $\eta_i$ from various mixing distributions $g_i(\eta_i)$. A general procedure is as follows: given the current values $\Theta = \{\eta, \theta, \Lambda\}$, we independently generate a candidate $\eta_i^*$ from the proposal density $q(\eta_i^* \mid \eta_i) = g_i(\eta_i^*)$, as suggested by [33], and accept the candidate value with the acceptance rate:
$$\alpha(\eta_i, \eta_i^*) = \min\left\{\frac{p(\Theta \mid \eta_i^*)}{p(\Theta \mid \eta_i)},\ 1\right\}, \quad i = 1, \ldots, n,$$
    because the target density is proportional to $p(\Theta \mid \eta_i)\, g_i(\eta_i)$ and $p(\Theta \mid \eta_i) = \phi_p(y_i;\ \theta, \kappa(\eta_i)\Lambda)$ is uniformly bounded for $\eta_i > 0$.
  • note 5: As noted from Equations (8) and (9), the second- and third-stage priors of the HCSMN model in Equation (19) reduce to the two-stage prior $\pi_{two}(\theta)$, eliciting the stochastic multivariate interval constraint with degree of uncertainty $1 - \gamma_{two}$. If, instead, the maximum entropy prior $\pi_{max}(\theta)$ or the constrained maximum entropy prior $\pi_{const}(\theta)$ is used for the HCSMN, the full conditional distribution of $\theta$ in the Gibbs sampler changes from Equation (22) to:
$$\theta \mid \Lambda, \eta, Data \sim N_p(\tau_1, \Omega^*) \ \text{for } \theta \in \mathbb{R}^p \quad \text{or} \quad \theta \mid \Lambda, \eta, Data \sim N_p(\tau_1, \Omega^*)\, I(\theta \in C),$$
    respectively, where $\tau_1$ and $\Omega^*$ are the same as given in Equation (22).
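As referenced in note 3, the conjugate update of the mixing variables under the HC$t_\nu$ model is a single Gamma draw per observation; a sketch (function and object names are illustrative) is:

# Sketch: HC t_nu update [eta_i | theta, Lambda, y_i] ~ Gamma(nu*/2, h_i/2),
# with nu* = p + nu and h_i = nu + (y_i - theta)' Lambda^{-1} (y_i - theta).
draw.eta <- function(Y, theta, Lambda, nu) {
  R <- sweep(Y, 2, theta)                        # residuals y_i - theta in rows
  h <- nu + rowSums((R %*% solve(Lambda)) * R)   # one quadratic form per observation
  rgamma(nrow(Y), shape = (ncol(Y) + nu) / 2, rate = h / 2)
}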

4.4. Bayes Estimation

For a simple example, let us consider the HCN model with known $\Lambda$. When we assume a stochastic constraint $\{\theta;\ \theta \in C\}$ obtained from a priori information, we may use the two-stage maximum entropy prior $\pi_{two}(\theta)$ defined by the second and third stages of the HCSMN model (19) with $\delta \in (0, 1)$, where the value of $\delta$ is determined by using Corollary 6. This yields a Bayes estimate based on the two-stage maximum entropy prior. Corollary 8 yields:
$$\hat{\theta}_{two} = \tau_1 + \delta\Omega^*\Omega_0^{*-1} E[\theta_{tn}^*] = \tau_1 + \delta\Omega^*\Omega_0^{*-1}\zeta \qquad (24)$$
and:
$$\zeta = (\zeta_1, \ldots, \zeta_p)' \quad \text{with} \quad \zeta_i = w_{0i}\, \frac{\phi(u_i/w_{0i}) - \phi(v_i/w_{0i})}{\Phi(v_i/w_{0i}) - \Phi(u_i/w_{0i})}, \quad i = 1, \ldots, p,$$
where $\theta_{tn}^* \sim N_p(0, \Omega_0^*) I\big(\theta_{tn}^* \in (u, v)\big)$, a truncated normal distribution with $u = a - \tau_0 = (u_1, \ldots, u_p)'$ and $v = b - \tau_0 = (v_1, \ldots, v_p)'$, and $w_{0i}$ denotes the $i$-th diagonal element of $\Omega_0^*$. Here, $\tau_1$ and $\tau_0$ are the same as those in Equation (22), and $\phi(\cdot)$ and $\Phi(\cdot)$ denote the univariate standard normal density and distribution functions. See [25,34] for the first moment of the truncated multivariate normal distribution and for a numerical calculation of the posterior covariance matrix $Cov(\theta_{tn}^*)$, respectively.
On the other hand, when we have certainty about the constraint $\{\theta;\ \theta \in C\}$, we may use the HCSMN model with $\delta = 1$, which uses the constrained maximum entropy prior $\pi_{const}(\theta)$ instead of $\pi_{two}(\theta)$ in its hierarchy. This case gives the Bayes estimate:
$$\hat{\theta}_{const} = E[\theta_{tn}] = \tau_1 + \zeta^* \qquad (25)$$
and:
$$\zeta^* = (\zeta_1^*, \ldots, \zeta_p^*)' \quad \text{with} \quad \zeta_i^* = w_i\, \frac{\phi(a_i/w_i) - \phi(b_i/w_i)}{\Phi(b_i/w_i) - \Phi(a_i/w_i)}, \quad i = 1, \ldots, p,$$
where $\theta_{tn} \sim N_p(\tau_1, \Omega^*) I(\theta_{tn} \in C)$ and $w_i$ denotes the $i$-th diagonal element of $\Omega^*$.
On the contrary, when we have no a priori information at all about the constraint on the space of $\theta$, the HCSMN model with the maximum entropy prior $\pi_{max}(\theta)$ (equivalently, the HCSMN model with $\delta = 0$) may be used for the posterior inference. In this model, the Bayes estimate of the location parameter is given by:
$$\hat{\theta}_{max} = \tau_1. \qquad (26)$$
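A sketch of the three estimates in R, reusing the quantities $\tau_0$, $\tau_1$, $\Omega^*$ and $\Omega_0^*$ as computed in the Gibbs sketch of Section 4.2, follows; for illustration, the exact truncated multivariate mean from tmvtnorm::mtmvnorm is used in place of the componentwise $\zeta$ formulas, which is an assumed numerical alternative rather than the paper's closed form.

# Sketch: Bayes estimates under the three priors, Equations (24)-(26).
zeta <- mtmvnorm(mean = rep(0, length(tau0)), sigma = O0star,
                 lower = a - tau0, upper = b - tau0)$tmean   # E[theta_tn*]
theta.hat.two   <- tau1 + delta * as.vector(Ostar %*% solve(O0star) %*% zeta)  # (24)
theta.hat.const <- mtmvnorm(mean = tau1, sigma = Ostar,
                            lower = a, upper = b)$tmean      # (25), E[theta_tn]
theta.hat.max   <- tau1                                      # (26)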
Comparing Equations (24) and (25) with Equation (26), we see that Equations (24) and (25) coincide for $\delta = 1$, and the last term in Equation (24) vanishes when we assume that there is no a priori information about the stochastic constraint $\{\theta;\ \theta \in C\}$. In this sense, the last term in Equation (24) can be interpreted as a shrinkage effect of the HCSMN model with $\delta \neq 0$. This effect makes the Bayes estimator of $\theta$ shrink toward the stochastic constraint. In addition, we can calculate the difference between the estimates in Equations (24) and (25):
$$Diff = \hat{\theta}_{const} - \hat{\theta}_{two} = \zeta^* - \delta\Omega^*\Omega_0^{*-1}\zeta.$$
This difference vector is a function of the degree of belief $\gamma_{two}$ (or $\delta \in (0, 1)$), since Equation (25) is based on $\gamma_{const} = 1$ and $\delta = 1$, and $Diff = 0$ for $\delta = 1$. Thus, the difference represents a stochastic effect of the multivariate interval constraint.

5. Numerical Illustrations

This section presents an empirical analysis of the proposed approach (using the HCSMN model) to the stochastic multivariate interval constraint on the location model. We provide numerical simulation results and a real data application comparing the proposed approach to the hierarchical Bayesian approaches, which use usual priors, π m a x ( θ ) and π c o n s t ( θ ) . For numerical implementations, we develop our program written in R, which is available from the author upon request.

5.1. Simulation Study

To examine the performance of the HCSMN model for estimating the location parameter with a stochastic multivariate interval constraint, we conduct a simulation study. The study is based on 200 synthetic datasets for each of the sample sizes $n = 20$ and $n = 200$, generated from each of the distributions $N_4(\theta, \Lambda)$ and $t_4(\theta, \Lambda, \nu)$, a four-dimensional $t$ distribution with location parameter $\theta$, scale matrix $\Lambda$ and degrees of freedom $\nu = 5$. For the simulation, we used the following choice of parameter values: $\theta = (\theta, \theta, m\theta, m\theta)'$ and $\Lambda = (1 - \rho)I_4 + \rho\mathbf{1}_4\mathbf{1}_4'$, where $m = (-1)^{\theta + 1}$, $\rho = 0.5$, and $\theta = 1, 2$.
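A sketch of this synthetic data generation for one dataset of each type, using mvtnorm (object names are arbitrary), is:

# Sketch: one synthetic dataset from each sampling model of the simulation study.
library(mvtnorm)
theta.s <- 1                                        # theta = 1 or 2
m   <- (-1)^(theta.s + 1)
th  <- c(theta.s, theta.s, m * theta.s, m * theta.s)
rho <- 0.5
Lam <- (1 - rho) * diag(4) + rho                    # intra-class scale matrix
n <- 20                                             # or n = 200
Y.norm <- rmvnorm(n, mean = th, sigma = Lam)        # Dataset I: N_4(theta, Lambda)
Y.t    <- rmvt(n, sigma = Lam, df = 5, delta = th)  # Dataset II: t_4(theta, Lambda, nu = 5)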
To fit each of the 200 synthetic datasets (Dataset I) generated from the $N_4(\theta, \Lambda)$ distribution, we implemented Markov chain Monte Carlo (MCMC) posterior simulation with three different HCN models with the multivariate interval constraint $C = (a, b)$: the HCN models that use $\pi_{two}(\theta)$, $\pi_{max}(\theta)$ and $\pi_{const}(\theta)$, denoted by HCN($\pi_{two}$), HCN($\pi_{max}$) and HCN($\pi_{const}$). For each dataset, MCMC posterior sampling was based on the first 10,000 posterior samples as the burn-in, followed by a further 100,000 posterior samples with a thinning size of 10; thus, final MCMC posterior samples of size 10,000 were obtained for each of the three HCN models. Exactly the same MCMC posterior sampling scheme was applied to each of the 200 synthetic datasets (Dataset II) from the $t_4(\theta, \Lambda, \nu)$ distribution based on the three HC$t_\nu$ models, HC$t_\nu$($\pi_{two}$), HC$t_\nu$($\pi_{max}$) and HC$t_\nu$($\pi_{const}$). To reflect a subjective perspective of the hierarchical models, we set $\theta_0 = 0$, $\Sigma = \Omega_1 + \Omega_2 = 0.5\theta I_4$ and $\Omega_1 = 0.85\Sigma$ to specify our information about the parameter $\theta$, while we set $D = 10^{-2}I_4$ and $d = 2p + 5$ to elicit no information about $\Lambda$ (see, e.g., [32]). For the stochastic multivariate interval constraint, we set $a = -0.5\theta\mathbf{1}_4$ and $b = 0.5\theta\mathbf{1}_4$; this constraint gives the degrees of belief $\gamma_{max} = 0.073$ (or $0.217$) and $\gamma_{two} = 0.394$ (or $0.571$) for $\theta = 1$ (or $2$). Note that the degree of belief in the constraint accounted for by $\pi_{const}(\theta)$ is $\gamma_{const} = 1$ for all values of $\theta$.
Summary statistics of the posterior samples of the location parameters (the mean and the standard deviation of the 200 posterior means of each parameter), along with the degrees of belief in the constraint $C$ ($\gamma_{max}$, $\gamma_{two}$ and $\gamma_{const}$), are listed in Table 1. To save space, we omit the summary statistics regarding $\Lambda$ from the table. The table indicates the following: (i) the MCMC method performs well in estimating the location parameters of all of the models considered, which can be justified by the estimation results of the HCN($\pi_{max}$) and HC$t_\nu$($\pi_{max}$) models. Specifically, in the posterior estimation of $\theta$, the data information tends to dominate the prior information about $\theta$ for the large sample case (i.e., $n = 200$), while the latter tends to dominate the former for the small sample case of $n = 20$. Furthermore, the convergence of the MCMC sampling algorithm was evident, and a discussion of the convergence is given in Subsection 5.2; (ii) the estimates of $\theta$ obtained from the HCN($\pi_{two}$) and HC$t_\nu$($\pi_{two}$) models are uniformly closer to the stochastic constraint $\theta \in C$ than those from the HCN($\pi_{max}$) and HC$t_\nu$($\pi_{max}$) models. This confirms that $\pi_{two}(\theta)$ induces an obvious shrinkage effect in the Bayesian estimation of the location parameter with a stochastic multivariate interval constraint; (iii) comparing the estimates of $\theta$ obtained from the HCN($\pi_{two}$) (or HC$t_\nu$($\pi_{two}$)) model to those from the HCN($\pi_{const}$) (or HC$t_\nu$($\pi_{const}$)) model, we see that the difference between their vector values is significant. Thus, we can expect an apparent stochastic effect if we use $\pi_{two}(\theta)$ in the Bayesian estimation of the location parameter with a stochastic multivariate interval constraint.

5.2. Car Body Assembly Data Example

Johnson and Wichern consider car body assembly data (accessible through www.prenhall.com/statistics, [35]) obtained from a study of a sheet metal assembly process. A major automobile manufacturer uses sensors that record the deviation from the nominal thickness (millimeters $\times\ 10^{-1}$) at a specific location on a car, at the following levels: the deviation of the car body at the final stage of assembly ($Y_1$) and that at an early stage of assembly ($Y_2$). The data consist of 50 pairs of observations of $(Y_1, Y_2)$, with summary statistics as listed in Table 2. The tests given by ([36], p. 148), using the measures of multivariate skewness and kurtosis, accept the bivariate normality of the joint distribution of $Y = (Y_1, Y_2)'$. The respective skewness and kurtosis are $b_{1p} = 0.074$ and $b_{2p} = 7.337$, which give respective $p$-values of 0.954 (chi-square test for the skewness) and 0.721 (normal test for the kurtosis), indicating that the observation model for the dataset is:
$$y_i = \theta + \epsilon_i, \quad i = 1, \ldots, 50,$$
where $\epsilon_i \overset{iid}{\sim} N_2(0, \Lambda)$, $\Lambda = \{\lambda_{ij}\}$. The Shapiro–Wilk (S–W) test is also implemented to check the marginal normality of each $Y_i$, $i = 1, 2$; the test statistic values and corresponding $p$-values of the S–W test are listed in Table 2.
In practical situations, we may have information about the mean vector of the observation model (i.e., the mean deviation from the nominal thickness) from a past study of the sheet metal assembly process or a quality control report of the automobile manufacturer. Suppose that the information about the centroid of the mean deviation vector, $\theta = (\theta_1, \theta_2)'$, is $(-1, 4)'$ with $Cov(\theta) = \mathrm{diag}\{1, 4\}$. Furthermore, there is uncertain information that $\theta \in (a, b)$, where $a = (-1.5, 3)'$ and $b = (-0.5, 5)'$. This paper has proposed the two-stage maximum entropy prior $\pi_{two}(\theta)$ to represent all of this information, which is not possible with the other priors, such as $\pi_{max}(\theta)$ and $\pi_{const}(\theta)$.
Using the three hierarchical models (i.e., the HCN($\pi_{max}$), HCN($\pi_{two}$) and HCN($\pi_{const}$) models), we obtain 10,000 posterior samples from the MCMC sampling scheme based on each of the three models, with a thinning period of 10 after a burn-in period of 10,000 samples. In estimating the Monte Carlo (MC) error, we used the batch mean method with 50 batches; see, e.g., [37] (pp. 39–40). For a formal test of the convergence of the MCMC algorithm, we applied the Heidelberger–Welch diagnostic test of [38] to single-chain MCMC runs and calculated the $p$-values of the test. For the posterior simulation, we used the following choice of hyper-parameter values: $\theta_0 = (-1, 4)'$, $\Sigma = \Omega_1 + \Omega_2 = 10I_2$, $\Omega_1 = \delta\Sigma$, $\Omega_2 = (1 - \delta)\Sigma$, $\delta \in (0, 1)$, $D = 10^{-2}I_2$ and $d = 10^2 + 2p + 1$. The posterior estimation and the convergence test results are shown in Table 3. Note that Columns 7–9 of the table list the values obtained from implementing the MCMC sampling for the posterior estimation of HCN($\pi_{two}$).
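The two diagnostics can be reproduced with a short sketch on single-chain output; the chain object theta.chain, a vector of 10,000 posterior draws of one component, is hypothetical, and heidel.diag is from the coda package:

# Sketch: batch-mean MC error (50 batches) and Heidelberger-Welch stationarity test.
library(coda)
batch.se <- function(x, nbatch = 50) {
  bm <- colMeans(matrix(x, ncol = nbatch))   # batch means, one column per batch
  sd(bm) / sqrt(nbatch)
}
mc.error <- batch.se(theta.chain)
heidel.diag(mcmc(theta.chain))               # stationarity and halfwidth tests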
The small MC error values listed in Table 3 convince us of the convergence of the MCMC algorithm. Furthermore, the $p$-values of the Heidelberger–Welch test for the stationarity of the single MCMC runs are larger than 0.1. Thus, both diagnostic checking methods support the convergence of the proposed MCMC sampling scheme. Similar to Table 1, this table also shows that $\pi_{two}(\theta)$ induces the shrinkage and stochastic effects in the Bayesian estimation of $\theta$ with the uncertain multivariate interval constraint: (i) from the comparison of the posterior estimates obtained from HCN($\pi_{two}$) with those from HCN($\pi_{max}$), we see that the estimates of $\theta_1$ and $\theta_2$ obtained from HCN($\pi_{two}$) shrink toward the stochastic interval $C$. The magnitude of the shrinkage effect induced by using the proposed prior $\pi_{two}(\theta)$ becomes more evident as the degree of belief in the interval constraint, $\gamma_{two}$ (or $\delta$), gets larger; (ii) on the other hand, we can see the stochastic effect of the prior $\pi_{two}(\theta)$ by comparing the posterior estimate of $\theta$ obtained from HCN($\pi_{two}$) with that from HCN($\pi_{const}$). The stochastic effect can be measured by the difference between the estimates, and we see that the difference becomes smaller as $\gamma_{two}$ (or $\delta$) gets larger.

6. Conclusions

In this paper, we have proposed a two-stage maximum entropy prior $\pi_{two}(\theta)$ for the location parameter of a scale mixture of normal model. The prior is derived by using the two stages of a prior hierarchy advocated by [5] to elicit a stochastic multivariate interval constraint, $\{\theta;\ \theta \in C\}$. With regard to eliciting the stochastic constraint, the two-stage maximum entropy prior has the following properties: (i) Theorem 1 and Corollary 4 indicate that the two-stage prior is flexible enough to elicit all degrees of belief in the stochastic constraint; (ii) Corollary 5 confirms that the entropy of the two-stage prior is commensurate with the uncertainty about the constraint $\{\theta;\ \theta \in C\}$; (iii) as given in Corollary 6, the preceding two properties enable us to propose an objective way of eliciting the uncertain prior information by using $\pi_{two}(\theta)$. From the inferential viewpoint: (i) the two-stage prior for the normal mean vector has the conjugate property that the prior and posterior distributions belong to the same family of $RSN$ distributions of [23]; (ii) the conjugate property enables us to construct an analytically simple Gibbs sampler for the posterior inference of the model (1) with unknown covariance matrix $\Lambda$; (iii) this paper also provides the HCSMN model, which is flexible enough to elicit all types of stochastic constraints and scale mixtures for Bayesian inference of the model (1). Based on the HCSMN model, the full conditional posterior distributions of the unknown parameters were derived, and the calculation of posterior summaries was discussed by using the Gibbs sampler and two numerical applications.
The methodological results of the Bayesian estimation procedure proposed in this paper can be extended to other multivariate models that incorporate functional means, such as linear and nonlinear regression models. For example, the seemingly unrelated regression (SUR) model and the factor analysis model (see, e.g., [24]) can be handled in the same framework as the proposed HCSMN based on the model (1). We hope to address these issues in the near future.

Acknowledgments

The research of Hea-Jung Kim was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2015R1D1A1A01057106).

Conflicts of Interest

The author declares no conflict of interest.

References

  1. O’Hagan, A. Bayes estimation of a convex quadratic. Biometrika 1973, 60, 565–572. [Google Scholar] [CrossRef]
  2. Steiger, J. When constraints interact: A caution about reference variables, identification constraints, and scale dependencies in structural equation modeling. Psychol. Methods 2002, 7, 210–227. [Google Scholar] [CrossRef] [PubMed]
  3. Lopes, H.F.; West, M. Bayesian model assessment in factor analysis. Stat. Sin. 2004, 14, 41–67. [Google Scholar]
  4. Loken, E. Identification constraints and inference in factor models. Struct. Equ. Model. 2005, 12, 232–244. [Google Scholar] [CrossRef]
  5. O’Hagan, A.; Leonard, T. Bayes estimation subject to uncertainty about parameter constraints. Biometrika 1976, 63, 201–203. [Google Scholar] [CrossRef]
  6. Liseo, B.; Loperfido, N. A Bayesian interpretation of the multivariate skew-normal distribution. Stat. Probab. Lett. 2003, 49, 395–401. [Google Scholar] [CrossRef]
  7. Kim, H.J. On a class of multivariate normal selection priors and its applications in Bayesian inference. J. Korean Stat. Soc. 2011, 40, 63–73. [Google Scholar] [CrossRef]
  8. Kim, H.J. A measure of uncertainty regarding the interval constraint of normal mean elicited by two stages of a prior hierarchy. Sic. World J. 2014, 2014, 676545. [Google Scholar] [CrossRef] [PubMed]
  9. Kim, H.J.; Choi, T. On Bayesian estimation of regression models subject to uncertainty about functional constraints. J. Korean Stat. Soc. 2015, 43, 133–147. [Google Scholar] [CrossRef]
  10. Kim, H.J.; Choi, T.; Lee, S. A hierarchical Bayesian regression model for the uncertain functional constraint using screened scale mixture of Gaussian distributions. Statistics 2016, 50, 350–376. [Google Scholar] [CrossRef]
  11. Arellano-Valle, R.B.; Branco, M.D.; Genton, M.G. A unified view on skewed distributions arising from selection. Can. J. Stat. 2006, 34, 581–601. [Google Scholar] [CrossRef]
  12. Kim, H.J. A class of weighted multivariate normal distributions and its properties. J. Multivar. Anal. 2008, 99, 1758–1771. [Google Scholar] [CrossRef]
  13. Jaynes, E.T. Prior probabilities. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 227–241. [Google Scholar] [CrossRef]
  14. Jaynes, E.T. Papers on Probability, Statistics, and Statistical Physics; Rosenkrantz, R.D., Ed.; Reidel: Boston, MA, USA, 1983. [Google Scholar]
  15. Smith, S.R.; Grandy, W. Maximum-Entropy and Bayesian Methods in Inverse Problems; Reidel: Boston, MA, USA, 2013. [Google Scholar]
  16. Ishwar, P.; Moulin, P. On the existence and characterization of the Maxent distribution under general moment inequality constraints. IEEE Trans. Inf. Theory 2005, 51, 3322–3333. [Google Scholar] [CrossRef]
  17. Rosenkrantz, R.D. Inference, Method, and Decision: Towards a Bayesian Philosophy and Science; Reidel: Boston, MA, USA, 1977. [Google Scholar]
  18. Rosenkrantz, R.D. (Ed.) E.T. Jaynes: Papers on Probability, Statistics, and Statistical Physics; Kluwer Academic: Dordrecht, The Netherlands, 1989.
  19. Yuen, K.V. Bayesian Methods for Structural Dynamics and Civil Engineering; John Wiley & Sons: Singapore, Singapore, 2010. [Google Scholar]
  20. Wu, N. The Maximum Entropy Method; Springer: New York, NY, USA, 2012. [Google Scholar]
  21. Cercignani, C. The Boltzmann Equation and Its Applications; Springer: Berlin/Heidelberg, Germany, 1988. [Google Scholar]
  22. Leonard, T.; Hsu, J.S.J. Bayesian Methods: An Analysis for Statisticians and Interdisciplinary Researchers; Cambridge University Press: New York, NY, USA, 1999. [Google Scholar]
  23. Kim, H.J.; Kim, H.-M. A class of rectangle-screened multivariate normal distributions and its applications. Statistics 2015, 49, 878–899. [Google Scholar] [CrossRef]
  24. Press, S.J. Applied Multivariate Analysis, 2nd ed.; Dover Publications, Inc.: New York, NY, USA, 2005. [Google Scholar]
  25. Wilhelm, S.; Manjunath, B.G. tmvtnorm: Truncated Multivariate Normal Distribution and Student t Distribution. Available online: http://CRAN.R-project.org/package=tmvtnorm (accessed on 17 May 2016).
  26. Genz, A.; Bretz, F. Computation of Multivariate Normal and t Probabilities; Springer: New York, NY, USA, 2009. [Google Scholar]
  27. Gupta, S.D. A note on some inequalities for multivariate normal distribution. Bull. Calcutta Stat. Assoc. 1969, 18, 179–180. [Google Scholar]
  28. Lindley, D.V. Bayesian Statistics: A Review; SIAM: Philadelphia, PA, USA, 1970. [Google Scholar]
  29. Khuri, A.I. Advanced Calculus with Applications in Statistics; John Wiley & Sons: New York, NY, USA, 2003. [Google Scholar]
  30. Gamerman, D.; Lopes, H.F. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2nd ed.; Chapman and Hall: New York, NY, USA, 2006. [Google Scholar]
  31. Branco, M.D. A general class of multivariate skew-elliptical distributions. J. Multivar. Anal. 2001, 79, 99–113. [Google Scholar] [CrossRef]
  32. Chen, M.-H.; Dey, D.K. Bayesian modeling of correlated binary response via scale mixture of multivariate normal link functions. Sankhyā 1998, 60, 322–343. [Google Scholar]
  33. Chib, S.; Greenberg, E. Understanding the Metropolis-Hastings algorithm. Am. Stat. 1995, 49, 327–335. [Google Scholar]
  34. Johnson, N.L.; Kotz, S.; Balakrishnan, N. Distributions in Statistics: Continuous Univariate Distributions, 2nd ed.; John Wiley & Sons: New York, NY, USA, 1994; Volume 1. [Google Scholar]
  35. Johnson, R.A.; Wichern, D.W. Applied Multivariate Statistical Analysis, 6th ed.; Pearson Prentice Hall: London, UK, 2007. [Google Scholar]
  36. Mardia, K.V.; Kent, J.T.; Bibby, J.M. Multivariate Analysis; Academic Press: London, UK, 1979. [Google Scholar]
  37. Ntzoufras, I. Bayesian Modeling Using WinBUGS; John Wiley & Sons: New York, NY, USA, 2009. [Google Scholar]
  38. Heidelberger, P.; Welch, P. Simulation run length control in the presence of an initial transient. Oper. Res. 1983, 31, 1109–1144. [Google Scholar] [CrossRef]
Figure 1. Graphs of the differences among p1 = γ_max, p2 = γ_two, and p3 = γ_const: panels (a), (c), and (e) show the difference between p3 and p2; panels (b), (d), and (f) show the difference between p2 and p1.
Figure 2. Graphs of the entropy differences among E1 = Ent(π_max(θ)), E2 = Ent(π_two(θ)), and E3 = Ent(π_const(θ)) for different values of δ ∈ [0, 1]: panels (a), (c), and (e) show the difference between E2 and E3; panels (b), (d), and (f) show the difference between E1 and E2.
Table 1. Summaries of posterior samples of θ = (θ1, θ2, θ3, θ4) obtained by using three different priors: π_two(θ), π_max(θ), and π_const(θ). HCN, hierarchical constrained normal.

Dataset I        HCN(π_max)                      HCN(π_two)                      HCN(π_const)
                 θ1      θ2      θ3      θ4      θ1      θ2      θ3      θ4      θ1      θ2      θ3      θ4
n = 20
  true           1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
  mean           0.853   0.862   0.869   0.857   0.525   0.530   0.542   0.549   0.141   0.112   0.139   0.121
  s.d.           0.203   0.212   0.196   0.194   0.189   0.187   0.176   0.167   0.099   0.111   0.096   0.055
n = 200
  true           1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
  mean           0.978   0.979   0.981   0.977   0.912   0.914   0.916   0.912   0.342   0.291   0.294   0.299
  s.d.           0.073   0.072   0.074   0.068   0.060   0.067   0.069   0.064   0.037   0.058   0.077   0.064
n = 20
  true           2.000   2.000  −2.000  −2.000   2.000   2.000  −2.000  −2.000   2.000   2.000  −2.000  −2.000
  mean           1.837   1.846  −1.853  −1.842   1.306   1.314  −1.331  −1.349   0.599   0.501  −0.483  −0.493
  s.d.           0.215   0.223   0.208   0.205   0.295   0.271   0.262   0.238   0.232   0.263   0.291   0.141
n = 200
  true           2.000   2.000  −2.000  −2.000   2.000   2.000  −2.000  −2.000   2.000   2.000  −2.000  −2.000
  mean           1.977   1.979  −1.980  −1.977   1.782   1.784  −1.785  −1.782   0.884   0.856  −0.795  −0.733
  s.d.           0.074   0.072   0.074   0.068   0.068   0.065   0.068   0.062   0.029   0.051   0.068   0.062

Dataset II       HC t5(π_two)                    HC t5(π_max)                    HC t5(π_const)
                 θ1      θ2      θ3      θ4      θ1      θ2      θ3      θ4      θ1      θ2      θ3      θ4
n = 20
  true           1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
  mean           0.681   0.691   0.688   0.774   0.332   0.368   0.403   0.361   0.155   0.157   0.158   0.161
  s.d.           0.172   0.182   0.201   0.205   0.198   0.190   0.218   0.186   0.108   0.112   0.120   0.099
n = 200
  true           1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000   1.000
  mean           1.003   0.974   1.033   1.006   0.802   0.818   0.814   0.806   0.362   0.411   0.341   0.351
  s.d.           0.175   0.178   0.172   0.199   0.061   0.065   0.070   0.081   0.048   0.052   0.039   0.047
n = 20
  true           2.000   2.000  −2.000  −2.000   2.000   2.000  −2.000  −2.000   2.000   2.000  −2.000  −2.000
  mean           1.724   1.763  −1.764  −1.675   0.886   0.874  −0.857  −0.924   0.415   0.428  −0.496  −0.489
  s.d.           0.239   0.230   0.245   0.231   0.324   0.319   0.343   0.313   0.201   0.196   0.193   0.204
n = 200
  true           2.000   2.000  −2.000  −2.000   2.000   2.000  −2.000  −2.000   2.000   2.000  −2.000  −2.000
  mean           2.018   1.943  −1.991  −2.103   1.702   1.699  −1.715  −1.689   0.967   0.986  −0.965  −0.959
  s.d.           0.079   0.082   0.081   0.073   0.096   0.099   0.091   0.089   0.053   0.049   0.047   0.045
Table 2. Summary statistics for the car body assembly data. S-W, Shapiro–Wilk.

Variable   Mean     s.d.    S-W     p-Value
Y1        −1.996   2.781   0.959   0.083
Y2         7.426   5.347   0.989   0.926
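The S-W column reports the Shapiro–Wilk normality statistic for each coordinate. As a reproducibility note, such entries can be obtained in R as sketched below; the vector y1 is a placeholder, since the raw car body assembly data are not reproduced here.

```r
## Sketch of the Shapiro-Wilk check behind the S-W column of Table 2.
## y1 is a placeholder for the observed Y1 series (raw data not shown here).
set.seed(2)
y1 <- rnorm(30, mean = -2, sd = 2.8)
shapiro.test(y1)   # returns the W statistic and its p-value
```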
Table 3. The posterior estimates and the convergence test results.

δ     γ_two   Parameter   HCN(π_max)   HCN(π_const)   HCN(π_two)   s.d.    MC Error   p-Value
0.8   0.423   θ1           −1.874       −1.321         −1.665      0.581   0.005      0.483
              θ2            7.045        4.814          6.332      1.089   0.008      0.354
              λ11           7.549        7.863          7.682      1.231   0.007      0.551
              λ12          −4.632       −4.905         −4.819      1.631   0.005      0.671
              λ22          26.872       25.021         27.351      4.926   0.013      0.352
0.9   0.567   θ1           −1.874       −1.321         −1.557      0.496   0.004      0.434
              θ2            7.045        4.814          5.905      1.112   0.006      0.298
              λ11           7.526        7.781          7.959      1.317   0.008      0.635
              λ12          −4.726       −5.347         −4.989      1.546   0.006      0.712
              λ22          27.587       25.347         28.012      4.836   0.012      0.384
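The p-Value column of Table 3 reports a stationarity diagnostic of the kind proposed by [38] applied to the Gibbs output. A minimal sketch follows; the use of the coda package is our assumption for illustration, and theta2_chain is a placeholder chain rather than the paper's actual posterior sample.

```r
## Illustrative Heidelberger-Welch stationarity check in the spirit of [38].
## theta2_chain is a placeholder; in practice it would be the Gibbs output
## for a parameter such as theta_2.
library(coda)

set.seed(3)
theta2_chain <- mcmc(rnorm(5000, mean = 6.3, sd = 1.1))
heidel.diag(theta2_chain)   # stationarity-test p-value and halfwidth test
```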
