Multivariate Global-Local Priors for Small Area Estimation

Ghosh, Tamal; Ghosh, Malay; Maples, Jerry J.; Tang, Xueying

doi:10.3390/stats5030040

¹ Citibank, Tampa, FL 33610, USA

² Department of Statistics, University of Florida, Gainesville, FL 32611, USA

³ United States Bureau of the Census, Washington, DC 20233, USA

⁴ Department of Mathematics, University of Arizona, Tucson, AZ 85721, USA

* Author to whom correspondence should be addressed.

Stats 2022, 5(3), 673-688; https://doi.org/10.3390/stats5030040

Submission received: 2 March 2022 / Revised: 19 July 2022 / Accepted: 21 July 2022 / Published: 25 July 2022

(This article belongs to the Special Issue Small Area Estimation: Theories, Methods and Applications)

Abstract

It is now widely recognized that small area estimation (SAE) needs to be model-based. Global-local (GL) shrinkage priors for random effects are important in sparse situations where many area-level effects do not have a significant impact on the response beyond what is offered by covariates. We propose in this paper a hierarchical multivariate model with GL priors. We prove the propriety of the posterior density when the regression coefficient matrix has an improper uniform prior. Some concentration inequalities are derived for the tail probabilities of the shrinkage estimators. The proposed method is illustrated via both data analysis and simulations.

Keywords:

global-local prior; small area estimation; shrinkage estimators; concentration inequalities; hierarchical multivariate model; posterior density

1. Introduction

Small area estimation (SAE) has been gaining increasing popularity in recent years. Its need is felt in both the public and private sectors. By now, it is well recognized that SAE must be model-based due to the lack of sufficient samples in individual local areas. The most well-known area-level SAE model is due to Fay and Herriot [1]; it is essentially a random effect model, with each area-level effect treated as a random effect.

While the Fay–Herriot model has enjoyed wide popularity for nearly four decades, questions have been raised recently regarding the need to include random effects for all areas. Datta et al. [2] were the first to address this problem. They suggested a preliminary test-based approach where the null hypothesis was that the common random effect variance was zero. The test was based on a discrepancy statistic measuring the lack of fit of the fixed effect model. Their proposal was to use a fixed or a random effect model based on the acceptance or rejection of the null hypothesis. Further results in this direction are due to Molina et al. [3] and Morales et al. [4].

This method performs well when the number of small areas is moderately large. In practice, however, the number of small areas is often very large, for example, when one considers all counties in the United States. In such situations, even if the regression estimates can describe the small area means very well in most of the small areas, the null hypothesis of no random effects is still very likely to be rejected. This phenomenon appears because, for a few areas, the direct estimates deviate from the regression estimates significantly, even after taking into account the large sampling errors. This problem was first recognized by Datta and Mandal [5]. They proposed to model random effects through a mixture of a point mass at zero and a zero-mean normal distribution. Such priors belong to the general class of spike-and-slab priors. The point mass part is suitable for areas where fixed effect models are adequate, while the normal distribution part models random effects when this is not the case.

In contrast to the spike-and-slab priors of Datta and Mandal [5], Tang et al. [6] used global-local shrinkage priors for random effects in small area estimation, which captured wide area-level variation when the number of small areas was very large. These global-local priors employ two levels of parameters, global and local parameters, to express variances of area-specific random effects so that both small and large random effects can be captured properly. The global parameter causes shrinkage on all random effects to capture the small random effects, while the local parameters try to avoid over-shrinkage for areas that need large random effects. The degree of this neutralizing effect is closely related to the tails of the priors for the local parameters. If the prior is appropriately heavy-tailed, then both small and large random effects can be well captured. Moreover, one of the virtues of global-local priors is that they enable one to assess the individual area-level effects, rather than the blanket dichotomization of zero or non-zero effects.

The objective of this paper is to generalize the arguments of Tang et al. [6] to a multidimensional SAE model. There are many instances when one needs multivariate SAE models. A classic example is the simultaneous estimation of the median income of three-, four- and five-person families initiated in Fay [7], and followed up later by Ghosh et al. [8]. A second example is the estimation of unemployment rates, as considered in Datta et al. [9]. A third example is the adjustment of census undercounts considered in Datta et al. [10]. Unlike this paper, the above articles model the area-level effects with only one global variance matrix. In many cases, there is built-in dependence among the direct estimates for different components, which demands the multivariate model.

Returning to the present paper, a hierarchical prior is introduced in Section 2, where we have proven the propriety of the resulting posterior under some conditions. Section 3 discusses the implementation of the proposed method via Markov chain Monte Carlo, and several priors for local parameters are listed in Section 3.2. Concentration inequalities related to the tail behavior of the posterior are given in Section 4. A real data example is considered in Section 5, while Section 6 provides some simulation results. Final remarks are made in Section 7.

The global-local shrinkage priors were introduced in a series of articles by Carvalho et al. [11], Polson and Scott [12], Polson and Scott [13], Polson and Scott [14], Polson and Scott [15] and Scott [16]. They have since been extended into a richer class. Some recent additions are the three-parameter beta normal (TPBN) priors of Armagan et al. [17] and the generalized double Pareto (GDP) priors of Armagan et al. [18]. TPBN itself is a large class and includes the now famous horseshoe (HS) prior of Carvalho et al. [11], the normal-exponential-gamma (NEG) priors of Griffin and Brown [19] and the Strawderman–Berger (SB) priors (Strawderman [20], Berger [21]). These priors have been used successfully in multiple testing by Datta and Ghosh [22] and Ghosh et al. [23], and also in other contexts. GL priors can be further classified into polynomial-tailed priors and exponential-tailed priors according to the tail behavior of the priors on the local parameters.

2. The Hierarchical Model

We begin with a multivariate analog of the model proposed in Tang et al. [6].

\begin{matrix} y_{i} = θ_{i} + e_{i}, θ_{i} = B x_{i} + u_{i}, e_{i} \sim N_{k} (0, V_{i}), π (B) = 1, \\ u_{i} | λ_{i}^{2}, Σ \sim N_{k} (0, λ_{i}^{2} Σ), λ_{i}^{2} \overset{i i d}{\sim} π_{λ^{2}}, Σ \sim π_{Σ}, i = 1, 2, \dots, m, \end{matrix}

(1)

where the direct survey estimators $y_{i}$, the corresponding means $θ_{i}$, the random effects $u_{i}$ and the error terms $e_{i}$ are all k-dimensional vectors for the i-th area; $x_{i}$ is the corresponding p-dimensional vector of covariates; $B$ is a $k \times p$ matrix of regression coefficients; and m is the number of small areas. The variance matrices $V_{i}$ are assumed to be known to avoid nonidentifiability. The $λ_{i}^{2}$ are local parameters and $Σ$ is a $k \times k$ matrix serving as the global parameter. All error terms $e_{i}$ and random effects $u_{i}$ are independent random variables. It is assumed that all the $V_{i}$ are positive definite. Moreover, the smallest eigenvalues of all the $V_{i}$ are bounded below by $v_{m}$, while the corresponding largest eigenvalues are bounded above by $v_{M}$, where $0 < v_{m} \leq v_{M} < \infty$.

We introduce the notations $Y = {(y_{1}, \dots, y_{m})}^{⊤}$, $X = {(x_{1}, \dots, x_{m})}^{⊤}$, $e = {(e_{1}, \dots, e_{m})}^{⊤}$, $θ = {(θ_{1}, \dots, θ_{m})}^{⊤}$ and $u = {(u_{1}, \dots, u_{m})}^{⊤}$. In matrix notation, $Y = X B^{⊤} + u + e$.

Thus, $\mathrm{vec} (e^{⊤}) = {(e_{1}^{⊤}, \dots, e_{m}^{⊤})}^{⊤} \sim N_{m k} (0, V)$, where $V = \mathrm{diag} (V_{1}, \dots, V_{m})$ is block diagonal. We assume $\mathrm{rank} (X) = p (\leq m)$. Moreover, we will write $D_{λ} = \mathrm{diag} (λ_{1}^{2}, \dots, λ_{m}^{2})$. Hence, $\mathrm{vec} (θ^{⊤}) \sim N_{m k} (\mathrm{vec} ({(X B^{⊤})}^{⊤}), D_{λ} \otimes Σ)$. Since an improper prior density has been used for $B$, we need to find conditions under which the posterior density is proper. To this end, we first define a matrix normal distribution.

Definition 1.

A random matrix $Z$ is said to have a matrix-normal distribution if it has the density function (on the space $R^{m \times k}$):

f (Z) = \frac{exp (- \frac{1}{2} \mathrm{tr} [W^{- 1} (Z - M) R^{- 1} {(Z - M)}^{⊤}])}{{(2 π)}^{m k / 2} {| R |}^{m / 2} {| W |}^{k / 2}},

where $M \in R^{m \times k}$, and $W$ and $R$ are positive definite matrices of dimension $m \times m$ and $k \times k$, respectively. We write $Z \sim {MN}_{m \times k} (M, W, R)$. Thus, the random effect matrix $u \sim {MN}_{m \times k} (0, D_{λ}, Σ)$.
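As an illustration of Definition 1, the following is a minimal sketch of how one might draw from a matrix-normal distribution using Cholesky factors of $W$ and $R$. The function name `sample_matrix_normal` and the numerical values of the local parameters and of $Σ$ are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import numpy as np

def sample_matrix_normal(M, W, R, rng):
    """Draw Z ~ MN(M, W, R); W is the m x m row covariance, R the k x k column covariance."""
    m, k = M.shape
    L_W = np.linalg.cholesky(W)      # W = L_W L_W^T
    L_R = np.linalg.cholesky(R)      # R = L_R L_R^T
    G = rng.standard_normal((m, k))  # i.i.d. N(0, 1) entries
    return M + L_W @ G @ L_R.T

rng = np.random.default_rng(0)
lam2 = np.array([0.5, 2.0, 1.0])            # hypothetical local parameters (m = 3)
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])  # hypothetical global matrix (k = 2)

# Random effect matrix u ~ MN(0, D_lambda, Sigma): row i has covariance lam2[i] * Sigma.
u = sample_matrix_normal(np.zeros((3, 2)), np.diag(lam2), Sigma, rng)
```

With this construction, the i-th row of the draw has covariance $λ_{i}^{2} Σ$, which matches the specification $u_{i} | λ_{i}^{2}, Σ \sim N_{k} (0, λ_{i}^{2} Σ)$ in (1).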

We establish the propriety of the posterior under arbitrary proper priors for the local parameters $λ_{i}^{2}$ and the global parameter $Σ$ in the following theorem, whose proof is given in Appendix A.1. In Section 3, we will make particular choices of priors for the $λ_{i}^{2}$ and $Σ$ for actual implementation.

Theorem 1.

The posterior distribution corresponding to model (1) is proper if both $π_{λ_{i}^{2}} (λ_{i}^{2})$, $i = 1, \dots, m$, and $π_{Σ} (Σ)$ are proper.

In the next section, we will consider several choices of priors for the local parameters $λ_{i}^{2}$. Throughout, for actual implementation, we will consider the inverse Wishart prior

π_{Σ} (Σ) \propto {| Σ |}^{- (1 / 2) (ν_{0} + k + 1)} exp [- \frac{1}{2} \mathrm{tr} (Ψ Σ^{- 1})],

which we symbolically denote as $W^{- 1} (Ψ, ν_{0})$.

3. Computation and Local Prior Selection

3.1. Computation

In this section, we derive the full conditionals for Gibbs sampling. It is convenient to compute all the conditionals in terms of the $θ_{i}$ rather than the $u_{i}$. Since $θ_{i} = B x_{i} + u_{i}$ defines a one-to-one linear transformation $(θ_{i}, B) \mapsto (u_{i}, B)$ whose Jacobian is constant and depends only on $X$, we can rewrite the joint posterior density of our model in (1) as

\begin{matrix} π (B, Σ, θ, λ^{2} | Y) \propto exp [- \frac{1}{2} \sum_{i = 1}^{m} {(y_{i} - θ_{i})}^{⊤} V_{i}^{- 1} (y_{i} - θ_{i})] \frac{1}{{| Σ |}^{m / 2} {(λ_{1} \dots λ_{m})}^{k}} \\ \times exp [- \sum_{i = 1}^{m} \frac{1}{2 λ_{i}^{2}} {(θ_{i} - B x_{i})}^{⊤} Σ^{- 1} (θ_{i} - B x_{i})] π_{Σ} (Σ) π_{λ_{1}^{2}} (λ_{1}^{2}) \dots π_{λ_{m}^{2}} (λ_{m}^{2}) . \end{matrix}

(2)

To compute the conditional distribution for $B$, we need to simplify the following expression:

\begin{matrix} \sum_{i = 1}^{m} \frac{1}{λ_{i}^{2}} {(θ_{i} - B x_{i})}^{⊤} Σ^{- 1} (θ_{i} - B x_{i}) = \sum_{i = 1}^{m} \frac{1}{λ_{i}^{2}} \mathrm{tr} (Σ^{- 1} (θ_{i} - B x_{i}) {(θ_{i} - B x_{i})}^{⊤}) \\ = \mathrm{tr} [Σ^{- 1} \sum_{i = 1}^{m} (\frac{θ_{i} θ_{i}^{⊤}}{λ_{i}^{2}} + \frac{B x_{i} x_{i}^{⊤} B^{⊤}}{λ_{i}^{2}} - \frac{B x_{i} θ_{i}^{⊤}}{λ_{i}^{2}} - \frac{θ_{i} x_{i}^{⊤} B^{⊤}}{λ_{i}^{2}})] \\ = \mathrm{tr} [Σ^{- 1} (θ^{⊤} D_{λ}^{- 1} θ + B X^{⊤} D_{λ}^{- 1} X B^{⊤} - B X^{⊤} D_{λ}^{- 1} θ - θ^{⊤} D_{λ}^{- 1} X B^{⊤})] \\ = \mathrm{tr} [Σ^{- 1} ((B - M) (X^{⊤} D_{λ}^{- 1} X) {(B - M)}^{⊤} + θ^{⊤} (D_{λ}^{- 1} - P_{λ}) θ)], \end{matrix}

(3)

where $M = θ^{⊤} D_{λ}^{- 1} X {(X^{⊤} D_{λ}^{- 1} X)}^{- 1}$ and $P_{λ} = D_{λ}^{- 1} X {(X^{⊤} D_{λ}^{- 1} X)}^{- 1} X^{⊤} D_{λ}^{- 1}$.
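The completion of the square in (3) can be checked numerically. The sketch below compares the first and last expressions in (3) on randomly generated inputs; the dimensions and all numerical values are hypothetical and serve only to illustrate the identity.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, p = 6, 3, 2                       # hypothetical small dimensions
X = rng.standard_normal((m, p))
theta = rng.standard_normal((m, k))
B = rng.standard_normal((k, p))
lam2 = rng.uniform(0.5, 2.0, size=m)
A = rng.standard_normal((k, k))
Sigma = A @ A.T + k * np.eye(k)         # a positive definite global matrix
Sigma_inv = np.linalg.inv(Sigma)
D_inv = np.diag(1.0 / lam2)

# First expression in (3): sum of weighted quadratic forms.
resid = theta - X @ B.T                 # row i is theta_i - B x_i
lhs = sum(r @ Sigma_inv @ r / l for r, l in zip(resid, lam2))

# Last expression in (3): completed square in B.
XtDX = X.T @ D_inv @ X
M = theta.T @ D_inv @ X @ np.linalg.inv(XtDX)
P = D_inv @ X @ np.linalg.inv(XtDX) @ X.T @ D_inv
rhs = np.trace(Sigma_inv @ ((B - M) @ XtDX @ (B - M).T + theta.T @ (D_inv - P) @ theta))
```

The two quantities `lhs` and `rhs` agree up to floating-point error, which is what makes the matrix-normal form of the conditional distribution of $B$ visible.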

Using Equation (3), we can rewrite (2) as

\begin{matrix} π (B, Σ, θ, λ^{2} | Y) \propto exp [- \frac{1}{2} \sum_{i = 1}^{m} {(y_{i} - θ_{i})}^{⊤} V_{i}^{- 1} (y_{i} - θ_{i})] \frac{1}{{| Σ |}^{m / 2} {(λ_{1} \dots λ_{m})}^{k}} π_{λ^{2}} (λ_{1}^{2}) \dots π_{λ^{2}} (λ_{m}^{2}) \\ \times exp [- \frac{1}{2} \mathrm{tr} (Σ^{- 1} ((B - M) (X^{⊤} D_{λ}^{- 1} X) {(B - M)}^{⊤} + θ^{⊤} (D_{λ}^{- 1} - P_{λ}) θ))] π_{Σ} (Σ) . \end{matrix}

(4)

Using (2) and (4), one can obtain the Gibbs sampler designed to sample from the posterior density. Full conditional distributions are given below.

$θ_{i} | B, λ^{2}, Σ, Y \overset{i n d}{\sim} N_{k} ({(I + V_{i} Σ^{- 1} / λ_{i}^{2})}^{- 1} (y_{i} + V_{i} Σ^{- 1} B x_{i} / λ_{i}^{2}), {(I + V_{i} Σ^{- 1} / λ_{i}^{2})}^{- 1} V_{i})$
$B | θ, λ^{2}, Σ, Y \sim {MN}_{k \times p} (M, Σ, {(X^{⊤} D_{λ}^{- 1} X)}^{- 1})$
$Σ | θ, B, λ^{2}, Y \sim W^{- 1} (Ψ + {(θ - X B^{⊤})}^{⊤} D_{λ}^{- 1} (θ - X B^{⊤}), m + ν_{0})$
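The three full conditionals above can be combined into one sweep of a Gibbs sampler. The following is a minimal sketch, not the authors' implementation: the function name `gibbs_sweep`, the toy data and the starting values are hypothetical, the local parameters are held fixed, and `scipy.stats.invwishart` is assumed to use the same parameterization as $W^{- 1} (Ψ, ν_{0})$ with the exponent $- \frac{1}{2} \mathrm{tr} (Ψ Σ^{- 1})$.

```python
import numpy as np
from scipy.stats import invwishart

def gibbs_sweep(Y, X, V, B, Sigma, lam2, Psi, nu0, rng):
    """One update of theta, B and Sigma given the local parameters lam2."""
    m, k = Y.shape
    p = X.shape[1]
    Sigma_inv = np.linalg.inv(Sigma)

    # theta_i | B, lam2, Sigma, Y: independent k-variate normals.
    theta = np.empty((m, k))
    for i in range(m):
        A = np.linalg.inv(np.eye(k) + V[i] @ Sigma_inv / lam2[i])
        mean = A @ (Y[i] + V[i] @ Sigma_inv @ B @ X[i] / lam2[i])
        cov = A @ V[i]
        cov = (cov + cov.T) / 2          # symmetrize against round-off
        theta[i] = rng.multivariate_normal(mean, cov)

    # B | theta, lam2, Sigma, Y: matrix normal MN_{k x p}(M, Sigma, (X' D^{-1} X)^{-1}).
    D_inv = np.diag(1.0 / lam2)
    R = np.linalg.inv(X.T @ D_inv @ X)
    M = theta.T @ D_inv @ X @ R
    G = rng.standard_normal((k, p))
    B = M + np.linalg.cholesky(Sigma) @ G @ np.linalg.cholesky(R).T

    # Sigma | theta, B, lam2, Y: inverse Wishart with m + nu0 degrees of freedom.
    resid = theta - X @ B.T
    scale = Psi + resid.T @ D_inv @ resid
    Sigma = invwishart.rvs(df=m + nu0, scale=scale, random_state=rng)
    return theta, B, Sigma

# Toy usage with hypothetical data (m = 30 areas, k = 2 components, p = 2 covariates).
rng = np.random.default_rng(2)
m, k, p = 30, 2, 2
X = np.column_stack([np.ones(m), rng.standard_normal(m)])
V = np.stack([0.5 * np.eye(k)] * m)
Y = rng.standard_normal((m, k))
theta, B, Sigma = gibbs_sweep(Y, X, V, np.zeros((k, p)), np.eye(k),
                              np.ones(m), np.eye(k), 4, rng)
```

In a complete sampler, each sweep would also update the $λ_{i}^{2}$ from the conditional implied by the chosen local prior.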

To complete the computation, in the next section, we introduce several widely used local priors, and then discuss some specifics regarding their implementation.

3.2. Local Prior Selection

We have provided a short list of widely used priors for the local parameters in Table 1. The priors $π (u_{i} | Σ)$ are derived after marginalizing out the local parameters $λ_{i}^{2}$.

To facilitate the discussion of computation, we further group the priors in Table 1 into generalized inverse Gaussian priors and normal beta prime priors.

Generalized Inverse Gaussian (GIG): $λ_{i}^{2} \overset{ind}{\sim} GIG (δ, χ, ψ)$ if $π_{λ^{2}} (λ_{i}^{2}) \propto λ_{i}^{2 (δ - 1)} exp (- χ / (2 λ_{i}^{2}) - ψ λ_{i}^{2} / 2)$. Thus, $λ_{i}^{2} | θ, B, Σ, Y \overset{ind}{\sim} GIG (δ - k / 2, χ + {(θ_{i} - B x_{i})}^{⊤} Σ^{- 1} (θ_{i} - B x_{i}), ψ)$. Using this, we can see that if $λ_{i}^{2}$ is exponential, viz. $π_{λ^{2}} (λ_{i}^{2}) \propto exp (- λ_{i}^{2})$, then $λ_{i}^{2} | θ, B, Σ, Y \overset{ind}{\sim} GIG (1 - k / 2, {(θ_{i} - B x_{i})}^{⊤} Σ^{- 1} (θ_{i} - B x_{i}), 2)$. The Laplace prior is a special case of GIG priors with $δ = 1$, $χ = 0$ and $ψ = 2$.
Normal Beta Prime (NBP): $π_{λ^{2}} (λ_{i}^{2}) \propto λ_{i}^{2 (a - 1)} {(1 + λ_{i}^{2})}^{- a - b}, a > 0, b > 0$
One obtains the NBP prior $π_{λ^{2}} (λ_{i}^{2}) \propto λ_{i}^{2 (a - 1)} {(1 + λ_{i}^{2})}^{- a - b}$ by introducing a latent variable $ξ_{i} \sim Gamma (b, 1)$ and $π (λ_{i}^{2} | ξ_{i}) \propto {(λ_{i}^{2})}^{a - 1} ξ_{i}^{a} exp (- ξ_{i} λ_{i}^{2})$. Therefore, $λ_{i}^{2} | θ, B, Σ, ξ, Y \overset{ind}{\sim} GIG (a - k / 2, {(θ_{i} - B x_{i})}^{⊤} Σ^{- 1} (θ_{i} - B x_{i}), 2 ξ_{i})$ and $ξ_{i} | θ, B, Σ, λ^{2}, Y \overset{ind}{\sim} Gamma (a + b, λ_{i}^{2} + 1)$, where the second argument of the gamma distribution is the rate. By setting different values for a and b, we obtain Strawderman–Berger (SB), horseshoe (HS) and normal-exponential-gamma (NEG) priors as special cases. For NEG, we choose b = 0.75 in our Data Analysis section.
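The local-parameter updates above only require draws from GIG and gamma distributions. The sketch below shows one possible way to obtain them with `scipy.stats.geninvgauss`, whose two-parameter density is proportional to $x^{p - 1} exp (- b (x + 1 / x) / 2)$, so a draw is rescaled by $\sqrt{χ / ψ}$ with $b = \sqrt{χ ψ}$. The helper names `sample_gig` and `update_local_nbp` and the numerical inputs are hypothetical, and the sketch assumes $χ > 0$ and $ψ > 0$.

```python
import numpy as np
from scipy.stats import geninvgauss

def sample_gig(delta, chi, psi, rng, size=None):
    """Draw from GIG(delta, chi, psi), density prop. to x^(delta-1) exp(-(chi/x + psi*x)/2)."""
    b = np.sqrt(chi * psi)
    return np.sqrt(chi / psi) * geninvgauss.rvs(delta, b, size=size, random_state=rng)

def update_local_nbp(q, xi, a, b, k, rng):
    """NBP update, where q[i] = (theta_i - B x_i)' Sigma^{-1} (theta_i - B x_i)."""
    lam2 = np.array([sample_gig(a - k / 2, q_i, 2.0 * xi_i, rng)
                     for q_i, xi_i in zip(q, xi)])
    # Gamma(a + b, rate = lam2 + 1); NumPy parameterizes by scale = 1 / rate.
    xi_new = rng.gamma(shape=a + b, scale=1.0 / (lam2 + 1.0))
    return lam2, xi_new

rng = np.random.default_rng(3)
q = np.array([0.5, 3.0, 10.0])   # hypothetical quadratic forms for three areas
lam2, xi = update_local_nbp(q, np.ones(3), a=0.5, b=0.5, k=4, rng=rng)
```

For the Laplace prior, the same helper could be called as `sample_gig(1 - k / 2, q_i, 2.0, rng)`, following the GIG conditional stated above.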

We will study Laplace and normal beta prime priors in Section 4 to illustrate our theoretical results. For the simulation study and real data analysis, we will use Laplace (LA), Strawderman–Berger (SB), horseshoe (HS) and normal-exponential-gamma (NEG, $b = 0.75$) as local priors in our multivariate models.

4. Shrinkage Factor

Despite their distinct forms, these multivariate GL shrinkage priors possess a common feature, namely the ability to assign nontrivial probability mass both near zero and in the tail, which enables our multivariate GL model to capture both small and large random effects based on data. To see this, first note that, given the local and global parameters, the conditional posterior mean of the small area mean $θ_{i}$ shrinks the direct estimate $y_{i}$ toward the synthetic regression estimate $B x_{i}$ as

\begin{matrix} E (θ_{i} | B, λ^{2}, Σ, Y) = {(I + V_{i} Σ^{- 1} / λ_{i}^{2})}^{- 1} (y_{i} + V_{i} Σ^{- 1} B x_{i} / λ_{i}^{2}) = (I - S_{GL, i}) y_{i} + S_{GL, i} B x_{i}, \end{matrix}

(5)

where $S_{GL, i} = {(I + V_{i} Σ^{- 1} / λ_{i}^{2})}^{- 1} V_{i} Σ^{- 1} / λ_{i}^{2}$ is called a shrinkage factor. A larger (smaller) shrinkage factor causes more (less) shrinkage and produces an estimate closer to the synthetic estimate (direct estimate). Throughout this section, ${∥ \cdot ∥}_{F}$ denotes the Frobenius norm of a matrix.
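To make the role of the shrinkage factor concrete, the following sketch computes $S_{GL, i}$ and its Frobenius norm for a few values of $λ_{i}^{2}$, and forms the conditional posterior mean as $(I - S_{GL, i}) y_{i} + S_{GL, i} B x_{i}$, which is algebraically equivalent to (5) because ${(I + A)}^{- 1} = I - {(I + A)}^{- 1} A$. All numerical values and the function name `shrinkage_factor` are hypothetical.

```python
import numpy as np

def shrinkage_factor(V_i, Sigma, lam2_i):
    """S_GL,i = (I + V_i Sigma^{-1} / lam2_i)^{-1} V_i Sigma^{-1} / lam2_i."""
    k = V_i.shape[0]
    A = V_i @ np.linalg.inv(Sigma) / lam2_i
    return np.linalg.solve(np.eye(k) + A, A)

V_i = np.array([[1.0, 0.2], [0.2, 0.5]])     # hypothetical sampling covariance
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])   # hypothetical global matrix
y_i = np.array([2.0, -1.0])                  # hypothetical direct estimate
synth = np.array([0.5, 0.5])                 # hypothetical synthetic estimate B x_i

norms, post_means = {}, {}
for lam2_i in (0.01, 1.0, 100.0):
    S = shrinkage_factor(V_i, Sigma, lam2_i)
    norms[lam2_i] = np.linalg.norm(S, "fro")
    # Weighted combination of the direct and synthetic estimates.
    post_means[lam2_i] = (np.eye(2) - S) @ y_i + S @ synth
```

In this toy setting, a small local parameter pushes the posterior mean close to the synthetic estimate, while a large one leaves it close to the direct estimate.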

Theorem 2.

Suppose $π (λ_{i}^{2})$ is a proper pdf with support $(0, \infty)$. Then, for $0 < ϵ < 1$,

\begin{matrix} P (∥ S_{GL, i} ∥_{F} < ϵ) \leq \frac{{| V_{i} + g Σ |}^{1 / 2}}{{| V_{i} |}^{1 / 2}} exp [{(y_{i} - B x_{i})}^{⊤} V_{i}^{- 1} (y_{i} - B x_{i}) / 2] \frac{\int_{c / {∥ Σ ∥}_{F}}^{\infty} π (λ_{i}^{2}) d λ_{i}^{2}}{\int_{0}^{g} π (λ_{i}^{2}) d λ_{i}^{2}}, \end{matrix}

(6)

where $c = k (\frac{1}{ϵ} - 1) \frac{1}{∥ V_{i}^{- 1} ∥_{F}}$ and $g (> 0)$ is an arbitrary positive constant.

The proof of the theorem is provided in Appendix A.2. This theorem leads immediately to the result $P (∥ S_{GL, i} ∥_{F} < ϵ) \to 0$ as ${∥ Σ ∥}_{F} \to 0$. We now illustrate this result with some of the global-local priors.

Example 1.

Consider the Laplace prior $π (λ_{i}^{2}) = exp (- λ_{i}^{2})$. Then,

\frac{\int_{c / {∥ Σ ∥}_{F}}^{\infty} π (λ_{i}^{2}) d λ_{i}^{2}}{\int_{0}^{g} π (λ_{i}^{2}) d λ_{i}^{2}} = \frac{exp (- c / ∥ Σ ∥_{F})}{1 - exp (- g)},

which converges to zero at an exponential rate when ${∥ Σ ∥}_{F} \to 0$.

Example 2.

Consider the NBP prior $π (λ_{i}^{2}) \propto {(λ_{i}^{2})}^{a - 1} {(1 + λ_{i}^{2})}^{- a - b}$. Then, for $a \geq 1$ and by the variable transformation $u = \frac{λ_{i}^{2}}{1 + λ_{i}^{2}}$, we obtain

\int_{c / {∥ Σ ∥}_{F}}^{\infty} π (λ_{i}^{2}) d λ_{i}^{2} = \int_{c / (∥ Σ ∥_{F} + c)}^{1} u^{a - 1} {(1 - u)}^{b - 1} d u / B (a, b) \leq \frac{1}{b} {[\frac{{∥ Σ ∥}_{F}}{{∥ Σ ∥}_{F} + c}]}^{b} / B (a, b),

where $B (a, b)$ denotes the beta function. This bound again converges to zero as ${∥ Σ ∥}_{F} \to 0$. For $0 < a < 1$,

\begin{matrix} \int_{c / {∥ Σ ∥}_{F}}^{\infty} π (λ_{i}^{2}) d λ_{i}^{2} \leq {[\frac{c}{{∥ Σ ∥}_{F} + c}]}^{a - 1} \frac{1}{b} {[\frac{{∥ Σ ∥}_{F}}{{∥ Σ ∥}_{F} + c}]}^{b} / B (a, b) \to 0 \text{ as } {∥ Σ ∥}_{F} \to 0 . \end{matrix}
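The tail bounds of Example 2 can be compared numerically with the exact tail probability, since the normalized prior on $λ_{i}^{2}$ is a beta prime distribution with parameters a and b. The sketch below is illustrative only: the value of c, the grid of $(a, b)$ pairs and the function name `nbp_tail_bound` are hypothetical, and the bound is written with the normalizing constant $B (a, b)$ in both cases.

```python
import numpy as np
from scipy.special import beta as beta_fn
from scipy.stats import betaprime

def nbp_tail_bound(sigma_norm, c, a, b):
    """Upper bound on P(lam2 > c / ||Sigma||_F) under the normalized NBP prior."""
    base = (sigma_norm / (sigma_norm + c)) ** b / (b * beta_fn(a, b))
    if a >= 1:
        return base
    # For 0 < a < 1, u^(a-1) is bounded by its value at the lower limit.
    return (c / (sigma_norm + c)) ** (a - 1) * base

c = 1.5                                   # hypothetical value of the constant c
results = []
for a, b in [(0.5, 0.5), (1.0, 0.5), (2.0, 0.75)]:
    for sigma_norm in (1.0, 0.1, 0.01):
        exact = betaprime.sf(c / sigma_norm, a, b)   # exact NBP tail probability
        bound = nbp_tail_bound(sigma_norm, c, a, b)
        results.append((a, b, sigma_norm, exact, bound))
```

Each row of `results` pairs the exact tail probability with its bound; both shrink as the norm of $Σ$ decreases, at the polynomial rate suggested by the exponent b.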

Theorem 3.

Suppose that the prior $π (λ_{i}^{2})$ has support $(0, \infty)$. Then, for $ϵ > 0$,

P (∥ S_{GL, i} ∥_{F} > ϵ | B, Σ, y_{i}) \leq K_{i} \frac{\int_{0}^{c_{1} / s_{m}} π (λ_{i}^{2}) d λ_{i}^{2}}{\int_{0}^{c_{1} d / s_{M}} π (λ_{i}^{2}) d λ_{i}^{2}},

where $c_{1} = \sqrt{k} {∥ V_{i} ∥}_{F} / ϵ$, $d (> 0)$ is a positive constant, $K_{i} = {| V_{i} + c_{1} d I_{k} |}^{1 / 2} {| V_{i} |}^{- 1 / 2} exp [(1 / 2) {(y_{i} - B x_{i})}^{⊤} V_{i}^{- 1} (y_{i} - B x_{i})]$, and $s_{m}$ and $s_{M}$ are the smallest and largest eigenvalues of $Σ$, respectively.

The proof of the theorem is provided in Appendix A.3.

Remark 1.

It is only through a proper choice of $d (> 0)$ that we obtain $P (∥ S_{GL, i} ∥_{F} > ϵ) \to 0$ as $∥ Σ^{- 1} ∥_{F} \to 0$ in the examples considered below. Moreover, we need to assume that $\underset{s_{m} \to \infty}{lim sup} s_{M} / s_{m} < \infty$ or, alternatively, we can rewrite this assumption as $\underset{∥ Σ^{- 1} ∥_{F} \to 0}{lim sup} \frac{\text{largest eigenvalue of } Σ}{\text{smallest eigenvalue of } Σ} < \infty$.

We now apply Theorem 3 for Examples 1 and 2.

Example 3.

This example is the continuation of Example 1. The ratio $\frac{\int_{0}^{c_{1} / s_{m}} π (λ_{i}^{2}) d λ_{i}^{2}}{\int_{0}^{c_{1} d / s_{M}} π (λ_{i}^{2}) d λ_{i}^{2}}$ simplifies to $\frac{1 - exp (- c_{1} M / s_{m})}{1 - exp (- c_{1} d M / s_{M})} < exp [c_{1} M (d / s_{M} - 1 / s_{m})]$, which converges to zero for $d > w$ as $∥ Σ^{- 1} ∥_{F} \to 0$, where $w = \underset{∥ Σ^{- 1} ∥_{F} \to 0}{lim sup} s_{M} / s_{m}$.

Example 4.

This example is the continuation of Example 2. Write $\frac{\int_{0}^{c_{1} / s_{m}} π (λ_{i}^{2}) d λ_{i}^{2}}{\int_{0}^{c_{1} d / s_{M}} π (λ_{i}^{2}) d λ_{i}^{2}} = R$, say, where $R = \frac{\int_{0}^{c_{1} / (s_{m} + c_{1})} u^{a - 1} {(1 - u)}^{b - 1} d u}{\int_{0}^{c_{1} d / (s_{M} + c_{1} d)} u^{a - 1} {(1 - u)}^{b - 1} d u}$. For $b \geq 1$,

\begin{matrix} R \leq \frac{{(\frac{c_{1}}{s_{m} + c_{1}})}^{a}}{{(\frac{c_{1} d}{s_{M} + c_{1} d})}^{a} {(\frac{s_{M}}{s_{M} + c_{1} d})}^{b - 1}} = d^{- a} \frac{{(\frac{s_{M} / s_{m} + c_{1} d / s_{m}}{1 + c_{1} / s_{m}})}^{a}}{{(\frac{s_{M}}{s_{M} + c_{1} d})}^{b - 1}} . \end{matrix}

Now, for $b < 1$,

\begin{matrix} R \leq \frac{{(\frac{c_{1}}{s_{m} + c_{1}})}^{a} {(\frac{s_{m}}{s_{m} + c_{1}})}^{b - 1}}{{(\frac{c_{1} d}{s_{M} + c_{1} d})}^{a}} = d^{- a} {(\frac{s_{M} / s_{m} + c_{1} d / s_{m}}{1 + c_{1} / s_{m}})}^{a} {(\frac{s_{m}}{s_{m} + c_{1}})}^{b - 1} . \end{matrix}

Hence, $\underset{∥ Σ^{- 1} ∥_{F} \to 0}{lim sup} R < d^{- a} w^{a}$, where $w = \underset{∥ Σ^{- 1} ∥_{F} \to 0}{lim sup} s_{M} / s_{m}$. For any arbitrarily small $δ > 0$, choose $d (> 0)$ such that $d^{- a} w^{a} < δ$. Therefore, $P (∥ S_{GL, i} ∥_{F} > ϵ | B, Σ, y_{i}) \to 0$ as $∥ Σ^{- 1} ∥_{F} \to 0$.

Theorem 4.

Suppose $π_{λ_{i}^{2}} (λ_{i}^{2})$ is a proper prior with support on $(0, \infty)$; then, $P (∥ S_{GL, i} ∥_{F} > ϵ | B, Σ, y_{i}) \leq c_{3} exp (- d ∥ y_{i} - B x_{i} ∥_{2}^{2} / 2)$, where $c_{3}$ and d are some positive finite constants.

The proof of Theorem 4 is given in Appendix A.4, and this theorem immediately gives us $P (∥ S_{GL, i} ∥_{F} > ϵ | B, Σ, y_{i}) \to 0$ as $∥ y_{i} - B x_{i} ∥_{2} \to \infty$.

5. Data Analysis

We consider the problem of estimating median income. The Census Bureau releases estimates of median household income for many different demographic and geographic subgroups. A model-based approach is needed to obtain finer breakdowns (by demographics and/or geography) while still maintaining adequate precision. We applied our method to estimate a four-dimensional response, which is the median income of the homeowner, renter, married and unmarried populations. These groups are determined by the characteristics of the head of the household, and some of them overlap; for example, a householder could be both married and a homeowner. The dataset was compiled using the one-year public-use microsample from the 2015 American Community Survey (data available at https://data.census.gov, accessed on 1 May 2019). The direct estimates are obtained at state level, including the District of Columbia. Thus, the number of small areas m is 51. The descriptive statistics are presented in Table 2. Besides the intercept, per-capita income is included as the covariate (data available at https://bea.gov, accessed on 1 May 2019). Moreover, we have scaled down the values of the data by dividing both the direct estimates and the covariate by 1000. The error variance–covariance matrices, $V_{i}, i = 1, \dots, m$, are also rescaled accordingly.

In our data analysis, we consider model (1) with five different choices of the priors on the variance–covariance matrix of $u_{i}$. In the first model, we assume $λ_{i}^{2} = 1$ for $i = 1, \dots, m$. Thus, the random effects $u_{i}$ share a common variance–covariance matrix $Σ$ across different small areas. It is essentially a multivariate Fay–Herriot (FH) model. The remaining four models vary in the prior of the local parameters $λ_{i}^{2}$. We consider the LA, HS, SB, and NEG priors, which are stated in Table 1. Among the four choices, HS, SB, and NEG are special cases of the normal beta prime family and have polynomial tails. In all five models, we used an inverse-Wishart prior for $Σ$ with degrees of freedom $ν_{0} = 4$ and $Ψ = I$. For each of these models, we run the Gibbs sampler described in Section 3 for 20,000 iterations. The first half is discarded as the burn-in period. The samples from the remaining 10,000 iterations are used for inference. The small area means $θ_{i}$ and the random effects $u_{i}$ are estimated by the corresponding posterior sample means.

We first examine the estimated random effects from different models. Figure 1 presents the posterior means of random effects $u_{i}$. Since the results from the three models with polynomial-tailed local priors are similar, we only present the results for HS as a representative. For most of the states, the estimated random effects from different models do not differ much. However, significant differences are seen in the District of Columbia, especially for the married and the renter groups. There is also a visible difference in Connecticut for the renter and the unmarried groups. When there is a difference, the estimated random effects from the LA model are usually smaller than those from the HS model and larger than those from the FH model. This observation demonstrates that polynomial-tailed priors lead to less shrinkage towards zero when compared with exponential-tailed priors.

Figure 2 presents the estimated $θ_{i}$ from two GL models (LA and HS) against those from the FH model. For most of the states, the estimates agree for all four subgroups. For the District of Columbia, the estimates for the married group from the GL models, especially the HS model, are higher than those from the FH model. It is the only state or district where the estimated $θ_{i}$ from the HS model differs from that from the FH model by more than 5%. It is not surprising that DC differs from the rest of the states since this area is more similar to large urban counties than other states, but it is typically included in state-level small area models for completeness of the entire U.S. population. The flexibility in our GL model is able to account for the difference through a larger random effect.

To select an appropriate model for the dataset, we use the deviance information criterion (DIC) proposed by Spiegelhalter et al. [24]. The simulation study in Section 6 shows that the model selected by DIC and the model producing the lowest deviation measures often give comparable estimates of small area means. For the real median income dataset, the DICs for the FH, LA, HS, SB, and NEG models are 1018.30, 967.79, 957.10, 957.46, and 959.48, respectively. The HS model has the smallest DIC and is thus selected. Table 3 gives the estimated median income from the HS model for the four subgroups in each state.
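For readers who wish to reproduce this kind of comparison, the sketch below computes DIC from posterior draws of the small area means. The text does not spell out which deviance is used, so this is an assumption: the sketch takes the deviance to be $- 2 \log p (Y | θ)$ under the sampling model $y_{i} \sim N_{k} (θ_{i}, V_{i})$ and uses the standard form of the criterion, mean deviance plus effective number of parameters. The function names and the toy inputs are hypothetical.

```python
import numpy as np

def gaussian_deviance(Y, theta, V_inv, logdet_V):
    """-2 log p(Y | theta) for y_i ~ N_k(theta_i, V_i), i = 1, ..., m."""
    m, k = Y.shape
    r = Y - theta
    quad = np.einsum("ij,ijl,il->", r, V_inv, r)   # sum_i r_i' V_i^{-1} r_i
    return quad + logdet_V.sum() + m * k * np.log(2 * np.pi)

def dic(Y, theta_draws, V):
    """DIC = mean deviance + p_D, with p_D = mean deviance - deviance at the posterior mean."""
    V_inv = np.linalg.inv(V)                       # batched inverse, shape (m, k, k)
    logdet_V = np.linalg.slogdet(V)[1]
    dev = np.array([gaussian_deviance(Y, th, V_inv, logdet_V) for th in theta_draws])
    dev_at_mean = gaussian_deviance(Y, theta_draws.mean(axis=0), V_inv, logdet_V)
    p_d = dev.mean() - dev_at_mean
    return dev.mean() + p_d, p_d

# Toy usage: two hypothetical "posterior draws" placed symmetrically around Y.
Y = np.zeros((3, 2))
V = np.stack([np.eye(2)] * 3)
theta_draws = np.stack([Y + 0.5, Y - 0.5])
dic_value, p_d = dic(Y, theta_draws, V)
```

In practice, `theta_draws` would hold the retained Gibbs samples of $θ$ with shape (number of draws, m, k), and the model with the smallest DIC would be preferred.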

6. Simulation Study

In this section, we investigate the performance of different priors for estimating the small area means on simulated datasets. In the simulation study, data are generated according to $y_{i} = θ_{i} + e_{i}$, where $e_{i} \sim N_{k} (0, V_{i})$ and $θ_{i} = B x_{i} + u_{i}$ for $i = 1, \dots, m$. We first consider a data generation setting similar to the real dataset in Section 5. We set $m = 51$ and $k = 4$. The per-capita income (pci) from the real dataset is used as the covariate. The $X = {(x_{1}, \dots, x_{m})}^{⊤}$ matrix consists of centered pci and a column of ones, so $p = 2$. The coefficient matrix $B$ is set at the least squares estimate computed from the real data. The error variance matrices $V_{i}, i = 1, \dots, m$, are also borrowed from the real dataset. Three models are considered for generating $u_{i}$:

common variance: $u_{i} \sim N_{k} (0, Σ)$ with the element in the j-th row and l-th column of $Σ$ being ${0.3}^{| j - l |}$ ;
two-component: $u_{i} \sim N_{k} (0, λ_{i}^{2} Σ)$ , where $λ_{i}^{2}$ is 0.01 or 10 with equal probability and $Σ$ is the same as in the common variance model;
multi-variance: $u_{i} \sim N_{k} (0, λ_{i}^{2} Σ)$ , where $λ_{i}^{2}, i = 1, \dots, m$ are equally spaced values from 0.01 to 10 and $Σ$ is the same as in the common variance model.
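The three generating models for the random effects can be sketched as follows. This is an illustrative reconstruction from the description above, not the authors' simulation code; the function name `simulate_random_effects` and the setting labels are hypothetical.

```python
import numpy as np

def simulate_random_effects(m, k, setting, rng):
    """Generate u_i ~ N_k(0, lam2_i * Sigma) under one of the three settings."""
    idx = np.arange(k)
    Sigma = 0.3 ** np.abs(idx[:, None] - idx[None, :])   # (j, l) entry is 0.3^|j - l|
    if setting == "common":
        lam2 = np.ones(m)
    elif setting == "two-component":
        lam2 = rng.choice([0.01, 10.0], size=m)          # equal probability
    elif setting == "multi-variance":
        lam2 = np.linspace(0.01, 10.0, m)                # equally spaced values
    else:
        raise ValueError(f"unknown setting: {setting}")
    L = np.linalg.cholesky(Sigma)
    z = rng.standard_normal((m, k))
    u = np.sqrt(lam2)[:, None] * (z @ L.T)               # row i ~ N_k(0, lam2_i * Sigma)
    return u, lam2, Sigma

rng = np.random.default_rng(4)
u, lam2, Sigma = simulate_random_effects(m=51, k=4, setting="two-component", rng=rng)
```

Given $B$, $X$ and the $V_{i}$, the simulated responses would then be obtained as $y_{i} = B x_{i} + u_{i} + e_{i}$ with $e_{i} \sim N_{k} (0, V_{i})$.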

To investigate the performance when the number of small areas is large, we consider two additional choices of m, 500 and 1000. For both choices, k, p, and $B$ are the same as in the previous setting. The $X$ matrix contains a column of ones and a column of values generated from $N (0, σ^{2})$, where $σ^{2} = 68.65$ is the sample variance of pci in the real dataset. For each small area, $V_{i}$ is randomly sampled from the error variance matrices in the real dataset. The three models for generating $u_{i}$ in the setting $m = 51$ are also used in the settings of $m = 500$ and 1000.

We generate 100 datasets for each combination of m and the model for $u_{i}$. For each dataset, the five models considered in Section 5 are fitted. For each model, the Gibbs sampler is run for 20,000 iterations. The samples from the first 10,000 iterations are discarded. Then, $θ_{i}$ is estimated by the posterior sample mean computed from the remaining 10,000 samples. The estimation accuracy is evaluated using four deviation measures, average absolute deviation (AAD), average squared deviation (ASD), average absolute relative deviation (ARB), and average squared relative deviation (ASRB), which are defined as follows:

\begin{matrix} AAD = \frac{1}{m k} \sum_{i = 1}^{m} \sum_{j = 1}^{k} | {\hat{θ}}_{i j} - θ_{i j} |, \quad ASD = \frac{1}{m k} \sum_{i = 1}^{m} \sum_{j = 1}^{k} {({\hat{θ}}_{i j} - θ_{i j})}^{2}, \\ ARB = \frac{1}{m k} \sum_{i = 1}^{m} \sum_{j = 1}^{k} | {\hat{θ}}_{i j} - θ_{i j} | / θ_{i j}, \quad ASRB = \frac{1}{m k} \sum_{i = 1}^{m} \sum_{j = 1}^{k} {({\hat{θ}}_{i j} - θ_{i j})}^{2} / θ_{i j}^{2} . \end{matrix}

(7)
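The four measures in (7) translate directly into code. The sketch below assumes, as in the median income application, that all true values $θ_{i j}$ are positive so that the relative measures are well defined; the function name `deviation_measures` and the toy numbers are hypothetical.

```python
import numpy as np

def deviation_measures(theta_hat, theta):
    """AAD, ASD, ARB and ASRB of (7) for m x k arrays of estimates and true values."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    theta = np.asarray(theta, dtype=float)
    err = theta_hat - theta
    return {
        "AAD": np.mean(np.abs(err)),
        "ASD": np.mean(err ** 2),
        "ARB": np.mean(np.abs(err) / theta),
        "ASRB": np.mean(err ** 2 / theta ** 2),
    }

theta = np.array([[50.0, 40.0], [80.0, 20.0]])       # hypothetical true means
theta_hat = np.array([[55.0, 38.0], [76.0, 21.0]])   # hypothetical estimates
measures = deviation_measures(theta_hat, theta)
```

Averaging each entry of `measures` over the simulated datasets gives the quantities summarized per model and setting.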

The averages of each deviation measure across 100 datasets are presented in Figure 3. Under the common variance setting, the FH model produces the smallest estimation errors in terms of all four measures. This is expected because the fitted model is the same as the data generation model. Although the GL models do not match the data generation model, their performance is similar to the FH model.

Under the other two settings for $u_{i}$, the GL models often have the best performance. Especially under the two-component setting, the FH model produces much higher deviation measures compared to GL models. Although the results from different GL models do not vary much under the two settings, the HS, SB, and NEG models have better performance than the LA model under the two-component setting, while the reverse is true under the multi-variance setting. The difference in the performance is caused by the tail of the prior on the local parameters. Polynomial-tailed priors often shrink the signals less than the exponential-tailed priors, so the models with polynomial-tailed priors (HS, SB, NEG) tend to perform better in the two-component case, where the signals (large random effects) and the noises (small random effects) are well separated.

We also investigate the performance of the model selected according to DIC. The results are also plotted in Figure 3. As the figure shows, the deviation measurements produced by the model selected by DIC are often close to the lowest among all models in each setting. This observation indicates that DIC usually selects a reasonable model for a given dataset.

7. Final Remarks

In this paper, we have addressed the situation of multivariate sparsity in random effects. We are able to estimate the random effects in income for different correlated groups, and the correlation structure is used in the estimation process. We have derived concentration inequalities for the tail behavior of the shrinkage factors. Our simulation study shows that the proposed multivariate GL estimators are very close to the truth. The data analysis shows that our multivariate GL model can identify the states/districts that have significantly large random effects.

Our simulation study shows that the prior that works best varies across data generation settings. Overall, priors with polynomial tails perform better when the small and large random effects are well separated, while the priors with exponential tails perform better when there are many intermediate random effects. We demonstrate that DIC can be used to select a suitable prior for a dataset.

There is potential for generalizing these results. In particular, in the proposed model, the same local parameter is assumed for different components of a small area. To account for heterogeneous dependence among components across areas, one can assume the area-level variance matrix for area i to be $σ^{2} Γ_{i}$ instead of $λ_{i}^{2} Σ$. Moreover, one can address the same problem for unit-level models with global-local priors involving two variance components for area-level effects.

Moreover, independent random effects are considered in the proposed model. Recently, random effects with spatial or spatio-temporal dependence have been considered in small area estimation when suitable covariates for accounting for dependence in the geographic or time domain are not available (cf. Chung and Datta [25], Bradley et al. [26]). Combining the global-local shrinkage prior with spatial or spatio-temporal priors is also a potential future direction.

Author Contributions

Conceptualization, T.G. and M.G.; methodology, T.G. and M.G.; writing – original draft preparation, T.G. and M.G.; writing – review and editing, X.T. and M.G.; data curation, J.J.M.; supervision, M.G.; software, T.G. and X.T.; validation, J.J.M. and M.G.; formal analysis, T.G., M.G. and X.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets are based on a one-year public-use microsample from the 2015 American Community Survey available at https://data.census.gov, accessed on 1 May 2019. The per-capita income dataset is available at https://bea.gov, accessed on 1 May 2019.

Acknowledgments

The authors are grateful to the editor and anonymous reviewer(s) for their constructive comments and suggestions, which greatly improved an earlier version of this article. Author Tamal Ghosh wants to dedicate his part of the work to his late father Chittaranjan Ghosh.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Proof of Theorem 1

Proof.

The joint posterior pdf is given by

\begin{matrix} π (B, Σ, u, λ^{2} | Y) \propto & \exp [- \frac{1}{2} \sum_{i = 1}^{m} {(y_{i} - B x_{i} - u_{i})}^{⊤} V_{i}^{- 1} (y_{i} - B x_{i} - u_{i})] \frac{1}{{| Σ |}^{m / 2} {(λ_{1} \dots λ_{m})}^{k}} \\ & \times \exp [- \sum_{i = 1}^{m} \frac{1}{2 λ_{i}^{2}} u_{i}^{⊤} Σ^{- 1} u_{i}] π_{Σ} (Σ) π_{λ_{1}^{2}} (λ_{1}^{2}) \dots π_{λ_{m}^{2}} (λ_{m}^{2}) . \end{matrix}

(A1)

Writing $z_{i} = y_{i} - u_{i}$, $i = 1, \dots, m$, we have

\begin{matrix} \sum_{i = 1}^{m} {(y_{i} - u_{i} - B x_{i})}^{⊤} V_{i}^{- 1} (y_{i} - u_{i} - B x_{i}) = & \sum_{i = 1}^{m} \mathrm{tr} (V_{i}^{- 1} (z_{i} - B x_{i}) {(z_{i} - B x_{i})}^{⊤}) \\ \geq & \sum_{i = 1}^{m} \mathrm{tr} (v_{M}^{- 1} (z_{i} - B x_{i}) {(z_{i} - B x_{i})}^{⊤}) . \end{matrix}

(A2)

Next, use the identity

\sum_{i = 1}^{m} (z_{i} - B x_{i}) {(z_{i} - B x_{i})}^{⊤} = (B - \hat{B}) (X^{⊤} X) {(B - \hat{B})}^{⊤} + Z^{⊤} (I - P_{X}) Z,

where $Z = {(z_{1} \dots z_{m})}^{⊤}$, $\hat{B} = Z^{⊤} X {(X^{⊤} X)}^{- 1}$, and $P_{X} = X {(X^{⊤} X)}^{- 1} X^{⊤}$. The cross-product term vanishes since $\sum_{i = 1}^{m} (z_{i} - \hat{B} x_{i}) x_{i}^{⊤} = Z^{⊤} X - Z^{⊤} X = 0$. Because $Z^{⊤} (I - P_{X}) Z$ is positive semidefinite and therefore has a nonnegative trace, the left-hand side of (A2) is bounded below by

\mathrm{tr} [v_{M}^{- 1} ((B - \hat{B}) (X^{⊤} X) {(B - \hat{B})}^{⊤})] .

Using (A1), we obtain

\begin{matrix} π (B, Σ, u, λ^{2} | Y) \leq & K \exp [- \frac{1}{2 v_{M}} \mathrm{tr} [(B - \hat{B}) (X^{⊤} X) {(B - \hat{B})}^{⊤}]] \frac{1}{{| Σ |}^{m / 2} {(λ_{1} \dots λ_{m})}^{k}} \\ & \times \exp [- \sum_{i = 1}^{m} \frac{1}{2 λ_{i}^{2}} u_{i}^{⊤} Σ^{- 1} u_{i}] π_{Σ} (Σ) π_{λ_{1}^{2}} (λ_{1}^{2}) \dots π_{λ_{m}^{2}} (λ_{m}^{2}) . \end{matrix}

(A3)

The right-hand side of (A3) is integrable with respect to $B$ since, as a function of $B$, it is proportional to the pdf of a matrix normal distribution with $W = I$ and $R = {(X^{⊤} X)}^{- 1}$. After integrating out $B$, we obtain

\begin{matrix} π (Σ, u, λ^{2} | Y) \leq K \frac{1}{{| Σ |}^{m / 2} {(λ_{1} \dots λ_{m})}^{k}} \exp [- \sum_{i = 1}^{m} \frac{1}{2 λ_{i}^{2}} u_{i}^{⊤} Σ^{- 1} u_{i}] π_{Σ} (Σ) π_{λ_{1}^{2}} (λ_{1}^{2}) \dots π_{λ_{m}^{2}} (λ_{m}^{2}) . \end{matrix}

(A4)

Note that K is some generic constant.

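Now, the right-hand side of (A4) is integrable with respect to each $u_{i}$ since it has a normal density kernel in $u_{i}$. Integrating over $u_{i}$ contributes a factor proportional to $λ_{i}^{k} {| Σ |}^{1 / 2}$, and the product of these factors over $i$ cancels the term $1 / ({| Σ |}^{m / 2} {(λ_{1} \dots λ_{m})}^{k})$. Thus,

\begin{matrix} π (Σ, λ^{2} | Y) \leq K π_{Σ} (Σ) π_{λ_{1}^{2}} (λ_{1}^{2}) \dots π_{λ_{m}^{2}} (λ_{m}^{2}) . \end{matrix}

(A5)

Hence, if the priors for $Σ$ and $λ_{i}^{2}$ are proper, then the posterior density is proper. □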
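The matrix identity used between (A2) and (A3) is easy to confirm numerically. The sketch below draws random matrices of toy dimensions (the sizes, seed, and variable names are arbitrary choices for illustration) and checks both the decomposition and the vanishing cross-product term.

```python
import numpy as np

rng = np.random.default_rng(1)
m, k, p = 12, 3, 2                  # areas, components, covariates (toy sizes)
X = rng.normal(size=(m, p))         # row i is x_i^T
Z = rng.normal(size=(m, k))         # row i is z_i^T = (y_i - u_i)^T
B = rng.normal(size=(k, p))         # an arbitrary coefficient matrix

XtX = X.T @ X
B_hat = Z.T @ X @ np.linalg.inv(XtX)        # least-squares coefficient matrix
P_X = X @ np.linalg.inv(XtX) @ X.T          # projection onto the column space of X

resid = Z - X @ B.T                          # row i is (z_i - B x_i)^T
lhs = resid.T @ resid                        # sum_i (z_i - B x_i)(z_i - B x_i)^T
rhs = (B - B_hat) @ XtX @ (B - B_hat).T + Z.T @ (np.eye(m) - P_X) @ Z
cross = (Z - X @ B_hat.T).T @ X              # sum_i (z_i - B_hat x_i) x_i^T

print(np.allclose(lhs, rhs), np.allclose(cross, 0.0))
```

The second term on the right-hand side does not involve B and has a nonnegative trace, which is why it can be dropped when bounding the exponent from below.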

Appendix A.2. Proof of Theorem 2

Proof.

Using the properties ${∥ A ∥}_{F} {∥ B ∥}_{F} \geq {∥ A B ∥}_{F}$ and ${∥ A^{- 1} ∥}_{F} \geq 1 / {∥ A ∥}_{F}$ of the Frobenius norm, one obtains

\begin{matrix} {∥ S_{G L, i} ∥}_{F} = & {∥ {(λ_{i}^{2} Σ V_{i}^{- 1} + I_{k})}^{- 1} ∥}_{F} \geq \frac{{∥ I_{k} ∥}_{F}}{{∥ λ_{i}^{2} Σ V_{i}^{- 1} + I_{k} ∥}_{F}} \geq \frac{{∥ I_{k} ∥}_{F}}{{∥ λ_{i}^{2} Σ V_{i}^{- 1} ∥}_{F} + {∥ I_{k} ∥}_{F}} \\ \geq & \frac{k}{λ_{i}^{2} {∥ Σ ∥}_{F} {∥ V_{i}^{- 1} ∥}_{F} + k} . \end{matrix}

(A6)

Hence,

{∥ S_{G L, i} ∥}_{F} < ϵ \Rightarrow λ_{i}^{2} > k (\frac{1}{ϵ} - 1) \frac{1}{{∥ Σ ∥}_{F} {∥ V_{i}^{- 1} ∥}_{F}} = \frac{c}{{∥ Σ ∥}_{F}},

where $c = k (\frac{1}{ϵ} - 1) \frac{1}{{∥ V_{i}^{- 1} ∥}_{F}}$. Therefore,

\begin{matrix} P ({∥ S_{G L, i} ∥}_{F} < ϵ | B, Σ, y_{i}) \leq & P (λ_{i}^{2} > \frac{c}{{∥ Σ ∥}_{F}} | B, Σ, y_{i}) \\ = & \frac{\int_{c / {∥ Σ ∥}_{F}}^{\infty} {| V_{i} + λ_{i}^{2} Σ |}^{- \frac{1}{2}} \exp (- {(y_{i} - B x_{i})}^{⊤} {(V_{i} + λ_{i}^{2} Σ)}^{- 1} (y_{i} - B x_{i}) / 2) π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2}}{\int_{0}^{\infty} {| V_{i} + λ_{i}^{2} Σ |}^{- \frac{1}{2}} \exp (- {(y_{i} - B x_{i})}^{⊤} {(V_{i} + λ_{i}^{2} Σ)}^{- 1} (y_{i} - B x_{i}) / 2) π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2}} \\ = & \frac{N}{D}, \end{matrix}

(A7)

where we define $N$ and $D$ as the numerator and denominator of (A7). Now,

\begin{matrix} N \leq \int_{c / {∥ Σ ∥}_{F}}^{\infty} {| V_{i} |}^{- \frac{1}{2}} \exp (0) π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2} = {| V_{i} |}^{- \frac{1}{2}} \int_{c / {∥ Σ ∥}_{F}}^{\infty} π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2}, \end{matrix}

(A8)

and for any $g > 0$, we have

\begin{matrix} D \geq & \int_{0}^{g} {| V_{i} + λ_{i}^{2} Σ |}^{- \frac{1}{2}} \exp (- {(y_{i} - B x_{i})}^{⊤} {(V_{i} + λ_{i}^{2} Σ)}^{- 1} (y_{i} - B x_{i}) / 2) π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2} \\ \geq & \int_{0}^{g} {| V_{i} + g Σ |}^{- \frac{1}{2}} \exp (- {(y_{i} - B x_{i})}^{⊤} V_{i}^{- 1} (y_{i} - B x_{i}) / 2) π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2} \\ = & {| V_{i} + g Σ |}^{- \frac{1}{2}} \exp (- {(y_{i} - B x_{i})}^{⊤} V_{i}^{- 1} (y_{i} - B x_{i}) / 2) \int_{0}^{g} π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2} . \end{matrix}

(A9)

It is easy to see that Theorem 2 follows from (A7)–(A9). □
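For completeness, the final step can be written out. Dividing the upper bound (A8) by the lower bound (A9) in (A7) gives, for any $g > 0$,

P ({∥ S_{G L, i} ∥}_{F} < ϵ | B, Σ, y_{i}) \leq \frac{N}{D} \leq \frac{{| V_{i} |}^{- \frac{1}{2}}}{{| V_{i} + g Σ |}^{- \frac{1}{2}}} \exp ({(y_{i} - B x_{i})}^{⊤} V_{i}^{- 1} (y_{i} - B x_{i}) / 2) \frac{\int_{c / {∥ Σ ∥}_{F}}^{\infty} π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2}}{\int_{0}^{g} π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2}},

so the probability of heavy shrinkage is controlled by the prior tail mass of $λ_{i}^{2}$ beyond $c / {∥ Σ ∥}_{F}$. This display only combines (A7)–(A9); the precise form of the bound in Theorem 2 should be read from the statement of the theorem in the main text.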

Appendix A.3. Proof of Theorem 3

Proof.

We begin with the following inequality:

\begin{matrix} ∥ S_{G L, i} ∥_{F} = & ∥ {(λ_{i}^{2} Σ V_{i}^{- 1} + I_{k})}^{- 1} ∥_{F} = {∥ {(λ_{i}^{2} Σ + V_{i})}^{- 1} V_{i} ∥}_{F} \\ \leq & ∥ {(λ_{i}^{2} Σ + V_{i})}^{- 1} ∥_{F} ∥ V_{i} ∥_{F} \leq ∥ Σ^{- 1} ∥_{F} {∥ V_{i} ∥}_{F} / λ_{i}^{2} . \end{matrix}

Hence, ${∥ S_{G L, i} ∥}_{F} > ϵ \Rightarrow λ_{i}^{2} < c_{0} {∥ Σ^{- 1} ∥}_{F}$, where $c_{0} = {∥ V_{i} ∥}_{F} / ϵ$. Thus,

\begin{matrix} P ({∥ S_{G L, i} ∥}_{F} > ϵ | B, Σ, y_{i}) \leq & P (λ_{i}^{2} < c_{0} {∥ Σ^{- 1} ∥}_{F} | B, Σ, y_{i}) \\ = & \frac{\int_{0}^{c_{0} {∥ Σ^{- 1} ∥}_{F}} {| V_{i} + λ_{i}^{2} Σ |}^{- \frac{1}{2}} \exp (- {(y_{i} - B x_{i})}^{⊤} {(V_{i} + λ_{i}^{2} Σ)}^{- 1} (y_{i} - B x_{i}) / 2) π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2}}{\int_{0}^{\infty} {| V_{i} + λ_{i}^{2} Σ |}^{- \frac{1}{2}} \exp (- {(y_{i} - B x_{i})}^{⊤} {(V_{i} + λ_{i}^{2} Σ)}^{- 1} (y_{i} - B x_{i}) / 2) π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2}} \\ = & \frac{N^{'}}{D^{'}}, \end{matrix}

(A10)

where $N^{'}$ and $D^{'}$ are defined as the numerator and denominator of (A10). Since $s_{m}$ and $s_{M}$ are, respectively, the smallest and largest eigenvalues of $Σ$,

\frac{\sqrt{k}}{s_{M}} \leq {∥ Σ^{- 1} ∥}_{F} \leq \frac{\sqrt{k}}{s_{m}} .

Now, writing $c_{1} = c_{0} \sqrt{k}$, we have the inequality

\begin{matrix} N^{'} \leq | V_{i} |^{- 1 / 2} \int_{0}^{c_{0} {∥ Σ^{- 1} ∥}_{F}} π (λ_{i}^{2}) d λ_{i}^{2} \leq {| V_{i} |}^{- 1 / 2} \int_{0}^{\frac{c_{1}}{s_{m}}} π (λ_{i}^{2}) d λ_{i}^{2} . \end{matrix}

Now, for some arbitrary $d > 0$, we have

\begin{matrix} D^{'} \geq & \int_{0}^{\frac{c_{1} d}{s_{M}}} {| V_{i} + λ_{i}^{2} Σ |}^{- 1 / 2} \exp [- (1 / 2) {(y_{i} - B x_{i})}^{⊤} {(V_{i} + λ_{i}^{2} Σ)}^{- 1} (y_{i} - B x_{i})] π (λ_{i}^{2}) d λ_{i}^{2} \\ \geq & \int_{0}^{\frac{c_{1} d}{s_{M}}} {| V_{i} + c_{1} d Σ / s_{M} |}^{- 1 / 2} \exp [- (1 / 2) {(y_{i} - B x_{i})}^{⊤} V_{i}^{- 1} (y_{i} - B x_{i})] π (λ_{i}^{2}) d λ_{i}^{2} \\ \geq & \int_{0}^{\frac{c_{1} d}{s_{M}}} {| V_{i} + c_{1} d I_{k} |}^{- 1 / 2} \exp [- (1 / 2) {(y_{i} - B x_{i})}^{⊤} V_{i}^{- 1} (y_{i} - B x_{i})] π (λ_{i}^{2}) d λ_{i}^{2} . \end{matrix}

Thus,

N^{'} / D^{'} \leq K_{i} \frac{\int_{0}^{c_{1} / s_{m}} π (λ_{i}^{2}) d λ_{i}^{2}}{\int_{0}^{c_{1} d / s_{M}} π (λ_{i}^{2}) d λ_{i}^{2}}

which proves the theorem. □
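The two norm inequalities that drive this proof can be spot-checked numerically. The sketch below uses randomly generated positive definite matrices in place of Σ and V_i (an arbitrary choice for illustration, not data from the paper) and verifies the bound on the Frobenius norm of the shrinkage factor together with the eigenvalue bounds on the norm of Σ^{-1}.

```python
import numpy as np

def fro(A):
    """Frobenius norm of a matrix."""
    return np.linalg.norm(A, "fro")

rng = np.random.default_rng(2)
k = 3
A = rng.normal(size=(k, k))
C = rng.normal(size=(k, k))
Sigma = A @ A.T + np.eye(k)          # toy positive definite Sigma
V = C @ C.T + np.eye(k)              # toy positive definite V_i
Sigma_inv = np.linalg.inv(Sigma)

s = np.linalg.eigvalsh(Sigma)        # eigenvalues in ascending order
s_m, s_M = s[0], s[-1]

checks = []
for lam2 in (0.1, 1.0, 10.0, 100.0):
    # Shrinkage factor S = (lambda^2 Sigma V^{-1} + I_k)^{-1}
    S = np.linalg.inv(lam2 * Sigma @ np.linalg.inv(V) + np.eye(k))
    bound = fro(Sigma_inv) * fro(V) / lam2
    checks.append((lam2, fro(S), bound))
    print(f"lambda^2={lam2:6.1f}  ||S||_F={fro(S):.4f}  bound={bound:.4f}")

print(np.sqrt(k) / s_M, fro(Sigma_inv), np.sqrt(k) / s_m)
```

The bound is loose for small λ_i² and tightens as λ_i² grows, which matches its role in the proof: it is only used to translate a non-negligible shrinkage factor into an upper limit on λ_i².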

Appendix A.4. Proof of Theorem 4

Proof.

We begin with the inequality

\begin{matrix} ∥ S_{G L, i} ∥_{F} = & ∥ {(λ_{i}^{2} Σ V_{i}^{- 1} + I_{k})}^{- 1} ∥_{F} = {∥ {(λ_{i}^{2} Σ + V_{i})}^{- 1} V_{i} ∥}_{F} \\ \leq & ∥ {(λ_{i}^{2} Σ + V_{i})}^{- 1} ∥_{F} {∥ V_{i} ∥}_{F} . \end{matrix}

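Hence, since ${∥ {(λ_{i}^{2} Σ + V_{i})}^{- 1} ∥}_{F} \leq {∥ Σ^{- 1} ∥}_{F} / λ_{i}^{2}$ as in the proof of Theorem 3, ${∥ S_{G L, i} ∥}_{F} > ϵ \Rightarrow λ_{i}^{2} < c_{0} {∥ Σ^{- 1} ∥}_{F}$, where $c_{0} = {∥ V_{i} ∥}_{F} / ϵ$. Thus,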

\begin{matrix} P ({∥ S_{G L, i} ∥}_{F} > ϵ | B, Σ, y_{i}) \leq & P (λ_{i}^{2} < c_{0} {∥ Σ^{- 1} ∥}_{F} | B, Σ, y_{i}) \\ = & \frac{\int_{0}^{c_{0} {∥ Σ^{- 1} ∥}_{F}} {| V_{i} + λ_{i}^{2} Σ |}^{- \frac{1}{2}} \exp (- {(y_{i} - B x_{i})}^{⊤} {(V_{i} + λ_{i}^{2} Σ)}^{- 1} (y_{i} - B x_{i}) / 2) π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2}}{\int_{0}^{\infty} {| V_{i} + λ_{i}^{2} Σ |}^{- \frac{1}{2}} \exp (- {(y_{i} - B x_{i})}^{⊤} {(V_{i} + λ_{i}^{2} Σ)}^{- 1} (y_{i} - B x_{i}) / 2) π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2}} \\ = & \frac{N^{'}}{D^{'}}, \end{matrix}

(A11)

where $N^{'}$ and $D^{'}$ are defined as the numerator and the denominator of (A11).

Recall that $v_{m}$ and $v_{M}$ are the smallest and largest eigenvalues of $V_{i}$, and $s_{m}$ and $s_{M}$ are the smallest and largest eigenvalues of $Σ$, respectively. Therefore, the upper bound for $N^{'}$ is

\begin{matrix} N^{'} \leq & {| V_{i} |}^{- \frac{1}{2}} \exp (- {(y_{i} - B x_{i})}^{⊤} {(V_{i} + c_{0} {∥ Σ^{- 1} ∥}_{F} Σ)}^{- 1} (y_{i} - B x_{i}) / 2) \int_{0}^{c_{0} {∥ Σ^{- 1} ∥}_{F}} π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2} \\ \leq & {| V_{i} |}^{- \frac{1}{2}} \exp (- {(v_{M} + c_{0} {∥ Σ^{- 1} ∥}_{F} s_{M})}^{- 1} {(y_{i} - B x_{i})}^{⊤} (y_{i} - B x_{i}) / 2) . \end{matrix}

(A12)

For some $k_{2} > k_{1}$, we have

\begin{matrix} D^{'} \geq & \int_{k_{1}}^{k_{2}} {| V_{i} + λ_{i}^{2} Σ |}^{- \frac{1}{2}} \exp (- {(y_{i} - B x_{i})}^{⊤} {(v_{m} + λ_{i}^{2} s_{m})}^{- 1} (y_{i} - B x_{i}) / 2) π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2} \\ \geq & \int_{k_{1}}^{k_{2}} {| V_{i} + k_{2} Σ |}^{- \frac{1}{2}} \exp (- {(y_{i} - B x_{i})}^{⊤} {(v_{m} + k_{1} s_{m})}^{- 1} (y_{i} - B x_{i}) / 2) π_{λ_{i}^{2}} (λ_{i}^{2}) d λ_{i}^{2} \\ \geq & c_{2} \times \exp (- {(v_{m} + k_{1} s_{m})}^{- 1} {(y_{i} - B x_{i})}^{⊤} (y_{i} - B x_{i}) / 2), \end{matrix}

(A13)

where $c_{2}$ does not depend on ${∥ y_{i} - B x_{i} ∥}_{2}$. Now, we choose large $k_{1} > 0$ such that $0 < d = {(v_{M} + c_{0} {∥ Σ^{- 1} ∥}_{F} s_{M})}^{- 1} - {(v_{m} + k_{1} s_{m})}^{- 1}$. Hence, using (A12) and (A13), we have

P ({∥ S_{G L, i} ∥}_{F} > ϵ | B, Σ, y_{i}) \leq c_{3} \exp (- d {∥ y_{i} - B x_{i} ∥}_{2}^{2} / 2),

where $c_{3}$ is some positive finite constant. Therefore, if ${∥ y_{i} - B x_{i} ∥}_{2} \to \infty$, then $P ({∥ S_{G L, i} ∥}_{F} > ϵ | B, Σ, y_{i}) \to 0$. □
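The limiting behavior in this proof can be illustrated numerically. The sketch below evaluates the conditional probability that the shrinkage factor has Frobenius norm above ϵ, as the residual y_i − B x_i is scaled up along a fixed direction. Everything specific here is an assumption made for illustration: the 2 × 2 matrices standing in for V_i and Σ, the value ϵ = 0.5, the use of the Strawderman–Berger kernel from Table 1 as the prior on λ_i², and the simple midpoint-rule integration. It is not a substitute for the proof.

```python
import numpy as np

V = np.array([[1.0, 0.3], [0.3, 2.0]])        # toy sampling covariance V_i
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])    # toy Sigma
eps = 0.5
u = np.array([1.0, 1.0]) / np.sqrt(2.0)       # fixed direction of the residual

# Midpoint grid on t in (0, 1), mapped to lambda^2 = t / (1 - t) in (0, inf).
n = 20000
t = (np.arange(n) + 0.5) / n
x = t / (1.0 - t)
jac = 1.0 / (1.0 - t) ** 2                    # d(lambda^2) / dt

M = V[None, :, :] + x[:, None, None] * Sigma[None, :, :]   # V_i + lambda^2 Sigma
M_inv = np.linalg.inv(M)
det_M = np.linalg.det(M)
# S = (lambda^2 Sigma V^{-1} + I)^{-1} = V (lambda^2 Sigma + V)^{-1}
S_norm = np.linalg.norm(V @ M_inv, axis=(1, 2))
prior = (1.0 + x) ** -1.5                     # SB kernel, unnormalized

def prob_little_shrinkage(scale):
    """P(||S||_F > eps | residual = scale * u), by numerical integration."""
    r = scale * u
    quad = np.einsum("i,nij,j->n", r, M_inv, r)
    w = det_M ** -0.5 * np.exp(-0.5 * quad) * prior * jac
    return w[S_norm > eps].sum() / w.sum()

scales = [0.0, 2.0, 5.0, 10.0, 30.0]
probs = [prob_little_shrinkage(s) for s in scales]
for s, pr in zip(scales, probs):
    print(f"||residual|| = {s:5.1f}   P(||S||_F > eps) = {pr:.3e}")
```

In this toy setting the probability falls toward zero as the residual norm grows, in line with the statement that a large departure of y_i from its regression fit makes appreciable shrinkage increasingly unlikely.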

References

1. Fay, R.E.; Herriot, R.A. Estimates of Income for Small Places: An Application of James-Stein Procedures to Census Data. J. Am. Stat. Assoc. 1979, 74, 269–277.
2. Datta, G.S.; Hall, P.; Mandal, A. Model selection by testing for the presence of small-area effects, and application to area-level data. J. Am. Stat. Assoc. 2011, 106, 362–374.
3. Molina, I.; Nandram, B.; Rao, J.N.K. Small area estimation of general parameters with application to poverty indicators: A hierarchical Bayes approach. Ann. Appl. Stat. 2014, 8, 852–885.
4. Morales, D.; Pagliarella, M.C.; Salvatore, R. Small area estimation of poverty indicators under partitioned area-level time models. SORT Stat. Oper. Res. Trans. 2015, 39, 19–34.
5. Datta, G.S.; Mandal, A. Small Area Estimation With Uncertain Random Effects. J. Am. Stat. Assoc. 2015, 110, 1735–1744.
6. Tang, X.; Ghosh, M.; Ha, N.S.; Sedransk, J. Modeling Random Effects Using Global–Local Shrinkage Priors in Small Area Estimation. J. Am. Stat. Assoc. 2018, 113, 1476–1489.
7. Fay, R.E. Application of multivariate regression to small domain estimation. In Small Area Statistics; Platek, R., Rao, J.N.K., Sarndal, C.E., Singh, M.P., Eds.; Wiley: Hoboken, NJ, USA, 1987; pp. 91–102.
8. Ghosh, M.; Nangia, N.; Kim, D.H. Estimation of median income of four-person families: A Bayesian time series approach. J. Am. Stat. Assoc. 1996, 91, 1423–1431.
9. Datta, G.S.; Lahiri, P.; Maiti, T.; Lu, K.L. Hierarchical Bayes estimation of unemployment rates for the states of the US. J. Am. Stat. Assoc. 1999, 94, 1074–1082.
10. Datta, G.S.; Fay, R.E.; Ghosh, M. Hierarchical and Empirical Multivariate Bayes Analysis in Small Area Estimation. In Proceedings of the Bureau of Census Annual Research Conference, Arlington, VA, USA, 17–20 March 1991; US Department of Commerce, Bureau of the Census: Washington, DC, USA, 1991; pp. 63–79.
11. Carvalho, C.M.; Polson, N.G.; Scott, J.G. The horseshoe estimator for sparse signals. Biometrika 2010, 97, 465–480.
12. Polson, N.G.; Scott, J.G. Alternative Global–Local Shrinkage Rules Using Hypergeometric–Beta Mixtures; Technical Report 14; Duke University, Department of Statistical Science: Durham, NC, USA, 2009.
13. Polson, N.G.; Scott, J.G. Shrink Globally, Act Locally: Sparse Bayesian Regularization and Prediction. Bayesian Stat. 2010, 105, 501–538.
14. Polson, N.G.; Scott, J.G. Local shrinkage rules, Lévy processes and regularized regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 2012, 74, 287–311.
15. Polson, N.G.; Scott, J.G. On the half-Cauchy prior for a global scale parameter. Bayesian Anal. 2012, 7, 887–902.
16. Scott, J.G. Bayesian estimation of intensity surfaces on the sphere via needlet shrinkage and selection. Bayesian Anal. 2011, 6, 307–327.
17. Armagan, A.; Clyde, M.; Dunson, D.B. Generalized Beta Mixtures of Gaussians. In Advances in Neural Information Processing Systems 24; Shawe-Taylor, J., Zemel, R.S., Bartlett, P.L., Pereira, F., Weinberger, K.Q., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2011; pp. 523–531.
18. Armagan, A.; Dunson, D.B.; Lee, J. Generalized Double Pareto Shrinkage. Stat. Sin. 2013, 23, 119–143.
19. Griffin, J.; Brown, P. Alternative Prior Distributions for Variable Selection with Very Many More Variables than Observations; Technical Report; University of Warwick: Coventry, UK, 2005.
20. Strawderman, W.E. Proper Bayes Minimax Estimators of the Multivariate Normal Mean. Ann. Math. Stat. 1971, 42, 385–388.
21. Berger, J. A Robust Generalized Bayes Estimator and Confidence Region for a Multivariate Normal Mean. Ann. Statist. 1980, 8, 716–761.
22. Datta, J.; Ghosh, J.K. Asymptotic properties of Bayes risk for the horseshoe prior. Bayesian Anal. 2013, 8, 111–132.
23. Ghosh, P.; Tang, X.; Ghosh, M.; Chakrabarti, A. Asymptotic Properties of Bayes Risk of a General Class of Shrinkage Priors in Multiple Hypothesis Testing Under Sparsity. Bayesian Anal. 2016, 11, 753–796.
24. Spiegelhalter, D.J.; Best, N.G.; Carlin, B.P.; Van Der Linde, A. Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B Stat. Methodol. 2002, 64, 583–639.
25. Chung, H.C.; Datta, G.S. Bayesian Hierarchical Spatial Models for Small Area Estimation; Research Report Series; Census Bureau: Washington, DC, USA, 2020.
26. Bradley, J.R.; Holan, S.H.; Wikle, C.K. Multivariate spatio-temporal models for high-dimensional areal data with application to longitudinal employer-household dynamics. Ann. Appl. Stat. 2015, 9, 1761–1791.

Figure 1. Estimated random effects from FH, LA, and HS models for four subgroups in all states.

Figure 2. Estimated small area means from FH, LA, and HS models for four subgroups in all states.

Figure 3. Average deviation measures for different models under different data generation settings in the simulation study.

Table 1. Global-local shrinkage prior for $u_{i}$.

Prior	$π (λ_{i}^{2})$	Class
Laplace (LA)	$exp (- λ_{i}^{2})$	E
Strawderman–Berger (SB)	${(1 + λ_{i}^{2})}^{- 3 / 2}$	P
Horseshoe (HS)	$λ_{i}^{- 1} {(1 + λ_{i}^{2})}^{- 1}$	P
Normal-exponential-gamma (NEG)	${(1 + λ_{i}^{2})}^{- (1 + b)}$	P
Normal beta prime (NBP)	${(λ_{i}^{2})}^{a - 1} {(1 + λ_{i}^{2})}^{- (a + b)}$	P

The first column gives the names of priors of u_i marginalized over λ²_i. The second column gives the corresponding priors of the local parameters λ²_i up to a normalizing constant. In the third column, E and P stand for exponential-tailed priors and polynomial-tailed priors, respectively.

Table 2. Descriptive statistics for sample sizes and median household income across the 51 states/districts.

	Unweighted Sample Size			Median Household Income (in Thousands)
	Min	Median	Max	Min	Median	Max
Homeowner	1254	12,587	79,099	49.6	66.0	125.0
Renter	533	4942	53,101	23.0	33.7	55.0
Married	804	8245	67,913	62.6	78.0	167.3
Unmarried	1121	8842	64,287	25.0	35.4	62.0

Table 3. Median income estimates from the HS model.

State	Owner	Renter	Married	Unmarried	State	Owner	Renter	Married	Unmarried
AL	55.79	25.09	70.67	27.19	MT	62.03	30.45	73.01	31.41
AK	87.77	41.12	100.78	44.22	NE	70.85	32.35	83.23	34.72
AZ	63.63	35.54	73.97	35.29	NV	68.89	37.03	75.50	37.11
AR	52.84	26.72	64.51	27.01	NH	84.95	39.72	96.88	42.73
CA	87.40	44.45	90.00	45.00	NJ	96.17	42.20	108.18	45.30
CO	79.88	39.90	89.78	41.52	NM	55.43	28.28	67.90	29.10
CT	95.05	38.14	110.37	44.20	NY	83.20	38.44	93.06	40.00
DE	74.53	37.40	88.15	39.53	NC	60.77	30.89	73.22	31.12
DC	125.03	51.60	156.71	58.82	ND	78.48	36.20	90.61	38.94
FL	59.88	35.00	71.90	33.98	OH	65.40	30.08	78.85	32.36
GA	65.53	33.01	76.09	33.49	OK	60.03	30.91	72.09	31.13
HI	91.35	54.73	96.93	51.29	OR	69.74	35.10	77.84	36.01
ID	59.06	30.16	69.10	30.44	PA	69.46	32.22	83.10	35.00
IL	75.32	34.92	89.03	37.73	RI	81.99	33.38	94.95	38.46
IN	62.11	29.87	74.99	31.70	SC	57.13	29.98	71.68	30.03
IA	66.10	30.08	78.82	32.68	SD	65.60	29.80	76.58	32.14
KS	67.87	33.02	80.82	34.22	TN	59.81	29.03	71.28	30.00
KY	56.08	27.99	68.47	29.01	TX	70.95	36.93	80.00	37.01
LA	59.97	25.97	76.61	28.17	UT	75.70	39.27	79.22	40.03
ME	62.31	29.56	75.48	31.59	VT	73.98	35.65	85.58	37.96
MD	96.19	47.15	111.49	50.03	VA	81.65	41.86	94.87	41.96
MA	94.90	40.19	109.58	44.97	WA	80.49	41.32	90.06	41.89
MI	62.18	29.96	76.59	31.90	WV	50.74	24.61	64.58	26.10
MN	78.30	35.41	91.69	39.50	WI	69.78	32.87	81.35	35.26
MS	49.92	24.02	64.42	25.20	WY	73.63	40.26	83.75	39.52
MO	62.62	30.18	75.03	31.89

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

