Article

A Two-Sample Test of High Dimensional Means Based on Posterior Bayes Factor

1 School of Mathematics and Statistics, Beijing Institute of Technology, Beijing 100081, China
2 Beijing Key Laboratory on MCAACI, Beijing Institute of Technology, Beijing 100081, China
* Author to whom correspondence should be addressed.
Submission received: 5 April 2022 / Revised: 14 May 2022 / Accepted: 17 May 2022 / Published: 19 May 2022
(This article belongs to the Special Issue Multivariate Statistics: Theory and Its Applications)

Abstract:
In classical statistics, the primary test statistic is the likelihood ratio. For high dimensional data, however, the likelihood ratio test is no longer effective and sometimes fails altogether. Replacing the maximized likelihood with an integrated likelihood yields the Bayes factor. The posterior Bayes factor is the ratio of the integrals of the likelihood function with respect to the posterior distributions. In this paper, we investigate the performance of the posterior Bayes factor in high dimensional hypothesis testing through the problem of testing the equality of two multivariate normal mean vectors. The asymptotic normality of a linear function of the logarithm of the posterior Bayes factor is established. We then construct a test with an asymptotically nominal significance level and derive its asymptotic power. Simulation results and an application example are presented, which show good performance of the test. Hence, taking the posterior Bayes factor as a test statistic in high dimensional hypothesis testing is a reasonable methodology.

1. Introduction

The likelihood ratio is the primary test statistic in hypothesis testing owing to its dominant power. However, for high dimensional data, the likelihood ratio statistic is sometimes undefined. For example, the likelihood function of a multivariate normal distribution is unbounded when the dimension of the data is greater than the sample size. Even when the likelihood ratio is well defined, its performance is unsatisfactory when the dimension is proportionally "close to" the sample size [1]. Therefore, in the so-called "large p small n" situation, where the dimension is large relative to the sample size, the choice of test statistic plays a key role in statistical inference.
In this article, we use the posterior Bayes factor as a test statistic for high dimensional data, applying it to testing the equality of two multivariate normal mean vectors. The classical likelihood ratio statistic is the ratio of the maximized likelihoods, whereas the Bayes factor is the ratio of the integrated likelihoods. We choose the posterior Bayes factor rather than the prior Bayes factor because, when the dimension is fixed, the former is less affected by variations of the prior. Since this paper aims to investigate the ability of the posterior Bayes factor to serve as a test statistic, a simple prior is taken for the parameters.
In multivariate analysis, testing the equality of two means is a fundamental problem. The classical procedure for this problem is the famous Hotelling $T^2$ test in [2], which is based on the Mahalanobis distance between the sample mean vectors weighted by the inverse sample covariance matrix. Hotelling's $T^2$ test is the most powerful invariant test when the dimension is fixed and much smaller than the total sample size [3], but it is unsatisfactory when the dimension is large relative to the sample size [1]. In recent decades, however, hypothesis tests applicable to high dimensional data have been increasingly demanded in application areas such as genomics, finance, and medicine. An important work [1] modifies Hotelling's $T^2$ statistic in the high dimensional setting by removing the inverse of the sample covariance from the Hotelling formulation. Some new test statistics for the mean vector are introduced by replacing the sample covariance with its diagonal in [4,5,6]. In [7], a statistic is constructed by retaining the cross-product terms of [1]. In the sequel, ref. [8] standardizes each component of $\bar{X}-\bar{Y}$ in [7] by the corresponding variance estimate and proposes a scale-invariant test. The statistics introduced above are called "sum-of-squares type statistics" (see [9]) and attempt to get around the ill-conditioned sample covariance matrix. Another major approach, "projecting data", transforms high dimensional data into low dimensional data with random projections so that traditional tests can be applied; see, for example, [10,11,12]. By maximizing an average signal-to-noise ratio, ref. [13] finds the optimal projection subspace and proposes a new test procedure based on it. Besides the two main approaches mentioned above, ref. [14] studies the rates of convergence for the high dimensional mean and proposes tests based on the sample mean. A new test based on random subspaces is proposed by [15].
A generalized component test is presented in [16], whose statistic averages the squared t-statistics of all the componentwise testing problems. A multiple hypothesis testing method based on the maximum of standardized partial sums of logarithms of p-values is introduced in [17]. More works on testing mean vectors are presented in [18,19,20].
Few articles develop tests for the means of two samples with Bayesian machinery in high dimensional settings. A Bayes factor-based testing procedure is developed by [12]. However, the statistic is still constructed from lower dimensional random projections of the high dimensional data vectors, because Bayes factors based on Jeffreys priors involve inversion of the ill-conditioned sample covariance matrix, as in the classical Hotelling $T^2$ statistic in the "large p small n" setting. The random projection approach cannot be applied when the difference of the two mean vectors is dense; yet whether this difference is dense or sparse is unknown in applications. Aitkin [21] proposed the posterior Bayes factor, which is the ratio of the posterior means of the likelihood under each model rather than the usual prior means. Suppose two models $M_0$ and $M_1$ for common data $x$ are considered, under which the likelihood function is $L_j(\theta_j)$, where $\theta_j$ is the parameter of dimension $p_j$ and belongs to the parameter space $\Theta_j$, $j\in\{0,1\}$. Specifying a prior $\pi_j(\theta_j)$ for $\theta_j$, $j\in\{0,1\}$, the posterior Bayes factor in favor of model $M_1$, denoted by $\mathrm{PBF}_{10}$, is defined as
$$\mathrm{PBF}_{10}=\frac{\bar{L}_1}{\bar{L}_0}, \tag{1}$$
where $\bar{L}_j$ is the posterior mean
$$\bar{L}_j=\int_{\Theta_j}L_j(\theta_j)\,\pi_j(\theta_j\mid x)\,d\theta_j,\qquad j\in\{0,1\},$$
and $\pi_j(\theta_j\mid x)$ is the posterior density of $\theta_j$:
$$\pi_j(\theta_j\mid x)=\frac{L_j(\theta_j)\,\pi_j(\theta_j)}{\int_{\Theta_j}L_j(\theta_j)\,\pi_j(\theta_j)\,d\theta_j},\qquad j\in\{0,1\}.$$
Unlike the Bayes factor, which depends strongly on the prior and may be very sensitive to variations in it, the posterior Bayes factor reduces this sensitivity. Specifically, when model $M_0$ is a regular submodel of $M_1$, the logarithm of the posterior Bayes factor under model $M_1$ satisfies
$$2\ln \mathrm{PBF}_{10}\ \xrightarrow{d}\ -v\ln 2+\chi^2(v), \tag{2}$$
where "$\xrightarrow{d}$" denotes convergence in distribution, $v=p_1-p_0$, and $\chi^2(v)$ denotes a chi-square distribution with $v$ degrees of freedom. This asymptotic distribution does not involve the prior distribution, which further illustrates that the posterior Bayes factor is insensitive to the prior.
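To see where the displayed limit above comes from, a Laplace-approximation sketch (our heuristic, assuming a regular model whose posterior concentrates at the maximum likelihood estimator $\hat{\theta}_j$ with observed information $I(\hat{\theta}_j)$) gives

$$\bar{L}_j=\frac{\int_{\Theta_j}L_j^2(\theta_j)\,\pi_j(\theta_j)\,d\theta_j}{\int_{\Theta_j}L_j(\theta_j)\,\pi_j(\theta_j)\,d\theta_j}\approx\frac{L_j^2(\hat{\theta}_j)\,(2\pi)^{p_j/2}\,|2nI(\hat{\theta}_j)|^{-1/2}\,\pi_j(\hat{\theta}_j)}{L_j(\hat{\theta}_j)\,(2\pi)^{p_j/2}\,|nI(\hat{\theta}_j)|^{-1/2}\,\pi_j(\hat{\theta}_j)}=2^{-p_j/2}\,L_j(\hat{\theta}_j),$$

so that $\mathrm{PBF}_{10}\approx 2^{-v/2}\cdot L_1(\hat{\theta}_1)/L_0(\hat{\theta}_0)$ and $2\ln\mathrm{PBF}_{10}\approx -v\ln 2+2\ln\mathrm{LR}$; the standard $\chi^2(v)$ limit of the likelihood ratio then yields the shifted chi-square limit above. The prior density $\pi_j(\hat{\theta}_j)$ cancels in the ratio, which is exactly the claimed first-order insensitivity to the prior.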
Inspired by [21], we consider testing the equality of two high dimensional means with the posterior Bayes factor. With an appropriate prior, the posterior Bayes factor no longer suffers from the inversion of ill-conditioned matrices. Additionally, compared with the approach in [12], which proposes a test based on the Bayes factor with random projections, the posterior Bayes factor can be applied in both dense and sparse cases. In this paper, a non-informative prior is used for the location parameters, while an inverse Wishart prior is taken for the covariance matrix. We establish the asymptotic normality of the logarithm of the posterior Bayes factor under the null hypothesis and derive the asymptotic power of the test. Simulation studies are carried out to investigate the performance of the proposed test; the numerical results show that our test outperforms its competitors in most cases.
The rest of this article is organized as follows. In Section 2, we derive the posterior Bayes factor for testing the equality of two mean vectors in the “large p small n” setting. The asymptotic null distribution of the posterior Bayes factor and the local power function of the test are also presented. Simulation results are given in Section 3. We apply the proposed test to a real dataset in Section 4. Section 5 concludes the paper. Technical proofs and the code for performing the simulation studies are deferred to Appendix A and Appendix B.

2. Test Based on Posterior Bayes Factor

This section constructs the test based on the posterior Bayes factor. Let $X=(X_1,\dots,X_{n_1})$ and $Y=(Y_1,\dots,Y_{n_2})$ be iid samples from the $p$-dimensional multivariate normal distributions $N_p(\mu_1,\Sigma)$ and $N_p(\mu_2,\Sigma)$, respectively, where $\mu_1$ and $\mu_2$ are $p\times 1$ vectors and $\Sigma$ is a positive definite $p\times p$ matrix. The goal is to test the hypotheses
$$H_0:\ \mu_1=\mu_2\qquad\text{versus}\qquad H_1:\ \mu_1\neq\mu_2. \tag{3}$$
In order to test the hypotheses in (3) by the posterior Bayes factor, we specify the priors for the parameters $\mu_1$, $\mu_2$ and $\Sigma$ under the null and alternative hypotheses as
$$\pi_0(\mu)=1,\qquad \Sigma\sim W_p^{-1}(m_0,V^{-1}),$$
and
$$\pi_1(\mu_1)=\pi_1(\mu_2)=1,\qquad \Sigma\sim W_p^{-1}(m_1,V^{-1}),$$
respectively, where $\mu$ is the common mean vector under the null hypothesis, and $W_p^{-1}(m_j,V^{-1})$ denotes the inverse Wishart distribution with degrees of freedom $m_j$ and positive definite scale matrix $V^{-1}$, $j\in\{0,1\}$.
The reasons for choosing the above priors are as follows.
  • When no prior knowledge is available, a non-informative prior is suggested; a usual choice is the Jeffreys prior. Accordingly, for the parameters $\mu_1$, $\mu_2$ and the common mean $\mu$ under the null hypothesis, we choose the Jeffreys prior, i.e., Lebesgue measure.
  • For the covariance matrix $\Sigma$, the posterior distribution under the Jeffreys prior does not exist when $p>n-2$, where $n=n_1+n_2$. Therefore, we take the inverse Wishart distribution, which is the conjugate prior for a normal covariance matrix.
  • This paper aims to investigate whether a test based on the posterior Bayes factor statistic performs better than existing methods in high dimensional settings. If the results turn out as expected, the posterior Bayes factor can be suggested as a test statistic for high dimensional datasets. Hence, we take simple priors. Furthermore, we take $V=kI_p$ in the priors for the covariance matrix with small $k$, so that the prior variation of $\Sigma$ is large.
The joint densities of X and Y under the null and alternative hypotheses are
$$(2\pi)^{-\frac{np}{2}}|\Sigma|^{-\frac{n}{2}}\exp\left\{-\frac{1}{2}\operatorname{tr}\left[\Sigma^{-1}\left(\sum_{i=1}^{n_1}(x_i-\mu)(x_i-\mu)^{T}+\sum_{j=1}^{n_2}(y_j-\mu)(y_j-\mu)^{T}\right)\right]\right\}$$
and
$$(2\pi)^{-\frac{np}{2}}|\Sigma|^{-\frac{n}{2}}\exp\left\{-\frac{1}{2}\operatorname{tr}\left[\Sigma^{-1}\left(\sum_{i=1}^{n_1}(x_i-\mu_1)(x_i-\mu_1)^{T}+\sum_{j=1}^{n_2}(y_j-\mu_2)(y_j-\mu_2)^{T}\right)\right]\right\},$$
respectively. Then the posterior mean L ¯ 1 under H 1 can be calculated as
$$\bar{L}_1=\int_{\Theta_1}L_1(\theta_1)\,\frac{L_1(\theta_1)\,\pi_1(\theta_1)}{\int_{\Theta_1}L_1(\theta_1)\,\pi_1(\theta_1)\,d\theta_1}\,d\theta_1=\frac{\int_{\Theta_1}L_1^2(\theta_1)\,\pi_1(\theta_1)\,d\theta_1}{\int_{\Theta_1}L_1(\theta_1)\,\pi_1(\theta_1)\,d\theta_1},$$
where
$$\int_{\Theta_1}L_1(\theta_1)\,\pi_1(\theta_1)\,d\theta_1=(2\pi)^{-\frac{(n-2)p}{2}}\,n_1^{-\frac{p}{2}}\,n_2^{-\frac{p}{2}}\,\frac{2^{\frac{(m_1+n-2)p}{2}}\,\left|V^{-1}+S_1+S_2\right|^{-\frac{m_1+n-2}{2}}\,\Gamma_p\!\left(\frac{m_1+n-2}{2}\right)}{2^{\frac{m_1p}{2}}\,|V|^{\frac{m_1}{2}}\,\Gamma_p\!\left(\frac{m_1}{2}\right)},$$
and Γ p ( · ) denotes the multivariate gamma function, that is,
$$\Gamma_p(a)=\pi^{p(p-1)/4}\prod_{j=1}^{p}\Gamma\!\left[a+\frac{1-j}{2}\right],$$
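The multivariate gamma function is available (in log form) in SciPy as `multigammaln`; as a quick sanity check, the product formula above can be coded directly and compared against it. This is a Python sketch for illustration (the paper's own simulations were done in R):

```python
import math
from scipy.special import gammaln, multigammaln

def log_multigamma(a: float, p: int) -> float:
    """ln Gamma_p(a), computed from the product formula above."""
    return p * (p - 1) / 4.0 * math.log(math.pi) + sum(
        gammaln(a + (1 - j) / 2.0) for j in range(1, p + 1)
    )

# agrees with SciPy's built-in log multivariate gamma
print(abs(log_multigamma(5.0, 3) - multigammaln(5.0, 3)))
```

For $p=1$ the formula reduces to the ordinary log gamma function, which is another easy consistency check.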
$$S_1=\sum_{i=1}^{n_1}(X_i-\bar{X})(X_i-\bar{X})^{T},\qquad S_2=\sum_{j=1}^{n_2}(Y_j-\bar{Y})(Y_j-\bar{Y})^{T},$$
with $\bar{X}=\sum_{i=1}^{n_1}X_i/n_1$ and $\bar{Y}=\sum_{j=1}^{n_2}Y_j/n_2$, and
$$\int_{\Theta_1}L_1^2(\theta_1)\,\pi_1(\theta_1)\,d\theta_1=(2\pi)^{-\frac{(2n-2)p}{2}}\,(2n_1)^{-\frac{p}{2}}\,(2n_2)^{-\frac{p}{2}}\,\frac{2^{\frac{(m_1+2n-2)p}{2}}\,\left|V^{-1}+2S_1+2S_2\right|^{-\frac{m_1+2n-2}{2}}\,\Gamma_p\!\left(\frac{m_1+2n-2}{2}\right)}{2^{\frac{m_1p}{2}}\,|V|^{\frac{m_1}{2}}\,\Gamma_p\!\left(\frac{m_1}{2}\right)}.$$
The posterior mean L ¯ 0 under H 0 can be calculated as
$$\bar{L}_0=\frac{\int_{\Theta_0}L_0^2(\theta_0)\,\pi_0(\theta_0)\,d\theta_0}{\int_{\Theta_0}L_0(\theta_0)\,\pi_0(\theta_0)\,d\theta_0},$$
where
$$\int_{\Theta_0}L_0(\theta_0)\,\pi_0(\theta_0)\,d\theta_0=(2\pi)^{-\frac{(n-1)p}{2}}\,n^{-\frac{p}{2}}\,\frac{2^{\frac{(m_0+n-1)p}{2}}\,\left|V^{-1}+S_1+S_2+\frac{n_1n_2}{n}(\bar{X}-\bar{Y})(\bar{X}-\bar{Y})^{T}\right|^{-\frac{m_0+n-1}{2}}\,\Gamma_p\!\left(\frac{m_0+n-1}{2}\right)}{2^{\frac{m_0p}{2}}\,|V|^{\frac{m_0}{2}}\,\Gamma_p\!\left(\frac{m_0}{2}\right)}$$
and
$$\int_{\Theta_0}L_0^2(\theta_0)\,\pi_0(\theta_0)\,d\theta_0=(2\pi)^{-\frac{(2n-1)p}{2}}\,(2n)^{-\frac{p}{2}}\times\frac{2^{\frac{(m_0+2n-1)p}{2}}\,\left|V^{-1}+2S_1+2S_2+\frac{2n_1n_2}{n}(\bar{X}-\bar{Y})(\bar{X}-\bar{Y})^{T}\right|^{-\frac{m_0+2n-1}{2}}\,\Gamma_p\!\left(\frac{m_0+2n-1}{2}\right)}{2^{\frac{m_0p}{2}}\,|V|^{\frac{m_0}{2}}\,\Gamma_p\!\left(\frac{m_0}{2}\right)}.$$
For simplicity, we specify $m_0=m+1$ and $m_1=m+2$. Then the posterior Bayes factor in favor of $H_1$ against $H_0$ in (3), denoted by PB, can be written as
$$\mathrm{PB}(X,Y)=\left(\frac{1}{2}\right)^{p/2}\frac{\left[1+\frac{2n_1n_2}{n}(\bar{X}-\bar{Y})^{T}\left(V^{-1}+2(S_1+S_2)\right)^{-1}(\bar{X}-\bar{Y})\right]^{\frac{m+2n}{2}}}{\left[1+\frac{n_1n_2}{n}(\bar{X}-\bar{Y})^{T}\left(V^{-1}+S_1+S_2\right)^{-1}(\bar{X}-\bar{Y})\right]^{\frac{m+n}{2}}}.$$
Multiplying the logarithm of the posterior Bayes factor by 2, we have
$$2\ln \mathrm{PB}(X,Y)=-p\ln 2+(m+2n)\ln\left[1+\frac{2n_1n_2}{n}(\bar{X}-\bar{Y})^{T}\left(V^{-1}+2(S_1+S_2)\right)^{-1}(\bar{X}-\bar{Y})\right]-(m+n)\ln\left[1+\frac{n_1n_2}{n}(\bar{X}-\bar{Y})^{T}\left(V^{-1}+S_1+S_2\right)^{-1}(\bar{X}-\bar{Y})\right]. \tag{7}$$
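For concreteness, formula (7) can be evaluated directly. The following Python sketch is a hypothetical helper of ours, not the authors' code (which was written in R); it computes $2\ln\mathrm{PB}$ for given $m$ and $V=kI_p$:

```python
import numpy as np

def two_log_pb(X, Y, m, k):
    """2 ln PB(X, Y) from Eq. (7), with prior scale matrix V = k * I_p.
    X is n1 x p, Y is n2 x p; rows are observations."""
    n1, p = X.shape
    n2 = Y.shape[0]
    n = n1 + n2
    d = X.mean(axis=0) - Y.mean(axis=0)        # Xbar - Ybar
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    S = Xc.T @ Xc + Yc.T @ Yc                  # S1 + S2
    Vinv = np.eye(p) / k
    A1 = (2 * n1 * n2 / n) * d @ np.linalg.solve(Vinv + 2 * S, d)
    A2 = (n1 * n2 / n) * d @ np.linalg.solve(Vinv + S, d)
    return -p * np.log(2) + (m + 2 * n) * np.log1p(A1) - (m + n) * np.log1p(A2)
```

Note that only $(V^{-1}+S_1+S_2)^{-1}$ and $(V^{-1}+2(S_1+S_2))^{-1}$ appear, so no inversion of the sample covariance matrix itself (singular when $p>n-2$) is needed.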
Now we want to determine a critical value c α , which makes the test given by the rejection region
$$\left\{(x,y):\ 2\ln \mathrm{PB}(x,y)\ge c_\alpha\right\}$$
have significance level $\alpha$. Since the distribution of $2\ln \mathrm{PB}(X,Y)$ under the null hypothesis is unknown, the critical value $c_\alpha$ is determined from its asymptotic distribution. To obtain it, a Taylor expansion of $\ln(1+x)$ around 0 is applied to (7), which yields
$$2\ln \mathrm{PB}(X,Y)=-p\ln 2+(m+2n)\left[A_1-\frac{A_1^2}{2(1+A_1^*)^2}\right]-(m+n)\left[A_2-\frac{A_2^2}{2(1+A_2^*)^2}\right], \tag{8}$$
where
$$A_1=\frac{2n_1n_2}{n}(\bar{X}-\bar{Y})^{T}\left(V^{-1}+2(S_1+S_2)\right)^{-1}(\bar{X}-\bar{Y}),$$
$$A_2=\frac{n_1n_2}{n}(\bar{X}-\bar{Y})^{T}\left(V^{-1}+S_1+S_2\right)^{-1}(\bar{X}-\bar{Y}),$$
$A_1^*\in(0,A_1)$ and $A_2^*\in(0,A_2)$.
In (8), the quadratic form in $\bar{X}-\bar{Y}$ is
$$(m+2n)A_1-(m+n)A_2=\frac{n_1n_2}{n}(\bar{X}-\bar{Y})^{T}B(\bar{X}-\bar{Y}), \tag{9}$$
where
$$B=2(m+2n)\left(V^{-1}+2S_1+2S_2\right)^{-1}-(m+n)\left(V^{-1}+S_1+S_2\right)^{-1}.$$
Denote the spectral decomposition of $\Sigma^{\frac{1}{2}}B\Sigma^{\frac{1}{2}}$ by $G^{T}AG$, where $A=\operatorname{diag}(a_1,\dots,a_p)$, and let $\xi=(\xi_1,\xi_2,\dots,\xi_p)^{T}=\sqrt{\frac{n_1n_2}{n}}\,G\Sigma^{-\frac{1}{2}}(\bar{X}-\bar{Y})$. Then we have
$$\frac{n_1n_2}{n}(\bar{X}-\bar{Y})^{T}B(\bar{X}-\bar{Y})=\xi^{T}G\Sigma^{\frac{1}{2}}B\Sigma^{\frac{1}{2}}G^{T}\xi=\xi^{T}G\,G^{T}AG\,G^{T}\xi=\sum_{i=1}^{p}a_i\xi_i^2.$$
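The algebra above is easy to check numerically. The following Python sketch uses an arbitrary symmetric $B$, a positive definite $\Sigma$, and a stand-in vector for $\sqrt{n_1n_2/n}\,(\bar{X}-\bar{Y})$, and verifies that the quadratic form equals $\sum_i a_i\xi_i^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 6
M = rng.normal(size=(p, p))
B = M + M.T                           # any symmetric B
L = rng.normal(size=(p, p))
Sigma = L @ L.T + p * np.eye(p)       # a positive definite Sigma

w, U = np.linalg.eigh(Sigma)
Sig_half = U @ np.diag(np.sqrt(w)) @ U.T         # Sigma^{1/2}
a, G = np.linalg.eigh(Sig_half @ B @ Sig_half)   # eigenvalues a_i, eigenvectors in columns of G

z = rng.normal(size=p)                # plays the role of sqrt(n1 n2 / n)(Xbar - Ybar)
xi = G.T @ np.linalg.solve(Sig_half, z)          # xi = (eigenvector matrix)^T Sigma^{-1/2} z
print(z @ B @ z, np.sum(a * xi**2))   # the two numbers coincide
```

(NumPy's `eigh` returns eigenvectors as columns, so `G.T` plays the role of the matrix $G$ in the text.)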
When the null hypothesis is true,
$$\sqrt{\frac{n_1n_2}{n}}(\bar{X}-\bar{Y})\sim N_p(0_p,\Sigma),$$
so $\xi\sim N_p(0_p,I_p)$. The asymptotic distribution of $\sum_{i=1}^{p}a_i\xi_i^2$ can be derived with the following lemma.
Lemma 1
([22]). Let $\zeta_{n,i}$, $i\in\{1,\dots,n\}$, $n=1,2,\dots$, be iid $s$-dimensional random vectors with mean zero, covariance matrix $M$ and finite fourth moment. For $n=1,2,\dots$, let $\{a_{n,i}\}_{i=1}^{n}$ be real random variables that are independent of $\{\zeta_{n,i}\}_{i=1}^{n}$ and satisfy
$$\frac{\max_{1\le i\le n}a_{n,i}^2}{\sum_{i=1}^{n}a_{n,i}^2}\xrightarrow{P}0.$$
Then
$$\frac{\sum_{i=1}^{n}a_{n,i}\zeta_{n,i}}{\sqrt{\sum_{i=1}^{n}a_{n,i}^2}}\xrightarrow{d}N_s(0_s,M).$$
We take $\zeta_{n,i}=\xi_i^2-1$, so that $E\zeta_{n,i}=0$, $i\in\{1,\dots,p\}$. From Lemma 1, since $\operatorname{Var}(\zeta_{n,i})=2$, $i\in\{1,\dots,p\}$, the quadratic form (9) needs to be normalized by
$$\sqrt{2\sum_{i=1}^{p}a_i^2}=\sqrt{2\operatorname{tr}A^2}=\sqrt{2\operatorname{tr}(B\Sigma)^2};$$
to center it, we use
$$\sum_{i=1}^{p}a_i=\operatorname{tr}A=\operatorname{tr}(B\Sigma),$$
so that $\sum_{i=1}^{p}a_i\xi_i^2=\sum_{i=1}^{p}a_i(\xi_i^2-1)+\operatorname{tr}(B\Sigma)$. By now, we have
$$2\ln \mathrm{PB}+p\ln 2=\sum_{i=1}^{p}a_i(\xi_i^2-1)+\operatorname{tr}(B\Sigma)-\frac{(m+2n)A_1^2}{2(1+A_1^*)^2}+\frac{(m+n)A_2^2}{2(1+A_2^*)^2}. \tag{10}$$
As a result,
$$\frac{2\ln \mathrm{PB}+p\ln 2-\widehat{\operatorname{tr}(B\Sigma)}}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}=\frac{\sum_{i=1}^{p}a_i(\xi_i^2-1)}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}+\frac{\operatorname{tr}(B\Sigma)-\widehat{\operatorname{tr}(B\Sigma)}}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}-\frac{(m+2n)A_1^2}{2(1+A_1^*)^2\sqrt{2\operatorname{tr}(B\Sigma)^2}}+\frac{(m+n)A_2^2}{2(1+A_2^*)^2\sqrt{2\operatorname{tr}(B\Sigma)^2}}, \tag{11}$$
where $\widehat{\operatorname{tr}(B\Sigma)}$ is an estimator of $\operatorname{tr}(B\Sigma)$. We take
$$\widehat{\operatorname{tr}(B\Sigma)}=\operatorname{tr}(BS_n),$$
where $S_n=(S_1+S_2)/(n-2)$.
We shall next prove that the first term on the right side of (11) converges in distribution to $N(0,1)$ and that the remaining terms converge in probability to 0. If a ratio-consistent estimator of $\operatorname{tr}(B\Sigma)^2$ is available, a test with asymptotic significance level $\alpha$ can then be constructed from (11). To this end, the following usual assumptions are made:
$$\frac{p}{n}\to c\in(0,\infty),\qquad \frac{n_1}{n}\to\tau\in(0,1)\ \ \text{as}\ n\to\infty,\qquad\text{and}\qquad m=O(n); \tag{12}$$
$$\frac{\lambda_1(\Sigma)}{\sqrt{\operatorname{tr}\Sigma^2}}\to 0, \tag{13}$$
where $\lambda_1(A)$ denotes the largest eigenvalue of a matrix $A$. Let $\delta=\mu_1-\mu_2$. We also assume
$$\frac{n_1n_2}{n}\cdot\frac{\delta^{T}\Sigma\delta}{\operatorname{tr}\Sigma^2}\to 0, \tag{14}$$
and
$$\frac{\delta^{T}\delta}{\sqrt{\operatorname{tr}\Sigma^2}}=O(1). \tag{15}$$
Carefully choosing $k=\varepsilon_n/\left[np\,\lambda_1(S_1+S_2)\right]$ with $\varepsilon_n\to 0$, we ensure that
$$\frac{(m+2n)A_1^2}{2(1+A_1^*)^2\sqrt{2\operatorname{tr}(B\Sigma)^2}}\xrightarrow{P}0\qquad\text{and}\qquad\frac{(m+n)A_2^2}{2(1+A_2^*)^2\sqrt{2\operatorname{tr}(B\Sigma)^2}}\xrightarrow{P}0 \tag{16}$$
under conditions (12), (13) and (15); see Appendix A for the proof. The following lemma establishes the key property of the estimator $\operatorname{tr}(BS_n)$.
Lemma 2.
If conditions (12) and (13) hold, the estimator $\widehat{\operatorname{tr}(B\Sigma)}$ satisfies
$$\frac{\operatorname{tr}(BS_n)-\operatorname{tr}(B\Sigma)}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\xrightarrow{P}0. \tag{17}$$
Combining (11) with (16) and (17), we have
$$\frac{1}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\left[2\ln \mathrm{PB}+p\ln 2-\widehat{\operatorname{tr}(B\Sigma)}\right]-\frac{\sum_{i=1}^{p}a_i(\xi_i^2-1)}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\xrightarrow{P}0. \tag{18}$$
The asymptotic distribution of this linear function of the logarithm of the posterior Bayes factor can now be derived.
Theorem 1.
Under conditions (12) and (13), the posterior Bayes factor PB has the following properties.
  • Under the null hypothesis,
$$\frac{1}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\left[2\ln(\mathrm{PB})+p\ln 2-\widehat{\operatorname{tr}(B\Sigma)}\right]\xrightarrow{d}N(0,1).$$
  • Under the local alternative (14) and condition (15),
$$\frac{1}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\left[2\ln(\mathrm{PB})+p\ln 2-\widehat{\operatorname{tr}(B\Sigma)}-\frac{n_1n_2}{n}\delta^{T}B\delta\right]\xrightarrow{d}N(0,1).$$
In order to formulate a test procedure based on Theorem 1, an estimator of $\operatorname{tr}(B\Sigma)^2$ is required. The following ratio-consistent estimator of $\operatorname{tr}\Sigma^2$ is proposed in [1]:
$$\widehat{\operatorname{tr}\Sigma^2}=\frac{(n-2)^2}{n(n-1)}\left[\operatorname{tr}S_n^2-\frac{1}{n-2}(\operatorname{tr}S_n)^2\right]. \tag{19}$$
Inspired by it, we propose the following estimator of $\operatorname{tr}(B\Sigma)^2$:
$$\widehat{\operatorname{tr}(B\Sigma)^2}=\operatorname{tr}\left[(BS_n)^2\right]-\frac{1}{n-2}\left[\operatorname{tr}(BS_n)\right]^2. \tag{20}$$
The following theorem establishes the ratio consistency of $\widehat{\operatorname{tr}(B\Sigma)^2}$.
Theorem 2.
Under conditions (12) and (13), $\widehat{\operatorname{tr}(B\Sigma)^2}$ is a ratio-consistent estimator of $\operatorname{tr}(B\Sigma)^2$; that is,
$$\frac{\widehat{\operatorname{tr}(B\Sigma)^2}}{\operatorname{tr}(B\Sigma)^2}\xrightarrow{P}1\qquad\text{as}\ p,n\to\infty.$$
By Theorems 1 and 2, we obtain a test statistic for (3),
$$T_{\mathrm{PB}}=\frac{1}{\sqrt{2\,\widehat{\operatorname{tr}(B\Sigma)^2}}}\left[2\ln(\mathrm{PB})+p\ln 2-\operatorname{tr}(BS_n)\right],$$
which is asymptotically distributed as $N(0,1)$ when the null hypothesis is true. The rejection region of the test with approximate significance level $\alpha$ is then
$$2\ln(\mathrm{PB})\ge z_{1-\alpha}\sqrt{2\,\widehat{\operatorname{tr}(B\Sigma)^2}}-p\ln 2+\operatorname{tr}(BS_n),$$
where $z_{1-\alpha}$ is the $1-\alpha$ quantile of $N(0,1)$.
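Putting the pieces together, the whole procedure, from formula (7) through the rejection rule above, fits in a short function. This is an illustrative Python sketch with the paper's simulation choices $m=2p$ and $\varepsilon_n=1/\ln(n)$ baked in (the published simulations were run in R, so treat this as our reconstruction, not reference code):

```python
import numpy as np
from scipy.stats import norm

def pb_test(X, Y, m=None, alpha=0.05):
    """Sketch of the proposed T_PB test for H0: mu1 = mu2. Returns (T_PB, reject)."""
    n1, p = X.shape
    n2 = Y.shape[0]
    n = n1 + n2
    m = 2 * p if m is None else m
    d = X.mean(axis=0) - Y.mean(axis=0)
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    S = Xc.T @ Xc + Yc.T @ Yc                         # S1 + S2
    eps_n = 1.0 / np.log(n)
    k = eps_n / (n * p * np.linalg.eigvalsh(S)[-1])   # k = eps_n / (n p lambda_1(S1+S2))
    Vinv = np.eye(p) / k
    inv1 = np.linalg.inv(Vinv + 2 * S)
    inv2 = np.linalg.inv(Vinv + S)
    A1 = (2 * n1 * n2 / n) * d @ inv1 @ d
    A2 = (n1 * n2 / n) * d @ inv2 @ d
    two_ln_pb = -p * np.log(2) + (m + 2 * n) * np.log1p(A1) - (m + n) * np.log1p(A2)
    B = 2 * (m + 2 * n) * inv1 - (m + n) * inv2
    BSn = B @ (S / (n - 2))
    tr_hat = np.trace(BSn)                            # estimates tr(B Sigma)
    var_hat = np.trace(BSn @ BSn) - tr_hat**2 / (n - 2)   # estimates tr((B Sigma)^2)
    T = (two_ln_pb + p * np.log(2) - tr_hat) / np.sqrt(2 * var_hat)
    return T, bool(T >= norm.ppf(1 - alpha))
```

For $p$ in the thousands, the two $p\times p$ inversions and the top eigenvalue dominate the cost; one would typically replace `np.linalg.inv` with Cholesky-based solves there.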
Previous results allow us to investigate the asymptotic power of the proposed test. By Theorem 1, the following conclusion is obtained.
Corollary 1.
Under conditions (12) and (13), the local alternative (14) and condition (15), the power of the posterior Bayes factor-based test satisfies
$$\beta_{PBF}(\delta)-E_{\Sigma}\,\Phi\!\left(\frac{\frac{n_1n_2}{n}\delta^{T}B\delta}{\sqrt{2\,\widehat{\operatorname{tr}(B\Sigma)^2}}}-z_{1-\alpha}\right)\to 0,$$
where $E_{\Sigma}$ denotes expectation with respect to the random matrix $S_n$.

3. Simulation

In this section, we conduct simulation studies in R to evaluate the performance of the posterior Bayes factor-based test in various scenarios. The significance level is set to $\alpha=0.05$ in all simulations; $p=1000$ and the sample sizes are $n_1=n_2=70$. The data $X_1,\dots,X_{n_1}$ and $Y_1,\dots,Y_{n_2}$ are generated from the multivariate normal distributions $N_p(\mu_1,\Sigma)$ and $N_p(\mu_2,\Sigma)$, respectively.
We consider the following choices for $\Sigma=(\sigma_{i,j})$.
  • $\Sigma_1=I_p$, the identity matrix.
  • $\Sigma_2$, a covariance matrix with $\sigma_{i,j}=0.4^{|i-j|}$.
  • $\Sigma_3$, a block diagonal matrix with diagonal blocks $B_{25\times 25}$ whose diagonal entries are 1 and whose off-diagonal entries are 0.15.
$\Sigma_1$ covers the independent case, while $\Sigma_2$ and $\Sigma_3$ cover dependent cases.
Theorem 1 shows that $T_{\mathrm{PB}}$, a linear function of the logarithm of the posterior Bayes factor, is asymptotically distributed as $N(0,1)$. Q–Q plots are presented in Figure 1 to reveal the asymptotic behavior of $T_{\mathrm{PB}}$ for $\mu_1=\mu_2=0_{p\times 1}$ and the different choices of $\Sigma$. The points in Figure 1a–c are closely aligned along the identity line, indicating that the distribution of $T_{\mathrm{PB}}$ is close to $N(0,1)$ for each $\Sigma$.
We also compare the empirical significance levels and powers of the proposed test with several other tests: the sum-of-squares-type tests of [4] (referred to as SD) and [7] (referred to as CQ), as well as a Bayes factor-based test relying on two random projection approaches as in [12] (referred to as $\mathrm{RMPBT}_1$ and $\mathrm{RMPBT}_2$). The test we propose is denoted by PB. The results for SD, CQ and RMPBT are cited from [12].
As in [12], we consider two possible alternatives, as follows. Without loss of generality, we always take $\mu_2=0_{p\times 1}$ in the simulations, so that $\delta=\mu_1-\mu_2=\mu_1$; the proportion of entries of $\delta$ that are exactly zero is denoted by $p_0$.
  • Simulate $\mu_1\sim N_p(1,I_p)$, set $p_0\cdot p$ randomly selected elements to 0, and scale $\mu_1$ so that $\delta^{T}\Sigma^{-1}\delta=2$.
  • Simulate $\mu_1\sim N_p(1,I_p)$, set $p_0\cdot p$ randomly selected elements to 0, and scale $\mu_1$ so that $\|\delta\|^2/\sqrt{\operatorname{tr}\Sigma^2}=0.1$.
We take p 0 = 0.5 , 0.75 , 0.80 , 0.95 , 0.975 , 1 . Note that the case p 0 = 1 corresponds to the null hypothesis and the power becomes the empirical level. A larger p 0 corresponds to a more sparse alternative, while a smaller p 0 corresponds to a denser one.
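The two alternative designs are easy to reproduce. The following Python sketch implements the first one (the helper name, the rounding of $p_0\cdot p$, and the handling of the $p_0=1$ edge case are our assumptions):

```python
import numpy as np

def make_delta(p, p0, Sigma_inv, target=2.0, rng=None):
    """First alternative design: draw mu1 ~ N_p(1, I_p), zero out a fraction p0
    of its entries at random, then rescale so that delta' Sigma^{-1} delta = target
    (delta = mu1 - mu2 with mu2 = 0). Requires p0 < 1."""
    rng = np.random.default_rng() if rng is None else rng
    mu1 = rng.normal(loc=1.0, size=p)
    zero_idx = rng.choice(p, size=int(round(p0 * p)), replace=False)
    mu1[zero_idx] = 0.0
    q = mu1 @ Sigma_inv @ mu1
    return mu1 * np.sqrt(target / q)
```

The second design is identical except that the rescaling enforces $\|\delta\|^2/\sqrt{\operatorname{tr}\Sigma^2}=0.1$; the case $p_0=1$ gives $\delta=0$, i.e., the null hypothesis, and needs no scaling.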
For the PB test, we take m = 2 p and ε n = 1 / ln ( n ) . The numerical results are calculated from 1000 replications and summarized in Table 1, Table 2, Table 3 and Table 4. Table 1 compares the empirical sizes of the tests. In general, the test PB performs best in maintaining the significance level. It can be seen that the estimated sizes of PB are reasonably close to the nominal level 0.05. Tests RMPBT and SD show lower empirical levels than the nominal one, whereas test CQ is a little higher.
Table 2, Table 3 and Table 4 compare the powers of the tests, with covariance matrices $\Sigma_1$, $\Sigma_2$ and $\Sigma_3$, respectively. Table 2 shows that our test PB substantially outperforms the other three tests for both dense and sparse alternatives, implying that our method provides the most powerful test among the approaches of [4,7,12] in the independent case. In Table 3, PB performs better than its competitors in most cases. In Table 4, PB also performs better than the competitors under dense alternatives. Finally, Table 3 and Table 4 show that the Bayes factor-based tests, whether prior or posterior, are better than the others; for dense alternatives, the PB test is more powerful than RMPBT.

4. An Application Example

To further explore the practical utility of the posterior Bayes factor-based test, we analyze a real dataset about the small round blue cell tumors (SRBCTs), which is available at https://file.biolab.si/biolab/supp/bi-cancer/projections/info/SRBCT.html, accessed on 1 March 2022.
The SRBCTs are four different childhood tumors: the Ewing family of tumors (EWS), neuroblastoma (NB), Burkitt lymphoma (BL, a non-Hodgkin lymphoma) and rhabdomyosarcoma (RMS). Our interest is in testing the equality of the mean gene expression levels of the EWS and RMS tumor groups. The dataset contains 29 examples of EWS and 25 examples of RMS, with 2308 genes. The observed test statistic is $T_{PB}=14.19842$, with a p-value that is numerically zero, indicating a serious deviation from the null hypothesis.

5. Conclusions

In this article, we explore the potential of the posterior Bayes factor to serve as a statistic for testing the mean equality of two high dimensional populations. A closed form of the posterior Bayes factor is obtained with simple priors for the model parameters. Asymptotic normality of the posterior Bayes factor is established, and the corresponding test is constructed. Numerical studies and a real-life example show the superiority of the test. Therefore, we recommend the posterior Bayes factor as a test statistic for hypothesis testing in high dimensional settings.

Author Contributions

Conceptualization, X.X.; methodology, X.X.; software, Y.J.; validation, Y.J. and X.X.; formal analysis, Y.J. and X.X.; writing—original draft preparation, Y.J.; writing—review and editing, Y.J.; visualization, Y.J.; supervision, X.X.; project administration, X.X.; funding acquisition, X.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under grant number 11471035.

Institutional Review Board Statement

The study did not require ethical approval.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the anonymous referee for the insightful suggestions and comments, which significantly improved the quality and exposition of the paper.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Proof

Proof of (16).
Write
$$\frac{(m+2n)A_1^2}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}=\frac{4(m+2n)\left[\frac{n_1n_2}{n}(\bar{X}-\bar{Y})^{T}\left(V^{-1}+2(S_1+S_2)\right)^{-1}(\bar{X}-\bar{Y})\right]^2}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}.$$
Substituting $kI_p$ for $V$ in $B$,
$$B=2(m+2n)\left(\tfrac{1}{k}I_p+2(S_1+S_2)\right)^{-1}-(m+n)\left(\tfrac{1}{k}I_p+S_1+S_2\right)^{-1}=k\left[2(m+2n)\left(I_p+2k(S_1+S_2)\right)^{-1}-(m+n)\left(I_p+k(S_1+S_2)\right)^{-1}\right]=k(m+3n)\cdot\frac{2(m+2n)\left(I_p+2k(S_1+S_2)\right)^{-1}-(m+n)\left(I_p+k(S_1+S_2)\right)^{-1}}{m+3n}.$$
Let
$$C=\frac{1}{m+3n}\left[2(m+2n)\left(I_p+2k(S_1+S_2)\right)^{-1}-(m+n)\left(I_p+k(S_1+S_2)\right)^{-1}\right];$$
then $B=k(m+3n)C$, and
$$\frac{(m+2n)A_1^2}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}=\frac{4(m+2n)k^2\left[\frac{n_1n_2}{n}(\bar{X}-\bar{Y})^{T}\left(I_p+2k(S_1+S_2)\right)^{-1}(\bar{X}-\bar{Y})\right]^2}{(m+3n)\,k\,\sqrt{2\operatorname{tr}(C\Sigma)^2}}\le\frac{4k(m+2n)\left[\frac{n_1n_2}{n}(\bar{X}-\bar{Y})^{T}(\bar{X}-\bar{Y})\right]^2}{(m+3n)\sqrt{2\operatorname{tr}(C\Sigma)^2}}.$$
Note that
$$\frac{\operatorname{tr}(C\Sigma)^2}{\operatorname{tr}\Sigma^2}=\frac{\operatorname{tr}\left[\Sigma+(C-I_p)\Sigma\right]^2}{\operatorname{tr}\Sigma^2}=\frac{\operatorname{tr}\Sigma^2+2\operatorname{tr}\left[(C-I_p)\Sigma^2\right]+\operatorname{tr}\left[(C-I_p)\Sigma\right]^2}{\operatorname{tr}\Sigma^2}=1+\frac{2\operatorname{tr}\left[(C-I_p)\Sigma^2\right]}{\operatorname{tr}\Sigma^2}+\frac{\operatorname{tr}\left[(C-I_p)\Sigma\right]^2}{\operatorname{tr}\Sigma^2}.$$
Since
$$C\ge\frac{1}{m+3n}\left[(m+2n)\left(I_p+k(S_1+S_2)\right)^{-1}-(m+n)\left(I_p+k(S_1+S_2)\right)^{-1}\right]=\frac{n}{m+3n}\left(I_p+k(S_1+S_2)\right)^{-1}>0,$$
and
$$C\le\frac{1}{m+3n}\left[2(m+2n)\left(I_p+k(S_1+S_2)\right)^{-1}-(m+n)\left(I_p+k(S_1+S_2)\right)^{-1}\right]=\left(I_p+k(S_1+S_2)\right)^{-1}\le I_p,$$
we have
$$I_p-C=\frac{1}{m+3n}\left\{2(m+2n)\left[I_p-\left(I_p+2k(S_1+S_2)\right)^{-1}\right]-(m+n)\left[I_p-\left(I_p+k(S_1+S_2)\right)^{-1}\right]\right\}=\frac{1}{m+3n}\left[2(m+2n)\cdot 2k(S_1+S_2)\left(I_p+2k(S_1+S_2)\right)^{-1}-(m+n)\,k(S_1+S_2)\left(I_p+k(S_1+S_2)\right)^{-1}\right]\le\frac{3m+7n_1+7n_2}{m+3n}\,k(S_1+S_2)\left(I_p+k(S_1+S_2)\right)^{-1}\le 3k(S_1+S_2)\left(I_p+k(S_1+S_2)\right)^{-1}.$$
Because $k=\varepsilon_n/\left[np\,\lambda_1(S_1+S_2)\right]$, we have $k(S_1+S_2)\le\frac{\varepsilon_n}{np}I_p$. Hence,
$$0\le I_p-C\le\frac{3\varepsilon_n}{np}I_p. \tag{A3}$$
It follows that
$$\frac{\left|\operatorname{tr}\left[(C-I_p)\Sigma^2\right]\right|}{\operatorname{tr}\Sigma^2}\le\frac{3\varepsilon_n}{np}\cdot\frac{\operatorname{tr}\Sigma^2}{\operatorname{tr}\Sigma^2}=\frac{3\varepsilon_n}{np}\to 0,$$
and
$$\frac{\operatorname{tr}\left[(C-I_p)\Sigma\right]^2}{\operatorname{tr}\Sigma^2}\le\frac{9\varepsilon_n^2}{n^2p^2}\to 0.$$
Consequently,
$$\frac{\operatorname{tr}(C\Sigma)^2}{\operatorname{tr}\Sigma^2}\xrightarrow{P}1. \tag{A4}$$
Therefore,
$$\frac{(m+2n)A_1^2}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\le\frac{4k(m+2n)\left[\frac{n_1n_2}{n}(\bar{X}-\bar{Y})^{T}(\bar{X}-\bar{Y})\right]^2O_p(1)}{(m+3n)\sqrt{2\operatorname{tr}\Sigma^2}}=\frac{4(m+2n)\varepsilon_n\left[\frac{n_1n_2}{n}(\bar{X}-\bar{Y})^{T}(\bar{X}-\bar{Y})\right]^2O_p(1)}{np(m+3n)(n-2)\lambda_1(S_n)\sqrt{2\operatorname{tr}\Sigma^2}}. \tag{A5}$$
$S_n$ can be written as
$$S_n=\frac{1}{n-2}\sum_{i=1}^{n-2}\Sigma^{1/2}z_iz_i^{T}\Sigma^{1/2},$$
where the $z_i$, $i\in\{1,\dots,n-2\}$, are independent $N_p(0_p,I_p)$ vectors. It follows that
$$\operatorname{Var}(\operatorname{tr}S_n)=\frac{1}{(n-2)^2}\sum_{i=1}^{n-2}\operatorname{Var}\left(z_i^{T}\Sigma z_i\right).$$
By elementary calculation,
$$E\left(z_i^{T}\Sigma z_i\right)^2=2\operatorname{tr}\Sigma^2+(\operatorname{tr}\Sigma)^2.$$
Since
$$E\left[z_i^{T}\Sigma z_i\right]=\operatorname{tr}\Sigma,$$
it follows that
$$\operatorname{Var}(\operatorname{tr}S_n)=\frac{2\operatorname{tr}\Sigma^2}{n-2}.$$
Hence,
$$\operatorname{tr}S_n=\operatorname{tr}\Sigma\left[1+O_p\left(\frac{\sqrt{2\operatorname{tr}\Sigma^2/(n-2)}}{\operatorname{tr}\Sigma}\right)\right]. \tag{A6}$$
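The variance formula above is easy to confirm by simulation; here is a small Python check (dimensions chosen only to keep the run fast):

```python
import numpy as np

rng = np.random.default_rng(42)
p, df, reps = 3, 20, 20000            # df plays the role of n - 2
Sigma = np.diag([1.0, 2.0, 0.5])
root = np.sqrt(Sigma)                 # Sigma^{1/2} for a diagonal Sigma

# tr S_n = (1/df) * sum_i z_i' Sigma z_i with z_i ~ N(0, I_p); note that
# z' Sigma z is the squared norm of Sigma^{1/2} z
Z = rng.normal(size=(reps, df, p)) @ root
traces = (Z ** 2).sum(axis=(1, 2)) / df
mc_var = traces.var()
exact = 2 * np.trace(Sigma @ Sigma) / df   # = 2 tr(Sigma^2) / (n - 2)
print(mc_var, exact)                  # agree to within Monte Carlo error
```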
Because $S_1+S_2$ has rank at most $n-2$, we have $\lambda_1(S_n)\ge\operatorname{tr}S_n/(n-2)$, and thus, by (A6),
$$\frac{1}{\lambda_1(S_n)}\le\frac{n-2}{\operatorname{tr}S_n}=\frac{n-2}{\operatorname{tr}\Sigma}\left[1+O_p\left(\frac{\sqrt{2\operatorname{tr}\Sigma^2/(n-2)}}{\operatorname{tr}\Sigma}\right)\right]^{-1}.$$
Because
$$\frac{\sqrt{2\operatorname{tr}\Sigma^2/(n-2)}}{\operatorname{tr}\Sigma}=o(1),$$
the bound (A5) becomes
$$\frac{(m+2n)A_1^2}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\le\frac{4(m+2n)\varepsilon_n\left[\frac{n_1n_2}{n}(\bar{X}-\bar{Y})^{T}(\bar{X}-\bar{Y})\right]^2O_p(1)}{np(m+3n)\sqrt{2\operatorname{tr}\Sigma^2}\,\operatorname{tr}\Sigma}\left[1+o_p(1)\right]. \tag{A7}$$
Under the null hypothesis,
$$\sqrt{\frac{n_1n_2}{n}}(\bar{X}-\bar{Y})\sim N_p(0_p,\Sigma),$$
and by elementary calculation,
$$E\left[\frac{n_1n_2}{n}(\bar{X}-\bar{Y})^{T}(\bar{X}-\bar{Y})\right]^2=2\operatorname{tr}\Sigma^2+(\operatorname{tr}\Sigma)^2.$$
Therefore,
$$\frac{(m+2n)A_1^2}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\le\frac{4(m+2n)\varepsilon_n\left[\frac{2\sqrt{\operatorname{tr}\Sigma^2}}{\operatorname{tr}\Sigma}+\frac{\operatorname{tr}\Sigma}{\sqrt{\operatorname{tr}\Sigma^2}}\right]O_p(1)}{\sqrt{2}\,np(m+3n)}\left[1+o_p(1)\right].$$
Because
$$1\le\frac{\operatorname{tr}\Sigma}{\sqrt{\operatorname{tr}\Sigma^2}}\le\sqrt{p}, \tag{A9}$$
we obtain
$$\frac{4(m+2n)\varepsilon_n\left[\frac{2\sqrt{\operatorname{tr}\Sigma^2}}{\operatorname{tr}\Sigma}+\frac{\operatorname{tr}\Sigma}{\sqrt{\operatorname{tr}\Sigma^2}}\right]O_p(1)}{\sqrt{2}\,np(m+3n)}\left[1+o_p(1)\right]\le\frac{4(m+2n)\varepsilon_n\left(2+\sqrt{p}\right)O_p(1)}{\sqrt{2}\,np(m+3n)}\left[1+o_p(1)\right]\xrightarrow{P}0,$$
since $\varepsilon_n\to 0$.
We can conclude that
$$\frac{(m+2n)A_1^2}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\xrightarrow{P}0.$$
Because $A_1^*\in(0,A_1)$, we have
$$\frac{(m+2n)A_1^2}{2(1+A_1^*)^2\sqrt{2\operatorname{tr}(B\Sigma)^2}}\xrightarrow{P}0.$$
Similarly, we can prove that
$$\frac{(m+n)A_2^2}{2(1+A_2^*)^2\sqrt{2\operatorname{tr}(B\Sigma)^2}}\xrightarrow{P}0.$$
Under the alternative hypothesis,
$$E\left[\frac{n_1n_2}{n}(\bar{X}-\bar{Y})^{T}(\bar{X}-\bar{Y})\right]^2=2\operatorname{tr}\Sigma^2+(\operatorname{tr}\Sigma)^2+\frac{4n_1n_2}{n}\delta^{T}\Sigma\delta+\left(\frac{n_1n_2}{n}\delta^{T}\delta\right)^2+\frac{2n_1n_2}{n}(\operatorname{tr}\Sigma)\,\delta^{T}\delta, \tag{A8}$$
and (A7) becomes
$$\frac{(m+2n)A_1^2}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\le\frac{4(m+2n)\varepsilon_n\left[2\operatorname{tr}\Sigma^2+(\operatorname{tr}\Sigma)^2+\frac{4n_1n_2}{n}\delta^{T}\Sigma\delta+\left(\frac{n_1n_2}{n}\delta^{T}\delta\right)^2+\frac{2n_1n_2}{n}(\operatorname{tr}\Sigma)\,\delta^{T}\delta\right]O_p(1)}{\sqrt{2}\,np(m+3n)\sqrt{\operatorname{tr}\Sigma^2}\,\operatorname{tr}\Sigma}\left[1+o_p(1)\right]. \tag{A10}$$
By (A9), the signal terms satisfy
$$\frac{4(m+2n)\varepsilon_n\left[\frac{4n_1n_2}{n}\frac{\delta^{T}\Sigma\delta}{\sqrt{\operatorname{tr}\Sigma^2}\,\operatorname{tr}\Sigma}+\frac{1}{\sqrt{\operatorname{tr}\Sigma^2}\,\operatorname{tr}\Sigma}\left(\frac{n_1n_2}{n}\delta^{T}\delta\right)^2+\frac{2n_1n_2}{n}\frac{\delta^{T}\delta}{\sqrt{\operatorname{tr}\Sigma^2}}\right]O_p(1)}{\sqrt{2}\,np(m+3n)}\left[1+o_p(1)\right]\xrightarrow{P}0.$$
Together with (A8), the bound (A10) becomes
$$\frac{(m+2n)A_1^2}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\le\frac{4(m+2n)\varepsilon_n\left[\frac{4n_1n_2}{n}\frac{\delta^{T}\Sigma\delta}{\operatorname{tr}\Sigma^2}+\left(\frac{n_1n_2}{n}\frac{\delta^{T}\delta}{\sqrt{\operatorname{tr}\Sigma^2}}\right)^2+\frac{2n_1n_2}{n}\frac{\delta^{T}\delta}{\sqrt{\operatorname{tr}\Sigma^2}}\right]O_p(1)}{\sqrt{2}\,np(m+3n)}\left[1+o_p(1)\right],$$
and, under assumption (14),
$$\frac{(m+2n)A_1^2}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\le\frac{4(m+2n)\varepsilon_n\left[\left(\frac{n_1n_2}{n}\frac{\delta^{T}\delta}{\sqrt{\operatorname{tr}\Sigma^2}}\right)^2+\frac{2n_1n_2}{n}\frac{\delta^{T}\delta}{\sqrt{\operatorname{tr}\Sigma^2}}\right]O_p(1)}{\sqrt{2}\,np(m+3n)}\left[1+o_p(1)\right].$$
Additionally, with assumption (15),
$$\frac{(m+2n)A_1^2}{2(1+A_1^*)^2\sqrt{2\operatorname{tr}(B\Sigma)^2}}\xrightarrow{P}0$$
holds as $\varepsilon_n\to 0$. Similarly,
$$\frac{(m+n)A_2^2}{2(1+A_2^*)^2\sqrt{2\operatorname{tr}(B\Sigma)^2}}\xrightarrow{P}0.$$
   □
Proof of Lemma 2.
By (A1) and (A4), it suffices to show that
$$\frac{\operatorname{tr}(CS_n)-\operatorname{tr}(C\Sigma)}{\sqrt{\operatorname{tr}\Sigma^2}}\xrightarrow{P}0.$$
Write
$$\frac{\operatorname{tr}(CS_n)-\operatorname{tr}(C\Sigma)}{\sqrt{\operatorname{tr}\Sigma^2}}=\frac{\operatorname{tr}(S_n-\Sigma)+\operatorname{tr}\left[(C-I_p)(S_n-\Sigma)\right]}{\sqrt{\operatorname{tr}\Sigma^2}}=\frac{1}{\sqrt{n-2}}O_p(1)+\frac{\operatorname{tr}\left[(C-I_p)(S_n-\Sigma)\right]}{\sqrt{\operatorname{tr}\Sigma^2}},$$
where the second equality follows from (A6). By the Cauchy–Schwarz inequality and (A3),
$$\frac{\left|\operatorname{tr}\left[(C-I_p)(S_n-\Sigma)\right]\right|}{\sqrt{\operatorname{tr}\Sigma^2}}\le\frac{\sqrt{\operatorname{tr}(C-I_p)^2}\,\sqrt{\operatorname{tr}(S_n-\Sigma)^2}}{\sqrt{\operatorname{tr}\Sigma^2}}\le\frac{3\sqrt{p}\,\varepsilon_n}{np}\sqrt{\frac{\operatorname{tr}(S_n-\Sigma)^2}{\operatorname{tr}\Sigma^2}}.$$
We know that
$$E\left[\operatorname{tr}(S_n-\Sigma)^2\right]=E\left[\operatorname{tr}S_n^2\right]-\operatorname{tr}\Sigma^2,$$
since $E\left[\operatorname{tr}(S_n\Sigma)\right]=\operatorname{tr}\Sigma^2$, where
$$E\left[\operatorname{tr}S_n^2\right]=\frac{1}{(n-2)^2}\sum_{i,j=1}^{n-2}E\left(z_i^{T}\Sigma z_j\right)^2=\frac{1}{(n-2)^2}\left\{(n-2)\left[2\operatorname{tr}\Sigma^2+(\operatorname{tr}\Sigma)^2\right]+\left[(n-2)^2-(n-2)\right]\operatorname{tr}\Sigma^2\right\}=\left(1+\frac{1}{n-2}\right)\operatorname{tr}\Sigma^2+\frac{(\operatorname{tr}\Sigma)^2}{n-2},$$
using $E\left(z_i^{T}\Sigma z_i\right)^2=2\operatorname{tr}\Sigma^2+(\operatorname{tr}\Sigma)^2$ and $E\left(z_i^{T}\Sigma z_j\right)^2=\operatorname{tr}\Sigma^2$ for $i\neq j$. Therefore,
$$E\left[\operatorname{tr}(S_n-\Sigma)^2\right]=\frac{\operatorname{tr}\Sigma^2+(\operatorname{tr}\Sigma)^2}{n-2},$$
and, since $(\operatorname{tr}\Sigma)^2\le p\operatorname{tr}\Sigma^2$ and $p/n\to c$,
$$\frac{\operatorname{tr}(S_n-\Sigma)^2}{\operatorname{tr}\Sigma^2}=\frac{1+(\operatorname{tr}\Sigma)^2/\operatorname{tr}\Sigma^2}{n-2}\,O_p(1)=O_p(1).$$
Hence,
$$0\le\frac{\left|\operatorname{tr}\left[(I_p-C)(S_n-\Sigma)\right]\right|}{\sqrt{\operatorname{tr}\Sigma^2}}\le\frac{3\sqrt{p}\,\varepsilon_n}{np}\,O_p(1),$$
and we conclude
$$\frac{\left|\operatorname{tr}\left[(I_p-C)(S_n-\Sigma)\right]\right|}{\sqrt{\operatorname{tr}\Sigma^2}}\xrightarrow{P}0.$$
The Lemma is proved.    □
Proof of Theorem 1.
From (18),
$$\frac{1}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\left[2\ln(\mathrm{PB})+p\ln 2-\widehat{\operatorname{tr}(B\Sigma)}\right]-\frac{\sum_{i=1}^{p}a_i(\xi_i^2-1)}{\sqrt{2\sum_{i=1}^{p}a_i^2}}\xrightarrow{P}0.$$
Under the null hypothesis, we know that
ξ N p ( 0 p , I p ) .
Hence,
E ξ i 2 = 1 a n d Var ( ξ i 2 ) = 2 .
We know that
max 1 i p a i = λ 1 ( A ) = λ 1 ( B Σ ) ,
and
i = 1 p a i 2 = tr A 2 = tr ( B Σ ) 2 .
As
$$
B\preceq\Bigl(m+2n_1+2n_2-\frac{m+n_1+n_2}{2}\Bigr)(V^{-1}+S_1+S_2)^{-1}\preceq\frac{m+3n_1+3n_2}{2}\,V
$$
and
$$
B\succeq(m+2n_1+2n_2)(2V^{-1}+2S_1+2S_2)^{-1}-\frac{m+n_1+n_2}{2}(V^{-1}+S_1+S_2)^{-1}
=\frac{n_1+n_2}{2}(V^{-1}+S_1+S_2)^{-1}
=\frac{n_1+n_2}{2}\Bigl(\frac{1}{k}I_p+S_1+S_2\Bigr)^{-1}
\succeq\frac{n_1+n_2}{2}\cdot\frac{1}{\lambda_1\bigl(\frac{1}{k}I_p+S_1+S_2\bigr)}I_p,
$$
we have
$$
\frac{\max_{1\le i\le p}|a_i|}{\sqrt{\sum_{i=1}^p a_i^2}}
\le\frac{\frac{m+3n_1+3n_2}{2}\,k\,\lambda_1(\Sigma)\,\lambda_1\bigl(\frac{1}{k}I_p+S_1+S_2\bigr)}{\frac{n_1+n_2}{2}\sqrt{\operatorname{tr}\Sigma^2}}
=\frac{\lambda_1(\Sigma)}{\sqrt{\operatorname{tr}\Sigma^2}}\cdot\frac{m+3n_1+3n_2}{n_1+n_2}\,\lambda_1(I_p+kS_1+kS_2).
$$
Because
$$
k=\frac{\varepsilon_n}{np\,\lambda_1(S_1+S_2)},\qquad\varepsilon_n\to0,
$$
we have
$$
\lambda_1(I_p+kS_1+kS_2)\le k\lambda_1(S_1+S_2)+\lambda_1(I_p)=\frac{\varepsilon_n}{np}+\lambda_1(I_p)=O(1).
$$
By conditions (12) and (13), we have
$$
\frac{\max_{1\le i\le p}|a_i|}{\sqrt{\sum_{i=1}^p a_i^2}}\xrightarrow{P}0.
$$
By Lemma 1,
$$
\frac{\sum_{i=1}^p a_i(\xi_i^2-1)}{\sqrt{2\sum_{i=1}^p a_i^2}}\xrightarrow{d}N(0,1)\quad\text{as }p\to\infty.
$$
Therefore,
$$
\frac{1}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\bigl[-2\ln(\mathrm{PB})+p\ln2-\widehat{\operatorname{tr}(B\Sigma)}\bigr]\xrightarrow{d}N(0,1).
$$
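The weighted chi-square CLT invoked here (Lemma 1) can be illustrated numerically. The weights below are an arbitrary illustrative choice with no single dominating term, not the eigenvalues arising in the test itself.

```r
## Illustration of Lemma 1: sum_i a_i (xi_i^2 - 1) / sqrt(2 sum_i a_i^2)
## is approximately N(0, 1) when no single weight dominates.
## The weights a_i below are an arbitrary illustrative choice.
set.seed(2)
p <- 2000
a <- 1 / sqrt(1:p)
stat <- replicate(5000, {
  xi <- rnorm(p)
  sum(a * (xi^2 - 1)) / sqrt(2 * sum(a^2))
})
cat("mean:", mean(stat), " variance:", var(stat), "\n")
```

The sample mean and variance of the standardized statistic should be close to 0 and 1, respectively.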
This proves part (1) of Theorem 1.
Under the alternative hypothesis, we have
$$
\sqrt{\frac{n_1n_2}{n}}\,\Sigma^{-\frac12}(\bar X-\bar Y)\sim N_p\Bigl(\sqrt{\frac{n_1n_2}{n}}\,\Sigma^{-\frac12}\delta,\ I_p\Bigr).
$$
Let
$$
\eta=\sqrt{\frac{n_1n_2}{n}}\,\Sigma^{-\frac12}(\bar X-\bar Y),\qquad
\eta_0=\eta-\sqrt{\frac{n_1n_2}{n}}\,\Sigma^{-\frac12}\delta.
$$
Then
$$
\sum_{i=1}^p a_i\xi_i^2
=\Bigl(\eta_0+\sqrt{\tfrac{n_1n_2}{n}}\Sigma^{-\frac12}\delta\Bigr)^T\Sigma^{\frac12}B\Sigma^{\frac12}\Bigl(\eta_0+\sqrt{\tfrac{n_1n_2}{n}}\Sigma^{-\frac12}\delta\Bigr)
=\eta_0^T\Sigma^{\frac12}B\Sigma^{\frac12}\eta_0+\frac{n_1n_2}{n}\delta^TB\delta+2\sqrt{\frac{n_1n_2}{n}}\,\delta^TB\Sigma^{\frac12}\eta_0.
$$
Since $\eta_0\sim N_p(\mathbf{0}_p,I_p)$,
$$
E\Bigl[\frac{\sqrt{\frac{n_1n_2}{n}}\,\delta^TB\Sigma^{\frac12}\eta_0}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\Bigm|S_n\Bigr]=0
\quad\text{and}\quad
\operatorname{Var}\Bigl[\frac{\sqrt{\frac{n_1n_2}{n}}\,\delta^TB\Sigma^{\frac12}\eta_0}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\Bigm|S_n\Bigr]
=\frac{\frac{n_1n_2}{n}\,\delta^TB\Sigma B\delta}{2\operatorname{tr}(B\Sigma)^2}.
$$
By (22), (A1) and (A4),
$$
\operatorname{Var}\Bigl[\frac{\sqrt{\frac{n_1n_2}{n}}\,\delta^TB\Sigma^{\frac12}\eta_0}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\Bigm|S_n\Bigr]-\frac{\frac{n_1n_2}{n}\,\delta^TC\Sigma C\delta}{2\operatorname{tr}\Sigma^2}\xrightarrow{P}0.
$$
Moreover,
$$
\frac{\frac{n_1n_2}{n}\delta^TC\Sigma C\delta}{\operatorname{tr}\Sigma^2}
=\frac{\frac{n_1n_2}{n}\delta^T(C-I_p+I_p)\Sigma(C-I_p+I_p)\delta}{\operatorname{tr}\Sigma^2}
\le\frac{2\frac{n_1n_2}{n}\delta^T\Sigma\delta}{\operatorname{tr}\Sigma^2}+\frac{2\frac{n_1n_2}{n}\delta^T(I_p-C)\Sigma(I_p-C)\delta}{\operatorname{tr}\Sigma^2}.
$$
With (A3), we have
$$
\frac{\frac{n_1n_2}{n}\delta^T(I_p-C)\Sigma(I_p-C)\delta}{\operatorname{tr}\Sigma^2}
\le\frac{n_1n_2}{n}\cdot\frac{9\varepsilon_n^2}{n^2p^2}\cdot\frac{\delta^T\Sigma\delta}{\operatorname{tr}\Sigma^2},
$$
so that
$$
0\le\frac{\frac{n_1n_2}{n}\delta^TC\Sigma C\delta}{\operatorname{tr}\Sigma^2}
\le\frac{2\frac{n_1n_2}{n}\delta^T\Sigma\delta}{\operatorname{tr}\Sigma^2}+2\frac{n_1n_2}{n}\cdot9\Bigl(\frac{\varepsilon_n}{np}\Bigr)^2\frac{\delta^T\Sigma\delta}{\operatorname{tr}\Sigma^2}.
$$
Because $n_1n_2\,\delta^T\Sigma\delta/(n\operatorname{tr}\Sigma^2)\to0$,
$$
\operatorname{Var}\Bigl[\frac{\sqrt{\frac{n_1n_2}{n}}\,\delta^TB\Sigma^{\frac12}\eta_0}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\Bigm|S_n\Bigr]\xrightarrow{P}0.
$$
Hence, we can conclude that
$$
\frac{\sqrt{\frac{n_1n_2}{n}}\,\delta^TB\Sigma^{\frac12}\eta_0}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\xrightarrow{P}0
$$
and
$$
\frac{1}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\Bigl[\sum_{i=1}^p a_i\xi_i^2-\frac{n_1n_2}{n}\delta^TB\delta-\eta_0^T\Sigma^{\frac12}B\Sigma^{\frac12}\eta_0\Bigr]\xrightarrow{P}0.
$$
With (17) and (18), we have
$$
\frac{1}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\Bigl[-2\ln(\mathrm{PB})+p\ln2-\frac{n_1n_2}{n}\delta^TB\delta-\eta_0^T\Sigma^{\frac12}B\Sigma^{\frac12}\eta_0\Bigr]\xrightarrow{P}0.
$$
In the proof of part (1), we showed that
$$
\frac{1}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\bigl[\psi^T\Sigma^{\frac12}B\Sigma^{\frac12}\psi-\widehat{\operatorname{tr}(B\Sigma)}\bigr]\xrightarrow{d}N(0,1),
$$
where $\psi$ is a random vector distributed according to $N_p(\mathbf{0}_p,I_p)$. Hence, we have
$$
\frac{1}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\bigl[\eta_0^T\Sigma^{\frac12}B\Sigma^{\frac12}\eta_0-\widehat{\operatorname{tr}(B\Sigma)}\bigr]\xrightarrow{d}N(0,1).
$$
Therefore,
$$
\frac{1}{\sqrt{2\operatorname{tr}(B\Sigma)^2}}\Bigl[-2\ln(\mathrm{PB})+p\ln2-\widehat{\operatorname{tr}(B\Sigma)}-\frac{n_1n_2}{n}\delta^TB\delta\Bigr]\xrightarrow{d}N(0,1).
$$
This proves part (2) of Theorem 1.    □
Proof of Theorem 2.
Denote the spectral decomposition of $S_n$ by $PDP^T$, where $D=\operatorname{diag}[d_1,d_2,\ldots,d_p]$. We can rewrite $C$ as
$$
C=\frac{1}{m+3n}\Bigl[2(m+2n)\bigl(I_p+2k(n-2)PDP^T\bigr)^{-1}-(m+n)\bigl(I_p+k(n-2)PDP^T\bigr)^{-1}\Bigr]\triangleq PHP^T,
$$
where $H=\operatorname{diag}[h_1,h_2,\ldots,h_p]$ and
$$
h_i=\frac{1}{m+3n}\Bigl[\frac{2(m+2n)}{1+2k(n-2)d_i}-\frac{m+n}{1+k(n-2)d_i}\Bigr],\qquad i\in\{1,\ldots,p\}.
$$
Then
$$
1-h_i=\frac{k(n-2)d_i}{m+3n}\Bigl[\frac{4(m+2n)}{1+2k(n-2)d_i}-\frac{m+n}{1+k(n-2)d_i}\Bigr]\le\frac{3k(n-2)d_i}{1+k(n-2)d_i}.
$$
Substituting the expression for $k$ into this inequality gives
$$
1-h_i\le\frac{3(n-2)\varepsilon_n}{np}\xrightarrow{P}0.
$$
Next,
$$
\operatorname{tr}(CS_n)^2-\frac{1}{n-2}\bigl(\operatorname{tr}(CS_n)\bigr)^2
=\sum_{i=1}^p(d_ih_i)^2-\frac{1}{n-2}\Bigl(\sum_{i=1}^p d_ih_i\Bigr)^2
$$
$$
=\sum_{i=1}^p\bigl[d_i\bigl(1-(1-h_i)\bigr)\bigr]^2-\frac{1}{n-2}\Bigl[\sum_{i=1}^p d_i\bigl(1-(1-h_i)\bigr)\Bigr]^2
$$
$$
=\sum_{i=1}^p d_i^2-2\sum_{i=1}^p(1-h_i)d_i^2+\sum_{i=1}^p(1-h_i)^2d_i^2-\frac{1}{n-2}\Bigl[\sum_{i=1}^p d_i-\sum_{i=1}^p(1-h_i)d_i\Bigr]^2
$$
$$
=\Bigl[\sum_{i=1}^p d_i^2-\frac{1}{n-2}\Bigl(\sum_{i=1}^p d_i\Bigr)^2\Bigr]-2\sum_{i=1}^p(1-h_i)d_i^2+\frac{2}{n-2}\sum_{i=1}^p d_i\sum_{i=1}^p(1-h_i)d_i+\sum_{i=1}^p(1-h_i)^2d_i^2-\frac{1}{n-2}\Bigl[\sum_{i=1}^p(1-h_i)d_i\Bigr]^2.
$$
By [1], we know that
$$
\operatorname{tr}S_n^2=\frac{n}{n-2}\operatorname{tr}\Sigma^2+\frac{n}{(n-2)^2}(\operatorname{tr}\Sigma)^2+o_p(\operatorname{tr}\Sigma^2)
$$
and
$$
\frac{1}{n-2}(\operatorname{tr}S_n)^2=\frac{n}{(n-2)^2}(\operatorname{tr}\Sigma)^2+o_p(\operatorname{tr}\Sigma^2).
$$
Noting that $p/n$ converges to a limit in $(0,\infty)$, by
$$
\operatorname{tr}\Sigma\le\sqrt{\operatorname{tr}I_p^2\,\operatorname{tr}\Sigma^2}=\sqrt{p\operatorname{tr}\Sigma^2},
$$
we have
$$
\frac{\operatorname{tr}\Sigma}{\sqrt{n}}\le\sqrt{\frac{p}{n}}\sqrt{\operatorname{tr}\Sigma^2}=O\bigl(\sqrt{\operatorname{tr}\Sigma^2}\bigr).
$$
Hence
$$
\frac{\operatorname{tr}S_n^2}{\operatorname{tr}\Sigma^2}=O_p(1)
$$
and
$$
\frac{(\operatorname{tr}S_n)^2}{(n-2)\operatorname{tr}\Sigma^2}=O_p(1).
$$
With (A11),
$$
0\le\sum_{i=1}^p(1-h_i)d_i^2\le\frac{3(n-2)\varepsilon_n}{np}\operatorname{tr}S_n^2=\frac{3(n-2)\varepsilon_n}{np}O_p(\operatorname{tr}\Sigma^2).
$$
Similarly, we have
$$
0\le\frac{2}{n-2}\sum_{i=1}^p d_i\sum_{i=1}^p(1-h_i)d_i\le\frac{6\varepsilon_n}{np}(\operatorname{tr}S_n)^2,
$$
therefore
$$
\frac{2\sum_{i=1}^p d_i\sum_{i=1}^p(1-h_i)d_i}{(n-2)\operatorname{tr}\Sigma^2}\le\frac{(n-2)\varepsilon_n}{np}O_p(1).
$$
As $\varepsilon_n\to0$,
$$
\frac{\sum_{i=1}^p(1-h_i)d_i^2}{\operatorname{tr}\Sigma^2}\xrightarrow{P}0
$$
and
$$
\frac{2\sum_{i=1}^p d_i\sum_{i=1}^p(1-h_i)d_i}{(n-2)\operatorname{tr}\Sigma^2}\xrightarrow{P}0.
$$
Therefore, with (A11),
$$
\operatorname{tr}(CS_n)^2-\frac{1}{n-2}\bigl(\operatorname{tr}(CS_n)\bigr)^2
=(1+o_p(1))\Bigl[\sum_{i=1}^p d_i^2-\frac{1}{n-2}\Bigl(\sum_{i=1}^p d_i\Bigr)^2\Bigr]
=(1+o_p(1))\Bigl[\operatorname{tr}S_n^2-\frac{1}{n-2}(\operatorname{tr}S_n)^2\Bigr].
$$
Then,
$$
\frac{\operatorname{tr}(CS_n)^2-\frac{1}{n-2}\bigl(\operatorname{tr}(CS_n)\bigr)^2}{\operatorname{tr}S_n^2-\frac{1}{n-2}(\operatorname{tr}S_n)^2}\xrightarrow{P}1.
$$
The authors of [1] have proved that
$$
\frac{\operatorname{tr}S_n^2-\frac{1}{n-2}(\operatorname{tr}S_n)^2}{\operatorname{tr}\Sigma^2}\xrightarrow{P}1.
$$
Thus, by the above together with (A4) and (A12), we have
$$
\frac{\operatorname{tr}(CS_n)^2-\frac{1}{n-2}\bigl(\operatorname{tr}(CS_n)\bigr)^2}{\operatorname{tr}(C\Sigma)^2}\xrightarrow{P}1,
$$
which implies that (22) holds.    □
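The consistency of the trace estimator from [1] used above can also be checked by a small simulation; the identity covariance and toy sizes below are illustrative choices only.

```r
## Check that (tr S_n^2 - (tr S_n)^2 / nu) / tr Sigma^2 is close to 1
## for a Wishart-type sample covariance. Toy sizes: illustrative only.
set.seed(3)
p <- 100
nu <- 100
w <- matrix(rnorm(nu * p), nu, p)  # rows ~ N(0, I_p), so tr Sigma^2 = p
S <- crossprod(w) / nu             # sample covariance with nu degrees of freedom
ratio <- (sum(S^2) - sum(diag(S))^2 / nu) / p
cat("ratio:", ratio, "\n")
```

Here `sum(S^2)` equals $\operatorname{tr}S^2$ because $S$ is symmetric; the printed ratio should be close to 1 up to an $O(1/n)$ bias and Monte Carlo noise.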
Proof of Corollary 1.
The power of the proposed test is
$$
\beta(\delta)=\Pr(T_{\mathrm{PB}}\ge z_{1-\alpha})
=\Pr\Bigl(\frac{1}{\sqrt{2\widehat{\operatorname{tr}(B\Sigma)^2}}}\bigl[-2\ln(\mathrm{PB})+p\ln2-\widehat{\operatorname{tr}(B\Sigma)}\bigr]\ge z_{1-\alpha}\Bigr)
$$
$$
=E_\Sigma\Pr\Bigl(\frac{1}{\sqrt{2\widehat{\operatorname{tr}(B\Sigma)^2}}}\Bigl[-2\ln(\mathrm{PB})+p\ln2-\widehat{\operatorname{tr}(B\Sigma)}-\frac{n_1n_2}{n}\delta^TB\delta\Bigr]\ge z_{1-\alpha}-\frac{\frac{n_1n_2}{n}\delta^TB\delta}{\sqrt{2\widehat{\operatorname{tr}(B\Sigma)^2}}}\Bigm|S_n\Bigr).
$$
By (20) and (22), we have
$$
\beta(\delta)-E_\Sigma\Phi\Bigl(\frac{\frac{n_1n_2}{n}\delta^TB\delta}{\sqrt{2\widehat{\operatorname{tr}(B\Sigma)^2}}}-z_{1-\alpha}\Bigr)\xrightarrow{P}0.
$$
   □

Appendix B. R code

rm(list = ls(all = TRUE))
library(MASS)    # mvrnorm
library(Matrix)  # bdiag
library(lava)    # tr (run install.packages("lava") if needed)

## Simulation settings
n1 <- 70
n2 <- 70
n <- n1 + n2
p <- 1000   # dimension
M <- 1000   # Monte Carlo replications
m <- 2 * p  # prior degrees of freedom
mu1 <- rep(0, p)

## Sigma 1: identity
Sigma1 <- diag(1, p)
## Sigma 2: AR(1) structure with rho = 0.4 (uncomment to use)
# ro <- 0.4
# Sigma2 <- ro^abs(outer(1:p, 1:p, "-"))
## Sigma 3: block diagonal with 25 x 25 compound-symmetry blocks (uncomment to use)
# Sigma3_1 <- diag(0.85, 25) + matrix(0.15, 25, 25)
# Sigma3 <- as.matrix(bdiag(replicate(p / 25, Sigma3_1, simplify = FALSE)))
Sigma <- Sigma1

## A proportion delta of the components of mu2 are set to zero
delta <- 0.975
t1 <- proc.time()
p0 <- delta * p
mu20 <- mvrnorm(1, rep(1, p), diag(rep(1, p)))
mu20[sort(sample(1:p, p0))] <- 0
## Alternative 1: scale mu2 so that t(mu2) %*% solve(Sigma) %*% mu2 = 2
scal <- sqrt((t(mu20) %*% solve(Sigma) %*% mu20) / 2)
## Alternative 2 (uncomment to use)
# scal <- sqrt(t(mu20) %*% mu20 / sqrt(tr(t(Sigma) %*% Sigma)) / 0.1)
mu2 <- mu20 / rep(scal, p)

rej <- 0            # rejection counter
T_BF <- rep(0, M)   # standardized test statistics
for (q in 1:M) {
  xi <- mvrnorm(n1, mu1, Sigma)
  yi <- mvrnorm(n2, mu2, Sigma)
  x_mean <- colMeans(xi)
  y_mean <- colMeans(yi)
  ## Pooled sums of squares A = S1 + S2
  A <- crossprod(sweep(xi, 2, x_mean)) + crossprod(sweep(yi, 2, y_mean))
  ## Prior scale V = k * I_p
  k <- 1 / log10(n) / eigen(A)$values[1] / p
  V <- k * diag(rep(1, p))
  B <- (m + 2 * n) * solve(solve(V) + 2 * A) - (m + n) / 2 * solve(solve(V) + A)
  ## Quadratic-form statistic and its standardization
  Tstat <- n1 * n2 / n * (t(x_mean - y_mean) %*% B %*% (x_mean - y_mean))
  S_n <- A / (n - 2)
  mu_T <- tr(B %*% S_n)
  sigma_T <- tr((B %*% S_n) %*% (B %*% S_n)) - 1 / (n - 2) * (tr(B %*% S_n))^2
  T_BF[q] <- (Tstat - mu_T) / sqrt(2 * sigma_T)
  if (T_BF[q] >= qnorm(0.95)) rej <- rej + 1
}
t2 <- proc.time()
t <- t2 - t1
cat("power =", rej / M, "time", t[3][[1]], "s", "\n")

References

  1. Bai, Z.; Saranadasa, H. Effect of high dimension: By an example of a two sample problem. Stat. Sin. 1996, 6, 311–329.
  2. Hotelling, H. The Generalization of Student's Ratio. Ann. Math. Stat. 1931, 2, 360–378.
  3. Anderson, T.W. An Introduction to Multivariate Statistical Analysis; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 1958.
  4. Srivastava, M.S.; Du, M. A test for the mean vector with fewer observations than the dimension. J. Multivar. Anal. 2008, 99, 386–402.
  5. Srivastava, M.S. A test for the mean vector with fewer observations than the dimension under non-normality. J. Multivar. Anal. 2009, 100, 518–532.
  6. Srivastava, M.S.; Katayama, S.; Kano, Y. A two sample test in high dimensional data. J. Multivar. Anal. 2013, 114, 349–358.
  7. Chen, S.X.; Qin, Y.L. A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Stat. 2010, 38, 808–835.
  8. Feng, L.; Zou, C.; Wang, Z.; Zhu, L. Two-sample Behrens-Fisher problem for high-dimensional data. Stat. Sin. 2015, 25, 1297–1312.
  9. Cai, T.T.; Liu, W.; Xia, Y. Two-sample test of high dimensional means under dependence. J. R. Stat. Soc. Ser. B Stat. Methodol. 2014, 76, 349–372.
  10. Lopes, M.; Jacob, L.; Wainwright, M.J. A more powerful two-sample test in high dimensions using random projection. Adv. Neural Inf. Process. Syst. 2011, 24, 1206–1214.
  11. Srivastava, R.; Li, P.; Ruppert, D. RAPTT: An exact two-sample test in high dimensions using random projections. J. Comput. Graph. Stat. 2016, 25, 954–970.
  12. Zoh, R.S.; Sarkar, A.; Carroll, R.J.; Mallick, B.K. A powerful Bayesian test for equality of means in high dimensions. J. Am. Stat. Assoc. 2018, 113, 1733–1741.
  13. Wang, R.; Xu, X. On two-sample mean tests under spiked covariances. J. Multivar. Anal. 2018, 167, 225–249.
  14. Kuelbs, J.; Vidyashankar, A.N. Asymptotic inference for high-dimensional data. Ann. Stat. 2010, 38, 836–869.
  15. Thulin, M. A high-dimensional two-sample test for the mean using random subspaces. Comput. Stat. Data Anal. 2014, 74, 26–38.
  16. Gregory, K.B.; Carroll, R.J.; Baladandayuthapani, V.; Lahiri, S.N. A two-sample test for equality of means in high dimension. J. Am. Stat. Assoc. 2015, 110, 837–849.
  17. Yu, W.; Xu, W.; Zhu, L. A combined p-value test for the mean difference of high-dimensional data. Sci. China Math. 2018, 62, 961.
  18. Chen, S.X.; Li, J.; Zhong, P.S. Two-sample and ANOVA tests for high dimensional means. Ann. Stat. 2019, 47, 1443–1474.
  19. Zhang, J.T.; Guo, J.; Zhou, B.; Cheng, M.Y. A simple two-sample test in high dimensions based on L2-norm. J. Am. Stat. Assoc. 2020, 115, 1011–1027.
  20. Zhu, Y.; Bradic, J. Significance testing in non-sparse high-dimensional linear models. Electron. J. Stat. 2018, 12, 3312–3364.
  21. Aitkin, M. Posterior Bayes factors. J. R. Stat. Soc. Ser. B Methodol. 1991, 53, 111–128.
  22. Wang, R.; Xu, X. Least favorable direction test for multivariate analysis of variance in high dimension. Stat. Sin. 2021, 31, 723–747.
Figure 1. Quantile–Quantile plot of the asymptotic distribution of T_PBF under the null hypothesis μ1 = μ2 against N(0, 1) for different Σ, based on 1000 independently generated T_PBF with n1 = n2 = 70, p = 1000.
Table 1. Empirical sizes based on 1000 replications with α = 0.05, n1 = n2 = 70 and p = 1000. RMPBT is the approach of [12], SD is the approach of [4] and CQ is the approach of [7].

        PB      RMPBT1  RMPBT2  SD      CQ
Σ1      0.049   0.031   0.030   0.040   0.063
Σ2      0.052   0.038   0.035   0.037   0.049
Σ3      0.060   0.060   0.040   0.045   0.063
Table 2. Power analysis of 4 tests assuming the true covariance matrix is Σ = Σ1; n1 = n2 = 70 and p = 1000. RMPBT is the approach of [12], SD is the approach of [4] and CQ is the approach of [7].

                 p0      PB      RMPBT1  RMPBT2  SD      CQ
Alternative 1    0.975   0.470   0.332   0.309   0.384   0.450
                 0.950   0.478   0.388   0.339   0.423   0.474
                 0.800   0.482   0.337   0.304   0.389   0.448
                 0.750   0.482   0.348   0.294   0.401   0.470
                 0.500   0.485   0.372   0.343   0.422   0.473
Alternative 2    0.975   0.764   0.685   0.612   0.722   0.761
                 0.950   0.797   0.694   0.612   0.741   0.775
                 0.800   0.785   0.660   0.581   0.717   0.762
                 0.750   0.806   0.695   0.616   0.756   0.789
                 0.500   0.786   0.677   0.588   0.727   0.767
Table 3. Power analysis of 4 tests assuming the true covariance matrix is Σ = Σ2; n1 = n2 = 70 and p = 1000. RMPBT is the approach of [12], SD is the approach of [4] and CQ is the approach of [7].

                 p0      PB      RMPBT1  RMPBT2  SD      CQ
Alternative 1    0.975   0.269   0.259   0.243   0.219   0.266
                 0.950   0.277   0.249   0.232   0.209   0.258
                 0.800   0.282   0.261   0.222   0.221   0.270
                 0.750   0.299   0.264   0.236   0.242   0.284
                 0.500   0.336   0.303   0.265   0.268   0.326
Alternative 2    0.975   0.783   0.791   0.738   0.722   0.768
                 0.950   0.780   0.786   0.734   0.718   0.766
                 0.800   0.794   0.755   0.699   0.700   0.756
                 0.750   0.792   0.772   0.722   0.730   0.785
                 0.500   0.789   0.753   0.686   0.720   0.766
Table 4. Power analysis of 4 tests assuming the true covariance matrix is Σ = Σ3; n1 = n2 = 70 and p = 1000. RMPBT is the approach of [12], SD is the approach of [4] and CQ is the approach of [7].

                 p0      PB      RMPBT1  RMPBT2  SD      CQ
Alternative 1    0.975   0.296   0.315   0.278   0.245   0.294
                 0.950   0.303   0.335   0.307   0.270   0.311
                 0.800   0.332   0.348   0.318   0.285   0.343
                 0.750   0.357   0.327   0.294   0.278   0.331
                 0.500   0.422   0.414   0.379   0.353   0.401
Alternative 2    0.975   0.785   0.836   0.776   0.716   0.755
                 0.950   0.801   0.827   0.776   0.730   0.782
                 0.800   0.795   0.796   0.734   0.728   0.775
                 0.750   0.793   0.790   0.727   0.718   0.764
                 0.500   0.778   0.774   0.717   0.720   0.761
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Jiang, Y.; Xu, X. A Two-Sample Test of High Dimensional Means Based on Posterior Bayes Factor. Mathematics 2022, 10, 1741. https://0-doi-org.brum.beds.ac.uk/10.3390/math10101741

AMA Style

Jiang Y, Xu X. A Two-Sample Test of High Dimensional Means Based on Posterior Bayes Factor. Mathematics. 2022; 10(10):1741. https://0-doi-org.brum.beds.ac.uk/10.3390/math10101741

Chicago/Turabian Style

Jiang, Yuanyuan, and Xingzhong Xu. 2022. "A Two-Sample Test of High Dimensional Means Based on Posterior Bayes Factor" Mathematics 10, no. 10: 1741. https://0-doi-org.brum.beds.ac.uk/10.3390/math10101741

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop