Article

Reproducibility Probability Estimation and RP-Testing for Some Nonparametric Tests

Department of Statistics and Quantitative Methods, University of Milano-Bicocca, via Bicocca degli Arcimboldi, 8, Milano 20126, Italy
* Author to whom correspondence should be addressed.
Submission received: 13 January 2016 / Revised: 10 March 2016 / Accepted: 6 April 2016 / Published: 16 April 2016
(This article belongs to the Special Issue Statistical Significance and the Logic of Hypothesis Testing)

Abstract

Several reproducibility probability (RP) estimators for the binomial, sign, Wilcoxon signed rank and Kendall tests are studied. Their behavior in terms of mean squared error (MSE) is investigated, as well as their performance for RP-testing. Two classes of estimators are considered: the semi-parametric class, where RP-estimators are derived from the expression of the exact or approximated power function, and the nonparametric class, whose RP-estimators are obtained on the basis of the nonparametric plug-in principle. To evaluate the precision of the RP-estimators for each test, the MSE is computed, and the best overall estimator turns out to belong to the semi-parametric class. Then, to evaluate the RP-testing performance provided by the RP-estimators for each test, the disagreement between the RP-testing decision rule, i.e., “accept $H_0$ if the RP-estimate is lower than, or equal to, 1/2, and reject $H_0$ otherwise”, and the classical one (based on the critical value or on the p-value) is computed. It is shown that the RP-based testing decision for some semi-parametric RP-estimators exactly replicates the classical one. In many situations, the RP-estimator replicating the classical decision rule also provides the best MSE.

1. Introduction

Statistical tests are applied in almost all fields of science to evaluate experimental results. The reproducibility probability (RP) is the true power of a statistical test, and its estimation provides useful information for evaluating the stability of statistical test results. Indeed, when the Neyman–Pearson approach is adopted, that is, when the Type I error probability is fixed before starting the experiment, the outcome of the statistical test is a Bernoulli random variable (viz. significant/non-significant) whose parameter is the RP. Therefore, the RP estimate is the natural perspective for evaluating the stability of test results: the higher the estimated RP, the more stable the observed result is estimated to be; see [1]. RP estimation has been applied, for example, in the context of clinical trials [2,3,4,5,6]. Moreover, RP-testing, that is, the use of the RP estimate to evaluate the significance of statistical test results, can substitute for p-value testing [7,8]. In detail, the RP-testing decision rule, which is very intuitive, states: “accept $H_0$ if the RP-estimate is lower than, or equal to, 1/2, and reject $H_0$ otherwise”. We argue that the RP-testing rule can be adopted to bypass the many well-known criticisms raised against the p-value [9,10,11,12,13]. In the context of nonparametric tests, RP estimation has not yet been widely studied: the only works in this field concern RP estimation and testing for the Wilcoxon rank sum test [14,15].
In this paper, some RP estimators for the most commonly used nonparametric tests are introduced and studied. Specifically, the sign test, the binomial test, the Kendall test and the Wilcoxon signed rank test are considered. Both nonparametric and semi-parametric RP estimators are presented for each test. The focus is on two aspects: (1) the behavior of the different estimators for a given test and their comparison, for example in terms of MSE; (2) the validity, exact or approximate, of the RP-testing rule based on the RP estimators presented here. For the first task, we resort to simulation studies, whereas appropriate theoretical results are developed for the second.
The theoretical framework of nonparametric RP estimation and testing is introduced in Section 2, where the problems that can be encountered are explained in depth; then, the class of semi-parametric estimators and that of nonparametric plug-in estimators are introduced, and some theoretical results on RP-testing are provided. In Section 3, the sign test and the binomial test are considered: semi-parametric RP estimation and testing for the binomial test are studied first; then, nonparametric estimation techniques are studied for the same aim; finally, the sign test is considered, showing that the results obtained for the binomial test also hold for the sign test. RP estimation and testing for the Wilcoxon signed rank test are studied in Section 4, where semi-parametric and nonparametric plug-in estimators are considered separately, and the behavior of the different estimators is then compared through simulation. The last test considered (Section 5) is the Kendall test of monotonic association. As in the previous sections, semi-parametric and nonparametric estimators are studied separately, and then a simulation is run to compare the behavior of the different estimators in terms of MSE and RP-testing performance. An application example is shown in Section 6, and the conclusions are reported in Section 7.

2. RP-Estimation and Testing in the Nonparametric Framework

2.1. The General Nonparametric Framework

Let ${}^tF$ be the true cumulative distribution function of a study variable $X$. This distribution function is unknown and belongs to the class of distributions $\mathcal{F}$. Assume that, starting from a random sample $\mathbf{X}_n = (X_1, \ldots, X_n)$ drawn from ${}^tF$, it is of interest to solve the testing problem:
$$H_0: {}^tF \in \mathcal{F}_0 \quad \text{vs} \quad H_1: {}^tF \in \mathcal{F} \setminus \mathcal{F}_0, \tag{1}$$
where $\mathcal{F}_0 \subset \mathcal{F}$. Let $T_n = T(\mathbf{X}_n)$ be the test statistic used to solve (1). There are two typical cases that can be encountered when considering nonparametric tests:
(A) the exact and asymptotic distributions of $T_n$ are known both under $H_0$ and under $H_1$;
(B) the exact and asymptotic distributions of $T_n$ are known under $H_0$, whereas under $H_1$ only the asymptotic distribution can be derived.
Case (A) is rather an exception; the binomial and sign tests are examples of tests falling under it. Case (B) is the common situation: for almost all distribution-free tests, the exact null distribution of $T_n$ can be derived by using permutations, combinatorics and ad hoc algorithms (see, e.g., [16]), whereas the non-null distribution can be derived only by resorting to large-sample approximations. Examples include the Wilcoxon signed rank test, the Wilcoxon rank sum test and the Kendall test.
Under both Cases (A) and (B), knowledge of the exact null distribution of $T_n$ allows the definition of the exact test:
$$\Psi_\alpha(\mathbf{X}_n) = \begin{cases} 1 & \text{if } T_n \in R_{n,\alpha} \\ 0 & \text{if } T_n \notin R_{n,\alpha} \end{cases} \tag{2}$$
where, as usual, $\alpha$ denotes the Type I error probability and $R_{n,\alpha}$ is a level-$\alpha$ critical region corresponding to the sample size $n$. For example, if the testing problem (1) is one-sided, the critical region takes, without loss of generality, the form $R_{n,\alpha} = (t_{n,1-\alpha}, \infty)$, where $t_{n,1-\alpha}$ is the $(1-\alpha)$-quantile of the null distribution $G_0$ of $T_n$. Note that, if $T_n$ is a discrete random variable, the test based on $R_{n,\alpha}$ is exact but conservative, i.e., its Type I error probability can be lower than $\alpha$, since $t_{n,1-\alpha} = \inf\{t : G_0(t) \ge 1-\alpha\}$. In practice, if the sample size $n$ is sufficiently large, an asymptotic test is usually preferred, to avoid the computational effort needed to compute the exact distribution of $T_n$. In particular, the following test is used:
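$$\tilde\Psi_\alpha(\mathbf{X}_n) = \begin{cases} 1 & \text{if } T_n \in \tilde R_{n,\alpha} \\ 0 & \text{if } T_n \notin \tilde R_{n,\alpha} \end{cases} \tag{3}$$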
where $\tilde R_{n,\alpha}$ is the level-$\alpha$ asymptotic critical region, which, in the one-sided example mentioned above, takes the form $\tilde R_{n,\alpha} = (\tilde t_{n,1-\alpha}, \infty)$, where $\tilde t_{n,1-\alpha}$ denotes the $(1-\alpha)$-quantile of the large-sample null distribution of $T_n$. The tests $\Psi_\alpha$ and $\tilde\Psi_\alpha$ become closer as the sample size $n$ increases, and they are asymptotically equivalent. However, whatever the sample size, there is a certain probability of disagreement between (2) and (3). To define this probability, consider the sets $A_1 = \{\mathbf{x}_n : \Psi_\alpha(\mathbf{x}_n) = 0 \text{ and } \tilde\Psi_\alpha(\mathbf{x}_n) = 1\}$, $A_2 = \{\mathbf{x}_n : \Psi_\alpha(\mathbf{x}_n) = 1 \text{ and } \tilde\Psi_\alpha(\mathbf{x}_n) = 0\}$ and $A = A_1 \cup A_2$. Set $A_2$ collects the realizations $\mathbf{x}_n$ of $\mathbf{X}_n$ for which the null hypothesis is accepted by the asymptotic test and rejected by the exact one. Conversely, set $A_1$ collects the realizations $\mathbf{x}_n$ of $\mathbf{X}_n$ for which the null hypothesis is accepted by the exact test and rejected by the asymptotic one. Since $A_1$ and $A_2$ are disjoint, the probability of disagreement between (2) and (3) is:
$$D(\alpha, n, F) = P_F(A) = P_F(A_2) + P_F(A_1). \tag{4}$$
The differences between Cases (A) and (B) do not affect the definition of the statistical test used to solve (1), but they determine the way in which the power of the test and, therefore, the RP can be evaluated: under Case (A), the power of the test can be computed exactly; under Case (B), it can be evaluated only approximately. In detail, under Case (A), the exact power of $\Psi_\alpha$ corresponding to the distribution $F \in \mathcal{F}$ can be computed as $\pi(n, \alpha, F) = P_F(T_n \in R_{n,\alpha}) = E_F[\Psi_\alpha(\mathbf{X}_n)]$. Consequently, the exact RP of the test (i.e., the exact “true power” of the test) is $RP = \pi(n, \alpha, {}^tF) = P_{{}^tF}(T_n \in R_{n,\alpha}) = E_{{}^tF}[\Psi_\alpha(\mathbf{X}_n)]$. Under Case (B), the exact power of $\Psi_\alpha$ can be approximated by $\tilde\pi(n, \alpha, F) = \tilde P_F(T_n \in R_{n,\alpha}) = \tilde E_F[\Psi_\alpha(\mathbf{X}_n)]$, where the symbols $\tilde P_F$ and $\tilde E_F$ emphasize that probability and expectation are computed according to the asymptotic distribution of $T_n$. In this case, the approximated RP is $\widetilde{RP} = \tilde\pi(n, \alpha, {}^tF)$. Analogously, under Case (B), the power of $\tilde\Psi_\alpha$ can be approximated by $\pi_a(n, \alpha, F) = \tilde P_F(T_n \in \tilde R_{n,\alpha}) = \tilde E_F[\tilde\Psi_\alpha(\mathbf{X}_n)]$, and the approximate RP is $RP_a = \pi_a(n, \alpha, {}^tF)$. The approximate powers $\tilde\pi(n, \alpha, F)$ and $\pi_a(n, \alpha, F)$, and the approximate RPs $\tilde\pi(n, \alpha, {}^tF)$ and $\pi_a(n, \alpha, {}^tF)$, can be computed under Case (A) as well. Moreover, in this latter case, it is also possible to compute the exact power of the approximate test. In practice, however, under Case (A) the exact test and its exact power are usually computed if the computational burden is acceptable; when the computational cost is very high, the asymptotic test and its approximate power are used. To summarize, Table 1 shows the possible approaches to computing the power of a test under the different scenarios that can arise under Cases (A) and (B). The cells corresponding to the approaches commonly employed in practice are shaded gray.
Under both Cases (A) and (B), an RP-estimator can be obtained by following several methodologies, which can be divided into two main subgroups: semi-parametric estimators and non-parametric estimators.

2.2. Semi-Parametric RP-Estimation and RP-Testing

As for the Wilcoxon rank sum (WRS) test (see [14,15]), in common nonparametric tests the asymptotic/exact distribution of $T_n$ depends on a vector $\boldsymbol\theta^t$ of parameters defined as particular functionals of ${}^tF$. In such cases, the asymptotic/exact power can be interpreted as a function of $\boldsymbol\theta^t$ instead of a functional of ${}^tF$. A semi-parametric RP-estimator can then be obtained by plugging an appropriate point estimator $\hat{\boldsymbol\theta}$ of $\boldsymbol\theta^t$ into the expression of the exact/asymptotic power:
  • under Case (A.1), the semi-parametric RP-estimator is $\hat\pi = \pi(n, \alpha, \hat{\boldsymbol\theta})$;
  • under Cases (A.2) and (B.2), the semi-parametric RP-estimator is $\hat\pi_a = \pi_a(n, \alpha, \hat{\boldsymbol\theta})$;
  • under Case (B.1), the semi-parametric RP-estimator is $\hat{\tilde\pi}_a = \tilde\pi(n, \alpha, \hat{\boldsymbol\theta})$.
As explained below, if the estimator $\hat{\boldsymbol\theta}$ is appropriately chosen and the testing problem (1) is one-sided, the semi-parametric RP-estimators $\hat\pi$ and $\hat\pi_a$ can be used to replicate the tests $\Psi_\alpha$ and $\tilde\Psi_\alpha$ through the RP-testing technique: “accept $H_0$ if the RP-estimate is lower than, or equal to, 1/2, and reject $H_0$ otherwise”. For several nonparametric tests (see [8] for the general parametric case), if $\hat{\boldsymbol\theta}$ is appropriately chosen, it can be shown that $T_n \in R_{n,\alpha} \Leftrightarrow \hat\pi > 1/2$ or $T_n \in \tilde R_{n,\alpha} \Leftrightarrow \hat\pi_a > 1/2$. Then, the exact and asymptotic tests can be rewritten as
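$$\Psi_\alpha(\mathbf{X}_n) = \begin{cases} 1 & \text{if } \hat\pi > 1/2 \\ 0 & \text{if } \hat\pi \le 1/2 \end{cases} \quad \text{and} \quad \tilde\Psi_\alpha(\mathbf{X}_n) = \begin{cases} 1 & \text{if } \hat\pi_a > 1/2 \\ 0 & \text{if } \hat\pi_a \le 1/2 \end{cases}.$$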
The above identities cover Cases (A.1), (A.2) and (B.2). In Case (B.1), the exact test cannot generally be replicated through the RP-testing technique based on semi-parametric estimators. However, the following lemma (proved in the Supplementary Material) describes a case in which this is possible:
Lemma 1. 
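Assume that the testing problem (1) is one-sided and that the exact test based on the test statistic $T_n$ is $\Psi_\alpha(\mathbf{X}_n) = 1$ if $T_n > t_{n,1-\alpha}$ and $\Psi_\alpha(\mathbf{X}_n) = 0$ if $T_n \le t_{n,1-\alpha}$. Moreover, assume that $\frac{T_n - E[T_n]}{\sqrt{Var[T_n]}} \xrightarrow{d} N(0,1)$, with $E[T_n] = e(\boldsymbol\theta^t)$ and $Var[T_n] = v(\boldsymbol\theta^t)$. If $\hat{\boldsymbol\theta}$ is such that $T_n = e(\hat{\boldsymbol\theta})$, then:
1. the RP-based decision rule defined by $\hat{\tilde\pi}_a = \tilde\pi(n, \alpha, \hat{\boldsymbol\theta}) = 1 - \Phi\left(\frac{t_{n,1-\alpha} - e(\hat{\boldsymbol\theta})}{\sqrt{v(\hat{\boldsymbol\theta})}}\right)$ exactly replicates the exact test $\Psi_\alpha$;
2. the RP-based decision rule defined by $\hat{\tilde\pi}_a^* = 1 - \Phi\left(\frac{t_{n,1-\alpha} - e(\hat{\boldsymbol\theta})}{\sqrt{\hat V}}\right)$, with $\hat V$ any positive estimator of $Var[T_n]$, exactly replicates the exact test $\Psi_\alpha$.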
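The mechanism behind Lemma 1 is a sign argument: once $e(\hat{\boldsymbol\theta}) = T_n$, the argument of $\Phi$ is negative exactly when $T_n > t_{n,1-\alpha}$, so the RP estimate exceeds 1/2 exactly when the exact test rejects, whatever positive variance is plugged in. The Python sketch below illustrates this; the function names and the critical value used in the check are hypothetical choices made for illustration only.

```python
from statistics import NormalDist


def rp_estimate(t_obs: float, crit: float, var_hat: float) -> float:
    """1 - Phi((t_{n,1-alpha} - e(theta_hat)) / sqrt(V_hat)), assuming theta_hat
    was chosen so that e(theta_hat) equals the observed statistic t_obs."""
    if var_hat <= 0:
        raise ValueError("the variance estimate must be positive")
    z = (crit - t_obs) / var_hat ** 0.5
    return 1.0 - NormalDist().cdf(z)


def rp_test(t_obs: float, crit: float, var_hat: float) -> int:
    """RP-testing rule: reject H0 (return 1) iff the RP estimate exceeds 1/2."""
    return 1 if rp_estimate(t_obs, crit, var_hat) > 0.5 else 0


def exact_test(t_obs: float, crit: float) -> int:
    """One-sided exact test: reject H0 (return 1) iff T_n > t_{n,1-alpha}."""
    return 1 if t_obs > crit else 0


# Illustrative check with a hypothetical critical value and several variances.
crit = 44
for t_obs in range(56):
    for var_hat in (0.5, 10.0, 96.25):
        assert rp_test(t_obs, crit, var_hat) == exact_test(t_obs, crit)
```

At $T_n = t_{n,1-\alpha}$ the estimate equals exactly 1/2, so the rule accepts, as the exact test does.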

2.3. Non-Parametric RP-Estimation and RP-Testing: The Non-Parametric Plug-In Approach

As pointed out in [17] and in [3,18], the RP can also be estimated by a nonparametric plug-in estimator. Under Cases (A.1) and (B.1), the plug-in estimator $\hat\pi_e^{PI} = P_{\hat F_n}(T_n > t_{n,1-\alpha}) = E_{\hat F_n}[\Psi_\alpha(\mathbf{X}_n)]$ can be considered, where $\hat F_n$ denotes the empirical cumulative distribution function (ecdf). In practice, $\hat\pi_e^{PI}$ coincides with the rejection rate obtained by performing the test $\Psi_\alpha$ over all of the $n^n$ possible samples of size $n$ that can be drawn from the ecdf: $\hat\pi_e^{PI} = \frac{1}{n^n}\sum_{\mathbf{x}_n^i \in \mathcal{X}(\mathbf{X}_n)} \Psi_\alpha(\mathbf{x}_n^i)$, where $\mathcal{X}(\mathbf{X}_n)$ denotes the set of all of the samples of size $n$ that can be drawn with replacement from the ecdf corresponding to $\mathbf{X}_n$. Apart from some special cases, the analytical expression of $\hat\pi_e^{PI}$ cannot be derived. Consequently, it is usually approximated by the Monte Carlo method: $B$ samples of size $n$ are drawn from the ecdf, the test $\Psi_\alpha$ is performed on each of them, and the plug-in RP-estimate is computed as the rejection rate. In detail:
$$\hat\pi^{PI} = \frac{1}{B}\sum_{j=1}^{B} \Psi_\alpha(\mathbf{X}_n^j), \tag{5}$$
where $\mathbf{X}_n^j$ denotes the $j$-th resample drawn from the ecdf. Similarly, under Cases (A.2) and (B.2), the plug-in RP-estimator can be defined starting from the asymptotic test, obtaining $\hat\pi_{a,e}^{PI} = \frac{1}{n^n}\sum_{\mathbf{x}_n^i \in \mathcal{X}(\mathbf{X}_n)} \tilde\Psi_\alpha(\mathbf{x}_n^i)$ and:
$$\hat\pi_a^{PI} = \frac{1}{B}\sum_{j=1}^{B} \tilde\Psi_\alpha(\mathbf{X}_n^j). \tag{6}$$
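A minimal Python sketch of the Monte Carlo plug-in estimator (5), assuming the test is supplied as a function that returns 1 (reject) or 0 (accept); passing an asymptotic test function instead gives (6). The names `plugin_rp`, `rp_test` and `sign_count_test`, the critical count 11 and the toy data are hypothetical.

```python
import random
from typing import Callable, Optional, Sequence


def plugin_rp(sample: Sequence[float],
              test: Callable[[Sequence[float]], int],
              B: int = 1000,
              seed: Optional[int] = None) -> float:
    """Monte Carlo plug-in RP estimate: rejection rate of `test` over B
    resamples of size n drawn with replacement from the ecdf of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    rejections = 0
    for _ in range(B):
        resample = rng.choices(sample, k=n)  # one sample of size n from the ecdf
        rejections += test(resample)
    return rejections / B


def rp_test(rp_estimate: float) -> int:
    """RP-testing rule: reject H0 iff the RP estimate exceeds 1/2."""
    return 1 if rp_estimate > 0.5 else 0


def sign_count_test(x: Sequence[float]) -> int:
    """Hypothetical one-sided test: reject when more than 11 values are positive."""
    return 1 if sum(v > 0 for v in x) > 11 else 0


data = [0.8, 1.3, -0.2, 0.5, 2.1, 0.9, -0.7, 1.1, 0.4, 1.8, 0.3, -0.1, 1.6, 0.6, 1.0]
rp_hat = plugin_rp(data, sign_count_test, B=2000, seed=123)
decision = rp_test(rp_hat)
```

Fixing the seed makes the estimate reproducible; its Monte Carlo variability shrinks as $B$ grows.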
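The plug-in RP-estimators introduced above can be used to define the RP-based tests
$$\Psi_\alpha^{PI,e}(\mathbf{X}_n) = \begin{cases} 1 & \text{if } \hat\pi_e^{PI} > 1/2 \\ 0 & \text{if } \hat\pi_e^{PI} \le 1/2 \end{cases}, \quad \tilde\Psi_\alpha^{PI,e}(\mathbf{X}_n) = \begin{cases} 1 & \text{if } \hat\pi_{a,e}^{PI} > 1/2 \\ 0 & \text{if } \hat\pi_{a,e}^{PI} \le 1/2 \end{cases},$$
$$\Psi_\alpha^{PI}(\mathbf{X}_n) = \begin{cases} 1 & \text{if } \hat\pi^{PI} > 1/2 \\ 0 & \text{if } \hat\pi^{PI} \le 1/2 \end{cases} \quad \text{and} \quad \tilde\Psi_\alpha^{PI}(\mathbf{X}_n) = \begin{cases} 1 & \text{if } \hat\pi_a^{PI} > 1/2 \\ 0 & \text{if } \hat\pi_a^{PI} \le 1/2 \end{cases}.$$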
However, there are no general theoretical results ensuring that $\Psi_\alpha^{PI,e}$ and $\tilde\Psi_\alpha^{PI,e}$, or $\Psi_\alpha^{PI}$ and $\tilde\Psi_\alpha^{PI}$, are level-$\alpha$ tests equivalent to $\Psi_\alpha$ and $\tilde\Psi_\alpha$, respectively.

3. RP-Estimation and Testing for the Binomial and Sign Test

In this section, the performance of the semi-parametric and nonparametric RP-estimators for the binomial and sign tests is evaluated. The binomial test is considered first. Let $\mathbf{X}_n = (X_1, \ldots, X_n)$ be a random sample drawn from the Bernoulli distribution with unknown parameter $p^t$. The statistical hypotheses of interest are:
$$H_0: p^t \le p_0 \quad \text{versus} \quad H_1: p^t > p_0. \tag{7}$$
These hypotheses can be tested by using the statistic $\hat P = \frac{1}{n}\sum_{i=1}^n X_i$. The exact and asymptotic distributions of $\hat P$ are known both under $H_0$ and under $H_1$; consequently, this test falls under Case (A). Specifically, $n\hat P \sim \text{Binomial}(n, p^t)$ and $\sqrt{n}\,\frac{\hat P - p^t}{\sqrt{p^t(1-p^t)}} \xrightarrow{d} N(0,1)$. The exact test is then given by $\Psi_\alpha(\mathbf{X}_n) = 1$ if $\hat P > c_\alpha$ and $\Psi_\alpha(\mathbf{X}_n) = 0$ if $\hat P \le c_\alpha$, where $c_\alpha = b(1-\alpha; n, p_0)/n$ and $b(q; n, p)$ is the $q$-quantile of the binomial distribution with parameters $n$ and $p$ (the test so defined is conservative). The asymptotic test is $\tilde\Psi_\alpha(\mathbf{X}_n) = 1$ if $n\hat P > \lfloor n\tilde c_\alpha \rfloor$ and $\tilde\Psi_\alpha(\mathbf{X}_n) = 0$ if $n\hat P \le \lfloor n\tilde c_\alpha \rfloor$, where $\tilde c_\alpha = p_0 + z_{1-\alpha}\sqrt{p_0(1-p_0)/n}$, $z_q$ is the $q$-quantile of the standard normal distribution and $\lfloor\cdot\rfloor$ denotes the floor function. The exact and asymptotic critical regions are not equivalent, and their disagreement can be evaluated using Expression (4), which, in this case, can be computed exactly. In Table S1 of the Supplementary Material, the values of $D(p^t, n, \alpha, p_0)$ are computed by fixing $\alpha = 0.05$, for some values of $n$, $p_0$ and $p^t$. This table shows that the probability of disagreement between the tests $\Psi_\alpha(\mathbf{X}_n)$ and $\tilde\Psi_\alpha(\mathbf{X}_n)$ can be very high for some combinations of $n$, $p_0$ and $p^t$: it is often higher than 10% (up to 20%) with sample size $n = 15$, and it still exceeds 10% in a few cases even with $n = 30$.
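Because both binomial tests reject when the success count $S = n\hat P$ exceeds an integer threshold, the disagreement probability (4) is simply the binomial probability that $S$ falls between the two thresholds. A minimal Python sketch using only the standard library (the helper names are hypothetical):

```python
from math import comb, floor, sqrt
from statistics import NormalDist


def binom_cdf(k: int, n: int, p: float) -> float:
    """B(k; n, p): binomial cumulative distribution function."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(min(k, n) + 1))


def binom_quantile(q: float, n: int, p: float) -> int:
    """b(q; n, p) = min{k : B(k; n, p) >= q}."""
    for k in range(n + 1):
        if binom_cdf(k, n, p) >= q:
            return k
    return n


def thresholds(n: int, p0: float, alpha: float):
    """Integer thresholds: each test rejects H0 when S = n * P_hat exceeds its value."""
    k_exact = binom_quantile(1 - alpha, n, p0)                # n * c_alpha
    z = NormalDist().inv_cdf(1 - alpha)                       # z_{1-alpha}
    k_asym = floor(n * (p0 + z * sqrt(p0 * (1 - p0) / n)))    # floor(n * c~_alpha)
    return k_exact, k_asym


def disagreement(n: int, p0: float, alpha: float, pt: float) -> float:
    """D: probability, under p^t, that exactly one of the two tests rejects."""
    k_exact, k_asym = thresholds(n, p0, alpha)
    return abs(binom_cdf(k_exact, n, pt) - binom_cdf(k_asym, n, pt))
```

For $n = 15$, $p_0 = 0.5$ and $\alpha = 0.05$, the exact test rejects for $S \ge 12$ and the asymptotic one for $S \ge 11$, so the two disagree only at $S = 11$.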

3.1. Semi-Parametric RP-Estimation and Testing for the Binomial Test

The exact power function and the exact RP of $\Psi_\alpha$ (Case (A.1)) are $\pi(n, \alpha, p) = 1 - B(nc_\alpha; n, p)$ and $RP = \pi(n, \alpha, p^t)$, where $B(\cdot; n, p)$ is the binomial cumulative distribution function with parameters $n$ and $p$. Following [8], the semi-parametric RP-estimator based on the exact power is obtained by plugging the median estimator of $p^t$ into the expression of $\pi(n, \alpha, p)$. The median estimator, denoted here by $\hat P_M$ to distinguish it from the sample proportion $\hat P$, is defined as the solution in $p$ of the equation $B(n\hat P; n, p) = 1/2$, and the resulting RP-estimator is $\hat\pi = 1 - B(nc_\alpha; n, \hat P_M)$. Similarly, the approximate power function of $\tilde\Psi_\alpha$ (Case (A.2)) is
$$\pi_a(n, \alpha, p) = 1 - \Phi\left(\sqrt{n}\,\frac{p_0 - p}{\sqrt{p(1-p)}} + z_{1-\alpha}\sqrt{\frac{p_0(1-p_0)}{p(1-p)}}\right),$$
where $\Phi(\cdot)$ is the standard normal cdf, and the approximate RP is $RP_a = \pi_a(n, \alpha, p^t)$. The corresponding RP-estimator is then $\hat\pi_a = \pi_a(n, \alpha, \hat P)$. Note that, in this case, the probability distributions of $\hat\pi$ and $\hat\pi_a$ can be obtained analytically. In particular, the support of $\hat\pi$ is given by the values $\hat\pi(s) = 1 - B(nc_\alpha; n, \hat p_s)$, $s = 0, 1, \ldots, n$, where $\hat p_s$ is the solution in $p$ of $B(s; n, p) = 1/2$, and the probability function of $\hat\pi$ is $P(\hat\pi = \hat\pi(s)) = \binom{n}{s}(p^t)^s(1-p^t)^{n-s}$, $s = 0, 1, \ldots, n$. Analogously, the support of $\hat\pi_a$ is
$$\hat\pi_a(s) = 1 - \Phi\left(\sqrt{n}\,\frac{p_0 - s/n}{\sqrt{(s/n)(1-s/n)}} + z_{1-\alpha}\sqrt{\frac{p_0(1-p_0)}{(s/n)(1-s/n)}}\right), \quad s = 0, 1, \ldots, n,$$
and $P(\hat\pi_a = \hat\pi_a(s)) = P(\hat\pi = \hat\pi(s)) = \binom{n}{s}(p^t)^s(1-p^t)^{n-s}$, $s = 0, 1, \ldots, n$.
Both the asymptotic and the exact tests can now be replicated by using the RP-estimators defined above. Specifically, thanks to the results in [8] (which require the adoption of the median estimator $\hat P_M$ in the definition of the RP-estimator $\hat\pi$), it follows that:
$$\Psi_\alpha(\mathbf{X}_n) = \begin{cases} 1 & \text{if } \hat P > c_\alpha \\ 0 & \text{if } \hat P \le c_\alpha \end{cases} = \begin{cases} 1 & \text{if } \hat\pi > 1/2 \\ 0 & \text{if } \hat\pi \le 1/2 \end{cases}.$$
Similarly, it is easy to verify that
$$\tilde\Psi_\alpha(\mathbf{X}_n) = \begin{cases} 1 & \text{if } \hat P > \tilde c_\alpha \\ 0 & \text{if } \hat P \le \tilde c_\alpha \end{cases} = \begin{cases} 1 & \text{if } \hat\pi_a > 1/2 \\ 0 & \text{if } \hat\pi_a \le 1/2 \end{cases}.$$
Note that, for this last identity to hold, it is essential to use the sample proportion $\hat P$ in the definition of $\hat\pi_a$.
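The replication property can be checked numerically. The sketch below is a minimal illustration rather than the authors' code: it computes the median estimator of $p^t$ (called `p_med` in the code) by bisection, using the fact that $B(s; n, p)$ is decreasing in $p$, evaluates $\hat\pi$ and $\hat\pi_a$, and leaves the comparison with the classical thresholds to the usage check. Setting the median estimate to 1 when $s = n$ is an assumption made here, since $B(n; n, p) = 1/2$ has no solution; the helper names are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(k, n, p):
    """B(k; n, p): binomial cumulative distribution function."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(min(k, n) + 1))


def median_estimate(s, n, tol=1e-12):
    """Solve B(s; n, p) = 1/2 in p by bisection (B decreases in p).
    For s = n no solution exists; p = 1 is used by convention (assumption)."""
    if s >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(s, n, mid) > 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def rp_exact(s, n, k_exact):
    """pi_hat = 1 - B(n c_alpha; n, p_med), with k_exact = n c_alpha."""
    p_med = median_estimate(s, n)
    return 1 - binom_cdf(k_exact, n, p_med)


def rp_asym(s, n, p0, alpha):
    """pi_hat_a = pi_a(n, alpha, s/n), defined here only for 0 < s < n."""
    p = s / n
    z = NormalDist().inv_cdf(1 - alpha)
    sd = sqrt(p * (1 - p))
    arg = sqrt(n) * (p0 - p) / sd + z * sqrt(p0 * (1 - p0)) / sd
    return 1 - NormalDist().cdf(arg)
```

For $n = 15$, $p_0 = 0.5$ and $\alpha = 0.05$, one has $nc_\alpha = 11$ and the asymptotic test rejects for $s \ge 11$. At $s = nc_\alpha$ the theoretical value of $\hat\pi$ is exactly 1/2 (accept), and bisection reproduces it only up to rounding, so that boundary case should be compared with a tolerance.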

3.2. Non-Parametric RP-Estimation and Testing for the Binomial Test

The binomial test is particularly interesting for studying the features of the plug-in RP-estimators since, in this context, the probability functions of the estimators $\hat\pi_e^{PI}$ and $\hat\pi_{a,e}^{PI}$ can be derived analytically, and the RP-based decision rules built on them can be studied analytically. Lemma 2 below gives the analytical expression of the nonparametric plug-in RP-estimator for the exact binomial test (Point 1), provides the probability distribution of this estimator (Point 2) and establishes the equivalence between the exact binomial test and the RP-based decision rule derived from the nonparametric plug-in estimator (Point 3). Similar results concerning the asymptotic binomial test are provided in Lemma 3.
Lemma 2. 
Let $\mathbf{X}_n = (X_1, \ldots, X_n)$ be a random sample drawn from the Bernoulli distribution with unknown parameter $p^t$, in order to test Hypotheses (7). Then:
1. $\hat\pi_e^{PI} = \frac{1}{n^n}\sum_{\mathbf{x}_n^i \in \mathcal{X}(\mathbf{X}_n)} \Psi_\alpha(\mathbf{x}_n^i) = 1 - B(nc_\alpha; n, \hat P)$;
2. the support of $\hat\pi_e^{PI}$ is $\hat\pi_e^{PI}(s) = 1 - B(nc_\alpha; n, s/n)$, $s = 0, 1, \ldots, n$, and $P(\hat\pi_e^{PI} = \hat\pi_e^{PI}(s)) = \binom{n}{s}(p^t)^s(1-p^t)^{n-s}$;
3. the decision rule based on the RP-estimator $\hat\pi_e^{PI}$ exactly replicates the exact binomial test $\Psi_\alpha$.
Lemma 3. 
Let $\mathbf{X}_n = (X_1, \ldots, X_n)$ be a random sample drawn from the Bernoulli distribution with unknown parameter $p^t$, in order to test Hypotheses (7). Then:
1. $\hat\pi_{a,e}^{PI} = \frac{1}{n^n}\sum_{\mathbf{x}_n^i \in \mathcal{X}(\mathbf{X}_n)} \tilde\Psi_\alpha(\mathbf{x}_n^i) = 1 - B(\lfloor n\tilde c_\alpha \rfloor; n, \hat P)$;
2. the support of $\hat\pi_{a,e}^{PI}$ is $\hat\pi_{a,e}^{PI}(s) = 1 - B(\lfloor n\tilde c_\alpha \rfloor; n, s/n)$, $s = 0, 1, \ldots, n$, and $P(\hat\pi_{a,e}^{PI} = \hat\pi_{a,e}^{PI}(s)) = \binom{n}{s}(p^t)^s(1-p^t)^{n-s}$;
3. the decision rule based on the RP-estimator $\hat\pi_{a,e}^{PI}$ exactly replicates the asymptotic binomial test $\tilde\Psi_\alpha$.
The proofs of Lemma 2 and Lemma 3 are reported in the Supplementary Material.
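Point 1 of Lemmas 2 and 3 can be checked by brute force for a very small sample, since enumerating all $n^n$ index tuples reproduces resampling from the ecdf exactly. The sketch below (hypothetical helper names; feasible only for tiny $n$) compares the enumerated rejection rate of the rule $S > k$ with the closed form $1 - B(k; n, \hat P)$, where $k = nc_\alpha$ for the exact test and $k = \lfloor n\tilde c_\alpha \rfloor$ for the asymptotic one.

```python
from itertools import product
from math import comb


def binom_cdf(k, n, p):
    """B(k; n, p): binomial cumulative distribution function."""
    if k < 0:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(min(k, n) + 1))


def plugin_rp_enumerated(x, k):
    """Rejection rate of the rule 'S > k' over all n**n resamples (index
    tuples) drawn with replacement from the ecdf of the 0/1 sample x."""
    n = len(x)
    rejections = sum(1 for idx in product(range(n), repeat=n)
                     if sum(x[i] for i in idx) > k)
    return rejections / n**n


def plugin_rp_closed_form(x, k):
    """Closed form 1 - B(k; n, P_hat), with P_hat the sample proportion."""
    n = len(x)
    return 1 - binom_cdf(k, n, sum(x) / n)


x = [1, 0, 1, 1, 0]  # toy Bernoulli sample, n = 5
for k in range(len(x) + 1):
    assert abs(plugin_rp_enumerated(x, k) - plugin_rp_closed_form(x, k)) < 1e-12
```

Point 3 corresponds to the fact that $1 - B(k; n, s/n) > 1/2$ exactly when $s > k$, which can be checked on the same closed form.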

3.3. Evaluating the Performances of the RP-Estimators for the Binomial Test

For the binomial test, the exact expectation and MSE of $\hat\pi$, $\hat\pi_a$, $\hat\pi_e^{PI}$ and $\hat\pi_{a,e}^{PI}$ can be computed. To compare these estimators, their exact bias and MSE are shown in Figures S1 and S2 of the Supplementary Material; here, Figure 1 reports only the MSE curves for $n = 15$, $\alpha = 0.05$ and $p_0 = 0.2, 0.5$. These figures show that no RP-estimator performs uniformly best. Concerning the estimators of the power of $\Psi_\alpha(\mathbf{X}_n)$, there is a tangible difference between the performance of $\hat\pi$ and that of $\hat\pi_e^{PI}$: for a wide range of small values of $p^t$, $\hat\pi$ has larger bias and MSE than $\hat\pi_e^{PI}$, whereas for large values of $p^t$, $\hat\pi$ generally performs better than $\hat\pi_e^{PI}$. In contrast, the performances of $\hat\pi_a$ and $\hat\pi_{a,e}^{PI}$ as estimators of the power of $\tilde\Psi_\alpha(\mathbf{X}_n)$ are very similar. Regarding RP-testing, recall that there is no disagreement between the classical binomial tests (exact or approximated) and their RP-based versions. The results obtained here for the binomial test also hold for the sign test; the interested reader is referred to the Supplementary Material, where the connection between these tests is explained in depth.
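Because every estimator considered here is a function of the success count $S \sim \text{Binomial}(n, p^t)$, its exact bias and MSE are finite sums over $s = 0, \ldots, n$. A minimal sketch for the exact plug-in estimator of Lemma 2 (helper names and parameter values are hypothetical; any other estimator can be handled by passing its support values):

```python
from math import comb


def binom_pmf(s, n, p):
    return comb(n, s) * p**s * (1 - p)**(n - s)


def binom_cdf(k, n, p):
    return sum(binom_pmf(i, n, p) for i in range(min(k, n) + 1)) if k >= 0 else 0.0


def exact_bias_mse(support, true_rp, n, pt):
    """Exact bias and MSE of an RP estimator equal to support[s] when S = s."""
    weights = [binom_pmf(s, n, pt) for s in range(n + 1)]
    mean = sum(w * v for w, v in zip(weights, support))
    mse = sum(w * (v - true_rp) ** 2 for w, v in zip(weights, support))
    return mean - true_rp, mse


n, k_exact, pt = 15, 11, 0.7            # k_exact = n * c_alpha for p0 = 0.5, alpha = 0.05
true_rp = 1 - binom_cdf(k_exact, n, pt)  # RP = pi(n, alpha, p^t)
support = [1 - binom_cdf(k_exact, n, s / n) for s in range(n + 1)]  # Lemma 2, Point 2
bias, mse = exact_bias_mse(support, true_rp, n, pt)
```

Repeating the computation over a grid of $p^t$ values traces exact bias and MSE curves of the kind described above.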

4. RP-Estimation and Testing for the Wilcoxon Signed Rank Test

Let $\mathbf{X}_n = (X_1, \ldots, X_n)$ be a random sample from a continuous and symmetric cdf $F_{\theta^t}$ with median $\theta^t$. To test $H_0: \theta^t \le \theta_0$ vs. $H_1: \theta^t > \theta_0$, the Wilcoxon signed rank (WSR) test can be applied; it is based on the statistic $W = \sum_{i=1}^n I_{ii} R_i = \sum_{i=1}^n \sum_{j=i}^n I_{ij}$, where $Z_i = X_i - \theta_0$, $R_i = \operatorname{rank}(|Z_i|)$ and:
$$I_{ij} = \begin{cases} 1 & \text{if } Z_i + Z_j > 0 \\ 0 & \text{if } Z_i + Z_j < 0 \end{cases}. \tag{8}$$
Following the classification proposed in Section 2, the WSR test falls under Case (B): under $H_0$, the exact distribution of $W$ can be derived by enumeration (see [19], p. 126), but under $H_1$ it can only be approximated by using a central limit theorem. In particular, it is well known (see [19], p. 166) that $\frac{W - E_{F_{\theta^t}}[W]}{\sqrt{Var_{F_{\theta^t}}(W)}} \xrightarrow{d} N(0,1)$, where:
$$E_{F_{\theta^t}}[W] = e(p, p_1) = \frac{n(n-1)}{2}\,p_1 + np, \tag{9}$$
$$Var_{F_{\theta^t}}[W] = v(p, p_1, p_2) = n(n-1)(n-2)\left(p_2 - p_1^2\right) + \frac{n(n-1)}{2}\left[2(p - p_1)^2 + 3p_1(1 - p_1)\right] + np(1 - p), \tag{10}$$
with:
$$p = P_{F_{\theta^t}}(Z > 0), \quad p_1 = P_{F_{\theta^t}}(Z + Z' > 0), \quad p_2 = P_{F_{\theta^t}}(Z + Z' > 0 \text{ and } Z + Z'' > 0), \tag{11}$$
where $Z = X - \theta_0$, and $Z'$ and $Z''$ are i.i.d. as $Z$.
Note that, when $\theta^t = \theta_0$, $p = p_1 = 1/2$ and $p_2 = 1/3$. These results allow the use of $W$ to define the exact and asymptotic tests
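$$\Psi_\alpha(\mathbf{X}_n) = \begin{cases} 1 & \text{if } W > w_\alpha \\ 0 & \text{otherwise} \end{cases} \quad \text{and} \quad \tilde\Psi_\alpha(\mathbf{X}_n) = \begin{cases} 1 & \text{if } W > \tilde w_\alpha \\ 0 & \text{otherwise} \end{cases},$$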
where $w_\alpha$ denotes the $(1-\alpha)$-quantile of the exact null distribution of $W$ and $\tilde w_\alpha = \frac{n(n+1)}{4} + z_{1-\alpha}\sqrt{\frac{n(n+1)(2n+1)}{24}}$. The exact and asymptotic tests are not equivalent, and their disagreement is evaluated using Expression (4). In Table S2 of the Supplementary Material, the values of $D(\alpha, n, F_{\theta^t}, \theta_0)$ are computed by fixing $\alpha = 0.05$ and $\theta_0 = 0$, for some values of $n$ and $\theta^t$, and by considering $X \sim N(\theta^t, 1)$ (light tails) and $X \sim \text{Cauchy}(\theta^t)$ (heavy tails).
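Under $H_0$ (at $\theta^t = \theta_0$), each of the $2^n$ sign patterns of the ranks is equally likely, so the exact null distribution of $W$ follows from counting the subsets of $\{1, \ldots, n\}$ with each possible rank sum. The sketch below uses a standard subset-sum recursion for this count (not necessarily the algorithm of [19]); it also computes $W$ in its Walsh-average form, assuming no zeros or ties, and returns $w_\alpha$ and $\tilde w_\alpha$. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wsr_statistic(x, theta0=0.0):
    """W = number of pairs i <= j with Z_i + Z_j > 0, where Z_i = X_i - theta0."""
    z = [xi - theta0 for xi in x]
    n = len(z)
    return sum(1 for i in range(n) for j in range(i, n) if z[i] + z[j] > 0)


def wsr_null_counts(n):
    """counts[w] = number of subsets of {1, ..., n} whose elements sum to w."""
    max_w = n * (n + 1) // 2
    counts = [1] + [0] * max_w
    for r in range(1, n + 1):
        for w in range(max_w, r - 1, -1):  # descending: each rank used at most once
            counts[w] += counts[w - r]
    return counts


def wsr_critical_values(n, alpha):
    """(w_alpha, w~_alpha): the tests reject H0 when W exceeds these values."""
    counts = wsr_null_counts(n)
    total, cum = 2 ** n, 0
    w_exact = len(counts) - 1
    for w, c in enumerate(counts):
        cum += c
        if cum / total >= 1 - alpha:  # smallest w with G0(w) >= 1 - alpha
            w_exact = w
            break
    z = NormalDist().inv_cdf(1 - alpha)
    w_asym = n * (n + 1) / 4 + z * sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_exact, w_asym
```

For $n = 10$ and $\alpha = 0.05$, this gives $w_\alpha = 44$ and $\tilde w_\alpha \approx 43.64$, so the two tests disagree exactly when $W = 44$.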

4.1. Semi-Parametric RP-Estimation and Testing for the WSR Test

As mentioned above, the WSR test is classified under Case (B). Therefore, its exact power function cannot generally be determined; however, it can be approximated thanks to the asymptotic normality of $W$. The approximated power function of the exact test $\Psi_\alpha(\mathbf{X}_n)$ is $\tilde\pi(n, \alpha, F_\theta, \theta_0) = 1 - \Phi\left(\frac{w_\alpha - E_{F_\theta}[W]}{\sqrt{Var_{F_\theta}[W]}}\right)$. Analogously, the approximated power function of the asymptotic test $\tilde\Psi_\alpha(\mathbf{X}_n)$ is $\pi_a(n, \alpha, F_\theta, \theta_0) = 1 - \Phi\left(\frac{\tilde w_\alpha - E_{F_\theta}[W]}{\sqrt{Var_{F_\theta}[W]}}\right)$. To define semi-parametric RP-estimators from these approximated power functions, estimators of $E_{F_\theta}[W]$ and $Var_{F_\theta}[W]$ are needed. They can be obtained by plugging estimators of the parameters $p$, $p_1$ and $p_2$ defined in (11) into Expressions (9) and (10). Two different estimators of these parameters are considered below.
  • Analog estimators:
    $\hat p = \frac{1}{n}\sum_{i=1}^n I_{ii}$, $\quad \hat p_1 = \frac{1}{n(n-1)}\sum_{i=1}^n \sum_{\substack{j=1 \\ j \ne i}}^n I_{ij} = \frac{2}{n(n-1)}\sum_{i=1}^n \sum_{j=i+1}^n I_{ij}$,
    $\hat p_2 = \frac{1}{n(n-1)(n-2)}\sum_{i=1}^n \sum_{\substack{j=1 \\ j \ne i}}^n \sum_{\substack{k=1 \\ k \ne i,\, k \ne j}}^n I_{ij} I_{ik} = \frac{2}{n(n-1)(n-2)}\sum_{i=1}^n \sum_{\substack{j=1 \\ j \ne i}}^n \sum_{\substack{k=j+1 \\ k \ne i}}^n I_{ij} I_{ik}$.
  • Plug-in estimators. To introduce the plug-in estimators of $p$, $p_1$ and $p_2$, let $G_{\theta^t}(z) = F_{\theta^t}(z + \theta_0)$ and $g_{\theta^t}(z)$ be the cumulative distribution function and the density function of $Z = X - \theta_0$. With this notation, it is easy to see that $p = 1 - G_{\theta^t}(0)$, $p_1 = 1 - E_{G_{\theta^t}}[G_{\theta^t}(-Z)]$ and $p_2 = 1 - 2E_{G_{\theta^t}}[G_{\theta^t}(-Z)] + E_{G_{\theta^t}}[G_{\theta^t}(-Z)^2]$. Let $G_n$ be the empirical distribution function of the $Z_i$'s (i.e., of the $(X_i - \theta_0)$'s). Plugging $G_n$ into the above expressions yields the following estimators:
    $\tilde p = 1 - G_n(0) = \hat p$, $\quad \tilde p_1 = 1 - \frac{1}{n}\sum_{i=1}^n G_n(-Z_i)$, $\quad \tilde p_2 = 1 - \frac{2}{n}\sum_{i=1}^n G_n(-Z_i) + \frac{1}{n}\sum_{i=1}^n G_n^2(-Z_i)$.
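The following RP-estimators for the exact test can now be introduced: $\hat\pi_1 = 1 - \Phi\left(\frac{w_\alpha - \hat E}{\sqrt{\hat V}}\right)$ and $\hat\pi_2 = 1 - \Phi\left(\frac{w_\alpha - \tilde E}{\sqrt{\tilde V}}\right)$, where $\hat E = e(\hat p, \hat p_1)$, $\hat V = v(\hat p, \hat p_1, \hat p_2)$, $\tilde E = e(\tilde p, \tilde p_1)$ and $\tilde V = v(\tilde p, \tilde p_1, \tilde p_2)$. Analogously, the following RP-estimators for the asymptotic test can be introduced: $\hat\pi_{a1} = 1 - \Phi\left(\frac{\tilde w_\alpha - \hat E}{\sqrt{\hat V}}\right)$ and $\hat\pi_{a2} = 1 - \Phi\left(\frac{\tilde w_\alpha - \tilde E}{\sqrt{\tilde V}}\right)$.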
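The analog and plug-in estimators and the resulting RP-estimators can be sketched as follows. This is a direct, unoptimized transcription of the formulas above ($O(n^3)$ for $\hat p_2$); the helper names, and the fallback used when the variance estimate is not positive (which happens, e.g., when all $Z_i$ are positive), are assumptions introduced here.

```python
from math import sqrt
from statistics import NormalDist


def indicator(z, i, j):
    """I_ij = 1 if Z_i + Z_j > 0, else 0 (no ties assumed)."""
    return 1 if z[i] + z[j] > 0 else 0


def analog_estimators(z):
    """(p_hat, p1_hat, p2_hat) computed from the indicators I_ij."""
    n = len(z)
    p = sum(indicator(z, i, i) for i in range(n)) / n
    p1 = 2 * sum(indicator(z, i, j)
                 for i in range(n) for j in range(i + 1, n)) / (n * (n - 1))
    p2 = sum(indicator(z, i, j) * indicator(z, i, k)
             for i in range(n) for j in range(n) for k in range(n)
             if i != j and i != k and j != k) / (n * (n - 1) * (n - 2))
    return p, p1, p2


def plugin_estimators(z):
    """(p_tilde, p1_tilde, p2_tilde) obtained by plugging in the ecdf G_n."""
    n = len(z)

    def G_n(t):
        return sum(1 for zl in z if zl <= t) / n

    g = [G_n(-zi) for zi in z]
    p = 1 - G_n(0.0)
    p1 = 1 - sum(g) / n
    p2 = 1 - 2 * sum(g) / n + sum(gi ** 2 for gi in g) / n
    return p, p1, p2


def e(n, p, p1):
    """Expression (9): mean of W."""
    return n * (n - 1) / 2 * p1 + n * p


def v(n, p, p1, p2):
    """Expression (10): variance of W."""
    return (n * (n - 1) * (n - 2) * (p2 - p1 ** 2)
            + n * (n - 1) / 2 * (2 * (p - p1) ** 2 + 3 * p1 * (1 - p1))
            + n * p * (1 - p))


def rp_semiparametric(crit, n, p, p1, p2):
    """1 - Phi((crit - e) / sqrt(v)). With the analog estimators, crit = w_alpha
    gives pi_hat_1 and crit = w~_alpha gives pi_hat_a1; with the plug-in
    estimators, the same calls give pi_hat_2 and pi_hat_a2."""
    mean, var = e(n, p, p1), v(n, p, p1, p2)
    if var <= 0:
        # Degenerate variance estimate (assumption): keep only the sign information.
        return 1.0 if mean > crit else 0.0
    return 1 - NormalDist().cdf((crit - mean) / sqrt(var))
```

Since $e(\hat p, \hat p_1) = W$ algebraically, $\hat\pi_1 > 1/2$ exactly when $W > w_\alpha$ (cf. Corollary 4 below). In floating point, the case $W = w_\alpha$ can produce an estimate within rounding error of 1/2, so that boundary deserves care.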
Following the idea in [20], the approximated power of nonparametric tests can be simplified by assuming that the variance of the test statistic is close to its value under $H_0$ (see [19], pp. 72 and 167, for other applications of Noether's approach). In that case, the approximated and simplified power functions of $\Psi_\alpha(\mathbf{X}_n)$ and $\tilde\Psi_\alpha(\mathbf{X}_n)$ are $\tilde\pi(n, \alpha, F_\theta, \theta_0) \approx 1 - \Phi\left(\frac{w_\alpha - E_{F_{\theta^t}}[W]}{\sqrt{n(n+1)(2n+1)/24}}\right)$ and $\pi_a(n, \alpha, F_\theta, \theta_0) \approx 1 - \Phi\left(\frac{\tilde w_\alpha - E_{F_{\theta^t}}[W]}{\sqrt{n(n+1)(2n+1)/24}}\right)$. These expressions give rise to the following additional RP-estimators:
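- RP-estimators for the exact test:
  $\hat\pi_1^S = 1 - \Phi\left(\frac{w_\alpha - \hat E}{\sqrt{n(n+1)(2n+1)/24}}\right)$ and $\hat\pi_2^S = 1 - \Phi\left(\frac{w_\alpha - \tilde E}{\sqrt{n(n+1)(2n+1)/24}}\right)$;
- RP-estimators for the asymptotic test:
  $\hat\pi_{a1}^S = 1 - \Phi\left(\frac{\tilde w_\alpha - \hat E}{\sqrt{n(n+1)(2n+1)/24}}\right)$ and $\hat\pi_{a2}^S = 1 - \Phi\left(\frac{\tilde w_\alpha - \tilde E}{\sqrt{n(n+1)(2n+1)/24}}\right)$.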
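Finally, the estimators based on Noether's power approximation $\tilde\pi(n, \alpha, F_\theta, \theta_0) \approx \pi_a(n, \alpha, F_\theta, \theta_0) \approx 1 - \Phi\left(z_{1-\alpha} - \sqrt{3n}\left(p_1 - \frac{1}{2}\right)\right)$ are also considered here. In particular, the estimators $\hat\pi_{N1} = 1 - \Phi\left(z_{1-\alpha} - \sqrt{3n}\left(\hat p_1 - \frac{1}{2}\right)\right)$ and $\hat\pi_{N2} = 1 - \Phi\left(z_{1-\alpha} - \sqrt{3n}\left(\tilde p_1 - \frac{1}{2}\right)\right)$ are applied to estimate the RP of both the exact and the asymptotic WSR tests.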
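The simplified and Noether-type estimators need only a mean estimate, or an estimate of $p_1$, because the variance is fixed at its null value. A minimal sketch with hypothetical function names:

```python
from math import sqrt
from statistics import NormalDist


def null_sd(n):
    """sqrt(n(n+1)(2n+1)/24): standard deviation of W under H0."""
    return sqrt(n * (n + 1) * (2 * n + 1) / 24)


def rp_simplified(crit, n, mean_hat):
    """Variance fixed at its H0 value. With mean_hat = E_hat (analog) or
    E_tilde (plug-in), crit = w_alpha gives pi^S_1 or pi^S_2 and
    crit = w~_alpha gives pi^S_a1 or pi^S_a2."""
    return 1 - NormalDist().cdf((crit - mean_hat) / null_sd(n))


def rp_noether(n, alpha, p1_hat):
    """1 - Phi(z_{1-alpha} - sqrt(3n)(p1_hat - 1/2)): pi_N1 with the analog
    estimate of p1, pi_N2 with the plug-in estimate."""
    z = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z - sqrt(3 * n) * (p1_hat - 0.5))
```

At $p_1 = 1/2$ the Noether-type estimate equals $\alpha$, as expected at the boundary of $H_0$; and since $\hat E = W$, the rule based on $\hat\pi_1^S$ rejects exactly when $W > w_\alpha$.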
For the RP-based versions of the WSR test built on these semi-parametric RP-estimators, the following corollary (proven in the Supplementary Material) can be stated:
Corollary 4. 
The decision rules based on the RP-estimators $\hat\pi_1$ and $\hat\pi_1^S$ exactly replicate the exact WSR test $\Psi_\alpha$. Analogously, the decision rules based on the RP-estimators $\hat\pi_{a1}$ and $\hat\pi_{a1}^S$ exactly replicate the asymptotic WSR test $\tilde\Psi_\alpha$.
The RP-based decision rules stemming from the remaining semi-parametric RP-estimators (i.e., $\hat\pi_2$, $\hat\pi_2^S$, $\hat\pi_{a2}$, $\hat\pi_{a2}^S$, $\hat\pi_{N1}$ and $\hat\pi_{N2}$) do not replicate the exact/asymptotic WSR tests, and their disagreement probabilities are evaluated in Section 4.3.

4.2. Non-Parametric RP-Estimation and Testing for the WSR Test

As explained in Section 2, the RP of the exact and asymptotic WSR tests can be estimated by using (5) and (6), respectively. Here, we consider the nonparametric RP-estimators $\hat\pi_5^{PI}$, $\hat\pi_{10}^{PI}$, $\hat\pi_{20}^{PI}$, $\hat\pi_{a5}^{PI}$, $\hat\pi_{a10}^{PI}$ and $\hat\pi_{a20}^{PI}$. The first three coincide with (5) with $B = 500$, $1000$ and $2000$, respectively; the last three coincide with (6) with the same values of $B$. As mentioned in Section 2.3, the RP-based decision rules built on these estimators are not guaranteed to replicate the exact and asymptotic WSR tests, and their disagreement probabilities are evaluated in Section 4.3.
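For the WSR test, the Monte Carlo plug-in estimators amount to resampling the $Z_i$ from their ecdf and recording how often $W$ exceeds the critical value ($w_\alpha$ for (5), $\tilde w_\alpha$ for (6)). A minimal sketch with hypothetical names and toy data; resamples contain repeated values, and here $W$ is simply computed in its Walsh-average form, which is an assumption about tie handling:

```python
import random
from typing import Optional, Sequence


def wsr_statistic(z: Sequence[float]) -> int:
    """W = number of pairs i <= j with Z_i + Z_j > 0 (Walsh-average form)."""
    n = len(z)
    return sum(1 for i in range(n) for j in range(i, n) if z[i] + z[j] > 0)


def wsr_plugin_rp(x: Sequence[float], crit: float, B: int = 1000,
                  theta0: float = 0.0, seed: Optional[int] = None) -> float:
    """Rejection rate of the rule 'W > crit' over B resamples drawn with
    replacement from the ecdf of Z = X - theta0."""
    rng = random.Random(seed)
    z = [xi - theta0 for xi in x]
    n = len(z)
    hits = sum(wsr_statistic(rng.choices(z, k=n)) > crit for _ in range(B))
    return hits / B


# crit = 44 is w_alpha for n = 10 and alpha = 0.05 (exact null distribution).
x = [0.9, -0.3, 1.7, 0.4, 2.2, -1.1, 0.8, 1.5, -0.2, 0.6]  # toy data, n = 10
estimates = {B: wsr_plugin_rp(x, crit=44, B=B, seed=7) for B in (500, 1000, 2000)}
```

The keys 500, 1000 and 2000 correspond to the estimators with subscripts 5, 10 and 20; replacing 44 by $\tilde w_\alpha$ gives the asymptotic-test versions.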

4.3. Evaluating the Performances of the RP-Estimators for the WSR Test

To evaluate the performance of the RP-estimators introduced above for the exact and asymptotic WSR tests, a simulation study was carried out. The scenarios considered concern the testing problem $H_0: \theta^t \le 0$ vs. $H_1: \theta^t > 0$ with $\alpha = 0.05$. The sample sizes considered are $n = 15, 30, 60, 120, 240$. Data are drawn from the normal distribution with unit variance and mean (median) $\theta^t$, and from the shifted Cauchy distribution with median $\theta^t$. For each sample size and distribution (normal or Cauchy), 19 values of $\theta^t$ were considered. These values were obtained by simulation and chosen to provide the following prefixed values for the power of the exact/asymptotic test: $\alpha$, 0.10, 0.15, 0.20, 0.25, ..., 0.85, 0.90, 0.95. In each simulation, $10^4$ replications are considered.
The results of the simulation study are summarized in Tables S3 and S4 of the Supplementary Material, which report the averages (over the 19 values of $\theta^t$) of the simulated MSE, simulated bias and disagreement rate. Here, Table 2 reports only the simulated MSE and disagreement rate for the Cauchy distribution.
Note that the disagreement between the exact Wilcoxon signed rank test and its approximated version is often higher than 2% with $n = 15$ (up to 2.5%) and can reach 0.8% with $n = 30$ (see Table S2 of the Supplementary Material). By contrast, with $n = 15$, the averaged disagreement between the classical tests and their RP-based versions exceeds 2% for only two estimators, whereas for some estimators no disagreement occurs; with $n = 30$, some RP-estimators yield a disagreement between the classical test and the RP-based one slightly higher than 1%, but no disagreement occurs for two of them.
Regarding RP estimation, the estimators with the lowest overall MSE are $\hat\pi_{aS2}$ for the approximated test and $\hat\pi_{S2}$ for the exact test. However, these estimators do not exactly replicate the corresponding classical test. Considering both estimation performance and disagreement probability, we suggest using $\hat\pi_{aS1}$ for the approximated test and $\hat\pi_{S1}$ for the exact test, since their MSE is very close to that of $\hat\pi_{aS2}$ and $\hat\pi_{S2}$, while their disagreement probability is null. As a final remark, note the good performance of the non-parametric plug-in estimators, which is not far from that of the semi-parametric ones, even though they are not ad hoc estimators.

5. RP-Estimation and Testing for the Kendall Test of Monotonic Association

Let $(X, Y)$ be a bivariate continuous random variable with joint distribution $F_t$ and margins $F_{t,X}$ and $F_{t,Y}$. Let $(X, Y)_n = \{(X_i, Y_i),\ i = 1, \ldots, n\}$ be a random sample drawn from $F_t$. To test for the presence of positive or negative monotone association between $X$ and $Y$, the Kendall test can be adopted. Without loss of generality, consider the alternative hypothesis of positive monotone association. In that case, the testing problem of interest is $H_0: \tau_t \le 0$ vs. $H_1: \tau_t > 0$, where $\tau_t$ is the Kendall rank-correlation coefficient. Under the assumption of absolute continuity of $F_t$, $\tau_t$ is defined as the difference between the probability of concordance $p_1$ and the probability of discordance $p_{-1}$: $\tau_t = p_1 - p_{-1} = 2p_1 - 1$, with $p_1 = P_{F_t}[(X - X')(Y - Y') > 0]$, $p_{-1} = P_{F_t}[(X - X')(Y - Y') < 0] = 1 - p_1$ and $(X', Y')$ i.i.d. as $(X, Y)$. The test statistic is $\hat\tau = \frac{2K}{n(n-1)}$, where $K = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \operatorname{sign}(X_i - X_j)\cdot\operatorname{sign}(Y_i - Y_j)$, with $\operatorname{sign}(a) = 1$ if $a > 0$ and $\operatorname{sign}(a) = -1$ if $a < 0$. As for the WSR test, the Kendall test falls under Case (B). The exact distribution of $\hat\tau$ can be derived under $H_0$, by enumeration or by using a recurrence relation (see [21)], but under $H_1$ it can generally only be approximated through a central limit theorem. In particular, it is well known (see [22]) that $\frac{\hat\tau - E_{F_t}[\hat\tau]}{\sqrt{\mathrm{Var}_{F_t}[\hat\tau]}} \xrightarrow{d} N(0, 1)$, where:
$$E_{F_t}[\hat\tau] = \tau_t \tag{12}$$
and
$$\mathrm{Var}_{F_t}[\hat\tau] = u(\tau_t, p_2) = \frac{2}{n(n-1)}(1 - \tau_t^2) + \frac{4(n-2)}{n(n-1)}(2p_2 - 1 - \tau_t^2) \tag{13}$$
with
$$p_2 = P_{F_t}\left[(X - X')(Y - Y')(X - X'')(Y - Y'') > 0\right], \tag{14}$$
$(X', Y')$ and $(X'', Y'')$ being i.i.d. as $(X, Y)$.
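The statistic $\hat\tau$ and the variance function $u(\tau_t, p_2)$ of Expression (13) translate directly into code. In the sketch below, pairs tied in $X$ or in $Y$ contribute 0 to $K$; this is an assumption, since the definition above only covers continuous data, where ties have probability zero. The function names are hypothetical. Applied to the HDSF data of Table 4, it returns $\hat\tau = 0.5$, the value reported in Table 5, and the variance function reproduces the null variance $2(2n+5)/(9n(n-1))$ when $\tau_t = 0$ and $p_2 = 5/9$.

```python
def kendall_tau_hat(x, y):
    """tau_hat = 2K / (n(n-1)), with K = sum_{i<j} sign(x_i - x_j) * sign(y_i - y_j)."""
    def sign(a):
        return (a > 0) - (a < 0)  # sign(0) = 0: tied pairs contribute nothing (assumption)

    n = len(x)
    k = sum(sign(x[i] - x[j]) * sign(y[i] - y[j])
            for i in range(n - 1) for j in range(i + 1, n))
    return 2 * k / (n * (n - 1))


def kendall_var(tau, p2, n):
    """u(tau, p2): variance of tau_hat as in Expression (13)."""
    return (2 * (1 - tau ** 2) + 4 * (n - 2) * (2 * p2 - 1 - tau ** 2)) / (n * (n - 1))
```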
Note that, under $H_0$, $p_2 = 5/9$, and consequently $\mathrm{Var}_0[\hat\tau] = u(0, 5/9) = \frac{2(2n+5)}{9n(n-1)}$. These results allow the introduction of the exact and asymptotic Kendall tests
$$\Psi_\alpha((X,Y)_n) = \begin{cases} 1 & \text{if } \hat\tau > t_\alpha \\ 0 & \text{otherwise} \end{cases} \qquad \text{and} \qquad \tilde\Psi_\alpha((X,Y)_n) = \begin{cases} 1 & \text{if } \hat\tau > \tilde t_\alpha \\ 0 & \text{otherwise,} \end{cases}$$
where $t_\alpha$ denotes the $(1-\alpha)$-quantile of the exact null distribution of $\hat\tau$ and $\tilde t_\alpha = z_{1-\alpha}\sqrt{\frac{2(2n+5)}{9n(n-1)}}$.
Note that the computational burden of the exact null distribution of $\hat\tau$ increases rapidly with $n$. From a practical point of view, if $n < 9$ the exact test can be performed by computing the exact $(1-\alpha)$-quantile of the null distribution of $\hat\tau$ using, for example, the function qKendall of the R [23] package SuppDists [24]. If $n > 9$, the asymptotic test $\tilde\Psi_\alpha$ is generally performed, or an Edgeworth expansion (see [25]) is used to obtain a better approximation of $t_\alpha$. The $(1-\alpha)$-quantile of the null distribution of $\hat\tau$ approximated by the Edgeworth expansion is also computed by qKendall. When $n > 9$, it is common practice to refer to the test based on the Edgeworth expansion as the "exact" Kendall test, even though it is actually an approximated test. From here onwards, this common terminology is adopted.
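For small $n$, the exact null distribution of $\hat\tau$ can be obtained without enumerating all $n!$ permutations. Under $H_0$ the ranks of $Y$, taken in the order of $X$, form a uniformly random permutation, and $K = n(n-1)/2 - 2I$, where $I$ is the number of inversions of that permutation; the number of permutations with a given number of inversions satisfies a simple recurrence. The sketch below uses this standard inversion-count recurrence (which may differ from the one in [21]) to compute $t_\alpha$ as the smallest $t$ with $P_0(\hat\tau \le t) \ge 1 - \alpha$, together with $\tilde t_\alpha$. The function names are hypothetical. For $n = 9$ it returns 0.3333, 0.4444 and 0.6111 at $\alpha = 0.1, 0.05, 0.01$, matching the critical values in Table 5.

```python
import math
from statistics import NormalDist


def inversion_counts(n):
    """counts[k] = number of permutations of n items with exactly k inversions."""
    counts = [1]
    for m in range(2, n + 1):
        new = [0] * (len(counts) + m - 1)
        for k, c in enumerate(counts):
            for j in range(m):  # inserting the m-th (largest) item adds 0..m-1 inversions
                new[k + j] += c
        counts = new
    return counts


def exact_kendall_quantile(n, alpha):
    """t_alpha: smallest t with P0(tau_hat <= t) >= 1 - alpha, from the exact null law."""
    counts = inversion_counts(n)
    total = math.factorial(n)
    pairs = n * (n - 1) // 2
    cum = 0
    for inv in range(pairs, -1, -1):  # tau_hat = (pairs - 2*inv) / pairs increases as inv falls
        cum += counts[inv]
        if cum >= (1 - alpha) * total:
            return (pairs - 2 * inv) / pairs
    return 1.0


def asymptotic_kendall_threshold(n, alpha):
    """t~_alpha = z_{1-alpha} * sqrt(2(2n+5) / (9n(n-1)))."""
    return NormalDist().inv_cdf(1 - alpha) * math.sqrt(2 * (2 * n + 5) / (9 * n * (n - 1)))
```

The recurrence costs roughly $O(n^4)$ operations, so it stays cheap well beyond $n = 9$; in R, qKendall of SuppDists remains the practical route described above.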
Obviously, the exact and asymptotic tests are not equivalent, and their disagreement is again evaluated using Expression (4). Table S5 of the Supplementary Material reports the probabilities of disagreement between the asymptotic and exact (Edgeworth-based) tests, computed with $\alpha = 0.05$ for several values of $n$ and $\tau_t$, when sampling from the bivariate normal distribution with correlation coefficient $\rho$ and from the bivariate Student's t distribution with three degrees of freedom (df) and correlation coefficient $\rho$.

5.1. Semi-Parametric RP-Estimation and Testing for the Kendall Test

The exact power function of the Kendall test cannot generally be determined, but it can be approximated thanks to the asymptotic normality of $\hat\tau$. In particular, the approximation of the power function of the exact test $\Psi_\alpha((X,Y)_n)$ is $\tilde\pi(n, \alpha, F_t) = 1 - \Phi\left(\frac{t_\alpha - E_{F_t}[\hat\tau]}{\sqrt{\mathrm{Var}_{F_t}[\hat\tau]}}\right)$. Analogously, the approximation of the power function of the asymptotic test $\tilde\Psi_\alpha((X,Y)_n)$ is $\pi_a(n, \alpha, F_t) = 1 - \Phi\left(\frac{\tilde t_\alpha - E_{F_t}[\hat\tau]}{\sqrt{\mathrm{Var}_{F_t}[\hat\tau]}}\right)$.
Now, in order to define semi-parametric RP-estimators starting from the approximated power functions reported above, estimators for $E_{F_t}[\hat\tau]$ and $\mathrm{Var}_{F_t}[\hat\tau]$ are needed. From Expressions (12) and (13), it follows that $E_{F_t}[\hat\tau]$ can be estimated by $\hat\tau$, while an estimator for $\mathrm{Var}_{F_t}[\hat\tau]$ can be introduced once an estimator for $p_2$ has been defined. Two different estimators for $p_2$ are considered here:
  • Analogic estimators: recalling Expression (14), the analogic estimator for $p_2$ is $\hat p_2 = \frac{1}{n(n-1)(n-2)} \sum_{\substack{i, j, k = 1 \\ i, j, k \text{ distinct}}}^{n} I_{ijk}$, where
    $I_{ijk} = \begin{cases} 1 & \text{if } (x_i - x_j)(x_i - x_k)(y_i - y_j)(y_i - y_k) > 0 \\ 0 & \text{otherwise.} \end{cases}$
  • Plug-in estimators: in order to introduce these estimators, the following alternative expression for $p_2$ is useful: $p_2 = \iint \left[p(x,y)^2 + (1 - p(x,y))^2\right] f_t(x,y)\,dx\,dy = E_{F_t}\left[p(X,Y)^2 + (1 - p(X,Y))^2\right]$, where $p(x,y) = P_{F_t}[X \le x, Y \le y] + P_{F_t}[X > x, Y > y] = 1 - F_{t,X}(x) - F_{t,Y}(y) + 2F_t(x,y)$.
    Now, let $\hat F_{nX}$, $\hat F_{nY}$ and $\hat F_n$ be the ecdfs of $X$, $Y$ and $(X, Y)$, respectively. The plug-in estimator for $p_2$ is $\tilde p_2 = \frac{1}{n}\sum_{i=1}^{n}\left[\tilde p(x_i, y_i)^2 + (1 - \tilde p(x_i, y_i))^2\right]$, where $\tilde p(x,y) = 1 - \hat F_{nX}(x) - \hat F_{nY}(y) + 2\hat F_n(x,y)$.
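Both estimators of $p_2$ can be coded directly from their definitions. The sketch below is a straightforward $O(n^3)$ implementation of $\hat p_2$ and an $O(n^2)$ implementation of $\tilde p_2$, written for clarity rather than speed; the function names are hypothetical. For perfectly concordant data both estimators equal 1, in line with $p_2 = 1$ when $\tau_t = 1$.

```python
def p2_analogic(x, y):
    """p^_2: share of ordered triples (i, j, k) of distinct indices with I_ijk = 1."""
    n = len(x)
    hits = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i != j and i != k and j != k:
                    prod = (x[i] - x[j]) * (x[i] - x[k]) * (y[i] - y[j]) * (y[i] - y[k])
                    hits += prod > 0
    return hits / (n * (n - 1) * (n - 2))


def p2_plugin(x, y):
    """p~_2: mean over the sample of p~(x_i, y_i)^2 + (1 - p~(x_i, y_i))^2, using ecdfs."""
    n = len(x)
    total = 0.0
    for xi, yi in zip(x, y):
        fx = sum(xk <= xi for xk in x) / n                              # F^_nX(x_i)
        fy = sum(yk <= yi for yk in y) / n                              # F^_nY(y_i)
        fxy = sum(xk <= xi and yk <= yi for xk, yk in zip(x, y)) / n   # F^_n(x_i, y_i)
        p = 1 - fx - fy + 2 * fxy
        total += p ** 2 + (1 - p) ** 2
    return total / n
```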
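Now, the following RP-estimators for the exact test can be introduced: $\hat\pi_1 = 1 - \Phi\left(\frac{t_\alpha - \hat\tau}{\sqrt{\hat U}}\right)$ and $\hat\pi_2 = 1 - \Phi\left(\frac{t_\alpha - \hat\tau}{\sqrt{\tilde U}}\right)$, where $\hat U = u(\hat\tau, \hat p_2)$ and $\tilde U = u(\hat\tau, \tilde p_2)$. Analogously, the following RP-estimators for the asymptotic test can be introduced: $\hat\pi_{a1} = 1 - \Phi\left(\frac{\tilde t_\alpha - \hat\tau}{\sqrt{\hat U}}\right)$ and $\hat\pi_{a2} = 1 - \Phi\left(\frac{\tilde t_\alpha - \hat\tau}{\sqrt{\tilde U}}\right)$.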
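All four estimators share the form $1 - \Phi\left((c - \hat\tau)/\sqrt{V}\right)$, with $c = t_\alpha$ or $\tilde t_\alpha$ and $V = \hat U$ or $\tilde U$. The sketch below makes one consequence explicit: as long as $V > 0$, the estimate exceeds 1/2 exactly when $\hat\tau > c$, so the RP-testing rule reproduces the decision of the corresponding classical test. This is consistent with the null disagreement of $\hat\pi_1$, $\hat\pi_2$, $\hat\pi_{a1}$ and $\hat\pi_{a2}$ in Table 3. The helper names are hypothetical, and the sketch assumes $V > 0$ (the estimated variance can fail to be positive, for instance $\hat U = 0$ when $\hat\tau = 1$ and $\hat p_2 = 1$).

```python
import math
from statistics import NormalDist


def kendall_var(tau, p2, n):
    """u(tau, p2) from Expression (13)."""
    return (2 * (1 - tau ** 2) + 4 * (n - 2) * (2 * p2 - 1 - tau ** 2)) / (n * (n - 1))


def rp_estimate(threshold, tau_hat, var_hat):
    """1 - Phi((threshold - tau_hat) / sqrt(var_hat)); assumes var_hat > 0.

    threshold = t_alpha gives pi^_1 (var_hat = U^) or pi^_2 (var_hat = U~);
    threshold = t~_alpha gives pi^_a1 or pi^_a2.
    """
    return 1 - NormalDist().cdf((threshold - tau_hat) / math.sqrt(var_hat))
```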
As for the WSR test, the approximated power of nonparametric tests can be simplified following Noether's idea, i.e., by assuming that the variance of the test statistic is close to its value under $H_0$. This yields the estimators $\hat\pi_S = 1 - \Phi\left(\frac{t_\alpha - \hat\tau}{\sqrt{2(2n+5)/(9n(n-1))}}\right)$ and $\hat\pi_{aS} = 1 - \Phi\left(\frac{\tilde t_\alpha - \hat\tau}{\sqrt{2(2n+5)/(9n(n-1))}}\right)$ for the exact and asymptotic tests, respectively. Note that these estimators are very simple, since they do not require the estimation of $p_2$. Another approach to approximating the power functions $\tilde\pi(n, \alpha, F_t)$ and $\pi_a(n, \alpha, F_t)$ is described below. From Expression (14), it is clear that $p_2$ depends on $\tau_t$, but there is no unique function describing $p_2$ in terms of $\tau_t$, since the relation between $p_2$ and $\tau_t$ depends on the entire shape of $F_t$. However, if $\tau_t = \pm 1$, then $p_2 = 1$, while if $\tau_t = 0$, then $p_2 = 5/9$. The relation between $\tau_t$ and $p_2$ can then be heuristically represented by the parabola passing through the points $(-1, 1)$, $(0, 5/9)$ and $(1, 1)$: $p_2 = 1 - \frac{4}{9}(1 - \tau_t^2)$. Substituting this expression into (13), one obtains the RP-estimators $\hat\pi_L = 1 - \Phi\left(\frac{t_\alpha - \hat\tau}{\sqrt{\frac{2(2n+5)}{9n(n-1)}(1 - \hat\tau^2)}}\right)$ and $\hat\pi_{aL} = 1 - \Phi\left(\frac{\tilde t_\alpha - \hat\tau}{\sqrt{\frac{2(2n+5)}{9n(n-1)}(1 - \hat\tau^2)}}\right)$ for the exact and asymptotic tests, respectively.
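The substitution step can be written out explicitly. With the parabola $p_2 = 1 - \frac{4}{9}(1 - \tau_t^2)$, the term $2p_2 - 1 - \tau_t^2$ in (13) becomes proportional to $1 - \tau_t^2$, and the variance collapses to the null variance rescaled by $1 - \tau_t^2$:

```latex
\begin{aligned}
2p_2 - 1 - \tau_t^2
  &= 2\left[1 - \tfrac{4}{9}(1 - \tau_t^2)\right] - 1 - \tau_t^2
   = (1 - \tau_t^2) - \tfrac{8}{9}(1 - \tau_t^2)
   = \tfrac{1}{9}(1 - \tau_t^2),\\
u\!\left(\tau_t,\ 1 - \tfrac{4}{9}(1 - \tau_t^2)\right)
  &= \frac{1 - \tau_t^2}{n(n-1)}\left[2 + \frac{4(n-2)}{9}\right]
   = \frac{2(2n+5)}{9n(n-1)}\,(1 - \tau_t^2).
\end{aligned}
```

Replacing $\tau_t$ with $\hat\tau$ gives the denominators of $\hat\pi_L$ and $\hat\pi_{aL}$; at $\tau_t = 0$ the expression reduces to $\mathrm{Var}_0[\hat\tau]$, so $\hat\pi_L$ and $\hat\pi_S$ differ only through the factor $1 - \hat\tau^2$.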
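For completeness, Noether's estimator [20] is also considered and applied to both the exact and asymptotic tests: $\hat\pi_N = 1 - \Phi\left(z_{1-\alpha} - \frac{3}{2}\sqrt{n}\,\hat\tau\right)$. Finally, the estimators deduced from the power approximation provided in [2] are considered: $\hat\pi_{C1} = 1 - \Phi\left(\frac{z_{1-\alpha}/3 - \sqrt{n}\,\hat\tau/2}{\sqrt{2\hat p_2 - 1 - \hat\tau^2}}\right)$ and $\hat\pi_{C2} = 1 - \Phi\left(\frac{z_{1-\alpha}/3 - \sqrt{n}\,\hat\tau/2}{\sqrt{2\tilde p_2 - 1 - \hat\tau^2}}\right)$.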
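Both $\hat\pi_N$ and $\hat\pi_{C1}$, $\hat\pi_{C2}$ exceed 1/2 exactly when $\hat\tau > 2z_{1-\alpha}/(3\sqrt{n})$, i.e., they implicitly use the large-sample null standard deviation $2/(3\sqrt{n})$ in place of $t_\alpha$ or $\tilde t_\alpha$. The sketch below (helper names hypothetical; it assumes $2\hat p_2 - 1 - \hat\tau^2 > 0$) checks this shared cut-off numerically. A common cut-off would explain why these three estimators show identical disagreement rates in Table 3.

```python
import math
from statistics import NormalDist


def rp_noether(tau_hat, n, alpha=0.05):
    """pi^_N = 1 - Phi(z_{1-alpha} - (3/2) sqrt(n) tau_hat)."""
    z = NormalDist().inv_cdf(1 - alpha)
    return 1 - NormalDist().cdf(z - 1.5 * math.sqrt(n) * tau_hat)


def rp_chow(tau_hat, p2_hat, n, alpha=0.05):
    """pi^_C = 1 - Phi((z_{1-alpha}/3 - sqrt(n) tau_hat/2) / sqrt(2 p2 - 1 - tau_hat^2)).

    Pass p2_hat = p^_2 for pi^_C1 and p2_hat = p~_2 for pi^_C2;
    assumes 2 * p2_hat - 1 - tau_hat**2 > 0.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    s = math.sqrt(2 * p2_hat - 1 - tau_hat ** 2)
    return 1 - NormalDist().cdf((z / 3 - math.sqrt(n) * tau_hat / 2) / s)
```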

5.2. Non-Parametric RP-Estimation and Testing for the Kendall Test

As for the WSR test, the RP of the exact and asymptotic Kendall tests can be estimated by using the non-parametric RP-estimators $\hat\pi_{5}^{PI}$, $\hat\pi_{10}^{PI}$, $\hat\pi_{20}^{PI}$, $\hat\pi_{a5}^{PI}$, $\hat\pi_{a10}^{PI}$ and $\hat\pi_{a20}^{PI}$. It is recalled once again that the RP-based decision rules based on these estimators do not replicate the exact and asymptotic Kendall tests, respectively, and their disagreement probabilities are evaluated next.

5.3. Evaluating the Performances of the RP-Estimators for the Kendall Test

In order to evaluate the performances of the RP-estimators introduced above for the exact and asymptotic Kendall tests, a simulation study was carried out. The scenarios concern the testing problem $H_0: \tau_t \le 0$ vs. $H_1: \tau_t > 0$. The sample sizes considered are $n = 15, 30, 60, 120$. Data are drawn from the bivariate standard normal distribution with correlation coefficient $\rho$ and from the bivariate Student's t distribution with three df and correlation coefficient $\rho$. For each sample size and distribution, 19 values of $\rho$ were considered. These values were obtained by simulation and chosen so as to provide the following prefixed values for the power of the exact/asymptotic test: $(\alpha, 0.10, 0.15, 0.20, 0.25, \ldots, 0.85, 0.90, 0.95)$. Each simulation uses $10^4$ replications. The results are summarized in Tables S6 and S7 of the Supplementary Material, which report the averages (over the 19 values of $\rho$) of the simulated MSE, simulated bias and disagreement rate. Table 3 reports only the simulated MSE and disagreement rate obtained under the bivariate Student's t distribution with three df.
As for the binomial and sign tests, the disagreement between the exact Kendall test and its approximated versions is quite high: often higher than 5% and in some cases higher than 10%, both with $n = 15$ and $n = 30$. The disagreement is still remarkable even with $n = 120$. By contrast, the averaged disagreement between the classical asymptotic test and its RP-based version lies between 3% and 7% for just three estimators, for each sample size, whereas the other estimators provide a disagreement often lower than 1%. The disagreement between the classical exact test and its RP-based version is slightly higher than in the asymptotic case, but still lower than the disagreement between the classical tests.
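The scenarios are indexed by $\rho$. For absolutely continuous elliptical distributions, such as the bivariate normal and the bivariate Student's t, $\tau_t = \frac{2}{\pi}\arcsin(\rho)$ (the relation quoted in the caption of Table S5), so each $\rho$ maps to a value of $\tau_t$. The sketch below shows this mapping and a simple way to draw from the bivariate normal with correlation $\rho$; it does not reproduce the calibration of the 19 values of $\rho$, which was done by simulation, and the function names are hypothetical.

```python
import math
import random


def tau_from_rho(rho):
    """Kendall's tau of an elliptical distribution with correlation rho: (2/pi) * arcsin(rho)."""
    return 2 / math.pi * math.asin(rho)


def rho_from_tau(tau):
    """Inverse map: rho = sin(pi * tau / 2)."""
    return math.sin(math.pi * tau / 2)


def bivariate_normal_sample(n, rho, seed=0):
    """n draws (x, y) from a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        out.append((z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2))
    return out
```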
Regarding RP estimation, the simulation results suggest that the best estimators for the approximated and exact tests are $\hat\pi_{a2}$ and $\hat\pi_2$, respectively. Indeed, these two estimators generally have the lowest MSE and a null probability of disagreement. Here too, the good performance of the general non-parametric plug-in estimators should be noted.

6. Example of Applications

Let us consider the data reported in Table 4 (see [26], p. 38), concerning the Hamilton depression scale factor (HDSF) of nine patients with mixed anxiety and depression, observed at a first visit before the start of therapy (X) and at a second visit after administration of a tranquilizer (Y). An improvement due to the tranquilizer corresponds to a reduction of the HDSF. Six patients out of nine showed a reduction, one was almost unchanged, and two showed small increases. The sign test and the WSR test were applied to evaluate the HDSF reduction, and the Kendall test to evaluate the association between X and Y.
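The observed statistics and exact p-values of the sign and WSR tests in Table 5 can be recomputed from Table 4. The sketch below takes the differences $d_i = x_i - y_i$ (a positive difference is a reduction of the HDSF), counts the reductions for the sign test, and obtains the exact null distribution of $W$ by counting subsets of $\{1, \ldots, 9\}$ with a given rank sum. It gives $B_{ob} = 7$ with p-value $46/512 \approx 0.0898$ and $W_{ob} = 40$ with p-value $10/512 \approx 0.0195$. Variable names are illustrative; the rank computation assumes no ties and no zero differences, which holds for these data.

```python
from math import comb

x = [1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30]            # first visit (X)
y = [0.878, 0.647, 0.598, 2.050, 1.060, 1.290, 1.060, 3.140, 1.290]   # second visit (Y)
d = [xi - yi for xi, yi in zip(x, y)]  # d > 0 means a reduction of the HDSF
n = len(d)

# Sign test: B = number of reductions; exact p-value P(Bin(n, 1/2) >= B).
b_ob = sum(di > 0 for di in d)
p_sign = sum(comb(n, k) for k in range(b_ob, n + 1)) / 2 ** n

# WSR test: W = sum of the ranks of |d_i| over the reductions (no ties or zeros here).
ranks = {i: r + 1 for r, i in enumerate(sorted(range(n), key=lambda i: abs(d[i])))}
w_ob = sum(ranks[i] for i in range(n) if d[i] > 0)


def count_subsets_with_sum_at_least(n, w):
    """Number of subsets of {1, ..., n} with sum >= w (exact null law of W, up to 2**n)."""
    ways = [1] + [0] * (n * (n + 1) // 2)
    for r in range(1, n + 1):
        for s in range(len(ways) - 1, r - 1, -1):  # reverse loop: each rank used at most once
            ways[s] += ways[s - r]
    return sum(ways[w:])


p_wsr = count_subsets_with_sum_at_least(n, w_ob) / 2 ** n
```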
For each test, the RP estimates given by the best semi-parametric estimator (among those studied above) and by the nonparametric $\hat\pi_{20}^{PI}$ are computed at three levels of α: 0.01, 0.05 and 0.1 (see Table 5). First, note that RP estimates decrease as the tests become stricter, i.e., as α decreases. Second, the RP estimates fulfill RP-testing. Third, RP estimates may differ from one technique to another: the nonparametric technique is not the most reliable, but it is general, whereas the best RP estimation technique has to be tailored to each test.
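The second remark can be checked directly on the values of Table 5: at each α, the RP-testing decision (reject $H_0$ iff the RP estimate exceeds 1/2) should coincide with the classical decision (reject iff the observed statistic exceeds the critical value). The sketch below transcribes the table and performs this check for both estimators of each test; the names are illustrative. The sign test at α = 0.05, where $B_{ob} = c_{0.05} = 7$ and $\hat\pi = 0.5$, is the boundary case in which both rules accept $H_0$.

```python
# Table 5: observed statistic, then (alpha, critical value, semi-parametric RP,
# nonparametric plug-in RP) for each significance level.
table5 = {
    "sign": (7, [(0.10, 6, 0.7905, 0.6781),
                 (0.05, 7, 0.5, 0.3719),
                 (0.01, 8, 0.1683, 0.1042)]),
    "wsr": (40, [(0.10, 34, 0.7614, 0.8835),
                 (0.05, 36, 0.6822, 0.7435),
                 (0.01, 41, 0.4528, 0.4505)]),
    "kendall": (0.5, [(0.10, 0.3333, 0.6979, 0.6930),
                      (0.05, 0.4444, 0.5685, 0.5495),
                      (0.01, 0.6111, 0.3648, 0.2615)]),
}


def rp_testing_agrees(stat, crit, rp):
    """True if 'reject iff RP > 1/2' gives the same decision as 'reject iff stat > crit'."""
    return (rp > 0.5) == (stat > crit)


results = {
    test: [rp_testing_agrees(stat, crit, rp_semi) and rp_testing_agrees(stat, crit, rp_pi)
           for _alpha, crit, rp_semi, rp_pi in rows]
    for test, (stat, rows) in table5.items()
}
print(results)  # one True/False per significance level, for each test
```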
As concerns the interpretation of the results, RP estimates highlight that significant outcomes are often less reproducible than one may think. For example, when α = 0.05, the significance threshold of the WSR test with n = 9 is $w_{0.05} = 36$, and the significant result $W_{ob} = 40$, although its p-value is quite small (about 0.02), has an RP estimate of about 2/3: under the same experimental conditions, about one replication of the test out of three is estimated not to reach significance.
On the other hand, non-significant outcomes might be highly variable, and significance can be found with non-negligible probability when the experiment is replicated. Continuing the example above, if α were 0.01, the observed test statistic would give a non-significant p-value of about twice α, but also an RP estimate not far from 50%.
Finally, we remark that even when p-values are considerably smaller than α, RP estimates may not be high, that is, the test results (viz. significances) are estimated to be quite variable. For example, when p-values are about one order of magnitude smaller than α, RP estimates are still only about 80%.

7. Conclusions

Several results have been obtained, concerning both the precision of RP estimators and the performance of RP-testing for the binomial, sign, Wilcoxon signed rank and Kendall tests.
For both the binomial and sign tests, the RP-testing rule holds exactly, even when nonparametric estimators of the RP are adopted. In terms of estimation performance, semi-parametric and nonparametric estimators behave similarly.
For the WSR and Kendall tests, the RP-testing rule holds exactly only for some RP estimators, and for the remaining ones the disagreement is very small. It is worth noting that the disagreement between these two classical exact tests and their respective approximated versions is often higher than the disagreement between the classical tests (exact or approximated) and their RP-based versions.
In general, the disagreement between the several classes of tests decreases when the sample size increases.
Concerning the variability of RP estimators, there is no overall best performer for the WSR and Kendall tests. Nevertheless, the estimators with good estimation performance also show low or null disagreement and mainly belong to the semi-parametric family. Nonparametric estimators show slightly higher variability and disagreement than the best semi-parametric ones, but have the advantage of being general, since they can be adopted even when the power function has not been studied and parametrized.
To conclude, several practical solutions for estimating the RP and performing RP-testing for the most commonly used nonparametric tests, exact or approximated, are provided. The RP-testing rule is shown to extend easily to these nonparametric tests. Further developments in RP estimation might concern the application of Bayesian techniques in the nonparametric context, since in the parametric one they have shown promising improvements when uninformative priors are adopted [27]. Furthermore, prediction intervals may be considered for nonparametric RP estimation (see [28]); in particular, it would be interesting to link prediction intervals, which provide likely values of future RP estimates once the experimental data have been observed, to the RP-testing rule.

Supplementary Materials

The following are available online at www.mdpi.com/1099-4300/18/4/142/s1.
Figure S1. Bias (left) and MSE (right) of the RP estimators $\hat\pi_e$ (solid), $\hat\pi_a$ (dashed), $\hat\pi_e^{PI}$ (dotted) and $\hat\pi_{a,e}^{PI}$ (dot-dashed). The MSE and the bias are computed considering the testing problem (3.1) with $\alpha = 0.05$, $p_0 = 0.2$ and $n = (5, 10, 15)$. (a) $p_0 = 0.2$, $n = 5$; (b) $p_0 = 0.2$, $n = 5$; (c) $p_0 = 0.2$, $n = 15$; (d) $p_0 = 0.2$, $n = 15$; (e) $p_0 = 0.2$, $n = 30$; (f) $p_0 = 0.2$, $n = 30$.
Figure S2. Bias (left) and MSE (right) of the RP estimators $\hat\pi_e$ (solid), $\hat\pi_a$ (dashed), $\hat\pi_e^{PI}$ (dotted) and $\hat\pi_{a,e}^{PI}$ (dot-dashed). The MSE and the bias are computed considering the testing problem (3.1) with $\alpha = 0.05$, $p_0 = 0.5$ and $n = (5, 10, 15)$. (a) $p_0 = 0.5$, $n = 5$; (b) $p_0 = 0.5$, $n = 5$; (c) $p_0 = 0.5$, $n = 15$; (d) $p_0 = 0.5$, $n = 15$; (e) $p_0 = 0.5$, $n = 30$; (f) $p_0 = 0.5$, $n = 30$.
Table S1. Probability of disagreement $D(p_t, n, \alpha, p_0)$ between the tests $\Psi_\alpha(X_n)$ and $\tilde\Psi_\alpha(X_n)$ evaluated for $\alpha = 0.05$, $n = (5, 15, 30)$, $p_0 = (0.2, 0.5)$ and $p_t = (0.1, 0.2, \ldots, 0.9)$.
Table S2. Probability of disagreement $D(\alpha, n, F_{\theta_t}, \theta_0)$ between the tests $\Psi_\alpha(X_n)$ and $\tilde\Psi_\alpha(X_n)$ evaluated for $\alpha = 0.05$, $n = (15, 30, 60, 120, 240)$ and $\theta_0 = 0$, assuming that $F_{\theta_t}$ is $N(\theta_t, 1)$ or $\mathrm{Cauchy}(\theta_t)$ with $\theta_t = (0.0, 0.1, 0.2, \ldots, 0.9)$.
Table S3. Averaged MSE, bias and disagreement rate for the asymptotic and exact WSR tests when sampling from the normal distribution. The averages are computed over the 19 different values of θ considered in the simulation study. The smallest values of the averaged MSE, bias and disagreement are highlighted in bold.
Table S4. Averaged MSE, bias and disagreement rate for the asymptotic and exact WSR tests when sampling from the Cauchy distribution. The averages are computed over the 19 different values of θ considered in the simulation study. The smallest values of the averaged MSE, bias and disagreement are highlighted in bold.
Table S5. Probability of disagreement between $\tilde\Psi_\alpha((X,Y)_n)$ and $\Psi_\alpha((X,Y)_n)$ when $\alpha = 0.05$ and $n = (15, 30, 60, 120)$, assuming that $F_t$ is the bivariate normal or the bivariate Student's t (3 df) with correlation coefficient $\rho = (0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6)$. The values of $\tau_t$ corresponding to the considered values of $\rho$ are reported in the second column of the table; they were obtained using the relation $\tau_t = \frac{2}{\pi}\arcsin(\rho)$, which holds for all absolutely continuous elliptical distributions.
Table S6. Averaged MSE, bias and disagreement rate for the asymptotic and exact Kendall tests when sampling from the Gaussian copula. The averages are computed over the 19 different values of ρ considered in the simulation study. The smallest values of the averaged MSE, bias and disagreement are highlighted in bold.
Table S7. Averaged MSE, bias and disagreement rate for the asymptotic and exact Kendall tests when sampling from the t copula with 3 degrees of freedom. The averages are computed over the 19 different values of ρ considered in the simulation study. The smallest values of the averaged MSE, bias and disagreement are highlighted in bold.

Acknowledgments

We thank the Editors, J. Stern and A. Polpo, for the good timing of this Special Issue, and also thank two anonymous reviewers for their constructive comments which helped us to improve the manuscript.

Author Contributions

Lucio De Capitani and Daniele De Martini wrote together Sections: 2, 4.2, 4.3, 5.2, 5.3 and 7, and also conceived and designed numerical experiments. Lucio De Capitani performed numerical computations and wrote Sections: 3, 4.1, 5.1, 6 and supplementary material. Daniele De Martini wrote Section 1. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Goodman, S.N. A comment on replication, p-values and evidence. Stat. Med. 1992, 11, 875–879.
  2. Chow, S.C.; Shao, J.; Wang, H. Sample Size Calculation in Clinical Research; Marcel Dekker: New York, NY, USA, 2003.
  3. De Martini, D. Stability Criteria for the Outcomes of Statistical Tests to Assess Drug Effectiveness with a Single Study. Pharm. Stat. 2012, 11, 273–279.
  4. De Martini, D. Success Probability Estimation with Applications to Clinical Trials; John Wiley & Sons: Hoboken, NJ, USA, 2013.
  5. Hsieh, T.C.; Chow, S.C.; Yang, L.Y.; Chi, E. The evaluation of biosimilarity index based on reproducibility probability for assessing follow-on biologics. Stat. Med. 2013, 32, 406–414.
  6. Shao, J.; Chow, S.C. Reproducibility Probability in Clinical Trials. Stat. Med. 2002, 21, 1727–1742.
  7. De Capitani, L. An introduction to RP-testing. Epidemiol. Biostat. Public Health 2013, 10.
  8. De Martini, D. Reproducibility Probability Estimation for Testing Statistical Hypotheses. Stat. Probab. Lett. 2008, 78, 1056–1061.
  9. Berger, J. Could Fisher, Jeffreys and Neyman Have Agreed on Testing? Stat. Sci. 2003, 18, 1–12.
  10. Berger, J.; Sellke, T. Testing a point null hypothesis: The irreconcilability of P-values and evidence. J. Am. Stat. Assoc. 1987, 82, 112–122.
  11. Hubbard, R.; Bayarri, M.J. Confusion over measures of evidence (ps) versus errors (αs) in classical statistical testing. Am. Stat. 2003, 57, 171–178.
  12. Hubbard, R.; Lindsay, R.M. Why p-values are not a useful measure of evidence in statistical significance testing. Theory Psychol. 2008, 18, 69–88.
  13. Schervish, M.J. p-values: What they are and what they are not. Am. Stat. 1996, 50, 203–206.
  14. De Capitani, L.; De Martini, D. On stochastic orderings of the Wilcoxon Rank Sum test statistic, with applications to reproducibility probability estimation testing. Stat. Probab. Lett. 2011, 81, 937–946.
  15. De Capitani, L.; De Martini, D. Reproducibility probability estimation and testing for the Wilcoxon rank-sum test. J. Stat. Comput. Simul. 2015, 85, 468–493.
  16. Van de Wiel, M.A.; Di Bucchianico, A.; van der Laan, A. Exact distributions of nonparametric test statistics and computer algebra. J. R. Stat. Soc. Series D 1999, 48, 507–551.
  17. Efron, B.; Tibshirani, R.J. An Introduction to the Bootstrap; Chapman & Hall: New York, NY, USA, 1993.
  18. De Martini, D. Conservative Sample Size Estimation in Nonparametrics. J. Biopharm. Stat. 2011, 21, 24–41.
  19. Lehmann, E.L. Nonparametrics: Statistical Methods Based on Ranks; Chapman & Hall: New York, NY, USA, 1998.
  20. Noether, G.E. Sample size determination for some common non-parametric tests. J. Am. Stat. Assoc. 1987, 82, 645–647.
  21. Kaarsemaker, L.; van Wijngaarden, A. Tables for use in rank correlation. Stat. Neerlandica 1953, 7, 41–54.
  22. Gibbons, J.D.; Chakraborti, S. Nonparametric Statistical Inference; Dekker: New York, NY, USA, 2003.
  23. The R Project for Statistical Computing. Available online: http://www.R-project.org/ (accessed on 8 April 2016).
  24. Wheeler, B. SuppDists: Supplementary Distributions. Available online: http://CRAN.R-project.org/package=SuppDists (accessed on 8 April 2016).
  25. Best, D.J.; Gipps, P.G. Algorithm AS 71: The Upper Tail Probabilities of Kendall's Tau. J. R. Stat. Soc. 1974, 23, 98–100.
  26. Hollander, M.; Wolfe, D.A. Nonparametric Statistical Methods, 2nd ed.; Wiley: Weinheim, Germany, 1999.
  27. De Capitani, L.; De Martini, D. Improving Reproducibility Probability estimation, preserving RP-testing. Available online: https://boa.unimib.it/handle/10281/105595 (accessed on 14 April 2016).
  28. Coolen, F.P.; Bin Himd, S. Nonparametric predictive inference for reproducibility of basic nonparametric tests. J. Stat. Theory Pract. 2014, 8, 591–618.
Figure 1. MSE curves of the reproducibility probability (RP) estimators $\hat\pi_e$ (solid), $\hat\pi_a$ (dashed), $\hat\pi_e^{PI}$ (dotted) and $\hat\pi_{a,e}^{PI}$ (dot-dashed). The MSE curves are computed considering the testing problem (7) with $\alpha = 0.05$, $p_0 = 0.2, 0.5$ and $n = 15$. (a) $p_0 = 0.2$, $n = 15$; (b) $p_0 = 0.5$, $n = 30$.
Table 1. Possible approaches to compute the power of a test under the different scenarios related to Cases (A) and (B). The cells with gray background represent the possible approaches commonly employed in practice.
| | Case (A), Exact Test ($\Psi_\alpha$) | Case (A), Asymptotic Test ($\tilde\Psi_\alpha$) | Case (B), Exact Test ($\Psi_\alpha$) | Case (B), Asymptotic Test ($\tilde\Psi_\alpha$) |
| Computation of the exact power | Possible, Case (A.1) | Possible (not considered) | Not possible | Not possible |
| Computation of the approximated power | Possible (not considered) | Possible, Case (A.2) | Possible, Case (B.1) | Possible, Case (B.2) |
Table 2. Averaged MSE and disagreement rate for the asymptotic and exact Wilcoxon signed rank (WSR) test when sampling from the Cauchy distribution. The averages are computed over the 19 different values of θ considered in the simulation study. The smallest values for the averaged MSE and disagreement are highlighted in bold.
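RP-estimation and Testing for the Asymptotic WSR Test
| RP-est. | MSE (n = 15) | D (n = 15) | MSE (n = 30) | D (n = 30) | MSE (n = 60) | D (n = 60) | MSE (n = 120) | D (n = 120) | MSE (n = 240) | D (n = 240) |
| $\hat\pi_{N1}$ | 0.0684 | 0.0286 | 0.0678 | 0.0073 | 0.0686 | 0.0045 | 0.0688 | 0.0018 | 0.0683 | 0.0012 |
| $\hat\pi_{N2}$ | 0.0664 | 0.0131 | 0.0669 | 0.0057 | 0.0682 | 0.0025 | 0.0686 | 0.0010 | 0.0682 | 0.0008 |
| $\hat\pi_{aS1}$ | 0.0652 | 0.0000 | 0.0664 | 0.0000 | 0.0680 | 0.0000 | 0.0685 | 0.0000 | 0.0681 | 0.0000 |
| $\hat\pi_{aS2}$ | 0.0636 | 0.0122 | 0.0656 | 0.0034 | 0.0676 | 0.0019 | 0.0683 | 0.0009 | 0.0680 | 0.0004 |
| $\hat\pi_{a1}$ | 0.0793 | 0.0000 | 0.0734 | 0.0000 | 0.0713 | 0.0000 | 0.0702 | 0.0000 | 0.0690 | 0.0000 |
| $\hat\pi_{a2}$ | 0.0735 | 0.0122 | 0.0702 | 0.0034 | 0.0697 | 0.0019 | 0.0693 | 0.0009 | 0.0685 | 0.0004 |
| $\hat\pi_{a5}^{PI}$ | 0.0718 | 0.0180 | 0.0698 | 0.0142 | 0.0696 | 0.0136 | 0.0695 | 0.0133 | 0.0687 | 0.0129 |
| $\hat\pi_{a10}^{PI}$ | 0.0717 | 0.0164 | 0.0696 | 0.0111 | 0.0694 | 0.0097 | 0.0693 | 0.0093 | 0.0686 | 0.0091 |
| $\hat\pi_{a20}^{PI}$ | 0.0716 | 0.0156 | 0.0695 | 0.0091 | 0.0694 | 0.0072 | 0.0692 | 0.0068 | 0.0685 | 0.0065 |
RP-estimation and Testing for the Exact WSR Test
| RP-est. | MSE (n = 15) | D (n = 15) | MSE (n = 30) | D (n = 30) | MSE (n = 60) | D (n = 60) | MSE (n = 120) | D (n = 120) | MSE (n = 240) | D (n = 240) |
| $\hat\pi_{N1}$ | 0.0677 | 0.0200 | 0.0678 | 0.0082 | 0.0686 | 0.0040 | 0.0688 | 0.0018 | 0.0683 | 0.0011 |
| $\hat\pi_{N2}$ | 0.0657 | 0.0045 | 0.0668 | 0.0033 | 0.0682 | 0.0020 | 0.0686 | 0.0010 | 0.0682 | 0.0006 |
| $\hat\pi_{S1}$ | 0.0647 | 0.0000 | 0.0664 | 0.0000 | 0.0680 | 0.0000 | 0.0685 | 0.0000 | 0.0681 | 0.0000 |
| $\hat\pi_{S2}$ | 0.0631 | 0.0066 | 0.0655 | 0.0036 | 0.0675 | 0.0018 | 0.0683 | 0.0009 | 0.0680 | 0.0005 |
| $\hat\pi_{1}$ | 0.0797 | 0.0000 | 0.0734 | 0.0000 | 0.0713 | 0.0000 | 0.0702 | 0.0000 | 0.0690 | 0.0000 |
| $\hat\pi_{2}$ | 0.0739 | 0.0066 | 0.0703 | 0.0036 | 0.0697 | 0.0018 | 0.0693 | 0.0009 | 0.0685 | 0.0005 |
| $\hat\pi_{5}^{PI}$ | 0.0722 | 0.0211 | 0.0698 | 0.0143 | 0.0696 | 0.0137 | 0.0695 | 0.0133 | 0.0687 | 0.0130 |
| $\hat\pi_{10}^{PI}$ | 0.0721 | 0.0200 | 0.0697 | 0.0113 | 0.0694 | 0.0099 | 0.0693 | 0.0093 | 0.0686 | 0.0092 |
| $\hat\pi_{20}^{PI}$ | 0.0720 | 0.0193 | 0.0696 | 0.0091 | 0.0694 | 0.0073 | 0.0692 | 0.0068 | 0.0685 | 0.0066 |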
Table 3. Averaged MSE and disagreement rate for the asymptotic and exact Kendall’s test when sampling from the t copula with 3 degrees of freedom. The averages are computed over the 19 different values of ρ considered in the simulation study. The least values for the averaged MSE and disagreement are highlighted in bold.
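RP-estimation and Testing for the Asymptotic Kendall's Test
| RP-est. | MSE (n = 15) | D (n = 15) | MSE (n = 30) | D (n = 30) | MSE (n = 60) | D (n = 60) | MSE (n = 120) | D (n = 120) |
| $\hat\pi_{N}$ | 0.0759 | 0.0551 | 0.0751 | 0.0281 | 0.0752 | 0.0133 | 0.0748 | 0.0070 |
| $\hat\pi_{aS}$ | 0.0689 | 0.0000 | 0.0721 | 0.0000 | 0.0738 | 0.0000 | 0.0741 | 0.0000 |
| $\hat\pi_{a1}$ | 0.0778 | 0.0000 | 0.0735 | 0.0000 | 0.0709 | 0.0000 | 0.0693 | 0.0000 |
| $\hat\pi_{a2}$ | 0.0591 | 0.0000 | 0.0581 | 0.0000 | 0.0577 | 0.0000 | 0.0570 | 0.0000 |
| $\hat\pi_{C1}$ | 0.0881 | 0.0551 | 0.0765 | 0.0281 | 0.0720 | 0.0133 | 0.0697 | 0.0070 |
| $\hat\pi_{C2}$ | 0.0617 | 0.0551 | 0.0589 | 0.0281 | 0.0579 | 0.0133 | 0.0571 | 0.0070 |
| $\hat\pi_{aL}$ | 0.0736 | 0.0000 | 0.0741 | 0.0000 | 0.0747 | 0.0000 | 0.0745 | 0.0000 |
| $\hat\pi_{a5}^{PI}$ | 0.0695 | 0.0769 | 0.0687 | 0.0595 | 0.0685 | 0.0448 | 0.0680 | 0.0329 |
| $\hat\pi_{a10}^{PI}$ | 0.0694 | 0.0775 | 0.0685 | 0.0597 | 0.0683 | 0.0451 | 0.0679 | 0.0327 |
| $\hat\pi_{a20}^{PI}$ | 0.0693 | 0.0779 | 0.0684 | 0.0596 | 0.0683 | 0.0455 | 0.0678 | 0.0329 |
RP-estimation and Testing for the Exact Kendall's Test
| RP-est. | MSE (n = 15) | D (n = 15) | MSE (n = 30) | D (n = 30) | MSE (n = 60) | D (n = 60) | MSE (n = 120) | D (n = 120) |
| $\hat\pi_{N}$ | 0.0906 | 0.1411 | 0.0847 | 0.1063 | 0.0829 | 0.0955 | 0.0819 | 0.0882 |
| $\hat\pi_{S}$ | 0.0672 | 0.0000 | 0.0713 | 0.0000 | 0.0734 | 0.0000 | 0.0739 | 0.0000 |
| $\hat\pi_{1}$ | 0.0780 | 0.0000 | 0.0738 | 0.0000 | 0.0711 | 0.0000 | 0.0693 | 0.0000 |
| $\hat\pi_{2}$ | 0.0590 | 0.0000 | 0.0580 | 0.0000 | 0.0577 | 0.0000 | 0.0570 | 0.0000 |
| $\hat\pi_{C1}$ | 0.1064 | 0.1411 | 0.0876 | 0.1063 | 0.0803 | 0.0955 | 0.0769 | 0.0882 |
| $\hat\pi_{C2}$ | 0.0764 | 0.1411 | 0.0680 | 0.1063 | 0.0651 | 0.0955 | 0.0634 | 0.0882 |
| $\hat\pi_{L}$ | 0.0732 | 0.0000 | 0.0739 | 0.0000 | 0.0746 | 0.0000 | 0.0745 | 0.0000 |
| $\hat\pi_{5}^{PI}$ | 0.0738 | 0.0781 | 0.0715 | 0.0601 | 0.0707 | 0.0457 | 0.0696 | 0.0324 |
| $\hat\pi_{10}^{PI}$ | 0.0737 | 0.0784 | 0.0714 | 0.0604 | 0.0706 | 0.0461 | 0.0695 | 0.0326 |
| $\hat\pi_{20}^{PI}$ | 0.0736 | 0.0791 | 0.0713 | 0.0607 | 0.0705 | 0.0463 | 0.0694 | 0.0323 |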
Table 4. Hamilton depression scale factor (HDSF) data: first visit (X) and second visit (Y).
| $x_i$ | $y_i$ |
| 1.83 | 0.878 |
| 0.50 | 0.647 |
| 1.62 | 0.598 |
| 2.48 | 2.050 |
| 1.68 | 1.060 |
| 1.88 | 1.290 |
| 1.55 | 1.060 |
| 3.06 | 3.140 |
| 1.30 | 1.290 |
Table 5. RP estimates for the example data.
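Sign Test
| Standard results | α | $c_\alpha$ | $\hat\pi$ | $\hat\pi_e^{PI}$ |
| n = 9 | 0.1 | 6 | 0.7905 | 0.6781 |
| $B_{ob} = 7$ | 0.05 | 7 | 0.5 | 0.3719 |
| p-value = 0.0898 | 0.01 | 8 | 0.1683 | 0.1042 |
WSR Test
| Standard results | α | $w_\alpha$ | $\hat\pi_{S1}$ | $\hat\pi_{20}^{PI}$ |
| n = 9 | 0.1 | 34 | 0.7614 | 0.8835 |
| $W_{ob} = 40$ | 0.05 | 36 | 0.6822 | 0.7435 |
| p-value = 0.0195 | 0.01 | 41 | 0.4528 | 0.4505 |
Kendall Test
| Standard results | α | $t_\alpha$ | $\hat\pi_2$ | $\hat\pi_{20}^{PI}$ |
| n = 9 | 0.1 | 0.3333 | 0.6979 | 0.6930 |
| $\hat\tau_{ob} = 0.5$ | 0.05 | 0.4444 | 0.5685 | 0.5495 |
| p-value = 0.2231 | 0.01 | 0.6111 | 0.3648 | 0.2615 |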

Share and Cite

MDPI and ACS Style

De Capitani, L.; De Martini, D. Reproducibility Probability Estimation and RP-Testing for Some Nonparametric Tests. Entropy 2016, 18, 142. https://0-doi-org.brum.beds.ac.uk/10.3390/e18040142
