Article

A New Estimator for Standard Errors with Few Unbalanced Clusters

by Gianmaria Niccodemi 1,* and Tom Wansbeek 2

1 Faculty of Humanities, Education and Social Sciences, University of Luxembourg, Esch-sur-Alzette, L-4366 Luxembourg, Luxembourg
2 Faculty of Economics and Business, University of Groningen, 9700 AV Groningen, The Netherlands
* Author to whom correspondence should be addressed.
Submission received: 11 October 2021 / Revised: 17 January 2022 / Accepted: 18 January 2022 / Published: 21 January 2022

Abstract

In linear regression analysis, the estimator of the variance of the estimator of the regression coefficients should take into account the clustered nature of the data, if present, since using the standard textbook formula will in that case lead to a severe downward bias in the standard errors. This idea of a cluster-robust variance estimator (CRVE) generalizes to clusters the classical heteroskedasticity-robust estimator. Its justification is asymptotic in the number of clusters. Although an improvement, a considerable bias can remain when the number of clusters is low, the more so when regressors are correlated within cluster. To address these issues, two improved methods have been proposed: one, which we call CR2VE, is based on bias reduced linearization, while the other, CR3VE, can be seen as a jackknife estimator. The latter is unbiased only under very strict conditions, in particular equal cluster sizes. To relax this condition, we introduce in this paper CR3VE-λ, a generalization of CR3VE in which the cluster size is allowed to vary freely between clusters. We illustrate the performance of CR3VE-λ through simulations and show that, especially when cluster sizes vary widely, it can outperform the other commonly used estimators.

1. Introduction

In linear regressions with clustered data, it is common practice to estimate the variance of the estimated parameters using the cluster-robust variance estimator (CRVE from hereon) introduced by Liang and Zeger (1986) as a generalization of the White (1980) heteroskedasticity-robust estimator. The justification is asymptotic, with the number of clusters tending to infinity. Bell and McCaffrey (2002) show that in a finite-sample context, with few clusters and error terms that are correlated within cluster, CRVE leads to severely downward-biased standard errors and thus to misleading inference about the estimated parameters. Moulton (1986, 1990) and Cameron and Miller (2015) point out that this issue is particularly relevant for regressors that are correlated within cluster, such as policy variables that are implemented only in certain regions or states. An additional issue for inference about the estimated parameters is that, under the null hypothesis and with few clusters, the distribution of the test statistic is unknown and approximate normality cannot be claimed.
Following Bell and McCaffrey (2002), inference about the estimated parameters can be improved by (i) reducing the bias of CRVE with either BRL (bias reduced linearization), also known as CR2VE, or the jackknife estimator v_JK, also known as CR3VE, both based on transformed OLS residuals; CR2VE and CR3VE generalize to clustered data the heteroskedasticity-consistent covariance estimators HC2 and HC3 introduced by MacKinnon and White (1985). Inference about the estimated parameters can also be improved by (ii) approximating the distribution of the test statistic with a t-distribution, using an extension of the Satterthwaite (1946) degrees of freedom (DOF) that is data-determined and regressor-specific. Imbens and Kolesar (2016) developed a more refined version of the data-determined, regressor-specific DOF used by Bell and McCaffrey (2002).
Bell and McCaffrey (2002) also show that CR3VE tends to overestimate the standard errors. In this paper, we introduce CR3VE-λ, a cluster-robust variance estimator that is identical to CR3VE in the case of balanced clusters but, in the case of unbalanced clusters, takes the difference in cluster sizes into account such that the computed standard errors are less conservative and unbiased under more general conditions.
The paper is organized as follows. In Section 2, we discuss basic theory on CRVE, CR2VE and CR3VE. In Section 3, we introduce CR3VE-λ. In Section 4, we use Monte Carlo simulations to illustrate and test the performance of CRVE, CR2VE, CR3VE and CR3VE-λ in computing standard errors with few clusters. In Section 5, we present ideas for future research related to the current paper. Section 6 concludes the paper.

2. Basic Theory: CRVE, CR2VE and CR3VE

Consider the regression model y = Xβ + ε with observations that can be grouped into C clusters of size n_1, …, n_C, with Σ_c n_c = n. Write, for the c-th cluster, y_c = X_c β + ε_c, with E(ε_c) = 0 and var(ε_c) = V_c. The V_c's are collected in the block-diagonal matrix V. After OLS we have

\operatorname{var}(\hat{\beta}) = (X'X)^{-1} X' V X (X'X)^{-1} = (X'X)^{-1} \sum_c X_c' V_c X_c (X'X)^{-1}. \qquad (1)

An intuitively appealing cluster-robust variance estimator (CRVE) based on the OLS residuals per cluster, ε̂_c, is

\widehat{\operatorname{var}}(\hat{\beta}) = (X'X)^{-1} \sum_c X_c' \hat{\varepsilon}_c \hat{\varepsilon}_c' X_c (X'X)^{-1}. \qquad (2)
This estimator, which directly generalizes White (1980) and was introduced by Liang and Zeger (1986), is consistent when the number of clusters goes to infinity. The same holds when (2) is scaled, as in Stata, by the factor C(n − 1)/[(C − 1)(n − k)], with k the number of regressors. Since this factor is larger than one, it increases the estimated variance. In the case of few clusters, asymptotics will be a poor guide. In what follows, we therefore consider its bias instead.
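To make the construction concrete, the following sketch computes the CRVE sandwich in (2) together with the Stata-style scaling factor. This is an illustration only, not the code distributed with the paper; the names crve, X, y and cluster_id are assumptions made for the example.

```python
# Minimal sketch of CRVE (equation (2)) with the Stata scaling C(n-1)/((C-1)(n-k)).
# Illustrative only; not the authors' Stata implementation.
import numpy as np

def crve(X, y, cluster_id):
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    resid = y - X @ beta_hat                       # OLS residuals
    clusters = np.unique(cluster_id)
    C = len(clusters)
    meat = np.zeros((k, k))
    for c in clusters:
        idx = cluster_id == c
        score_c = X[idx].T @ resid[idx]            # X_c' eps_hat_c, a k-vector
        meat += np.outer(score_c, score_c)         # X_c' eps_hat_c eps_hat_c' X_c
    V = XtX_inv @ meat @ XtX_inv                   # sandwich of equation (2)
    scale = C * (n - 1) / ((C - 1) * (n - k))      # Stata finite-sample factor
    return beta_hat, scale * V
```

Cluster-robust standard errors are then the square roots of the diagonal of the returned matrix.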
Let M = I_n − X(X'X)^{-1}X', let S_c be the n × n_c matrix that selects the columns of M corresponding to cluster c, let L_c ≡ M S_c, and let

H_c \equiv S_c' M S_c = I_{n_c} - X_c (X'X)^{-1} X_c'.

There holds H_c = L_c' L_c since M is idempotent and symmetric. With ε̂ = Mε and ε̂_c = L_c' ε, we then have E(ε̂_c ε̂_c') = L_c' V L_c ≠ V_c, so that

E[\widehat{\operatorname{var}}(\hat{\beta})] = (X'X)^{-1} \sum_c X_c' L_c' V L_c X_c (X'X)^{-1} \neq \operatorname{var}(\hat{\beta}).

To reduce the bias, consider choosing a variance estimator based on transformed residuals ε̃_c ≡ A_c ε̂_c, for some A_c. Then

E[\widehat{\operatorname{var}}(\hat{\beta})] = (X'X)^{-1} \sum_c X_c' A_c L_c' V L_c A_c X_c (X'X)^{-1}. \qquad (3)
From (1), unbiasedness requires the A_c to be such that A_c L_c' V L_c A_c = V_c for all c, uniformly in the V_c. This is infeasible and therefore we consider two second-best solutions.
The first solution is to consider the case of no cluster effects, V_c = σ²I_{n_c} for all c, and make the estimator unbiased for this case. Then E(ε̂_c ε̂_c') = L_c' V L_c = σ²L_c'L_c = σ²H_c, and consequently

E[\widehat{\operatorname{var}}(\hat{\beta})] = \sigma^2 (X'X)^{-1} \sum_c X_c' A_c H_c A_c X_c (X'X)^{-1}. \qquad (4)
The variance estimator is unbiased if A_c H_c A_c = I_{n_c}, and so we choose A_c = H_c^{-1/2}. This estimator, introduced by Bell and McCaffrey (2002) and called BRL, is extensively discussed by Cameron and Miller (2015) and is also known as CR2VE.
The second solution is based on the idea that the elements in M outside the blocks on the diagonal may be small. Then L_c can be approximated by a matrix with H_c as its c-th block and zeros outside this block. Then L_c' V L_c = H_c V_c H_c and choosing A_c = H_c^{-1} leads, when scaled by a factor (C − 1)/C, to an estimator that is approximately unbiased when there are no cluster effects. This jackknife-type estimator was also introduced by Bell and McCaffrey (2002), who called it v_JK; it is discussed by Cameron and Miller (2015) and is also known as CR3VE. CR2VE and CR3VE may seem computationally intensive because they require the inversion of matrices of order equal to the cluster sizes, but they can in fact be computed efficiently, that is, with computing time and storage of order O(n_c); a succinct proof is given by Niccodemi et al. (2020).
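To connect the two constructions, the sketch below computes CR2VE and CR3VE directly from the definitions, building H_c for each cluster and transforming the residuals by H_c^{-1/2} and H_c^{-1}. It does not use the efficient O(n_c) algorithm of Niccodemi et al. (2020), and the names X, y and cluster_id are illustrative assumptions.

```python
# Minimal sketch of CR2VE (A_c = H_c^{-1/2}) and CR3VE (A_c = H_c^{-1}, scaled by
# (C-1)/C), computed directly from the definitions. Illustrative only.
import numpy as np

def cr2ve_cr3ve(X, y, cluster_id):
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    resid = y - X @ (XtX_inv @ (X.T @ y))
    clusters = np.unique(cluster_id)
    C = len(clusters)
    meat2 = np.zeros((k, k))
    meat3 = np.zeros((k, k))
    for c in clusters:
        idx = cluster_id == c
        Xc, ec = X[idx], resid[idx]
        Hc = np.eye(idx.sum()) - Xc @ XtX_inv @ Xc.T   # H_c = I_{n_c} - X_c (X'X)^{-1} X_c'
        w, U = np.linalg.eigh(Hc)                      # H_c is symmetric positive definite
        e2 = U @ ((U.T @ ec) / np.sqrt(w))             # H_c^{-1/2} eps_hat_c  (CR2VE)
        e3 = U @ ((U.T @ ec) / w)                      # H_c^{-1}   eps_hat_c  (CR3VE)
        s2, s3 = Xc.T @ e2, Xc.T @ e3
        meat2 += np.outer(s2, s2)
        meat3 += np.outer(s3, s3)
    V2 = XtX_inv @ meat2 @ XtX_inv                     # CR2VE
    V3 = (C - 1) / C * XtX_inv @ meat3 @ XtX_inv       # CR3VE
    return V2, V3
```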
Both CR2VE and CR3VE are used in the literature as an alternative to bootstrapping. The bootstrap literature has evolved rapidly since Cameron et al. (2008) proposed the use of a wild cluster bootstrap procedure to improve inference in the case of few clusters. Generally, the wild cluster bootstrap procedure performs well. However, MacKinnon and Webb (2017) show that inference based on this procedure can fail in the case of dummy regressors equal to zero or one in very few clusters. Djogbenou et al. (2019) propose an asymptotic analysis of cluster-robust inference mainly focused on the wild cluster bootstrap procedure, proving its asymptotic validity under certain conditions on the cluster sizes. They show, both theoretically and through experiments, how variation in cluster sizes affects the asymptotic validity of this procedure, and they conclude that the restricted wild cluster bootstrap using the Rademacher distribution performs better than its competitors.

3. From CR3VE to CR3VE- λ

To analyze the bias of CR3VE we scale (4) by (C − 1)/C and use

A_c H_c A_c = H_c^{-1} = I_{n_c} + X_c (X'X - X_c'X_c)^{-1} X_c'

to obtain

E[\widehat{\operatorname{var}}(\hat{\beta})] = \frac{C-1}{C}\, \sigma^2 \left( (X'X)^{-1} + \sum_c (X'X)^{-1} X_c' X_c (X'X - X_c'X_c)^{-1} X_c' X_c (X'X)^{-1} \right). \qquad (5)
When clusters are balanced and have the same covariance structure, then X_c'X_c = X'X/C for all c, and (5) reduces to E[var̂(β̂)] = σ²(X'X)^{-1}. Thus, in the case of balanced clusters, CR3VE with the correction factor (C − 1)/C is unbiased.
We propose a different scaling factor than (C − 1)/C for CR3VE in the more general case of unbalanced clusters that still have the same covariance structure. Define π_c ≡ n_c/n for cluster c. Then X_c'X_c = π_c X'X and the expression in parentheses in (5) becomes λ(X'X)^{-1}, with

\lambda \equiv 1 + \sum_c \frac{\pi_c^2}{1 - \pi_c},
and λ ≥ C/(C − 1), with equality holding in the case of balanced clusters. To see this, let π ≡ (π_1, …, π_C)', Π ≡ diag(π), a ≡ (I_C − Π)^{-1/2}π and b ≡ (I_C − Π)^{1/2}ι_C, so that a'a = π'(I_C − Π)^{-1}π, b'b = ι_C'(I_C − Π)ι_C = C − 1, and a'b = π'ι_C = 1. Since (a'b)² ≤ (a'a)(b'b), there holds

\sum_c \frac{\pi_c^2}{1 - \pi_c} = \pi'(I_C - \Pi)^{-1}\pi \;\ge\; \frac{1}{\iota_C'(I_C - \Pi)\iota_C} = \frac{1}{C-1},

so λ ≥ 1 + 1/(C − 1) = C/(C − 1). This suggests that 1/λ may be a better scaling factor than (C − 1)/C. As 1/λ ≤ (C − 1)/C, we propose a lower estimate of the variance than with CR3VE. This fits in well with the observation by Bell and McCaffrey (2002), as mentioned in the Introduction, that CR3VE tends to overestimate the standard errors. We denote this estimator, which is unbiased under more general conditions than CR3VE, by CR3VE-λ.
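As a small numerical illustration of the scaling factor (the cluster sizes below are invented for the example, not taken from the paper), λ can be computed directly from the cluster shares π_c = n_c/n and compared with C/(C − 1):

```python
# Sketch: compute lambda = 1 + sum_c pi_c^2 / (1 - pi_c) and compare with C/(C-1).
# CR3VE-lambda rescales the CR3VE sandwich by 1/lambda instead of (C-1)/C.
import numpy as np

def lambda_factor(cluster_sizes):
    pi = np.asarray(cluster_sizes, dtype=float)
    pi /= pi.sum()                                  # pi_c = n_c / n
    return 1.0 + np.sum(pi**2 / (1.0 - pi))

sizes = [100, 1900, 700, 1300]                      # invented, highly unbalanced clusters
C = len(sizes)
print(lambda_factor(sizes))                         # about 1.62
print(C / (C - 1))                                  # about 1.33; lambda >= C/(C-1) always
print(lambda_factor([1000] * C))                    # balanced case: exactly C/(C-1)
```

Since λ exceeds C/(C − 1) whenever clusters are unbalanced, scaling by 1/λ gives smaller standard errors than the usual (C − 1)/C correction of CR3VE.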

4. Monte Carlo Simulations

We run several sets of Monte Carlo (MC) simulations and compare the bias of the standard errors based on unclustered standard errors (UN), CRVE, CR2VE and CR3VE with the bias of the standard errors based on CR3VE-λ. In each simulation, we randomly generate C unbalanced clusters with number of observations per cluster n_c ~ U{1000 − g, 1000 + g}, where g differs across sets of simulations. In other words, n_c is drawn from a uniform distribution with constant mean but a standard deviation that depends on g. We generate our dependent variable as y_hc = α + β x_hc + γ d_c + e_hc, where h identifies the single observation (e.g., household) and c identifies the C clusters of sizes n_1, …, n_C, and where x_hc = q_hc + z_c and e_hc = w_hc + u_c. Moreover, q_hc, z_c, w_hc and u_c are independently drawn from N(0, 1), α = 0 and β = γ = 1, and d_c is a dummy variable that is constant within cluster and randomly constrained, in each simulation, to equal 1 in half of the randomly generated clusters. The simulation set-up is somewhat similar to the one in Cameron et al. (2008). As pointed out by Cameron and Miller (2015), unclustered standard errors and CRVE are likely to be severely biased if the cluster effect and the correlation of the regressors within cluster are different from zero. Therefore, we set up experiments that allow both e_hc and the regressors to be correlated within cluster, including the extreme case of d_c, a dummy variable that is constant within cluster. The presence of regressors correlated within cluster implies that the assumptions under which CR3VE and CR3VE-λ are unbiased are not met. Yet, CR3VE-λ takes the difference in cluster sizes into account and, as this difference increases, it is expected to be less biased than CR3VE.
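For concreteness, here is a minimal sketch of a single draw from the design just described; the paper's own simulations were run in Stata, and the Python names below are illustrative assumptions.

```python
# One draw from the simulation design: n_c ~ U{1000-g, 1000+g}, x_hc = q_hc + z_c,
# e_hc = w_hc + u_c, y_hc = alpha + beta*x_hc + gamma*d_c + e_hc with alpha = 0,
# beta = gamma = 1, and d_c = 1 in half of the clusters. Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(0)
C, g = 4, 500
n_c = rng.integers(1000 - g, 1000 + g + 1, size=C)            # cluster sizes
d_c = rng.permutation([1] * (C // 2) + [0] * (C - C // 2))    # dummy constant within cluster
rows = []
for c in range(C):
    z, u = rng.standard_normal(2)                  # cluster-level components of x and e
    q = rng.standard_normal(n_c[c])                # idiosyncratic component of x
    w = rng.standard_normal(n_c[c])                # idiosyncratic component of e
    x = q + z
    y = 0.0 + 1.0 * x + 1.0 * d_c[c] + (w + u)
    cluster_col = np.full(n_c[c], c)
    rows.append(np.column_stack([cluster_col, y, x, np.full(n_c[c], d_c[c])]))
data = np.vstack(rows)                             # columns: cluster id, y, x, d
```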
We run 100,000 simulations for each MC set, and each MC set differs with respect to the number of clusters C and to g. We show results for C = 4 and C = 6, and for g = 0 (i.e., balanced clusters), g = 250, g = 500, g = 900 and g = 990, with standard deviation of the cluster size equal to 0, 145, 289, 520 and 572, respectively. For each simulation: (i) we compute the true standard deviation of β̂, sd(β̂), based on
\operatorname{var}(\hat{\beta}) = (X'X)^{-1} \sum_c X_c' V_c X_c (X'X)^{-1},

where

V_c = I_{n_c} + \iota_c \iota_c',
and where β = (α, β, γ)'; (ii) we compute the standard errors of β̂ and of γ̂ based on the different methods, se_UN, se_CRVE, se_CR2VE, se_CR3VE and se_CR3VE-λ; (iii) we compute the difference between the standard errors based on the different methods and the true standard deviations sd(β̂) and sd(γ̂). Finally, for each MC set we compute the mean of this difference (i.e., the estimated bias) for each method of computing the standard errors. From Table 1 and Table 2 we can see that CR3VE-λ always leads to the least biased standard errors, with estimated bias always close to zero. Moreover, it substantially reduces the estimated bias of CR3VE when clusters are highly unbalanced. This is especially true for the dummy variable d_c.
We acknowledge that the reader might be particularly interested in comparing the inferential performance of the various CRVEs, including CR3VE-λ, especially in a real-data setting. For this purpose we refer the reader to Niccodemi et al. (2020), where inferential results are reported for Current Population Survey data grouped into few, highly unbalanced clusters, using the t-distribution with the Imbens and Kolesar (2016) DOF. This experiment is similar to the one developed by Cameron and Miller (2015), although more focused on cluster unbalancedness. According to the results, with few, highly unbalanced clusters, CR3VE-λ appears to be among the most promising methods for inference, as CR3VE tends to underreject a true null hypothesis.

5. A Note on Future Research

Future research on cluster-robust variance estimators, directly linked to the current work, might take at least two directions. First, Djogbenou et al. (2019) show through several experimental designs how the variation in cluster sizes affects the asymptotic validity of the wild cluster bootstrap. Testing how CR3VE-λ performs, in comparison to CR2VE and CR3VE and using the same experimental designs, might provide further elements with which to evaluate its performance.
Second, the effective number of clusters introduced by Carter et al. (2017) might be of particular interest for CR3VE-λ. The effective number of clusters depends, among other things, on the cluster sizes. If the effective and the nominal number of clusters differ remarkably, and if this difference is, to some extent, due to heterogeneity in cluster sizes, then inference using CR3VE-λ might be much more accurate than inference based on CR3VE. Therefore, it would be interesting to develop experiments that focus on the interaction between the effective number of clusters as a diagnostic tool and the use of CR3VE-λ instead of CR3VE for inference. Of course, other possibilities include the use of the effective number of clusters to construct the scaling factor for CR3VE and the introduction of measures of the effective size of the clusters to compute CR3VE-λ.

6. Conclusions

We propose CR3VE-λ, an estimator for clustered standard errors that improves on the jackknife estimator and is unbiased under more general conditions in the case of few unbalanced clusters. In simulations, CR3VE-λ reduces the bias of CR3VE as the unbalancedness of the clusters increases. We also provide a reference to a longer working paper (Niccodemi et al. 2020) that develops simulation results comparing inference based on CRVE, CR2VE, CR3VE and CR3VE-λ. Given the results of both sets of simulations, we suggest that researchers prefer CR3VE-λ over CR3VE in the case of (few) highly unbalanced clusters.
For all the computations and the empirical illustrations we used Stata/SE 15.0. This paper comes with a Stata do-file that can be used with any cross-sectional dataset for the efficient computation of the standard errors based on CRVE, CR2VE, CR3VE and CR3VE-λ, and with a Stata do-file to replicate the Monte Carlo simulations. The Stata do-files are available upon request.

Author Contributions

All authors have contributed equally to the research. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We are grateful to Viola Angelini, Rob Alessie, Nick Koning, Erik Meijer, Douglas Miller, Ulrich Schneider, Roberto Wessels and four referees for their helpful comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bell, Robert M., and Daniel F. McCaffrey. 2002. Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology 28: 169–79. [Google Scholar]
  2. Cameron, A. Colin, and Douglas L. Miller. 2015. A practitioner’s guide to cluster-robust inference. Journal of Human Resources 50: 317–72. [Google Scholar] [CrossRef]
  3. Cameron, A. Colin, Jonah B. Gelbach, and Douglas L. Miller. 2008. Bootstrap-based improvements for inference with clustered errors. The Review of Economics and Statistics 90: 414–27. [Google Scholar] [CrossRef]
  4. Carter, Andrew V., Kevin T. Schnepel, and Douglas G. Steigerwald. 2017. Asymptotic behavior of a t-test robust to cluster heterogeneity. Review of Economics and Statistics 99: 698–709. [Google Scholar] [CrossRef]
  5. Djogbenou, Antoine A., James G. MacKinnon, and Morten Ørregaard Nielsen. 2019. Asymptotic theory and wild bootstrap inference with clustered errors. Journal of Econometrics 212: 393–412. [Google Scholar] [CrossRef] [Green Version]
  6. Imbens, Guido W., and Michal Kolesar. 2016. Robust standard errors in small samples: Some practical advice. Review of Economics and Statistics 98: 701–12. [Google Scholar] [CrossRef]
  7. Liang, Kung-Yee, and Scott L. Zeger. 1986. Longitudinal data analysis using generalized linear models. Biometrika 73: 13–22. [Google Scholar] [CrossRef]
  8. MacKinnon, James G., and Halbert White. 1985. Some heteroskedasticity-consistent covariance matrix estimators with improved finite sample properties. Journal of Econometrics 29: 305–25. [Google Scholar] [CrossRef] [Green Version]
  9. MacKinnon, James G., and Matthew D. Webb. 2017. Wild bootstrap inference for wildly different cluster sizes. Journal of Applied Econometrics 32: 233–54. [Google Scholar] [CrossRef] [Green Version]
  10. Moulton, Brent R. 1986. Random group effects and the precision of regression estimates. Journal of Econometrics 32: 385–97. [Google Scholar] [CrossRef]
  11. Moulton, Brent R. 1990. An illustration of a pitfall in estimating the effects of aggregate variables on micro units. Review of Economics and Statistics 72: 334–38. [Google Scholar] [CrossRef]
  12. Niccodemi, Gianmaria, Rob Alessie, Viola Angelini, Jochen Mierau, and Thomas Wansbeek. 2020. Refining Clustered Standard Errors with Few Clusters. SOM Research Report 2021002. Groningen: University of Groningen. [Google Scholar]
  13. Satterthwaite, Franklin E. 1946. An approximate distribution of estimates of variance components. Biometrics Bulletin 2: 110–14. [Google Scholar] [CrossRef] [PubMed]
  14. White, Halbert L. 1980. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48: 817–38. [Google Scholar] [CrossRef]
Table 1. Estimated bias of se(β̂) based on different methods: 100,000 Monte Carlo simulations.

Std. dev. of cluster size    Balanced      145        289        520        572

4 clusters
Ê[sd(β̂)]                       0.1978     0.1967     0.1929     0.1790     0.1745
Bias of se_UN(β̂)              −0.1820    −0.1809    −0.1769    −0.1628    −0.1581
Bias of se_CRVE(β̂)            −0.1293    −0.1271    −0.1207    −0.1069    −0.1043
Bias of se_CR2VE(β̂)           −0.0667    −0.0663    −0.0644    −0.0605    −0.0599
Bias of se_CR3VE(β̂)            0.0191     0.0192     0.0188     0.0164     0.0157
Bias of se_CR3VE-λ(β̂)          0.0191     0.0184     0.0157     0.0066     0.0040

6 clusters
Ê[sd(β̂)]                       0.1839     0.1837     0.1829     0.1811     0.1807
Bias of se_UN(β̂)              −0.1709    −0.1707    −0.1699    −0.1679    −0.1675
Bias of se_CRVE(β̂)            −0.0775    −0.0774    −0.0792    −0.0844    −0.0868
Bias of se_CR2VE(β̂)           −0.0301    −0.0300    −0.0325    −0.0386    −0.0413
Bias of se_CR3VE(β̂)            0.0198     0.0208     0.0199     0.0202     0.0195
Bias of se_CR3VE-λ(β̂)          0.0198     0.0204     0.0182     0.0142     0.0120
Table 2. Estimated bias of se(γ̂) based on different methods: 100,000 Monte Carlo simulations.

Std. dev. of cluster size    Balanced      145        289        520        572

4 clusters
Ê[sd(γ̂)]                       1.0209     1.0250     1.0369     1.0847     1.1066
Bias of se_UN(γ̂)              −0.9805    −0.9843    −0.9957    −1.0416    −1.0623
Bias of se_CRVE(γ̂)            −0.4700    −0.4790    −0.5038    −0.6066    −0.6533
Bias of se_CR2VE(γ̂)           −0.1868    −0.1953    −0.2181    −0.3191    −0.3703
Bias of se_CR3VE(γ̂)            0.1005     0.1000     0.1023     0.1054     0.1068
Bias of se_CR3VE-λ(γ̂)          0.1005     0.0960     0.0856     0.0410     0.0225

6 clusters
Ê[sd(γ̂)]                       0.8306     0.8355     0.8506     0.9059     0.9276
Bias of se_UN(γ̂)              −0.7965    −0.8013    −0.8163    −0.8706    −0.8919
Bias of se_CRVE(γ̂)            −0.2478    −0.2531    −0.2786    −0.3628    −0.3953
Bias of se_CR2VE(γ̂)           −0.0837    −0.0861    −0.1057    −0.1653    −0.1894
Bias of se_CR3VE(γ̂)            0.0524     0.0556     0.0514     0.0564     0.0610
Bias of se_CR3VE-λ(γ̂)          0.0524     0.0537     0.0436     0.0265     0.0223
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
