# Johansen’s Reduced Rank Estimator Is GMM

*Keywords:* GMM; VECM; reduced rank

Department of Economics, University of Wisconsin, Madison, WI 53706, USA

Received: 30 January 2018 / Revised: 9 March 2018 / Accepted: 16 May 2018 / Published: 18 May 2018

(This article belongs to the Special Issue Celebrated Econometricians: Katarina Juselius and Søren Johansen)

The generalized method of moments (GMM) estimator of the reduced-rank regression model is derived under the assumption of conditional homoscedasticity. It is shown that this GMM estimator is algebraically identical to the maximum likelihood estimator under normality developed by Johansen (1988). This includes the vector error correction model (VECM) of Engle and Granger. It is also shown that GMM tests for reduced rank (cointegration) are algebraically similar to the Gaussian likelihood ratio tests. This shows that normality is not necessary to motivate these estimators and tests.

The vector error correction model (VECM) of Engle and Granger (1987) is one of the most widely used time-series models in empirical practice. The predominant estimation method for the VECM is the reduced-rank regression method introduced by Johansen (1988, 1991, 1995). Johansen’s estimation method is widely used because it is straightforward, it is a natural extension of the VAR model of Sims (1980), and it is computationally tractable.

Johansen motivated his estimator as the maximum likelihood estimator (MLE) of the VECM under the assumption that the errors are i.i.d. normal. For many users, it is unclear whether the estimator has a broader justification. In contrast, it is well known that least-squares estimation is both maximum likelihood under normality and method of moments under uncorrelatedness.

This paper provides the missing link. It is shown that Johansen’s reduced-rank estimator is algebraically identical to the generalized method of moments (GMM) estimator of the VECM, under the imposition of conditional homoscedasticity. This GMM estimator only uses uncorrelatedness and homoscedasticity. Thus Johansen’s reduced-rank estimator can be motivated under much broader conditions than normality.

The asymptotic efficiency of the estimator in the GMM class relies on the assumption of homoscedasticity (but not normality). When homoscedasticity fails, the reduced-rank estimator loses asymptotic efficiency but retains its interpretation as a GMM estimator.

It is also shown that the GMM tests for reduced (cointegration) rank are nearly identical to Johansen’s likelihood ratio tests. Thus the standard likelihood ratio tests for cointegration can be interpreted more broadly as GMM tests.

This paper does not introduce new estimation or inference methods. It merely points out that the currently used methods have a broader interpretation than may have been understood. The results leave open the possibility that new GMM methods that do not impose homoscedasticity could be developed.

This connection is not new. In a different context, Adrian et al. (2015) derived the equivalence of the likelihood and minimum-distance estimators of the reduced-rank model. The equivalence between the Limited Information Maximum Likelihood (LIML) estimator (which has a dual relation with reduced-rank regression) and a minimum distance estimator was discovered by Goldberger and Olkin (1971). Recently, Kolesár (2018) drew out connections between likelihood-based and minimum-distance estimation of endogenous linear regression models.

This paper is organized as follows. Section 2 introduces reduced-rank regression models and Johansen’s estimator. Section 3 presents the GMM and states the main theorems demonstrating the equivalence of the GMM and MLE. Section 4 presents the derivation of the GMM estimator. Section 5 contains two technical results relating generalized eigenvalue problems and the extrema of quadratic forms.

The VECM for p variables of cointegrating rank r with k lags is

$$\Delta {X}_{t}=\alpha {\beta}^{\prime}{X}_{t-1}+\sum _{i=1}^{k-1}{\mathsf{\Gamma}}_{i}\Delta {X}_{t-i}+\mathsf{\Phi}{D}_{t}+{e}_{t},\tag{1}$$

where ${D}_{t}$ are the deterministic components. Observations are $t=1,\dots ,T$. The matrices $\alpha $ and $\beta $ are $p\times r$ with $r\le p$. This is a famous workhorse model in applied time series, largely because of the seminal work of Engle and Granger (1987).

The primary estimation method for the VECM is known as reduced-rank regression and was developed by Johansen (1988, 1991, 1995). Algebraically, the VECM (1) is a special case of the reduced-rank regression model:

$${Y}_{t}=\alpha {\beta}^{\prime}{X}_{t}+\mathsf{\Psi}{Z}_{t}+{e}_{t},\tag{2}$$

where ${Y}_{t}$ is $p\times 1$, ${X}_{t}$ is $m\times 1$, and ${Z}_{t}$ is $q\times 1$. The coefficient matrix $\alpha $ is $p\times r$ and $\beta $ is $m\times r$ with $r\le \min (m,p)$. Johansen derived the MLE for model (2) under the assumption that ${e}_{t}$ is i.i.d. $N\left(0,\mathsf{\Omega}\right)$. This immediately applies to the VECM (1) and is the primary application of reduced-rank regression in econometrics.
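The mapping from the VECM (1) to the reduced-rank regression (2) can be made concrete with a short sketch. The function name `vecm_to_rrr`, the use of NumPy, and the choice of a constant as the only deterministic term are illustrative assumptions, not part of the paper.

```python
import numpy as np

def vecm_to_rrr(levels, k):
    """Build (Y, X, Z) for model (2) from an (n+1) x p array of levels.

    Y_t = Delta X_t, X_t = lagged level X_{t-1}, and
    Z_t = (Delta X_{t-1}, ..., Delta X_{t-k+1}, 1).
    The constant stands in for D_t; this is an illustrative choice.
    """
    levels = np.asarray(levels, dtype=float)
    diffs = np.diff(levels, axis=0)      # diffs[j] = levels[j+1] - levels[j]
    n = diffs.shape[0]
    Y = diffs[k - 1:]                    # left-hand side Delta X_t
    X = levels[k - 1:-1]                 # level one period before each row of Y
    lagged = [diffs[k - 1 - i:n - i] for i in range(1, k)]
    const = np.ones((Y.shape[0], 1))
    Z = np.hstack(lagged + [const])
    return Y, X, Z

# Hypothetical data: a 3-variable random walk with 101 observations.
rng = np.random.default_rng(0)
levels = np.cumsum(rng.standard_normal((101, 3)), axis=0)
Y, X, Z = vecm_to_rrr(levels, k=2)
print(Y.shape, X.shape, Z.shape)
```

With $k=1$ the list of lagged differences is empty and $Z_t$ reduces to the constant.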

Canonical correlations were introduced by Hotelling (1936), and reduced-rank regression was introduced by Bartlett (1938). A complete theory was developed by Anderson and Rubin (1949, 1950) and Anderson (1951). These authors developed the MLE for the model:

$${Y}_{t}=\mathsf{\Pi}{X}_{t}+{e}_{t},\tag{3}$$

$${\mathsf{\Gamma}}^{\prime}\mathsf{\Pi}=0,\tag{4}$$

where $\mathsf{\Gamma}$ is $p\times \left(p-r\right)$ and is unknown. This is an alternative parameterization of (2) without the covariates ${Z}_{t}$. Anderson and Rubin (1949, 1950) considered the case $p-r=1$ and primarily focused on estimation of the vector $\mathsf{\Gamma}$. Anderson (1951) considered the case $p-r\ge 1$.

While the models (2) and (3)–(4) are equivalent and thus have the same MLE, the different parameterizations led the authors to different derivations. Anderson and Rubin derived the estimator of (3) and (4) by a tedious application of constrained optimization. (Specifically, they maximized the likelihood of (3) imposing the constraint (4) using Lagrange multiplier methods. The solution turned out to be tedious because (4) is a nonlinear function of the parameters $\mathsf{\Gamma}$ and $\mathsf{\Pi}$.) The derivation is so cumbersome that it is excluded from nearly all statistics and econometrics textbooks, despite the fact that it is the source of the famous LIML estimator.

The elegant derivation used by Johansen (1988) is algebraically unrelated to that of Anderson-Rubin and is based on applying a concentration argument to the product structure in (2). It is similar to the derivation in Tso (1981), although the latter did not include the covariates ${Z}_{t}$. Johansen’s derivation is algebraically straightforward and thus is widely taught to students.

It is useful to briefly describe the likelihood problem. The log-likelihood for model (2) under the assumption that ${e}_{t}$ is i.i.d. $N\left(0,\mathsf{\Omega}\right)$ is

$$\ell \left(\alpha ,\beta ,\mathsf{\Psi},\mathsf{\Omega}\right)=-\frac{T}{2}\log \det \mathsf{\Omega}-\frac{1}{2}\sum _{t=1}^{T}{\left({Y}_{t}-\alpha {\beta}^{\prime}{X}_{t}-\mathsf{\Psi}{Z}_{t}\right)}^{\prime}{\mathsf{\Omega}}^{-1}\left({Y}_{t}-\alpha {\beta}^{\prime}{X}_{t}-\mathsf{\Psi}{Z}_{t}\right).\tag{5}$$

The MLE maximizes $\ell \left(\alpha ,\beta ,\mathsf{\Psi},\mathsf{\Omega}\right)$. Johansen’s solution is as follows. Define the projection matrix ${M}_{Z}={I}_{T}-Z{\left({Z}^{\prime}Z\right)}^{-1}{Z}^{\prime}$ and the residual matrices $\tilde{Y}={M}_{Z}Y$ and $\tilde{X}={M}_{Z}X$. Consider the generalized eigenvalue problem:

$$\left|{\tilde{X}}^{\prime}\tilde{Y}{\left({\tilde{Y}}^{\prime}\tilde{Y}\right)}^{-1}{\tilde{Y}}^{\prime}\tilde{X}-{\tilde{X}}^{\prime}\tilde{X}\lambda \right|=0.$$

The solutions $1>{\widehat{\lambda}}_{1}>\cdots >{\widehat{\lambda}}_{p}>0$ satisfy

$${\tilde{X}}^{\prime}\tilde{Y}{\left({\tilde{Y}}^{\prime}\tilde{Y}\right)}^{-1}{\tilde{Y}}^{\prime}\tilde{X}{\widehat{\nu}}_{i}={\tilde{X}}^{\prime}\tilde{X}{\widehat{\nu}}_{i}{\widehat{\lambda}}_{i},$$

where $({\widehat{\lambda}}_{i},{\widehat{\nu}}_{i})$ are known as the generalized eigenvalues and eigenvectors of ${\tilde{X}}^{\prime}\tilde{Y}{\left({\tilde{Y}}^{\prime}\tilde{Y}\right)}^{-1}{\tilde{Y}}^{\prime}\tilde{X}$ with respect to ${\tilde{X}}^{\prime}\tilde{X}$. The normalization ${\widehat{\nu}}_{i}^{\prime}{\tilde{X}}^{\prime}\tilde{X}{\widehat{\nu}}_{i}=1$ is imposed.

Given the normalization ${\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{X}\beta ={I}_{r}$, Johansen’s reduced-rank estimator for $\beta $ is

$${\widehat{\beta}}_{\mathrm{mle}}=\left[{\widehat{\nu}}_{1},\dots ,{\widehat{\nu}}_{r}\right].$$

The MLE ${\widehat{\alpha}}_{\mathrm{mle}}$ and ${\widehat{\mathsf{\Psi}}}_{\mathrm{mle}}$ are found by least-squares regression of ${Y}_{t}$ on ${\widehat{\beta}}_{\mathrm{mle}}^{\prime}{X}_{t}$ and ${Z}_{t}$.
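The steps above (partial out $Z$, solve the generalized eigenvalue problem, then run least squares) can be sketched numerically. This is a minimal illustration under stated assumptions: the function name `reduced_rank_mle`, the Cholesky-based solution of the generalized eigenproblem, and the simulated data are all choices made here, not taken from Johansen or from this paper.

```python
import numpy as np

def reduced_rank_mle(Y, X, Z, r):
    """Reduced-rank estimates for Y = X beta alpha' + Z Psi' + e (stacked form of model (2))."""
    # Residuals from regressions on Z (the matrices Y-tilde and X-tilde).
    Yt = Y - Z @ np.linalg.lstsq(Z, Y, rcond=None)[0]
    Xt = X - Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    Sxx, Sxy, Syy = Xt.T @ Xt, Xt.T @ Yt, Yt.T @ Yt
    A = Sxy @ np.linalg.solve(Syy, Sxy.T)
    # Solve A v = Sxx v lam through the Cholesky factor Sxx = L L'.
    Linv = np.linalg.inv(np.linalg.cholesky(Sxx))
    lam, U = np.linalg.eigh(Linv @ A @ Linv.T)
    order = np.argsort(lam)[::-1]            # largest eigenvalue first
    lam, V = lam[order], Linv.T @ U[:, order]  # columns satisfy v' Sxx v = 1
    beta = V[:, :r]
    # alpha and Psi by least squares of Y on (X beta, Z).
    coef = np.linalg.lstsq(np.hstack([X @ beta, Z]), Y, rcond=None)[0]
    alpha, Psi = coef[:r].T, coef[r:].T
    return lam, beta, alpha, Psi

# Hypothetical simulated data with p = m = 3, q = 2, and true rank 1.
rng = np.random.default_rng(0)
T, p, q, r = 200, 3, 2, 1
X = rng.standard_normal((T, p))
Z = np.hstack([np.ones((T, 1)), rng.standard_normal((T, q - 1))])
Y = (X @ rng.standard_normal((p, r)) @ rng.standard_normal((r, p))
     + Z @ rng.standard_normal((q, p)) + rng.standard_normal((T, p)))
lam, beta, alpha, Psi = reduced_rank_mle(Y, X, Z, r)
print(lam)
```

The Cholesky transformation is one convenient way to obtain eigenvectors already scaled to the normalization used in the text; a dedicated generalized eigenvalue solver would serve equally well.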

Define ${W}_{t}={\left({X}_{t}^{\prime},{Z}_{t}^{\prime}\right)}^{\prime}$. The GMM estimator of the reduced-rank regression model (2) is derived under the standard orthogonality restriction:

$$\mathbb{E}\left({W}_{t}{e}_{t}^{\prime}\right)=0\tag{7}$$

plus the homoscedasticity condition:

$$\mathbb{E}\left({e}_{t}{e}_{t}^{\prime}\otimes {W}_{t}{W}_{t}^{\prime}\right)=\mathsf{\Omega}\otimes Q,\tag{8}$$

where $\mathsf{\Omega}=\mathbb{E}\left({e}_{t}{e}_{t}^{\prime}\right)$ and $Q=\mathbb{E}\left({W}_{t}{W}_{t}^{\prime}\right)$. These moment conditions are implied by the normal regression model. (Equations (7) and (8) can be deduced from the first-order conditions for maximization of (5).) Because (7) and (8) can be deduced from (5) but not vice versa, the moment condition model (7) and (8) is considerably more general than the normal regression model (5).

The efficient GMM criterion (see Hansen 1982) takes the form

$${J}_{r}(\alpha ,\beta ,\mathsf{\Psi})=T{\overline{g}}_{r}{\left(\alpha ,\beta ,\mathsf{\Psi}\right)}^{\prime}{\widehat{V}}^{-1}{\overline{g}}_{r}\left(\alpha ,\beta ,\mathsf{\Psi}\right),$$

where

$${\overline{g}}_{r}\left(\alpha ,\beta ,\mathsf{\Psi}\right)=\frac{1}{T}\sum _{t=1}^{T}\left(\left({Y}_{t}-\alpha {\beta}^{\prime}{X}_{t}-\mathsf{\Psi}{Z}_{t}\right)\otimes {W}_{t}\right),\tag{9}$$

$$\widehat{V}=\widehat{\mathsf{\Omega}}\otimes \widehat{Q},$$

$$\widehat{\mathsf{\Omega}}=\frac{1}{T}\sum _{t=1}^{T}{\widehat{e}}_{t}{\widehat{e}}_{t}^{\prime},\tag{10}$$

$$\widehat{Q}=\frac{1}{T}\sum _{t=1}^{T}{W}_{t}{W}_{t}^{\prime},$$

and ${\widehat{e}}_{t}$ are the least-squares residuals of the unconstrained model:

$${\widehat{e}}_{t}={Y}_{t}-\widehat{\mathsf{\Pi}}{X}_{t}-\widehat{\mathsf{\Psi}}{Z}_{t}.$$

The GMM estimators are the parameter values that jointly minimize the criterion ${J}_{r}\left(\alpha ,\beta ,\mathsf{\Psi}\right)$ subject to the normalization ${\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{X}\beta ={I}_{r}$:

$$\left({\widehat{\alpha}}_{\mathrm{gmm}},{\widehat{\beta}}_{\mathrm{gmm}},{\widehat{\mathsf{\Psi}}}_{\mathrm{gmm}}\right)=\underset{{\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{X}\beta ={I}_{r}}{\operatorname{arg\,min}}{J}_{r}\left(\alpha ,\beta ,\mathsf{\Psi}\right).$$

The main contribution of the paper is the following surprising result.

*Theorem 1.* $\left({\widehat{\alpha}}_{\mathrm{gmm}},{\widehat{\beta}}_{\mathrm{gmm}},{\widehat{\mathsf{\Psi}}}_{\mathrm{gmm}}\right)=\left({\widehat{\alpha}}_{\mathrm{mle}},{\widehat{\beta}}_{\mathrm{mle}},{\widehat{\mathsf{\Psi}}}_{\mathrm{mle}}\right).$

*Theorem 2.* ${J}_{r}({\widehat{\alpha}}_{\mathrm{gmm}},{\widehat{\beta}}_{\mathrm{gmm}},{\widehat{\mathsf{\Psi}}}_{\mathrm{gmm}})=\operatorname{tr}\left({\widehat{\mathsf{\Omega}}}^{-1}\left({\tilde{Y}}^{\prime}\tilde{Y}\right)\right)-Tp-T{\sum}_{i=1}^{r}\frac{{\widehat{\lambda}}_{i}}{1-{\widehat{\lambda}}_{i}}$, where ${\widehat{\lambda}}_{i}$ are the eigenvalues from (6).

Theorem 1 states that the GMM estimator is algebraically identical to the Gaussian maximum likelihood estimator.
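The two results above can be checked numerically. The sketch below is an illustration rather than a proof: the helper names (`resid`, `gen_eig`, `gmm_criterion`), the simulated data, and the use of uniform (non-normal) errors are assumptions made here. It evaluates the GMM criterion at the reduced-rank estimate of $\beta$, compares it with the closed-form minimum, and confirms that randomly drawn alternatives for $\beta$ do not give a smaller value.

```python
import numpy as np

def resid(A, B):
    """Residuals from least-squares regression of the columns of A on B."""
    return A - B @ np.linalg.lstsq(B, A, rcond=None)[0]

def gen_eig(A, C):
    """Generalized eigenpairs of A w.r.t. C > 0, descending, scaled so V'CV = I."""
    Linv = np.linalg.inv(np.linalg.cholesky(C))
    lam, U = np.linalg.eigh(Linv @ A @ Linv.T)
    idx = np.argsort(lam)[::-1]
    return lam[idx], Linv.T @ U[:, idx]

def gmm_criterion(beta, Y, X, Z):
    """J_r at a given beta, with alpha and Psi concentrated out by least squares."""
    T = Y.shape[0]
    W = np.hstack([X, Z])
    e_unres = resid(Y, W)                      # unconstrained residuals
    Omega = e_unres.T @ e_unres / T
    E = resid(Y, np.hstack([X @ beta, Z]))     # reduced-rank residuals
    PE = E - resid(E, W)                       # projection of E onto W
    return np.trace(np.linalg.solve(Omega, E.T @ PE)), Omega

# Hypothetical data; the errors are deliberately non-normal.
rng = np.random.default_rng(0)
T, p, q, r = 300, 3, 2, 1
X = rng.standard_normal((T, p))
Z = np.hstack([np.ones((T, 1)), rng.standard_normal((T, q - 1))])
Y = (X @ rng.standard_normal((p, r)) @ rng.standard_normal((r, p))
     + Z @ rng.standard_normal((q, p)) + rng.uniform(-1, 1, size=(T, p)))

Yt, Xt = resid(Y, Z), resid(X, Z)
A = Xt.T @ Yt @ np.linalg.solve(Yt.T @ Yt, Yt.T @ Xt)
lam, V = gen_eig(A, Xt.T @ Xt)

J_at_mle, Omega = gmm_criterion(V[:, :r], Y, X, Z)
J_formula = (np.trace(np.linalg.solve(Omega, Yt.T @ Yt)) - T * p
             - T * np.sum(lam[:r] / (1 - lam[:r])))
J_random = [gmm_criterion(rng.standard_normal((p, r)), Y, X, Z)[0]
            for _ in range(200)]
print(J_at_mle, J_formula, min(J_random))
```

Because $\alpha$ and $\mathsf{\Psi}$ are concentrated out, the criterion depends on $\beta$ only through its column space, so the random alternatives need not satisfy the normalization.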

This shows that Johansen’s reduced-rank regression estimator is not tied to the normality assumption. This is similar to the equivalence of least-squares as a method of moments estimator and the Gaussian MLE in the regression context.

The key is the use of the homoscedastic weight matrix. This shows that the Johansen reduced-rank estimator is an efficient GMM estimator under conditional homoscedasticity. When homoscedasticity fails, the Johansen reduced-rank estimator continues to be a GMM estimator but is no longer the efficient GMM estimator.

It is important to understand that Theorem 1 is different from the trivial statement that the MLE is GMM applied to the first-order condition of the likelihood (e.g., Hall (2005), Section 3.8.1). Specifically, if you take the derivatives of the Gaussian log-likelihood function (5) and treat these as moment conditions and solve, this is a GMM estimator, and thus MLE can be interpreted as GMM. That is not what Theorem 1 states.

GMM hypothesis tests can be constructed by the difference in the GMM criteria; tests for reduced rank are considered, which in the context of VECM are tests for cointegration rank. The model

$${Y}_{t}=\mathsf{\Pi}{X}_{t}+\mathsf{\Psi}{Z}_{t}+{e}_{t}$$

is taken and the following hypotheses on reduced rank are considered:

$${\mathbb{H}}_{r}:\operatorname{rank}\left(\mathsf{\Pi}\right)=r.$$

The GMM test statistic for ${\mathbb{H}}_{r}$ against ${\mathbb{H}}_{r+1}$ is

$${C}_{r,r+1}=\underset{{\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{X}\beta ={I}_{r}}{\min}{J}_{r}\left(\alpha ,\beta ,\mathsf{\Psi}\right)-\underset{{\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{X}\beta ={I}_{r+1}}{\min}{J}_{r+1}\left(\alpha ,\beta ,\mathsf{\Psi}\right).$$

The GMM test statistic for ${\mathbb{H}}_{r}$ against ${\mathbb{H}}_{p}$ is

$${C}_{r,p}=\underset{{\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{X}\beta ={I}_{r}}{\min}{J}_{r}\left(\alpha ,\beta ,\mathsf{\Psi}\right)-\underset{{\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{X}\beta ={I}_{p}}{\min}{J}_{p}\left(\alpha ,\beta ,\mathsf{\Psi}\right).$$

The GMM test statistics for reduced rank are

$$\begin{array}{cc}\hfill {C}_{r,r+1}& =T\left(\frac{{\widehat{\lambda}}_{r+1}}{1-{\widehat{\lambda}}_{r+1}}\right),\hfill \\ \hfill {C}_{r,p}& =T\sum _{i=r+1}^{p}\frac{{\widehat{\lambda}}_{i}}{1-{\widehat{\lambda}}_{i}},\hfill \end{array}$$

where ${\widehat{\lambda}}_{i}$ are the eigenvalues from (6).

In contrast, recall that the likelihood ratio test statistics derived by Johansen are

$$\begin{array}{cc}\hfill L{R}_{r,r+1}& =-T\log \left(1-{\widehat{\lambda}}_{r+1}\right),\hfill \\ \hfill L{R}_{r,p}& =-T\sum _{i=r+1}^{p}\log \left(1-{\widehat{\lambda}}_{i}\right).\hfill \end{array}$$

The GMM test statistic ${C}_{r,r+1}$ and the likelihood ratio (LR) statistic $L{R}_{r,r+1}$ yield equivalent tests, as they are monotonic functions of one another. (If the bootstrap is used to assess significance, the two statistics will yield numerically identical p-values.) They are asymptotically identical under standard approximations and in practice will be nearly identical, because the eigenvalues ${\widehat{\lambda}}_{i}$ tend to be quite small in value (at least under the null hypothesis), so that $-log\left(1-\lambda \right)\approx \lambda /(1-\lambda )\approx \lambda $. For $p-(r+1)>1$, the GMM test statistic ${C}_{r,p}$ and the LR statistic $L{R}_{r,p}$ do not provide equivalent tests (they cannot be written as monotonic functions of one another), but they are also asymptotically equivalent and will be nearly identical in practice.
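The relation between the two families of statistics can be illustrated with a few lines of arithmetic. The eigenvalues and sample size below are hypothetical, and the function name `gmm_and_lr` is introduced only for this sketch. The monotone map follows from $\lambda /(1-\lambda )=\exp \left(-\log (1-\lambda )\right)-1$, so that ${C}_{r,r+1}=T\left(\exp \left(L{R}_{r,r+1}/T\right)-1\right)$.

```python
import math

def gmm_and_lr(lams, r, T):
    """Return (C_{r,r+1}, C_{r,p}, LR_{r,r+1}, LR_{r,p}) from eigenvalues sorted descending."""
    tail = lams[r:]                            # lambda_{r+1}, ..., lambda_p
    c_max = T * tail[0] / (1 - tail[0])
    c_trace = T * sum(l / (1 - l) for l in tail)
    lr_max = -T * math.log(1 - tail[0])
    lr_trace = -T * sum(math.log(1 - l) for l in tail)
    return c_max, c_trace, lr_max, lr_trace

# Hypothetical eigenvalues and sample size, for illustration only.
lams, T = [0.30, 0.05, 0.01], 200
c_max, c_trace, lr_max, lr_trace = gmm_and_lr(lams, r=1, T=T)

# C_{r,r+1} is an increasing function of LR_{r,r+1}.
c_from_lr = T * (math.exp(lr_max / T) - 1)
print(c_max, lr_max, c_from_lr)
print(c_trace, lr_trace)
```

Since $\log (1+x)\le x$ for $x\ge 0$, each GMM statistic is at least as large as the corresponding LR statistic, with the gap shrinking as the eigenvalues approach zero.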

It is convenient to rewrite the criterion in standard matrix notation, defining the matrices Y, X, Z, and W by stacking the observations. Model (2) is

$$Y=X\beta {\alpha}^{\prime}+Z{\mathsf{\Psi}}^{\prime}+e.$$

The moment (9) is

$${\overline{g}}_{r}\left(\alpha ,\beta ,\mathsf{\Psi}\right)=\frac{1}{T}\operatorname{vec}\left({W}^{\prime}\left(Y-X\beta {\alpha}^{\prime}-Z{\mathsf{\Psi}}^{\prime}\right)\right).$$

Using the relation

$$\operatorname{tr}\left(ABCD\right)=\operatorname{vec}{\left({D}^{\prime}\right)}^{\prime}\left({C}^{\prime}\otimes A\right)\operatorname{vec}\left(B\right),$$

the following is obtained:

$$\begin{array}{cc}\hfill {J}_{r}(\alpha ,\beta ,\mathsf{\Psi})& =T{\overline{g}}_{r}{\left(\alpha ,\beta ,\mathsf{\Psi}\right)}^{\prime}\left({\widehat{\mathsf{\Omega}}}^{-1}\otimes {\widehat{Q}}^{-1}\right){\overline{g}}_{r}\left(\alpha ,\beta ,\mathsf{\Psi}\right)\hfill \\ & =\operatorname{vec}{\left({W}^{\prime}\left(Y-X\beta {\alpha}^{\prime}-Z{\mathsf{\Psi}}^{\prime}\right)\right)}^{\prime}\left({\widehat{\mathsf{\Omega}}}^{-1}\otimes {\left({W}^{\prime}W\right)}^{-1}\right)\operatorname{vec}\left({W}^{\prime}\left(Y-X\beta {\alpha}^{\prime}-Z{\mathsf{\Psi}}^{\prime}\right)\right)\hfill \\ & =\operatorname{tr}\left({\widehat{\mathsf{\Omega}}}^{-1}{\left(Y-X\beta {\alpha}^{\prime}-Z{\mathsf{\Psi}}^{\prime}\right)}^{\prime}W{\left({W}^{\prime}W\right)}^{-1}{W}^{\prime}\left(Y-X\beta {\alpha}^{\prime}-Z{\mathsf{\Psi}}^{\prime}\right)\right).\hfill \end{array}$$
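The trace and vec identity, and its use in moving from the quadratic form to the trace form of the criterion, can be verified on random matrices. In this sketch `vec` denotes column stacking, and the matrices `E`, `W`, and `Omega` are hypothetical stand-ins for the residual matrix, the stacked regressors, and the variance estimate.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(1)

# Generic check of tr(ABCD) = vec(D')' (C' kron A) vec(B).
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))
D = rng.standard_normal((5, 2))
lhs = np.trace(A @ B @ C @ D)
rhs = vec(D.T) @ np.kron(C.T, A) @ vec(B)

# The same identity applied to the criterion.
T, p, k = 50, 2, 3
W = rng.standard_normal((T, k))
E = rng.standard_normal((T, p))
Omega = E.T @ E / T          # any positive definite p x p matrix would do here
g = vec(W.T @ E)
quad_form = g @ np.kron(np.linalg.inv(Omega), np.linalg.inv(W.T @ W)) @ g
trace_form = np.trace(
    np.linalg.solve(Omega, E.T @ W @ np.linalg.solve(W.T @ W, W.T @ E)))
print(lhs, rhs)
print(quad_form, trace_form)
```

The trace form avoids building the Kronecker product, which is why it is the convenient starting point for the concentration argument.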

Following the concentration strategy used by Johansen, $\beta $ is fixed and $\alpha $ and $\mathsf{\Psi}$ are concentrated out, producing a concentrated criterion that is a function of $\beta $ only. The system is linear in the regressors $X\beta $ and Z. Given the homoscedastic weight matrix, the GMM estimator of $(\alpha ,\mathsf{\Psi})$ is multivariate least-squares. Using the partialling out (residual regression) approach, the least-squares residual can be written as the residual from the regression of $\tilde{Y}$ on $\tilde{X}\beta $, where $\tilde{Y}={M}_{Z}Y$ and $\tilde{X}={M}_{Z}X$ are the residuals from regressions on Z. That is, the least-squares residual is

$$\begin{array}{cc}\hfill \widehat{e}\left(\beta \right)& =\tilde{Y}-\tilde{X}\beta {\left({\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{X}\beta \right)}^{-1}{\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{Y}\hfill \\ & =\tilde{Y}-\tilde{X}\beta {\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{Y},\hfill \end{array}$$

where the second equality uses the normalization ${\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{X}\beta ={I}_{r}$. Because the space spanned by $W=(X,Z)$ equals that spanned by $(\tilde{X},Z)$, the following can be written:

$$W{\left({W}^{\prime}W\right)}^{-1}{W}^{\prime}=Z{\left({Z}^{\prime}Z\right)}^{-1}{Z}^{\prime}+\tilde{X}{\left({\tilde{X}}^{\prime}\tilde{X}\right)}^{-1}{\tilde{X}}^{\prime}.$$

Because ${Z}^{\prime}\widehat{e}\left(\beta \right)=0$, then

$$\begin{array}{cc}\hfill W{\left({W}^{\prime}W\right)}^{-1}{W}^{\prime}\widehat{e}\left(\beta \right)& =\tilde{X}{\left({\tilde{X}}^{\prime}\tilde{X}\right)}^{-1}{\tilde{X}}^{\prime}\widehat{e}\left(\beta \right)\hfill \\ & =\tilde{X}{\left({\tilde{X}}^{\prime}\tilde{X}\right)}^{-1}{\tilde{X}}^{\prime}\tilde{Y}-\tilde{X}\beta {\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{Y}\hfill \end{array}$$

and

$$\begin{array}{cc}\hfill \widehat{e}{\left(\beta \right)}^{\prime}W{\left({W}^{\prime}W\right)}^{-1}{W}^{\prime}\widehat{e}\left(\beta \right)& ={\tilde{Y}}^{\prime}\tilde{X}{\left({\tilde{X}}^{\prime}\tilde{X}\right)}^{-1}{\tilde{X}}^{\prime}\tilde{Y}-{\tilde{Y}}^{\prime}\tilde{X}\beta {\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{Y}\hfill \\ & ={\tilde{Y}}^{\prime}\tilde{Y}-{\tilde{Y}}^{\prime}{M}_{\tilde{X}}\tilde{Y}-{\tilde{Y}}^{\prime}\tilde{X}\beta {\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{Y},\hfill \end{array}$$

where

$${M}_{\tilde{X}}=I-\tilde{X}{\left({\tilde{X}}^{\prime}\tilde{X}\right)}^{-1}{\tilde{X}}^{\prime}.$$

Using the partialling out (residual regression) approach, the variance estimator (10) can be written as

$$\widehat{\mathsf{\Omega}}=\frac{1}{T}{Y}^{\prime}\left(I-W{\left({W}^{\prime}W\right)}^{-1}{W}^{\prime}\right)Y=\frac{1}{T}{\tilde{Y}}^{\prime}{M}_{\tilde{X}}\tilde{Y}.$$
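The projection decomposition and the two expressions for $\widehat{\mathsf{\Omega}}$ used in this derivation can be confirmed numerically. The following is a small sketch on hypothetical random data; the helper `proj` (an explicit projection matrix) is introduced only for this check and would be wasteful for large $T$.

```python
import numpy as np

def proj(A):
    """Orthogonal projection matrix onto the column space of A."""
    return A @ np.linalg.solve(A.T @ A, A.T)

# Hypothetical data dimensions.
rng = np.random.default_rng(4)
T, p, m, q = 40, 2, 3, 2
X = rng.standard_normal((T, m))
Z = rng.standard_normal((T, q))
Y = rng.standard_normal((T, p))
I = np.eye(T)

M_Z = I - proj(Z)
Xt, Yt = M_Z @ X, M_Z @ Y                 # residuals from regressions on Z
W = np.hstack([X, Z])

# Projection onto W splits into projections onto Z and onto X-tilde.
P_W = proj(W)
P_sum = proj(Z) + proj(Xt)

# Two ways of writing the unconstrained variance estimator.
Omega_full = Y.T @ (I - P_W) @ Y / T
Omega_partial = Yt.T @ (I - proj(Xt)) @ Yt / T
print(np.abs(P_W - P_sum).max(), np.abs(Omega_full - Omega_partial).max())
```

The decomposition works because $\tilde{X}$ is orthogonal to $Z$ by construction, so the two projections do not overlap.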

Thus the concentrated GMM criterion is

$$\begin{array}{cc}\hfill {J}_{r}^{*}\left(\beta \right)& =\operatorname{tr}\left({\widehat{\mathsf{\Omega}}}^{-1}\widehat{e}{\left(\beta \right)}^{\prime}W{\left({W}^{\prime}W\right)}^{-1}{W}^{\prime}\widehat{e}\left(\beta \right)\right)\hfill \\ & =\operatorname{tr}\left({\widehat{\mathsf{\Omega}}}^{-1}\left({\tilde{Y}}^{\prime}\tilde{Y}\right)\right)-\operatorname{tr}\left({\widehat{\mathsf{\Omega}}}^{-1}\left({\tilde{Y}}^{\prime}{M}_{\tilde{X}}\tilde{Y}\right)\right)-\operatorname{tr}\left({\widehat{\mathsf{\Omega}}}^{-1}\left({\tilde{Y}}^{\prime}\tilde{X}\beta {\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{Y}\right)\right)\hfill \\ & =\operatorname{tr}\left({\widehat{\mathsf{\Omega}}}^{-1}\left({\tilde{Y}}^{\prime}\tilde{Y}\right)\right)-Tp-T\operatorname{tr}\left({\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{Y}{\left({\tilde{Y}}^{\prime}{M}_{\tilde{X}}\tilde{Y}\right)}^{-1}{\tilde{Y}}^{\prime}\tilde{X}\beta \right).\hfill \end{array}\tag{11}$$

The GMM estimator minimizes ${J}_{r}^{*}\left(\beta \right)$ or, equivalently, maximizes the trace in the third term of (11). This is a generalized eigenvalue problem. Lemma 2 (in the next section) shows that the solution is ${\widehat{\beta}}_{\mathrm{gmm}}=\left[{\widehat{\nu}}_{1},\dots ,{\widehat{\nu}}_{r}\right]={\widehat{\beta}}_{\mathrm{mle}}$ as claimed.

Because the estimates ${\widehat{\alpha}}_{\mathrm{gmm}}$ and ${\widehat{\mathsf{\Psi}}}_{\mathrm{gmm}}$ are found by least-squares regression given ${\widehat{\beta}}_{\mathrm{gmm}}$, and because the MLE is obtained from the same regression given ${\widehat{\beta}}_{\mathrm{mle}}$, it is also concluded that ${\widehat{\alpha}}_{\mathrm{gmm}}={\widehat{\alpha}}_{\mathrm{mle}}$ and ${\widehat{\mathsf{\Psi}}}_{\mathrm{gmm}}={\widehat{\mathsf{\Psi}}}_{\mathrm{mle}}$. This completes the proof of Theorem 1.

To establish Theorem 2, Lemma 2 also shows that the minimum of the criterion is

$$\begin{array}{cc}\hfill {J}_{r}({\widehat{\alpha}}_{\mathrm{gmm}},{\widehat{\beta}}_{\mathrm{gmm}},{\widehat{\mathsf{\Psi}}}_{\mathrm{gmm}})& =\underset{{\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{X}\beta ={I}_{r}}{\min}{J}_{r}(\alpha ,\beta ,\mathsf{\Psi})\hfill \\ & =\underset{{\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{X}\beta ={I}_{r}}{\min}{J}_{r}^{*}\left(\beta \right)\hfill \\ & =\operatorname{tr}\left({\widehat{\mathsf{\Omega}}}^{-1}\left({\tilde{Y}}^{\prime}\tilde{Y}\right)\right)-Tp-T\underset{{\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{X}\beta ={I}_{r}}{\max}\operatorname{tr}\left({\beta}^{\prime}{\tilde{X}}^{\prime}\tilde{Y}{\left({\tilde{Y}}^{\prime}{M}_{\tilde{X}}\tilde{Y}\right)}^{-1}{\tilde{Y}}^{\prime}\tilde{X}\beta \right)\hfill \\ & =\operatorname{tr}\left({\widehat{\mathsf{\Omega}}}^{-1}\left({\tilde{Y}}^{\prime}\tilde{Y}\right)\right)-Tp-T\sum _{i=1}^{r}\frac{{\widehat{\lambda}}_{i}}{1-{\widehat{\lambda}}_{i}}.\hfill \end{array}$$

This establishes Theorem 2.

To establish Theorems 1 and 2, two extremum properties are needed. The first relates the maximization of quadratic forms to generalized eigenvalues and eigenvectors. It is a slight extension of Theorem 11.13 of Magnus and Neudecker (1988).

*Lemma 1.* Suppose A and C are $p\times p$ real symmetric matrices with $C>0$. Let ${\lambda}_{1}>\cdots >{\lambda}_{p}>0$ be the generalized eigenvalues of A with respect to C and ${\nu}_{1},\dots ,{\nu}_{p}$ be the associated eigenvectors. Then

$$\underset{{\beta}^{\prime}C\beta ={I}_{r}}{\max}\operatorname{tr}\left({\beta}^{\prime}A\beta \right)=\sum _{i=1}^{r}{\lambda}_{i}$$

and

$$\underset{{\beta}^{\prime}C\beta ={I}_{r}}{\operatorname{arg\,max}}\operatorname{tr}\left({\beta}^{\prime}A\beta \right)=\left[{\nu}_{1},\dots ,{\nu}_{r}\right].$$

*Proof.* Define $\gamma ={C}^{1/2\prime}\beta $ and $\overline{A}={C}^{-1/2}A{C}^{-1/2\prime}$. The eigenvalues of $\overline{A}$ are equal to the generalized eigenvalues ${\lambda}_{i}$ of A with respect to C. The associated eigenvectors of $\overline{A}$ are ${C}^{1/2\prime}{\nu}_{i}$. Thus by Theorem 11.13 of Magnus and Neudecker (1988),

$$\underset{{\beta}^{\prime}C\beta ={I}_{r}}{\max}\operatorname{tr}\left({\beta}^{\prime}A\beta \right)=\underset{{\gamma}^{\prime}\gamma ={I}_{r}}{\max}\operatorname{tr}\left({\gamma}^{\prime}\overline{A}\gamma \right)=\sum _{i=1}^{r}{\lambda}_{i}$$

and

$$\begin{array}{cc}\hfill \underset{{\beta}^{\prime}C\beta ={I}_{r}}{\operatorname{arg\,max}}\operatorname{tr}\left({\beta}^{\prime}A\beta \right)& ={C}^{-1/2\prime}\underset{{\gamma}^{\prime}\gamma ={I}_{r}}{\operatorname{arg\,max}}\operatorname{tr}\left({\gamma}^{\prime}\overline{A}\gamma \right)\hfill \\ & ={C}^{-1/2\prime}{C}^{1/2\prime}\left[{\nu}_{1},\dots ,{\nu}_{r}\right]\hfill \\ & =\left[{\nu}_{1},\dots ,{\nu}_{r}\right]\hfill \end{array}$$

as claimed. ☐
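The extremum property and the change of variables in its proof can be illustrated numerically. In the sketch below the Cholesky factor of $C$ plays the role of ${C}^{1/2}$ (one valid choice of square root); the matrices and the random feasible alternatives are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
p, r = 4, 2
G = rng.standard_normal((p, p))
A = G @ G.T                               # symmetric with positive eigenvalues
H = rng.standard_normal((p, p))
C = H @ H.T + np.eye(p)                   # symmetric positive definite

# Transform to an ordinary eigenproblem: gamma = L' beta with C = L L'.
Linv = np.linalg.inv(np.linalg.cholesky(C))
lam, U = np.linalg.eigh(Linv @ A @ Linv.T)
idx = np.argsort(lam)[::-1]
lam, V = lam[idx], Linv.T @ U[:, idx]     # generalized eigenvectors, V'CV = I

best = np.trace(V[:, :r].T @ A @ V[:, :r])

def random_feasible_beta():
    """Draw a beta satisfying beta' C beta = I_r."""
    Q, _ = np.linalg.qr(rng.standard_normal((p, r)))
    return Linv.T @ Q

others = [np.trace(b.T @ A @ b)
          for b in (random_feasible_beta() for _ in range(500))]
print(best, lam[:r].sum(), max(others))
```

The random draws only probe the constraint set, so they illustrate the bound without establishing it.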

*Lemma 2.* Let ${M}_{X}=I-X{\left({X}^{\prime}X\right)}^{-1}{X}^{\prime}$. If ${X}^{\prime}X>0$ and ${Y}^{\prime}{M}_{X}Y>0$ then

$$\underset{{\beta}^{\prime}{X}^{\prime}X\beta ={I}_{r}}{\max}\operatorname{tr}\left({\beta}^{\prime}{X}^{\prime}Y{\left({Y}^{\prime}{M}_{X}Y\right)}^{-1}{Y}^{\prime}X\beta \right)=\sum _{i=1}^{r}\frac{{\lambda}_{i}}{1-{\lambda}_{i}}$$

and

$$\underset{{\beta}^{\prime}{X}^{\prime}X\beta ={I}_{r}}{\operatorname{arg\,max}}\operatorname{tr}\left({\beta}^{\prime}{X}^{\prime}Y{\left({Y}^{\prime}{M}_{X}Y\right)}^{-1}{Y}^{\prime}X\beta \right)=\left[{\nu}_{1},\dots ,{\nu}_{r}\right],$$

where $1>{\lambda}_{1}>\cdots >{\lambda}_{p}>0$ are the generalized eigenvalues of ${X}^{\prime}Y{\left({Y}^{\prime}Y\right)}^{-1}{Y}^{\prime}X$ with respect to ${X}^{\prime}X$, and ${\nu}_{1},\dots ,{\nu}_{p}$ are the associated eigenvectors.

*Proof.* By Lemma 1,

$$\underset{{\beta}^{\prime}{X}^{\prime}X\beta ={I}_{r}}{\max}\operatorname{tr}\left({\beta}^{\prime}{X}^{\prime}Y{\left({Y}^{\prime}{M}_{X}Y\right)}^{-1}{Y}^{\prime}X\beta \right)=\sum _{i=1}^{r}{\tilde{\lambda}}_{i}$$

and

$$\underset{{\beta}^{\prime}{X}^{\prime}X\beta ={I}_{r}}{\operatorname{arg\,max}}\operatorname{tr}\left({\beta}^{\prime}{X}^{\prime}Y{\left({Y}^{\prime}{M}_{X}Y\right)}^{-1}{Y}^{\prime}X\beta \right)=\left[{\tilde{\nu}}_{1},\dots ,{\tilde{\nu}}_{r}\right],$$

where ${\tilde{\lambda}}_{1}>\cdots >{\tilde{\lambda}}_{p}>0$ are the generalized eigenvalues of ${X}^{\prime}Y{\left({Y}^{\prime}{M}_{X}Y\right)}^{-1}{Y}^{\prime}X$ with respect to ${X}^{\prime}X$ and ${\tilde{\nu}}_{1},\dots ,{\tilde{\nu}}_{p}$ are the associated eigenvectors. The proof is established by showing that ${\tilde{\lambda}}_{i}={\lambda}_{i}/(1-{\lambda}_{i})$ and ${\tilde{\nu}}_{i}={\nu}_{i}.$

Let $(\tilde{\nu},\tilde{\lambda})$ be a generalized eigenvector/eigenvalue pair of ${X}^{\prime}Y{\left({Y}^{\prime}{M}_{X}Y\right)}^{-1}{Y}^{\prime}X$ with respect to ${X}^{\prime}X$. The pair satisfies

$${X}^{\prime}Y{\left({Y}^{\prime}{M}_{X}Y\right)}^{-1}{Y}^{\prime}X\tilde{\nu}={X}^{\prime}X\tilde{\nu}\tilde{\lambda}.\tag{12}$$

By the Woodbury matrix identity (e.g., Magnus and Neudecker (1988), Equation (7)),

$$\begin{array}{cc}\hfill {\left({Y}^{\prime}{M}_{X}Y\right)}^{-1}& ={\left({Y}^{\prime}Y-{Y}^{\prime}X{\left({X}^{\prime}X\right)}^{-1}{X}^{\prime}Y\right)}^{-1}\hfill \\ & ={\left({Y}^{\prime}Y\right)}^{-1}+{\left({Y}^{\prime}Y\right)}^{-1}{Y}^{\prime}X{\left({X}^{\prime}X-{X}^{\prime}Y{\left({Y}^{\prime}Y\right)}^{-1}{Y}^{\prime}X\right)}^{-1}{X}^{\prime}Y{\left({Y}^{\prime}Y\right)}^{-1}\hfill \\ & ={\left({Y}^{\prime}Y\right)}^{-1}+{\left({Y}^{\prime}Y\right)}^{-1}{Y}^{\prime}X{\left({X}^{\prime}{M}_{Y}X\right)}^{-1}{X}^{\prime}Y{\left({Y}^{\prime}Y\right)}^{-1},\hfill \end{array}$$

where ${M}_{Y}=I-Y{\left({Y}^{\prime}Y\right)}^{-1}{Y}^{\prime}$. Thus

$$\begin{array}{cc}\hfill {X}^{\prime}Y{\left({Y}^{\prime}{M}_{X}Y\right)}^{-1}{Y}^{\prime}X& ={X}^{\prime}Y{\left({Y}^{\prime}Y\right)}^{-1}{Y}^{\prime}X+{X}^{\prime}Y{\left({Y}^{\prime}Y\right)}^{-1}{Y}^{\prime}X{\left({X}^{\prime}{M}_{Y}X\right)}^{-1}{X}^{\prime}Y{\left({Y}^{\prime}Y\right)}^{-1}{Y}^{\prime}X\hfill \\ & ={X}^{\prime}{P}_{Y}X+{X}^{\prime}{P}_{Y}X{\left({X}^{\prime}{M}_{Y}X\right)}^{-1}{X}^{\prime}{P}_{Y}X\hfill \\ & ={X}^{\prime}X{\left({X}^{\prime}{M}_{Y}X\right)}^{-1}{X}^{\prime}{P}_{Y}X,\hfill \end{array}$$

where ${P}_{Y}=Y{\left({Y}^{\prime}Y\right)}^{-1}{Y}^{\prime}$ and the final equality uses ${X}^{\prime}{P}_{Y}X={X}^{\prime}X-{X}^{\prime}{M}_{Y}X$. Substituting into (12) produces

$${X}^{\prime}X{\left({X}^{\prime}{M}_{Y}X\right)}^{-1}{X}^{\prime}{P}_{Y}X\tilde{\nu}={X}^{\prime}X\tilde{\nu}\tilde{\lambda}.$$

Multiplying both sides on the left by $\left({X}^{\prime}{M}_{Y}X\right){\left({X}^{\prime}X\right)}^{-1}$, this implies

$$\begin{array}{cc}\hfill {X}^{\prime}{P}_{Y}X\tilde{\nu}& ={X}^{\prime}{M}_{Y}X\tilde{\nu}\tilde{\lambda}\hfill \\ & ={X}^{\prime}X\tilde{\nu}\tilde{\lambda}-{X}^{\prime}{P}_{Y}X\tilde{\nu}\tilde{\lambda}.\hfill \end{array}$$

By collecting terms,

$${X}^{\prime}{P}_{Y}X\tilde{\nu}(1+\tilde{\lambda})={X}^{\prime}X\tilde{\nu}\tilde{\lambda},$$

which implies

$${X}^{\prime}{P}_{Y}X\tilde{\nu}={X}^{\prime}X\tilde{\nu}\frac{\tilde{\lambda}}{(1+\tilde{\lambda})}.$$

This is an eigenvalue equation. It shows that $\lambda =\tilde{\lambda}/(1+\tilde{\lambda})$ is a generalized eigenvalue of ${X}^{\prime}{P}_{Y}X$ with respect to ${X}^{\prime}X$ and that $\tilde{\nu}$ is the associated eigenvector. Solving, $\tilde{\lambda}=\lambda /(1-\lambda )$. This means that the generalized eigenvalues of ${X}^{\prime}Y{\left({Y}^{\prime}{M}_{X}Y\right)}^{-1}{Y}^{\prime}X$ with respect to ${X}^{\prime}X$ are ${\lambda}_{i}/(1-{\lambda}_{i})$, with associated eigenvectors ${\nu}_{i}$. Because $\lambda /(1-\lambda )$ is monotonically increasing on $[0,1)$ and ${\lambda}_{i}<1$, it follows that the orderings of ${\lambda}_{i}$ and ${\tilde{\lambda}}_{i}$ are identical. Thus ${\tilde{\lambda}}_{i}={\lambda}_{i}/(1-{\lambda}_{i})$ and ${\tilde{\nu}}_{i}={\nu}_{i}$ as claimed. ☐
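The eigenvalue mapping established in this proof lends itself to a direct numerical check. The sketch below uses hypothetical simulated matrices and a helper `gen_eig` defined only for this illustration; it compares the generalized eigenpairs of the two matrices with respect to ${X}^{\prime}X$.

```python
import numpy as np

def gen_eig(A, C):
    """Generalized eigenpairs of A w.r.t. C > 0, descending, scaled so V'CV = I."""
    Linv = np.linalg.inv(np.linalg.cholesky(C))
    lam, U = np.linalg.eigh(Linv @ A @ Linv.T)
    idx = np.argsort(lam)[::-1]
    return lam[idx], Linv.T @ U[:, idx]

# Hypothetical data with the same number of columns in X and Y.
rng = np.random.default_rng(3)
T, p = 150, 3
X = rng.standard_normal((T, p))
Y = 0.5 * X @ rng.standard_normal((p, p)) + rng.standard_normal((T, p))

XX, XY, YY = X.T @ X, X.T @ Y, Y.T @ Y
YMY = YY - XY.T @ np.linalg.solve(XX, XY)              # Y' M_X Y

lam, V = gen_eig(XY @ np.linalg.solve(YY, XY.T), XX)       # lambda_i, nu_i
lam_t, V_t = gen_eig(XY @ np.linalg.solve(YMY, XY.T), XX)  # tilde versions

# Eigenvectors are determined only up to sign, so compare absolute cross-products.
cross = np.abs(V.T @ XX @ V_t)
print(lam_t, lam / (1 - lam))
```

If the eigenvalues were not distinct, the eigenvectors would be determined only up to rotation within the repeated eigenspace and the cross-product check would need to be adapted.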

This research is supported by the National Science Foundation and the Phipps Chair. Thanks to Richard Crump, the co-editors, and two referees for helpful comments on an earlier version. The author gives special thanks to Søren Johansen and Katarina Juselius for many years of stunning research, stimulating conversations, and impeccable scholarship.

The author declares no conflict of interest.

- Adrian, Tobias, Richard K. Crump, and Emanuel Moench. 2015. Regression-based estimation of dynamic asset pricing models. Journal of Financial Economics 118: 211–44.
- Anderson, Theodore Wilbur. 1951. Estimating linear restrictions on regression coefficients for multivariate normal distributions. Annals of Mathematical Statistics 22: 327–50.
- Anderson, Theodore Wilbur, and Herman Rubin. 1949. Estimation of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 20: 46–63.
- Anderson, Theodore Wilbur, and Herman Rubin. 1950. The asymptotic properties of estimates of the parameters of a single equation in a complete system of stochastic equations. Annals of Mathematical Statistics 21: 570–82.
- Bartlett, Maurice S. 1938. Further aspects of the theory of multiple regression. Proceedings of the Cambridge Philosophical Society 34: 33–40.
- Engle, Robert F., and Clive W. J. Granger. 1987. Co-integration and error correction: Representation, estimation, and testing. Econometrica 55: 251–76.
- Goldberger, Arthur S., and Ingram Olkin. 1971. A minimum-distance interpretation of limited-information estimation. Econometrica 39: 635–49.
- Hall, Alastair R. 2005. Generalized Method of Moments. Oxford: Oxford University Press.
- Hansen, Lars Peter. 1982. Large sample properties of generalized method of moments estimators. Econometrica 50: 1029–54.
- Hotelling, Harold. 1936. Relations between two sets of variates. Biometrika 28: 321–77.
- Johansen, Søren. 1988. Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control 12: 231–54.
- Johansen, Søren. 1991. Estimation and hypothesis testing of cointegration vectors in Gaussian vector autoregressive models. Econometrica 59: 1551–80.
- Johansen, Søren. 1995. Likelihood-Based Inference in Cointegrated Vector Auto-Regressive Models. Oxford: Oxford University Press.
- Kolesár, Michal. 2018. Minimum distance approach to inference with many instruments. Journal of Econometrics 204: 86–100.
- Magnus, Jan R., and Heinz Neudecker. 1988. Matrix Differential Calculus with Applications in Statistics and Econometrics. New York: Wiley.
- Muirhead, Robb J. 1982. Aspects of Multivariate Statistical Theory. New York: Wiley.
- Pillai, K. C. S. 1955. Some new test criteria in multivariate analysis. The Annals of Mathematical Statistics 26: 117–21.
- Sims, Christopher A. 1980. Macroeconomics and reality. Econometrica 48: 1–48.
- Tso, M. K.-S. 1981. Reduced-rank regression and canonical analysis. Journal of the Royal Statistical Society, Series B 43: 183–89.

© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).