## 1. Introduction

In a conceptual exploration of long-run causal order, Hoover (2018) applies the CVAR(1) model for the processes ${X}_{t}={({x}_{1t},\cdots ,{x}_{pt})}^{\prime}$ and ${T}_{t}={({T}_{1t},\cdots ,{T}_{mt})}^{\prime}$ to model a causal graph. The process ${({X}_{t}^{\prime};{T}_{t}^{\prime})}^{\prime}$ is a solution to the equations

$$\Delta {X}_{t+1}=M{X}_{t}+C{T}_{t}+{\epsilon}_{t+1},\qquad \Delta {T}_{t+1}={\eta}_{t+1},\qquad (1)$$

where the error terms ${\epsilon}_{t}$ are independent identically distributed (i.i.d.) Gaussian variables with mean 0 and variance ${\mathsf{\Omega}}_{\epsilon}=\mathrm{diag}({\omega}_{11},\cdots ,{\omega}_{pp})>0,$ and are independent of the errors ${\eta}_{t}$, which are i.i.d. Gaussian with mean 0 and variance ${\mathsf{\Omega}}_{\eta}$.

Thus, the stochastic trends ${T}_{t}$ are nonstationary random walks, and conditions will be given below for ${X}_{t}$ to be $I(1),$ that is, ${X}_{t}$ is nonstationary but $\Delta {X}_{t}$ is stationary. This will imply that $M{X}_{t}+C{T}_{t}$ is stationary, so that ${X}_{t}$ and ${T}_{t}$ cointegrate.

The entry ${M}_{ij}\ne 0$ means that ${x}_{j}$ causes ${x}_{i},$ which is written ${x}_{j}\to {x}_{i}$, and ${C}_{ij}\ne 0$ means that ${T}_{j}\to {x}_{i},$ and it is further assumed that ${M}_{ii}\ne 0.$ Note that the model assumes that there are no causal links from ${X}_{t}$ to ${T}_{t},$ so that ${T}_{t}$ is strongly exogenous.

A simple example for three variables, ${x}_{1}$, ${x}_{2}$, ${x}_{3}$, and a trend T, is the graph

where the matrices are given by

where ∗ indicates a nonzero coefficient.
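The behaviour described above can be illustrated by simulation. The sketch below is not taken from the paper: it assumes the recursion $\Delta {X}_{t+1}=M{X}_{t}+C{T}_{t}+{\epsilon}_{t+1}$, $\Delta {T}_{t+1}={\eta}_{t+1}$, uses the chain $T\to {x}_{1}\to {x}_{2}\to {x}_{3}$ that appears in Section 3, and picks hypothetical coefficient values, unit variances, and zero starting values purely for illustration. The function name `simulate_cvar1` is likewise an assumption.

```python
import numpy as np

def simulate_cvar1(M, C, n, omega_eps=None, omega_eta=None, seed=0):
    """Simulate X_t (n+1 x p) and T_t (n+1 x m) from the assumed recursion
    dX_{t+1} = M X_t + C T_t + eps_{t+1},  dT_{t+1} = eta_{t+1},
    started (for illustration only) at X_0 = 0 and T_0 = 0."""
    rng = np.random.default_rng(seed)
    p, m = C.shape
    omega_eps = np.ones(p) if omega_eps is None else np.asarray(omega_eps)
    omega_eta = np.eye(m) if omega_eta is None else np.asarray(omega_eta)
    X = np.zeros((n + 1, p))
    T = np.zeros((n + 1, m))
    # eps has a diagonal variance, eta a general variance matrix
    eps = rng.normal(size=(n, p)) * np.sqrt(omega_eps)
    eta = rng.multivariate_normal(np.zeros(m), omega_eta, size=n)
    for t in range(n):
        X[t + 1] = X[t] + M @ X[t] + C @ T[t] + eps[t]
        T[t + 1] = T[t] + eta[t]
    return X, T

# Chain T -> x1 -> x2 -> x3 with hypothetical coefficient values.
M = np.array([[-0.5, 0.0, 0.0],
              [0.5, -0.5, 0.0],
              [0.0, 0.5, -0.5]])
C = np.array([[1.0], [0.0], [0.0]])

# I_p + M must be stable (all eigenvalues inside the unit disk).
assert np.all(np.abs(np.linalg.eigvals(np.eye(3) + M)) < 1)

X, T = simulate_cvar1(M, C, n=5000)
Z = X @ M.T + T @ C.T  # row t holds M X_t + C T_t
print("sample variance of X_t:          ", X.var(axis=0))
print("sample variance of M X_t + C T_t:", Z.var(axis=0))
```

In such a run the sample variance of each component of ${X}_{t}$ grows with the sample length, while that of $M{X}_{t}+C{T}_{t}$ stays of moderate size, which is the cointegration property stated in the text.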

Provided that ${I}_{p}+M$ has all eigenvalues in the open unit disk, it is seen that

determines a stationary process defined for all $t.$ We define a nonstationary solution to (1) for $t=0,1,\cdots $ by

Note that the starting values are

It is seen that $\Delta {X}_{t+1}$, $\Delta {T}_{t+1}$ and $M{X}_{t}+C{T}_{t}$ are stationary processes for all $t,$ and that ${({X}_{t}^{\prime};{T}_{t}^{\prime})}^{\prime}$ is a solution to Equation (1). In the following, we assume that ${({X}_{t}^{\prime};{T}_{t}^{\prime})}^{\prime}$ is defined by (2) for $t=0,1,\cdots $.

The paper by Hoover gives a detailed and general discussion of the problems of recovering causal structures from nonstationary observations ${X}_{t},$ or subsets of ${X}_{t},$ when ${T}_{t}$ is unobserved, that is, ${X}_{t}={({X}_{1t}^{\prime};{X}_{2t}^{\prime})}^{\prime}$ where the observations ${X}_{1t}$ are ${p}_{1}$-dimensional and the unobserved processes ${X}_{2t}$ and ${T}_{t}$ are ${p}_{2}$- and m-dimensional respectively, $p={p}_{1}+{p}_{2}$. It is assumed that there are at least as many observations as trends, that is, ${p}_{1}\ge m.$

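Model (1) is therefore rewritten as

$$\begin{array}{rcl}\Delta {X}_{1,t+1}&=&{M}_{11}{X}_{1t}+{M}_{12}{X}_{2t}+{C}_{1}{T}_{t}+{\epsilon}_{1,t+1},\\ \Delta {X}_{2,t+1}&=&{M}_{21}{X}_{1t}+{M}_{22}{X}_{2t}+{C}_{2}{T}_{t}+{\epsilon}_{2,t+1},\\ \Delta {T}_{t+1}&=&{\eta}_{t+1}.\end{array}\qquad (3)$$

Note that there is now a causal link from the observed process ${X}_{1t}$ to the unobserved process ${X}_{2t}$ if ${M}_{21}\ne 0$.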

It follows from (3) that ${X}_{1t}$ is $I(1)$ and cointegrated with ${p}_{1}-m$ cointegrating vectors $\beta ,$ see Theorem 1. Therefore, $\Delta {X}_{1t}$ has an infinite order autoregressive representation, see (Johansen and Juselius 2014, Lemma 2), which is written as

$$\Delta {X}_{1,t+1}=\alpha {\beta}^{\prime}{X}_{1t}+\sum _{i=1}^{\infty}{\Gamma}_{i}\Delta {X}_{1,t+1-i}+{\nu}_{t+1}^{\beta},\qquad (4)$$

where the operator norm $||{\Gamma}_{i}||={\lambda}_{max}^{1/2}({\Gamma}_{i}^{\prime}{\Gamma}_{i})$ is $O({\rho}^{i})$ for some $0<\rho <1$. The matrices $\alpha $ and $\beta $ are ${p}_{1}\times ({p}_{1}-m)$ of rank ${p}_{1}-m$, in agreement with the cointegrating rank in Theorem 1, and ${\nu}_{t+1}^{\beta}=\Delta {X}_{1,t+1}-E(\Delta {X}_{1,t+1}|{\mathcal{F}}_{t}^{\beta}),$ where ${\mathcal{F}}_{t}^{\beta}=\sigma (\Delta {X}_{1s},s\le t,{\beta}^{\prime}{X}_{1t})$. Thus, ${X}_{1t}$ is not measurable with respect to ${\mathcal{F}}_{t}^{\beta}$, but ${\beta}^{\prime}{X}_{1t}$ is measurable with respect to ${\mathcal{F}}_{t}^{\beta}.$ Here, the prediction errors ${\nu}_{t+1}^{\beta}$ are i.i.d. ${N}_{{p}_{1}}(0,\Sigma )$, where $\Sigma $ is calculated below. The representation of ${X}_{1t},$ similar to (2), is

where $\Gamma ={I}_{{p}_{1}}-{\sum}_{i=1}^{\infty}{\Gamma}_{i}$ and $||{C}_{i}||=O({\rho}^{i}).$ Here, ${\beta}_{\perp}$ is a ${p}_{1}\times m$ matrix of full rank for which ${\beta}^{\prime}{\beta}_{\perp}=0$, and similarly for ${\alpha}_{\perp}$. This shows that ${X}_{1t}$ is a cointegrated $I(1)$ process, that is, ${X}_{1t}$ is nonstationary, while ${\beta}^{\prime}{X}_{1t}$ and $\Delta {X}_{1t}$ are stationary.

A statistical analysis, including estimation of $\alpha $, $\beta $, and $\Gamma ,$ can be conducted for the observations ${X}_{1t},$ $t=1,\cdots ,T,$ using an approximating finite order CVAR, see Saikkonen (1992) and Saikkonen and Lütkepohl (1996).

Hoover (2018) investigates, in particular, whether weak exogeneity for $\beta $ in the approximating finite order CVAR, that is, a zero row in $\alpha ,$ is a useful tool for finding the causal structure in the graph.

The present note solves the problem of finding expressions for the parameters $\alpha $ and $\beta $ in the CVAR($\infty $) model (4) for the observation ${X}_{1t}$, as functions of the parameters in model (3), and finds conditions on these for the presence of a zero row in $\alpha ,$ and hence weak exogeneity for $\beta $ in the approximating finite order CVAR.

## 2. The Assumptions and Main Results

First, some definitions and assumptions are given, then the main results on $\alpha $ and $\beta $ are presented and proved in Theorems 1 and 2. These results rely on Theorem A1 on the solution of an algebraic Riccati equation, which is given and proved in Appendix A.

In the following, a $k\times k$ matrix is called stable, if all eigenvalues are contained in the open unit disk. If A is a ${k}_{1}\times {k}_{2}$ matrix of rank $k\le min({k}_{1},{k}_{2})$, an orthogonal complement, ${A}_{\perp},$ is defined as a ${k}_{1}\times ({k}_{1}-k)$ matrix of rank ${k}_{1}-k$ for which ${A}_{\perp}^{\prime}A=0$. If ${k}_{1}=k$, ${A}_{\perp}=0.$ Note that ${A}_{\perp}$ is only defined up to multiplication from the right by a $({k}_{1}-k)\times ({k}_{1}-k)$ matrix of full rank. Throughout, ${E}_{t}(.)$ and $Va{r}_{t}(.)$ denote conditional expectation and variance given the sigma-field ${\mathcal{F}}_{0,t}=\sigma \{{X}_{1,s}$, $0\le s\le t\},$ generated by the observations.
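An orthogonal complement as defined above can be computed numerically. The following is a minimal sketch, assuming a singular value decomposition is an acceptable way to pick one representative of ${A}_{\perp}$ (any right multiplication by a full rank matrix gives another valid choice); the function name `orth_complement` is hypothetical.

```python
import numpy as np

def orth_complement(A):
    """Return a k1 x (k1 - k) matrix A_perp of full column rank with
    A_perp' A = 0, where k is the numerical rank of the k1 x k2 matrix A."""
    A = np.atleast_2d(np.asarray(A, dtype=float))
    # Columns of U beyond the rank span the orthogonal complement of col(A).
    U, _, _ = np.linalg.svd(A, full_matrices=True)
    k = np.linalg.matrix_rank(A)
    return U[:, k:]

C1 = np.array([[2.0], [0.0], [0.0]])
C1_perp = orth_complement(C1)
print(C1_perp.shape)                   # (3, 2)
print(np.allclose(C1_perp.T @ C1, 0))  # True
```

For a square matrix of full rank the function returns a matrix with zero columns, which corresponds to the convention ${A}_{\perp}=0$ when ${k}_{1}=k$.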

**Assumption** **1.** In Equation (3), it is assumed that (i) ${\epsilon}_{1t}$, ${\epsilon}_{2t}$, and ${\eta}_{t}$ are mutually independent and i.i.d. Gaussian with mean zero and variances ${\mathsf{\Omega}}_{1}$, ${\mathsf{\Omega}}_{2}$, and ${\mathsf{\Omega}}_{\eta},$ where ${\mathsf{\Omega}}_{1}$ and ${\mathsf{\Omega}}_{2}$ are diagonal matrices,

(ii) ${I}_{{p}_{1}}+{M}_{11}$, ${I}_{{p}_{2}}+{M}_{22}$ and ${I}_{p}+M$ are stable,

(iii) ${C}_{1.2}={C}_{1}-{M}_{12}{M}_{22}^{-1}{C}_{2}$ has full rank m.

Let ${({X}_{1t}^{\prime};{X}_{2t}^{\prime};{T}_{t}^{\prime})}^{\prime}$, $t=0,1,\cdots ,n$, be the solution to (3) given in (2), such that $\Delta {X}_{t}$ and $M{X}_{t}+C{T}_{t}$ are stationary. Assumption 1(ii) on ${M}_{11},{M}_{22}$ and M is taken from Hoover (2018) to ensure that, for instance, the process ${X}_{t}$ given by the equations ${X}_{t}=({I}_{p}+M){X}_{t-1}+input$ is stationary if the input is stationary, such that the nonstationarity of ${X}_{t}$ in model (3) is created by the trends ${T}_{t},$ and not by the own dynamics of ${X}_{t}$ as given by $M.$ It follows from this assumption that M is nonsingular, because ${I}_{p}+M$ is stable and therefore does not have 1 as an eigenvalue, and similarly for ${M}_{11}$ and ${M}_{22}.$ Moreover, ${M}_{11.2}={M}_{11}-{M}_{12}{M}_{22}^{-1}{M}_{21}$ is nonsingular because

$$\det M=\det {M}_{22}\,\det {M}_{11.2}\ne 0.$$

#### The Main Results

The first result on $\beta $ is a simple consequence of model (3).

**Theorem** **1.** Assumption 1 implies that the cointegrating rank is $r={p}_{1}-m,$ and that the coefficients β and ${\beta}_{\perp}$ in the CVAR($\infty $) representation for ${X}_{1t}$, see (4), are given for ${p}_{1}>m$ as

$$\beta ={M}_{11.2}^{\prime}{C}_{1.2\perp},\qquad {\beta}_{\perp}={M}_{11.2}^{-1}{C}_{1.2}.\qquad (6)$$

For ${p}_{1}=m,$ ${\beta}_{\perp}$ has rank ${p}_{1},$ and there is no cointegration: $\alpha =\beta =0$.

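**Proof** **of** **Theorem** **1.** From the model Equation (3), it follows, by eliminating ${X}_{2t}$ from the first two equations, that

$$\Delta {X}_{1,t+1}-{M}_{12}{M}_{22}^{-1}\Delta {X}_{2,t+1}={M}_{11.2}{X}_{1t}+{C}_{1.2}{T}_{t}+{\epsilon}_{1,t+1}-{M}_{12}{M}_{22}^{-1}{\epsilon}_{2,t+1}.$$

Solving for the nonstationary terms gives

$${M}_{11.2}{X}_{1t}+{C}_{1.2}{T}_{t}=\Delta {X}_{1,t+1}-{M}_{12}{M}_{22}^{-1}\Delta {X}_{2,t+1}-{\epsilon}_{1,t+1}+{M}_{12}{M}_{22}^{-1}{\epsilon}_{2,t+1}.\qquad (7)$$

Multiplying by ${\beta}^{\prime}{M}_{11.2}^{-1}$, it is seen that ${\beta}^{\prime}{X}_{1t}$ is stationary if ${\beta}^{\prime}{M}_{11.2}^{-1}{C}_{1.2}=0.$ By Assumption 1(iii), ${C}_{1.2}$ has rank $m,$ so that $\beta $ has rank ${p}_{1}-m,$ which proves (6). ☐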
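The condition ${\beta}^{\prime}{M}_{11.2}^{-1}{C}_{1.2}=0$ from the proof can be checked numerically. The sketch below is an illustration under assumptions, not the paper's own computation: it takes $\beta ={M}_{11.2}^{\prime}{C}_{1.2\perp}$ and ${\beta}_{\perp}={M}_{11.2}^{-1}{C}_{1.2}$ as one choice satisfying that condition, and uses hypothetical coefficient values for the chain $T\to {x}_{1}\to {x}_{2}\to {x}_{3}$ with ${x}_{2}$ unobserved. The helper names are hypothetical.

```python
import numpy as np

def orth_complement(A):
    """One choice of A_perp (full column rank, A_perp' A = 0) via the SVD."""
    U, _, _ = np.linalg.svd(A, full_matrices=True)
    return U[:, np.linalg.matrix_rank(A):]

def beta_from_model(M11, M12, M21, M22, C1, C2):
    """Return (beta, beta_perp) with beta' M11.2^{-1} C1.2 = 0."""
    M22_inv = np.linalg.inv(M22)
    M11_2 = M11 - M12 @ M22_inv @ M21   # M_{11.2}
    C1_2 = C1 - M12 @ M22_inv @ C2      # C_{1.2}
    beta_perp = np.linalg.solve(M11_2, C1_2)
    beta = M11_2.T @ orth_complement(C1_2)
    return beta, beta_perp

# Observed X1 = (x1, x3), unobserved X2 = (x2); hypothetical values.
M11 = np.array([[-0.5, 0.0], [0.0, -0.5]])
M12 = np.array([[0.0], [0.5]])   # x2 -> x3
M21 = np.array([[0.5, 0.0]])     # x1 -> x2
M22 = np.array([[-0.5]])
C1 = np.array([[1.0], [0.0]])    # T -> x1
C2 = np.array([[0.0]])

beta, beta_perp = beta_from_model(M11, M12, M21, M22, C1, C2)
print("beta (up to scale and sign):", beta.ravel())
print("beta' beta_perp:", (beta.T @ beta_perp).ravel())
```

With these values $\beta $ is proportional to ${(1,-1)}^{\prime}$, so that ${x}_{1t}-{x}_{3t}$ is the stationary combination of the two observed variables.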

The result for $\alpha $ is more involved and is given in Theorem 2. The proof is a further analysis of (7) and involves first, the representation of ${X}_{1t}$ in terms of a sum of prediction errors ${\nu}_{t}^{\beta}=\Delta {X}_{1t}-E(\Delta {X}_{1t}|{\mathcal{F}}_{t-1}^{\beta}),$ see (5), and second, a representation of $E({T}_{t}|{\mathcal{F}}_{0,t})=E({T}_{t}|{X}_{10},\cdots ,{X}_{1t})$ as the (weighted) sum of the prediction errors ${\nu}_{0t}=\Delta {X}_{1t}-E(\Delta {X}_{1t}|{\mathcal{F}}_{0,t-1})$. The second representation requires a result from control theory on the solution of an algebraic Riccati equation, together with some results based on the Kalman filter for the calculation of the conditional mean and variance of the unobserved processes ${X}_{2t},{T}_{t}$ given the observations ${X}_{1s}$, $0\le s\le t$. These are collected as Theorem A1 in Appendix A.

For the discussion of these results, it is useful to reformulate (3) by defining the unobserved variables and errors

and the matrices

One can then show, see Theorem A1, that based on properties of the Gaussian distribution, a recursion can be found for the calculation of ${E}_{t}={E}_{t}({T}_{t}^{\ast})=E({T}_{t}^{\ast}|{\mathcal{F}}_{0t})$ and ${V}_{t}=Va{r}_{t}({T}_{t}^{\ast})=Var({T}_{t}^{\ast}|{\mathcal{F}}_{0t})$, using the matrices in (8) and (9), by the equations

It then follows from results from control theory that $V={lim}_{t\to \infty}Va{r}_{t}({T}_{t}^{\ast})$ exists and satisfies the algebraic Riccati equation

Moreover, the prediction errors ${\nu}_{0t}=\Delta {X}_{1t}-E(\Delta {X}_{1t}|{\mathcal{F}}_{0,t-1})$ are independent ${N}_{{p}_{1}}(0,{\Sigma}_{t})$ for ${\Sigma}_{t}={C}^{\ast}{V}_{t}{C}^{\ast \prime}+{\mathsf{\Omega}}_{1},$ and the prediction errors ${\nu}_{t}^{\beta}=\Delta {X}_{1t}-E(\Delta {X}_{1t}|{\mathcal{F}}_{t-1}^{\beta})$ are independent identically distributed ${N}_{{p}_{1}}(0,\Sigma )$ for $\Sigma ={C}^{\ast}V{C}^{\ast \prime}+{\mathsf{\Omega}}_{1}$. Finally, ${E}_{t}({T}_{t})$ has the representation in the prediction errors, ${\nu}_{0i},$

where ${E}_{0}({T}_{0})=E({T}_{0}|{X}_{10})=0$.

Comparing the representation (5) for ${X}_{1t}$ and (14) for ${E}_{t}({T}_{t})$ gives a more precise relation between the coefficients of the nonstationary terms in (7). The main result of the paper is to show how this leads to expressions for the coefficients $\alpha $ and ${\alpha}_{\perp}$ as functions of the parameters in model (3).

**Theorem** **2.** Assumption 1 implies that the coefficients α and ${\alpha}_{\perp}$ in the CVAR($\infty $) representation of ${X}_{1t}$ are given for ${p}_{1}>m$ as

$$\alpha =\Sigma {({M}_{12}{V}_{2T}+{C}_{1}{V}_{TT})}_{\perp},\qquad {\alpha}_{\perp}={\Sigma}^{-1}({M}_{12}{V}_{2T}+{C}_{1}{V}_{TT}),\qquad (15)$$

where $\Sigma ={C}^{\ast}V{C}^{\ast \prime}+{\mathsf{\Omega}}_{1}$, and ${V}_{2T}$ and ${V}_{TT}$ denote the blocks of V corresponding to $({X}_{2t},{T}_{t})$ and $({T}_{t},{T}_{t})$.

**Proof** **of** **Theorem** **2.** The left hand side of (7) has two nonstationary terms. The observation ${X}_{1t}$ is represented in (5) in terms of a random walk in the prediction errors ${\nu}_{i}^{\beta},$ plus a stationary term, and ${T}_{t}$ is a random walk in ${\eta}_{i}.$ Calculating the conditional expectation given the sigma-field ${\mathcal{F}}_{0,t}$, ${T}_{t}$ is replaced by ${E}_{t}({T}_{t}),$ which in (14) is represented as a weighted sum of ${\nu}_{0i}.$ Thus, the conditional expectation of (7) gives

where the right hand side is bounded in mean:

Setting $t=\left[nu\right]$ and dividing by ${n}^{1/2},$ it follows from (5) that

where ${W}_{\nu}(u)$ is the Brownian motion generated by the i.i.d. prediction errors ${\nu}_{t}^{\beta}.$ From (14), it can be proved that

This follows by replacing ${V}_{t},{\Sigma}_{t}$ by $V,\Sigma ,$ because for ${\delta}_{t}^{\prime}={V}_{t}{C}^{\ast \prime}{\Sigma}_{t}^{-1}-V{C}^{\ast \prime}{\Sigma}^{-1}\to 0,$ it holds that

Next we can replace ${\nu}_{0t}$ by ${\nu}_{t}^{\beta}$ as follows: For $t=0,1,\cdots $ the sum

is measurable with respect to both ${\mathcal{F}}_{t}^{\beta}$ and ${\mathcal{F}}_{0t},$ such that

Then

and therefore

which proves (19).

Finally, setting $t=\left[nu\right]$ and normalizing (17) by ${n}^{-1/2},$ it follows that in the limit

This relation shows that the coefficient of ${W}_{\nu}(u)$ is zero, so that ${\alpha}_{\perp}$ can be chosen as

$${\alpha}_{\perp}={\Sigma}^{-1}({M}_{12}{V}_{2T}+{C}_{1}{V}_{TT}),$$

and therefore $\alpha =\Sigma {({M}_{12}{V}_{2T}+{C}_{1}{V}_{TT})}_{\perp},$ which proves (15). ☐
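The quantities in Theorem 2 can be evaluated numerically for given parameter values. The sketch below is an illustration under stated assumptions rather than the paper's own algorithm: it assumes the standard Kalman filter form of the variance recursion, ${V}_{t+1}={Q}^{\ast}{V}_{t}{Q}^{\ast \prime}+{\mathsf{\Omega}}^{\ast}-{Q}^{\ast}{V}_{t}{C}^{\ast \prime}{\Sigma}_{t}^{-1}{C}^{\ast}{V}_{t}{Q}^{\ast \prime}$, with ${C}^{\ast}=({M}_{12},{C}_{1})$, ${Q}^{\ast}$ built from ${I}_{{p}_{2}}+{M}_{22}$, ${C}_{2}$ and ${I}_{m}$, and ${\mathsf{\Omega}}^{\ast}=\mathrm{blockdiag}({\mathsf{\Omega}}_{2};{\mathsf{\Omega}}_{\eta})$. The starting value ${V}_{0}=I$, the coefficient values, and the function names are hypothetical.

```python
import numpy as np

def orth_complement(A):
    """One choice of A_perp (full column rank, A_perp' A = 0) via the SVD."""
    U, _, _ = np.linalg.svd(A, full_matrices=True)
    return U[:, np.linalg.matrix_rank(A):]

def riccati_limit(Q, Cs, Omega_star, Omega1, n_iter=2000):
    """Iterate the assumed variance recursion from V_0 = I (an arbitrary,
    hypothetical start) and return the approximate limit V and Sigma."""
    V = np.eye(Q.shape[0])
    for _ in range(n_iter):
        Sigma = Cs @ V @ Cs.T + Omega1
        K = Q @ V @ Cs.T
        V = Q @ V @ Q.T + Omega_star - K @ np.linalg.solve(Sigma, K.T)
    return V, Cs @ V @ Cs.T + Omega1

# Chain T -> x1 -> x2 -> x3 with x2 unobserved; hypothetical values.
p2, m = 1, 1
M12 = np.array([[0.0], [0.5]])   # x2 -> x3
M22 = np.array([[-0.5]])
C1 = np.array([[1.0], [0.0]])    # T -> x1
C2 = np.array([[0.0]])

Q = np.block([[np.eye(p2) + M22, C2],
              [np.zeros((m, p2)), np.eye(m)]])
Cs = np.hstack([M12, C1])        # C* = (M12, C1)
Omega_star = np.eye(p2 + m)      # blockdiag(Omega_2, Omega_eta)
Omega1 = np.eye(2)

V, Sigma = riccati_limit(Q, Cs, Omega_star, Omega1)
V2T, VTT = V[:p2, p2:], V[p2:, p2:]
A = M12 @ V2T + C1 @ VTT
alpha = Sigma @ orth_complement(A)    # formula for alpha in Theorem 2
alpha_perp = np.linalg.solve(Sigma, A)

print("V =\n", V)
print("alpha (up to scale and sign) =", alpha.ravel())
```

In this example the first entry of $\alpha $ is numerically zero, so that ${x}_{1}$ is weakly exogenous, in line with the discussion of the chain in Section 3.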

## 3. Two Examples of Simplifying Assumptions

It follows from Theorem 2 that in order to investigate a zero row in $\alpha ,$ the matrix V is needed. This is easy to calculate from the recursion (11), for a given value of the parameters, but the properties of V are more difficult to evaluate. In general, $\alpha $ does not contain a zero row, but if ${M}_{12}{V}_{2T}=0,$ the expressions for $\alpha $ and ${\alpha}_{\perp}$ simplify, so that simple conditions on ${M}_{12}$ and ${C}_{1}$ imply a zero row in $\alpha $ and hence give weak exogeneity in the statistical analysis of the approximating finite order CVAR. This extra condition, ${M}_{12}{V}_{2T}=0,$ implies that

and

such that $\alpha $ simplifies to

Thus, a condition for a zero row in $\alpha $ is

because ${\mathsf{\Omega}}_{1}=\mathrm{diag}({\omega}_{1},\cdots ,{\omega}_{{p}_{1}}).$ This is simple to check by inspecting the matrices ${M}_{12}$ and ${C}_{1\perp}$ in model (3). Below, two cases are given where such a simple solution is available.

**Case** **1** (${M}_{12}=0$)**.** If the unobserved process ${X}_{2t}$ does not cause the observation ${X}_{1t},$ then ${M}_{12}=0.$ Therefore, ${M}_{12}{V}_{2T}=0$ and from (20) it follows that

$$\alpha ={\mathsf{\Omega}}_{1}{C}_{1\perp}.$$

Thus, α has a zero row if ${C}_{1\perp}$ has a zero row.

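An example of ${M}_{12}=0$ is the chain $T\to {x}_{1}\to {x}_{2}\to {x}_{3},$ where ${X}_{1}=\{{x}_{1},{x}_{2},{x}_{3}\}$ is observed and ${X}_{2}=0,$ and hence ${M}_{12}=0$ and ${C}_{2}=0.$ Then, because $T\to {x}_{1}$ is the only link from the trend,

$${C}_{1}=\left(\begin{array}{c}\ast \\ 0\\ 0\end{array}\right),\qquad {C}_{1\perp}=\left(\begin{array}{cc}0& 0\\ 1& 0\\ 0& 1\end{array}\right).$$

Thus, the first row of ${C}_{1\perp}$ is a zero row, such that ${x}_{1}$ is weakly exogenous.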

To formulate the next case, a definition of strong orthogonality of two matrices is introduced.

**Definition** **1.** Let A be a $k\times {k}_{1}$ matrix and B a $k\times {k}_{2}$ matrix. Then, A and B are called strongly orthogonal if ${A}^{\prime}DB=0$ for all diagonal matrices D, or equivalently if ${A}_{ji}{B}_{j\ell}=0$ for all $i,j,\ell $.

Thus, if ${A}_{ji}\ne 0,$ then row j of B is zero, and if ${B}_{j\ell}\ne 0,$ then row j of A is zero. A simple example is

Thus, the definition means that if two matrices are strongly orthogonal, it is due to the positions of the zeros and not to linear combinations of nonzero numbers being zero.
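Definition 1 translates directly into a check on the zero patterns of the rows. The following is a minimal sketch; the function name `strongly_orthogonal` and the numerical values are hypothetical.

```python
import numpy as np

def strongly_orthogonal(A, B, tol=0.0):
    """True if A[j, i] * B[j, l] = 0 for all i, j, l, that is, if no row
    index j has a nonzero entry in both A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    rows_A = np.any(np.abs(A) > tol, axis=1)  # rows of A with a nonzero entry
    rows_B = np.any(np.abs(B) > tol, axis=1)  # rows of B with a nonzero entry
    return not np.any(rows_A & rows_B)

# Zeros in complementary positions: strongly orthogonal.
M12 = np.array([[0.0], [0.5]])
C1 = np.array([[1.0], [0.0]])
print(strongly_orthogonal(M12, C1))   # True

# Orthogonal only through cancellation: A'B = 0 but not strongly orthogonal.
A = np.array([[1.0], [1.0]])
B = np.array([[1.0], [-1.0]])
print(strongly_orthogonal(A, B))      # False
```

The second pair shows the distinction made in the text: ${A}^{\prime}B=0$ holds, but ${A}^{\prime}DB\ne 0$ for a diagonal D with unequal diagonal elements.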

Thus, in particular if ${M}_{12}$ and ${C}_{1}$ are strongly orthogonal, and if T causes a variable in ${X}_{1},$ then ${X}_{2}$ does not cause that variable. The expression for V simplifies in the following case.

**Lemma** **1.** If ${C}_{2}=0,$ and ${M}_{12}^{\prime}{\mathsf{\Omega}}_{1}^{-1}{C}_{1}=0,$ then ${Q}^{\ast}=\mathrm{blockdiag}({I}_{{p}_{2}}+{M}_{22};{I}_{m}),$ and ${V}_{2T}=0$ such that $V=\mathrm{blockdiag}({V}_{22};{V}_{TT}).$

**Proof** **of** **Lemma** **1.** We first prove that ${V}_{t}$ is block diagonal for $t=0$. From (2), it follows that

Thus, if $\mathsf{\Phi}$ denotes the variance of ${({X}_{10}^{\prime};{X}_{20}^{\prime})}^{\prime},$ then

and hence block diagonal. Assume, therefore, that ${V}_{t}=\mathrm{blockdiag}({V}_{t22};{V}_{tTT})$ and consider the expression for ${V}_{t+1},$ see (11). In this expression, ${Q}^{\ast}$ is block diagonal (because ${C}_{2}=0)$ and ${Q}^{\ast}{V}_{t}{Q}^{\ast \prime}$ and ${\mathsf{\Omega}}^{\ast}$ are block diagonal, and the same holds for ${Q}^{\ast}{V}_{t}^{1/2}.$ Thus, it is enough to show that

$${V}_{t}^{1/2}{C}^{\ast \prime}{\{{C}^{\ast}{V}_{t}{C}^{\ast \prime}+{\mathsf{\Omega}}_{1}\}}^{-1}{C}^{\ast}{V}_{t}^{1/2}$$

is block diagonal. To simplify the notation, define the normalized matrices

Then, by assumption,

so that, using ${V}_{t2T}=0,$

A direct calculation shows that

and that

such that ${(\check{M},\check{C})}^{\prime}{(\check{M}{\check{M}}^{\prime}+\check{C}{\check{C}}^{\prime}+{I}_{{p}_{1}})}^{-1}(\check{M},\check{C})$ is block diagonal.

Then, ${V}_{t}^{1/2}{C}^{\ast \prime}{\{{C}^{\ast}{V}_{t}{C}^{\ast \prime}+{\mathsf{\Omega}}_{1}\}}^{-1}{C}^{\ast}{V}_{t}^{1/2}$ and hence ${V}_{t+1}$ are block diagonal. Taking the limit for $t\to \infty ,$ it is seen that also V is block diagonal. ☐

**Case** **2** (${C}_{2}=0$, and ${M}_{12}$ and ${C}_{1}$ are strongly orthogonal)**.** Because ${C}_{2}=0$ and ${M}_{12}^{\prime}{\mathsf{\Omega}}_{1}^{-1}{C}_{1}=0,$ Lemma 1 shows that ${V}_{2T}=0,$ so that the condition ${M}_{12}{V}_{2T}=0$ and (20) hold. Moreover, strong orthogonality also implies that ${M}_{12}^{\prime}{C}_{1}=0$ such that ${M}_{12}={C}_{1\perp}\xi $ for some $\xi .$ Hence

and therefore, a zero row in ${C}_{1\perp}$ gives a zero row in α.

Consider again the chain $T\to {x}_{1}\to {x}_{2}\to {x}_{3},$ but assume now that ${x}_{2}$ is not observed. Thus, ${X}_{1}=\{{x}_{1},{x}_{3}\}$ and ${X}_{2}=\left\{{x}_{2}\right\}.$ Here, T causes ${x}_{1},$ and ${x}_{2}$ causes ${x}_{3},$ so that

$${M}_{12}=\left(\begin{array}{c}0\\ \ast \end{array}\right),\qquad {C}_{1}=\left(\begin{array}{c}\ast \\ 0\end{array}\right).$$

Note that ${M}_{12}^{\prime}D{C}_{1}=0$ for all diagonal D because T and ${X}_{2}$ cause disjoint subsets of ${X}_{1}$. This, together with ${C}_{2}=0$, implies that V is block diagonal and that (21) holds. Thus, ${x}_{i}$ is weakly exogenous, ${e}_{i}^{\prime}\alpha =0$, if