Article

A Joint Chow Test for Structural Instability

1 Department of Economics, University of Oxford & Institute of Economic Modelling & Nuffield College, Oxford OX1 1NF, UK
2 The World Bank, 1818 H Street NW, Washington DC 20433, USA
* Author to whom correspondence should be addressed.
Submission received: 19 December 2014 / Revised: 6 February 2015 / Accepted: 26 February 2015 / Published: 12 March 2015

Abstract

The classical Chow test for structural instability requires strictly exogenous regressors and a break-point specified in advance. In this paper, we consider two generalisations, the one-step recursive Chow test (based on the sequence of studentised recursive residuals) and its supremum counterpart, which relax these requirements. We use results on the strong consistency of regression estimators to show that the one-step test is appropriate for stationary, unit root or explosive processes modelled in the autoregressive distributed lags (ADL) framework. We then use results from extreme value theory to develop a new supremum version of the test, suitable for formal testing of structural instability with an unknown break-point. The test assumes the normality of errors and is intended to be used in situations where this can be either assumed or established empirically. Simulations show that the supremum test has desirable power properties, in particular against level shifts late in the sample and against outliers. An application to U.K. GDP data is given.
JEL classifications:
C22

1. Introduction

Identifying structural instability in models is of major concern to econometric practitioners. The Chow [1] tests are perhaps the most widely used for this purpose, but require strictly exogenous regressors and a break-point specified in advance. As such, a plethora of variants have been developed to meet different requirements. In this paper, we consider two generalisations: the one-step recursive Chow test, based on the sequence of studentised recursive forecast residuals; and its supremum counterpart. The pointwise test is frequently used and reported in applied work, while the supremum test is new. Whereas Chow assumes a classical regression framework, practitioners typically use the one-step test to evaluate dynamic models, e.g., [2]. Further, since a series of such tests is usually presented graphically to the modeller, multiple testing issues arise, making it difficult to determine how many point failures may be tolerated. These two issues motivate the analysis that follows. First, in Theorem 6, we show that the pointwise statistic has the correct asymptotic distribution under fairly general assumptions about the generating process, including lagged dependent variables and deterministic terms. Second, we take advantage of the almost sure convergence established in that theorem to construct a supremum version of the one-step test, applicable to detecting parameter change or an outlier at an unknown point in the sample. The supremum test offers several advantages useful to modellers: it is simple to compute and has a standard distribution under the null, which does not depend on the autoregressive parameter (even in the unit-root or explosive cases); it focuses attention on end-sample instability; and it is agnostic about the number of breaks, giving power against more complex forms of misspecification.
These advantages incur certain costs: the test is not invariant to the distribution of errors (even asymptotically); and other tests are more powerful against particular alternatives.
The pointwise one-step Chow test is essentially the “prediction interval” test described by Chow, but computed recursively over the sample (rather than at an a priori hypothesised change point). It first appears in PcGive Version 4.0 [3] as part of a suite of model misspecification diagnostics; a similar diagnostic graphic, the “one-step forecast test”, is provided in EViews ([4] p. 180). The idea of using residuals calculated recursively to test model misspecification dates to the landmark cumulated sum (CUSUM) and cumulated sum of squares (CUSUMSQ) tests [5,6], which are based on partial sums of (squared) recursive residuals and have since been generalised to models including lagged dependent variables [7,8,9]. Unlike these tests, the one-step Chow test does not consider partial sums, but the sequence of recursive residuals itself; in effect, testing one-step-ahead forecast failure at each time step. As the following analysis shows, this approach leads to a different type of asymptotics, with a residual sequence behaving like i.i.d. random variables, rather than a partial sum of residuals behaving like a Brownian motion.
Examining the residual sequence to check the model specification is, of course, well established. The residuals can be either OLS residuals or recursive residuals, see [6,10]. The recursive residuals have two advantages over the OLS residuals: first, under the normal linear model with fixed regressors, they are identically and independently normal; second, they have a natural interpretation (in a time series setting) as forecast errors. Ironically, in typical time series settings, where the forecast error interpretation is most useful, the independence of the residuals does not hold due to the presence of lagged dependent variables, see [11]. This may lead to difficulties drawing firm conclusions from plotted pointwise test sequences and, thus, motivates the second part of this paper, which considers a supremum test.
The supremum test considers the maximum of the pointwise one-step tests, appropriately normalised. It is intended to reflect structural instability anywhere in the sample (with the early part excluded to allow consistent estimation). It relates to work on tests for either structural breaks or outliers, both at a possibly unknown time.
Perron [12] divides structural break tests into those that do not explicitly model a break, those that model one break and those that model multiple breaks. Our test joins the first category, which includes the already mentioned CUSUM and CUSUMSQ tests. (As an aside, Perron notes that these tests can suffer from the non-monotonicity of power against some alternatives. The risk of this is much reduced by using the one-step Chow statistics, since all parameters are estimated on a growing sub-sample.) The modelled break category includes, most prominently, the Quandt-Andrews (respectively, [13,14]) supremum tests. These tests are complicated by a non-standard distribution (tabulated in [15]), but are nevertheless popular in practice, being implemented in several software packages. Our test is distinguished from these by not imposing any restrictions on the end-of-sample, so that end-of-sample instability may be detected. This feature is similar to [15], but a key distinction is that our test is agnostic about the number of breaks in the sample, a useful property in practice. It is also substantially simpler in implementation. Additionally, because the one-step tests behave like an i.i.d. process, the asymptotics differ from full-sample tests, like Quandt-Andrews, requiring the application of the extreme value theory of independent and weakly dependent sequences, rather than the suprema of random walks.
Seen specifically as an outlier test, the supremum Chow test falls squarely within the tradition of [16], which, however, considers an unknown outlier in a classical setting. Outliers in the Box-Jenkins paradigm have attracted substantial interest, see [17,18,19]. These authors take a full-sample approach with stepwise elimination of outliers. Although effective in many cases, there is a risk of smearing/masking effects when multiple outliers are present, which is reduced with the recursive test we present.
Surprisingly, the use of recursive residuals to detect outliers in time series data is relatively unexplored, although there is little doubt that they are used for this purpose in practice. Barnett and Lewis ([20] p. 330) comment that “[recursive residuals] would seem to have the potential for the study of outliers, although no major progress on this front is evident. There is a major difficulty in that the labelling of the observations is usually done at random, or in relation to some concomitant variable…”. This difficulty does not exist with time series, where there is a natural chronological labelling of observations. The section in the same book (at p. 396) on detecting outliers in time series is, nevertheless, notably brief, and recursive methods are not considered.

2. The Test Statistics

The one-step test applies to a linear regression:
$$y_t = \beta' x_t + \varepsilon_t, \qquad t = 1, \dots, T,$$
with $y_t$ scalar, $x_t$ a $k$-dimensional vector of regressors and the errors independently and identically distributed. For such a regression, we can define the sequence of least squares estimators calculated over progressively larger subsamples,
$$\hat{\beta}_t = \left( \sum_{s=1}^{t} x_s x_s' \right)^{-1} \sum_{s=1}^{t} x_s y_s, \qquad t = k, \dots, T,$$
along with the corresponding residual sums of squares
$$\mathrm{RSS}_t = \sum_{s=1}^{t} (\hat{\beta}_t' x_s - y_s)^2, \qquad t = k, \dots, T,$$
and recursive residual (or standardised one-step forecast error)
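$$\tilde{\varepsilon}_t = \left[ 1 + x_t' \left( \sum_{s=1}^{t-1} x_s x_s' \right)^{-1} x_t \right]^{-1/2} (y_t - \hat{\beta}_{t-1}' x_t), \qquad t = k+1, \dots, T. \tag{4}$$
The one-step Chow test statistic, $C_{1,t}^2$, is then defined as:
$$C_{1,t}^2 = \frac{(\mathrm{RSS}_t - \mathrm{RSS}_{t-1})(t - k - 1)}{\mathrm{RSS}_{t-1}}, \qquad t = k+1, \dots, T, \tag{5}$$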
and can be expressed as:
$$C_{1,t}^2 = \frac{\tilde{\varepsilon}_t^2 \, (t - k - 1)}{\mathrm{RSS}_{t-1}}.$$
Chow showed that in a classical Gaussian regression model, this statistic would have an exact $F(1, t-k-1)$ distribution. We first extend this result to show that, for a general class of Gaussian autoregressive distributed lag (ADL) processes, $C_{1,t}^2$ converges in distribution to a $\chi^2_1$ random variable, so that, asymptotically, the additional dependence does not matter. This result means that comparing the pointwise statistic against an $F(1, \cdot)$ or $\chi^2_1$ distribution (as is typically done) is appropriate in large samples. However, it still leaves unresolved the difficulty that this test is generally reported graphically to detect parameter change with an unknown change point. To formally treat the problem of multiple testing that occurs in evaluating many pointwise statistics over the entire sample, we introduce a new supremum test based on the test statistic:
$$\max_{g(T) \le t \le T} C_{1,t}^2,$$
where $g$ is an arbitrary function of $T$, such that $g(T) \to \infty$.
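The recursive definitions above translate directly into a loop over expanding subsamples. The following is a minimal sketch rather than the authors' implementation: it assumes a regression on an intercept and one regressor (so $k = 2$), and the helper names `rss_simple` and `one_step_chow` are hypothetical. It starts at $t = k + 2$, because with exactly $k$ observations the fit is perfect and $\mathrm{RSS}_{t-1}$ is zero.

```python
import random

def rss_simple(xs, ys):
    """Residual sum of squares from OLS of y on an intercept and x (k = 2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    return syy - sxy ** 2 / sxx

def one_step_chow(xs, ys, k=2):
    """Return {t: C^2_{1,t}} using the RSS form of the statistic.

    Starts at t = k + 2 so that RSS_{t-1} is strictly positive
    (with exactly k observations the fit is perfect).
    """
    stats = {}
    for t in range(k + 2, len(ys) + 1):
        rss_prev = rss_simple(xs[:t - 1], ys[:t - 1])
        rss_curr = rss_simple(xs[:t], ys[:t])
        stats[t] = (rss_curr - rss_prev) * (t - k - 1) / rss_prev
    return stats

# Illustrative data (an assumption of this sketch): a linear relation with
# Gaussian noise and a contaminated final observation.
rng = random.Random(0)
xs = [float(i) for i in range(1, 21)]
ys = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
ys[-1] += 10.0  # contaminate the final observation
stats = one_step_chow(xs, ys)
print(round(stats[20], 1))  # statistic at the contaminated observation
```

Recomputing each regression from scratch keeps the sketch short; an updating formula for the recursive residuals would be the natural choice for long samples.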
To put the test statistics and the asymptotic analysis into perspective, it is useful to review some related, but also somewhat different statistics. The literature on adaptive control, also called tracking, comes to mind, although it does not appear to be applied much in econometrics. It is concerned with tracking the sum of squares innovations. The aim is then to show that:
$$\frac{1}{T} \sum_{t=1}^{T} (y_t - \hat{\beta}' x_{t-1})^2 - \frac{1}{T} \sum_{t=1}^{T} \varepsilon_t^2$$
vanishes. A discussion for a non-explosive autoregressive setup without deterministic terms is given in [21,22]. Asymptotic distribution theory does not seem to be discussed. The reason that the tracking result does not extend to the explosive case is that the residuals are not normalised by the “hat” matrix, in contrast to the residuals in Equation (4). Normalisation by the “hat” matrix also gives excellent finite-sample properties; see Section 5.
The Chow statistic $C_{1,t}^2$ in Equation (5) also involves scaling by a residual variance estimator, so that the asymptotic distribution is free of nuisance parameters; see Theorem 6 below. More fundamentally, the present analysis is concerned with the Chow statistic for individual observations rather than sums, and it will therefore involve extreme value theory.
Another related test that is used in econometrics is the CUSUMSQ test based on the statistic:
$$\mathrm{CUSQ}_{t,T} = \sqrt{T} \left( \frac{\sum_{s=k}^{t} \tilde{\varepsilon}_s^2}{\sum_{s=k}^{T} \tilde{\varepsilon}_s^2} - \frac{t - k}{T - k} \right),$$
or a similar statistic based on least squares residual variance estimates instead of sums of squared recursive residuals. This test statistic is aimed at detecting non-constancy in the innovation variance rather than detecting individual outliers. Again, it includes normalisation by the “hat” matrix. Distribution theory is discussed in [7,8] for the stationary case and in [9] for general autoregressions, which are possibly explosive and with deterministic terms.
A third related test is Andrews’ sup-F test, see [13]. This is a test for structural breaks in the mean parameters for which the asymptotic theory only applies in the stationary case. The finite sample performances of the Chow test and the sup-F test are compared in Section 6.

3. Model and Assumptions

We consider the behaviour of the test statistic for ADL models with arbitrary deterministic terms, a class that includes by restriction many commonly-posited economic relationships, see ([23] Chapter 7). For the purpose of analysis, we assume that the true data generating model can be represented as a vector autoregression (VAR).
We observe a $p$-dimensional time series $X_{1-k}, \dots, X_0, X_1, \dots, X_T$. We model the series by partitioning $X_t$ as $(Y_t, Z_t')'$, where $Y_t$ is univariate and $Z_t$ is of dimension $p - 1$, and then consider the regression of $Y_t$ on the contemporaneous $Z_t$, lags of both $Y_t$ and $Z_t$ and a deterministic term $D_t$. That is,
$$Y_t = \rho' Z_t + \sum_{j=1}^{k} \alpha_j Y_{t-j} + \sum_{j=1}^{k} \beta_j' Z_{t-j} + \nu' D_{t-1} + \varepsilon_t, \qquad t = 1, \dots, T.$$
In order to specify the joint distribution of $X_t = (Y_t, Z_t')'$, we assume that $X_t$ follows the vector autoregression:
$$X_t = \sum_{j=1}^{k} A_j X_{t-j} + \mu D_{t-1} + \xi_t, \qquad t = 1, \dots, T, \tag{11}$$
with the deterministic term D t given by:
$$D_t = \mathbf{D} D_{t-1}.$$
The deterministic term $D_t$ follows the approach of [24,25] and may include, for example, a constant, a linear trend or periodic functions, such as seasonal dummies. The matrix $\mathbf{D}$ has characteristic roots on the unit circle. For example,
$$\mathbf{D} = \begin{pmatrix} 1 & 0 \\ 1 & -1 \end{pmatrix} \quad \text{and} \quad D_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$
will generate a constant and a biannual dummy. The term $D_t$ is assumed to have linearly-independent coordinates, formalised as follows.
Assumption 1. $|\operatorname{eigen}(\mathbf{D})| = 1$ and $\operatorname{rank}(D_1, \dots, D_{\dim \mathbf{D}}) = \dim \mathbf{D}$.
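The recursion for the deterministic term is easy to trace by hand or in code. The sketch below iterates it for the two-dimensional example, assuming the example matrix has rows $(1, 0)$ and $(1, -1)$ and the starting value is $(1, 0)'$; the helper names are hypothetical.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def deterministic_terms(D, D0, n):
    """Return [D_1, ..., D_n] generated by D_t = D D_{t-1}."""
    terms, current = [], list(D0)
    for _ in range(n):
        current = mat_vec(D, current)
        terms.append(current)
    return terms

# Assumed example values: first coordinate is a constant,
# second coordinate alternates between 1 and 0.
D = [[1, 0], [1, -1]]
D0 = [1, 0]
terms = deterministic_terms(D, D0, 4)
print(terms)  # [[1, 1], [1, 0], [1, 1], [1, 0]]
```

The first two terms, $(1, 1)'$ and $(1, 0)'$, are linearly independent, so the rank condition in Assumption 1 holds for this example.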
We assume the VAR innovations form a martingale difference sequence satisfying the assumption below. The requirement that the innovations have finite moments just beyond 16 stems from a problem with controlling unit root processes, see ([25] Remark 9.3). In the present analysis, this constraint emerges in Lemma 12 (i) and is transmitted via Lemma 13 (iv) to Lemma 16. If $\dim \mathbf{D} = 0$ and the geometric multiplicity of roots at unity equals their algebraic multiplicity (including $I(1)$, but excluding $I(2)$ processes), this could be improved to finite moments greater than four using the result of [26].
Assumption 2. $\xi_t$ is a martingale difference sequence with respect to the natural filtration $\mathcal{F}_t$, so $E(\xi_t \mid \mathcal{F}_{t-1}) = 0$. The initial values $X_0, \dots, X_{1-k}$ are $\mathcal{F}_0$-measurable and:
$$\sup_t E(\|\xi_t\|^{\alpha} \mid \mathcal{F}_{t-1}) < \infty \quad \text{a.s. for some } \alpha > 16,$$
$$E(\xi_t \xi_t' \mid \mathcal{F}_{t-1}) \overset{\text{a.s.}}{=} \Omega, \quad \text{where } \Omega \text{ is positive definite}.$$
This assumption also excludes the possibility that the innovations could be heteroscedastic, a common assumption in financial modelling (e.g., autoregressive conditional heteroscedastic, ARCH), but also an increasingly relevant property in macroeconomic work, particularly in light of the “Great Moderation” period [27] and subsequent period of the “Global Financial Crisis”. The assumption indicates that such heteroscedasticity should be modelled.
We permit nearly all possible values of the autoregressive parameters $A_j$ in Equation (11), excluding only the case of singular explosive roots, which can only arise for a VAR with $p \ge 2$ and multiple explosive roots; see [28] for a discussion. We can express the restriction in terms of the companion matrix:
$$B = \begin{pmatrix} (A_1, \dots, A_{k-1}) & A_k \\ I_{p(k-1)} & 0 \end{pmatrix}.$$
Assumption 3. The explosive roots of $B$ have geometric multiplicity of unity. That is, for all complex $\lambda$ with $|\lambda| > 1$, $\operatorname{rank}(B - \lambda I_{pk}) \ge pk - 1$.
Additionally, we require that the innovations in the ADL regression are martingale differences.
Assumption 4. Let $\mathcal{G}_t$ be the sigma field over $\mathcal{F}_t$ and $Z_t$. Then, $(\varepsilon_t, \mathcal{G}_t)$ is a martingale difference sequence, i.e., $E(\varepsilon_t \mid \mathcal{G}_{t-1}) = 0$.
Finally, the one-step statistic is such that a distributional assumption must be made in order to derive the limiting distribution of the statistic (since the statistic is an estimate of a single error term, we cannot take advantage of a central limit theorem). Similarly, since the analysis of the supremum statistic will rely on extreme value theory, we must impose distributional and independence assumptions on the ADL innovations $\varepsilon_t$, in order to uniquely determine the norming sequences applied in Lemma 9. We assume normality, which may result from joint normality in the underlying VAR process and is tested, in practice, under the above assumptions, see [29].
Assumption 5. $\varepsilon_t \overset{\text{iid}}{\sim} N(0, \sigma^2)$.

4. Main Results

We must briefly examine the decomposition of the process used in the proofs in order to elucidate the first main result in the explosive case (in the non-explosive case, this decomposition becomes trivial). A two-way decomposition allows us to express separately certain terms that arise in connection with the explosive component of the process. Group the regressors by defining:
$$S_{t-1} = (Y_{t-1}, Z_{t-1}', \dots, Y_{t-k}, Z_{t-k}', D_{t-1}')',$$
and then write Equation (11) in companion form, so that:
$$S_t = \mathbf{S} S_{t-1} + (\xi_t', 0)'.$$
Then, there exists a regular real matrix $M$ to block diagonalize $\mathbf{S}$ (see the elaboration in Section 3 of [25]), so that the process can be decomposed into non-explosive and explosive components, $R_t$ and $W_t$, respectively. We have:
$$M S_t = (M \mathbf{S} M^{-1}) M S_{t-1} + M (\xi_t', 0)',$$
$$\begin{pmatrix} R_t \\ W_t \end{pmatrix} = \begin{pmatrix} \mathbf{R} & 0 \\ 0 & \mathbf{W} \end{pmatrix} \begin{pmatrix} R_{t-1} \\ W_{t-1} \end{pmatrix} + \begin{pmatrix} e_{R,t} \\ e_{W,t} \end{pmatrix}, \tag{16}$$
with $\mathbf{R}$ and $\mathbf{W}$ having eigenvalues inside or on, and outside, the unit circle, respectively.
The first theorem states that the test statistic is almost surely close to a related process in the innovations, $q_t^2$, under Assumptions 1 to 4. This result does not require the normality of Assumption 5.
Theorem 6. Under Assumptions 1, 2, 3 and 4,
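$$C_{1,t}^2 - (q_t / \sigma)^2 \overset{\text{a.s.}}{\to} 0 \quad \text{as } t \to \infty,$$
where:
$$q_t = \frac{\varepsilon_t - \sum_{s=1}^{\infty} \varepsilon_{t-s} W' (\mathbf{W}^{-s})' F_W^{-1} W}{(1 + W' F_W^{-1} W)^{1/2}}, \tag{18}$$
and $\mathbf{W}$ is as in Equation (16), and as in [25] (Corollaries 5.3 and 7.2), $W = W_0 + \sum_{t=1}^{\infty} \mathbf{W}^{-t} e_{W,t}$ and $F_W = \sum_{t=1}^{\infty} \mathbf{W}^{-t} W W' (\mathbf{W}^{-t})'$, with $F_W$ almost surely positive definite.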
Having established pointwise convergence almost surely, we use an argument based on Egorov’s theorem to establish the convergence of the supremum of a subsequence. Both the subsequence itself and the lead-in period must grow without bound, to allow the regression estimates to converge.
Lemma 7. Suppose $C_{1,t}^2 - (q_t / \sigma)^2 \overset{\text{a.s.}}{\to} 0$ as $t \to \infty$. Then:
$$\sup_{g(T) < t \le T} \left| C_{1,t}^2 - (q_t / \sigma)^2 \right| \overset{p}{\to} 0 \quad \text{as } T, g(T) \to \infty,$$
where $g(T)$ is an arbitrary function of $T$, such that $g(T) \to \infty$.
Now, if an appropriately normalised expression in the maximum over $q_t$ can be shown to converge in distribution, then so will the supremum statistic, with the same normalisation, by asymptotic equivalence. We show that, under the assumption of independent and identical Gaussian innovations, $\max_{1 \le s \le t} q_s$, appropriately normalised, does indeed converge to the Gumbel extremal distribution (as $t \to \infty$), which has distribution function:
$$\Pr(\Lambda < x) = \exp[-\exp(-x)], \qquad x \in \mathbb{R}. \tag{20}$$
A useful property of the Gumbel distribution is the following simple monotonically decreasing transformation to a $\chi^2$ variable, allowing standard distributions to be used:
$$\Lambda \sim \text{Gumbel} \quad \text{iff} \quad 2 e^{-\Lambda} \sim \chi^2_2. \tag{21}$$
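The equivalence can be checked directly from the Gumbel distribution function $\exp[-\exp(-x)]$. As a short worked derivation, write $V = 2e^{-\Lambda}$ (the symbol $V$ is introduced here only for illustration). Then, for $v > 0$,

$$\Pr(V > v) = \Pr\left( \Lambda < -\log(v/2) \right) = \exp\left[ -\exp\left( \log(v/2) \right) \right] = \exp(-v/2),$$

which is the survival function of a $\chi^2_2$ variable. Because the map from $\Lambda$ to $V$ is decreasing, large values of $\Lambda$ correspond to small values of $V$.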
In showing the above convergence, we rely on Theorem 1 of Deo [30], and its corollary, showing that the extremal distribution of the absolute values of a Gaussian sequence is the same in the stationary dependent and independent cases. However, Deo’s Lemma 1 gives an incorrect statement of the norming sequences. Here, we state the correct sequences, adopting the notation of Deo (proof in Section A.6).
Lemma 8. Let $\{X_n\}$ be independent Gaussian random variables with mean zero and variance one. Let $Z_n = \max_{1 \le j \le n} |X_j|$. Then, $a_n (Z_n - b_n)$ converges in distribution to $\Lambda$, where $a_n = (2 \log n)^{1/2}$ and $b_n = (2 \log n)^{1/2} - (8 \log n)^{-1/2} (\log \log n + \log \pi)$.
The original gives $b_n = (2 \log n)^{1/2} - (8 \log n)^{-1/2} (\log \log n + 4\pi - 4)$.
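A quick numerical sanity check of these norming sequences is possible without simulation. For the Gumbel limit to hold, the expected number of exceedances $n \Pr(|X| > b_n + x / a_n)$ should approach $e^{-x}$ as $n$ grows. The sketch below, with hypothetical function names, evaluates this quantity using the constants with $\log \pi$; convergence is slow (logarithmic), so only rough agreement should be expected at any feasible $n$.

```python
import math

def two_sided_normal_tail(z):
    """P(|X| > z) for X ~ N(0, 1)."""
    return math.erfc(z / math.sqrt(2.0))

def norming_constants(n):
    """a_n and b_n with the log(pi) term, as in the corrected statement."""
    log_n = math.log(n)
    a_n = math.sqrt(2.0 * log_n)
    b_n = a_n - (math.log(log_n) + math.log(math.pi)) / math.sqrt(8.0 * log_n)
    return a_n, b_n

def expected_exceedances(n, x):
    """n * P(|X| > b_n + x / a_n); should be close to exp(-x) for large n."""
    a_n, b_n = norming_constants(n)
    return n * two_sided_normal_tail(b_n + x / a_n)

for x in (0.0, 1.0):
    print(x, expected_exceedances(10**6, x), math.exp(-x))
```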
Deo’s result can then be applied to $q_t$ defined in Equation (18).
Lemma 9. Under Assumption 5,
$$q_t / \sigma \sim N(0, 1), \quad \text{and}$$
$$a_t \left( \max_{1 \le s \le t} q_s^2 - b_t \right) \overset{d}{\to} \Lambda,$$
where:
$$a_t = 1/2 \quad \text{and} \quad b_t = \log t^2 - \log \log t - \log \pi,$$
and Λ is a random variable distributed according to the Gumbel (Type 1) law.
Combining these lemmas gives our main result: with independent and identically distributed Gaussian innovations, an appropriate normalisation of the supremum one-step Chow test converges in distribution to the Gumbel extremal distribution.
Theorem 10. Under Assumptions 1, 2, 3, 4 and 5, with some $g(T) \to \infty$,
$$SC_T^2 = \frac{1}{2} \left\{ \max_{g(T) < t \le T} C_{1,t}^2 - d_{T - g(T)} \right\} \overset{d}{\to} \Lambda \quad \text{as } T \to \infty,$$
where $C_{1,t}^2$ is the one-step Chow statistic defined in Equation (5) and:
$$d_{T - g(T)} = 2 \log [T - g(T)] - \log \log [T - g(T)] - \log \pi,$$
and $\Lambda$ is a random variable distributed according to the Gumbel distribution in Equation (20).
As a simple corollary, we can transform the test using Equation (21), so that it may be compared against a more readily-available distribution.
Corollary 11. Under the same assumptions, $2 \exp(-SC_T^2) \overset{d}{\to} \chi^2_2$. A test based on this result should reject for small values of the statistic.
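To illustrate how the asymptotic test might be evaluated in practice, the sketch below turns a maximum of one-step statistics into a p-value in two equivalent ways: from the Gumbel right tail and from the $\chi^2_2$ left tail. It assumes the centring $d_n = 2 \log n - \log \log n - \log \pi$ with $n = T - g(T)$, which is the Lemma 9 sequence $b_t$ evaluated at $t = n$, and a scaling of one half; the function names and the input values are hypothetical.

```python
import math

def centring(n):
    """Assumed centring d_n = 2 log n - log log n - log pi."""
    return 2.0 * math.log(n) - math.log(math.log(n)) - math.log(math.pi)

def sup_chow(max_stat, n):
    """Normalised supremum statistic: half the centred maximum."""
    return 0.5 * (max_stat - centring(n))

def gumbel_pvalue(sc):
    """Right-tail probability P(Lambda > sc) under the Gumbel law."""
    return 1.0 - math.exp(-math.exp(-sc))

def chi2_2_pvalue(sc):
    """Left-tail probability of 2*exp(-sc) under chi-squared(2)."""
    v = 2.0 * math.exp(-sc)
    return 1.0 - math.exp(-v / 2.0)

# Hypothetical inputs: largest one-step statistic 12.0 over n = 90 time points.
sc = sup_chow(12.0, 90)
print(gumbel_pvalue(sc), chi2_2_pvalue(sc))  # the two p-values coincide
```

The two p-values agree by construction, which is the content of the corollary: rejecting for large values of the Gumbel statistic is the same as rejecting for small values of the transformed statistic.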

5. Finite-Sample Corrections

In practice, we find by simulation that the test as specified above is over-sized in small samples. To mitigate this, we suggest two corrections. For the first correction, we observe that the one-step statistics appear to be distributed close to $F(1, t-k-1)$ (as indeed they are, exactly, in the classical case) and so use the following transformation to bring the statistics closer to the asymptotic chi-squared distribution:
$$C_{1,t}^{2*} = G^{-1}[F(C_{1,t}^2)]$$
where $F(\cdot)$ and $G(\cdot)$ are the $F(1, t-k-1)$ and $\chi^2_1$ distribution functions, respectively. This first correction results in a test that tends to under-correct, largely a result of relatively slow convergence to the limiting Gumbel distribution. We find that the test performs better if simply compared with the finite maximal distribution, assuming the independence and identical distribution of the test statistics (the first assumption holding only in the limit and in the absence of an explosive component and the second holding only in the limit). That is, we approximate the distribution of the maximum, $\max_{g(T) < t \le T} C_{1,t}^{2*}$, by:
$$\Pr\left( \max_{g(T) < t \le T} C_{1,t}^{2*} \le x \right) \approx \Pr\left( \max_{g(T) < t \le T} \varepsilon_t^2 \le x \sigma^2 \right) = [G(x)]^{T - g(T)}. \tag{28}$$
This forms the basis of the finite adjusted sup-Chow test ($SC^{2*}$), with rejection in the right tail. Note that in this case, no centring or scaling is required, because the null distribution itself depends on $T$.
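The finite maximal distribution is simple to evaluate, because the $\chi^2_1$ distribution function can be written with the error function as $G(x) = \operatorname{erf}(\sqrt{x/2})$. The sketch below computes a p-value and a critical value under that approximation. It takes the already transformed statistics as given (the $F$-to-$\chi^2$ transformation is not implemented here), and the function names are hypothetical.

```python
import math

def chi2_1_cdf(x):
    """Distribution function G of a chi-squared(1) variable."""
    return math.erf(math.sqrt(x / 2.0)) if x > 0 else 0.0

def max_pvalue(max_stat, n):
    """P(max of n independent chi-squared(1) draws > max_stat)."""
    return 1.0 - chi2_1_cdf(max_stat) ** n

def critical_value(alpha, n, lo=0.0, hi=100.0, tol=1e-10):
    """Solve G(x)^n = 1 - alpha for x by bisection."""
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if chi2_1_cdf(mid) ** n < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical setting: n = T - g(T) = 20 statistics at the 5% level.
x_crit = critical_value(0.05, 20)
print(x_crit, max_pvalue(x_crit, 20))
```

Bisection is used only to keep the sketch dependency-free; any root finder or a closed-form inverse of $G$ would do equally well.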

6. Simulation Study

We present the results of four simulation experiments done in Ox [31]: size; distributional sensitivity; power against mean shifts; and power against outliers. All of the simulations are done first for a first order autoregression and then for an autoregressive distributed lag model.

6.1. Autoregressive Data-Generating Process

Consider the following data-generating process:
$$x_t = \alpha x_{t-1} + \varepsilon_t, \qquad t = 1, \dots, T, \qquad x_0 = 0. \tag{29}$$
Where not otherwise stated, we set $\alpha = 0$ and $\varepsilon_t \overset{\text{iid}}{\sim} N(0, 1)$. The number of Monte Carlo repetitions and the implied Monte Carlo standard error, MCSE, are indicated in table captions.
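For readers who wish to replicate the design, the data-generating process and the standard error of a simulated rejection frequency can be sketched as follows. The original experiments were run in Ox; this Python version is only an illustration, the function names are hypothetical, and the standard error uses the usual binomial formula $\sqrt{p(1-p)/N}$ for a rejection probability $p$ and $N$ repetitions.

```python
import math
import random

def simulate_ar1(alpha, T, rng):
    """Draw x_1, ..., x_T from x_t = alpha * x_{t-1} + eps_t with x_0 = 0."""
    x_prev, path = 0.0, []
    for _ in range(T):
        x_prev = alpha * x_prev + rng.gauss(0.0, 1.0)
        path.append(x_prev)
    return path

def mcse_percent(p, reps):
    """Monte Carlo standard error, in percentage points, of a rejection frequency p."""
    return 100.0 * math.sqrt(p * (1.0 - p) / reps)

path = simulate_ar1(0.5, 50, random.Random(1))
print(len(path), mcse_percent(0.05, 200_000))
```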
The regression model is that of a first-order autoregression. It includes an intercept unless otherwise stated. The five tests computed and presented in the tables are: the asymptotic sup-Chow test ($SC^2$); the corrected sup-Chow test ($SC^{2*}$); the [13] sup-F test ($\sup F$); an outlier test based on the OLS residuals ($\sup t^2$); and the [32] $E_p$ test for normality ($\Phi$). The nominal size is 5% unless otherwise stated.
The two sup-Chow tests are described in Theorem 10 and Equation (28), respectively; for the function $g(T)$, we use $T^{1/2}$. The sup-F test is a linear regression form of Andrews’ sup-W (Wald) test with 15% trimming, as used for the simulations in [33] and implemented in EViews 7 by the command ubreak. It is the maximum of the Chow F-tests calculated over break points $0.15T \le \lambda \le 0.85T$, such that for each break point $\lambda$, under the alternative, the model is estimated separately for each subsample ($1, \dots, \lambda$ and $\lambda + 1, \dots, T$), whereas under the hypothesis, the model is estimated for the full sample. The null distribution is given by asymptotic approximation to a non-standard distribution with simulated critical values given in [15]. The sup-$t^2$ test examines the maximum of the squared full-sample OLS residuals, externally studentised as in ([34] s. 2.2.9); that is, residual $t$ is normalised using an estimate of the error variance that excludes residual $t$ itself. These squared statistics are $F_{1, t-k-1}$ distributed under normality of the errors, but not independent; hence, the use of the Bonferroni inequality to find significance levels is recommended by [34]. We find in simulation that, despite dependence, the exact maximal distribution of $T$ independent $F_{1, t-k-1}$ random variables is a reasonable choice for the sample sizes and processes we consider, except those that are near-unit-root. Finally, in experiments where we wish to evaluate the performance of the sup-Chow test conditional on residuals having satisfied a normality test, we use the [32] $E_p$ test of the OLS residuals.
In the first experiment (Table 1), we vary the autoregressive parameter through the stationary, unit-root and explosive regions and consider the effect of either including or excluding an intercept from the model. As noted above, the $SC^2$ test is uniformly oversized. The $SC^{2*}$ test is correctly sized and approximately similar, with simulated size varying very little across the parameter space. There is some tendency towards inflated sizes under near-unit-root processes when an intercept is included in the model, but the extent of this is quite limited (7% simulated size). The key consequence of this result is that it is not necessary to know a priori where the autoregressive parameter lies to effectively apply the $SC^{2*}$ test, avoiding a potential circularity in model construction.
In simulations that are not reported here, we also investigated the $\sup F$ test. This test is not valid in the non-stationary case. Thus, the same patterns were seen, albeit with a larger effect. The simulated size is as high as 44% in the unit root case.
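Table 1. Simulated rejection frequency (%) for $SC^2$ and $SC^{2*}$ under the Gaussian autoregression in Equation (29), by autoregressive coefficient ($\alpha$). 200,000 repetitions, MCSE ≤ 0.1.

| Nominal size | Model | T | Test | −1.03 | −1.00 | −0.50 | 0.00 | 0.50 | 0.90 | 1.00 | 1.03 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 5% | Intercept included (M1) | 25 | $SC^2$ | 14.52 | 14.44 | 13.92 | 14.40 | 15.82 | 19.28 | 19.86 | 20.21 |
| 5% | Intercept included (M1) | 25 | $SC^{2*}$ | 5.30 | 5.26 | 5.01 | 5.24 | 5.78 | 7.28 | 7.59 | 7.75 |
| 5% | Intercept included (M1) | 50 | $SC^2$ | 12.80 | 12.72 | 12.32 | 12.60 | 13.50 | 16.13 | 16.97 | 17.43 |
| 5% | Intercept included (M1) | 50 | $SC^{2*}$ | 5.17 | 5.15 | 4.92 | 5.05 | 5.38 | 6.52 | 7.00 | 7.27 |
| 5% | Intercept included (M1) | 100 | $SC^2$ | 10.43 | 10.41 | 10.15 | 10.36 | 10.73 | 12.34 | 13.27 | 13.85 |
| 5% | Intercept included (M1) | 100 | $SC^{2*}$ | 5.09 | 5.05 | 4.95 | 5.00 | 5.08 | 5.82 | 6.38 | 6.74 |
| 5% | No intercept (M2) | 25 | $SC^2$ | 15.28 | 15.23 | 14.51 | 14.49 | 14.62 | 15.11 | 15.33 | 15.39 |
| 5% | No intercept (M2) | 25 | $SC^{2*}$ | 5.49 | 5.46 | 5.20 | 5.19 | 5.23 | 5.44 | 5.49 | 5.53 |
| 5% | No intercept (M2) | 50 | $SC^2$ | 13.17 | 13.12 | 12.72 | 12.72 | 12.71 | 13.01 | 13.20 | 13.21 |
| 5% | No intercept (M2) | 50 | $SC^{2*}$ | 5.33 | 5.29 | 5.06 | 5.03 | 5.09 | 5.20 | 5.26 | 5.27 |
| 5% | No intercept (M2) | 100 | $SC^2$ | 10.64 | 10.60 | 10.27 | 10.31 | 10.34 | 10.42 | 10.62 | 10.63 |
| 5% | No intercept (M2) | 100 | $SC^{2*}$ | 5.17 | 5.12 | 5.02 | 4.99 | 5.01 | 5.03 | 5.13 | 5.16 |
| 1% | Intercept included (M1) | 25 | $SC^2$ | 6.30 | 6.29 | 5.98 | 6.24 | 7.06 | 8.87 | 9.16 | 9.33 |
| 1% | Intercept included (M1) | 25 | $SC^{2*}$ | 1.09 | 1.10 | 1.04 | 1.07 | 1.24 | 1.64 | 1.75 | 1.79 |
| 1% | Intercept included (M1) | 50 | $SC^2$ | 4.77 | 4.77 | 4.56 | 4.70 | 5.16 | 6.44 | 6.83 | 7.04 |
| 1% | Intercept included (M1) | 50 | $SC^{2*}$ | 1.08 | 1.08 | 1.02 | 1.06 | 1.15 | 1.45 | 1.57 | 1.60 |
| 1% | Intercept included (M1) | 100 | $SC^2$ | 3.36 | 3.36 | 3.24 | 3.31 | 3.47 | 4.22 | 4.59 | 4.74 |
| 1% | Intercept included (M1) | 100 | $SC^{2*}$ | 1.05 | 1.05 | 1.02 | 1.02 | 1.04 | 1.21 | 1.36 | 1.43 |
| 1% | No intercept (M2) | 25 | $SC^2$ | 6.71 | 6.69 | 6.25 | 6.22 | 6.34 | 6.59 | 6.71 | 6.73 |
| 1% | No intercept (M2) | 25 | $SC^{2*}$ | 1.16 | 1.15 | 1.06 | 1.04 | 1.06 | 1.12 | 1.18 | 1.19 |
| 1% | No intercept (M2) | 50 | $SC^2$ | 4.98 | 4.97 | 4.70 | 4.69 | 4.78 | 4.90 | 4.95 | 4.95 |
| 1% | No intercept (M2) | 50 | $SC^{2*}$ | 1.12 | 1.11 | 1.04 | 1.05 | 1.05 | 1.07 | 1.10 | 1.09 |
| 1% | No intercept (M2) | 100 | $SC^2$ | 3.41 | 3.37 | 3.24 | 3.24 | 3.25 | 3.33 | 3.39 | 3.40 |
| 1% | No intercept (M2) | 100 | $SC^{2*}$ | 1.06 | 1.05 | 1.02 | 1.01 | 1.00 | 1.00 | 1.03 | 1.04 |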
The second experiment evaluates the sensitivity of the $SC^{2*}$ test to failures of Assumption 5, in particular the non-normality of the errors. Table 2 presents simulated sizes for both the $SC^{2*}$ and $\sup F$ tests under a range of error distributions. The former is very sensitive to departures from normality, while the latter is not. In the second part of the table we consider a further scenario, in which a model builder runs the structural instability tests only if a test for normal residuals is not rejected. This yields three additional tests, the normality test $\Phi$ and the $SC^{2*} \mid \Phi$ and $\sup F \mid \Phi$ tests, each conditional on the normality hypothesis having not been rejected. We also consider joint tests $SC^{2*} + \Phi$ and $\sup F + \Phi$ that first test normality and then test for a break if normality cannot be rejected. As the table illustrates, some size distortion remains, but the inflation of the unconditional test is largely controlled in the conditional case. As noted in Section 8 below, we recommend using the test in this way if the normality of the errors cannot be safely assumed.
Table 2. Simulated rejection frequency (%) for $SC^{2*}$ and $\sup F$, possibly combined with normality test $\Phi$, under autoregression in Equation (29) with various error distributions. 50,000 repetitions, MCSE ≤ 0.25.

| Tests | T | Test | $\Phi$ | $t_{50}$ | $t_{10}$ | $t_5$ | $\chi^2_{3,\text{cent.}}$ |
|---|---|---|---|---|---|---|---|
| Unconditional | 50 | $SC^{2*}$ | 5.0 | 6.6 | 15.0 | 28.8 | 40.9 |
| Unconditional | 50 | $\sup F$ | 6.0 | 6.0 | 6.0 | 5.9 | 6.3 |
| Unconditional | 100 | $SC^{2*}$ | 5.0 | 7.4 | 22.2 | 45.0 | 59.9 |
| Unconditional | 100 | $\sup F$ | 4.9 | 5.0 | 4.9 | 4.7 | 4.9 |
| Joint | 50 | $\Phi$ | 4.9 | 6.7 | 19.0 | 41.0 | 95.1 |
| Joint | 50 | $SC^{2*} \mid \Phi$ | 3.4 | 3.9 | 6.2 | 8.0 | * 7.4 |
| Joint | 50 | $\sup F \mid \Phi$ | 6.0 | 6.0 | 6.1 | 6.0 | * 6.7 |
| Joint | 50 | $SC^{2*} + \Phi$ | 8.1 | 10.4 | 24.0 | 45.8 | * 95.5 |
| Joint | 50 | $\sup F + \Phi$ | 10.6 | 12.3 | 23.9 | 44.6 | * 95.4 |
| Joint | 100 | $\Phi$ | 4.8 | 7.6 | 28.7 | 63.0 | 100.0 |
| Joint | 100 | $SC^{2*} \mid \Phi$ | 3.3 | 4.2 | 7.2 | 8.6 | |
| Joint | 100 | $\sup F \mid \Phi$ | 5.0 | 5.0 | 5.0 | 4.9 | |
| Joint | 100 | $SC^{2*} + \Phi$ | 8.0 | 11.5 | 33.9 | 66.2 | * 100.0 |
| Joint | 100 | $\sup F + \Phi$ | 9.5 | 12.2 | 32.3 | 64.8 | * 100.0 |
The third experiment considers the power of the tests against a single shift in the mean level of the process. The data generating process is:
$$x_t = \gamma I_{(t > \tau)} + \varepsilon_t, \qquad t = 1, \dots, T, \qquad x_0 = 0, \tag{30}$$
and we allow $\gamma$ and $\tau$ to vary as presented. The regression model remains a first-order autoregression with an intercept. The level shift is therefore not modelled. Table 3 shows simulated rejection frequencies for the unconditional tests as in the previous experiment. We note that the $\sup F$ test performs well for a break at mid-sample, but is outperformed by the $SC^{2*}$ test for breaks occurring near the end of the sample. We also consider conditional tests as in the previous experiment. There are two main observations: firstly, the normality test is increasingly likely to reject as the break magnitude becomes large; but secondly, the $SC^{2*}$ test still has power (attenuated by around one-half) to detect the break in this case.
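Table 3. Simulated rejection frequency (%) for $SC^{2*}$ and $\sup F$, possibly combined with normality test $\Phi$, under the process in Equation (30) with a break of magnitude $\gamma$ at time $\tau$. 50,000 repetitions, MCSE ≤ 0.5.

| Tests | T | Test | $\gamma = 0.0$ | $\tau = 0.5T$, $\gamma = 2.0$ | $\tau = 0.5T$, $\gamma = 4.0$ | $\tau = T-2$, $\gamma = 2.0$ | $\tau = T-2$, $\gamma = 4.0$ | $\tau = T-1$, $\gamma = 2.0$ | $\tau = T-1$, $\gamma = 4.0$ |
|---|---|---|---|---|---|---|---|---|---|
| Unconditional | 25 | $SC^{2*}$ | 5.5 | 13.7 | 51.2 | 21.0 | 78.7 | 16.0 | 70.3 |
| Unconditional | 25 | $\sup F$ | 10.3 | 90.4 | 99.9 | 19.9 | 44.8 | 14.1 | 28.1 |
| Unconditional | 50 | $SC^{2*}$ | 5.2 | 17.1 | 67.6 | 18.6 | 83.2 | 13.3 | 69.9 |
| Unconditional | 50 | $\sup F$ | 6.0 | 99.8 | 100.0 | 10.2 | 34.5 | 7.1 | 11.5 |
| Unconditional | 100 | $SC^{2*}$ | 5.1 | 20.0 | 77.9 | 16.1 | 84.2 | 11.5 | 67.6 |
| Unconditional | 100 | $\sup F$ | 4.9 | 100.0 | 100.0 | 7.2 | 31.2 | 5.4 | 7.4 |
| Joint | 25 | $\Phi$ | 5.1 | 4.3 | 15.6 | 9.6 | 37.9 | 9.9 | 53.9 |
| Joint | 25 | $SC^{2*} \mid \Phi$ | 3.8 | 12.5 | 45.3 | 14.8 | 66.7 | 9.4 | 38.3 |
| Joint | 25 | $\sup F \mid \Phi$ | 10.2 | 90.4 | 99.9 | 19.6 | 52.1 | 13.1 | 22.6 |
| Joint | 25 | $SC^{2*} + \Phi$ | 8.7 | 16.2 | 53.8 | 22.9 | 79.4 | 18.4 | 71.5 |
| Joint | 25 | $\sup F + \Phi$ | 14.8 | 90.8 | 99.9 | 27.3 | 70.3 | 21.7 | 64.3 |
| Joint | 50 | $\Phi$ | 4.8 | 4.4 | 17.0 | 11.1 | 61.9 | 9.5 | 58.6 |
| Joint | 50 | $SC^{2*} \mid \Phi$ | 3.5 | 15.7 | 62.6 | 11.3 | 58.6 | 7.2 | 31.1 |
| Joint | 50 | $\sup F \mid \Phi$ | 6.0 | 99.8 | 100.0 | 9.8 | 33.9 | 6.9 | 9.0 |
| Joint | 50 | $SC^{2*} + \Phi$ | 8.2 | 19.4 | 69.0 | 21.1 | 84.2 | 16.1 | 71.5 |
| Joint | 50 | $\sup F + \Phi$ | 10.6 | 99.9 | 100.0 | 19.8 | 74.8 | 15.8 | 62.3 |
| Joint | 100 | $\Phi$ | 4.7 | 4.2 | 15.6 | 11.2 | 74.7 | 8.5 | 57.4 |
| Joint | 100 | $SC^{2*} \mid \Phi$ | 3.4 | 18.7 | 74.7 | 8.9 | 43.5 | 6.4 | 28.4 |
| Joint | 100 | $\sup F \mid \Phi$ | 4.9 | 100.0 | 100.0 | 6.8 | 19.3 | 5.4 | 6.3 |
| Joint | 100 | $SC^{2*} + \Phi$ | 8.0 | 22.1 | 78.7 | 19.1 | 85.7 | 14.4 | 69.5 |
| Joint | 100 | $\sup F + \Phi$ | 9.4 | 100.0 | 100.0 | 17.2 | 79.6 | 13.5 | 60.0 |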
The fourth experiment (Table 4) considers the power of the tests against a single innovation outlier at the process mid-point. The data generating process is:
$$x_t = \alpha x_{t-1} + \delta I_{t = T/2 + 1} + \varepsilon_t, \quad t = 1, \dots, T, \quad x_0 = 0, \tag{31}$$
and we allow α and δ to vary. Both tests presented have similar power in most circumstances, with an outlier larger than three times the error standard deviation being detected with useful frequency. The OLS-based $\sup t^2$ test has slightly better power than the Chow test. The conditional evaluations show that both tests retain power in situations where the normality test is not rejected.
Table 4. Simulated rejection frequency for $SC^{*2}$ and $\sup F$, possibly combined with normality test Φ, under the process in Equation (31) with a break of magnitude δ. 50,000 repetitions, MCSE ≤ 0.5.
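Columns give the outlier magnitude δ.

| α | T | Test | δ = 0.0 | δ = 1.0 | δ = 2.0 | δ = 3.0 | δ = 4.0 | δ = 5.0 |
|---|---|---|---|---|---|---|---|---|
| Unconditional tests | | | | | | | | |
| 0.5 | 50 | $SC^{*2}$ | 5.5 | 5.8 | 10.6 | 28.7 | 58.7 | 84.9 |
| | | $\sup t^2$ | 4.7 | 5.0 | 10.5 | 31.8 | 66.2 | 90.7 |
| | 100 | $SC^{*2}$ | 5.2 | 5.5 | 9.8 | 28.1 | 61.1 | 87.9 |
| | | $\sup t^2$ | 4.8 | 5.1 | 9.8 | 29.9 | 65.1 | 90.8 |
| 0.9 | 50 | $SC^{*2}$ | 6.6 | 7.0 | 12.0 | 30.2 | 59.7 | 85.3 |
| | | $\sup t^2$ | 4.7 | 4.9 | 10.3 | 31.3 | 65.5 | 90.3 |
| | 100 | $SC^{*2}$ | 5.9 | 6.3 | 10.8 | 28.9 | 61.4 | 87.9 |
| | | $\sup t^2$ | 4.9 | 5.1 | 9.8 | 29.7 | 64.9 | 90.6 |
| Joint tests | | | | | | | | |
| 0.5 | 50 | Φ | 4.9 | 5.0 | 9.6 | 27.7 | 59.3 | 86.6 |
| | | $SC^{*2} \mid \Phi$ | 3.8 | 4.0 | 5.8 | 10.8 | 18.0 | 25.6 |
| | | $\sup t^2 \mid \Phi$ | 2.2 | 2.4 | 4.0 | 10.3 | 22.7 | 36.8 |
| | | $SC^{*2} + \Phi$ | 8.5 | 8.8 | 14.8 | 35.5 | 66.6 | 90.0 |
| | | $\sup t^2 + \Phi$ | 7.0 | 7.3 | 13.2 | 35.2 | 68.5 | 91.5 |
| | 100 | Φ | 4.8 | 5.0 | 8.7 | 25.0 | 57.1 | 86.0 |
| | | $SC^{*2} \mid \Phi$ | 3.5 | 3.7 | 5.2 | 11.0 | 20.9 | 32.0 |
| | | $\sup t^2 \mid \Phi$ | 2.7 | 2.8 | 4.4 | 10.9 | 24.3 | 40.0 |
| | | $SC^{*2} + \Phi$ | 8.1 | 8.5 | 13.4 | 33.2 | 66.1 | 90.5 |
| | | $\sup t^2 + \Phi$ | 7.3 | 7.7 | 12.7 | 33.2 | 67.5 | 91.6 |
| 0.9 | 50 | Φ | 4.9 | 5.1 | 9.4 | 27.1 | 58.6 | 85.9 |
| | | $SC^{*2} \mid \Phi$ | 4.9 | 5.2 | 7.3 | 13.1 | 21.6 | 30.6 |
| | | $\sup t^2 \mid \Phi$ | 2.2 | 2.3 | 4.0 | 10.3 | 22.6 | 37.5 |
| | | $SC^{*2} + \Phi$ | 9.5 | 10.0 | 16.0 | 36.6 | 67.5 | 90.2 |
| | | $\sup t^2 + \Phi$ | 6.9 | 7.2 | 13.0 | 34.6 | 68.0 | 91.2 |
| | 100 | Φ | 4.7 | 5.0 | 8.6 | 24.7 | 56.8 | 85.8 |
| | | $SC^{*2} \mid \Phi$ | 4.2 | 4.4 | 6.2 | 12.2 | 22.2 | 33.6 |
| | | $\sup t^2 \mid \Phi$ | 2.7 | 2.8 | 4.4 | 10.9 | 24.2 | 39.5 |
| | | $SC^{*2} + \Phi$ | 8.7 | 9.2 | 14.3 | 33.8 | 66.4 | 90.6 |
| | | $\sup t^2 + \Phi$ | 7.3 | 7.6 | 12.6 | 32.9 | 67.2 | 91.4 |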

6.2. Autoregressive Distributed Lag Data-Generating Process

We consider a bivariate data-generating process, written in triangular equilibrium correction form as
$$\Delta y_t = \tfrac{1}{4} \Delta z_t - \tfrac{1}{4} (y_{t-1} - z_{t-1}) + \nu + \varepsilon_t, \tag{32}$$
$$\Delta z_t = (\psi - 1) z_{t-1} + \eta_t, \quad y_0 = z_0 = 0, \tag{33}$$
where $(\varepsilon_t, \eta_t)' \overset{iid}{\sim} N_2(0, I_2)$. The characteristic roots of the system are ψ and 3/4. When ψ = 1, the model is cointegrated. When ψ = 1/4, the model is stationary. In both cases, $y_t - z_t$ is stationary.
We then fit the univariate autoregressive distributed lag model
$$y_t = \rho z_t + \alpha y_{t-1} + \beta z_{t-1} + \nu + \varepsilon_t, \tag{34}$$
and investigate the residuals $\hat\varepsilon_t$ using the Chow statistics.
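As a short worked step (an algebraic rearrangement of Equations (32) and (33), not stated in this form in the text), writing the first equation in levels shows how the fitted model relates to the data-generating process:

$$
\begin{aligned}
y_t &= y_{t-1} + \tfrac{1}{4}(z_t - z_{t-1}) - \tfrac{1}{4}(y_{t-1} - z_{t-1}) + \nu + \varepsilon_t
     = \tfrac{3}{4} y_{t-1} + \tfrac{1}{4} z_t + \nu + \varepsilon_t, \\
z_t &= \psi z_{t-1} + \eta_t, \\
y_t - z_t &= \tfrac{3}{4}(y_{t-1} - z_{t-1}) - \tfrac{3}{4} \Delta z_t + \nu + \varepsilon_t .
\end{aligned}
$$

Under this rearrangement the fitted model (34) nests the process with $\rho = 1/4$, $\alpha = 3/4$ and $\beta = 0$, and the autoregressive coefficients 3/4 and ψ are the characteristic roots quoted above. The last line suggests why $y_t - z_t$ is stationary when ψ = 1: in that case $\Delta z_t = \eta_t$.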
The first experiment (Table 5) evaluates the size of the Chow tests. Here, ψ varies, while ν = 0 in the data generating process. The results are in line with those seen for the autoregressive situation in Table 1. The second experiment is not repeated in this setting.
Table 5. Simulated rejection frequency for $SC^{2}$, $SC^{*2}$ under an autoregressive distributed lag process in Equations (32), (33). 50,000 repetitions, MCSE ≤ 0.2.
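Columns give the autoregressive coefficient ψ.

| T | Test | ψ = 0.25 | ψ = 1.00 |
|---|---|---|---|
| 50 | $SC^{2}$ | 15.0 | 16.3 |
| | $SC^{*2}$ | 6.1 | 6.7 |
| 100 | $SC^{2}$ | 11.6 | 12.2 |
| | $SC^{*2}$ | 5.4 | 5.8 |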
The third experiment (Table 6) evaluates the power of the Chow tests against a single shift in the mean level. This is done by replacing ν by $\nu I_{t > \tau}$ in the data generating process Equation (32). The results are in line with those seen for the autoregressive situation in Table 3: there is good size control. The power is nearly uniform in ψ and comparable to the power reported in Table 3.
The fourth experiment (Table 7) evaluates the power of the Chow tests against a single innovation outlier at the process mid-point. This is done by replacing ν by $\nu I_{t = T/2 + 1}$ in the data generating process Equation (32). The results are in line with those seen for the autoregressive situation in Table 4.
Table 6. Simulated rejection frequency for $SC^{*2}$ under process in Equations (32), (33) with a break of magnitude ν at time τ. T = 50. 50,000 repetitions, MCSE ≤ 0.5.
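Columns give the break timing τ and the post-break constant ν.

| ψ | Test | ν = 0.0 | τ = 0.5T, ν = 2.0 | τ = 0.5T, ν = 4.0 | τ = T−2, ν = 2.0 | τ = T−2, ν = 4.0 | τ = T−1, ν = 2.0 | τ = T−1, ν = 4.0 |
|---|---|---|---|---|---|---|---|---|
| 0.25 | $SC^{*2}$ | 6.1 | 16.3 | 64.6 | 19.7 | 83.2 | 13.8 | 68.2 |
| | Φ | 4.9 | 6.8 | 34.1 | 10.9 | 61.8 | 9.1 | 54.9 |
| | $SC^{*2} \mid \Phi$ | 4.5 | 13.4 | 51.2 | 12.7 | 58.8 | 8.4 | 33.1 |
| | $SC^{*2} + \Phi$ | 9.2 | 19.3 | 67.8 | 22.2 | 84.3 | 16.7 | 69.8 |
| 1.00 | $SC^{*2}$ | 6.7 | 16.0 | 62.0 | 19.4 | 81.2 | 14.2 | 67.2 |
| | Φ | 5.0 | 6.0 | 25.1 | 9.7 | 54.5 | 8.7 | 51.8 |
| | $SC^{*2} \mid \Phi$ | 5.3 | 13.7 | 52.6 | 13.5 | 60.8 | 9.1 | 35.2 |
| | $SC^{*2} + \Phi$ | 10.0 | 18.9 | 64.5 | 21.9 | 82.2 | 17.0 | 68.8 |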
Table 7. Simulated rejection frequency for $SC^{*2}$ under process in Equations (32), (33) with an outlier of magnitude ν at mid-sample. T = 50. 50,000 repetitions, MCSE ≤ 0.5.
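Columns give the outlier magnitude ν.

| ψ | Test | ν = 0.0 | ν = 1.0 | ν = 2.0 | ν = 3.0 | ν = 4.0 | ν = 5.0 |
|---|---|---|---|---|---|---|---|
| 0.25 | $SC^{*2}$ | 6.1 | 6.3 | 10.4 | 25.5 | 53.0 | 79.1 |
| | Φ | 4.9 | 5.2 | 9.1 | 25.4 | 55.4 | 83.1 |
| | $SC^{*2} \mid \Phi$ | 4.5 | 4.6 | 6.3 | 10.8 | 17.5 | 24.8 |
| | $SC^{*2} + \Phi$ | 9.2 | 9.6 | 14.8 | 33.5 | 63.2 | 87.3 |
| 1.00 | $SC^{*2}$ | 6.7 | 7.1 | 11.0 | 25.4 | 51.5 | 77.4 |
| | Φ | 5.0 | 5.3 | 9.3 | 25.9 | 55.9 | 83.5 |
| | $SC^{*2} \mid \Phi$ | 5.3 | 5.5 | 7.1 | 11.7 | 18.5 | 26.0 |
| | $SC^{*2} + \Phi$ | 10.0 | 10.5 | 15.7 | 34.6 | 64.1 | 87.8 |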

7. Empirical Illustration

As an empirical illustration, consider log quarterly U.K. gross domestic product1, y, say, for the period 1991:1 to 2013:3. This gives a total sample length of 91, of which two observations are held back as initial values. The data is provided as supplementary material. Figure 1a shows the series.
An autoregression with two lags, an intercept and a linear trend is fitted to the data:
$$\hat y_t = \underset{(0.08)}{1.65}\, y_{t-1} - \underset{(0.08)}{0.66}\, y_{t-2} + \underset{(0.14)}{0.17} + \underset{(0.000075)}{0.000045}\, t \quad \text{(standard errors in parentheses)}, \tag{35}$$
$$\hat\sigma = 0.0050, \quad \text{log-likelihood} = 347.439,$$
$$\chi^2_{norm}[2] = 4.3 \ (p = 0.12), \quad F_{ar(1-5)}[5, 80] = 1.2 \ (p = 0.32), \tag{36}$$
$$\max C^{2*} = 12.9 \ (p = 0.026) \quad \{\arg\max = 2008{:}2\}. \tag{37}$$
Figure 1. U.K. GDP series. (a) Series in logs; (b) scaled residuals; (c) pointwise one-step Chow tests; the horizontal line is the 1% critical value; (d) simultaneous Chow test; the horizontal lines are the 1% (top) and 5% (bottom) critical values.
The specification tests in Equation (36) are a cumulant-based test for normality and a test for autocorrelated residuals. They are valid for non-stationary autoregressive distributed lag models; see [29,35,36]. These specification tests do not indicate particular problems with the model. The specification test in Equation (37) is the supremum Chow test $C^{2*}_{1,t}$ from Equation (27) evaluated according to the finite sample approximation Equation (28). This indicates a specification problem that is worst for 2008:2. A graphical version of this test is discussed below.
Figure 1b shows the scaled residuals. There appears to be clustering of first positive and then negative residuals around the start of the financial crisis in 2006–2009. This tendency is, however, not sufficient to trigger the specification tests for autocorrelation and non-normality in Equation (36).
Figure 1c shows pointwise one-step Chow tests. This is the standard output from PcGive. The scale is a probability scale, where the horizontal line indicates the critical value for pointwise tests at a 1% level. The exact nature of the probability scale is not documented in [37]. This plot represents pointwise tests based on 79 statistics with an unclear correlation structure. Thus, researchers have traditionally interpreted this plot with a grain of salt; see ([38] p.197). In this plot, there is a cluster of pointwise rejections, so it would seem prudent to question the stability of the model.
Figure 1d shows the Chow statistics $C^{2*}_{1,t}$ (see Equation (27)), along with horizontal lines indicating the critical values of simultaneous 5% and 1% tests. These are computed according to the finite sample approximation Equation (28). The maximum of the test statistics is 12.9; see Equation (37). This is between the 5% and the 1% critical levels of 11.6 and 14.7, respectively. Thus, with a 5% level, it would seem prudent to question the stability of the model.
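The simultaneous critical values can be sketched in a few lines. This is a minimal illustration, assuming that the finite sample approximation takes the form $[G(x)]^{n} = 1 - p$, where $G$ is the $\chi^2_1$ distribution function and $n = T - g(T) = 79$ is the number of pointwise statistics, as described later in this section. The function name `sup_chow_critical_value` is hypothetical.

```python
from statistics import NormalDist


def sup_chow_critical_value(p, n):
    """Solve G(x)**n = 1 - p for x, with G the chi-squared(1) cdf.

    If Z is standard normal then Z**2 is chi-squared(1), so
    G(x) = 2*Phi(sqrt(x)) - 1 and G^{-1}(u) = Phi^{-1}((1 + u) / 2)**2.
    """
    u = (1.0 - p) ** (1.0 / n)
    return NormalDist().inv_cdf((1.0 + u) / 2.0) ** 2


n_stats = 79  # assumed number of pointwise statistics, T - g(T) = 89 - 10
crit_5 = sup_chow_critical_value(0.05, n_stats)
crit_1 = sup_chow_critical_value(0.01, n_stats)
print(round(crit_5, 2), round(crit_1, 2))
```

Under these assumptions the two values come out close to the 11.6 and 14.7 quoted for Figure 1d. Only the standard library is used, at the cost of writing the $\chi^2_1$ quantile through the normal quantile.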
What could be done to remedy the situation? The usual answer is to inspect the data, think about the economic context and the economic question of interest and update the empirical model accordingly. While the economic question is vague in this illustration, the context of the specification problem is the financial crisis. In 2008, there is a downward shift both in productivity and in growth rates. How big these effects were remains unclear, and so, the Office for National Statistics keeps revising the data for this period; see [39,40]. Now, one way to capture the downward shift in productivity and growth rates is to use an impulse indicator and a step indicator. This gives the model:
$$\hat y_t = \underset{(0.09)}{1.33}\, y_{t-1} - \underset{(0.08)}{0.42}\, y_{t-2} + \underset{(0.24)}{1.17} + \underset{(0.00016)}{0.00077}\, t - \underset{(0.0045)}{0.015}\, 1_{(t = 2008:2)} - \underset{(0.0038)}{0.019}\, 1_{(t \ge 2008:3)}, \tag{38}$$
$$\hat\sigma = 0.0043, \quad \text{log-likelihood} = 361.397,$$
$$\chi^2_{norm}[2] = 2.7 \ (p = 0.25), \quad F_{ar(1-5)}[5, 78] = 0.5 \ (p = 0.79), \tag{39}$$
$$\max C^{2*} = 8.9 \ (p = 0.198) \quad \{\arg\max = 2009{:}2\}. \tag{40}$$
The formal asymptotic theory does not cover this situation with dummies. We proceed with these results nonetheless. The specification tests (39), (40) for the new model (38) are clearly better than the tests (36), (37) for the original model (35). An asymptotic theory for this type of iterative specification of a regression model is given in [41].
It is worth noting how the plot in Figure 1d is computed. First, $C^2_{1,t}$ is computed using Equation (5) for $t = 11, \dots, 89$, so that $g(T) = 10$ and $T = 89$. This can be done easily using standard regression techniques. For instance, to compute $C^2_{1,11}$, run a regression over the sample $1, \dots, 11$, including a dummy for Observation 11. Then, $C^2_{1,11}$ is the F-statistic for testing the absence of the dummy variable. Secondly, $C^{2*}_{1,11}$ is computed from $C^2_{1,11}$ by first transforming using the $F(1, t - k - 1)$ distribution function, with $t = 11$ and $k = 4$ in this case, and then by the inverse of $G$, the $\chi^2_1$ distribution function. The critical values are computed according to the finite sample approximation (28). For instance, for $p = 0.05$, then $x = G^{-1}\{(1 - p)^{1/[T - g(T)]}\} = 11.72$, so that $[G(x)]^{T - g(T)} = 1 - p$.
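The first step can be sketched without any regression package. The following is a minimal illustration, assuming that $C^2_{1,t}$ equals the squared one-step forecast residual, scaled by $1 + x_t'(\sum_{s<t} x_s x_s')^{-1} x_t$, divided by the residual variance from observations $1, \dots, t-1$ with $t - k - 1$ degrees of freedom; this is the usual algebraic equivalent of the dummy-variable F-statistic. The helper names `solve` and `one_step_chow` are hypothetical, and the second step (the $F$ and $\chi^2_1$ transformation) is not implemented here.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][j] * z[j] for j in range(r + 1, n))
        z[r] = (M[r][n] - tail) / M[r][r]
    return z


def one_step_chow(y, X, t):
    """One-step Chow statistic for observation t (1-indexed).

    Fits OLS on observations 1..t-1 and tests observation t.
    y is a list of floats; X is a list of regressor rows of length k.
    """
    k = len(X[0])
    Xs, ys, xt = X[:t - 1], y[:t - 1], X[t - 1]
    Sxx = [[sum(r[i] * r[j] for r in Xs) for j in range(k)] for i in range(k)]
    Sxy = [sum(r[i] * v for r, v in zip(Xs, ys)) for i in range(k)]
    beta = solve(Sxx, Sxy)
    rss = sum((v - sum(b * c for b, c in zip(beta, r))) ** 2
              for r, v in zip(Xs, ys))
    leverage = sum(a * b for a, b in zip(xt, solve(Sxx, xt)))
    forecast_error = y[t - 1] - sum(b * c for b, c in zip(beta, xt))
    return (forecast_error ** 2 / (1.0 + leverage)) / (rss / (t - 1 - k))


# Intercept-only toy example: the first three observations have mean 2 and
# residual variance 1, so the fourth observation (6) gives 16 / (4/3) = 12.
stat = one_step_chow([1.0, 3.0, 2.0, 6.0], [[1.0]] * 4, 4)
print(stat)
```

For the application in this section, each row of `X` would hold the two lags, the intercept and the trend ($k = 4$), and the function would be called for each $t$ in the recursive sample.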

8. Conclusions

We advocate the sup-Chow test as a general misspecification test to be used as part of an iterative modelling strategy. It is a relatively simple transformation of the existing one-step Chow test (or, similarly, the EViews one-step forecast test), with a standard and easily calculated null distribution, which does not vary substantially in the AR(1) parameter space. We anticipate that it would be used as one of a battery of tests (including the normality of residuals); rejection would draw the modeller’s attention to the pointwise plot, which would help identify the cause and timing of the failure.
By construction, the test is sensitive to parameter changes and outliers and is somewhat agnostic about the timing and number of these breaks. This makes it useful against a variety of simple and complex misspecification types. However, there is a clear trade-off, and as the first columns of Table 3 show, the test is less powerful against a particular alternative than the [13] sup-F test, which explicitly models a single break. This motivates the use of multiple different tests, with the failure of any one signalling misspecification and triggering further investigation. In real datasets, breaks may not be of the single mean-shift variety, and the parallel popularity of CUSUM-type and Andrews-type tests suggests that both approaches have value.
The test is not invariant to the error distribution, even asymptotically; a feature it shares with most outlier tests and end-sample tests of parameter change. There are two different solutions to this, depending on the modelling approach being used.
If normality is assumed and tested, there is no problem. As Table 2 shows, the pre-test for normality affects the size and power of the test, but substantial power remains following the pre-test: that is, the sup-Chow test has power somewhat orthogonal to a common normality test.
If normality is not maintained, the sup-Chow test as presented cannot be used. The solution to this problem would involve a subsampling technique to recover the distribution of the errors from the known good part of the sample. This is the approach taken in [15], but it necessarily complicates the test. An appealing alternative is offered in [42] in the context of another test applying extreme value theory to outlier detection. The authors resolve the distribution dependency in three stages. First, extreme value theory itself means that a wide class of error distributions converge in the maximum to the same extreme value distribution: the Gumbel, discussed in Equation (20) above. Second, the centring factor required for extreme value convergence is eliminated by examining differences between order statistics, applying a convenient theorem of [43]. Third, the scaling factor is implicitly sampled by examining the largest $(1/\alpha)$ pointwise statistics for a level α test. Preliminary experiments suggest that this approach can be applied effectively to the sup-Chow test, but as in [42], a further correction is needed to control the size of the test; hence, this remains future work.

Supplementary Materials

Supplementary File 1

Acknowledgements

The comments from the referees are gratefully acknowledged.

Author Contributions

The authors made equal contributions.

A. Proofs

A.1. Notation

We use the spectral matrix norm, so $\|M\| = [\lambda_{\max}(M'M)]^{1/2}$, where $\lambda_{\max}(M'M)$ indicates the largest eigenvalue of $M'M$. For a symmetric, non-negative definite matrix $M$, $M^{1/2}$ refers to a matrix that is multiplied with its transpose to give $M$. The choice of the matrix square root does not matter, due to the choice of the matrix norm and the properties of the eigenvalues of the products. Define, for any $a_s, b_s \in \{x_s, R_{s-1}, W_{s-1}, \xi_{2,s}, Q_{s-1}, \tilde U_{s-1}\}$, the sum $S_{ab} = \sum_{s=1}^{t-1} a_s b_s'$, the correlation $C_{ab} = S_{aa}^{-1/2} S_{ab} S_{bb}^{-1/2}$ and the partial regression quantities $(a|b)_t = a_t - S_{ab} S_{bb}^{-1} b_t$ and $S_{aa \cdot b} = S_{aa} - S_{ab} S_{bb}^{-1} S_{ba}$.

A.2. Three-Way Process Decomposition

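We elaborate on the decomposition of the companion form Equation (15) given in Equation (16). Whereas there it was decomposed into non-explosive and explosive components, we now further decompose the non-explosive components into stationary and unit-root components. As before, there exists a regular real matrix $M_3$ to block diagonalize $\mathbf S$ into stationary, unit-root and explosive components:
$$M_3 S_t = (M_3 \mathbf S M_3^{-1}) M_3 S_{t-1} + M_3 (\xi_t', 0)',$$
$$\begin{pmatrix} \tilde U_t \\ Q_t \\ W_t \end{pmatrix} = \begin{pmatrix} \mathbf U & 0 & 0 \\ 0 & \mathbf Q & 0 \\ 0 & 0 & \mathbf W \end{pmatrix} \begin{pmatrix} \tilde U_{t-1} \\ Q_{t-1} \\ W_{t-1} \end{pmatrix} + \begin{pmatrix} e_{\tilde U, t} \\ e_{Q, t} \\ e_{W, t} \end{pmatrix},$$
where $\mathbf U$, $\mathbf Q$ and $\mathbf W$ have eigenvalues inside, on and outside the unit circle, respectively. We can now express the two-way decomposition presented in Equation (16) as follows:
$$R_t = \begin{pmatrix} \tilde U_t \\ Q_t \end{pmatrix} \quad \text{and} \quad \mathbf R = \begin{pmatrix} \mathbf U & 0 \\ 0 & \mathbf Q \end{pmatrix}.$$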

A.3. Preliminary Asymptotic Results

The ADL model Equation (10) becomes:
$$Y_t = \rho' Z_t + \theta' S_{t-1} + \varepsilon_t, \quad t = 1, \dots, T,$$
where θ is the vector of coefficients. Then, from Equation (11), we have $Z_t = \Pi S_{t-1} + \xi_{2,t}$, where $\xi_t$ has been partitioned conformably with $X_t$. Then, the residuals from regressing $Y_t$ on $(Z_t, S_{t-1})$ could also be obtained by regressing $Y_t$ on $(\xi_{2,t}, S_{t-1})$ or, as a result of the decomposition above in Equation (16), on $x_t = (\xi_{2,t}', R_{t-1}', W_{t-1}')'$; so, we can analyse the test statistic Equation (6) as if these were the actual regressors.
Lemma 12. Suppose Assumptions 1, 2 and 3 hold with α > 4 only. Then, for all β > 1/α and ζ < 1/8,
(i) $\|C_{RW}\| \overset{a.s.}{=} o(t^{-\zeta/2})$,
(ii) $\|C_{\xi S}\| \overset{a.s.}{=} o(t^{\beta - 1/2})$,
(iii) $S_{RR \cdot W}^{-1} \overset{a.s.}{=} S_{RR}^{-1/2} \cdot \{1 + o(1)\} \cdot S_{RR}^{-1/2}$,
(iv) $S_{\xi_2 \xi_2 \cdot S}^{-1} \overset{a.s.}{=} S_{\xi_2 \xi_2}^{-1/2} \cdot \{1 + o(1)\} \cdot S_{\xi_2 \xi_2}^{-1/2}$,
(v) $\|S_{RR}^{-1/2} R_{t-1}\| \overset{a.s.}{=} o(t^{-\zeta/2})$,
(vi) $\|S_{WW}^{-1/2} W_{t-1}\| \overset{a.s.}{=} O(1)$,
(vii) $\|S_{RR}^{-1/2} (R|W)_t\| \overset{a.s.}{=} o(t^{-\zeta/2})$, and
(viii) $\|S_{\xi_2 \xi_2}^{-1/2} (\xi_2|S)_t\| \overset{a.s.}{=} o(t^{\beta - 1/2})$.
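Proof. Result (i) is proven by decomposing the correlation to apply results from [25], so that:
$$\begin{aligned} \|C_{RW}\| &= \|S_{RR}^{-1/2} S_{RW} S_{WW}^{-1/2}\| \\ &\le \left\| \begin{pmatrix} 1 & C_{\tilde U Q} \\ C_{Q \tilde U} & 1 \end{pmatrix}^{-1/2} \right\| \left\| \begin{pmatrix} S_{\tilde U \tilde U}^{-1/2} & 0 \\ 0 & S_{QQ}^{-1/2} \end{pmatrix} \begin{pmatrix} S_{\tilde U W} \\ S_{QW} \end{pmatrix} S_{WW}^{-1/2} \right\| \\ &\overset{a.s.}{=} O(1) \left\| \begin{pmatrix} C_{\tilde U W} \\ C_{QW} \end{pmatrix} \right\|, \end{aligned}$$
where the last line follows, because $C_{\tilde U Q}$ is vanishing almost surely by [25], Theorem 9.4. Then the result follows, since $\|C_{\tilde U W}\| \overset{a.s.}{=} o(t^{\beta - 1/2})$ and $\|C_{QW}\| \overset{a.s.}{=} o(t^{-\zeta/2})$ by [25], Theorems 9.1 and 9.2. The latter term will dominate since α > 16/7 under Assumption 2.
Result (ii) is proven by noting that $\|C_{\xi S}\| \le \|S_{\xi\xi}^{-1/2}\| \, \|S_{\xi S} S_{SS}^{-1/2}\|$, with the first normed term $O(t^{-1/2})$ by [25], Theorem 2.8, and the second $o(t^{\beta})$ by [25], Theorem 2.4.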
Result (iii) follows by writing:
$$S_{RR \cdot W}^{-1} = (S_{RR} - S_{RW} S_{WW}^{-1} S_{WR})^{-1} = S_{RR}^{-1/2} (I - C_{RW} C_{WR})^{-1} S_{RR}^{-1/2},$$
and applying (i) to show that $C_{RW}$ is vanishing.
Result (iv) is exactly analogous, but substitutes (ii) for (i).
Result (v) follows by again decomposing $R_t$. Namely,
$$S_{RR} = \begin{pmatrix} S_{\tilde U \tilde U}^{1/2} & 0 \\ 0 & S_{QQ}^{1/2} \end{pmatrix} \begin{pmatrix} 1 & C_{\tilde U Q} \\ C_{Q \tilde U} & 1 \end{pmatrix} \begin{pmatrix} S_{\tilde U \tilde U}^{1/2} & 0 \\ 0 & S_{QQ}^{1/2} \end{pmatrix},$$
so that:
$$\|S_{RR}^{-1/2} R_{t-1}\| \le \left\| \begin{pmatrix} 1 & C_{\tilde U Q} \\ C_{Q \tilde U} & 1 \end{pmatrix}^{-1/2} \right\| \left\| \begin{pmatrix} S_{\tilde U \tilde U}^{-1/2} & 0 \\ 0 & S_{QQ}^{-1/2} \end{pmatrix} \begin{pmatrix} \tilde U_{t-1} \\ Q_{t-1} \end{pmatrix} \right\|.$$
Then, the first normed quantity on the right-hand side is bounded, since $C_{\tilde U Q}$ is vanishing by [25], Theorem 9.4. The second normed quantity comprises $S_{\tilde U \tilde U}^{-1/2} \tilde U_{t-1}$ stacked with $S_{QQ}^{-1/2} Q_{t-1}$. By [25], Theorem 8.3, we have $\|S_{\tilde U \tilde U}^{-1/2}\| = O(t^{-1/2})$ and by [44], Theorem 1(i), we have that $\|\tilde U_{t-1}\| = o(t^{\beta})$, so $\|S_{\tilde U \tilde U}^{-1/2} \tilde U_{t-1}\| = o(t^{\beta - 1/2})$.
We cannot bound $S_{QQ}^{-1/2}$ independently in the same way, but since $Q_t$ contains only the unit-root components (with eigenvalues on the unit circle), we can apply [25], Theorem 8.4, which states that, for some η, the maximum of $Q_s' (\sum_{s=1}^{t} Q_{s-1} Q_{s-1}')^{-1} Q_s$ over a range of $s < t$ determined by η is $o(t^{-\zeta})$ for all ζ < 1/8, and so, a fortiori, $Q_{t-1}' (\sum_{s=1}^{t} Q_{s-1} Q_{s-1}')^{-1} Q_{t-1} = o(t^{-\zeta})$. However, then $\|S_{QQ}^{-1/2} Q_{t-1}\|^2 = Q_{t-1}' S_{QQ}^{-1} Q_{t-1}$, and we can then use the matrix identity $b'(A + bb')^{-1} b = b' A^{-1} b \, (1 + b' A^{-1} b)^{-1}$ ([45] p. 151) to write:
$$Q_{t-1}' S_{QQ}^{-1} Q_{t-1} = \frac{Q_{t-1}' \left( \sum_{s=1}^{t} Q_{s-1} Q_{s-1}' \right)^{-1} Q_{t-1}}{1 - Q_{t-1}' \left( \sum_{s=1}^{t} Q_{s-1} Q_{s-1}' \right)^{-1} Q_{t-1}},$$
which is $o(t^{-\zeta})$, so that $\|S_{QQ}^{-1/2} Q_{t-1}\| = o(t^{-\zeta/2})$.
Considering the maximum of these components, we have again that the latter dominates and $\|S_{RR}^{-1/2} R_{t-1}\| = o(t^{-\zeta/2})$, since α > 16/7 under Assumption 2.
Result (vi) follows directly from [44], Lemma 4(i).
Result (vii) follows from (i), (v) and (vi). Write:
$$\|S_{RR}^{-1/2} (R|W)_t\| = \|S_{RR}^{-1/2} R_{t-1} - S_{RR}^{-1/2} S_{RW} S_{WW}^{-1} W_{t-1}\| \le \|S_{RR}^{-1/2} R_{t-1}\| + \|C_{RW}\| \, \|S_{WW}^{-1/2} W_{t-1}\|,$$
giving three normed quantities to bound. The first is $o(t^{-\zeta/2})$ by (v), as is the second by (i), while the third is bounded by (vi).
Result (viii) is proven in a similar fashion. Write:
$$\|S_{\xi_2 \xi_2}^{-1/2} (\xi_2|S)_t\| = \|S_{\xi_2 \xi_2}^{-1/2} \xi_{2,t} - S_{\xi_2 \xi_2}^{-1/2} S_{\xi_2 S} S_{SS}^{-1} S_{t-1}\| = \|S_{\xi_2 \xi_2}^{-1/2} \xi_{2,t} - C_{\xi_2 S} S_{SS}^{-1/2} S_{t-1}\|.$$
Then, the first of the normed quantities, $\|S_{\xi_2 \xi_2}^{-1/2} \xi_{2,t}\|$, is $o(t^{\beta - 1/2})$ by [25], Theorem 2.8, and the result that $\xi_t = o(t^{\beta})$ by [44], Theorem 1; the second, $\|C_{\xi_2 S}\|$, is $o(t^{\beta - 1/2})$ by (ii); and the third, $\|S_{SS}^{-1/2} S_{t-1}\|$, is $O(1)$, since we use a partial regression transformation to write:
$$\|S_{SS}^{-1/2} S_{t-1}\|^2 = S_{t-1}' S_{SS}^{-1} S_{t-1} = (R|W)_t' S_{RR \cdot W}^{-1} (R|W)_t + W_{t-1}' S_{WW}^{-1} W_{t-1},$$
and then apply (iii) and (vii) and (vi), respectively. ☐
Lemma 13. Under Assumptions 1, 2 and 3 with α > 4 and with β > 1/α,
(i) $\left\| \sum_{s=1}^{t-1} \varepsilon_s S_{s-1}' S_{SS}^{-1/2} \right\| \overset{a.s.}{=} o(t^{\beta})$,
(ii) $\left\| \sum_{s=1}^{t-1} \varepsilon_s R_{s-1}' S_{RR}^{-1/2} \right\| \overset{a.s.}{=} O[(\log t)^{1/2}]$,
(iii) $\left\| \sum_{s=1}^{t-1} \varepsilon_s W_{s-1}' S_{WW}^{-1/2} \right\| \overset{a.s.}{=} o(t^{\beta})$,
(iv) $\left\| \sum_{s=1}^{t-1} \varepsilon_s (R|W)_s' S_{RR}^{-1/2} \right\| \overset{a.s.}{=} O[(\log t)^{1/2}] + o(t^{\beta - 1/16})$.
Proof. For (i), (ii) and (iii), use [25], Theorem 2.4. For (iv), write:
$$\sum_{s=1}^{t-1} \varepsilon_s (R|W)_s' S_{RR}^{-1/2} = \sum_{s=1}^{t-1} \varepsilon_s R_{s-1}' S_{RR}^{-1/2} - \sum_{s=1}^{t-1} \varepsilon_s W_{s-1}' S_{WW}^{-1/2} C_{WR},$$
and then apply (ii), (iii) and Lemma 12 (i). ☐
Lemma 14. Under Assumptions 1, 2, 3 and 4,
(i) $\left\| \sum_{s=1}^{t-1} \varepsilon_s \xi_{2,s}' S_{\xi_2 \xi_2}^{-1/2} \right\| \overset{a.s.}{=} O[(\log t)^{1/2}]$,
(ii) $\left\| \sum_{s=1}^{t-1} \varepsilon_s (\xi_2|S)_s' S_{\xi_2 \xi_2}^{-1/2} \right\| \overset{a.s.}{=} o(t^{2\beta - 1/2}) + O[(\log t)^{1/2}]$, the latter term dominating when α > 4.
Proof. For (i), use [46], Lemma 1(iii), and [44], Corollary 1(iii). For (ii), write:
$$\sum_{s=1}^{t-1} \varepsilon_s (\xi_2|S)_s' S_{\xi_2 \xi_2}^{-1/2} = \sum_{s=1}^{t-1} \varepsilon_s \xi_{2,s}' S_{\xi_2 \xi_2}^{-1/2} - \sum_{s=1}^{t-1} \varepsilon_s S_{s-1}' S_{SS}^{-1/2} C_{S\xi},$$
and then apply (i), Lemma 13 (i) and Lemma 12 (ii). ☐

A.4. Proof of Theorem 6

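We proceed to examine the behaviour of $\tilde\varepsilon_t$, the one-step forecast residuals. From Equation (6), we can write:
$$\tilde\varepsilon_t = \frac{\varepsilon_t - \sum_{s=1}^{t-1} \varepsilon_s x_s' \left( \sum_{s=1}^{t-1} x_s x_s' \right)^{-1} x_t}{\left\{ 1 + x_t' \left( \sum_{s=1}^{t-1} x_s x_s' \right)^{-1} x_t \right\}^{1/2}}. \tag{43}$$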
We break the result into two lemmas, one describing the denominator and one the numerator, with similar reasoning in each case.
Lemma 15. Under Assumptions 1, 2 and 3,
$$\left| x_t' S_{xx}^{-1} x_t - W' F_W^{-1} W \right| = o(t^{-\zeta}) \quad a.s.$$
for all ζ < 1/8 with $W$ and $F_W$ as in Theorem 6.
Proof. Divide the statistic into two parts using that:
$$\left| x_t' S_{xx}^{-1} x_t - W' F_W^{-1} W \right| \le \left| x_t' S_{xx}^{-1} x_t - W_{t-1}' S_{WW}^{-1} W_{t-1} \right| + \left| W_{t-1}' S_{WW}^{-1} W_{t-1} - W' F_W^{-1} W \right|.$$
We use a partial regression transformation to divide the first part into two partial components:
$$\left| x_t' S_{xx}^{-1} x_t - W_{t-1}' S_{WW}^{-1} W_{t-1} \right| \le \left| (\xi_2|R,W)_t' S_{\xi_2 \xi_2 \cdot RW}^{-1} (\xi_2|R,W)_t \right| + \left| (R|W)_t' S_{RR \cdot W}^{-1} (R|W)_t \right|.$$
The first normed term on the right-hand side is $o(t^{2\beta - 1})$, and the second is $o(t^{-\zeta})$ by Lemma 12, Parts (iv) and (viii); and (iii) and (vii), respectively. The second term will dominate, since α > 16/7, so $x_t' S_{xx}^{-1} x_t - W_{t-1}' S_{WW}^{-1} W_{t-1} = o(t^{-\zeta})$.
The lemma is then proven by rewriting the second step:
$$\begin{aligned} W_{t-1}' S_{WW}^{-1} W_{t-1} - W' F_W^{-1} W ={}& (\mathbf W^{-(t-1)} W_{t-1})' \left[ (\mathbf W^{t-1})' S_{WW}^{-1} \mathbf W^{t-1} - F_W^{-1} \right] (\mathbf W^{-(t-1)} W_{t-1}) \\ &+ (\mathbf W^{-(t-1)} W_{t-1} - W)' F_W^{-1} (\mathbf W^{-(t-1)} W_{t-1}) \\ &+ W' F_W^{-1} (\mathbf W^{-(t-1)} W_{t-1} - W), \end{aligned}$$
and noting that, by [25], Corollary 5.3(i), $\mathbf W^{-(t-1)} W_{t-1} - W = O(\lambda_{\min}(\mathbf W)^{-t})$ and, by [25], Corollary 7.2, $(\mathbf W^{t-1})' S_{WW}^{-1} \mathbf W^{t-1} - F_W^{-1} = O(\lambda_{\min}(\mathbf W)^{-2t})$, almost surely, while all of the other terms are bounded by the same corollaries. ☐
We next state a lemma concerning the main numerator term in Equation (43).
Lemma 16. Under Assumptions 1, 2, 3 and 4:
$$\left| \sum_{s=1}^{t-1} \varepsilon_s x_s' S_{xx}^{-1} x_t - G_t F_W^{-1} W \right| = o(t^{\beta - 1/8}) \quad a.s.$$
for all β > 1/α, where $W$ and $F_W$ are defined as in Theorem 6, and:
$$\|G_t\| = \left\| \sum_{s=1}^{t-1} \varepsilon_{t-s} W' (\mathbf W^{-s})' \right\| = o(t^{\beta}) \quad a.s.$$
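Proof. Once again, we take the proof in two steps, using that:
$$\left| \sum_{s=1}^{t-1} \varepsilon_s x_s' S_{xx}^{-1} x_t - G_t F_W^{-1} W \right| \le \left| \sum_{s=1}^{t-1} \varepsilon_s x_s' S_{xx}^{-1} x_t - \sum_{s=1}^{t-1} \varepsilon_s W_{s-1}' S_{WW}^{-1} W_{t-1} \right| + \left| \sum_{s=1}^{t-1} \varepsilon_s W_{s-1}' S_{WW}^{-1} W_{t-1} - G_t F_W^{-1} W \right|.$$
For the first step, we again decompose using a partial regression transformation, so that:
$$\begin{aligned} \left| \sum_{s=1}^{t-1} \varepsilon_s x_s' S_{xx}^{-1} x_t - \sum_{s=1}^{t-1} \varepsilon_s W_{s-1}' S_{WW}^{-1} W_{t-1} \right| \le{}& \left| \sum_{s=1}^{t-1} \varepsilon_s (\xi_2|R,W)_s' S_{\xi_2 \xi_2 \cdot RW}^{-1} (\xi_2|R,W)_t \right| \\ &+ \left| \sum_{s=1}^{t-1} \varepsilon_s (R|W)_s' S_{RR \cdot W}^{-1} (R|W)_t \right|, \end{aligned} \tag{47}$$
and we consider each term on the right separately. For the first term in Equation (47), use Lemma 12 (iv) to write:
$$\left| \sum_{s=1}^{t-1} \varepsilon_s (\xi_2|R,W)_s' S_{\xi_2 \xi_2 \cdot RW}^{-1} (\xi_2|R,W)_t \right| \le \left\| \sum_{s=1}^{t-1} \varepsilon_s (\xi_2|R,W)_s' S_{\xi_2 \xi_2}^{-1/2} \right\| O(1) \left\| S_{\xi_2 \xi_2}^{-1/2} (\xi_2|R,W)_t \right\| \quad a.s.,$$
and then apply Lemma 14 (ii) and Lemma 12 (viii) to arrive at $o(t^{3\beta - 1}) + o[t^{\beta - 1/2} (\log t)^{1/2}]$ almost surely. For the second term in Equation (47), we use Lemma 12 (iii) to write:
$$\left| \sum_{s=1}^{t-1} \varepsilon_s (R|W)_s' S_{RR \cdot W}^{-1} (R|W)_t \right| \le \left\| \sum_{s=1}^{t-1} \varepsilon_s (R|W)_s' S_{RR}^{-1/2} \right\| O(1) \left\| S_{RR}^{-1/2} (R|W)_t \right\| \quad a.s.$$
and then apply Lemma 13 (iv) and Lemma 12 (vii) to arrive at $o(t^{\beta - 1/8})$ almost surely. Overall then, as long as α > 4, the first step is dominated by this second term.
For the second step, we have to show the bounding rate for:
$$\begin{aligned} \sum_{s=1}^{t-1} \varepsilon_s W_{s-1}' S_{WW}^{-1} W_{t-1} - G_t F_W^{-1} W ={}& \sum_{s=1}^{t-1} \varepsilon_s W_{s-1}' (\mathbf W^{-(t-1)})' (\mathbf W^{t-1})' S_{WW}^{-1} \mathbf W^{t-1} \mathbf W^{-(t-1)} W_{t-1} - G_t F_W^{-1} W \\ ={}& \left[ \sum_{s=1}^{t-1} \varepsilon_s W_{s-1}' (\mathbf W^{-(t-1)})' - G_t \right] (\mathbf W^{t-1})' S_{WW}^{-1} \mathbf W^{t-1} \mathbf W^{-(t-1)} W_{t-1} \\ &+ G_t \left[ (\mathbf W^{t-1})' S_{WW}^{-1} \mathbf W^{t-1} - F_W^{-1} \right] \mathbf W^{-(t-1)} W_{t-1} \\ &+ G_t F_W^{-1} \left[ \mathbf W^{-(t-1)} W_{t-1} - W \right]. \end{aligned}$$
Many of these terms are familiar from the proof of Lemma 15, and the only new terms to bound are $\sum_{s=1}^{t-1} \varepsilon_s W_{s-1}' (\mathbf W^{-(t-1)})' - G_t$ and $G_t$. For the latter, we have:
$$\|G_t\| = \left\| \sum_{s=1}^{t-1} \varepsilon_{t-s} W' (\mathbf W^{-s})' \right\| \le \max_{1 \le s < t} |\varepsilon_s| \, \|W\| \sum_{s=1}^{t-1} \|(\mathbf W^{-s})'\|,$$
which is $o(t^{\beta})$, since the latter two terms are bounded, while $\varepsilon_s = o(s^{\beta})$ by [44], Theorem 1. For the former term, we have:
$$\begin{aligned} \left\| \sum_{s=1}^{t-1} \varepsilon_s W_{s-1}' (\mathbf W^{-(t-1)})' - G_t \right\| &= \left\| \sum_{s=1}^{t-1} \varepsilon_s \left\{ \mathbf W^{-(t-1)} W_{s-1} - \mathbf W^{-(t-s)} W \right\}' \right\| = \left\| \sum_{s=1}^{t-1} \varepsilon_s \left\{ \mathbf W^{(s-t)} \sum_{p=s}^{\infty} \mathbf W^{-p} e_{W,p} \right\}' \right\| \\ &\le \max_{1 \le s < t} |\varepsilon_s| \, \|\mathbf W^{-t}\| \sum_{s=1}^{t-1} \left\| \sum_{u=0}^{\infty} \mathbf W^{-u} e_{W,u+s} \right\| = O(t^{\beta}) \, O(\lambda_{\min}(\mathbf W)^{-t}) \, o(t^{1+\beta}) \quad a.s. \\ &= o(t^{2\beta + 1} \lambda_{\min}(\mathbf W)^{-t}) \quad a.s., \end{aligned}$$
where at the second to last line, we use that $\left\| \sum_{u=0}^{\infty} \mathbf W^{-u} e_{W,u+s} \right\| = o(s^{\beta})$ by [28], Corollary 4.3. Combining these results, we see that this second step vanishes exponentially fast, and the first step dominates the expression of interest, giving the result.
The order of $G_t$ follows by writing:
$$\|G_t\| = \left\| \sum_{s=1}^{t-1} \varepsilon_{t-s} W' (\mathbf W^{-s})' \right\| \le \max_{1 \le s < t} |\varepsilon_s| \, \|W\| \sum_{s=1}^{t-1} \|(\mathbf W^{-s})'\|,$$
and applying [44], Theorem 1. ☐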
Proof of Theorem 6. We aim to show that:
$$C_{1,t}^2 - (q_t/\sigma)^2 \overset{a.s.}{=} o(1).$$
Using Equation (6), we can rewrite this expression as:
$$\frac{\tilde\varepsilon_t^2}{(t - k - 1)^{-1} \mathrm{RSS}_{t-1}} - \left( \frac{q_t}{\sigma} \right)^2 = \tilde\varepsilon_t^2 \left\{ \frac{(t - k - 1)}{\mathrm{RSS}_{t-1}} - \frac{1}{\sigma^2} \right\} + \frac{\tilde\varepsilon_t^2 - q_t^2}{\sigma^2}. \tag{49}$$
We first consider the difference $\tilde\varepsilon_t^2 - q_t^2$. We have from Equation (43),
$$\begin{aligned} \tilde\varepsilon_t^2 - q_t^2 &= \frac{(\varepsilon_t - \sum_{s=1}^{t-1} \varepsilon_s x_s' S_{xx}^{-1} x_t)^2}{1 + x_t' S_{xx}^{-1} x_t} - \frac{(\varepsilon_t - G_t F_W^{-1} W)^2}{1 + W' F_W^{-1} W} = \frac{(\varepsilon_t - A_3)^2}{1 + A_1} - \frac{(\varepsilon_t - A_4)^2}{1 + A_2} \\ &= \frac{A_2 - A_1}{(1 + A_1)(1 + A_2)} (\varepsilon_t - A_3)^2 + \frac{1}{1 + A_2} (A_4 - A_3)(2\varepsilon_t - A_4 - A_3), \end{aligned}$$
where:
$$A_1 = x_t' S_{xx}^{-1} x_t, \quad A_2 = W' F_W^{-1} W, \quad A_3 = \sum_{s=1}^{t-1} \varepsilon_s x_s' S_{xx}^{-1} x_t, \quad A_4 = G_t F_W^{-1} W.$$
Both denominators are bounded from below by unity, since $A_1$ and $A_2$ are non-negative. In the first numerator, $A_1 - A_2$ is $o(t^{-\zeta})$ by Lemma 15. The factor $\varepsilon_t - A_3 = \varepsilon_t - A_4 + A_4 - A_3$ is $o(t^{\beta})$, since $\varepsilon_t$ and $A_4$ are both $o(t^{\beta})$ by [44], Theorem 1, and Lemma 16, respectively, while $A_4 - A_3$ is $O(t^{\beta - 1/8})$ by Lemma 16. Therefore, the first term of the sum is $o(t^{2\beta - 1/8})$ almost surely.
In the second numerator, $A_4 - A_3$ is $O(t^{\beta - 1/8})$ by Lemma 16, while $\varepsilon_t$ and $A_4$ are each $o(t^{\beta})$ as above, so that the whole second term is also $o(t^{2\beta - 1/8})$ almost surely.
Thus, the second term in Equation (49) will vanish as long as 2β < 1/8 or α > 16 in Assumption 2, as required. To show the same for the first term, note that $\tilde\varepsilon_t^2 = q_t^2 + (\tilde\varepsilon_t^2 - q_t^2)$, where the difference vanishes as just proven, while:
$$q_t^2 = \frac{(\varepsilon_t - A_4)^2}{1 + A_2} = o(t^{2\beta}) \quad a.s.,$$
since, as above, $\varepsilon_t$ and $A_4$ are both $o(t^{\beta})$, while $A_2$ is non-negative. Then, [25], Corollary 2.9, implies that:
$$\frac{(t - k - 1)}{\mathrm{RSS}_{t-1}} - \frac{1}{\sigma^2} = o(t^{-\gamma}) \quad a.s.,$$
for γ < 1/2. Therefore, the first term in Equation (49) vanishes as long as 2β < 1/2, which is satisfied by Assumption 2. ☐

A.5. Proof of Lemma 7

Proof. Theorem 6 shows that $C_{1,t}^2 - q_t^2$ vanishes almost surely. Egorov’s theorem ([47], Theorem 18.4) then shows that $C_{1,t}^2 - q_t^2$ vanishes uniformly on a set with large probability. That is,
$$\forall \epsilon > 0 \ \exists T_0 : \ \Pr\left( \sup_{t > T_0} |C_{1,t}^2 - q_t^2| < \epsilon \right) > 1 - \epsilon.$$
This implies that for any sequence $g(T)$ which increases to infinity, $\sup_{g(T) < t \le T} |C_{1,t}^2 - q_t^2|$ vanishes in probability as $T$ increases. ☐

A.6. Proof of Lemma 8 (Correction to Lemma 1 of [30])

Proof. The first part of Deo’s lemma, determining the domain of attraction as Λ, is correct. The second part, determining the norming sequences, is in error. Deo cites ([48] p. 374) for this calculation. There, Cramér calculates the norming sequences for a sequence of independent standard normal random variables (with a right tail differing from the density of interest in only a constant factor). We follow the slightly more direct approach of [49], Theorem 1.5.3.
Since $\{X_n\}$ are independent standard normal random variables, $\{|X_n|\}$ are independent random variables identically distributed with the half-normal density, that is the normal density folded around zero:
$$\Pr\{|X_1| < x\} = F(x) = \sqrt{2/\pi} \int_0^x e^{-t^2/2} \, dt = 2\Phi(x) - 1, \quad x \ge 0,$$
where Φ here denotes the standard normal distribution function.
We are interested in probabilities of the form $\Pr\{a_n (Z_n - b_n) < x\}$, which may be rewritten $\Pr\{Z_n \le u_n\}$, where $u_n(x) = x/a_n + b_n$. We seek $a_n$, $b_n$, such that the sequence $u_n$ satisfies (1.5.1) in [49], Theorem 1.5.1, namely:
$$n (1 - F(u_n)) \to e^{-x} \quad \text{as } n \to \infty. \tag{54}$$
Apply a modified version of the well-known normal tail relation,
$$1 - F(u) \sim f(u)/u \quad \text{as } u \to \infty, \tag{55}$$
so that combining Equations (54) and (55), we have that $(1/n) e^{-x} u_n / f(u_n) \to 1$. Taking logs and substituting the density $f(u) = \sqrt{2/\pi}\, e^{-u^2/2}$, we have:
$$-\log n - x + \log u_n + \tfrac{1}{2} \log(\pi/2) + u_n^2/2 \to 0. \tag{56}$$
Dividing through by $\log n$,
$$-1 - \frac{x}{\log n} + \frac{\log u_n}{\log n} + \frac{\log(\pi/2)}{2 \log n} + \frac{u_n^2}{2 \log n} \to 0,$$
then, for any fixed $x$, the second and fourth terms vanish trivially. The third term vanishes by substituting Equation (54) for $n$ and twice applying L’Hôpital’s rule. It then follows that $u_n^2 / (2 \log n) \to 1$, or (taking logarithms again),
$$2 \log u_n - \log 2 - \log \log n \to 0.$$
Substituting this result into Equation (56), we have that:
$$-\log n - x + \tfrac{1}{2} \log 2 + \tfrac{1}{2} \log \log n + \tfrac{1}{2} \log(\pi/2) + u_n^2/2 \to 0,$$
so that rearranging,
$$u_n^2 = 2 \log n \left\{ 1 + \frac{x - \tfrac{1}{2} \log \pi - \tfrac{1}{2} \log \log n}{\log n} + o\left( \frac{1}{\log n} \right) \right\},$$
and, hence, the maximum of $n$ half-normal random variables has the form:
$$u_n = (2 \log n)^{1/2} \left\{ 1 + \frac{x - \tfrac{1}{2} \log \pi - \tfrac{1}{2} \log \log n}{2 \log n} + o\left( \frac{1}{\log n} \right) \right\}.$$
It then follows from [49], Theorem 1.5.3, that $\Pr\{Z_n \le u_n\} \to \exp(-e^{-x})$, and rearranging gives the norming sequences. ☐
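The expansion for $u_n$ can be checked numerically against condition (54). The following is a rough sketch using only the standard library; it relies on the identity $1 - F(u) = \operatorname{erfc}(u/\sqrt{2})$ for the half-normal distribution, drops the $o(1/\log n)$ remainder, and uses the illustrative function names `u_n` and `half_normal_tail`.

```python
import math


def u_n(n, x):
    """Leading terms of u_n(x) for the maximum of n half-normal variables."""
    log_n = math.log(n)
    correction = (x - 0.5 * math.log(math.pi) - 0.5 * math.log(log_n)) / (2.0 * log_n)
    return math.sqrt(2.0 * log_n) * (1.0 + correction)


def half_normal_tail(u):
    """1 - F(u) = Pr(|X| > u) for X standard normal."""
    return math.erfc(u / math.sqrt(2.0))


# n * (1 - F(u_n(x))) should approach exp(-x); report the ratio of the two.
for n in (10**3, 10**6, 10**12):
    for x in (0.0, 1.0):
        ratio = n * half_normal_tail(u_n(n, x)) / math.exp(-x)
        print(n, x, round(ratio, 3))
```

The ratios are close to one but approach it slowly, as expected when the remainder is of logarithmic order.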

A.7. Proof of Lemma 9

Proof. Consider the normalised linear process:
$$q_t/\sigma = \frac{(\varepsilon_t/\sigma)}{(1 + W' F_W^{-1} W)^{1/2}} - \frac{\sum_{s=1}^{\infty} (\varepsilon_{t-s}/\sigma) W' (\mathbf W^{-s})' F_W^{-1} W}{(1 + W' F_W^{-1} W)^{1/2}}.$$
In the case without explosive components, this reduces to:
$$q_t/\sigma = (\varepsilon_t/\sigma),$$
so that under Assumption 5, $q_t/\sigma$ is an independent standard normal sequence and $q_t^2/\sigma^2$ is an independent $\chi^2_1$ sequence. Then, classical extreme value theory gives the lemma with the norming sequences $a_t$ and $b_t$ as stated (see, for instance, [50] p. 56), noting that the $\chi^2$ distribution is a special case of the gamma distribution.
When an explosive component is present, $q_t/\sigma$ under Assumption 5 is still marginally standard normal. However, dependence between members of the sequence means that classical extreme value theory cannot be applied. In particular, we have:
$$E(q_t/\sigma) = 0, \quad \operatorname{Var}(q_t/\sigma) = 1, \quad \operatorname{Covar}(q_s/\sigma, q_t/\sigma) = r(s,t) = r_{|t-s|} = 2\, E\left\{ W' F_W^{-1} \mathbf W^{-|t-s|} W \, (1 + W' F_W^{-1} W)^{-1} \right\}.$$
The general approach to dealing with dependent sequences is outlined in [51]; as long as the dependence is not too great, the same limiting results hold.
We take advantage of the relationship between the $\chi^2_1$ and normal distributions to use existing results on dependent normal sequences to analyse the limiting behaviour of $q_t^2/\sigma^2$. In particular, for any threshold $u_t > 0$, we have:
$$\max_{1\le s\le t} q_s^2/\sigma^2 < u_t^2 \quad\text{iff}\quad \max_{1\le s\le t} |q_s/\sigma| < u_t,$$
where $|q_t/\sigma|$ has the half-normal distribution. Lemma 1 and Theorem 1 of [30] (and its Corollary) consider just such processes, under a square-summability condition that holds here: $\sum r_s^2 = 4 < \infty$. Then, Deo’s result is:
$$c_t\left(\max_{1\le s\le t} |q_s/\sigma| - d_t\right) \overset{d}{\to} \Lambda$$
with:
$$c_t = (2\log t)^{1/2}, \qquad d_t = (2\log t)^{1/2} - (8\log t)^{-1/2}(\log\log t + \log\pi).$$
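(Note that the centring sequence (here, $d_t$, originally $b_n$) is incorrect as printed in [30]. A correction is provided as Lemma 8.) Taking $u_t(z) = z/c_t + d_t$ and using Equations (60) and (61), and noting that $u_t(z)^2 = d_t^2 + 2 d_t z/c_t + z^2/c_t^2$ where the last term is negligible after scaling, we have:
$$\frac{c_t}{2 d_t}\left(\max_{1\le s\le t} q_s^2/\sigma^2 - d_t^2\right) \overset{d}{\to} \Lambda$$
giving norming sequences:
$$\tilde{a}_t = \frac{c_t}{2 d_t} \ \text{(scaling)}, \qquad \tilde{b}_t = d_t^2 \ \text{(centring)}.$$
The equivalence between $a_t, b_t$ and $\tilde{a}_t, \tilde{b}_t$ is proven by showing that $\tilde{a}_t / a_t \to 1$ and $a_t(\tilde{b}_t - b_t) \to 0$. ☐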
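
The final equivalence step can be illustrated numerically. The sketch below is a hedged illustration rather than part of the proof: it assumes that the norming sequences stated in Lemma 9 are the classical ones for the maximum of independent $\chi^2_1$ variables, namely scaling $1/2$ and centring $2\log t - \log\log t - \log\pi$ (these are not reproduced in this appendix, so they are an assumption here), and compares them with the sequences $c_t/(2d_t)$ and $d_t^2$ built from $c_t = (2\log t)^{1/2}$ and $d_t = (2\log t)^{1/2} - (8\log t)^{-1/2}(\log\log t + \log\pi)$. All function names are hypothetical.

```python
import math


def deo_norming(t: float):
    """c_t and d_t for the maximum of |q_s / sigma| (corrected centring)."""
    log_t = math.log(t)
    c = math.sqrt(2.0 * log_t)
    d = c - (math.log(log_t) + math.log(math.pi)) / math.sqrt(8.0 * log_t)
    return c, d


def squared_norming(t: float):
    """Scaling c_t / (2 d_t) and centring d_t^2 for the maximum of q_s^2 / sigma^2."""
    c, d = deo_norming(t)
    return c / (2.0 * d), d * d


def chi2_norming(t: float):
    """ASSUMED classical chi-squared(1) norming: scaling 1/2 and
    centring 2 log t - log log t - log pi."""
    log_t = math.log(t)
    return 0.5, 2.0 * log_t - math.log(log_t) - math.log(math.pi)


def discrepancy(t: float):
    """Return (scaling ratio, scaled centring gap); the targets are 1 and 0."""
    a_tilde, b_tilde = squared_norming(t)
    a, b = chi2_norming(t)
    return a_tilde / a, a * (b_tilde - b)


if __name__ == "__main__":
    for t in (1e6, 1e100, 1e300):
        ratio, gap = discrepancy(t)
        print(f"t={t:.0e} ratio={ratio:.5f} gap={gap:.5f}")
```

Under these assumptions the ratio tends to one and the gap to zero, but only at a rate of order $(\log\log t)^2/\log t$, which is why very large values of $t$ are used in the loop.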

A.8. Proof of Theorem 10

Proof. By a property of inequalities, we can establish a lower bound on the supremum statistic,
$$\frac{1}{2}\left\{\max_{g(T)\le t\le T} C_{1,t}^2 - d_{T-g(T)}\right\} \ge -\frac{1}{2}\max_{g(T)\le t\le T}\left|C_{1,t}^2 - (q_t/\sigma)^2\right| + \frac{1}{2}\left\{\max_{g(T)\le t\le T} (q_t/\sigma)^2 - d_{T-g(T)}\right\},$$
where the first term on the right-hand side vanishes in probability by Lemma 7 and the second term converges in distribution by Lemma 9. We can establish a similar upper bound, so that the normalised supremum statistic is bounded above and below by quantities that converge in distribution to the same limit, and the theorem is proven. ☐
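
The "property of inequalities" invoked here is presumably the elementary sandwich $\max_t b_t - \max_t |a_t - b_t| \le \max_t a_t \le \max_t b_t + \max_t |a_t - b_t|$. The sketch below illustrates it on simulated data; it is not the authors' code. The arrays `c2` and `q2` stand in for $C_{1,t}^2$ and $(q_t/\sigma)^2$, the perturbation size `noise` is arbitrary, and the centring function is an assumed $\chi^2_1$ centring used only to put the numbers on a familiar scale.

```python
import math
import random


def centring(n: int) -> float:
    """Assumed centring 2 log n - log log n - log pi (hypothetical helper)."""
    return 2.0 * math.log(n) - math.log(math.log(n)) - math.log(math.pi)


def sandwich(c2, q2):
    """Return (lower, statistic, upper) for the normalised supremum of c2."""
    d = centring(len(c2))
    # Half the largest pointwise discrepancy between the two sequences.
    gap = 0.5 * max(abs(c - q) for c, q in zip(c2, q2))
    target = 0.5 * (max(q2) - d)   # normalised supremum of the approximating sequence
    stat = 0.5 * (max(c2) - d)     # normalised supremum of the statistic itself
    return target - gap, stat, target + gap


def simulate(n: int = 500, noise: float = 0.05, seed: int = 0):
    """Squared standard normals plus a small bounded perturbation."""
    rng = random.Random(seed)
    q2 = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]
    c2 = [max(q + rng.uniform(-noise, noise), 0.0) for q in q2]
    return sandwich(c2, q2)


if __name__ == "__main__":
    lower, stat, upper = simulate()
    print(f"lower={lower:.4f} statistic={stat:.4f} upper={upper:.4f}")
```

As the largest discrepancy shrinks, the two bounds collapse onto the normalised supremum of the approximating sequence, which is the mechanism the proof relies on.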

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. G.C. Chow. “Tests of equality between sets of coefficients in two linear regressions.” Econometrica 28 (1960): 591–605.
  2. T. Kimura. “The impact of financial uncertainties on money demand in Europe.” In Monetary Analysis: Tools and Applications. Edited by H.-J. Klöckers and C. Willeke. Frankfurt am Main, Germany: European Central Bank, 2001, pp. 97–116.
  3. D.F. Hendry. “Using PC-GIVE in econometrics teaching.” Oxf. Bull. Econ. Stat. 48 (1986): 87–98.
  4. Quantitative Micro Software (QMS). EViews 7 User’s Guide II. Irvine, CA, US: QMS, 2009.
  5. R.L. Brown, and J. Durbin. “Methods of investigating whether a regression relationship is constant over time.” In Selected Statistical Papers, European Meeting. Mathematical Centre Tracts, 26; Amsterdam, the Netherlands: Mathematisch Centrum, 1968.
  6. R.L. Brown, J. Durbin, and J.M. Evans. “Techniques for testing the constancy of regression relationships over time.” J. R. Stat. Soc. Ser. B 37 (1975): 149–192.
  7. W. Krämer, W. Ploberger, and R. Alt. “Testing for structural change in dynamic models.” Econometrica 56 (1988): 1355–1369.
  8. W. Ploberger, and W. Krämer. “On studentizing a test for structural change.” Econ. Lett. 20 (1986): 341–344.
  9. B. Nielsen, and J.S. Sohkanen. “Asymptotic behavior of the CUSUM of squares test under stochastic and deterministic time trends.” Econom. Theory 27 (2011): 913–927.
  10. J.S. Galpin, and D.M. Hawkins. “The use of recursive residuals in checking model fit in linear regression.” Am. Stat. 38 (1984): 94–105.
  11. J.-M. Dufour. “Recursive stability analysis of linear regression relationships: An exploratory methodology.” J. Econom. 19 (1982): 31–76.
  12. P. Perron. “Dealing with structural breaks.” In Palgrave Handbook of Econometrics. Edited by T.C. Mills and K. Patterson. Basingstoke, UK: Palgrave Macmillan, 2006, pp. 278–352.
  13. D.W.K. Andrews. “Tests for parameter instability and structural change with unknown change point.” Econometrica 61 (1993): 821–856.
  14. R.E. Quandt. “Tests of the hypothesis that a linear regression system obeys two separate regimes.” J. Am. Stat. Assoc. 55 (1960): 324–330.
  15. D.W.K. Andrews. “End-of-sample instability tests.” Econometrica 71 (2003): 1661–1694.
  16. K.S. Srikantan. “Testing for the single outlier in a regression model.” Sankhyā Indian J. Stat. Ser. A 23 (1961): 251–260.
  17. I. Chang, G.C. Tiao, and C. Chen. “Estimation of time series parameters in the presence of outliers.” Technometrics 30 (1988): 193–204.
  18. C. Chen, and L.M. Liu. “Forecasting time series with outliers.” J. Forecast. 12 (1993): 13–35.
  19. A.J. Fox. “Outliers in time series.” J. R. Stat. Soc. Ser. B 34 (1972): 350–363.
  20. V. Barnett, and T. Lewis. Outliers in Statistical Data, 3rd ed. New York, NY, USA: Wiley, 1994.
  21. T.L. Lai, and C.-Z. Wei. “Extended least squares and their applications to adaptive control and prediction in linear systems.” IEEE Trans. Autom. Control 31 (1986): 898–906.
  22. M. Duflo. Random Iterative Models. Berlin, Germany: Springer-Verlag, 1997.
  23. D.F. Hendry. Dynamic Econometrics. Oxford, UK: Oxford University Press, 1995.
  24. S. Johansen. “A Bartlett correction factor for tests on the cointegrating relations.” Econom. Theory 16 (2000): 740–778.
  25. B. Nielsen. “Strong consistency results for least squares estimators in general vector autoregressions with deterministic terms.” Econom. Theory 21 (2005): 534–561.
  26. D. Bauer. “Almost sure bounds on the estimation error for OLS estimators when the regressors include certain MFI(1) processes.” Econom. Theory 25 (2009): 571–582.
  27. J.H. Stock, and M.W. Watson. “Has the business cycle changed and why?” NBER Macroecon. Annu. 17 (2002): 159–230.
  28. B. Nielsen. Singular Vector Autoregressions with Deterministic Terms: Strong Consistency and Lag Order Determination. Nuffield College Discussion Paper; Oxford, UK: Nuffield College, 2008.
  29. E. Engler, and B. Nielsen. “The empirical process of autoregressive residuals.” Econom. J. 12 (2009): 367–381.
  30. C.M. Deo. “Some limit theorems for maxima of absolute values of Gaussian sequences.” Sankhyā Indian J. Stat. Ser. A 34 (1972): 289–292.
  31. J.A. Doornik. Object-Oriented Matrix Programming Using Ox, 3rd ed. London, UK: Timberlake Consultants Press, 2007.
  32. J.A. Doornik, and H. Hansen. “An omnibus test for univariate and multivariate normality.” Oxf. Bull. Econ. Stat. 70 (2008): 927–939.
  33. D.W.K. Andrews. Tests for Parameter Instability and Structural Change with Unknown Change Point. Cowles Foundation Discussion Papers; New Haven, CT, USA: Cowles Foundation for Research in Economics, Yale University, 1990, Volume 943.
  34. R.D. Cook, and S. Weisberg. Residuals and Influence in Regression. New York, NY, USA: Chapman and Hall, 1982.
  35. L. Kilian, and U. Demiroglu. “Residual based tests for normality in autoregressions: Asymptotic theory and simulations.” J. Bus. Econ. Stat. 18 (2000): 40–50.
  36. B. Nielsen. “Order determination in general vector autoregressions.” In Time Series and Related Topics: In Memory of Ching-Zong Wei. IMS Lecture Notes and Monograph Series; Edited by H.-C. Ho, C.-K. Ing and T.L. Lai. Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2006, Volume 52, pp. 93–112.
  37. J.A. Doornik, and D.F. Hendry. Empirical Econometric Modelling—PcGive 14. London, UK: Timberlake Consultants, 2013, Volume 1.
  38. D.F. Hendry, and B. Nielsen. Econometric Modelling. Princeton, NJ, USA: Princeton University Press, 2007.
  39. R. Lynch, and C. Richardson. “Discussion.” J. Off. Stat. 20 (2004): 623–629.
  40. K.D. Patterson, and S.M. Heravi. “Revisions to official data on U.S. GNP: A multivariate assessment of different vintages (with discussion).” J. Off. Stat. 20 (2004): 573–602.
  41. S. Johansen, and B. Nielsen. Outlier Detection Algorithms for Least Squares Time Series. Nuffield College Discussion Paper 2014-W04; Oxford, UK: Nuffield College, 2014.
  42. P. Burridge, and A.M.R. Taylor. “Additive outlier detection via extreme-value theory.” J. Time Ser. Anal. 27 (2006): 685–701.
  43. I. Weissman. “Estimation of parameters and larger quantiles based on the k largest observations.” J. Am. Stat. Assoc. 73 (1978): 812–815.
  44. T.L. Lai, and C.-Z. Wei. “Asymptotic properties of multivariate weighted sums with applications to stochastic regression in linear dynamic systems.” In Multivariate Analysis VI. Edited by P.R. Krishnaiah. Amsterdam, the Netherlands: North Holland, 1985, pp. 375–393.
  45. S.R. Searle. Matrix Algebra Useful for Statistics. New York, NY, USA: John Wiley and Sons, 1982.
  46. T.L. Lai, and C.-Z. Wei. “Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems.” Ann. Stat. 10 (1982): 154–166.
  47. J. Davidson. Stochastic Limit Theory. Oxford, UK: Oxford University Press, 1994.
  48. H. Cramér. Mathematical Methods in Statistics. Princeton, NJ, USA: Princeton University Press, 1946.
  49. M.R. Leadbetter, G. Lindgren, and H. Rootzén. Extremes and Related Properties of Random Sequences and Processes. New York, NY, USA: Springer-Verlag, 1982.
  50. P. Embrechts, C. Klüppelberg, and T. Mikosch. Modelling Extremal Events for Insurance and Finance. Berlin, Germany: Springer, 1997.
  51. M.R. Leadbetter, and H. Rootzén. “Extremal theory for stochastic processes.” Ann. Probab. 16 (1988): 431–478.
  • Footnote 1: ABMI series from the Office for National Statistics, seasonally adjusted, 2010 prices, release of 20 December 2013.

Share and Cite

MDPI and ACS Style

Nielsen, B.; Whitby, A. A Joint Chow Test for Structural Instability. Econometrics 2015, 3, 156-186. https://0-doi-org.brum.beds.ac.uk/10.3390/econometrics3010156
