Communication

Prais–Winsten Algorithm for Regression with Second or Higher Order Autoregressive Errors

by
Dimitrios V. Vougas
Swansea University, Swansea SA2 8PP, UK
Retired.
Submission received: 9 June 2021 / Revised: 21 August 2021 / Accepted: 23 August 2021 / Published: 27 August 2021

Abstract

There is no available Prais–Winsten algorithm for regression with AR(2) or higher order errors, and the one with AR(1) errors is not fully justified or is implemented incorrectly (and is thus inefficient). This paper addresses both issues, providing an accurate, computationally fast, and inexpensive generic zig-zag algorithm.

1. Introduction

Both simulation and theoretical evidence show that estimation of a regression with autoregressive errors of order one (AR(1)) via exact (conditional) maximum likelihood (ML) is inferior to exact nonlinear least squares (NLS) estimation (the correctly implemented Prais and Winsten (1954) (PW) method), at least for trending data. Park and Mitchell (1980) show via simulation that, for trending regressors, the PW method is more efficient than exact/conditional ML (the Beach and MacKinnon (1978a) (BM) method). This is confirmed theoretically by Magee (1985, 1989), who approximates the biases of various two-step and iterative estimators. Correctly implemented, the PW method delivers exact NLS (equivalent to unconditional ML under normality) estimators. No normality requirement is needed for the PW method. Gurland (1954) formalises and resolves the criticism of the estimation method of Cochrane and Orcutt (1949) (CO, hereafter) for losing the first observation,1 which is employed by PW and re-proposed by Kadiyala (1968). Koopmans (1942) was the first to examine the stationary AR(1) correctly. The PW algorithm is a fast (computationally inexpensive) zig-zag algorithm, which retains the first observation and requires no numerical optimiser. However, in order to obtain an (econometrically efficient) exact NLS estimator, the correct (exact) closed form formula must be employed in the iterations for the update of the AR parameter. This is not the case in the available implementations of the PW method (see Magee (1989)). For example, (Judge et al. 1985, chp. 8) propose using the autocorrelation coefficient formula for the update, but this is not correct. Exact/conditional ML fast zig-zag algorithms are provided by Beach and MacKinnon (1978a, 1978b) for AR(1)2 and AR(2)3 errors, respectively. None is available for AR orders higher than two. The PW algorithm for a regression with AR(1) errors is relatively well known, although it is usually implemented incorrectly, because an incorrect closed form estimator is used for the update. Park and Mitchell (1980) implement the PW algorithm correctly, although they do not give all the details. Furthermore, there is no zig-zag PW algorithm for a regression with AR(2) or higher order errors, and this literature gap is filled in this paper. Reliable standard errors are also calculated that are not affected by the presence of a lagged dependent variable as a regressor. In fact, our proposed method corrects the inefficient CO method, as well as poor estimation due to bad implementation of the Gauss–Newton algorithm.
The paper is organised as follows: Section 2 fully discusses the closed form exact NLS estimation of pure AR(1), AR(2), and AR(p) coefficients, while Section 3 examines the iterative exact joint estimation of regression and autoregressive coefficients. Finally, Section 4 concludes.

2. Closed Form Exact NLS: AR(1,2,…,p)

The main ingredient for the correct PW algorithm is the correct4 closed form update(s) for the autoregressive parameter(s), along with exact generalised least squares (GLS) iterations. Let (observed) $y_t$ be generated via the stationary AR(1)
$y_t = \theta y_{t-1} + v_t, \qquad t = 2, \ldots, n,$   (1)
with
$y_1 = v_1/\sqrt{1-\theta^2},$   (2)
and $v_t \sim iid(0, \sigma_v^2)$, $t = 1, \ldots, n$. This assumption can be relaxed, and $v_t$ can be a martingale difference sequence with a finite fourth moment (see (Stock 1991), for example). Assuming normal $v_t$, the exact/conditional ML estimator of (1), conditional on (2), is available in closed form as the solution to the cubic equation that BM provide. PW/exact NLS (eventually in closed form) requires the exact sum of squares to be minimised, that is,
$\min_{\theta}\left\{ y_1^2(1-\theta^2) + \sum_{t=2}^{n}(y_t - \theta y_{t-1})^2 \right\},$   (3)
see (Kmenta 1986, p. 319) (a full proof is given in the present paper). Here, we make use of $\sigma_y/\sigma_v = 1/\sqrt{1-\theta^2}$. Phillips (1977, 1978) and Sawa (1978) (among others) employ an inexact version of (3), ignoring $y_1^2(1-\theta^2)$. This results in the so-called LS (also, incorrectly, called the ML) estimator $\hat{\theta}_{LS}$, which is more biased and less efficient than the exact NLS estimator from (3) (also called the (original) PW estimator by Park and Mitchell (1980); PW2 in (Magee 1989, p. 662)). Minimisation is via
$-2\theta y_1^2 - 2\sum_{t=2}^{n}(y_t - \theta y_{t-1})y_{t-1} = 0,$   (4)
which results in the closed form of the (genuine) PW estimator5
$\hat{\theta}_{PW} = \dfrac{\sum_{t=2}^{n} y_t y_{t-1}}{\sum_{t=3}^{n} y_{t-1}^2},$   (5)
which is better than the OLS estimator $\hat{\theta}_{LS} = \sum_{t=2}^{n} y_t y_{t-1}/\sum_{t=2}^{n} y_{t-1}^2$.6 An even worse estimator is the autocorrelation coefficient $\sum_{t=2}^{n} y_t y_{t-1}/\sum_{t=1}^{n} y_t^2$.7
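As an illustration, the following numpy sketch computes the three closed-form estimators just discussed from a simulated stationary AR(1) series. It is our own illustrative code (function name, seed, and sample size are arbitrary), not the author's released GAUSS/MATLAB/Octave implementation.

```python
import numpy as np

def ar1_closed_form_estimators(y):
    """Closed-form AR(1) estimators from a series y (length n >= 3).

    Returns (theta_PW, theta_LS, autocorrelation): the exact-NLS/PW
    estimator (5), the LS estimator that ignores y_1^2*(1-theta^2),
    and the autocorrelation coefficient, respectively.
    """
    y = np.asarray(y, dtype=float)
    num = np.sum(y[1:] * y[:-1])            # sum_{t=2}^{n} y_t y_{t-1}
    theta_pw = num / np.sum(y[1:-1] ** 2)   # denominator: sum_{t=3}^{n} y_{t-1}^2
    theta_ls = num / np.sum(y[:-1] ** 2)    # denominator: sum_{t=2}^{n} y_{t-1}^2
    autocorr = num / np.sum(y ** 2)         # denominator: sum_{t=1}^{n} y_t^2
    return theta_pw, theta_ls, autocorr

# Small illustration on simulated data (theta = 0.8).
rng = np.random.default_rng(0)
n, theta = 100, 0.8
v = rng.standard_normal(n)
y = np.empty(n)
y[0] = v[0] / np.sqrt(1.0 - theta ** 2)     # stationary start, as in (2)
for t in range(1, n):
    y[t] = theta * y[t - 1] + v[t]
print(ar1_closed_form_estimators(y))
```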
For the stationary AR(2) model
$y_t = \theta_1 y_{t-1} + \theta_2 y_{t-2} + v_t, \qquad t = 3, \ldots, n,$   (6)
$y_1 = v_1/\left(\sqrt{1-\theta_2^2}\,\sqrt{1-\rho_1^2}\right), \quad y_2 = \rho_1 y_1 + v_2/\sqrt{1-\theta_2^2}, \quad \rho_1 = \theta_1/(1-\theta_2),$   (7)
where also $\rho_2 = \theta_1\rho_1 + \theta_2$,8 no closed-form exact NLS/PW estimators of the autoregressive parameters are available in the literature. To derive them, the exact sum of squares to be minimised is
$\min_{\theta_1,\theta_2}\left\{ (1-\theta_2^2)(1-\rho_1^2)y_1^2 + (1-\theta_2^2)(y_2 - \rho_1 y_1)^2 + \sum_{t=3}^{n}(y_t - \theta_1 y_{t-1} - \theta_2 y_{t-2})^2 \right\}.$   (8)
Use has been made of the fact that, in this case, $\sigma_y/\sigma_v = 1/\left(\sqrt{1-\theta_2^2}\,\sqrt{1-\rho_1^2}\right)$. The required canonical equations (derived with the help of MATHEMATICA™ and imposing $\rho_1 = \theta_1/(1-\theta_2)$) are
$(1+\theta_2)y_1y_2 + \sum_{t=3}^{n} y_t y_{t-1} - \theta_1\sum_{t=3}^{n} y_{t-1}^2 - \theta_2\sum_{t=3}^{n} y_{t-1}y_{t-2} = 0$
$\theta_1 y_1 y_2 + \theta_2(y_1^2 + y_2^2) + \sum_{t=3}^{n} y_t y_{t-2} - \theta_1\sum_{t=3}^{n} y_{t-1}y_{t-2} - \theta_2\sum_{t=3}^{n} y_{t-2}^2 = 0$   (9)
that become (after manipulations)
$\left(y_1y_2 + \sum_{t=3}^{n} y_t y_{t-1}\right) - \theta_1\sum_{t=3}^{n} y_{t-1}^2 - \theta_2\left(\sum_{t=3}^{n} y_{t-1}y_{t-2} - y_1y_2\right) = 0$
$\sum_{t=3}^{n} y_t y_{t-2} - \theta_1\left(\sum_{t=3}^{n} y_{t-1}y_{t-2} - y_1y_2\right) - \theta_2\left(\sum_{t=3}^{n} y_{t-2}^2 - y_1^2 - y_2^2\right) = 0.$   (10)
Re-arranging (10), we can solve the system of equations for the new (efficient/genuine) PW estimators of the AR(2) parameters. That is,
$\begin{pmatrix} \hat{\theta}_{1PW} \\ \hat{\theta}_{2PW} \end{pmatrix} = \begin{pmatrix} \sum_{t=3}^{n} y_{t-1}^2 & \sum_{t=4}^{n} y_{t-1}y_{t-2} \\ \sum_{t=4}^{n} y_{t-1}y_{t-2} & \sum_{t=5}^{n} y_{t-2}^2 \end{pmatrix}^{-1} \begin{pmatrix} \sum_{t=2}^{n} y_t y_{t-1} \\ \sum_{t=3}^{n} y_t y_{t-2} \end{pmatrix}.$   (11)
In view of (11), we can conjecture the corresponding closed-form vector of PW autoregressive estimators for any AR(p)
$y_t = \theta_1 y_{t-1} + \theta_2 y_{t-2} + \cdots + \theta_p y_{t-p} + v_t, \qquad t = p+1, \ldots, n,$
along with the required equations for observations $1, \ldots, p$ (not stated, as they are difficult). That is,
$\begin{pmatrix} \hat{\theta}_{1PW} \\ \hat{\theta}_{2PW} \\ \vdots \\ \hat{\theta}_{p-1,PW} \\ \hat{\theta}_{pPW} \end{pmatrix} =
\begin{pmatrix}
\sum_{t=3}^{n} y_{t-1}^2 & \sum_{t=4}^{n} y_{t-1}y_{t-2} & \cdots & \sum_{t=p+1}^{n} y_{t-1}y_{t-p+1} & \sum_{t=p+2}^{n} y_{t-1}y_{t-p} \\
\sum_{t=4}^{n} y_{t-1}y_{t-2} & \sum_{t=5}^{n} y_{t-2}^2 & \cdots & \sum_{t=p+2}^{n} y_{t-2}y_{t-p+1} & \sum_{t=p+3}^{n} y_{t-2}y_{t-p} \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\sum_{t=p+1}^{n} y_{t-1}y_{t-p+1} & \sum_{t=p+2}^{n} y_{t-2}y_{t-p+1} & \cdots & \sum_{t=2p-1}^{n} y_{t-p+1}^2 & \sum_{t=2p}^{n} y_{t-p+1}y_{t-p} \\
\sum_{t=p+2}^{n} y_{t-1}y_{t-p} & \sum_{t=p+3}^{n} y_{t-2}y_{t-p} & \cdots & \sum_{t=2p}^{n} y_{t-p+1}y_{t-p} & \sum_{t=2p+1}^{n} y_{t-p}^2
\end{pmatrix}^{-1}
\begin{pmatrix} \sum_{t=2}^{n} y_t y_{t-1} \\ \sum_{t=3}^{n} y_t y_{t-2} \\ \vdots \\ \sum_{t=p}^{n} y_t y_{t-p+1} \\ \sum_{t=p+1}^{n} y_t y_{t-p} \end{pmatrix}.$
This expression is easily programmable in a matrix programming language.
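For instance, a minimal numpy sketch of the generic closed-form estimator (again our own illustrative code, not the author's released programmes) builds the matrix of sums with lower summation limit $i+j+1$ and the right-hand-side vector with lower limit $i+1$, and solves the resulting linear system; it reproduces (5) for $p = 1$ and (11) for $p = 2$.

```python
import numpy as np

def pw_ar_closed_form(y, p):
    """Closed-form PW estimators of AR(p) coefficients, following the
    matrix expression above: element (i, j) of the left-hand matrix is
    sum_{t=i+j+1}^{n} y_{t-i} y_{t-j}, and element i of the right-hand
    vector is sum_{t=i+1}^{n} y_t y_{t-i}, for i, j = 1, ..., p."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.empty((p, p))
    b = np.empty(p)
    for i in range(1, p + 1):
        # sum_{t=i+1}^{n} y_t y_{t-i}; with 0-based indexing t-1 runs over i..n-1
        b[i - 1] = np.sum(y[i:] * y[:n - i])
        for j in range(1, p + 1):
            lo = i + j + 1                     # lower summation limit (1-based t)
            t = np.arange(lo, n + 1)           # t = lo, ..., n
            A[i - 1, j - 1] = np.sum(y[t - 1 - i] * y[t - 1 - j])
    return np.linalg.solve(A, b)
```

In an iteration of the algorithm below, `pw_ar_closed_form` would be applied to the regression residuals rather than to an observed $y_t$.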

3. Iteration

The closed-form estimators of the previous section are to be used in iterations for updating. Let the observed $Y_t$ be generated via
$Y_t = x_t'\beta + y_t,$   (12)
with (unobserved) $y_t$ following (1) and (2); $x_t$ is a $k \times 1$ vector of regressors (its transpose $x_t'$ is the $t$-th row of the regressor matrix $X$), and $\beta$ is the corresponding coefficient vector. The PW iterative algorithm has the following steps: (I) Apply OLS to (12), derive the residual $\hat{y}_t$, and obtain $\hat{\theta}_{PW}$ from (5), replacing $y_t$ with $\hat{y}_t$. (II) Re-estimate $\beta$ from
$Y_1\sqrt{1-\hat{\theta}_{PW}^2} = x_1'\beta\,\sqrt{1-\hat{\theta}_{PW}^2} + v_1,$   (13)
$Y_t - \hat{\theta}_{PW} Y_{t-1} = (x_t - \hat{\theta}_{PW} x_{t-1})'\beta + v_t, \qquad t = 2, \ldots, n.$   (14)
For a PW algorithm that delivers exact NLS estimators, the first observation must be employed (see (13)), and the $\theta$ update must be via (5). According to (Magee 1989, p. 663), (5) guarantees convergence. (III) Use the new estimate of $\beta$ to obtain a new residual $\hat{y}_t$ from (12) and a new estimate of $\theta$, and return to step (II) if the convergence criterion is not met. In addition, estimates of $\theta$ must be forced to lie in $(-1, 1)$. $P_1(\theta)$ is the exact GLS matrix of first order, with main diagonal $\sqrt{1-\theta^2}, 1, \ldots, 1$, first sub-diagonal $-\theta, \ldots, -\theta$, and zeros elsewhere. Let $\tilde{v}$ be the converged innovation residual vector from (13) and (14), corresponding to the converged $\tilde{\theta}_{PW}$; we define the innovation variance estimate $\tilde{\sigma}_v^2 = \tilde{v}'\tilde{v}/(n-k)$ and $cov(\tilde{\beta}) = \tilde{\sigma}_v^2 (X'P_1(\tilde{\theta}_{PW})'P_1(\tilde{\theta}_{PW})X)^{-1}$ for the converged estimate of $\beta$, $\tilde{\beta}$. For $\tilde{\theta}_{PW}$, we rely on the fact that exact NLS estimation is identical to unconditional ML under normality and calculate the quasi-ML covariance $cov(\tilde{\theta}_{PW})$ as the inverse of minus the Hessian of the concentrated unconditional loglikelihood $L(\theta) = -(n/2)\ln[(Y^* - X^*\hat{\beta}_{GLS,\theta})'(Y^* - X^*\hat{\beta}_{GLS,\theta})]$, evaluated at $\tilde{\theta}_{PW}$, with $\hat{\beta}_{GLS,\theta} = (X^{*\prime}X^*)^{-1}(X^{*\prime}Y^*)$, $Y^* = P_1(\theta)Y$, $X^* = P_1(\theta)X$, and the MLE innovation variance estimate $\hat{\sigma}_{v,MLE,\theta}^2 = \hat{v}(\theta)'\hat{v}(\theta)/n$, with “residual” $\hat{v}(\theta)$ for a given $\theta$, $\hat{v}(\theta) = Y - X\hat{\beta}_{GLS,\theta}$. Alternatively, we may use the asymptotic covariance for $\tilde{\theta}_{PW}$, which requires no normality (see below). A third option is to calculate the sandwich covariance. Note that none of these covariances is affected by the presence of a lagged dependent variable as a regressor.
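A minimal numpy sketch of steps (I)–(III) for the AR(1) case follows; it is an illustrative implementation under our own choices of tolerance and clamping (not the author's code), and covariance calculations are omitted for brevity.

```python
import numpy as np

def prais_winsten_ar1(Y, X, tol=1e-10, max_iter=100):
    """Exact NLS (genuine PW) estimation of Y = X beta + y, y ~ AR(1).

    Returns (beta, theta, sigma_v2). The theta update uses the closed
    form (5) applied to the regression residuals, and beta is
    re-estimated by exact GLS, i.e. OLS on the transformed data (13)-(14).
    """
    Y, X = np.asarray(Y, float), np.asarray(X, float)
    n, k = X.shape
    beta = np.linalg.lstsq(X, Y, rcond=None)[0]            # step (I): OLS
    theta = 0.0
    for _ in range(max_iter):
        yhat = Y - X @ beta                                # residual y_t
        num = np.sum(yhat[1:] * yhat[:-1])
        theta_new = num / np.sum(yhat[1:-1] ** 2)          # update via (5)
        theta_new = np.clip(theta_new, -0.999, 0.999)      # keep inside (-1, 1)
        w = np.sqrt(1.0 - theta_new ** 2)
        Ystar = np.concatenate(([w * Y[0]], Y[1:] - theta_new * Y[:-1]))
        Xstar = np.vstack((w * X[0], X[1:] - theta_new * X[:-1]))
        beta = np.linalg.lstsq(Xstar, Ystar, rcond=None)[0]  # step (II): exact GLS
        if abs(theta_new - theta) < tol:
            theta = theta_new
            break
        theta = theta_new                                  # step (III): iterate
    v = Ystar - Xstar @ beta                               # innovation residuals
    sigma_v2 = v @ v / (n - k)
    return beta, theta, sigma_v2
```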
Similarly, when $y_t$ in (12) follows the AR(2) model, the genuine PW iterative algorithm has the following steps: (I) Apply OLS to (12), derive the residual $\hat{y}_t$, and obtain $\hat{\theta}_{1PW}$ and $\hat{\theta}_{2PW}$ from (11), replacing $y_t$ with $\hat{y}_t$. (II) Re-estimate $\beta$ from
$Y_1\sqrt{1-\hat{\rho}_1^2}\,\sqrt{1-\hat{\theta}_{2PW}^2} = x_1'\beta\,\sqrt{1-\hat{\rho}_1^2}\,\sqrt{1-\hat{\theta}_{2PW}^2} + v_1,$   (15)
$(Y_2 - \hat{\rho}_1 Y_1)\sqrt{1-\hat{\theta}_{2PW}^2} = (x_2 - \hat{\rho}_1 x_1)'\beta\,\sqrt{1-\hat{\theta}_{2PW}^2} + v_2,$   (16)
$Y_t - \hat{\theta}_{1PW} Y_{t-1} - \hat{\theta}_{2PW} Y_{t-2} = (x_t - \hat{\theta}_{1PW} x_{t-1} - \hat{\theta}_{2PW} x_{t-2})'\beta + v_t, \qquad t = 3, \ldots, n.$   (17)
Note that $\hat{\rho}_1 = \hat{\theta}_{1PW}/(1 - \hat{\theta}_{2PW})$. (In addition, estimates of $\theta_1$ and $\theta_2$ must be forced to lie inside the stationarity triangle.) (III) Use the new estimate of $\beta$ to obtain a new residual $\hat{y}_t$ from (12) and new estimates of $\theta_1$ and $\theta_2$ from (11), and return to step (II) if the convergence criterion is not met. $P_2(\theta_1, \theta_2)$ is the exact GLS matrix of second order, with main diagonal $\sqrt{1-\theta_2^2}\sqrt{1-\rho_1^2}, \sqrt{1-\theta_2^2}, 1, \ldots, 1$, first sub-diagonal $-\rho_1\sqrt{1-\theta_2^2}, -\theta_1, \ldots, -\theta_1$, second sub-diagonal $-\theta_2, \ldots, -\theta_2$, and zeros elsewhere. Let $\tilde{v}$ be the converged innovation residual from (15)–(17), and define $\tilde{\sigma}_v^2 = \tilde{v}'\tilde{v}/(n-k)$. For the converged estimate of $\beta$, $\tilde{\beta}$, we define $cov(\tilde{\beta}) = \tilde{\sigma}_v^2 (X'P_2(\tilde{\theta}_{1PW}, \tilde{\theta}_{2PW})'P_2(\tilde{\theta}_{1PW}, \tilde{\theta}_{2PW})X)^{-1}$, where $\tilde{\theta}_{1PW}$ and $\tilde{\theta}_{2PW}$ are the converged estimates of $\theta_1$ and $\theta_2$, respectively. For $\theta = \{\theta_1, \theta_2\}$, under normality, we calculate the quasi-ML covariance $cov(\tilde{\theta}_{PW})$ from the inverse of minus the Hessian of the concentrated unconditional loglikelihood $L(\theta_1, \theta_2) = -(n/2)\ln[(Y^* - X^*\hat{\beta}_{GLS,\theta})'(Y^* - X^*\hat{\beta}_{GLS,\theta})]$, evaluated at $\tilde{\theta}_{1PW}$ and $\tilde{\theta}_{2PW}$, with $\hat{\beta}_{GLS,\theta} = (X^{*\prime}X^*)^{-1}(X^{*\prime}Y^*)$, $Y^* = P_2(\theta_1, \theta_2)Y$, $X^* = P_2(\theta_1, \theta_2)X$, and the MLE innovation variance estimate $\hat{\sigma}_{v,MLE,\theta}^2 = \hat{v}(\theta)'\hat{v}(\theta)/n$, with “residual” $\hat{v}(\theta)$ for a given vector $\theta$, $\hat{v}(\theta) = Y - X\hat{\beta}_{GLS,\theta}$. Alternatively, we may use the asymptotic covariance for $\tilde{\theta}_{PW} = \{\tilde{\theta}_{1PW}, \tilde{\theta}_{2PW}\}$, which requires no normality (see below). A third option is to calculate the sandwich covariance. Again, none of these covariances is affected by the presence of a lagged dependent variable as a regressor.
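The second-order transformation matrix $P_2(\theta_1, \theta_2)$ described above can be written down directly; the numpy sketch below (our own illustrative code, with a hypothetical function name) follows (15)–(17). Pre-multiplying $Y$ and $X$ by $P_2(\hat{\theta}_{1PW}, \hat{\theta}_{2PW})$ and applying OLS reproduces step (II).

```python
import numpy as np

def P2(theta1, theta2, n):
    """Exact GLS transformation matrix of second order, following (15)-(17):
    row 1 weights Y_1 by sqrt(1-rho1^2)*sqrt(1-theta2^2), row 2 applies
    (Y_2 - rho1*Y_1)*sqrt(1-theta2^2), and rows 3..n apply the AR(2) filter."""
    rho1 = theta1 / (1.0 - theta2)
    w2 = np.sqrt(1.0 - theta2 ** 2)
    P = np.zeros((n, n))
    P[0, 0] = w2 * np.sqrt(1.0 - rho1 ** 2)
    P[1, 0], P[1, 1] = -rho1 * w2, w2
    for t in range(2, n):
        P[t, t - 2], P[t, t - 1], P[t, t] = -theta2, -theta1, 1.0
    return P
```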
Finally, for the model $Y = X\beta + y$, when $y$ follows an AR(p) with $\theta = \{\theta_1, \ldots, \theta_p\}$, OLS provides $\hat{\beta}$ and $\hat{y}$, and the first-step $\hat{\theta}_{PW} = \{\hat{\theta}_{1PW}, \ldots, \hat{\theta}_{pPW}\}$ is obtained from the generic formula above. Then, the OLS (in fact, GLS) regression of $P(\hat{\theta}_{PW})Y$ on $P(\hat{\theta}_{PW})X$ results in a new $\hat{\beta}$, a new $\hat{y} = Y - X\hat{\beta}$, and a new $\hat{\theta}_{PW}$, repeating until convergence and restricting the elements of $\theta$ accordingly. The exact GLS matrix $P$ has the correct Cholesky decomposition9 of $V_p^{-1}$ in positions $1{:}p$ by $1{:}p$, $P[j, j] = 1$ for $j = p+1, \ldots, n$, $P[j+i, j] = -\theta_i$ for $j = p+1, \ldots, n$ and $i = 1, \ldots, p$ with the restriction $j+i \le n$, and zeros elsewhere. The fast calculation of $P(\theta)Y$ and $P(\theta)X$ may rely on the convolution procedure, except for the first $p$ rows, where it has to be implemented manually. The typical element of $V_p^{-1}$, $v^{ij}$, is given in (Hamilton 1994, p. 125) or (Galbraith and Galbraith 1974, p. 70). For the converged $\beta$, $\tilde{\beta}$, $cov(\tilde{\beta}) = \tilde{\sigma}_v^2 (X'P(\tilde{\theta}_{PW})'P(\tilde{\theta}_{PW})X)^{-1}$ is used, where $\tilde{v}$ is the converged innovation residual and $\tilde{\sigma}_v^2 = \tilde{v}'\tilde{v}/(n-k)$. Assuming normality, we can calculate the quasi-ML covariance $cov(\tilde{\theta}_{PW})$ from the inverse of minus the Hessian of the concentrated unconditional loglikelihood $L(\theta) = -(n/2)\ln[(Y^* - X^*\hat{\beta}_{GLS,\theta})'(Y^* - X^*\hat{\beta}_{GLS,\theta})]$, evaluated at the converged $\tilde{\theta}_{PW}$, with $\hat{\beta}_{GLS,\theta} = (X^{*\prime}X^*)^{-1}(X^{*\prime}Y^*)$, $Y^* = P(\theta)Y$, $X^* = P(\theta)X$, and the MLE innovation variance estimate $\hat{\sigma}_{v,MLE,\theta}^2 = \hat{v}(\theta)'\hat{v}(\theta)/n$, with “residual” $\hat{v}(\theta)$ for a given vector $\theta$, $\hat{v}(\theta) = Y - X\hat{\beta}_{GLS,\theta}$. An alternative covariance for $\tilde{\theta}_{PW}$ is the asymptotic covariance $\tilde{V}_p^{-1}/n$ ($\tilde{V}_p^{-1}$ is $V_p^{-1}$ evaluated at $\tilde{\theta}_{PW} = \{\tilde{\theta}_{1PW}, \ldots, \tilde{\theta}_{pPW}\}$). A third covariance option is the sandwich covariance. Again, none of these covariances is affected by the presence of a lagged dependent variable as a regressor.
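To illustrate one way of constructing $P(\theta)$ for a general AR(p), the numpy sketch below computes $V_p$ from the companion-form stationary covariance and uses a plain Cholesky factor of $V_p^{-1}$ for the leading block. The function names are ours, and the factor differs from the reversed Cholesky prescribed in note 9 only by an orthogonal rotation, so the GLS objective (which depends on $P'P$ only) is unchanged; this is a sketch under those assumptions, not the author's implementation.

```python
import numpy as np

def ar_autocov_matrix(theta):
    """V_p: the p x p matrix with (i, j) element gamma_{|i-j|}/sigma_v^2 for
    a stationary AR(p) with coefficients theta = (theta_1, ..., theta_p),
    computed from the companion-form stationary covariance (Hamilton 1994)."""
    theta = np.asarray(theta, float)
    p = len(theta)
    F = np.zeros((p, p))
    F[0, :] = theta
    if p > 1:
        F[1:, :-1] = np.eye(p - 1)
    Q = np.zeros((p, p))
    Q[0, 0] = 1.0                                  # innovation variance normalised to 1
    vecS = np.linalg.solve(np.eye(p * p) - np.kron(F, F), Q.ravel())
    return vecS.reshape(p, p)

def P_matrix(theta, n):
    """Exact GLS transformation P(theta) for AR(p) errors: a factor of
    V_p^{-1} in the leading p x p block, and the AR filter
    -theta_p, ..., -theta_1, 1 in rows p+1, ..., n."""
    theta = np.asarray(theta, float)
    p = len(theta)
    Vp = ar_autocov_matrix(theta)
    L = np.linalg.cholesky(np.linalg.inv(Vp))      # L L' = V_p^{-1}
    P = np.zeros((n, n))
    P[:p, :p] = L.T                                # (L.T)'(L.T) = V_p^{-1}
    for t in range(p, n):
        P[t, t] = 1.0
        P[t, t - p:t] = -theta[::-1]               # -theta_p, ..., -theta_1
    return P
```

Each iteration then regresses $P(\hat{\theta}_{PW})Y$ on $P(\hat{\theta}_{PW})X$ and refreshes $\hat{\theta}_{PW}$ from the new residuals $Y - X\hat{\beta}$ via the closed-form AR(p) formula, exactly as in the AR(1) sketch above.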
GAUSS™/MATLAB™/GNU Octave programmes for the method are available upon request.

4. Conclusions

We have proposed the correct, fast, and generic PW algorithm for exact NLS estimation of a regression with AR errors of any order, requiring no normality assumption. However, if the innovations are normal, then our proposed estimators are identical to the unconditional ML estimators. No numerical optimiser is required either, although we have checked the results of our algorithm against those of numerical optimisation. Quasi-normality may be assumed in constructing a covariance (quasi-ML or sandwich) for the AR parameters; alternatively, the asymptotic covariance can be calculated, with no normality requirement. The covariance of the regression parameters is obtained from the GLS formulae. An information criterion may be employed to select the unknown AR order in practice, approximating any underlying ARMA process; for a recent paper on AR lag order selection, see Butler and Paolella (2017). The estimation method is suitable and efficient for trending data. It is also applicable to dynamic regressions that include a lagged dependent variable as a regressor, for which it provides reliable standard errors. In addition, it can be used to derive efficient Wald-type unit root tests.

Funding

This research received no external funding.

Conflicts of Interest

The author declares no conflict of interest.

Notes

1
The CO method of Oberhofer and Kmenta (1974) and Magnus (1978) is different. It is an ML estimation method, imposing zero initial conditions.
2
Magnus (1978) provides an alternative cumbersome algorithm which requires an equation solver.
3
There may be convergence issues with the BM algorithm for regression with AR(2) errors, but they are beyond the scope of this paper.
4
The one that obeys the corresponding first order condition for minimisation of the exact NLS sum of squares of the innovations.
5
$\theta y_1^2 + \sum_{t=2}^{n} y_t y_{t-1} - \theta\sum_{t=2}^{n} y_{t-1}^2 = 0 \Rightarrow \sum_{t=2}^{n} y_t y_{t-1} = \theta\left(\sum_{t=2}^{n} y_{t-1}^2 - y_1^2\right) = \theta\sum_{t=3}^{n} y_{t-1}^2$.
6
Since $\hat{\theta}_{PW} = \hat{\theta}_{LS}\left(1 + y_1^2/\sum_{t=3}^{n} y_{t-1}^2\right) > \hat{\theta}_{LS}$, $\hat{\theta}_{PW}$ is less biased than $\hat{\theta}_{LS}$, as the bias of the LS estimator is negative. Magee (1989) gives approximate formulae for the biases. We have verified the superiority of the PW estimator.
7
Judge et al. (1985) recommend this estimator. However, according to our calculation (based on (Magee 1989)), this is the worst possible estimator amongst the ones examined.
8
$\rho_1$ and $\rho_2$ are the theoretical autocorrelation coefficients of order 1 and 2, respectively.
9
For this, one must use the command rev(rev(chol()')') instead of the built-in chol() in matrix-language programmes.

References

1. Beach, Charles M., and James G. MacKinnon. 1978a. A Maximum Likelihood Procedure for Regression with Autocorrelated Errors. Econometrica 46: 51–58.
2. Beach, Charles M., and James G. MacKinnon. 1978b. Full Maximum Likelihood Estimation of Second-Order Autoregressive Error Models. Journal of Econometrics 7: 187–98.
3. Butler, Ronald W., and Marc S. Paolella. 2017. Autoregressive Lag-Order Selection Using Conditional Saddlepoint Approximations. Econometrics 5: 43.
4. Cochrane, Donald, and Guy H. Orcutt. 1949. Application of Least Squares Regression to Relationships Containing Autocorrelated Error Terms. Journal of the American Statistical Association 44: 32–61.
5. Galbraith, R. F., and J. I. Galbraith. 1974. Inverses of Some Patterned Matrices Arising in the Theory of Stationary Time Series. Journal of Applied Probability 11: 63–71.
6. Gurland, John. 1954. An Example of Autocorrelated Disturbances in Linear Regression. Econometrica 22: 218–27.
7. Hamilton, James D. 1994. Time Series Analysis. Princeton: Princeton University Press.
8. Judge, George G., William E. Griffiths, R. Carter Hill, Helmut Lütkepohl, and Tsoung-Chao Lee. 1985. The Theory and Practice of Econometrics, 2nd ed. New York: Wiley.
9. Kadiyala, Koteswara Rao. 1968. A Transformation Used to Circumvent the Problem of Autocorrelation. Econometrica 36: 93–96.
10. Kmenta, Jan. 1986. Elements of Econometrics, 2nd ed. New York: Macmillan.
11. Koopmans, Tjalling. 1942. Serial Correlation and Quadratic Forms in Normal Variables. Annals of Mathematical Statistics 13: 14–33.
12. Magee, Lonnie. 1985. Efficiency of Iterative Estimators in the Regression Model with AR(1) Disturbances. Journal of Econometrics 29: 275–87.
13. Magee, Lonnie. 1989. An Edgeworth Test Size Correction for the Linear Model with AR(1) Errors. Econometrica 57: 661–74.
14. Magnus, Jan R. 1978. Maximum Likelihood Estimation of the GLS Model with Unknown Parameters in the Disturbance Covariance Matrix. Journal of Econometrics 7: 281–312.
15. Oberhofer, Walter, and Jan Kmenta. 1974. A General Procedure for Obtaining Maximum Likelihood Estimates in Generalized Regression Models. Econometrica 42: 579–90.
16. Park, Rolla Edward, and Bridger M. Mitchell. 1980. Estimating the Autocorrelated Error Model with Trended Data. Journal of Econometrics 13: 185–201.
17. Phillips, Peter C. B. 1977. Approximations to Some Finite Sample Distributions Associated with a First-Order Stochastic Difference Equation. Econometrica 45: 463–85.
18. Phillips, Peter C. B. 1978. Edgeworth and Saddlepoint Approximations in the First-Order Noncircular Autoregression. Biometrika 65: 91–98.
19. Sawa, Takamitsu. 1978. The Exact Moments of the Least Squares Estimator for the Autoregressive Model. Journal of Econometrics 8: 159–72.
20. Stock, James H. 1991. Confidence Intervals for the Largest Autoregressive Root in US Macroeconomic Time Series. Journal of Monetary Economics 28: 435–59.
21. Prais, Sigbert J., and Christopher B. Winsten. 1954. Trend Estimators and Serial Correlation. Cowles Commission Discussion Paper No. 383, Chicago. Unpublished work.