A Combination Method for Averaging OLS and GLS Estimators †

Qingfeng Liu 1 and Andrey L. Vasnev 2

1 Department of Economics, Otaru University of Commerce, Otaru 047-8501, Japan
2 Discipline of Business Analytics, The University of Sydney Business School, The University of Sydney, Sydney, NSW 2006, Australia
* Author to whom correspondence should be addressed.
The authors thank Okui Ryo, Mototsugu Shintani, Arihiro Yoshimura and the conference participants at the 2014 European Meeting of the Econometric Society and the 2013 Kansai Keiryo Keizaigaku Kenkyukai for their helpful comments. Liu acknowledges financial support from the JSPS Grant-in-Aid for Young Scientists (B) No. 25780148 and JSPS KAKENHI Grant (C) No. JP16K03590 and No. JP19K01582. The authors are grateful to the Editor and two reviewers for their constructive comments.
Submission received: 6 June 2019 / Revised: 3 September 2019 / Accepted: 4 September 2019 / Published: 9 September 2019
(This article belongs to the Special Issue Bayesian and Frequentist Model Averaging)

Abstract
To avoid the risk of misspecification between homoscedastic and heteroscedastic models, we propose a combination method based on ordinary least-squares (OLS) and generalized least-squares (GLS) model-averaging estimators. To select optimal weights for the combination, we suggest two information criteria and propose feasible versions that work even when the variance-covariance matrix is unknown. The optimality of the method is proven under some regularity conditions. The results of a Monte Carlo simulation demonstrate that the method is adaptive in the sense that it achieves almost the same estimation accuracy as if the homoscedasticity or heteroscedasticity of the error term were known.

1. Introduction

Model averaging has been developed as an alternative to model selection. In many situations, model-averaging methods perform better than model-selection methods. The main reason is that model selection delivers a pretest estimator with inferior properties, and its use can be harmful (see Danilov and Magnus, 2004). Yuan and Yang (2005) provided a detailed discussion of the choice between model averaging and model selection. As one of the pioneers of frequentist model averaging, Hansen (2007) proposed Mallows model averaging (MMA) based on the ordinary least-squares (OLS) estimator for linear regression models with homoscedastic errors. Wan et al. (2010) extended the results to non-nested models with homoscedastic errors, and Zhao et al. (2018) is the most recent work in this area. For linear regression models with heteroscedastic errors, Hansen and Racine (2012), Liu and Okui (2013) and Zhang et al. (2013, 2015) proposed model-averaging methods that are still based on the OLS estimator, while Liu et al. (2016) proposed a method based on the generalized least-squares (GLS) estimator. These papers demonstrated that their methods are optimal in the sense of Li (1987) for homoscedastic or heteroscedastic models. For model averaging in big datasets, Xie (2017) proposed the use of model screening (before averaging) to deal with a large number of candidate models/regressors.
However, all previous papers assumed that it is known whether the errors of the true data-generating process are homoscedastic or heteroscedastic. Under this assumption, the previous averaging methods were based on estimators constructed with a single estimation method, either OLS or GLS (with different regressor sets). This assumption can be unrealistic in empirical applications: researchers usually do not know the structure of the error term, so the assumption leads to possible misspecification. A natural solution is to combine OLS and GLS estimators. Combinations of different methods are routinely used in the applied forecast combination literature. In a recent forecasting competition involving 100,000 series, Makridakis et al. (2018) found that, out of the 17 most accurate methods, 12 were combinations, with the combined models/methods ranging from simple exponential smoothing models to sophisticated machine-learning algorithms.
We propose a combination method based on OLS and GLS estimators to reduce the risk of misspecification between homoscedastic and heteroscedastic linear models. More precisely, the proposed estimator is a weighted average of mixtures of OLS and GLS estimators. The OLS mixture is constructed using the MMA of Hansen (2007) or the heteroscedasticity-robust Cp (HRCp) model averaging of Liu and Okui (2013). The GLS mixture is constructed using the GLS model averaging (GLSMA) of Liu et al. (2016).
We propose the use of two criteria, MMA-GLSMA and HRCp-GLSMA, to choose the weight vector for combining estimators. The optimality of the chosen weight vector in the sense of Li (1987) is investigated. Our method also works when the variance-covariance matrix of the error term is unknown, provided an estimate based on the nonparametric k-nearest-neighbours (k-NN) method is used. The results of the simulation experiments show that our combination method is adaptive in the sense that it can achieve almost the same estimation accuracy as if the homoscedasticity or heteroscedasticity of the error term were known.
The rest of the paper is organized as follows. In Section 2, we describe the theoretical setup and introduce the new combination method together with the criteria for choosing the weight vector. In Section 3, we investigate the optimality of the proposed criteria. Section 4 presents the results of the Monte Carlo simulations. Section 5 concludes the paper, and all proofs are provided in Appendix A.

2. Method

Suppose that we have an independent random sample of $(y_i, x_i)$ for $i = 1, \dots, n$, where $x_i = (x_{i1}, x_{i2}, \dots)'$ is a countably infinite real-valued vector and $y_i$ is a real-valued scalar random variable generated from an infinite-dimensional linear regression model:
$$y_i = \mu_i + e_i,$$
where:
$$\mu_i = \sum_{j=1}^{\infty} \theta_j x_{ij},$$
$e_i$ is an unobserved error term that can be homoscedastic or heteroscedastic with $E(e_i \mid x_i) = 0$, $E(e_i^2) = \sigma^2$ and $E(e_i^2 \mid x_i) = \sigma_i^2$, and $\theta_j$ for $j = 1, 2, \dots$ are unknown parameters. We also define $X \equiv (x_1, \dots, x_n)'$, $Y \equiv (y_1, \dots, y_n)'$, $\mu \equiv (\mu_1, \dots, \mu_n)'$ and $e \equiv (e_1, \dots, e_n)'$, and denote the variance-covariance matrix as $\Omega \equiv E(ee' \mid X) = \mathrm{diag}(\sigma_1^2, \dots, \sigma_n^2)$. We state the theoretical results conditional on $X$ and omit the conditioning notation hereafter.

2.1. Infeasible Combination Estimator and Information Criteria

Suppose $\Omega$ is known, and that we have a candidate set of $M_1$ linear models with different numbers of independent variables for OLS estimation and a candidate set of $M_2$ linear models with different numbers of independent variables for GLS estimation. Then, we can obtain a set of OLS estimates $\hat{\mu}_m^{ols} \equiv P_m Y$ for $m = 1, \dots, M_1$ and a set of GLS estimates $\hat{\mu}_m^{gls} \equiv G_m Y$ for $m = 1, \dots, M_2$. Here, $P_m = X_m (X_m' X_m)^{-1} X_m'$ is the projection matrix of the $m$th regression model for OLS with $m = 1, \dots, M_1$, and $G_m \equiv X_m (X_m' \Omega^{-1} X_m)^{-1} X_m' \Omega^{-1}$, with $X_m$ being the independent-variable matrix of the $m$th regression model for GLS with $m = 1, \dots, M_2$. In this paper, we only consider nested candidate models for both the OLS and GLS estimators; that is, the $m$th model is nested in the $(m+1)$th model. The theoretical results may be extended to non-nested candidate models using the approach of Wan et al. (2010). Moreover, $M_1$ and $M_2$ can be fixed or grow to infinity as the sample size $n$ increases.
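To make the construction concrete, the following sketch (our illustrative code, not from the paper; all names are ours) computes the fitted values of the nested candidate models, given a regressor matrix X, a response Y and, for the GLS set, the inverse of a known diagonal $\Omega$:

```python
# Illustrative sketch: fitted values of nested OLS and GLS candidate models.
import numpy as np

def nested_fits(X, Y, Omega_inv=None):
    """Return an n x M matrix whose m-th column is P_m Y (OLS) or G_m Y (GLS),
    where model m uses the first m columns of X (nested models)."""
    n, M = X.shape
    fits = []
    for m in range(1, M + 1):
        Xm = X[:, :m]
        if Omega_inv is None:
            beta = np.linalg.solve(Xm.T @ Xm, Xm.T @ Y)              # OLS
        else:
            beta = np.linalg.solve(Xm.T @ Omega_inv @ Xm,
                                   Xm.T @ Omega_inv @ Y)             # GLS
        fits.append(Xm @ beta)
    return np.column_stack(fits)
```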
Based on those OLS and GLS estimates, we construct a combination estimator as follows:
$$\hat{\mu}(W) = \sum_{m=1}^{M_1} w_m^{ols}\,\hat{\mu}_m^{ols} + \sum_{m=1}^{M_2} w_m^{gls}\,\hat{\mu}_m^{gls} \equiv \hat{\mu}^{ols}(W_1) + \hat{\mu}^{gls}(W_2),$$
where $W = (W_1', W_2')' = (w_1^{ols}, \dots, w_{M_1}^{ols}, w_1^{gls}, \dots, w_{M_2}^{gls})'$ belongs to:
$$H = \left\{ W : W \in [0, 1]^M, \; I'W = 1 \right\},$$
where $M = M_1 + M_2$ and $I$ denotes an $M \times 1$ vector with all elements equal to one.
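Given the two sets of fitted values, the combination estimator is a single weighted average; a minimal sketch continuing the previous snippet (illustrative names):

```python
import numpy as np

def combination_estimate(W, ols_fits, gls_fits):
    """mu_hat(W) = sum_m w_m^ols mu_hat_m^ols + sum_m w_m^gls mu_hat_m^gls.
    W stacks the M1 OLS weights and the M2 GLS weights; it should lie in H,
    i.e., be non-negative and sum to one."""
    M1 = ols_fits.shape[1]
    W = np.asarray(W)
    assert np.all(W >= 0) and np.isclose(W.sum(), 1.0), "W must lie in H"
    return ols_fits @ W[:M1] + gls_fits @ W[M1:]
```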
In order to reduce the risk of the combination estimator proposed above, we need to select a suitable weight vector. To do so, in this subsection, we propose two infeasible criteria, HRCp-GLSMA and MMA-GLSMA. In the next subsection, we provide their feasible counterparts for the situation where $\Omega$ is unknown.
HRCp-GLSMA: The first information criterion for selecting a weight vector is defined as:
$$\bar{C}_n(W) = s_1^2 \left[ \left\| Y - \hat{\mu}^{ols}(W_1^*) \right\|^2 + 2\,\mathrm{tr}\!\left( P(W_1^*)\Omega \right) \right] \tag{1}$$
$$\quad + s_2^2 \left[ \left\| Y - \hat{\mu}^{gls}(W_2^*) \right\|^2 + 2\,\mathrm{tr}\!\left( G(W_2^*)\Omega \right) \right] \tag{2}$$
$$\quad + 2 s_1 s_2 \left( Y - \hat{\mu}^{ols}(W_1^*) \right)' \left( Y - \hat{\mu}^{gls}(W_2^*) \right) \tag{3}$$
$$\quad + 2 s_1 s_2 \left[ \mathrm{tr}\!\left( P(W_1^*)\Omega \right) + \mathrm{tr}\!\left( G(W_2^*)\Omega \right) - e'e \right], \tag{4}$$
where $s_1 \equiv \sum_{m=1}^{M_1} w_m^{ols}$, $s_2 \equiv \sum_{m=1}^{M_2} w_m^{gls}$, $W_1^* \equiv W_1/s_1$, $W_2^* \equiv W_2/s_2$, $P(W_1^*) \equiv \sum_{m=1}^{M_1} (w_m^{ols}/s_1) P_m$ and $G(W_2^*) \equiv \sum_{m=1}^{M_2} (w_m^{gls}/s_2) G_m$.
Note that:
$$\bar{C}_n(W) = s_1^2\, HRC_p(W_1^*) + s_2^2\, C^{I_n}(W_2^*) + 2 s_1 s_2 \left[ \left( Y - \hat{\mu}^{ols}(W_1^*) \right)' \left( Y - \hat{\mu}^{gls}(W_2^*) \right) + \mathrm{tr}\!\left( P(W_1^*)\Omega \right) + \mathrm{tr}\!\left( G(W_2^*)\Omega \right) - e'e \right],$$
where $HRC_p(W_1^*) = \left\| Y - \hat{\mu}^{ols}(W_1^*) \right\|^2 + 2\,\mathrm{tr}\!\left( P(W_1^*)\Omega \right)$ is the HRCp model-averaging criterion proposed by Liu and Okui (2013) with the weight vector $W_1^*$, and $C^{I_n}(W_2^*) = \left\| Y - \hat{\mu}^{gls}(W_2^*) \right\|^2 + 2\,\mathrm{tr}\!\left( G(W_2^*)\Omega \right)$ is the GLSMA information criterion proposed by Liu et al. (2016) with the weight vector $W_2^*$. Thus, $\bar{C}_n$ can be regarded as a combination of HRCp and GLSMA; hence, we call $\bar{C}_n$ the HRCp-GLSMA-type criterion.
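As a sketch of how $\bar{C}_n(W)$ can be evaluated in code (our wiring, assuming the traces $\mathrm{tr}(P_m\Omega)$ and $\mathrm{tr}(G_m\Omega)$ are precomputed for each candidate model; the MMA-GLSMA criterion introduced next has the same layout, with $\sigma^2\,\mathrm{tr}(P_m)$ replacing $\mathrm{tr}(P_m\Omega)$ in the OLS part):

```python
import numpy as np

def hrcp_glsma_criterion(W, Y, ols_fits, gls_fits, trPOmega, trGOmega, ete):
    """Infeasible HRCp-GLSMA criterion C_bar_n(W).

    trPOmega[m] = tr(P_m Omega), trGOmega[m] = tr(G_m Omega); ete = e'e
    (replaced by an estimate in the feasible version). Working with
    r1 = s1 (Y - mu_hat^ols(W1*)) = s1 Y - ols_fits @ W1 avoids dividing
    by s1, s2, so the boundary cases s1 = 0 or s2 = 0 need no special care.
    """
    M1 = ols_fits.shape[1]
    W1, W2 = np.asarray(W[:M1]), np.asarray(W[M1:])
    s1, s2 = W1.sum(), W2.sum()
    r1 = s1 * Y - ols_fits @ W1
    r2 = s2 * Y - gls_fits @ W2
    tP = W1 @ trPOmega                       # = s1 * tr(P(W1*) Omega)
    tG = W2 @ trGOmega                       # = s2 * tr(G(W2*) Omega)
    return (r1 @ r1 + 2.0 * s1 * tP          # s1^2 * HRCp(W1*), term (1)
            + r2 @ r2 + 2.0 * s2 * tG        # s2^2 * C^{In}(W2*), term (2)
            + 2.0 * r1 @ r2                  # cross-residual term (3)
            + 2.0 * (s2 * tP + s1 * tG)      # trace part of term (4)
            - 2.0 * s1 * s2 * ete)           # -2 s1 s2 e'e, term (4)
```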
MMA-GLSMA: Second, we propose an MMA-GLSMA-type criterion for weight selection. The infeasible MMA-GLSMA-type criterion is defined as:
$$\tilde{C}_n(W) = s_1^2\, C^{MMA}(W_1^*) + s_2^2\, C^{I_n}(W_2^*) + 2 s_1 s_2 \left[ \left( Y - \hat{\mu}^{ols}(W_1^*) \right)' \left( Y - \hat{\mu}^{gls}(W_2^*) \right) + \sigma^2\,\mathrm{tr}\!\left( P(W_1^*) \right) + \mathrm{tr}\!\left( G(W_2^*)\Omega \right) - e'e \right],$$
where $C^{MMA}(W_1^*) = \left\| Y - \hat{\mu}^{ols}(W_1^*) \right\|^2 + 2\sigma^2\,\mathrm{tr}\!\left( P(W_1^*) \right)$ is the MMA criterion proposed by Hansen (2007) with the weight vector $W_1^*$.
If the variance-covariance matrix $\Omega$ were known, we could choose the weight vector by minimizing either criterion:
$$\bar{W} = \arg\min_{W \in H} \bar{C}_n(W)$$
or
$$\tilde{W} = \arg\min_{W \in H} \tilde{C}_n(W).$$
However, since $\Omega$ is unknown in practice, $\bar{W}$ and $\tilde{W}$ are infeasible.
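For intuition only, here is one way to minimize either criterion over the continuous simplex H with scipy (our sketch, not the paper's algorithm; the theory in Section 3 works with the discrete set $H_M(N)$, and the same routine applies to the feasible criteria of Section 2.2 once $\Omega$ is replaced by an estimate):

```python
import numpy as np
from scipy.optimize import minimize

def choose_weights(criterion, M, args=()):
    """Minimize a weight-selection criterion over the simplex H."""
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)   # I'W = 1
    bnds = [(0.0, 1.0)] * M                                    # W in [0,1]^M
    w0 = np.full(M, 1.0 / M)                                   # equal weights
    res = minimize(lambda w: criterion(w, *args), w0,
                   method="SLSQP", bounds=bnds, constraints=cons)
    return res.x

# e.g. W_bar = choose_weights(hrcp_glsma_criterion, M1 + M2,
#                args=(Y, ols_fits, gls_fits, trPOmega, trGOmega, ete))
```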

2.2. Feasible Combination Estimator and Information Criteria

For the situation where the variance-covariance matrix is unknown, a feasible combination estimator can be constructed using feasible GLS (FGLS) estimators. The FGLS estimators and the feasible combination estimator are defined below:
$$\hat{\mu}^F(W) = \sum_{m=1}^{M_1} w_m^{ols}\,\hat{\mu}_m^{ols} + \sum_{m=1}^{M_2} w_m^{gls}\,\hat{\mu}_m^{fgls} \equiv \hat{\mu}^{ols}(W_1) + \hat{\mu}^{fgls}(W_2),$$
where the FGLS estimator is $\hat{\mu}_m^{fgls} \equiv X_m \left( X_m' \hat{\Omega}^{-1} X_m \right)^{-1} X_m' \hat{\Omega}^{-1} Y$. Here, the estimator $\hat{\Omega}$ is based on the k-NN estimator $\check{\sigma}_i^2$ of Liu et al. (2016).
We propose two feasible counterparts of $\bar{C}_n$ and $\tilde{C}_n$. The feasible HRCp-GLSMA-type criterion $\bar{C}_n^F$ is defined as:
$$\bar{C}_n^F(W) = s_1^2 \left[ \left\| Y - \hat{\mu}^{ols}(W_1^*) \right\|^2 + 2 \sum_{i=1}^{n} \hat{e}_i^2\, p_{ii}(W_1^*) \right] + s_2^2 \left[ \left\| Y - \hat{\mu}^{fgls}(W_2^*) \right\|^2 + 2\,\mathrm{tr}\!\left( G^F(W_2^*)\hat{\Omega} \right) \right] + 2 s_1 s_2 \left[ \left( Y - \hat{\mu}^{ols}(W_1^*) \right)' \left( Y - \hat{\mu}^{fgls}(W_2^*) \right) + \mathrm{tr}\!\left( P(W_1^*)\hat{\Omega} \right) + \mathrm{tr}\!\left( G^F(W_2^*)\hat{\Omega} \right) - \hat{e}'\hat{e} \right],$$
and the feasible MMA-GLSMA-type criterion $\tilde{C}_n^F$ is defined as:
$$\tilde{C}_n^F(W) = s_1^2 \left[ \left\| Y - \hat{\mu}^{ols}(W_1^*) \right\|^2 + 2 \hat{\sigma}^2\,\mathrm{tr}\!\left( P(W_1^*) \right) \right] + s_2^2 \left[ \left\| Y - \hat{\mu}^{fgls}(W_2^*) \right\|^2 + 2\,\mathrm{tr}\!\left( G^F(W_2^*)\hat{\Omega} \right) \right] + 2 s_1 s_2 \left[ \left( Y - \hat{\mu}^{ols}(W_1^*) \right)' \left( Y - \hat{\mu}^{fgls}(W_2^*) \right) + \hat{\sigma}^2\,\mathrm{tr}\!\left( P(W_1^*) \right) + \mathrm{tr}\!\left( G^F(W_2^*)\hat{\Omega} \right) - \hat{e}'\hat{e} \right],$$
where $\hat{e}_i$ denotes the $i$th element of $\hat{e} \equiv \sqrt{n/(n - k_L)}\,(I - P_L)Y$, with $k_L$ denoting the number of independent variables in the largest OLS model and $P_L$ the projection matrix of that model; $p_{ii}(W_1^*)$ denotes the $i$th diagonal element of $P(W_1^*)$;
$$G^F(W_2^*) = \sum_{m=1}^{M_2} w_m^{gls}\, X_m \left( X_m' \hat{\Omega}^{-1} X_m \right)^{-1} X_m' \hat{\Omega}^{-1};$$
$\tilde{e} \equiv (I - P_L)Y$; and $\hat{\sigma}^2 \equiv (n - k_L)^{-1} \tilde{e}'\tilde{e}$ is one of the estimators suggested by Hansen (2007). $\hat{\Omega}$ is obtained by plugging in the k-NN estimator $\check{\sigma}_i^2$ used in Liu et al. (2016).
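A rough sketch of these plug-in quantities (our code; the uniform-weight k-NN average below is a simplification of the estimator $\check{\sigma}_i^2$ of Liu et al. (2016), which allows general neighbour weights $c_i$):

```python
import numpy as np

def knn_variance(z, resid, kappa):
    """k-NN estimate of sigma_i^2: average the squared residuals over the
    kappa nearest neighbours of z_i (uniform weights, for illustration).
    z is an (n, d) array of the conditioning variables."""
    n = len(resid)
    sig2 = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(z - z[i], axis=1)      # distances in z-space
        nn = np.argsort(d)[:kappa]                # kappa nearest points
        sig2[i] = np.mean(resid[nn] ** 2)
    return sig2

def feasible_pieces(X_L, Y, z, kappa):
    """Residual-based quantities from the largest OLS model (k_L columns)."""
    n, kL = X_L.shape
    beta = np.linalg.lstsq(X_L, Y, rcond=None)[0]
    e_tilde = Y - X_L @ beta                      # (I - P_L) Y
    sigma2_hat = e_tilde @ e_tilde / (n - kL)     # Hansen (2007) estimator
    e_hat = np.sqrt(n / (n - kL)) * e_tilde       # rescaled residuals
    Omega_hat = np.diag(knn_variance(z, e_tilde, kappa))
    return e_hat, sigma2_hat, Omega_hat
```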

3. Properties of the Criteria

The following lemma shows that the two infeasible criteria proposed above are unbiased estimates of the risk function $R(W) \equiv E[L(W)]$ up to a constant, where $L(W) \equiv \|\mu - \hat{\mu}(W)\|^2$ is the loss function.
Lemma 1.
For any real-valued vector $W$, $E[\bar{C}_n(W)] = R(W) + c_1$ and $E[\tilde{C}_n(W)] = R(W) + c_2$, where $c_1$ and $c_2$ are constants.
Another useful property is that all the criteria are asymptotically optimal in the sense of Li (1987). Proofs of the asymptotic optimality of all of the above criteria can be obtained by extending the proofs of Hansen (2007), Liu and Okui (2013) and Liu et al. (2016). As an example, we demonstrate the optimality of the feasible MMA-GLSMA method with the nonparametric estimator of $\Omega$ used in Liu et al. (2016).
To do that, we define the feasible loss function as:
$$L^F(W) \equiv \left\| \mu - \hat{\mu}^F(W) \right\|^2 = \left( \hat{\mu}^F(W) - \mu \right)' \left( \hat{\mu}^F(W) - \mu \right).$$
We employ some notation and assumptions from Liu et al. (2016), reproduced in Appendix A, and add the following assumption:
Assumption 1'.
As $n \to \infty$ and $k_l \to \infty$, $\left( k_l + k_l \sum_{j=1}^{n} b_{lj}^2 \right) / \xi_n \to 0$, where $\xi_n \equiv \inf_{W \in H_n(N)} R(W)$, $k_l$ is the number of regressors used in the regression model for the k-NN estimator and $b_{lj}$ denotes the approximation error of that model.
Assumption 1' guarantees that the number of regressors used in that regression model (adjusted for its approximation errors) grows at a rate slower than the lower bound of the risk across all possible weights. In practice, this assumption requires us to moderate the growth of $k_l$ relative to the sample size while keeping the approximation errors small.
Following Hansen (2007), we restrict the elements of the weight vector to the discrete set $H_M(N)$, in which each weight takes values in $\{0, 1/N, 2/N, \dots, 1\}$ for some integer $N < \infty$.
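Because this set is finite, the criterion can in principle be minimized by enumeration. A small helper (ours; practical only for small M and N) that lists all of $H_M(N)$ via the stars-and-bars correspondence:

```python
from itertools import combinations
import numpy as np

def weight_grid(M, N):
    """All weight vectors with entries in {0, 1/N, ..., 1} summing to one.
    There are C(N + M - 1, M - 1) of them (compositions of N into M parts)."""
    grid = []
    for cuts in combinations(range(N + M - 1), M - 1):
        parts, prev = [], -1
        for c in cuts + (N + M - 1,):
            parts.append(c - prev - 1)   # gap between consecutive dividers
            prev = c
        grid.append(np.array(parts) / N)
    return grid

# Example: M = 3 models and N = 2 grid steps give the 6 candidate vectors
# (1,0,0), (.5,.5,0), (.5,0,.5), (0,1,0), (0,.5,.5), (0,0,1).
```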
Theorem 1.
Under Assumption 1' and Assumptions 1–3, 6 and 10–14 of Liu et al. (2016), as $n \to \infty$, $k_l \to \infty$, $\kappa \to \infty$ and $k_L/n \to 0$, we have:
$$\frac{L^F(\tilde{W})}{\inf_{W \in H_M(N)} L(W)} \to_p 1,$$
where $\tilde{W} = \arg\min_{W \in H_M(N)} \tilde{C}_n^F(W)$.
In other words, as the sample size increases, our method can achieve the infimum of the loss.

4. Simulation Study

To investigate the finite-sample performance of the proposed MMA-GLSMA and HRCp-GLSMA criteria and to compare them with alternative methods, we performed a Monte Carlo simulation. The alternative methods are MMA, HRCp and GLSMA with $C^{I_n,F}$, where $C^{I_n,F}$ is the feasible counterpart of $C^{I_n}$ proposed in Liu et al. (2016).
The data-generating process (DGP) is:
$$y_i = \mu_i + e_i,$$
where:
$$\mu_i = \sum_{j=1}^{10{,}000} \theta_j x_{ij}$$
for $i = 1, \dots, n$, with $n = 50$. We set $\theta_j = c j^{-1}$, with $j$ truncated at 10,000 and $c$ a positive constant; $x_{i1} = 1$ and $x_{ij} \sim N(0, 1)$ for $j > 1$ are independent across $i$. We conducted three simulations. The first case had a homoscedastic error term $e_i \sim N(0, 1)$. The second was a mild heteroscedastic case with error term $e_i \sim N(0, \sigma_i^2)$ and $\sigma_i = x_{i2}$. The third was a strong heteroscedastic case with $\sigma_i = x_{i2}^2$. In all cases, $e_i$ was independent across $i$. The number of regressors in the largest approximation model, equivalently the number of nested models, was $M = 10$; the $m$th model contained the first $m$ regressors, including the constant term. We varied $c$ so that the $R^2$ of the DGP ranged from 0.1 to 0.9 in increments of 0.1. (A sketch of this DGP appears below.)
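A sketch of this DGP (our code; the case labels and the use of abs() to turn $\sigma_i$ into a scale are implementation choices):

```python
import numpy as np

def simulate(n=50, c=1.0, case="homo", J=10_000, seed=None):
    """Draw one sample from the simulation DGP of Section 4."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, J))
    X[:, 0] = 1.0                                 # x_i1 = 1 (constant term)
    theta = c / np.arange(1, J + 1)               # theta_j = c * j^{-1}
    mu = X @ theta
    if case == "homo":
        sigma = np.ones(n)                        # e_i ~ N(0, 1)
    elif case == "mild":
        sigma = X[:, 1]                           # sigma_i = x_i2
    elif case == "strong":
        sigma = X[:, 1] ** 2                      # sigma_i = x_i2^2
    else:
        raise ValueError(case)
    e = rng.standard_normal(n) * np.abs(sigma)    # e_i ~ N(0, sigma_i^2)
    return X, mu, mu + e
```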
We considered two cases of GLSMA with different estimation methods for $\sigma_i$: one based on maximum likelihood estimation (MLE) and the other based on the nonparametric k-NN method. For details of these two cases, see Liu et al. (2016). Because the true specification of $\sigma_i$ is usually unknown in practice, we misspecified $\sigma_i$ for GLSMA. For the MLE-based method, we set $\sigma_i^2 = a + b x_{i3}^4$, where $a$ and $b$ are unknown parameters to be estimated. For the nonparametric case, we used only $x_{i3}$ and $x_{i4}$ for the k-NN estimation.
The number of replications for all simulations was 1000. We evaluated the performance of each method by the sample mean squared error, $\mathrm{MSE} = (1/1000) \sum_{k=1}^{1000} \left\| \hat{\mu}_k - \mu_k \right\|^2$, where $\hat{\mu}_k$ and $\mu_k$ are the realized vector of the estimated value $\hat{\mu}$ and the true value $\mu$ in the $k$th replication, respectively. The simulation results are shown in Table 1, Table 2 and Table 3. Figure 1 presents the same information relative to the MSE of the GLSMA method.
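The evaluation loop is then straightforward (a sketch reusing simulate() from the previous snippet; estimate stands for any of the compared methods and is our placeholder):

```python
import numpy as np

def sample_mse(estimate, case="homo", reps=1000, n=50, c=1.0, M=10):
    """Sample MSE = (1/reps) * sum_k ||mu_hat_k - mu_k||^2."""
    total = 0.0
    for r in range(reps):
        X, mu, y = simulate(n=n, c=c, case=case, seed=r)
        mu_hat = estimate(X[:, :M], y)    # largest model: first M regressors
        total += np.sum((mu_hat - mu) ** 2)
    return total / reps
```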
The results in Table 1 and Table 2 show that our combination methods, MMA-GLSMA and HRCp-GLSMA, performed better than the alternatives (GLSMA, HRCp and MMA) when the error term was homoscedastic or mildly heteroscedastic, for $R^2 \le 0.7$. When $R^2 \ge 0.8$, the performance of our methods was slightly worse than that of the alternative methods. For the homoscedastic case, the three alternative methods performed similarly. In the case of mild heteroscedasticity, GLSMA and HRCp performed better than MMA (as expected, since MMA was designed for homoscedastic models).
Table 3 demonstrates that, when the heteroscedasticity of the error term was considerably strong, our combination method HRCp-GLSMA worked much better than the others when the MLE-based estimation of σ i was used. However, MMA-GLSMA and HRCp-GLSMA became worse than GLSMA when σ i was estimated using the nonparametric method.
Moreover, in most cases, GLSMA with the nonparametric estimator of $\sigma_i$ outperformed GLSMA with the MLE-based estimator. This can be explained by a characteristic of the k-NN method we adopted: in the k-NN estimation, a large weight is placed on the $i$th squared residual when estimating $\sigma_i^2$, so even though $\sigma_i^2$ was misspecified, the estimate could capture the heteroscedasticity of the error term to some extent.
These simulation results suggest the following guidance for practical analysis. If the heteroscedasticity of the data is known to be considerably strong, or the population $R^2$ is large, the nonparametric GLSMA should be used. Otherwise, it is preferable to choose the MMA-GLSMA or HRCp-GLSMA combination.
Table 4 and Table 5 give the averages of the estimated weights corresponding to the OLS and GLS parts for HRCp-GLSMA and MMA-GLSMA. Table 4 shows that, for HRCp-GLSMA, as the heteroscedasticity became stronger, the average weight on the GLS part increased and the average weight on the OLS part decreased. Table 5 does not show a similar trend for MMA-GLSMA, which may explain why MMA-GLSMA performed worse than HRCp-GLSMA.

5. Conclusions

In this paper, we proposed a combination method for OLS and GLS estimators. The proposed method reduces the risk of misspecification between homoscedastic and heteroscedastic models. The optimality of the criteria for choosing the weight vector was proven under some regularity conditions. We performed simulation experiments to investigate the finite-sample properties of our combination method. The results demonstrate that our method is adaptive to both homoscedastic and heteroscedastic errors. As mentioned previously, the proposed method is novel in that it combines estimators from two different estimation methods, and such a combination can inherit the advantages of each. This idea could be useful and should be extended to combinations of other estimation methods.

Author Contributions

This is a collaborative project. The authors are listed in alphabetical order. The proofs were done by Q. Liu.

Funding

Q. Liu acknowledges financial support from the JSPS Grant-in-Aid for Young Scientists (B) No. 25780148 and JSPS KAKENHI Grant (C) No. JP16K03590 and No. JP19K01582.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

For the convenience of the readers of this journal, we list Assumptions 1–3, 6 and 10–14 and replicate Lemma 7 of Liu et al. (2016) here. Their notation $R^{I_n}(W)$ coincides with our notation $R(W)$; $x_{mi}$ denotes the $i$th observation vector of the regressors of the $m$th model, and $x_{m,j,i}$ is the $j$th entry of $x_{mi}$.
Assumption 1.
$E\left( |e_i|^{4(N+1)} \right) \le c < \infty$ for some $c$.
Assumption 2.
$\xi_n \equiv \inf_{W \in H_n(N)} R^{I_n}(W) \to \infty$ as $n \to \infty$.
Assumption 3.
$0 < \inf_i \sigma_i^2 \le \sup_i \sigma_i^2 < \infty$.
Assumption 6.
The maximum eigenvalue of $\sum_{i=1}^{n} x_{mi} x_{mi}' / n$ is bounded uniformly in $n$ and $m$. There exists a $c > 0$ such that the minimum eigenvalue of $\sum_{i=1}^{n} x_{mi} x_{mi}' / n$ is greater than $c$ for any $n$ and $m$.
Assume that $\sigma_i^2$ is a function of a finite subset of $x_i$, denoted by $z_i$, so that $\sigma_i^2 = \sigma^2(z_i)$.
Assumption 10.
$\sigma^2(\cdot)$ is differentiable. Let $\dot{\sigma}^2(\cdot)$ denote its first derivative. Then, $\sup_z \left| \dot{\sigma}^2(z) \right| < \infty$. Moreover, the support of $z_i$ is bounded, and the density of $z_i$ is bounded from below.
Let $\kappa$ be the tuning parameter for the k-NN estimator. Let $c_i$, $1 \le i \le n$, be such that $c_i > 0$ for $1 \le i \le \kappa$, $c_i = 0$ for $i > \kappa$, and $\sum_{i=1}^{\kappa} c_i = 1$.
Assumption 11.
$\overline{\lim}_{n \to \infty} \max_{1 \le i \le \kappa} \kappa c_i < \infty$.
Assumption 12.
$\lim_{n \to \infty} \sum_{i=1}^{n} x_{m,j,i}^4 / n$ is bounded uniformly in $m$ and $j$.
Assumption 13.
There exists a $C < \infty$ such that $\lim_{n \to \infty} \sum_{i=1}^{n} \mu_i^4 / n < C$.
Assumption 14.
Define $\nu = 4(N+1)$ and:
$$A_n = \frac{n^{2/\nu}}{\kappa} + \frac{k_l}{\kappa} + \frac{k_l \sum_{j=1}^{n} b_{lj}^2}{\kappa} + \frac{\kappa^{1/q}}{n^{1/q}}.$$
As $n \to \infty$, $k_l \to \infty$ and $\kappa \to \infty$, it is the case that $k_M A_n \to 0$, $k_M^2 A_n / \xi_n \to 0$, $n k_M A_n^2 / \xi_n \to 0$ and $n k_M A_n / \xi_n \to 0$.
Lemma 7 (of Liu et al. 2016).
Suppose that Assumptions 1, 3, 5, 6 and 11 of Liu, Okui and Yoshimura (2016) hold. Then, as $n \to \infty$, $k_l \to \infty$ and $\kappa \to \infty$, with $k_l^2 / \kappa \to 0$ and $k_l^2 \sum_{j=1}^{n} b_{lj}^2 / \kappa \to 0$, we have:
$$\max_i \left| \check{\sigma}_i^2 - \tilde{\sigma}_i^2 \right| = O_p\!\left( k_l \left( \kappa^{-1} + \sum_{j=1}^{n} b_{lj}^2 \right) \right).$$
Proof of Lemma 1.
We have:
$$R(W) = E\left\| \mu - \hat{\mu}(W) \right\|^2 = E\left\| s_1 \left( \mu - \hat{\mu}^{ols}(W_1^*) \right) + s_2 \left( \mu - \hat{\mu}^{gls}(W_2^*) \right) \right\|^2$$
$$= E\left[ s_1^2 \left\| \mu - \hat{\mu}^{ols}(W_1^*) \right\|^2 \right] \tag{A1}$$
$$\quad + E\left[ 2 s_1 s_2 \left( \mu - \hat{\mu}^{ols}(W_1^*) \right)' \left( \mu - \hat{\mu}^{gls}(W_2^*) \right) \right] \tag{A2}$$
$$\quad + E\left[ s_2^2 \left\| \mu - \hat{\mu}^{gls}(W_2^*) \right\|^2 \right]. \tag{A3}$$
By the results of Liu and Okui (2013) and Liu et al. (2016), the squared terms (1) and (2) are unbiased estimates of the terms (A1) and (A3) plus a constant.
In order to estimate the cross-product term (A2), observe that:
$$\left( Y - \hat{\mu}^{ols}(W_1^*) \right)' \left( Y - \hat{\mu}^{gls}(W_2^*) \right) = \left( \mu - \hat{\mu}^{ols}(W_1^*) \right)' \left( \mu - \hat{\mu}^{gls}(W_2^*) \right) + e'e + e'\left( I - P(W_1^*) \right)\mu + e'\left( I - G(W_2^*) \right)\mu - e'P(W_1^*)e - e'G(W_2^*)e$$
and $E\left[ e'\left( I - P(W_1^*) \right)\mu \right] = E\left[ e'\left( I - G(W_2^*) \right)\mu \right] = 0$, while $E\left[ e'P(W_1^*)e \right] = \mathrm{tr}\!\left( P(W_1^*)\Omega \right)$ and $E\left[ e'G(W_2^*)e \right] = \mathrm{tr}\!\left( G(W_2^*)\Omega \right)$. Therefore, the sum of the terms (3) and (4) unbiasedly estimates the cross-product term (A2). □
Proof of Theorem 1.
We define $L^{ols}(W) \equiv \left( \mu - \hat{\mu}^{ols}(W) \right)' \left( \mu - \hat{\mu}^{ols}(W) \right)$ and $L^{fgls}(W) \equiv \left( \mu - \hat{\mu}^{fgls}(W) \right)' \left( \mu - \hat{\mu}^{fgls}(W) \right)$. Note that:
$$\tilde{C}_n^F(W) - L^F(W) = s_1^2 \left[ C^{MMA,F}(W_1^*) - L^{ols}(W_1^*) \right] + s_2^2 \left[ C^{I_n,F}(W_2^*) - L^{fgls}(W_2^*) \right] + 2 s_1 s_2 \left[ e'\left( I - P(W_1^*) \right)\mu + e'\left( I - G^F(W_2^*) \right)\mu + \hat{\sigma}^2\,\mathrm{tr}\!\left( P(W_1^*) \right) + \mathrm{tr}\!\left( G^F(W_2^*)\hat{\Omega} \right) - e'P(W_1^*)e - e'G^F(W_2^*)e + e'e - \hat{e}'\hat{e} \right],$$
where $C^{MMA,F}(W_1^*) = \left\| Y - \hat{\mu}^{ols}(W_1^*) \right\|^2 + 2\hat{\sigma}^2\,\mathrm{tr}\!\left( P(W_1^*) \right)$ and $C^{I_n,F}(W_2^*) = \left\| Y - \hat{\mu}^{fgls}(W_2^*) \right\|^2 + 2\,\mathrm{tr}\!\left( G^F(W_2^*)\hat{\Omega} \right)$.
Using the results of Hansen (2007) and Liu et al. (2016), it can be shown that the supremum over $W \in H_M(N)$ of the absolute value of each term above, except $e'e - \hat{e}'\hat{e}$, divided by the risk function $R(W)$, converges to zero in probability. It therefore remains to show that:
$$\sup_{W} \frac{\left| e'e - \hat{e}'\hat{e} \right|}{R(W)} \to_p 0.$$
This follows by modifying Lemma 7 of Liu et al. (2016). Replacing $s_{ij}$, the weight defined for the k-NN estimator in Liu et al. (2016), with one, we have:
$$\sup_{W} \frac{\left| e'e - \hat{e}'\hat{e} \right|}{R(W)} \le \frac{\left| e'e - \tilde{e}'\tilde{e} \right|}{\xi_n} + \frac{k_L}{n - k_L} \cdot \frac{\tilde{e}'\tilde{e}}{\xi_n} = O_p\!\left( \left( k_l + k_l \sum_{j=1}^{n} b_{lj}^2 \right) / \xi_n \right) + o_p(1) \to 0.$$
The proof is complete. □

References

  1. Danilov, Dmitry, and Jan R. Magnus. 2004. On The Harm That Ignoring Pretesting Can Cause. Journal of Econometrics 122: 27–46. [Google Scholar] [CrossRef]
  2. Hansen, Bruce E. 2007. Least Squares Model Averaging. Econometrica 75: 1175–89. [Google Scholar] [CrossRef]
  3. Hansen, Bruce E., and Jeffrey S. Racine. 2012. Jackknife Model Averaging. Journal of Econometrics 167: 38–46. [Google Scholar] [CrossRef]
  4. Li, Ker-Chau. 1987. Asymptotic Optimality for Cp, CL, Cross-Validation and Generalized Cross-Validation: Discrete Index Set. Annals of Statistics 15: 958–75. [Google Scholar] [CrossRef]
  5. Liu, Qingfeng, and Ryo Okui. 2013. Heteroscedasticity-Robust Cp Model Averaging. Econometrics Journal 16: 463–72. [Google Scholar] [CrossRef]
  6. Liu, Qingfeng, Ryo Okui, and Arihiro Yoshimura. 2016. Generalized Least Squares Model Averaging. Econometric Reviews 35: 1692–752. [Google Scholar] [CrossRef]
  7. Makridakis, Spyros, Evangelos Spiliotis, and Vassilios Assimakopoulos. 2018. The M4 Competition: Results, Findings, Conclusion and Way Forward. International Journal of Forecasting 34: 802–8. [Google Scholar] [CrossRef]
  8. Wan, Alan T. K., Xinyu Zhang, and Guohua Zou. 2010. Least Squares Model Averaging by Mallows Criterion. Journal of Econometrics 156: 277–83. [Google Scholar] [CrossRef]
  9. Xie, Tian. 2017. Heteroscedasticity-Robust Model Screening: A Useful Toolkit for Model Averaging in Big Data Analytics. Economics Letters 151: 119–22. [Google Scholar] [CrossRef]
  10. Yuan, Zheng, and Yuhong Yang. 2005. Combining Linear Regression Models: When and How? Journal of the American Statistical Association 100: 1202–14. [Google Scholar] [CrossRef]
  11. Zhang, Xinyu, Alan T. K. Wan, and Guohua Zou. 2013. Model Averaging by Jackknife Criterion in Models with Dependent Data. Journal of Econometrics 174: 82–94. [Google Scholar] [CrossRef]
  12. Zhang, Xinyu, Guohua Zou, and Raymond J. Carroll. 2015. Model Averaging Based on Kullback-Leibler Distance. Statistica Sinica 25: 1583–98. [Google Scholar] [CrossRef] [PubMed]
  13. Zhao, Shangwei, Aman Ullah, and Xinyu Zhang. 2018. A Class of Model Averaging Estimators. Economics Letters 162: 101–6. [Google Scholar] [CrossRef]
Figure 1. Performance of the methods relative to generalized least-squares model averaging (GLSMA): heteroscedasticity-robust Cp (HRCp), Mallows model averaging (MMA), MMA-GLSMA and HRCp-GLSMA.
Table 1. Sample mean squared error (MSE) for the case with homoscedastic error.

                R²     GLSMA    HRCp     MMA      MMA-GLSMA   HRCp-GLSMA
Nonpara.        0.1     6.80     6.55     6.50      6.16         6.18
                0.2     9.50     9.34     9.34      8.81         8.83
                0.3    12.20    12.09    12.14     11.56        11.56
                0.4    15.25    15.16    15.25     14.63        14.60
                0.5    19.06    18.97    19.08     18.52        18.47
                0.6    24.33    24.21    24.34     23.92        23.87
                0.7    32.64    32.48    32.61     32.48        32.39
                0.8    48.66    48.33    48.51     49.18        49.19
                0.9    95.66    94.88    95.07     98.76        99.08
MLE             0.1     6.56     6.55     6.50      6.04         6.05
                0.2     9.42     9.34     9.34      8.83         8.82
                0.3    12.22    12.09    12.14     11.61        11.60
                0.4    15.31    15.16    15.25     14.74        14.67
                0.5    19.14    18.97    19.08     18.63        18.56
                0.6    24.44    24.21    24.34     24.00        23.95
                0.7    32.74    32.48    32.61     32.50        32.43
                0.8    48.72    48.33    48.51     49.13        49.04
                0.9    95.42    94.88    95.07     98.70        98.56
Table 2. Sample MSE for the case with mild heteroscedastic error.

                R²     GLSMA    HRCp     MMA      MMA-GLSMA   HRCp-GLSMA
Nonpara.        0.1     6.43     6.51     6.97      6.18         6.00
                0.2     8.62     9.13     9.37      8.32         8.18
                0.3    10.77    11.58    11.74     10.48        10.36
                0.4    13.02    14.07    14.21     12.80        12.77
                0.5    15.76    16.94    17.11     15.65        15.61
                0.6    19.45    20.66    20.86     19.45        19.36
                0.7    25.20    26.34    26.56     25.29        25.28
                0.8    36.05    37.04    37.26     36.50        36.44
                0.9    67.46    67.87    68.10     69.53        69.34
MLE             0.1     6.80     6.51     6.97      6.47         6.30
                0.2     9.23     9.13     9.37      8.73         8.58
                0.3    11.59    11.58    11.74     11.02        10.90
                0.4    14.06    14.07    14.21     13.48        13.39
                0.5    16.95    16.94    17.11     16.41        16.32
                0.6    20.71    20.66    20.86     20.25        20.14
                0.7    26.43    26.34    26.56     26.04        25.96
                0.8    37.21    37.04    37.26     37.13        37.07
                0.9    68.36    67.87    68.10     69.86        69.79
Table 3. Sample MSE for the case with strong heteroscedastic error.

                R²     GLSMA    HRCp     MMA      MMA-GLSMA   HRCp-GLSMA
Nonpara.        0.1    13.44    14.88    20.07     15.52        13.70
                0.2    16.13    18.68    23.17     18.05        16.30
                0.3    18.92    22.74    26.47     20.80        19.17
                0.4    22.02    27.11    30.24     24.06        22.69
                0.5    25.91    32.18    34.74     27.96        26.73
                0.6    30.89    38.52    40.48     33.13        32.21
                0.7    38.28    47.03    48.46     40.62        39.94
                0.8    51.15    60.68    61.92     53.80        53.37
                0.9    85.65    94.97    96.06     89.21        89.15
MLE             0.1    17.02    14.88    20.07     17.52        15.86
                0.2    20.15    18.68    23.17     20.23        18.64
                0.3    23.52    22.74    26.47     23.40        21.94
                0.4    27.40    27.11    30.24     26.99        25.59
                0.5    32.01    32.18    34.74     31.40        30.21
                0.6    37.90    38.52    40.48     37.02        36.00
                0.7    46.35    47.03    48.46     45.13        44.29
                0.8    59.56    60.68    61.92     58.65        57.93
                0.9    93.86    94.97    96.06     93.65        93.33
Table 4. Averages of the OLS ($\bar{W}_1$) and GLS ($\bar{W}_2$) parts of the weight vector for HRCp-GLSMA.

            R² =  0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
Homoscedastic cases
Nonp. OLS        0.50   0.50   0.50   0.50   0.49   0.49   0.49   0.49   0.49
Nonp. GLS        0.50   0.50   0.50   0.50   0.51   0.51   0.51   0.51   0.51
MLE OLS          0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50
MLE GLS          0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50
Mild heteroscedastic cases
Nonp. OLS        0.49   0.49   0.49   0.49   0.49   0.49   0.49   0.49   0.49
Nonp. GLS        0.51   0.51   0.51   0.51   0.51   0.51   0.51   0.51   0.51
MLE OLS          0.49   0.49   0.49   0.49   0.49   0.49   0.50   0.50   0.50
MLE GLS          0.51   0.51   0.51   0.51   0.51   0.51   0.50   0.50   0.50
Strong heteroscedastic cases
Nonp. OLS        0.48   0.48   0.48   0.48   0.48   0.48   0.48   0.48   0.48
Nonp. GLS        0.52   0.52   0.52   0.52   0.52   0.52   0.52   0.52   0.52
MLE OLS          0.49   0.49   0.49   0.49   0.49   0.49   0.49   0.49   0.49
MLE GLS          0.51   0.51   0.51   0.51   0.51   0.51   0.51   0.51   0.51
Table 5. Averages of the OLS ($\bar{W}_1$) and GLS ($\bar{W}_2$) parts of the weight vector for MMA-GLSMA.

            R² =  0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9
Homoscedastic cases
Nonp. OLS        0.50   0.49   0.49   0.49   0.49   0.49   0.49   0.49   0.49
Nonp. GLS        0.50   0.51   0.51   0.51   0.51   0.51   0.51   0.51   0.51
MLE OLS          0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50
MLE GLS          0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50
Mild heteroscedastic cases
Nonp. OLS        0.50   0.49   0.49   0.49   0.49   0.49   0.49   0.49   0.49
Nonp. GLS        0.50   0.51   0.51   0.51   0.51   0.51   0.51   0.51   0.51
MLE OLS          0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50
MLE GLS          0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50
Strong heteroscedastic cases
Nonp. OLS        0.50   0.50   0.49   0.49   0.49   0.49   0.49   0.49   0.49
Nonp. GLS        0.50   0.50   0.51   0.51   0.51   0.51   0.51   0.51   0.51
MLE OLS          0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50
MLE GLS          0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50   0.50
