Article

Revisiting the Spatial Autoregressive Exponential Model for Counts and Other Nonnegative Variables, with Application to the Knowledge Production Function

ISEG-Lisbon School of Economics and Management, Universidade de Lisboa, 1200-781 Lisboa, Portugal
* Author to whom correspondence should be addressed.
Sustainability 2021, 13(5), 2843; https://doi.org/10.3390/su13052843
Submission received: 29 November 2020 / Revised: 22 February 2021 / Accepted: 3 March 2021 / Published: 5 March 2021
(This article belongs to the Special Issue Spatial Econometrics Analysis of Sustainability)

Abstract

This paper proposes a two-step pseudo-maximum likelihood estimator of a spatial autoregressive exponential model for counts and other nonnegative variables; it is particularly useful for dealing with zeros. It considers a model specification allowing us to easily determine the direct and indirect partial effects of explanatory variables (spatial spillovers and externalities). A simulation study shows that this method generally behaves better in terms of bias and root mean square error than existing procedures. An empirical example estimating a knowledge production function for the NUTS II European regions is analyzed. Results show that there is spatial dependence between regions on the creation of innovation, where regions less able to transform R&D expenses into innovation benefit from knowledge spatial spillovers through indirect effects. It is also concluded that the socioeconomic environment is important and that, unlike public R&D institutions, private companies are efficient at knowledge production.

1. Introduction

Many empirical applications with spatial data concern the modeling of counts and other nonnegative response variables. Examples are the modeling of trade flows, migration flows, patent citations and patent creation, number of crashes, firm location and firm birth, number of new patients contracting a given disease, etc. Conventional practice is to log-transform the dependent variable in order to apply the well-known spatial linear models. This is the approach followed in [1] to model the interregional trade of goods at the NUTS3 level in Spain, in [2] to explain labor migration flows in China, and in [3] to investigate the effect of intraregional labor mobility on the production of knowledge in Europe, to give just a few examples. However, Silva and Tenreyro [4], in the context of cross-sectional data, note that modeling log-transformed variables with a linear model may lead to biased estimates when heteroscedasticity is present, or to distortions in parameter estimates caused by the need to add a constant to zero observations. The authors propose using the Poisson pseudo-maximum likelihood (PPML) estimator of the model for the untransformed variables as an alternative to ordinary least squares (OLS) estimation of the loglinear model.
Spatial autoregressive models are popular for addressing spatial dependence. Elhorst [5] discusses the relevance of such models in recent applied spatial econometrics. One reason is that they quantify indirect spatial spillovers, as pointed out in [6,7]. While linear spatial autoregressive models are widely used in the literature, nonlinear spatial autoregressive models, namely models for counts or other nonnegative variables, are less popular because of the complexity of their estimation and of the derivation of marginal effects and spatial externalities (see, for example, Ref. [8], containing a review of the state of the art in spatial econometrics).
A spatial autoregressive Poisson model for counts was first introduced by Besag [9]. His specification includes the spatially lagged count variable in the exponential function that gives the Poisson conditional mean. However, this approach has a severe limitation: it accommodates only negative spatial dependence, to prevent the count process from being explosive. Attempts to overcome this limitation were made in [10], which introduces a Winsorized Poisson, and in [11], which uses a spatial filter with judiciously selected eigenvectors as regressors. However, these procedures may be computationally demanding and/or lack interpretability, namely because marginal effects and spatial externalities are hard or impossible to calculate: the model is not invertible and, consequently, there is no reduced form for the dependent variable as a function of the so-called Leontief inverse or spatial multiplier $(I - \rho W)^{-1}$ (with $\rho$ the spatial autocorrelation coefficient and $W$ a spatial weighting matrix).
A promising approach is introduced in [12] to model Poisson-distributed counts: the conditional mean is specified as a function of the logarithm of the spatially weighted conditional means of the neighbors (instead of the spatially lagged count variable as in [9]), which ensures invertibility; the authors call this model the SAR Poisson. This specification is easy to estimate and allows a simple calculation of marginal effects and spatial spillovers. However, because estimation involves the logarithm of the count outcomes, observations that are zero have to be dealt with.
Other approaches are based on Bayesian hierarchical modeling and Markov random fields, considering conditional autoregressive schemes for spatial errors like in [13,14]. These models are typically estimated by Bayesian Markov Chain Monte Carlo (MCMC), which is computationally very burdensome and does not allow for the calculation of spatial externalities. See [15] for an application of a Poisson Bayesian hierarchical model to patent citation flows, estimated with MCMC methods. A comprehensive review of spatial econometric models for count data is given in [16].
This paper proposes a PPML estimation procedure for spatial autoregressive exponential regression (SAR-E) that circumvents the problem of dealing with observations that are zero. The model specification is based on the idea of [12]. In a simulation study, the estimation procedure introduced in this work shows better properties in finite samples than the existing procedures, especially when the spatial autocorrelation coefficient is not close to one. The determination of partial effects is emphasized and the indirect effects are deduced, which enhances the understanding of regional linkages.
This work suggests applying this approach to model counts (not necessarily Poisson distributed) or, in general, other nonnegative variables, extending the applicability of [4] to spatial data.
To illustrate the usefulness of the approach introduced, an empirical study that estimates a knowledge production function (KPF) and knowledge spillovers across European regions is carried out, modeling the number of new patents per million inhabitants. Despite the existence of many applications that estimate KPFs, to our knowledge only [17] uses a nonlinear SAR approach, namely a Bayesian spatial Tobit regression estimated by MCMC. The application presented here is the first to estimate a spatial autoregressive exponential model for counts, which circumvents the problem of regions with zero patents and simultaneously determines knowledge externalities across regions.

2. The SAR-E Regression

2.1. Model Specification and Partial Effects

This work proposes to model an outcome that is a count or other nonnegative variable, showing spatial dependence, by the following spatial autoregressive exponential specification of the conditional mean (SAR-E regression), which is based on the spatial lag model of counts of [12]:
$$E(y \mid X) = \mu = \exp\left(\rho W \log(\mu) + X\beta\right), \qquad (1)$$
where $y$ is the vector of observations of the dependent variable for $n$ spatial locations, $\mu$ is the vector of conditional means of $y$, $X$ is a matrix with observations of $k$ explanatory variables for the $n$ locations, $W$ is a spatial weighting matrix, and $\beta$ and $\rho$ are unknown coefficients to be estimated. Observe that, according to Equation (1), the conditional mean of location $i$, $\mu_i$, is determined as a function of the characteristics of location $i$, through the observed values of the explanatory variables, and of a weighted average of the logarithms of the conditional means of neighboring locations.
Equation (1) serves three purposes. Firstly, it expresses the conditional expectation of a nonnegative and, in particular, a count variable. Observe that count variables are often assumed to have a Poisson distribution, whose conditional mean is commonly specified as an exponential function of a set of explanatory variables. Secondly, it incorporates the spatial dependence of the data by means of an autoregressive term, extending the well-known SAR or spatial lag linear model to the nonlinear context. Finally, it is invertible, which allows us to easily calculate the partial effects of variables and, in particular, to analyze global spatial interactions between regions by identifying spatial spillovers and externalities.
The reduced form of Equation (1) is:
$$E(y \mid X) = \mu = \exp\left[(I - \rho W)^{-1} X\beta\right]. \qquad (2)$$
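A quick numerical check can make the step from Equation (1) to Equation (2) concrete. The sketch below assumes NumPy and a hypothetical four-unit ring of neighbors (the weights, covariate values and coefficients are illustrative, not from the study); it computes the reduced-form mean and confirms that it satisfies the structural form.

```python
import numpy as np

def sar_e_mean(X, beta, W, rho):
    """Reduced-form SAR-E conditional mean, Equation (2):
    mu = exp[(I - rho W)^{-1} X beta]."""
    n = W.shape[0]
    # Solve (I - rho W) eta = X beta rather than forming the inverse explicitly.
    eta = np.linalg.solve(np.eye(n) - rho * W, X @ beta)
    return np.exp(eta)

# Hypothetical toy example: a 4-unit ring, each unit with two neighbours.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
W = W / W.sum(axis=1, keepdims=True)          # row-normalize
X = np.column_stack([np.ones(4), [0.2, -1.0, 0.5, 1.3]])
beta = np.array([0.1, 0.5])
rho = 0.4

mu = sar_e_mean(X, beta, W, rho)
# The reduced form satisfies the structural form, Equation (1).
assert np.allclose(mu, np.exp(rho * W @ np.log(mu) + X @ beta))
```

Taking logs of Equation (1) gives $(I - \rho W)\log(\mu) = X\beta$, which is exactly the linear system solved above.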
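The partial effects of the explanatory variables are deduced from Equation (2), leading to

$$\frac{\partial \mu}{\partial x_k'} = \begin{bmatrix} \dfrac{\partial \mu_1}{\partial x_{1k}} & \cdots & \dfrac{\partial \mu_1}{\partial x_{nk}} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial \mu_n}{\partial x_{1k}} & \cdots & \dfrac{\partial \mu_n}{\partial x_{nk}} \end{bmatrix} = \beta_k \, \mu_{DIAG} \, (I - \rho W)^{-1}, \qquad (3)$$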
where $\mu_{DIAG}$ is a diagonal matrix of order $n$ with elements $\mu_i$. Observe that Equation (3) is an $n \times n$ matrix of partial effects, where the elements on the main diagonal are the direct effects of the kth explanatory variable and the off-diagonal elements are the indirect effects.
Considering that $A = (I - \rho W)^{-1}$, with elements $a_{ij}$, the direct partial effects in region $i$ are equal to

$$\frac{\partial \mu_i}{\partial x_{ik}} = \beta_k \, a_{ii} \, \mu_i, \quad i = 1, \ldots, n, \qquad (4)$$
and measure the change in the expected outcome of a given location due to a one-unit variation of the kth explanatory variable in that same location.
When the spatial weighting matrix is row-normalized, indirect effects can be divided into spill-in and spill-out spatial spillovers. The spill-in spillover measures the cumulative sum of spatial spillovers that location $i$ receives from all neighboring locations, that is, the sum of the expected impacts on the outcome of location $i$ due to a one-unit variation of the kth explanatory variable in each neighboring location $j$, and can be calculated as follows:
$$\text{spill-in}_i = \beta_k \sum_{j=1,\, j \neq i}^{n} a_{ij} \, \mu_i, \quad i = 1, \ldots, n, \qquad (5)$$
which is the cumulative sum of all off-diagonal elements in row i of Equation (3).
The spill-out spillover effect is the sum of all spatial spillovers that location $i$ transfers to neighboring locations, that is, the sum of the expected impacts on the outcome of each location $j$ neighboring $i$ when the kth explanatory variable in location $i$ varies by one unit. It is equal to
$$\text{spill-out}_i = \beta_k \sum_{j=1,\, j \neq i}^{n} a_{ji} \, \mu_j, \quad i = 1, \ldots, n, \qquad (6)$$
or, equivalently, the sum of all off-diagonal elements in column i of Equation (3).
Each region has a direct partial effect, a spill-in spillover and a spill-out spillover. The values analyzed in empirical applications are usually the averages of each of these effects over all regions, constituting, respectively, the average direct partial effect, the average spill-in spatial spillover (Aspill-in), and the average spill-out spatial spillover (Aspill-out), that is,
$$\text{Aspill-in} = \frac{1}{n}\sum_{i=1}^{n} \text{spill-in}_i, \qquad \text{Aspill-out} = \frac{1}{n}\sum_{i=1}^{n} \text{spill-out}_i.$$
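The effect decomposition in Equations (3) to (6) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function name, the five-unit contiguity matrix and the coefficient values are hypothetical. It builds the full matrix of Equation (3) and reads direct effects from its diagonal, spill-in effects from off-diagonal row sums, and spill-out effects from off-diagonal column sums.

```python
import numpy as np

def sar_e_partial_effects(X, beta, W, rho, k):
    """Partial effects of the k-th column of X in the SAR-E model (sketch)."""
    n = W.shape[0]
    A = np.linalg.inv(np.eye(n) - rho * W)   # spatial multiplier (I - rho W)^{-1}
    mu = np.exp(A @ (X @ beta))              # reduced-form mean, Equation (2)
    M = beta[k] * mu[:, None] * A            # Equation (3): M[i, j] = d mu_i / d x_jk
    direct = np.diag(M).copy()               # Equation (4): beta_k a_ii mu_i
    indirect = M - np.diag(direct)           # keep only the off-diagonal elements
    spill_in = indirect.sum(axis=1)          # Equation (5): row sums
    spill_out = indirect.sum(axis=0)         # Equation (6): column sums
    return M, direct, spill_in, spill_out

# Hypothetical 5-unit example with a symmetric contiguity pattern, row-normalized.
C = np.array([[0, 1, 0, 0, 1],
              [1, 0, 1, 0, 0],
              [0, 1, 0, 1, 1],
              [0, 0, 1, 0, 1],
              [1, 0, 1, 1, 0]], dtype=float)
W = C / C.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(5), [0.3, -0.2, 1.1, 0.0, 0.6]])
beta = np.array([0.2, 0.5])
M, direct, spill_in, spill_out = sar_e_partial_effects(X, beta, W, rho=0.5, k=1)
avg_direct, a_spill_in, a_spill_out = direct.mean(), spill_in.mean(), spill_out.mean()
```

Because both averages divide the same total of off-diagonal elements by $n$, Aspill-in and Aspill-out coincide numerically; the spill-in/spill-out distinction matters at the level of individual regions.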

2.2. Estimation

When the dependent variable is a count with a Poisson distribution, the full information maximum likelihood (FIML) estimator of the reduced form in Equation (2) is derived in [12]. For variables that are not Poisson distributed, which include some types of counts and other general nonnegative variables, those results can be used in a Poisson pseudo-maximum likelihood (PPML) context, assuming that the conditional mean is correctly specified according to Equation (1). Since the seminal work of [18], PPML has become popular for model estimation because it extends the technique of maximum likelihood to situations where the conditional distribution of the outcome does not need to be specified, although its conditional expectation has to be an exponential function of a linear index. The idea is to use the Poisson probability function to build the likelihood function, even if the outcome is not Poisson distributed, requiring only that its expectation coincides with the expectation of a Poisson-distributed variable. Because a misspecified distribution is used to define the likelihood function, the covariance matrix of the estimator and, in particular, the standard errors need to be estimated with a robust estimator. Silva and Tenreyro [4] popularized PPML for estimating the gravity model, which is a particular case of exponential regression.
The PPML approach proposed here amounts to estimating the unknown coefficients by FIML and the respective standard errors with a robust estimator, to safeguard them against variance misspecification, as in situations where there is overdispersion. However, the authors of [12] report severe difficulties in obtaining numerical solutions for the FIML estimates. Therefore, they recommend instead a limited information maximum likelihood (LIML) two-step estimator. The first step delivers an estimate of the unknown variable $W\log(\mu)$, obtained with an OLS regression of $W\log(y)$ on the set of regressors $X$, $WX$ and $W^2X$. In the second step, a Poisson regression is performed with regressors $\widehat{W\log(y)}$ and $X$. An expression for the adjusted covariance matrix of the second step is given in [12].
This paper proposes a two-step procedure that extends and refines the method described above in two ways. First, it extends the estimation to a pseudo-maximum likelihood framework in order to encompass the modeling of a vast set of outcomes. This approach requires additional care in the estimation of the covariance matrix in the second step. Second, it proposes a different estimation procedure for the first step that circumvents the problem of observations that are zero. Therefore, the following two-step PPML procedure to estimate the SAR-E model in Equation (1) is recommended:
  • Run a PPML regression of $y$ on $X$, $WX$ and $W^2X$ and calculate the predicted values $\hat{y}$.
  • Run a PPML regression of $y$ on $W\log(\hat{y})$ and $X$.
Observe that the second step of [12] uses $\widehat{W\log(y)}$, the fitted values of the variable $W\log(y)$, while the second step of the proposed method uses $W\log(\hat{y})$, which is a transformation of the fitted values $\hat{y}$ of the variable $y$.
Standard errors in the second step should take into consideration the pseudo-maximum likelihood framework, where the Poisson variance may be misspecified, and should account for the sampling variation in the generated regressor $W\log(\hat{y})$. To address both issues, the use of bootstrap standard errors is recommended in the second step. This procedure is easy to implement because it requires only software with a command for Poisson regression and bootstrap standard errors, such as STATA [19]. (For nonnegative outcomes other than counts, we advise using the command “glm” in STATA with the option “family(poisson)” instead of the command “poisson”.)
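The two-step procedure can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the authors' implementation. The PPML fit uses a damped Newton-Raphson on the Poisson pseudo-log-likelihood. The first column of `X` is assumed to be an intercept, and its spatial lags are dropped because, with a row-normalized `W`, they duplicate the constant. The bootstrap is a hypothetical pairs scheme that re-runs both steps on each resample so that the sampling variation of $W\log(\hat{y})$ enters the standard errors; the text does not specify the resampling scheme. All function names and the simulated data are illustrative.

```python
import numpy as np

def ppml(y, X, tol=1e-10, max_iter=200):
    """Poisson pseudo-maximum likelihood by damped Newton-Raphson.
    Assumes the first column of X is an intercept and that mean(y) > 0."""
    def pseudo_loglik(b):                      # Poisson log-likelihood up to a constant
        eta = X @ b
        return y @ eta - np.exp(eta).sum()
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())                 # start from the intercept-only fit
    ll = pseudo_loglik(beta)
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)
        info = X.T @ (X * mu[:, None])
        step = np.linalg.solve(info, score)
        t = 1.0
        while pseudo_loglik(beta + t * step) < ll and t > 1e-8:
            t /= 2                             # step halving keeps the objective from decreasing
        beta = beta + t * step
        ll = pseudo_loglik(beta)
        if np.max(np.abs(t * step)) < tol:
            break
    return beta

def first_step_design(X, W):
    """Regressors X, WX and W^2X (lags of the intercept column omitted)."""
    Z = X[:, 1:]
    return np.column_stack([X, W @ Z, W @ (W @ Z)])

def sar_ppml_two_step(y, X, W):
    X1 = first_step_design(X, W)
    g = ppml(y, X1)                            # step 1: PPML of y on X, WX, W^2X
    X2 = np.column_stack([X, W @ (X1 @ g)])    # W log(y_hat), since log(y_hat) = X1 g
    return ppml(y, X2)                         # step 2: last coefficient estimates rho

def bootstrap_se(y, X, W, B=199, seed=0):
    """Hypothetical pairs bootstrap re-running both steps on each resample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    X1 = first_step_design(X, W)
    draws = np.empty((B, X.shape[1] + 1))
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        g = ppml(y[idx], X1[idx])
        X2 = np.column_stack([X, W @ (X1 @ g)])   # regenerated regressor for all n units
        draws[b] = ppml(y[idx], X2[idx])
    return draws.std(axis=0, ddof=1)

# Illustrative data: 5-nearest-neighbour W on random points, rho = 0.4.
rng = np.random.default_rng(1)
n = 300
pts = rng.uniform(size=(n, 2))
d = np.linalg.norm(pts[:, None] - pts[None, :], axis=2)
np.fill_diagonal(d, np.inf)
W = np.zeros((n, n))
W[np.arange(n)[:, None], np.argsort(d, axis=1)[:, :5]] = 1.0
W /= W.sum(axis=1, keepdims=True)
X = np.column_stack([np.ones(n), rng.normal(1, np.sqrt(2), n), rng.binomial(1, 0.5, n)])
mu = np.exp(np.linalg.solve(np.eye(n) - 0.4 * W, X @ np.array([0.0, 0.5, 0.5])))
y = rng.poisson(mu).astype(float)
coef = sar_ppml_two_step(y, X, W)              # [intercept, beta_1, beta_2, rho]
```

Writing $W\log(\hat{y})$ as $W X_1 g$ avoids taking logs of anything other than fitted values, which is why zeros in $y$ pose no problem in this version of the first step.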

3. Simulation Study

In the simulation study, the two-step estimator introduced in Section 2, whose first step is a PPML (maximum likelihood) regression (SAR-PPML 1stStep-ML), is compared with the two-step estimator of [12], whose first step is an OLS regression (SAR-PPML 1stStep-OLS), and with the aspatial PPML estimator. For SAR-PPML 1stStep-OLS, whenever the calculations require the logarithm of the outcome, an ad hoc constant equal to 1 was added to observations that were zero. Simulations were performed with R [20].

3.1. Simulation Design

The simulation design closely follows that of [12], which is in turn closely related to other spatial econometric experimental designs, such as those in [21,22,23].
The random dependent variable is generated as $y_i \sim \text{Poisson}(\mu_i)$ with $\mu_i = \exp(A_i X\beta)$, where $A_i$ is the ith row of $(I - \rho W)^{-1}$. The design matrix includes two covariates, $X_1$ and $X_2$, where the first was randomly generated from a normal distribution with mean and variance equal to 1 and 2, respectively. Since econometric studies usually incorporate a mix of continuous and dummy variables, following [4], $X_2$ is a dummy variable randomly generated from the Bernoulli distribution with a mean equal to 0.5.
The study considers three alternative spatial weighting matrices. They were calculated using the same two-step procedure found in other spatial econometrics simulation studies (see, e.g., [24]). First, n spatial units are randomly drawn within the unit square. Second, a matrix W0 is constructed according to a given criterion and normalized by rows, so that the sum of all elements in each row is 1. In the present study, two different criteria were used, resulting in three alternative spatial weighting matrices. W1 and W3 are intended to replicate matrices generated with a contiguity criterion, with neighbors chosen by nearest-neighbor distance: for W1 each unit has seven neighbors (the seven closest units), while for W3 the number of neighbors is four, which is close to the average number of neighbors observed in the empirical study included in the next section. W2, on the other hand, is created based on an inverse distance criterion, using the Euclidean distance between units. The matrix W2 is denser than W1, since it contains more nonzero entries, and W1 is in turn denser than W3.
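The weighting matrices and one draw of the data-generating process can be sketched as follows. This is a NumPy sketch under assumptions: the function names, the seed and the choice n = 250 are illustrative, ties in distances are ignored, and the inverse distance matrix uses all other units as neighbors (no cutoff), which the text does not state explicitly. The coefficients follow the design in the text: intercept 0 and $\beta_1 = \beta_2 = 0.5$.

```python
import numpy as np

def knn_weights(pts, k):
    """Row-normalized k-nearest-neighbour matrix (W1 with k=7, W3 with k=4)."""
    n = len(pts)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # a unit is not its own neighbour
    W = np.zeros((n, n))
    nearest = np.argsort(d, axis=1)[:, :k]
    W[np.arange(n)[:, None], nearest] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def inverse_distance_weights(pts):
    """Row-normalized inverse Euclidean distance matrix (W2), dense by construction."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # 1/inf = 0 on the diagonal
    W = 1.0 / d
    return W / W.sum(axis=1, keepdims=True)

def simulate_sample(W, rho, beta=(0.0, 0.5, 0.5), rng=None):
    """One draw: X1 ~ N(1, variance 2), X2 ~ Bernoulli(0.5),
    y_i ~ Poisson(mu_i) with mu = exp[(I - rho W)^{-1} X beta]."""
    rng = np.random.default_rng(rng)
    n = W.shape[0]
    X = np.column_stack([np.ones(n),
                         rng.normal(1.0, np.sqrt(2.0), n),   # sd = sqrt(variance)
                         rng.binomial(1, 0.5, n)])
    mu = np.exp(np.linalg.solve(np.eye(n) - rho * W, X @ np.asarray(beta)))
    return rng.poisson(mu), X

rng = np.random.default_rng(2021)
pts = rng.uniform(size=(250, 2))                 # n units drawn in the unit square
W1, W2, W3 = knn_weights(pts, 7), inverse_distance_weights(pts), knn_weights(pts, 4)
y, X = simulate_sample(W1, rho=0.4, rng=rng)
```

Solving the linear system instead of inverting $(I - \rho W)$ gives the same $\mu$ and is cheaper when the experiment is repeated many times.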
Monte Carlo simulations were conducted for each design of W and for each of the three estimators described above. The sample size, n, varied over the set [100; 250; 500; 750; 1000], and the spatial autoregressive parameter, ρ , varied over the set [0; 0.2; 0.4; 0.6; 0.8]. The parameters associated with variables X 1 and X 2 , β 1 and β 2 , respectively, were held fixed at 0.5. The intercept was set to zero.
For each experiment, 1000 replications were used. This is the usual number of replications in Monte Carlo studies with spatial data (see [12,21,22,23,24,25], among others). The bias was calculated as the average, over the 1000 replications, of the difference between the estimated value of the coefficient in each simulated sample and the respective true value. The root mean square error (RMSE) was calculated for each estimated coefficient as the square root of the sum of the squared bias and the empirical variance over the 1000 replications.
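The two summary measures can be written as a small helper. This is a sketch that assumes the empirical variance divides by the number of replications R (NumPy's default); the text does not say which divisor was used. Under that assumption, $\text{bias}^2 + \text{variance}$ equals the mean squared deviation from the true value.

```python
import numpy as np

def bias_and_rmse(estimates, true_value):
    """Monte Carlo bias and RMSE of one coefficient across R replications.
    RMSE = sqrt(bias^2 + empirical variance); variance divides by R (assumption)."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    var = est.var()                    # ddof=0: divides by the number of replications
    return bias, np.sqrt(bias**2 + var)

# Toy usage with three hypothetical estimates of a coefficient whose true value is 0.5
bias, rmse = bias_and_rmse([0.4, 0.6, 0.7], 0.5)
```

For this toy input the bias is 0.2/3 and the RMSE is $\sqrt{0.02}$, the same value as the root of the average squared error $(0.01 + 0.01 + 0.04)/3$.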

3.2. Monte Carlo Results

It should be noted that the results obtained with W1 are quite similar to those obtained with W3. This suggests that the estimators are not sensitive to the density of the spatial weighting matrix when a contiguity criterion is used. For this reason, the analysis of the remaining results focuses only on experiments using the W1 and W2 matrices. The results for W3 can be found in Table A5 and Table A6 of Appendix A.
Table A1 and Table A3 in Appendix A show the bias of the estimated coefficients for each estimation method, considering the spatial weighting matrix based, respectively, on the contiguity criterion, W1, and on the inverse distance criterion, W2. Both SAR-PPML estimators show similar and quite satisfactory results, with SAR-PPML 1stStep-ML presenting a lower absolute bias for low and medium levels of spatial dependence, while SAR-PPML 1stStep-OLS appears to behave better for values of ρ closer to 1. It is worth noting that both estimators have a lower absolute bias when estimating the coefficient of the continuous variable than that of the dummy variable. Note also that, as ρ increases, both estimators present a smaller absolute bias when using matrix W2 than when using matrix W1. Nevertheless, this difference is negligible, especially for large n. Finally, the aspatial PPML estimator shows progressively worse results as ρ increases, as expected, while being slightly better than the SAR methods when there is no spatial dependence (ρ = 0).
Concerning the bias of the spatial autoregressive coefficient, ρ, SAR-PPML 1stStep-ML performs better overall than the remaining estimators, especially when n is large. However, for ρ = 0.8 it shows a higher absolute bias, particularly for the W2 matrix. Although slightly worse than the former, SAR-PPML 1stStep-OLS presents satisfactory results, namely for high levels of spatial dependence. In general, using a spatial weighting matrix based on the inverse distance between locations produces a higher bias when estimating the spatial autoregressive parameter.
Table A2 and Table A4 in Appendix A show the RMSE results. Overall, and regarding β1, SAR-PPML 1stStep-ML presents the best results, particularly for W1. However, SAR-PPML 1stStep-OLS produces a more desirable set of results for higher values of ρ. For both estimators, the RMSE decreases as the spatial dependence and the sample size increase. This result is only slightly altered when ρ = 0.8. As expected, the aspatial PPML estimator only shows satisfactory results when ρ = 0. As for β2, the conclusions are quite similar to those for β1, with the caveat that the RMSEs for this coefficient are much higher, especially when the sample size is small. Estimations involving the W1 matrix have slightly better results. Lastly, the aspatial estimator is, again, quite far from the results of the other estimators.
Both SAR-PPML estimators present quite similar results regarding the RMSE values for the estimation of the coefficient of spatial dependence, ρ , with the SAR-PPML 1stStep-ML showing better results as the sample size increases. It is also important to note that the SAR-PPML 1stStep-ML exhibits a higher RMSE for matrix W2 for high levels of spatial dependence when compared to SAR-PPML 1stStep-OLS. However, in general, the use of W1 seems to trigger better results.
In summary, these results are in line with those obtained in other simulation studies such as [4,12,22,24,25], suggesting the following conclusions. First, the SAR-PPML 1stStep-ML estimator presents the best performance, except for high spatial dependence (ρ = 0.8). Keep in mind, however, that most empirical applications yield low to medium estimates of the spatial dependence parameter. Since this estimator does not rely on a logarithmic transformation of the dependent variable and uses a PPML regression instead of a loglinear estimation in the first step, this result seems to agree with that found by [4]. Another interesting result is the higher distortion of the estimated coefficient of the dummy variable compared with that of the continuous variable, suggesting that the distribution of the explanatory variables can condition the performance of the estimators, a conclusion also reached by [12]. Other conclusions shared across studies are that the RMSE decreases as the spatial dependence and the sample size increase, and that the criterion used to build the spatial weighting matrix influences the results. Several studies have already addressed this issue, such as [24], whose authors found that the RMSE of coefficient estimators appears to be generally higher for the spatial weighting matrix based on inverse distance, suggesting that the variance of the estimated coefficients may somehow be related to the density of the chosen spatial weighting matrix. Another expected conclusion was the poor performance of the aspatial PPML estimator in the presence of spatial dependence, which presented a marked upward bias for the coefficients of X1 and X2. This result agrees with [26], who found biased and inconsistent estimators when spatial dependence was not taken into consideration.
In addition, it is interesting to note that the distortion of results is more significant for values of ρ near 1, which is in line with the results of [22].
To assess the performance of both estimators under misspecification, a new design was considered where X1 shows spatial dependence instead of being i.i.d. Therefore, X1 was simulated according to the following spatial autoregressive process:
$$X_1 = (I - 0.5\, W_1)^{-1} \varepsilon, \quad \text{with } \varepsilon \sim N(0, I).$$
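Generating this spatially autocorrelated regressor amounts to solving one linear system. The NumPy sketch below assumes a hypothetical seven-nearest-neighbor matrix built on 200 random points to stand in for W1; the function names and seeds are illustrative only.

```python
import numpy as np

def knn_weights(pts, k=7):
    """Row-normalized k-nearest-neighbour weights (stand-in for W1)."""
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # exclude self-neighbours
    W = np.zeros_like(d)
    W[np.arange(len(pts))[:, None], np.argsort(d, axis=1)[:, :k]] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def spatially_autocorrelated_x(W, rho_x=0.5, seed=None):
    """X1 = (I - rho_x W)^{-1} eps with eps ~ N(0, I), solved as a linear system."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal(W.shape[0])
    return np.linalg.solve(np.eye(W.shape[0]) - rho_x * W, eps)

pts = np.random.default_rng(0).uniform(size=(200, 2))
W1 = knn_weights(pts, k=7)
x1 = spatially_autocorrelated_x(W1, rho_x=0.5, seed=42)
```

An estimator that treats `x1` as i.i.d., as in the misspecification experiment, simply uses it as an ordinary column of the design matrix.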
The other variables were generated as before, with the coefficients retaining the same values. Estimation was carried out as if X1 were i.i.d. (ignoring that it is spatially autocorrelated). The results obtained for 1000 replications, considering the spatial weighting matrix W1, are included in Table A7 for bias and in Table A8 for RMSE, while Table A9 and Table A10 show, respectively, the bias and RMSE when the spatial weighting matrix is W2. Results show that ignoring spatial autocorrelation in the explanatory variable leads to noticeably higher bias and RMSE in the estimation of all parameters, especially in the estimation of the spatial autocorrelation coefficient. Both estimators show similar performance in estimating the coefficient of X1, whether the spatial matrix is based on the nearest neighbor criterion (W1) or on inverse distance (W2). The new estimator introduced, SAR-PPML 1stStep-ML, shows better performance than SAR-PPML 1stStep-OLS for the coefficient of X2 when the spatial matrix is W1. The improvement in performance of SAR-PPML 1stStep-ML over SAR-PPML 1stStep-OLS is especially visible in the estimation of ρ for both spatial weighting matrices.

4. Empirical Application

This section illustrates the usefulness of the SAR-E regression introduced in Section 2 by an empirical example that estimates a knowledge production function to explain the creation of innovation in European regions. For the sake of comparison, the estimator of [12] is also calculated.
Following the arguments of [27], the number of patents in a given region per million inhabitants (Pat) is used as a proxy for knowledge creation. See also [28] for a discussion on measuring innovation. The equation to be estimated is
$$E(Pat_i \mid x_i) = \mu_i = \exp\left(\rho \sum_{j=1,\, j \neq i}^{n} w_{ij} \log(\mu_j) + x_i \beta\right), \qquad (7)$$
where $x_i$ is a vector of explanatory variables that will be introduced in Section 4.1, $\beta$ is a vector of unknown coefficients to be estimated, $\rho$ is the unknown spatial autocorrelation coefficient, and $w_{ij}$ are the elements of a spatial weighting matrix. In this empirical application, the spatial weighting matrix is based on a queen contiguity criterion and is row-normalized. All estimations were conducted using R [20]. The exploratory data analysis was performed using QGIS [29] and GeoDa [30].

4.1. Data and Variables

The data were collected from Eurostat regional statistics. They cover 234 NUTS II regions from 24 European countries, of which 22 belong to the European Union, with the addition of the United Kingdom and Norway. NUTS is a nomenclature of territorial units for statistics developed and regulated by the European Union, defining a hierarchical system of regions with three levels below the national one. At the top of the hierarchy are the NUTS 0 regions, referring to countries. The next level is NUTS 1, representing major socioeconomic regions within countries, followed by NUTS 2 regions, which are subdivisions of NUTS 1, and NUTS 3 regions, which are subdivisions of NUTS 2. All data refer to 2012. Regions with no neighbors (such as the Portuguese and French islands) were excluded. Finally, the NUTS II regions London (UK) and Centre (France) were discarded due to data inconsistencies. The list of countries in the database is in Appendix B.
The description of variables used in this study, together with the expected outcome of the estimated associated coefficient, can be found in Table 1.
Since [31] introduced the knowledge production function, the use of R&D-related variables has become standard when modeling the creation of innovation. Following [27,28,32,33,34], different impacts on knowledge creation of R&D expenditure and human resources were considered according to their source (the private business sector, government, or universities). More R&D expenditure, as well as more full-time R&D employees, is expected to trigger an increase in knowledge creation. Therefore, the estimated coefficients related to these variables are expected to be positive. However, the literature suggests that this happens only for the R&D resources of the private sector. For both the public sector and universities, the effect of those variables often appears to be negative or statistically negligible (see, e.g., [27,33,35]). In the case of universities, this behavior may be explained by the fact that their main contribution to knowledge creation takes the form of scientific articles rather than patents, while for the public sector it may be due to a certain inefficiency of public institutions in the production of knowledge (see [33,34]).
Three variables aiming to capture the effect of the “innovative environment” are considered. The first is the percentage of graduates in the population between 25 and 65 years old, proxying the level of education of the population in the region. The second is GDP per capita, which proxies the technological sophistication and the size of the economy. Finally, the third is the tuberculosis mortality rate, considered as a proxy for the level of poverty of the inhabitants, as several studies relate tuberculosis with poverty (see, for example, [33]). It is expected that a better socioeconomic environment will boost innovation (as in [33]). In addition, the number of inhabitants was defined as the control variable. Table 2 includes the descriptive statistics of the variables used in this study. Additionally, we note that 6% of the regions in the sample registered no patents.
The correlation matrix of these variables is shown in Table A11 of Appendix B. Pairwise correlations between explanatory variables do not exceed the common threshold of 0.8, as recommended in [36], which leads us to not anticipate collinearity problems in estimation.

4.2. Exploratory Spatial Analysis

Analyzing the quartile map of the variable Pat in Figure A1 in Appendix B, we see a cluster effect, with patenting concentrated in Central Europe, Southern England, and Scandinavia, while the number of new patents in Southern and Eastern Europe is modest. In addition, Moran’s I test for spatial autocorrelation, applied to patents, gives a test statistic equal to 0.6045, with a p-value of 0.000, indicating positive spatial dependence. This conclusion is supported by the Moran scatterplot (Figure A2 in Appendix B). In the latter, most observations lie in the 1st and 3rd quadrants; therefore, most regions with a higher (lower) number of new patents have neighboring regions where this number is also higher (lower). The LISA indicators in Figure A3 of Appendix B reveal two high-patent clusters, in Central Europe and Scandinavia, together with low-patent clusters in the Iberian Peninsula and Eastern Europe. Two other clusters where patenting tends to be low can also be identified: northern Britain and southern Italy. Figure A4 of Appendix B shows the LISA significance map, indicating that the results are more significant for the Central European, Iberian Peninsula, and Eastern European clusters.
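For reference, Moran's I statistic is $I = \frac{n}{S_0}\frac{z'Wz}{z'z}$, where $z$ holds deviations from the mean and $S_0$ is the sum of all weights. The NumPy sketch below computes it. It also computes a one-sided permutation pseudo p-value, which is one common inference approach (GeoDa, for instance, uses permutations). The text does not state how the reported p-value was obtained, so the permutation step is an assumption, and the function names and toy data are hypothetical.

```python
import numpy as np

def morans_i(x, W):
    """Moran's I = (n / S0) * (z' W z) / (z' z), with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    n, s0 = len(x), W.sum()                  # S0 = n when W is row-normalized
    return (n / s0) * (z @ W @ z) / (z @ z)

def moran_permutation_pvalue(x, W, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation (assumed scheme)."""
    rng = np.random.default_rng(seed)
    obs = morans_i(x, W)
    sims = np.array([morans_i(rng.permutation(x), W) for _ in range(n_perm)])
    return obs, (1 + np.sum(sims >= obs)) / (n_perm + 1)

# Toy example: four units on a line (0-1-2-3) with binary contiguity weights
W_line = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]], dtype=float)
I_line = morans_i([1.0, 2.0, 3.0, 4.0], W_line)
obs, p_value = moran_permutation_pvalue([1.0, 2.0, 3.0, 4.0], W_line, n_perm=199)
```

In the toy example a smooth trend along the line gives $I = (4/6)(2.5/5) = 1/3$, a positive value, as expected for values that resemble their neighbors.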

4.3. Estimation Results

Equation (7) is estimated with the proposed SAR-PPML 1stStep-ML estimator. For comparison, the results obtained with the alternative estimator from [12], the SAR-PPML 1stStep-OLS, are also presented.
In the SAR-PPML 1stStep-OLS, an ad hoc constant (c = 1) is added to the patent count when it is 0, so that its logarithm is defined.
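The two procedures compared in Table 3 can be sketched as follows. The sketch assumes Python with NumPy, a row-standardized weights matrix W, and first-step regressors Z = [1, X, WX, W²X]. The first-step regressor set, the Newton-Raphson PPML routine, the simulated design, and all function names are assumptions for illustration, not the authors' code. The sketch returns point estimates only; as the text notes, standard errors must account for the first-step fit.

```python
import numpy as np


def ppml(y, Z, tol=1e-8, max_iter=200):
    """Poisson pseudo-maximum likelihood by Newton-Raphson with step halving.

    y may be any nonnegative variable, zeros included; Z[:, 0] must be a constant.
    """
    def pseudo_loglik(b):
        eta = Z @ b
        with np.errstate(over="ignore"):
            return np.sum(y * eta - np.exp(eta))

    b = np.zeros(Z.shape[1])
    b[0] = np.log(y.mean())                      # intercept-only starting value
    ll = pseudo_loglik(b)
    for _ in range(max_iter):
        mu = np.exp(Z @ b)
        score = Z.T @ (y - mu)
        hessian = Z.T @ (Z * mu[:, None])
        step = np.linalg.solve(hessian, score)
        t = 1.0
        while True:                              # halve the step until the objective does not fall
            b_new = b + t * step
            ll_new = pseudo_loglik(b_new)
            if ll_new >= ll or t < 1e-10:
                break
            t /= 2.0
        converged = np.max(np.abs(b_new - b)) < tol
        b, ll = b_new, ll_new
        if converged:
            break
    return b


def two_step_sar_ppml(y, X, W, first_step="ml", c=1.0):
    """Two-step estimator of log(mu) = b0 + X beta + rho * W log(mu).

    Step 1 approximates W log(mu) from Z = [1, X, WX, W^2 X] (assumed regressors):
    "ml" fits a PPML of y on Z and lags the fitted log-mean; "ols" regresses
    W log(y), with y = 0 replaced by c, on Z. Step 2 is a PPML of y on
    [1, X, W log(mu)_hat]. Returns (b0, beta, rho).
    """
    ones = np.ones((len(y), 1))
    WX = W @ X
    Z = np.hstack([ones, X, WX, W @ WX])
    if first_step == "ml":
        w_log_mu = W @ (Z @ ppml(y, Z))
    elif first_step == "ols":
        w_log_y = W @ np.log(np.where(y > 0, y, c))
        coef, *_ = np.linalg.lstsq(Z, w_log_y, rcond=None)
        w_log_mu = Z @ coef
    else:
        raise ValueError("first_step must be 'ml' or 'ols'")
    est = ppml(y, np.hstack([ones, X, w_log_mu[:, None]]))
    return est[0], est[1:-1], est[-1]


# Hypothetical simulated design: 4 nearest neighbours, rho = 0.3, one regressor.
rng = np.random.default_rng(42)
n, k, rho, b0, beta = 2000, 4, 0.3, 0.5, 1.0
coords = rng.uniform(size=(n, 2))
sq = (coords ** 2).sum(axis=1)
d = sq[:, None] + sq[None, :] - 2.0 * coords @ coords.T   # squared distances
np.fill_diagonal(d, np.inf)
W = np.zeros((n, n))
W[np.arange(n)[:, None], np.argsort(d, axis=1)[:, :k]] = 1.0 / k   # row-standardized
X = rng.normal(size=(n, 1))
log_mu = np.linalg.solve(np.eye(n) - rho * W, b0 + beta * X[:, 0])
y = rng.poisson(np.exp(log_mu)).astype(float)
print("share of zeros:", np.mean(y == 0))
print("1stStep-ML :", two_step_sar_ppml(y, X, W, "ml"))
print("1stStep-OLS:", two_step_sar_ppml(y, X, W, "ols"))
```

The PPML first step uses the zeros directly. The OLS variant must replace them by c before taking logs, which is the ad hoc step described above.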
Table 3 reports the coefficient estimates of the knowledge production function, together with their bootstrap standard errors. Both estimators use, as an explanatory variable, a fitted value obtained in the first step, so the usual standard errors are not valid. The proposed SAR-PPML 1stStep-ML performs better in terms of goodness of fit, with a noticeably higher log-likelihood.
For both estimations, the coefficient of the spatially lagged variable is positive and significant (p-value < 0.01). This indicates clear positive spatial dependence between regions in innovation creation, consistent with the results of [34,37].
As for the remaining explanatory variables, R&D_B is significant at 1% in all estimations. In contrast, R&D_U is not significant. As mentioned before, this can be explained by the fact that university contributions mostly take the form of scientific articles rather than patents. R&D_G is significant at 10% in the SAR-PPML 1stStep-ML, but with a negative sign. These results are consistent with those of [32,33,34], which also find evidence of inefficiency in the use of public-sector R&D resources. These authors also conclude that private-sector R&D expenditure is more important for triggering knowledge creation than public-sector or university R&D expenditure.
Among the "innovative environment" variables, Educ is not statistically significant, nor is the control variable Pop. In contrast, GDP is statistically significant at 1% in both estimations, with a positive sign. Finally, the mortality rate is significant at 5% only in the SAR-PPML 1stStep-ML, with a negative sign. These results are in line with expectations. Greater technological sophistication is generally associated with lower levels of poverty and a higher quality of life, which foster the growth of innovation in a region. The results corroborate studies such as [28,33], whose authors conclude that an "innovative environment" is important for increasing knowledge creation.
Because the model is nonlinear, its coefficients do not directly measure the impact of the explanatory variables on the dependent variable. This impact, ceteris paribus, is quantified through the average partial effects (APE). Given the autoregressive structure of the model, the indirect partial effects (that is, the spatial externalities) can be measured together with the direct ones. These are reported in Table 4.
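The partial effects can be derived from the reduced form of the model, assuming the SAR-E specification is log μ = ρW log μ + Xβ. This form is consistent with the spatially lagged log-mean W log(μ) in Note (4) of Appendix A. The symbols S(ρ) and x_{jk} (regressor k in region j) are notation introduced for this sketch, and the paper's exact averaging of the indirect effects may differ. The last line back-solves the per-euro direct effect implied by the R&D_G figure reported below (−2.25 patents for 10 euros per capita).

```latex
\begin{align*}
\log\mu &= \rho W \log\mu + X\beta
  \;\Longrightarrow\; \log\mu = S(\rho)\,X\beta,
  \qquad S(\rho) = (I_n - \rho W)^{-1},\\
\frac{\partial \mu_i}{\partial x_{jk}} &= \mu_i\,[S(\rho)]_{ij}\,\beta_k,\\
\mathrm{ADE}_k &= \frac{1}{n}\sum_{i=1}^{n} \mu_i\,[S(\rho)]_{ii}\,\beta_k,\\
\text{spill-in}_{ik} &= \sum_{j \neq i} \mu_i\,[S(\rho)]_{ij}\,\beta_k,
  \qquad
  \text{spill-out}_{jk} = \sum_{i \neq j} \mu_i\,[S(\rho)]_{ij}\,\beta_k,\\
\Delta\bar{\mu} &\approx \mathrm{ADE}_k\,\Delta x_k,
  \qquad \text{e.g. } 10 \times (-0.225) = -2.25 .
\end{align*}
```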
Consider the average direct effects obtained with the SAR-PPML 1stStep-ML. An increase of 1 percentage point in the tuberculosis mortality rate in a region results, on average, in a drop of 20.4 patents per million inhabitants in that region. In contrast, an increase in GDP per capita of just 100 euros in a region may trigger an average increase of 0.4 patents per million inhabitants in that region.
The R&D expenditure variables may yield the results of most interest to economic decision makers. An increase of 10 euros per capita in a region's public R&D entities means, on average, a decrease of 2.25 patents per million inhabitants in that region, although the respective coefficient estimate is statistically significant only at 10%. Given this apparent inefficiency, a policy maker may transfer financial resources from these institutions to private R&D companies. For each increase of 10 euros per capita in R&D expenses, these companies trigger an increase of approximately one patent per million inhabitants in the same region. Figure A5 in Appendix B maps, by quartile, the SAR-Poisson 1stStep-ML direct partial effect (DPE) of the variable R&D_B. The map shows that the regions with the companies most efficient at transforming R&D expenses into patents are located in Central Europe, southern Great Britain, and Scandinavia. Therefore, regions in Eastern and Southern Europe should initiate reforms of their private R&D systems, seeking an increase in efficiency. These reforms require the recruitment of more qualified personnel and investment in more sophisticated technology.
As for the indirect effects, the variables related to the "innovative environment" have larger indirect than direct effects in absolute value. This shows that not only the region's own socioeconomic situation but also the interregional environment is central to the creation of knowledge.
Concerning the R&D expenditure variables, investment in government R&D institutions also does not benefit neighboring regions in the knowledge creation process, since both the spill-out and spill-in effects are negative. In contrast, investment in private R&D in one region has a positive impact on neighboring regions. An increase of 10 euros per inhabitant in private R&D expenditure in all neighboring regions of i results in an increase of 1.74 new patents in region i (spill-in). Conversely, an increase of 10 euros per inhabitant in private R&D expenditure in region i results in an average increase of 1.68 new patents across all its neighboring regions (spill-out). These facts highlight the presence of knowledge spillovers between regions. Figures A6 and A7 in Appendix B map, by quartile, the spill-in and spill-out effects, respectively, of the variable R&D_B obtained with the SAR-PPML 1stStep-ML. Besides the Central European cluster, which shows a strong relationship in knowledge creation, regions in Southern and Eastern Europe and some regions in Southern England have a remarkable capacity for absorbing innovation. Regarding the spill-out effects, the Central European and Scandinavian clusters are the biggest "exporters" of knowledge spillovers. Interestingly, some regions with a lower DPE of investment in private R&D, such as those in Eastern Europe and the north of the United Kingdom, present higher spill-out and spill-in values. Therefore, one may conclude that, despite their lower capacity for innovation, these regions are strongly interconnected, which leads to high levels of knowledge spillover. One possible explanation is a commitment of companies to strong interregional cooperation links, so that investment in one company is positively reflected in the others.
These links can be a strategy to overcome the difficulty of competing alone against regions with high numbers of patents. Therefore, political and economic decision makers in regions with lower patent capabilities should create incentives for knowledge-sharing networks, thus enabling increased competitiveness.
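The regional effects mapped in Figures A5 to A7 can be sketched as below. The sketch assumes Python with NumPy and the SAR-E form log μ = ρW log μ + Xβ, with effects evaluated at the model-implied μ. It defines spill-in and spill-out as the off-diagonal row and column sums of the matrix of partial derivatives. These definitions follow the verbal descriptions above rather than the authors' code, and the function name and toy data are hypothetical. Under these definitions the cross-region means of spill-in and spill-out coincide, so the sketch does not reproduce the distinction between the two averages reported in the text.

```python
import numpy as np


def spatial_partial_effects(X, W, beta, rho, k):
    """Regional direct, spill-in and spill-out effects of regressor k in the SAR-E
    model log(mu) = rho * W log(mu) + X beta, evaluated at the model-implied mu.

    M[i, j] = d mu_i / d x_jk = mu_i * S[i, j] * beta_k, with S = (I - rho W)^-1.
    """
    n = W.shape[0]
    S = np.linalg.inv(np.eye(n) - rho * W)
    mu = np.exp(S @ (X @ beta))
    M = mu[:, None] * S * beta[k]
    direct = np.diag(M).copy()
    off = M - np.diag(direct)
    spill_in = off.sum(axis=1)    # effect on i of a unit change in x_k in every other region
    spill_out = off.sum(axis=0)   # effect on all other regions of a unit change in x_k in j
    return direct, spill_in, spill_out


# Toy example with three mutually neighbouring regions (hypothetical values).
W = np.array([[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
X = np.array([[1.0, 0.2], [1.0, 0.5], [1.0, 1.0]])   # constant and one regressor
beta, rho = np.array([0.1, 0.8]), 0.4
direct, spill_in, spill_out = spatial_partial_effects(X, W, beta, rho, k=1)
print("average direct effect of a 10-unit change:", 10 * direct.mean())
print("regional spill-in:", spill_in)
print("regional spill-out:", spill_out)
```

The full n × n inverse is affordable for a few hundred regions such as the 234 NUTS II regions used here. For much larger samples, solving linear systems column by column would be preferable.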

5. Conclusions

Many applications in spatial econometrics concern the modeling of count outcomes or other nonnegative variables. This work proposes modeling such variables with a spatial autoregressive exponential (SAR-E) regression instead of SAR log-linear models, in line with the reasoning of [4] for cross-sectional data. A two-step PPML procedure for the SAR-E model is suggested that circumvents the problem of dealing with zeros. A simulation study shows that the proposed estimator generally performs better than the previous estimation procedures for all the sample sizes considered. This holds especially when the autoregressive coefficient is not close to 1, which is the case for many applications with economic data.
The usefulness of the proposed approach is illustrated in an empirical application that analyzes the main determinants of knowledge creation and quantifies the spatial knowledge spillovers across European NUTS II regions. The application finds evidence of spatial dependence in the creation of innovation in Europe. In addition, social and economic factors, such as quality-of-life standards and technological sophistication, are found to determine knowledge creation. Public R&D institutions appear to be inefficient, unlike private institutions, which are the major promoters of innovation creation in the analyzed regions. An increase in R&D expenditure by private institutions also positively influences the creation of innovation in neighboring regions. These results suggest that regions with low levels of knowledge creation try to overcome this limitation by strengthening relationships with neighboring regions. In doing so, they increase their absorptive capacity for innovation and create strong clusters of knowledge sharing.
In the empirical study, there were noteworthy differences between the two-step PPML method introduced in this paper to estimate the SAR-E and the existing method of [12]. These concerned both the statistical significance and the magnitude of some coefficient estimates, namely the autoregressive parameter. Differences in the latter explain the visible differences between the two methods in the indirect effects of the variables (the spill-in and spill-out spillovers), with the proposed method yielding spillovers that are higher in absolute value. These differences are not unexpected, because the response variable contains a non-negligible share of zeros (6% of the regions registered no patents). Given the results of the simulation study and the fact that the method introduced here handles zeros in the dependent variable better, it is expected to deliver better estimates.

Author Contributions

Conceptualization, I.P.; methodology, I.P. and L.G.; software, L.G.; validation, L.G.; formal analysis, L.G.; investigation, I.P. and L.G.; data curation, L.G.; writing—original draft preparation, I.P. and L.G.; writing—review and editing, I.P. and L.G.; visualization, L.G.; supervision, I.P. All authors have read and agreed to the published version of the manuscript.

Funding

I.P. was partially supported by the Project CEMAPRE/REM UIDB/05069/2020, financed by FCT/MCTES through national funds.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

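Table A1. Bias: SAR-Poisson, SAR-LogLinear, and Aspatial ML Poisson with W1 for 1000 replicates.

ML: SAR-Poisson 1stStep-ML; OLS: SAR-Poisson 1stStep-OLS; Aspatial: Aspatial Poisson ML. The number after each label is the sample size n; Rho is the true value of the autoregressive parameter.

β1 (W1)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 | Aspatial 100 | Aspatial 250 | Aspatial 500 | Aspatial 750 | Aspatial 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | −0.0028 | −0.0011 | −0.0007 | 0.0000 | 0.0000 | −0.0011 | −0.0010 | −0.0006 | −0.0003 | −0.0003 | 0.0003 | −0.0004 | −0.0005 | 0.0000 | 0.0000 |
| 0.2 | −0.0013 | 0.0000 | −0.0007 | 0.0001 | −0.0004 | −0.0047 | −0.0041 | −0.0044 | 0.0040 | −0.0036 | 0.0311 | 0.0312 | 0.0312 | 0.0310 | 0.0308 |
| 0.4 | −0.0006 | 0.0000 | −0.0006 | 0.0002 | 0.0002 | −0.0032 | −0.0047 | −0.0046 | −0.0044 | −0.0046 | 0.0891 | 0.0868 | 0.0872 | 0.0872 | 0.0870 |
| 0.6 | 0.0003 | 0.0007 | 0.0015 | 0.0012 | 0.0015 | −0.0013 | −0.0019 | −0.0020 | −0.0022 | −0.0020 | 0.2021 | 0.1979 | 0.1988 | 0.1969 | 0.1971 |
| 0.8 | 0.0049 | 0.0046 | 0.0040 | 0.0032 | 0.0043 | 0.0021 | 0.0014 | 0.0017 | 0.0005 | 0.0004 | 0.4957 | 0.4846 | 0.4886 | 0.4771 | 0.4861 |

β2 (W1)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 | Aspatial 100 | Aspatial 250 | Aspatial 500 | Aspatial 750 | Aspatial 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | −0.0062 | −0.0028 | 0.0026 | 0.0004 | 0.0003 | −0.0105 | −0.0021 | −0.0010 | −0.0015 | −0.0015 | −0.0064 | −0.0004 | −0.0003 | −0.0004 | −0.0007 |
| 0.2 | −0.0016 | 0.0012 | 0.0014 | −0.0004 | 0.0003 | −0.0132 | −0.0118 | −0.0156 | −0.0138 | −0.0130 | 0.1018 | 0.1035 | 0.1028 | 0.1027 | 0.1036 |
| 0.4 | 0.0024 | 0.0009 | 0.0031 | −0.0036 | 0.0043 | −0.0143 | −0.0182 | −0.0188 | −0.0191 | −0.0179 | 0.2737 | 0.2768 | 0.2760 | 0.2789 | 0.2799 |
| 0.6 | 0.0017 | 0.0023 | 0.0040 | −0.0001 | 0.0027 | −0.0052 | −0.0051 | −0.0097 | −0.0094 | −0.0098 | 0.6201 | 0.6327 | 0.6391 | 0.6363 | 0.6383 |
| 0.8 | −0.0015 | −0.0022 | −0.0058 | 0.0077 | −0.0078 | 0.0072 | 0.0066 | 0.0065 | 0.0017 | 0.0017 | 1.7530 | 1.7635 | 1.7964 | 1.8182 | 1.8423 |

Rho (W1)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0042 | 0.0015 | 0.0005 | −0.0012 | −0.0008 | 0.0037 | 0.0018 | −0.0003 | 0.0018 | 0.0014 |
| 0.2 | 0.0012 | −0.0025 | 0.0006 | −0.0003 | 0.0003 | 0.0079 | 0.0105 | 0.0149 | 0.0131 | 0.0119 |
| 0.4 | −0.0036 | −0.0014 | −0.0012 | −0.0023 | −0.0027 | 0.0169 | 0.0234 | 0.0246 | 0.0247 | 0.0249 |
| 0.6 | −0.0002 | −0.0004 | −0.0018 | −0.0016 | −0.0010 | 0.0160 | 0.0181 | 0.0204 | 0.0208 | 0.0209 |
| 0.8 | 0.0047 | 0.0039 | 0.0048 | 0.0050 | 0.0050 | 0.0018 | 0.0034 | 0.0042 | 0.0061 | 0.0082 |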
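Table A2. RMSE: SAR-Poisson, SAR-LogLinear, and Aspatial ML Poisson with W1 for 1000 replicates.

ML: SAR-Poisson 1stStep-ML; OLS: SAR-Poisson 1stStep-OLS; Aspatial: Aspatial Poisson ML. The number after each label is the sample size n; Rho is the true value of the autoregressive parameter.

β1 (W1)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 | Aspatial 100 | Aspatial 250 | Aspatial 500 | Aspatial 750 | Aspatial 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0242 | 0.0154 | 0.0102 | 0.0083 | 0.0069 | 0.0270 | 0.0163 | 0.0117 | 0.0090 | 0.0077 | 0.0171 | 0.0174 | 0.0122 | 0.0099 | 0.0084 |
| 0.2 | 0.0251 | 0.0152 | 0.0107 | 0.0087 | 0.0071 | 0.0271 | 0.0165 | 0.0119 | 0.0092 | 0.0083 | 0.0424 | 0.0353 | 0.0330 | 0.0321 | 0.0316 |
| 0.4 | 0.0223 | 0.0137 | 0.0102 | 0.0077 | 0.0070 | 0.0228 | 0.0141 | 0.0106 | 0.0089 | 0.0080 | 0.0946 | 0.0893 | 0.0885 | 0.0880 | 0.0877 |
| 0.6 | 0.0176 | 0.0106 | 0.0080 | 0.0065 | 0.0061 | 0.0181 | 0.0113 | 0.0079 | 0.0065 | 0.0056 | 0.2122 | 0.2027 | 0.2017 | 0.1989 | 0.1987 |
| 0.8 | 0.0161 | 0.0134 | 0.0124 | 0.0113 | 0.0106 | 0.0174 | 0.0155 | 0.0123 | 0.0120 | 0.0118 | 0.5380 | 0.5063 | 0.5029 | 0.4887 | 0.4954 |

β2 (W1)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 | Aspatial 100 | Aspatial 250 | Aspatial 500 | Aspatial 750 | Aspatial 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.1034 | 0.0649 | 0.0454 | 0.0357 | 0.0314 | 0.1085 | 0.0648 | 0.0495 | 0.0382 | 0.0330 | 0.0716 | 0.0752 | 0.0564 | 0.0449 | 0.0382 |
| 0.2 | 0.0912 | 0.0567 | 0.0401 | 0.0333 | 0.0287 | 0.0999 | 0.0631 | 0.0437 | 0.0357 | 0.0316 | 0.1588 | 0.1263 | 0.1132 | 0.1089 | 0.1089 |
| 0.4 | 0.0811 | 0.0486 | 0.0353 | 0.0300 | 0.0261 | 0.0849 | 0.0531 | 0.0411 | 0.0354 | 0.0309 | 0.2967 | 0.2854 | 0.2800 | 0.2819 | 0.2821 |
| 0.6 | 0.0606 | 0.0395 | 0.0305 | 0.0277 | 0.0238 | 0.0643 | 0.0403 | 0.0292 | 0.0242 | 0.0215 | 0.6483 | 0.6454 | 0.6465 | 0.6423 | 0.6432 |
| 0.8 | 0.0602 | 0.0548 | 0.0558 | 0.0502 | 0.0476 | 0.0589 | 0.0487 | 0.0427 | 0.0404 | 0.0428 | 1.9053 | 1.8426 | 1.8515 | 1.8604 | 1.8784 |

Rho (W1)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.1112 | 0.0676 | 0.0474 | 0.0383 | 0.0320 | 0.1191 | 0.0734 | 0.0532 | 0.0398 | 0.0357 |
| 0.2 | 0.0792 | 0.0483 | 0.0330 | 0.0276 | 0.0239 | 0.0873 | 0.0542 | 0.0395 | 0.0318 | 0.0279 |
| 0.4 | 0.0512 | 0.0289 | 0.0214 | 0.0173 | 0.0149 | 0.0560 | 0.0390 | 0.0332 | 0.0307 | 0.0294 |
| 0.6 | 0.0237 | 0.0148 | 0.0122 | 0.0110 | 0.0097 | 0.0311 | 0.0244 | 0.0237 | 0.0230 | 0.0224 |
| 0.8 | 0.0142 | 0.0118 | 0.0106 | 0.0097 | 0.0094 | 0.0152 | 0.0146 | 0.0131 | 0.0137 | 0.0162 |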
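Table A3. Bias: SAR-Poisson, SAR-LogLinear, and Aspatial ML Poisson with W2 for 1000 replicates.

ML: SAR-Poisson 1stStep-ML; OLS: SAR-Poisson 1stStep-OLS; Aspatial: Aspatial Poisson ML. The number after each label is the sample size n; Rho is the true value of the autoregressive parameter.

β1 (W2)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 | Aspatial 100 | Aspatial 250 | Aspatial 500 | Aspatial 750 | Aspatial 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | −0.0006 | −0.0010 | 0.0001 | −0.0001 | 0.0003 | −0.0001 | −0.0010 | 0.0001 | −0.0002 | 0.0002 | −0.0008 | −0.0004 | 0.0001 | 0.0000 | 0.0001 |
| 0.2 | −0.0017 | −0.0002 | −0.0001 | −0.0005 | −0.0004 | −0.0019 | −0.0007 | −0.0004 | −0.0006 | −0.0005 | 0.0290 | 0.0291 | 0.0285 | 0.0285 | 0.0286 |
| 0.4 | 0.0003 | 0.0002 | −0.0006 | 0.0003 | −0.0002 | −0.0007 | −0.0002 | −0.0008 | 0.0001 | −0.0003 | 0.0769 | 0.0761 | 0.0754 | 0.0750 | 0.0740 |
| 0.6 | 0.0012 | −0.0005 | −0.0003 | 0.0001 | −0.0003 | 0.0008 | −0.0004 | −0.0002 | 0.0002 | −0.0001 | 0.1711 | 0.1662 | 0.1609 | 0.1609 | 0.1592 |
| 0.8 | 0.0025 | −0.0015 | −0.0016 | −0.0017 | −0.0015 | 0.0003 | 0.0002 | 0.0000 | 0.0000 | 0.0001 | 0.4010 | 0.3873 | 0.3743 | 0.3630 | 0.3577 |

β2 (W2)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 | Aspatial 100 | Aspatial 250 | Aspatial 500 | Aspatial 750 | Aspatial 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | −0.0012 | 0.0010 | −0.0027 | 0.0005 | −0.0004 | −0.0005 | 0.0010 | −0.0029 | −0.0009 | −0.0005 | −0.0785 | −0.0014 | −0.0011 | 0.0004 | −0.0005 |
| 0.2 | −0.0038 | −0.0021 | −0.0011 | −0.0003 | 0.0001 | −0.0058 | −0.0035 | −0.0019 | −0.0003 | −0.0003 | −0.0520 | 0.0991 | 0.0998 | 0.1022 | 0.1009 |
| 0.4 | 0.0031 | 0.0020 | −0.0003 | −0.0003 | 0.0008 | −0.0006 | −0.0002 | −0.0012 | −0.0003 | 0.0002 | −0.0219 | 0.2671 | 0.2663 | 0.2682 | 0.2670 |
| 0.6 | 0.0045 | 0.0018 | 0.0013 | −0.0007 | 0.0005 | −0.0007 | 0.0010 | 0.0015 | 0.0009 | 0.0008 | 0.0143 | 0.5905 | 0.6000 | 0.6034 | 0.6027 |
| 0.8 | 0.0397 | 0.0172 | 0.0109 | −0.0069 | 0.0081 | 0.0015 | 0.0006 | −0.0001 | 0.0000 | 0.0004 | 0.0364 | 1.6308 | 1.6558 | 1.6638 | 1.6922 |

Rho (W2)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | −0.0152 | −0.0020 | 0.0001 | −0.0010 | −0.0017 | −0.0082 | 0.0007 | 0.0018 | 0.0010 | −0.0006 |
| 0.2 | −0.0033 | 0.0006 | 0.0007 | 0.0009 | 0.0015 | −0.0122 | −0.0108 | −0.0115 | −0.0121 | −0.0119 |
| 0.4 | −0.0038 | −0.0009 | 0.0024 | 0.0002 | 0.0007 | −0.0064 | −0.0030 | 0.0000 | −0.0026 | −0.0016 |
| 0.6 | 0.0124 | 0.0072 | 0.0040 | 0.0021 | 0.0026 | 0.0126 | 0.0144 | 0.0152 | 0.0152 | 0.0154 |
| 0.8 | 0.0501 | 0.0341 | 0.0236 | 0.0166 | 0.0143 | 0.0047 | 0.0044 | 0.0045 | 0.0045 | 0.0043 |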
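Table A4. RMSE: SAR-Poisson, SAR-LogLinear, and Aspatial ML Poisson with W2 for 1000 replicates.

ML: SAR-Poisson 1stStep-ML; OLS: SAR-Poisson 1stStep-OLS; Aspatial: Aspatial Poisson ML. The number after each label is the sample size n; Rho is the true value of the autoregressive parameter.

β1 (W2)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 | Aspatial 100 | Aspatial 250 | Aspatial 500 | Aspatial 750 | Aspatial 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0387 | 0.0232 | 0.0165 | 0.0142 | 0.0114 | 0.0402 | 0.0236 | 0.0167 | 0.0143 | 0.0114 | 0.0276 | 0.0178 | 0.0120 | 0.0095 | 0.0082 |
| 0.2 | 0.0363 | 0.0207 | 0.0151 | 0.0121 | 0.0110 | 0.0360 | 0.0207 | 0.0151 | 0.0121 | 0.0110 | 0.0386 | 0.0327 | 0.0300 | 0.0296 | 0.0294 |
| 0.4 | 0.0304 | 0.0187 | 0.0136 | 0.0103 | 0.0088 | 0.0300 | 0.0186 | 0.0136 | 0.0103 | 0.0088 | 0.0820 | 0.0781 | 0.0763 | 0.0757 | 0.0745 |
| 0.6 | 0.0241 | 0.0134 | 0.0093 | 0.0076 | 0.0066 | 0.0233 | 0.0134 | 0.0093 | 0.0076 | 0.0065 | 0.1784 | 0.1694 | 0.1625 | 0.1621 | 0.1603 |
| 0.8 | 0.0305 | 0.0156 | 0.0123 | 0.0102 | 0.0092 | 0.0100 | 0.0056 | 0.0038 | 0.0031 | 0.0026 | 0.4256 | 0.4019 | 0.3816 | 0.3692 | 0.3626 |

β2 (W2)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 | Aspatial 100 | Aspatial 250 | Aspatial 500 | Aspatial 750 | Aspatial 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.1487 | 0.0914 | 0.0669 | 0.0541 | 0.0488 | 0.1520 | 0.0924 | 0.0674 | 0.0543 | 0.0489 | 0.1224 | 0.0797 | 0.0551 | 0.0436 | 0.0394 |
| 0.2 | 0.1368 | 0.0843 | 0.0624 | 0.0495 | 0.0421 | 0.1376 | 0.0845 | 0.0625 | 0.0496 | 0.0422 | 0.1512 | 0.1238 | 0.1109 | 0.1095 | 0.1075 |
| 0.4 | 0.1182 | 0.0758 | 0.0515 | 0.0412 | 0.0362 | 0.1186 | 0.0761 | 0.0515 | 0.0411 | 0.0361 | 0.2866 | 0.2732 | 0.2714 | 0.2691 | 0.2709 |
| 0.6 | 0.0902 | 0.0527 | 0.0387 | 0.0302 | 0.0274 | 0.0894 | 0.0529 | 0.0385 | 0.0300 | 0.0273 | 0.6054 | 0.6064 | 0.6061 | 0.6050 | 0.6043 |
| 0.8 | 0.1310 | 0.0613 | 0.0402 | 0.0270 | 0.0277 | 0.0376 | 0.0216 | 0.0156 | 0.0126 | 0.0105 | 1.6870 | 1.6898 | 1.6781 | 1.7039 | 1.7125 |

Rho (W2)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.2196 | 0.1283 | 0.0934 | 0.0803 | 0.0687 | −0.0082 | 0.0007 | 0.0018 | 0.0010 | −0.0006 |
| 0.2 | 0.1419 | 0.0860 | 0.0636 | 0.0484 | 0.0420 | −0.0122 | −0.0108 | −0.0115 | −0.0121 | −0.0119 |
| 0.4 | 0.0814 | 0.0513 | 0.0360 | 0.0287 | 0.0245 | −0.0064 | −0.0030 | 0.0000 | −0.0026 | −0.0016 |
| 0.6 | 0.0571 | 0.0377 | 0.0264 | 0.0187 | 0.0157 | 0.0126 | 0.0144 | 0.0152 | 0.0152 | 0.0154 |
| 0.8 | 0.0840 | 0.0493 | 0.0290 | 0.0276 | 0.0245 | 0.0047 | 0.0044 | 0.0045 | 0.0045 | 0.0043 |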
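Table A5. Bias: SAR-Poisson, SAR-LogLinear, and Aspatial ML Poisson with W3 for 1000 replicates.

ML: SAR-Poisson 1stStep-ML; OLS: SAR-Poisson 1stStep-OLS; Aspatial: Aspatial Poisson ML. The number after each label is the sample size n; Rho is the true value of the autoregressive parameter.

β1 (W3)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 | Aspatial 100 | Aspatial 250 | Aspatial 500 | Aspatial 750 | Aspatial 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | −0.0013 | −0.0009 | −0.0006 | 0.0000 | 0.0000 | −0.0015 | −0.0004 | −0.0002 | −0.0007 | 0.0002 | −0.0008 | −0.0004 | −0.0004 | 0.0000 | 0.0001 |
| 0.2 | −0.0002 | −0.0003 | 0.0001 | −0.0003 | 0.0002 | −0.0068 | −0.0057 | −0.0049 | −0.0051 | −0.0049 | 0.0334 | 0.0320 | 0.0328 | 0.0327 | 0.0327 |
| 0.4 | 0.0018 | 0.0031 | 0.0030 | 0.0033 | 0.0031 | −0.0074 | −0.0067 | −0.0061 | −0.0066 | −0.0059 | 0.0945 | 0.0955 | 0.0961 | 0.0950 | 0.0954 |
| 0.6 | 0.0053 | 0.0055 | 0.0049 | 0.0045 | 0.0076 | −0.0043 | −0.0036 | −0.0040 | −0.0047 | −0.0041 | 0.2162 | 0.2243 | 0.2249 | 0.2249 | 0.2227 |
| 0.8 | 0.0333 | 0.0256 | 0.0271 | 0.0269 | 0.0258 | 0.0024 | 0.0008 | 0.0008 | −0.0021 | −0.0002 | 0.5565 | 0.5744 | 0.5710 | 0.5781 | 0.5701 |

β2 (W3)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 | Aspatial 100 | Aspatial 250 | Aspatial 500 | Aspatial 750 | Aspatial 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | −0.0016 | −0.0020 | 0.0028 | −0.0006 | 0.0001 | −0.0069 | −0.0032 | −0.0038 | −0.0011 | −0.0013 | −0.0014 | −0.0011 | 0.0034 | −0.0005 | 0.0000 |
| 0.2 | 0.0020 | 0.0014 | 0.0053 | 0.0043 | 0.0041 | −0.0173 | −0.0166 | −0.0191 | −0.0190 | −0.0184 | 0.1017 | 0.1032 | 0.1052 | 0.1047 | 0.1046 |
| 0.4 | 0.0175 | 0.0206 | 0.0198 | 0.0221 | 0.0212 | −0.0227 | −0.0236 | −0.0241 | −0.0259 | −0.0267 | 0.2822 | 0.2859 | 0.2864 | 0.2869 | 0.2874 |
| 0.6 | 0.0382 | 0.0308 | 0.0296 | 0.0263 | 0.0248 | −0.0144 | −0.0147 | −0.0169 | −0.0191 | −0.0196 | 0.6469 | 0.6572 | 0.6612 | 0.6640 | 0.6653 |
| 0.8 | 0.0978 | 0.0919 | 0.0982 | 0.0490 | 0.0311 | 0.0067 | 0.0086 | −0.0031 | −0.0136 | −0.0153 | 1.8195 | 1.9117 | 1.9474 | 1.9597 | 1.9623 |

Rho (W3)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0006 | 0.0000 | 0.0002 | −0.0013 | −0.0002 | 0.0043 | 0.0006 | 0.0018 | 0.0015 | −0.0005 |
| 0.2 | −0.0008 | 0.0015 | −0.0006 | 0.0007 | 0.0002 | 0.0193 | 0.0199 | 0.0215 | 0.0227 | 0.0219 |
| 0.4 | −0.0050 | −0.0086 | −0.0081 | −0.0088 | −0.0090 | 0.0303 | 0.0329 | 0.0336 | 0.0358 | 0.0346 |
| 0.6 | 0.0002 | −0.0041 | −0.0048 | −0.0059 | −0.0114 | 0.0236 | 0.0247 | 0.0275 | 0.0296 | 0.0288 |
| 0.8 | 0.0118 | 0.0060 | 0.0003 | −0.0034 | −0.0015 | 0.0114 | 0.0112 | 0.0158 | 0.0239 | 0.0231 |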
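Table A6. RMSE: SAR-Poisson, SAR-LogLinear, and Aspatial ML Poisson with W3 for 1000 replicates.

ML: SAR-Poisson 1stStep-ML; OLS: SAR-Poisson 1stStep-OLS; Aspatial: Aspatial Poisson ML. The number after each label is the sample size n; Rho is the true value of the autoregressive parameter.

β1 (W3)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 | Aspatial 100 | Aspatial 250 | Aspatial 500 | Aspatial 750 | Aspatial 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0315 | 0.0198 | 0.0134 | 0.0109 | 0.0090 | 0.0358 | 0.0212 | 0.0148 | 0.0119 | 0.0100 | 0.0276 | 0.0178 | 0.0119 | 0.0095 | 0.0082 |
| 0.2 | 0.0326 | 0.0198 | 0.0133 | 0.0117 | 0.0099 | 0.0356 | 0.0212 | 0.0147 | 0.0118 | 0.0109 | 0.0427 | 0.0367 | 0.0345 | 0.0339 | 0.0336 |
| 0.4 | 0.0316 | 0.0197 | 0.0139 | 0.0118 | 0.0107 | 0.0324 | 0.0199 | 0.0144 | 0.0119 | 0.0106 | 0.1015 | 0.0983 | 0.0976 | 0.0959 | 0.0962 |
| 0.6 | 0.0271 | 0.0174 | 0.0125 | 0.0103 | 0.0134 | 0.0266 | 0.0160 | 0.0119 | 0.0104 | 0.0087 | 0.2288 | 0.2300 | 0.2284 | 0.2275 | 0.2246 |
| 0.8 | 0.0655 | 0.0502 | 0.0472 | 0.0455 | 0.0439 | 0.0494 | 0.0361 | 0.0327 | 0.0344 | 0.0324 | 0.6065 | 0.6061 | 0.5963 | 0.5972 | 0.5870 |

β2 (W3)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 | Aspatial 100 | Aspatial 250 | Aspatial 500 | Aspatial 750 | Aspatial 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.1356 | 0.0855 | 0.0592 | 0.0474 | 0.0413 | 0.1408 | 0.0865 | 0.0648 | 0.0509 | 0.0433 | 0.1224 | 0.0797 | 0.0540 | 0.0436 | 0.0394 |
| 0.2 | 0.1226 | 0.0759 | 0.0525 | 0.0437 | 0.0398 | 0.1295 | 0.0801 | 0.0573 | 0.0458 | 0.0423 | 0.1544 | 0.1242 | 0.1146 | 0.1126 | 0.1109 |
| 0.4 | 0.1143 | 0.0788 | 0.0645 | 0.0619 | 0.0558 | 0.1152 | 0.0718 | 0.0521 | 0.0456 | 0.0425 | 0.3083 | 0.2969 | 0.2915 | 0.2902 | 0.2899 |
| 0.6 | 0.1184 | 0.0851 | 0.0721 | 0.0582 | 0.0311 | 0.0898 | 0.0571 | 0.0394 | 0.0360 | 0.0341 | 0.6891 | 0.6764 | 0.6717 | 0.6715 | 0.6705 |
| 0.8 | 0.2134 | 0.1867 | 0.1864 | 0.1999 | 0.1949 | 0.1701 | 0.1297 | 0.1155 | 0.1217 | 0.1164 | 2.0317 | 2.0655 | 2.1007 | 2.0681 | 2.0419 |

Rho (W3)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.1272 | 0.0784 | 0.0547 | 0.0436 | 0.0368 | 0.1414 | 0.0885 | 0.0621 | 0.0478 | 0.0441 |
| 0.2 | 0.0983 | 0.0554 | 0.0380 | 0.0320 | 0.0274 | 0.1116 | 0.0647 | 0.0482 | 0.0400 | 0.0368 |
| 0.4 | 0.0641 | 0.0377 | 0.0294 | 0.0268 | 0.0244 | 0.0717 | 0.0501 | 0.0419 | 0.0404 | 0.0382 |
| 0.6 | 0.0440 | 0.0308 | 0.0254 | 0.0203 | 0.0248 | 0.0422 | 0.0324 | 0.0308 | 0.0322 | 0.0307 |
| 0.8 | 0.0774 | 0.0542 | 0.0431 | 0.0417 | 0.0416 | 0.0429 | 0.036563 | 0.0342 | 0.0400 | 0.0387 |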
Notes:
(1) Bias is estimated as the average, over 1000 simulation replicates, of the difference between the parameter estimate and its true value.
(2) RMSE is estimated as the square root of the sum of the squared bias and the empirical variance of the estimated coefficient across the 1000 replicates.
(3) SAR-Poisson 1stStep-ML is estimated in two steps. In the first step, the unobservable conditional mean, μ, is estimated by a PPML regression; in the second step, the coefficients β1, β2, and ρ are also estimated by a PPML regression.
(4) SAR-Poisson 1stStep-OLS is also estimated in two steps. In the first step, the unobservable variable Wlog(μ) is estimated by an OLS regression of Wlog(y), adding an ad hoc constant (c = 1) when y = 0; in the second step, the coefficients β1, β2, and ρ are estimated by a PPML regression.
(5) W1 and W3 are contiguity matrices built with the nearest-neighbor criterion: each unit has as neighbors its seven closest units in W1 and its four closest units in W3. W2 is built with an inverse distance criterion, using the Euclidean distance between units.
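The simulation design described in Notes (1), (2), and (5) can be sketched as below, assuming Python with NumPy and uniformly scattered, hypothetical unit locations. The notes do not say how the locations were generated or whether the matrices were row-standardized. The standardization step, the function names, and the example numbers are therefore assumptions for illustration.

```python
import numpy as np


def pairwise_distances(coords):
    """Euclidean distance matrix between all pairs of units."""
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)


def knn_weights(coords, k):
    """Binary nearest-neighbour matrix: each unit's k closest units are its neighbours."""
    d = pairwise_distances(coords)
    np.fill_diagonal(d, np.inf)                 # a unit is never its own neighbour
    w = np.zeros_like(d)
    w[np.arange(len(d))[:, None], np.argsort(d, axis=1)[:, :k]] = 1.0
    return w


def inverse_distance_weights(coords):
    """Inverse Euclidean distance matrix with a zero diagonal."""
    d = pairwise_distances(coords)
    np.fill_diagonal(d, np.inf)                 # 1 / inf = 0 on the diagonal
    return 1.0 / d


def row_standardize(w):
    """Scale rows to sum to one (assumed; the notes do not state it)."""
    return w / w.sum(axis=1, keepdims=True)


def bias_and_rmse(estimates, true_value):
    """Monte Carlo bias and RMSE = sqrt(bias**2 + empirical variance), per Notes (1) and (2)."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - true_value
    return bias, np.sqrt(bias ** 2 + estimates.var())   # var() uses ddof = 0


rng = np.random.default_rng(0)
coords = rng.uniform(size=(100, 2))             # hypothetical unit locations
W1 = row_standardize(knn_weights(coords, 7))    # seven nearest neighbours
W2 = row_standardize(inverse_distance_weights(coords))
W3 = row_standardize(knn_weights(coords, 4))    # four nearest neighbours
print(bias_and_rmse([0.9, 1.1, 1.2, 0.8, 1.05], true_value=1.0))
```

With the variance computed with ddof = 0, the RMSE in Note (2) equals the root of the mean squared deviation from the true value.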
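Table A7. Bias: SAR-Poisson, SAR-LogLinear with W1 for 1000 replicates; X1 with spatial dependence.

ML: SAR-Poisson 1stStep-ML; OLS: SAR-Poisson 1stStep-OLS. The number after each label is the sample size n; Rho is the true value of the autoregressive parameter.

β1 (W1)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | −0.0123 | −0.0068 | −0.0035 | −0.0022 | −0.0031 | −0.0077 | −0.0056 | −0.0062 | −0.0018 | −0.0028 |
| 0.2 | −0.0024 | −0.0021 | 0.0001 | 0.0003 | 0.0008 | 0.0044 | 0.0032 | 0.0056 | 0.0066 | 0.0068 |
| 0.4 | −0.0024 | −0.0016 | −0.0009 | −0.0004 | 0.0004 | 0.0077 | 0.0086 | 0.0102 | 0.0104 | 0.0113 |
| 0.6 | −0.0029 | −0.0009 | −0.0033 | −0.0019 | −0.0008 | 0.0086 | 0.0121 | 0.0107 | 0.0126 | 0.0136 |
| 0.8 | −0.0093 | −0.0164 | −0.0191 | −0.0220 | −0.0226 | 0.0097 | 0.0089 | 0.0100 | 0.0107 | 0.0097 |

β2 (W1)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | −0.0205 | −0.0086 | −0.0054 | −0.0020 | −0.0030 | −0.0187 | −0.0094 | −0.0031 | −0.0021 | −0.0038 |
| 0.2 | −0.0123 | −0.0024 | −0.0039 | −0.0028 | −0.0011 | −0.0364 | −0.0302 | −0.0326 | −0.0303 | −0.0289 |
| 0.4 | −0.0073 | 0.0012 | −0.0009 | 0.0018 | 0.0017 | −0.0583 | −0.0553 | −0.0573 | −0.0554 | −0.0564 |
| 0.6 | 0.0036 | 0.0033 | −0.0014 | 0.0022 | 0.0021 | −0.0694 | −0.0757 | −0.0789 | −0.0795 | −0.0788 |
| 0.8 | 0.0013 | 0.0045 | 0.0016 | −0.0018 | 0.0001 | −0.0532 | −0.0631 | −0.0686 | −0.0789 | −0.0778 |

Rho (W1)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0264 | 0.0079 | 0.0035 | 0.0021 | 0.0045 | 0.0206 | 0.0124 | 0.0071 | 0.0029 | 0.0063 |
| 0.2 | −0.0012 | −0.0032 | 0.0034 | 0.0028 | 0.0001 | −0.0056 | 0.0035 | 0.0094 | 0.0066 | 0.0051 |
| 0.4 | 0.0073 | 0.0052 | 0.0075 | 0.0062 | 0.0050 | 0.0204 | 0.0365 | 0.0419 | 0.0428 | 0.0435 |
| 0.6 | 0.0173 | 0.0088 | 0.0106 | 0.0081 | 0.0060 | 0.0586 | 0.0807 | 0.0868 | 0.0893 | 0.0895 |
| 0.8 | 0.0400 | 0.0306 | 0.0323 | 0.0329 | 0.0314 | 0.0555 | 0.0766 | 0.0837 | 0.0926 | 0.0920 |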
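Table A8. RMSE: SAR-Poisson, SAR-LogLinear with W1 for 1000 replicates; X1 with spatial dependence.

ML: SAR-Poisson 1stStep-ML; OLS: SAR-Poisson 1stStep-OLS. The number after each label is the sample size n; Rho is the true value of the autoregressive parameter.

β1 (W1)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.1142 | 0.0703 | 0.0477 | 0.0399 | 0.0359 | 0.1140 | 0.0702 | 0.0472 | 0.0395 | 0.0353 |
| 0.2 | 0.1068 | 0.0700 | 0.0477 | 0.0404 | 0.0348 | 0.1043 | 0.0663 | 0.0462 | 0.0391 | 0.0340 |
| 0.4 | 0.1103 | 0.0666 | 0.0462 | 0.0394 | 0.0325 | 0.1051 | 0.0636 | 0.0996 | 0.0824 | 0.0324 |
| 0.6 | 0.0950 | 0.0604 | 0.0408 | 0.0326 | 0.0302 | 0.0910 | 0.0586 | 0.0396 | 0.0333 | 0.0320 |
| 0.8 | 0.0763 | 0.0487 | 0.0408 | 0.0385 | 0.0373 | 0.0707 | 0.0462 | 0.0332 | 0.0291 | 0.0263 |

β2 (W1)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.1740 | 0.1077 | 0.0767 | 0.0637 | 0.0536 | 0.2156 | 0.1323 | 0.0948 | 0.0771 | 0.0659 |
| 0.2 | 0.1635 | 0.1042 | 0.0727 | 0.0617 | 0.0520 | 0.2057 | 0.1274 | 0.0920 | 0.0794 | 0.0681 |
| 0.4 | 0.1638 | 0.1003 | 0.0720 | 0.0573 | 0.0498 | 0.2003 | 0.1274 | 0.0449 | 0.0385 | 0.0769 |
| 0.6 | 0.1548 | 0.0956 | 0.0695 | 0.0586 | 0.0476 | 0.1822 | 0.1257 | 0.1042 | 0.0958 | 0.0892 |
| 0.8 | 0.1548 | 0.1073 | 0.1017 | 0.0969 | 0.0900 | 0.1582 | 0.1117 | 0.0959 | 0.0962 | 0.0900 |

Rho (W1)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.3733 | 0.2327 | 0.1647 | 0.1259 | 0.1109 | 0.4042 | 0.2554 | 0.1768 | 0.1393 | 0.1236 |
| 0.2 | 0.2934 | 0.1744 | 0.1175 | 0.1017 | 0.0818 | 0.3358 | 0.1986 | 0.1342 | 0.1173 | 0.0986 |
| 0.4 | 0.2205 | 0.1202 | 0.0830 | 0.0675 | 0.0575 | 0.2615 | 0.1477 | 0.1102 | 0.0904 | 0.0795 |
| 0.6 | 0.1368 | 0.0733 | 0.0510 | 0.0423 | 0.0357 | 0.1610 | 0.1142 | 0.0999 | 0.0968 | 0.0948 |
| 0.8 | 0.0873 | 0.0517 | 0.0451 | 0.0443 | 0.0417 | 0.1109 | 0.0927 | 0.0926 | 0.0998 | 0.0974 |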
Table A9. Bias: SAR-Poisson, SAR-LogLinear with W2 for 1000 replicates; X1 with spatial dependence.

ML: SAR-Poisson 1stStep-ML; OLS: SAR-Poisson 1stStep-OLS. The number after each label is the sample size n; Rho is the true value of the autoregressive parameter.

β1 (W2)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | −0.0123 | −0.0024 | −0.0026 | −0.0002 | −0.0015 | −0.0064 | −0.0010 | −0.0022 | 0.0016 | −0.0013 |
| 0.2 | −0.0040 | −0.0032 | 0.0005 | −0.0001 | 0.0007 | −0.0016 | −0.0023 | −0.0043 | 0.0002 | 0.0008 |
| 0.4 | −0.0042 | 0.0027 | −0.0006 | −0.0008 | 0.0006 | −0.0033 | 0.0034 | −0.0002 | −0.0006 | 0.0008 |
| 0.6 | −0.0016 | 0.0016 | 0.0002 | 0.0009 | 0.0005 | −0.0014 | 0.0020 | 0.0003 | 0.0009 | 0.0005 |
| 0.8 | 0.0001 | 0.0049 | 0.0029 | 0.0013 | 0.0005 | −0.0007 | 0.0012 | 0.0013 | −0.0005 | −0.0007 |

β2 (W2)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | −0.0205 | −0.0105 | −0.0039 | −0.0002 | −0.0039 | −0.0101 | −0.0040 | −0.0026 | 0.0002 | −0.0034 |
| 0.2 | −0.0122 | −0.0033 | −0.0042 | 0.0030 | −0.0031 | −0.0088 | −0.0045 | 0.0009 | 0.0025 | −0.0038 |
| 0.4 | −0.0128 | −0.0010 | −0.0044 | 0.0011 | −0.0026 | −0.0116 | −0.0055 | −0.0058 | −0.0005 | −0.0036 |
| 0.6 | −0.0010 | −0.0026 | −0.0007 | −0.0009 | 0.0013 | −0.0064 | −0.0063 | −0.0018 | −0.0018 | 0.0004 |
| 0.8 | −0.0030 | −0.0034 | −0.0002 | −0.0011 | −0.0009 | −0.0024 | −0.0001 | 0.0024 | 0.0014 | 0.0006 |

Rho (W2)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0264 | −0.0183 | −0.0201 | −0.0258 | −0.0070 | −0.0024 | −0.0045 | 0.0007 | −0.0084 | 0.0027 |
| 0.2 | 0.0155 | −0.0226 | −0.0082 | −0.0167 | −0.0048 | −0.0681 | −0.0597 | −0.0580 | −0.0656 | −0.0571 |
| 0.4 | 0.0100 | −0.0100 | 0.0002 | −0.0058 | 0.0005 | −0.0904 | −0.0728 | −0.0654 | −0.0701 | −0.0653 |
| 0.6 | 0.0071 | 0.0018 | 0.0012 | 0.0023 | −0.0018 | −0.0497 | −0.0225 | −0.0170 | −0.0142 | −0.0152 |
| 0.8 | 0.0150 | 0.0136 | 0.0056 | 0.0066 | 0.0044 | 0.0135 | 0.0472 | 0.0570 | 0.0597 | 0.0620 |
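Table A10. RMSE: SAR-Poisson, SAR-LogLinear with W2 for 1000 replicates; X1 with spatial dependence.

ML: SAR-Poisson 1stStep-ML; OLS: SAR-Poisson 1stStep-OLS. The number after each label is the sample size n; Rho is the true value of the autoregressive parameter.

β1 (W2)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.1142 | 0.0709 | 0.0503 | 0.0418 | 0.0353 | 0.1117 | 0.0717 | 0.0504 | 0.0421 | 0.0354 |
| 0.2 | 0.1113 | 0.0656 | 0.0472 | 0.0395 | 0.0337 | 0.1116 | 0.0657 | 0.0473 | 0.0397 | 0.0337 |
| 0.4 | 0.1072 | 0.0668 | 0.0445 | 0.0393 | 0.0330 | 0.1063 | 0.0666 | 0.0444 | 0.0390 | 0.0330 |
| 0.6 | 0.0970 | 0.0594 | 0.0407 | 0.0345 | 0.0309 | 0.0964 | 0.0591 | 0.0406 | 0.0343 | 0.0308 |
| 0.8 | 0.0774 | 0.0457 | 0.0308 | 0.0242 | 0.0214 | 0.0770 | 0.0441 | 0.0306 | 0.0241 | 0.0213 |

β2 (W2)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.1740 | 0.1337 | 0.0986 | 0.0834 | 0.0710 | 0.2279 | 0.1466 | 0.1007 | 0.0854 | 0.0723 |
| 0.2 | 0.2144 | 0.1343 | 0.0949 | 0.0797 | 0.0690 | 0.2264 | 0.1397 | 0.0962 | 0.0811 | 0.0698 |
| 0.4 | 0.2003 | 0.1293 | 0.0908 | 0.0753 | 0.0660 | 0.2143 | 0.1322 | 0.0915 | 0.0758 | 0.0661 |
| 0.6 | 0.1901 | 0.1190 | 0.0861 | 0.0670 | 0.0612 | 0.1950 | 0.1185 | 0.0854 | 0.0671 | 0.0612 |
| 0.8 | 0.1522 | 0.0970 | 0.0624 | 0.0518 | 0.0443 | 0.1542 | 0.0901 | 0.0621 | 0.0518 | 0.0442 |

Rho (W2)

| Rho | ML 100 | ML 250 | ML 500 | ML 750 | ML 1000 | OLS 100 | OLS 250 | OLS 500 | OLS 750 | OLS 1000 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.3733 | 0.4956 | 0.3579 | 0.3020 | 0.2538 | 0.4872 | 0.3062 | 0.2027 | 0.1741 | 0.1505 |
| 0.2 | 0.3981 | 0.3621 | 0.2393 | 0.2000 | 0.1718 | 0.4180 | 0.2643 | 0.1808 | 0.1573 | 0.1370 |
| 0.4 | 0.4164 | 0.2356 | 0.1561 | 0.1264 | 0.1106 | 0.3929 | 0.2239 | 0.1568 | 0.1347 | 0.1222 |
| 0.6 | 0.3435 | 0.1368 | 0.0897 | 0.0667 | 0.0619 | 0.2933 | 0.1541 | 0.1102 | 0.0852 | 0.0786 |
| 0.8 | 0.1594 | 0.0571 | 0.0338 | 0.0294 | 0.0250 | 0.2220 | 0.0781 | 0.0662 | 0.0643 | 0.0650 |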

Appendix B. Countries in the Sample

The 234 NUTS II regions in the dataset used in the empirical application come from the following countries: Bulgaria, the Czech Republic, Denmark, Germany, Estonia, Ireland, Spain, France, Croatia, Italy, Latvia, Lithuania, Hungary, the Netherlands, Austria, Poland, Portugal, Romania, Slovakia, Finland, Sweden, the United Kingdom, and Norway. Belgium, Switzerland, and Greece were discarded because of a considerable lack of data for several NUTS II regions in these countries.
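Table A11. Correlation matrix of the variables.

| | Pat | R&D_B | R&D_G | R&D_U | Pers_B | Pers_G | Pers_U | Educ | Pop | GDP | Mort |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Pat | 1.000 | | | | | | | | | | |
| R&D_B | 0.717 | 1.000 | | | | | | | | | |
| R&D_G | 0.301 | 0.438 | 1.000 | | | | | | | | |
| R&D_U | 0.390 | 0.533 | 0.510 | 1.000 | | | | | | | |
| Pers_B | 0.474 | 0.601 | 0.332 | 0.211 | 1.000 | | | | | | |
| Pers_G | 0.153 | 0.215 | 0.591 | 0.117 | 0.622 | 1.000 | | | | | |
| Pers_U | 0.164 | 0.286 | 0.329 | 0.261 | 0.746 | 0.674 | 1.000 | | | | |
| Educ | 0.263 | 0.450 | 0.408 | 0.440 | 0.296 | 0.253 | 0.355 | 1.000 | | | |
| Pop | 0.056 | 0.082 | 0.119 | −0.088 | 0.663 | 0.651 | 0.775 | 0.010 | 1.000 | | |
| GDP | 0.573 | 0.639 | 0.519 | 0.642 | 0.368 | 0.148 | 0.226 | 0.559 | −0.043 | 1.000 | |
| Mort | −0.285 | −0.248 | −0.176 | −0.266 | −0.141 | −0.017 | −0.099 | −0.275 | 0.082 | −0.466 | 1.000 |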
Figure A1. Spatial distribution map of the variable Pat per quartile in 2012.
Figure A2. Moran’s I diagram for the variable Pat.
Figure A3. Local indicators of spatial association for the variable Pat.
Figure A4. Local indicators of spatial association significance map for the variable Pat.
Figure A5. Spatial quartile distribution map of direct partial effect of R&D_B.
Figure A6. Spatial quartile distribution map of spill-in knowledge spatial spillovers of R&D_B.
Figure A7. Spatial quartile distribution map of spill-out knowledge spatial spillovers of R&D_B.

References

  1. Alamá-Sabater, L.; Márquez-Ramos, L.; Navarro-Azorín, J.M.; Suárez-Burguet, C. A two-methodology comparison study of a spatial gravity model in the context of interregional trade flows. Appl. Econ. 2015, 47, 1481–1493.
  2. Fu, Y.; Gabriel, S.A. Labor migration, human capital agglomeration and regional development in China. Reg. Sci. Urban Econ. 2012, 42, 473–484.
  3. Miguélez, E.; Moreno, R. Research Networks and Inventors' Mobility as Drivers of Innovation: Evidence from Europe. Reg. Stud. 2013, 47, 1668–1685.
  4. Silva, J.M.C.S.; Tenreyro, S. The log of gravity. Rev. Econ. Stat. 2006, 88, 641–658.
  5. Elhorst, J.P. Applied spatial econometrics: Raising the bar. Spat. Econ. Anal. 2010, 5, 9–28.
  6. Anselin, L. Spatial externalities, spatial multipliers, and spatial econometrics. Int. Reg. Sci. Rev. 2003, 26, 153–166.
  7. LeSage, J.P.; Chih, Y.-Y. Interpreting heterogeneous coefficient spatial autoregressive panel models. Econ. Lett. 2016, 142, 1–5.
  8. Anselin, L. Thirty years of spatial econometrics. Pap. Reg. Sci. 2010, 89, 3–25.
  9. Besag, J. Spatial interaction and the statistical analysis of lattice systems. J. R. Stat. Soc. Ser. B Stat. Methodol. 1974, 36, 192–225.
  10. Kaiser, M.S.; Cressie, N. Modeling Poisson variables with positive spatial dependence. Stat. Probab. Lett. 1997, 35, 423–432.
  11. Griffith, D. A Spatial Filtering Specification for the Auto-Poisson Model. Stat. Probab. Lett. 2002, 58, 245–251.
  12. Lambert, D.M.; Brown, J.P.; Florax, R.J. A two-step estimator for a spatial lag model of counts: Theory, small sample performance and an application. Reg. Sci. Urban Econ. 2010, 40, 241–252.
  13. Sengupta, A.; Cressie, N. Empirical Hierarchical Modelling for Count Data using the Spatial Random Effects Model. Spat. Econ. Anal. 2013, 8, 389–418.
  14. Czado, C.; Schabenberger, H.; Erhardt, V. Non-nested model selection for spatial count regression models with application to health insurance. Stat. Pap. 2014, 55, 455–476.
  15. LeSage, J.P.; Fischer, M.M.; Scherngell, T. Knowledge spillovers across Europe: Evidence from a Poisson spatial interaction model with spatial effects. Pap. Reg. Sci. 2007, 86, 393–421.
  16. Glaser, S. A Review of Spatial Econometric Models for Count Data (No. 19-2017); Hohenheim Discussion Papers in Business, Economics and Social Sciences; Universität Hohenheim, Fakultät Wirtschafts- und Sozialwissenschaften: Stuttgart, Germany, 2017.
  17. Autant-Bernard, C.; LeSage, J. Quantifying knowledge spillovers using spatial econometric tools. J. Reg. Sci. 2011, 51, 471–496.
  18. Gourieroux, C.; Monfort, A.; Trognon, A. Pseudo maximum likelihood methods: Applications to Poisson models. Econometrica 1984, 52, 701.
  19. StataCorp LLC. Stata Statistical Software: Release 16; StataCorp LLC: College Station, TX, USA, 2019.
  20. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2020.
  21. Kelejian, H.H.; Prucha, I.R. HAC estimation in a spatial framework. J. Econom. 2007, 140, 131–154.
  22. Klier, T.; McMillen, D.P. Clustering of auto supplier plants in the United States: Generalized method of moments spatial logit for large samples. J. Bus. Econ. Stat. 2008, 26, 460–471.
  23. Fingleton, B.; Le Gallo, J. Estimating spatial models with endogenous variables, a spatial lag and spatially dependent disturbances: Finite sample properties. Pap. Reg. Sci. 2008, 87, 319–339.
  24. Santos, L.S.; Proença, I. The inversion of the spatial lag operator in binary choice models: Fast computation and a closed formula approximation. Reg. Sci. Urban Econ. 2019, 76, 74–102.
  25. Billé, A.G. Computational Issues in the Estimation of the Spatial Probit Model: A Comparison of Various Estimators. Rev. Reg. Stud. 2013, 43, 131–154.
  26. Anselin, L.; Le Gallo, J. Interpolation of air quality measures in hedonic house price models: Spatial aspects. Spat. Econ. Anal. 2006, 1, 31–52.
  27. Buesa, M.; Heijs, J.; Baumert, T. The determinants of regional innovation in Europe: A combined factorial and regression knowledge production function approach. Res. Policy 2010, 39, 722–735.
  28. Acs, Z.J.; Anselin, L.; Varga, A. Patents and innovation counts as measures of regional production of new knowledge. Res. Policy 2002, 31, 1069–1085.
  29. QGIS Development Team. QGIS Geographic Information System. Open Source Geospatial Foundation Project. 2020. Available online: https://www.qgis.org/en/site/ (accessed on 20 January 2021).
  30. Anselin, L.; Syabri, I.; Kho, Y. GeoDa: An Introduction to Spatial Data Analysis. Geogr. Anal. 2006, 38, 5–22.
  31. Griliches, Z. Issues in assessing the contribution of research and development to productivity growth. Bell J. Econ. 1979, 10, 92.
  32. Krammer, S.M. Drivers of national innovation in transition: Evidence from a panel of Eastern European countries. Res. Policy 2009, 38, 845–860.
  33. Ferreira, V.; Godinho, M.M. The determinants of innovation. In Dynamics of Knowledge Intensive Entrepreneurship: Business Strategy and Public Policy; Routledge: London, UK, 2015; p. 304.
  34. Zhang, F.; Wang, Y.; Liu, W. Science and Technology Resource Allocation, Spatial Association, and Regional Innovation. Sustainability 2020, 12, 694.
  35. Autant-Bernard, C. Science and knowledge flows: Evidence from the French case. Res. Policy 2001, 30, 1069–1078.
  36. Farrar, D.E.; Glauber, R.R. Multicollinearity in regression analysis: The problem revisited. Rev. Econ. Stat. 1967, 49, 92.
  37. Furková, A. Spatial spillovers and European Union regional innovation activities. Cent. Eur. J. Oper. Res. 2019, 27, 815–834.
Table 1. Variable definitions and expected sign.

| Variable | Abbrev. | Unit | Expected outcome |
|---|---|---|---|
| Number of patents registered (dependent variable) | Pat | Units per million inhabitants | |
| Intramural expenditure on R&D by private business | R&D_B | Euros per inhabitant | + |
| Intramural expenditure on R&D by the government | R&D_G | Euros per inhabitant | Ambiguous |
| Intramural expenditure on R&D by universities | R&D_U | Euros per inhabitant | Ambiguous |
| Total R&D personnel and researchers in private business (no. of full-time workers) | Pers_B | - | + |
| Total R&D personnel and researchers in the government (no. of full-time workers) | Pers_G | - | Ambiguous |
| Total R&D personnel and researchers in universities (no. of full-time workers) | Pers_U | - | Ambiguous |
| Share of population aged 25–64 with a bachelor’s degree | Educ | Percentage | + |
| Population | Pop | Number of inhabitants | |
| GDP per capita | GDP | Thousand euros per capita | + |
| Tuberculosis mortality | Mort | Rate per 100,000 inhabitants | |
Table 2. Descriptive statistics.

| Variable | N | Mean | Std. dev. | Min | Max |
|---|---|---|---|---|---|
| Pat | 234 | 89.171 | 106.045 | 0.000 | 590 |
| R&D_B | 234 | 318.248 | 382.444 | 0.000 | 2441.700 |
| R&D_G | 234 | 59.819 | 87.684 | 0.000 | 480.600 |
| R&D_U | 234 | 135.917 | 152.388 | 0.000 | 891.700 |
| Pers_B | 234 | 5744.342 | 9291.554 | 0.000 | 97,982.000 |
| Pers_G | 234 | 1467.979 | 2628.740 | 0.000 | 17,934.000 |
| Pers_U | 234 | 3294.923 | 3558.347 | 0.000 | 34,836.000 |
| Educ | 234 | 27.334 | 8.700 | 11.200 | 50.100 |
| Pop | 234 | 1,982,780.5 | 1,563,839.6 | 126,620.0 | 11,898,502.0 |
| GDP | 234 | 26.922 | 13.874 | 3.561 | 84.047 |
| Mort | 234 | 1.009 | 1.375 | 0.100 | 8.800 |
Table 3. SAR PPML coefficient estimates of the knowledge production function.

| Variable | SAR-PPML 1stStep-ML: coefficient | Bootstrap SE | SAR-PPML 1stStep-OLS: coefficient | Bootstrap SE |
|---|---|---|---|---|
| ρ | 6.81 × 10⁻¹ *** | 0.06838 | 6.19 × 10⁻¹ *** | 0.07821 |
| R&D_B | 8.91 × 10⁻⁴ *** | 0.00034 | 9.18 × 10⁻⁴ *** | 0.00035 |
| R&D_G | −2.15 × 10⁻³ * | 0.00130 | −2.01 × 10⁻³ | 0.00157 |
| R&D_U | −3.21 × 10⁻⁴ | 0.00079 | −5.85 × 10⁻⁴ | 0.00083 |
| Pers_B | −1.33 × 10⁻⁵ | 0.00003 | −1.59 × 10⁻⁵ | 0.00003 |
| Pers_G | 2.75 × 10⁻⁵ | 0.00006 | 3.43 × 10⁻⁵ | 0.00006 |
| Pers_U | 5.07 × 10⁻⁵ | 0.00005 | 1.86 × 10⁻⁵ | 0.00005 |
| Educ | 2.58 × 10⁻⁴ | 0.01165 | 7.55 × 10⁻⁵ | 0.01279 |
| Pop | −3.21 × 10⁻⁹ | 9.62 × 10⁻⁸ | 6.10 × 10⁻⁸ | 1.14 × 10⁻⁷ |
| GDP | 3.81 × 10⁻² *** | 0.01003 | 4.42 × 10⁻² *** | 0.01127 |
| Mort | −1.95 × 10⁻¹ ** | 0.09624 | −6.72 × 10⁻² | 0.09071 |
| Log likelihood | −6557.154 | | −7704.048 | |
| N | 234 | | 234 | |

Notes: Standard errors were computed using the bootstrap method. Significance levels: * 10%, ** 5%, *** 1%.
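Table 3 reports PPML coefficients with bootstrap standard errors. The sketch below illustrates only the two generic ingredients: Poisson pseudo-maximum likelihood for an exponential mean, solved by Newton-Raphson on the score X'(y − exp(Xβ)) = 0, and a pairs (case-resampling) bootstrap of the coefficients. It omits the spatial lag and the first-step estimation of ρ, and resampling regions independently ignores spatial dependence, so it is not the authors' two-step SAR-PPML estimator or their bootstrap scheme. The helpers `ppml` and `bootstrap_se` are hypothetical names, column 0 of X is assumed to be a constant, and the data are simulated.

```python
import numpy as np

def ppml(X, y, tol=1e-10, max_iter=100):
    """Poisson pseudo-ML estimate of beta in E[y | X] = exp(X @ beta).

    Newton-Raphson on the score X'(y - mu) = 0. y only needs to be nonnegative:
    zeros and non-integer values are allowed. Column 0 of X is assumed constant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())           # start the intercept at the log of the mean
    for _ in range(max_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)
        info = X.T @ (X * mu[:, None])   # minus the Hessian of the pseudo log-likelihood
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def bootstrap_se(X, y, n_boot=200, seed=0):
    """Pairs bootstrap: resample rows with replacement, re-estimate, take the std. dev.

    Treats observations as independent, i.e. ignores spatial dependence (assumption).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = ppml(X[idx], y[idx])
    return draws.std(axis=0, ddof=1)

# Illustrative simulated data (not the paper's data).
rng = np.random.default_rng(42)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = rng.poisson(np.exp(X @ np.array([0.5, 0.8, -0.3]))).astype(float)
beta_hat = ppml(X, y)
se = bootstrap_se(X, y, n_boot=100)
print(np.round(beta_hat, 3), np.round(se, 3))
```

Because the score only requires E[y|X] = exp(Xβ), y may contain zeros or be non-integer. This property makes PPML attractive for the patent variable in Table 2, whose minimum is zero.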
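Table 4. Average direct partial effects and spatial spillovers.

| Variable | Direct (1stStep-ML) | ASpill-in (1stStep-ML) | ASpill-out (1stStep-ML) | Direct (1stStep-OLS) | ASpill-in (1stStep-OLS) | ASpill-out (1stStep-OLS) |
|---|---|---|---|---|---|---|
| R&D_B | 0.0934 | 0.1743 | 0.1683 | 0.0878 | 0.1126 | 0.1093 |
| R&D_G | −0.2250 | −0.4200 | −0.4054 | −0.1921 | −0.2464 | −0.2391 |
| R&D_U | −0.0336 | −0.0628 | −0.0606 | −0.0560 | −0.0718 | −0.0697 |
| Pers_B | −0.0014 | −0.0026 | −0.0025 | −0.0015 | −0.0020 | −0.0019 |
| Pers_G | 0.0029 | 0.0054 | 0.0052 | 0.0033 | 0.0042 | 0.0041 |
| Pers_U | 0.0053 | 0.0099 | 0.0096 | 0.0018 | 0.0023 | 0.0022 |
| Educ | 0.0270 | 0.0504 | 0.0487 | 0.7221 | 0.9258 | 0.8984 |
| Pop | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| GDP | 3.9933 | 7.4536 | 7.1953 | 4.2288 | 5.4222 | 5.2617 |
| Mort | −20.4060 | −38.0886 | −36.7686 | −6.4270 | −8.2409 | −7.9969 |

The column groups correspond to the SAR-PPML 1stStep-ML and SAR-PPML 1stStep-OLS estimates.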
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Proença, I.; Glórias, L. Revisiting the Spatial Autoregressive Exponential Model for Counts and Other Nonnegative Variables, with Application to the Knowledge Production Function. Sustainability 2021, 13, 2843. https://0-doi-org.brum.beds.ac.uk/10.3390/su13052843

AMA Style

Proença I, Glórias L. Revisiting the Spatial Autoregressive Exponential Model for Counts and Other Nonnegative Variables, with Application to the Knowledge Production Function. Sustainability. 2021; 13(5):2843. https://0-doi-org.brum.beds.ac.uk/10.3390/su13052843

Chicago/Turabian Style

Proença, Isabel, and Ludgero Glórias. 2021. "Revisiting the Spatial Autoregressive Exponential Model for Counts and Other Nonnegative Variables, with Application to the Knowledge Production Function" Sustainability 13, no. 5: 2843. https://0-doi-org.brum.beds.ac.uk/10.3390/su13052843

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
