## 1. Introduction

Influenza A (H1N1) is one of the most common virus strains causing influenza pandemics in humans [1]. A new strain of influenza A (H1N1) was identified in North America in the spring of 2009, and the virus was found to circulate easily among humans [2]. Given its highly infectious nature [3] and its rapid transmission, made possible by modern transportation [4], this new influenza caused great concern globally [1,5,6]. The World Health Organization (WHO) raised its influenza pandemic threat level to six (the highest level) on 11 June 2009 [2]. On 10 August 2010, WHO announced that the H1N1 influenza virus had moved into the post-pandemic period [7].

During the spread of influenza, spatial waves of infection have been observed between large, distant populations [8]. Spatial models of infectious diseases are increasingly used to characterize these large-scale patterns and to evaluate the impact of interventions [9]. Many models have been developed to study the spatial spread of influenza (e.g., [8,10–13]). Viboud *et al.* [8] proposed a gravity model based on transportation theory, which captures the effects of distance (a negative effect) and of the size (a positive effect) of the ‘donor’ and recipient communities. Compared with multigroup models at the scale of households and workplaces/schools [9], the gravity model is designed for larger spatial scales such as communities, cities, or countries. Following Viboud *et al.*’s study, the gravity model has been increasingly applied to the spread of infectious diseases (e.g., [14,15]). The objective of our analysis is to evaluate at which spatial scale and temporal phase the gravity model is valid with acceptable model performance. We used the 2009 influenza A (H1N1) pandemic as a case study.
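To make the description above concrete, a gravity model in this spirit can be written as follows. This is an illustrative generic form, not necessarily the exact specification used in [8] or in this study: the interaction $\lambda_{ij}$ between a donor community $j$ and a recipient community $i$ grows with their population sizes $P_i$, $P_j$ and shrinks with the distance $d_{ij}$ between them, and the exponents $\tau_1$, $\tau_2$, $\rho$ are assumed symbols introduced only for this sketch.

$$
\lambda_{ij} \propto \frac{P_i^{\tau_1}\,P_j^{\tau_2}}{d_{ij}^{\rho}}, \qquad \tau_1,\ \tau_2,\ \rho > 0,
$$

$$
\ln \lambda_{ij} = c + \tau_1 \ln P_i + \tau_2 \ln P_j - \rho \ln d_{ij}.
$$

Taking logarithms turns the multiplicative model into one that is linear in the unknown exponents, which is why such models can be estimated with log-linear regression.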

## 3. Results

The GLM showed that, on the log scale, the daily cumulative number of confirmed cases of influenza A (H1N1) was significantly and positively associated with population size (except on 28 April) and with per capita GDP (except on 23–25 April), and negatively associated with distance from Mexico (except on 28 April–1 May) (Figure 1A). The daily cumulative number of confirmed cases in each state of the USA was positively associated with population size (except on 23 and 24 April), positively associated with per capita GSP on only a few days, and not significantly associated with distance to Mexico (except on 25 April) (Figure 1B). As data accumulated (i.e., as cases of influenza A (H1N1) were added each day), the goodness of fit improved, as indicated by the ratio of deviance to degrees of freedom approaching unity (Figure 1). From May 2009 onward the pattern was clear: population, GDP, and distance were significantly associated with cases of influenza A (H1N1) globally, whereas only population was significantly associated with cases in each state of the USA (Figure 1). In conclusion, the epidemic gravity model was appropriate for estimating the global spread of influenza A (H1N1), but not its national spread within the USA.
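The goodness-of-fit criterion used above, deviance divided by residual degrees of freedom, can be sketched as follows. The GLM family is not stated in this section, so the sketch assumes a Poisson GLM with log link, for which the residual deviance has a closed form; the function names are hypothetical and the fitted means are assumed to come from whatever GLM routine was used.

```python
import math


def poisson_deviance(observed, fitted):
    """Residual deviance of a Poisson GLM: 2 * sum(y*ln(y/mu) - (y - mu)).

    The y*ln(y/mu) term is set to 0 when y == 0, its limiting value.
    """
    total = 0.0
    for y, mu in zip(observed, fitted):
        if mu <= 0:
            raise ValueError("fitted means must be positive")
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += log_term - (y - mu)
    return 2.0 * total


def deviance_per_df(observed, fitted, n_params):
    """Deviance divided by residual degrees of freedom (n - p).

    Values near 1 are consistent with the Poisson variance assumption;
    values well above 1 point to overdispersion.
    """
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    return poisson_deviance(observed, fitted) / df
```

For a model with an intercept plus population, per capita GDP, and distance, `n_params` would be 4. Recomputing the ratio for each day's cumulative data gives a daily series that can be checked for convergence towards 1.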

Using the GLM coefficients estimated for 6 July 2009, we obtained the following gravity model for the number of cases N of influenza A (H1N1) in each country i (error terms omitted):

The estimates (± standard error) of the parameters for ln(intercept), ln(G), ln(P), and ln(D) are 3.44 ± 1.496, 1.547 ± 0.111, 1.575 ± 0.113, and 2.108 ± 0.233, respectively. Our estimated numbers of confirmed influenza A (H1N1) cases in each country (Figure 2B) were highly correlated with the observed cases as of 6 July 2009 (Figure 2A), with a Spearman correlation coefficient of 0.92 (p < 0.0001). For the accumulated confirmed cases of each country on 6 July 2009, a simple linear regression of observed cases on estimated cases, fitted by ordinary least squares, explained 84.9% of the total sum of squares. The estimated values are more homogeneous among countries than the observed cases reported by WHO (Figure 2B).
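The Spearman coefficient reported above is the Pearson correlation computed on ranks rather than raw values. A minimal pure-Python sketch follows, assuming tied values receive the average of their ranks (the usual convention); the function names are hypothetical and the p-value is not computed here.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of ties starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, the coefficient is unchanged if either the estimated or the observed case counts are log-transformed, which suits data as skewed as national case counts.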

For each country, we compared the number of cases predicted by the model with the number of confirmed cases reported as of 6 July 2009 (Figure 3A). Because the number of cases had very high variance, we log-transformed the counts to compress the scale. In a simple linear regression, the predicted values explained 66.78% of the variance in the number of confirmed cases (R² = 0.6678).
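The comparison above can be sketched as an ordinary least squares fit of log observed counts on log predicted counts, with R² computed from the residual and total sums of squares. The sketch assumes strictly positive counts, since how zero counts were handled is not stated in this section; the function name is hypothetical.

```python
import math


def r_squared_log(predicted, observed):
    """R^2 of the OLS fit ln(observed) = a + b * ln(predicted).

    Both inputs must be strictly positive because they are log-transformed.
    """
    x = [math.log(p) for p in predicted]
    y = [math.log(o) for o in observed]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b = sxy / sxx           # OLS slope
    a = my - b * mx         # OLS intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot
```

For a simple regression with an intercept, this R² equals the squared Pearson correlation between the two log-transformed series.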

When we used the number of days from 23 April 2009 to the first confirmed case in each country as the dependent variable in equation (2), we obtained the following:

We compared the predicted and observed numbers of days (Figure 3B). The 66 countries or regions that had no confirmed cases were treated as missing (Figure 3B). Note that the coefficients of model (3) had opposite signs in this application (5) compared with the first application (4). That is, statistically, higher economic activity ($G_i$) and a larger population size ($P_i$) are associated with a shorter waiting time to the first confirmed case, and a longer distance ($D_i$) is associated with a longer waiting time.
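The dependent variable for this second application can be derived from each country's date of first confirmed case, as sketched below. The input dictionary, its country keys, and the function names are hypothetical; countries without a confirmed case are kept as `None` and then dropped, mirroring the treatment of the 66 countries or regions as missing.

```python
from datetime import date

REFERENCE_DATE = date(2009, 4, 23)  # day 0, as defined in the text


def days_to_first_case(first_case_dates):
    """Map each country to days from 23 April 2009 to its first confirmed case.

    A country with no confirmed case (None) stays None, i.e. it is treated
    as missing rather than assigned an arbitrary censoring value.
    """
    return {
        country: (d - REFERENCE_DATE).days if d is not None else None
        for country, d in first_case_dates.items()
    }


def complete_cases(days_by_country):
    """Drop countries whose waiting time is missing before fitting."""
    return {c: d for c, d in days_by_country.items() if d is not None}
```

For example, with hypothetical countries `{"A": date(2009, 4, 23), "B": date(2009, 5, 10), "C": None}`, the waiting times are 0 days for A and 17 days for B, and C is excluded from the fit.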

## 4. Discussion

Our results showed that the spread of influenza A (H1N1) among countries was significantly associated with a set of important socio-economic covariates. The results are consistent with previous findings, from epidemiological surveys (e.g., [3]), mathematical epidemic models [4], and theoretical simulations (e.g., [11,13,20]), that air and surface transportation played a significant role in the spread of influenza.

We modified the epidemic gravity model by assuming a surrogate origin (i.e., Mexico), from which the first identified case came. Although the precise origin of the 2009 influenza A (H1N1) virus remains unknown, it was believed that the virus emerged in Mexico in February 2009 [21]. From May to July 2009, many cases of influenza A (H1N1) in many countries were imported from the USA. Because Mexico and the USA are close to each other, this did not affect the values of distance (the variable used in the GLM) very much.

The significance of each covariate (i.e., population, GDP, and distance) and the model performance varied in the first few days because of small sample sizes (only a few countries and states had identified cases in the early stage of intensive surveillance), and the model became more stable later (Figure 1). Our modified gravity model was not appropriate for modelling the national dynamics of confirmed cases in the USA (neither distance nor GSP was statistically significant). There are two reasons: (1) the distances from the different US states to Mexico did not form a clear gradient, and distance itself is not a good indicator of human mobility at this scale; (2) during May and June the epidemic in the USA was no longer at an early stage, and inter-state and intra-state spread were dominant. We therefore conclude that the gravity model can be applied to influenza spread under the following conditions: (1) the spread period is long enough to estimate the model parameters; (2) the distances between donor and recipient communities show a clear gradient; and (3) the spread of influenza is at an early stage, if a single source is taken into account.

The daily cumulative numbers of confirmed cases of influenza A (H1N1) were used in our analysis, but these cases may not represent the true prevalence of the infection in each region. The number of cases identified was clearly related to the effort and resources devoted by the health agencies of each country. For a new infectious disease, it is likely that many cases already existed in many parts of the world before the first case was identified. This is especially likely given modern transportation systems, as many symptomatic and asymptomatic carriers may have travelled across borders before any cases were identified. Following extensive media reports right after the new subtype of the virus was first identified, many countries increased the screening of border-crossing travellers while paying little attention to their domestic populations at the beginning of the influenza A (H1N1) 2009 surveillance. Screening only symptomatic travellers, or close contacts of confirmed cases, entering the country would find cases from a small and biased sample [22].

The three covariates in the model were selected for their availability and their important roles in global social and economic interactions. GDP represents the economic activity of the people (for international travel), population size represents the susceptible population, and distance represents a possible barrier to infection. Our GLM provides a quantitative method for estimating the parameters of the model. The model itself was heuristic, derived through conceptual reasoning, but its parameters were obtained by statistical estimation. Mathematical and statistical modelling is an important aspect of addressing public health challenges [23]. Our modelling uses social and economic factors and could provide quick insights into global viral transmission and the efforts of health authorities.
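The statistical step described above, estimating exponents of a multiplicative model from data, can be illustrated in its simplest form: ordinary least squares on log-transformed variables. This is a swapped-in simplification, not the authors' GLM (whose error distribution is not stated in this section). It assumes a model $\ln N = b_0 + b_G \ln G + b_P \ln P + b_D \ln D$ with all quantities strictly positive, and the function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_log_gravity(cases, gdp, pop, dist):
    """OLS estimates (b0, bG, bP, bD) of ln N = b0 + bG ln G + bP ln P + bD ln D.

    A negative bD means cases decrease with distance from the origin.
    """
    rows = [[1.0, math.log(g), math.log(p), math.log(d)]
            for g, p, d in zip(gdp, pop, dist)]
    y = [math.log(n) for n in cases]
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)
```

Fitting synthetic data generated exactly from known exponents recovers those exponents, which is a quick check of the estimation step; a GLM with a count distribution would additionally model the variance of the case counts, which this sketch ignores.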