With increasing interest in the relationships between environmental contexts and human health-related behaviors and outcomes, many physical activity (PA) researchers have investigated environmental influences on people’s moderate to vigorous PA. PA is “any bodily movement produced by skeletal muscles that results in energy expenditure” [1]. Moderate to vigorous PA in particular, such as brisk walking and running, brings various health benefits [2]. Many studies have used geographic information system (GIS)-based buffer analyses because buffers are simple to construct for delineating contextual areas or zones and effective for deriving contextual variables. Buffer zones with a pre-specified distance are thus often used in PA research to delineate areas within which individuals are potentially affected by specific environmental factors.
Home addresses have been a popular type of geographic location for delineating buffer zones. For example, previous studies investigated the effects of neighborhood green spaces around individuals’ home locations on their PA [3]. Some researchers have sought to identify a reasonable and reliable distance to capture true neighborhood effects. For instance, McGinn and colleagues [7] suggested that a 20-min walking distance (roughly 1.6 km or 1 mile) was appropriate for defining neighborhood areas from individual home locations in physical health research, whereas Berke and colleagues [8] claimed that a slightly smaller size, 1 km or 0.6 miles, might better capture the characteristics of people’s residential neighborhoods.
In contrast to the many studies based on static locations such as home addresses, only a few studies have explored the effects of GPS-based buffer sizes when estimating dynamic exposure along individuals’ daily GPS trajectories. For GPS-based buffers, dynamic GPS points rather than static home locations are used as the entities around which buffers are constructed. By identifying where and when people spend their time and are exposed to a specific environmental influence in their daily lives, GPS data may help mitigate the uncertain geographic context problem (UGCoP), which arises from the spatial and temporal uncertainties of contextual influences on people’s health-related behaviors or outcomes [9]. In the last decade or so, buffer analysis (e.g., with 50 or 100 m buffers) has been used to delineate environmental contexts along individuals’ GPS trajectories that have spatially immediate and momentary influences on individuals’ moderate to vigorous PA [10]. However, previous studies have not reached a consensus on the best distance for creating GPS-based buffers, and research results may vary with buffer size. There is therefore an urgent need to investigate the effects of different buffer sizes on research findings to provide insights for public health and transportation research.
Thus, this study addresses the methodological issue of the effects of different buffer sizes on the estimated relationships between environmental contexts and active travel modes (ATMs), which is a manifestation of the UGCoP. As a subset of PA, ATMs include only walking and biking in this study, although other ATMs exist in daily life, such as running and roller-skating. Sensitivity analyses were conducted to investigate the varying associations between seven environmental factors (crime, trees, parks and open spaces, neighborhood median household income, neighborhood population, transit availability, and traffic collision) and walking and biking across 11 buffer sizes ranging from 20 to 200 m. In addition, this study used multinomial logistic regression to examine how buffer size relates to the estimated impacts of the seven environmental factors on people’s ATMs, and it explores the statistical significance of each predictor across the 11 buffer sizes.
This study contributes to the literature on the accurate estimation of individual exposures to various environmental contexts.
This article is structured as follows. Past studies on the estimation of environmental exposure using buffers are reviewed in Section 2. The GPS dataset, the GIS dataset, and the analytical methods used in this study are explained in Section 3. Section 4 presents the results obtained from the statistical analyses on the associations between ATMs and multiple environmental factors, and Section 5 discusses the research findings and concludes the article.
2. Estimation of Individual Environmental Exposure using Buffer Analysis
GIS methods have been essential for estimating the impact of environmental influences on PA. A large body of research has used buffer analysis to delineate neighborhoods around individuals’ home addresses and find empirical evidence regarding the associations between PA and neighborhood characteristics [5]. Buffering is one type of spatial analysis that can be used to assess the effects of various factors by defining zones around geometric primitives, such as points, lines, or polygons, which represent geographic entities or objects. Circular (radial) buffers with the same distance in all directions are widely used to define neighborhood or contextual areas in an isotropic manner, whereas network buffers are used to define anisotropic contextual areas taking into account individuals’ reachable distances along road networks (Figure 1). A circular shape has been the most popular form to delineate contextual areas based on a specific distance from a set of entities (e.g., people’s home locations). For instance, McGinn and colleagues [7] created 1.6 km (1 mile) circular buffers around the homes of 1270 adults in Forsyth County, NC, and Jackson City, MS, to examine the associations between the built-environment and adults’ PA for leisure and transportation purposes. The 1.6 km distance is equivalent to a 20-min walking distance, which seems a reasonable distance for delineating people’s residential neighborhoods based on their home locations. The study found that, in Jackson, MS, people who live in neighborhoods with low traffic volumes were less likely to meet physical activity recommendations.
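The isotropic (circular) buffer described above can be illustrated with a short sketch. It assumes planar coordinates in meters (for example, from a projected coordinate system), and the function names and sample locations are hypothetical; it illustrates the idea of a radial buffer and is not the GIS workflow used in the cited studies.

```python
import math

def in_circular_buffer(center, feature, radius_m):
    """Return True if `feature` lies within `radius_m` of `center`.

    Both points are (x, y) tuples in planar meters (an assumption of this sketch).
    """
    dx = feature[0] - center[0]
    dy = feature[1] - center[1]
    return math.hypot(dx, dy) <= radius_m

def count_features_in_buffer(center, features, radius_m):
    """Count how many features fall inside the circular buffer around `center`."""
    return sum(in_circular_buffer(center, f, radius_m) for f in features)

# Hypothetical home location and park centroids (planar meters).
home = (0.0, 0.0)
parks = [(300.0, 400.0), (1200.0, 900.0), (1600.0, 0.0), (1500.0, 1500.0)]

# Compare a 1 km buffer with a 1.6 km buffer around the same home.
for radius in (1000.0, 1600.0):
    print(radius, count_features_in_buffer(home, parks, radius))
```

A network buffer would replace the straight-line distance with a distance measured along the road network, which is why it yields anisotropic areas.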
Because a neighborhood or contextual area can be delineated based on different characteristics of participants or the environment, various buffer sizes were used in some studies [5]. Berke et al. [8], for instance, used smaller buffer sizes (e.g., 100 m, 500 m, and 1 km), which may better capture the walkable areas for older adults around their homes. Green space is one characteristic of the built-environment, and its effects on people’s PA may depend on its proximity to people’s homes. For example, Maas and colleagues [13] used 1 km and 3 km circular buffers to measure the percentages of green space around participants’ homes, while Cerin and colleagues [16] applied 500 m and 1 km network buffers to delineate reachable green space for adults in 12 countries. Browning and Lee [17] conducted a systematic literature review and found that, when buffers were created around individuals’ home addresses, a large number of studies reported significant associations between greenness and better physical health or health behaviors, including PA, as the buffer size increased up to 1999 m. Further, Nagel and colleagues [5] concluded that the significance of the associations between some built-environment factors and PA could vary depending on the sizes of the buffers that represent different ranges of neighborhood or contextual areas.
The increasing adoption of GPS in PA research has led to the identification of the locations where PA occurs. In some studies, GPS points falling within 400 m to 1600 m circular or network buffers from people’s home locations were considered [18]. These studies combined GPS points with objectively measured PA to understand the effect of residential neighborhoods on people’s PA. Boruff and colleagues [18] further investigated different buffer types and their influence on research findings. Further, different sizes of buffers were considered in recent studies to delineate more accurate residential neighborhoods for specific PA types (e.g., walking or bicycling) [22]. In other words, with respect to the size of buffers, the types of PA in question became a critical factor that needed to be taken into account to find appropriate buffer sizes.
However, only a few studies to date have considered buffers along individual trips traced by GPS trajectories for assessing the dynamic influence of environmental factors. Among these studies, Rodríguez et al. [10] justified the use of 50 m buffers around each GPS point to estimate daily exposures of adolescent females to built-environment characteristics. The purpose of using a 50 m distance was to avoid the potential dependence in the estimated effects of the built-environment between two consecutive GPS points. Further, Burgoine and colleagues [11] applied a hybrid method, using 100 m circular buffers to estimate environmental exposures during children’s trips to and from school and 800 m network buffers for their residential and school neighborhoods. Also focusing on children’s trips to and from school, Harrison and colleagues [24] used the actual routes derived from GPS points and the predicted routes calculated using the shortest path algorithm to compare the food and PA environments in the 100 m buffers along the two kinds of routes. Yin and colleagues [25] highlighted that, although the moderate to vigorous PA of youths usually occurred within a 0.25 or 0.3-mile radius around their residences considering their daily trips, their space-time paths were not uniformly distributed in the radial area. In addition, Houston [26] tested 50, 250, and 500 m buffers created around the GPS trajectories of 55 adults and found that the results varied between different buffer sizes. For example, the magnitude of the impact of green space on moderate to vigorous PA diminished as the buffer size increased to 500 m.
Therefore, this study conducted an in-depth investigation into the effects of buffer size on research results concerning PA based on GPS trajectories. With regard to the UGCoP, Kwan [27] highlighted the need for sensitivity analysis in particular, in order to better understand the extent to which research findings and contextual influences are affected by different delineations of contextual units. Hence, this study examines whether GPS-based buffer size affects the associations between ATMs, including walking and biking, and multiple physical and social environmental factors that previous studies have not explored. Further, when compared to Houston’s research [26], this study examines individual environmental exposures using smaller buffers (e.g., from 50 m to 200 m).
4.1. Descriptive Statistics
The descriptive statistics summarize the personal characteristics of the 168 adults and their predicted average daily travel time, obtained using the optimized travel mode classification algorithm described in Section 3.2. In the statistical summary, the percentage of females was slightly higher than that of males. Most of the participants were white and middle-aged. According to the population statistics of Chicago (United States Census Bureau, 2010), whites made up 45% of the population of the study area but 81.5% of our sample, and African Americans accounted for 33% of the population but only 10.7% of our sample. High-income people comprised the dominant group, and vehicles, including private cars and public transport, were the most-used modes for daily travel, accounting for one hour per day on average. Since running was rarely performed in the daily lives of the 168 participants, it was excluded from this study. The other three travel modes (i.e., walking, in-vehicle travel, and biking) were considered.
The dataset contained 156,627 GPS points (observations). The environmental characteristics within the 50 m buffers around these GPS points are described in Table 3 to give a sense of how the participants were dynamically exposed to different environmental contexts in their daily lives. Regarding the predicted travel modes, 40,999 GPS points were identified as walking, whereas only 2545 points were classified as biking. The buffer areas of the GPS points associated with walking had, on average, higher tree density, higher transit availability and battery incidence, and more traffic collisions involving pedestrians and cyclists than the buffer areas of the GPS points associated with biking and in-vehicle travel. On the other hand, buffer areas of the GPS points associated with biking had, on average, more park and open space areas and higher neighborhood median household incomes. Correlation coefficients among the seven predictors were computed to evaluate multicollinearity, and very little correlation was found between any pair of predictors (not presented).
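The multicollinearity check mentioned above amounts to computing pairwise correlation coefficients among the predictors and looking for large values. The following is a minimal sketch that assumes Pearson correlation and a flagging threshold of 0.7; the text does not state which coefficient or cutoff was used, and the predictor values are invented for illustration.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def flag_correlated_pairs(predictors, threshold=0.7):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold.

    The 0.7 default is an assumption of this sketch, not a value from the study.
    """
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(predictors.items(), 2):
        r = pearson(xa, xb)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

# Invented buffer-level values for three predictors at five GPS points.
predictors = {
    "tree_density": [1.0, 2.0, 3.0, 4.0, 5.0],
    "transit_availability": [2.0, 4.0, 6.0, 8.0, 10.0],
    "crime": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(flag_correlated_pairs(predictors))
```

An empty result would correspond to the situation reported in the text, where no pair of predictors was strongly correlated.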
4.2. Sensitivity Analyses of the 11 Sizes of Buffers
The varying associations between participants’ ATMs and the seven environmental factors in Models 1, 2, and 3 are shown in Figure 5. In all three models, buffer size affected these associations in terms of both the significance levels of the variables and the odds ratios (ORs). When trees, parks and open spaces, transit availability, battery, and traffic collision were included as predictors in Model 1, the associations between participants’ ATMs and these environmental factors were mostly significant for walking across the 20 to 200 m buffers, whereas only trees and traffic collisions involving pedestrians and cyclists had consistently significant associations with biking across buffer sizes. When the density of neighborhood African Americans was added as a predictor in Model 2, transit availability became more significant for all the buffer sizes for biking versus in-vehicle; however, parks and open spaces and crime remained non-significant for most of the buffer sizes for biking. The added density of African Americans in a neighborhood was also not significantly associated with biking. In contrast, the results of Model 3, which adds neighborhood median household income, indicate that the effects of buffer size on significance levels are largely alleviated for trees, transit availability, and neighborhood median household income for both walking and biking. The rest of the predictors, however, do not show consistent significance levels across the buffer sizes. In particular, parks and open spaces, crime, and the density of neighborhood African Americans are significant for biking only at the relatively large buffer sizes of 150 and 200 m. Further, traffic collision is not significant for biking versus in-vehicle until the buffer size reaches 40 m.
In all three models, the associations between walking versus in-vehicle and all predictors mostly had high significance levels (p < 0.001), and the ORs varied as the buffer size changed, while the graphs of biking versus in-vehicle mostly show stable trends across the buffer sizes, except for crime and traffic collisions. For these two safety-related factors in particular, the ORs of walking and biking relative to in-vehicle status share a common pattern: they begin with similar values at small buffer sizes of around 20 and 30 m, diverge more and more as the buffer size becomes larger, and then either cross at a certain size (crime) or widen further (traffic collisions).
Model 3 had more significant variables across buffer sizes than the other models, and, as shown in Figure 6, the 200 m buffer, the largest distance tested, yielded the largest number of highly significant predictors in Model 3. It was therefore selected as the most appropriate buffer distance to examine the associations between ATMs and the environmental factors in this study. Figure 6 indicates the significance levels of the seven predictors for walking and biking, and thus, the maximum number of significant variables is 14. The histogram in Figure 6 indicates that as buffer size gets closer to 200 m, more environmental variables become significant.
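The tally behind Figure 6 can be thought of as counting, for each buffer size, how many predictor-by-mode tests are significant (at most 14 with seven predictors and two active modes). The sketch below assumes a 0.05 significance threshold and uses invented p-values for two predictors and two buffer sizes; the function names and tie-breaking rule are hypothetical and not taken from the study.

```python
def count_significant(p_values_by_buffer, alpha=0.05):
    """Map each buffer size to the number of tests with p < alpha.

    `p_values_by_buffer` maps buffer size (m) to a dict keyed by
    (predictor, mode) with the p-value of that coefficient.
    """
    return {
        size: sum(p < alpha for p in p_values.values())
        for size, p_values in p_values_by_buffer.items()
    }

def buffer_with_most_significant(counts):
    """Return the buffer size with the highest count (smallest size wins ties)."""
    return max(sorted(counts), key=lambda size: counts[size])

# Invented p-values, for illustration only.
p_values_by_buffer = {
    50: {("trees", "walk"): 0.001, ("trees", "bike"): 0.20,
         ("crime", "walk"): 0.03, ("crime", "bike"): 0.60},
    200: {("trees", "walk"): 0.001, ("trees", "bike"): 0.01,
          ("crime", "walk"): 0.001, ("crime", "bike"): 0.04},
}

counts = count_significant(p_values_by_buffer)
print(counts, buffer_with_most_significant(counts))
```

Choosing a buffer by the number of significant coefficients is the selection rule described in the text; model fit measures, which the study also reports, offer a complementary criterion.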
The associations between the seven predictors derived with 200 m buffers and travel modes are shown in Table 4. All the model fit measures, including the three kinds of pseudo-R-squared values, indicate that Model 3, with the added neighborhood median household income, better explains variations in the outcome variable (predicted travel modes) than the other two models. In Model 1, higher percentages of tree areas (OR: 1.05), transit availability (OR: 2.06), incidence of crime (OR: 1.00), and traffic collisions (OR: 1.01) were significantly associated with higher odds of walking, whereas traffic collisions (OR: 0.99) were significantly associated with lower odds of biking compared to in-vehicle status. The density of parks and open spaces (OR: 0.97) was significantly associated with lower odds of walking relative to in-vehicle status in Model 1. The results of Model 2 were similar to those of Model 1, and the model fit measures of Model 2 did not improve much after the density of neighborhood African Americans was added.
In contrast to Models 1 and 2, all the variables had significant associations with walking and biking in Model 3, which also showed much-improved pseudo-R-squared, AIC, and BIC values. Tree density, transit availability, and crime had significant associations with higher odds of both walking (OR: 1.05, 1.92, and 1.00, respectively) and biking (OR: 1.00, 1.09, and 1.00, respectively) compared to in-vehicle status in Model 3. Neighborhood median household income, density of neighborhood African Americans, and traffic collisions were associated with higher odds of walking (OR: 1.00, 1.00, and 1.01, respectively), but with lower odds of biking (OR: 0.99, 0.99, and 0.99, respectively) compared to in-vehicle status. On the other hand, parks and open spaces were associated with lower odds of walking (OR: 0.98) and biking (OR: 0.99) relative to in-vehicle status.
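For readers who want the link between the reported ORs and the underlying model, the standard multinomial logit form is sketched below. This section does not spell out the model equation, so the notation is an assumption: in-vehicle status is taken as the reference category, $x_{ik}$ denotes the value of predictor $k$ in the buffer around GPS point $i$, and $\beta_{mk}$ is the coefficient of predictor $k$ for mode $m$.

```latex
\[
\ln \frac{P(Y_i = m)}{P(Y_i = \text{in-vehicle})}
  = \beta_{m0} + \sum_{k=1}^{K} \beta_{mk}\, x_{ik},
\qquad m \in \{\text{walking}, \text{biking}\},
\]
\[
\mathrm{OR}_{mk} = \exp(\beta_{mk}).
\]
```

Under this form, the OR of 1.92 reported for transit availability and walking in Model 3 means that a one-unit increase in transit availability multiplies the odds of walking relative to in-vehicle status by 1.92, holding the other predictors fixed. ORs reported as 1.00 or 0.99 can still be statistically significant, because an OR is expressed per unit of the predictor and a finely scaled predictor produces a per-unit OR close to 1.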
To express the effects of the environmental factors on walking or biking on the probability scale, the average marginal effects were calculated. In Model 3, the average marginal effect of transit availability on walking was the highest (0.1061) among all the predictors. This indicates that the probability of walking is approximately 10.6 percentage points higher in areas with high transit accessibility than in areas with low transit accessibility. It was also found that when the surrounding environments had higher tree density, there was approximately a 1% higher probability of walking, on average, than in areas with lower tree density.
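An average marginal effect for a multinomial logit model can be sketched as follows. The example assumes a continuous predictor and uses the analytic derivative of the category probability, averaged over observations; the coefficients and data are invented, in-vehicle is assumed to be the reference category, and the function names are hypothetical. The study does not state how its marginal effects were computed, so this is an illustration of the general technique only.

```python
import math

REFERENCE = "in_vehicle"  # assumed reference category with all-zero coefficients

def mnl_probabilities(x, betas):
    """Category probabilities of a multinomial logit model.

    `x` is a list of predictor values; `betas` maps each non-reference mode
    to [intercept, b_1, ..., b_K].
    """
    scores = {REFERENCE: 0.0}
    for mode, b in betas.items():
        scores[mode] = b[0] + sum(bk * xk for bk, xk in zip(b[1:], x))
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {mode: math.exp(s - top) for mode, s in scores.items()}
    total = sum(exps.values())
    return {mode: e / total for mode, e in exps.items()}

def average_marginal_effect(X, betas, k, mode):
    """Average of dP(mode)/dx_k over the observations in X (k is zero-based)."""
    total = 0.0
    for x in X:
        p = mnl_probabilities(x, betas)
        # The reference category contributes zero to the weighted coefficient.
        weighted = sum(p[m] * betas[m][k + 1] for m in betas)
        total += p[mode] * (betas[mode][k + 1] - weighted)
    return total / len(X)

# Invented coefficients: one predictor, intercept first.
betas = {"walking": [0.0, 1.0], "biking": [0.0, 0.0]}
X = [[0.0]]
print(average_marginal_effect(X, betas, k=0, mode="walking"))
```

For a binary predictor, the usual alternative is to average the difference in predicted probability between the two levels, which matches the "percentage points higher" reading used in the text.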
5. Discussion and Conclusions
This study explored how different buffer sizes affect the associations between ATMs and multiple environmental factors (including the physical, social, and safety environments) in the estimation of spatially immediate and temporally momentary exposures around individuals’ GPS trajectories for PA and transportation research. In addition, the sensitivity analysis with different buffer sizes addressed the UGCoP by showing that the study results are sensitive to the choice of buffer size and that larger buffers yield more significant findings. Among the three models, Model 3 had more significant variables across different buffer sizes, and 200 m was the most appropriate buffer distance for the model. Based on the ORs and the significance levels of the environmental variables, the study found that buffer size influences the associations between ATMs and the environmental factors, and that the ORs and significance levels differ clearly between walking and biking. Specifically, the associations between biking and/or walking and parks and open spaces, crime, and traffic collisions do not remain consistent: the ORs increase or decrease, and the associations move from positive to negative or vice versa, as the buffer size increases. A possible explanation for the changes in the direction of the associations between 20 m and 30 m is that the 20 m buffer areas are too small to capture park areas, crime incidents, and traffic collisions around individual GPS trajectories. Among these three predictors, parks and open spaces in particular showed a decrease in the magnitude of their influence on biking over in-vehicle status, which corroborates Houston’s findings [26].
The associations between ATMs and the environmental factors are more sensitive to buffer size for biking than for walking, with statistical significance levels varying across buffer sizes for parks and open spaces, transit availability, crime, the density of neighborhood African Americans, and traffic collisions. One common characteristic in the outcomes is that non-significant associations become significant when the buffer size reaches a relatively large distance, such as 150 or 200 m. In this study in particular, using 200 m buffers produced more significant variables in Model 3 and better model fit, based on several pseudo-R-squared measures, than in the other two models. Furthermore, Model 3 with 200 m buffers showed the best fit when compared to all the shorter buffer distances. Neighborhood-level demographic and socioeconomic characteristics derived with each buffer at a GPS point played an important role in producing the better model, which may be relevant to individuals’ perceptions of opportunities for health-promoting behaviors [48]. Thus, the evidence of buffer-size effects on multiple environmental factors obtained in this study provides more systematic insights into PA and transportation research than previous studies regarding GPS-based buffers [10].
In the physical environment, the percentage of tree areas, a proxy of greenness derived with 200 m buffers around each GPS point, was one of the consistent predictors, showing stable and significant associations in the three models. Tree density has higher ORs for walking and biking when compared to in-vehicle travel. With the objectively measured tree density, this study shows that greenness is likely to be associated with more walking (relative to motorized travel modes), which is consistent with the findings in previous studies [38]. The findings for parks and open spaces, however, are inconsistent with other studies [4]: in this study, greater exposure to park areas during daily trips was significantly associated with lower odds of active travel relative to the motorized travel mode. One possible explanation is that some adults intentionally take a detour when they drive home to enjoy the fleeting natural landscape, including green space, which may give in-vehicle status higher odds than the two ATMs (Bell et al. [54]). In addition, since parks and open spaces have more complex characteristics, such as quality and availability (Lee & Maheswaran [55]), which may affect their associations with the use of ATMs, the findings about the effects of parks and open spaces are not as consistent as those of tree density. The higher ORs of the non-motorized travel modes (walking and biking in this study) relative to in-vehicle status with regard to transit availability also correspond to past studies, indicating that transit facilities encourage people’s use of ATMs and PA [41]. Thus, to promote walking or biking, urban planners may need to consider such effects of trees, parks and open spaces, and transit availability on active travel.
One piece of salient evidence that this study yielded is that safety-related factors, including crime and traffic collisions, have significant associations with walking and biking. Compared to traveling by private vehicles or public transit, more traffic collisions involving pedestrians and pedal cyclists are significantly associated with a lower likelihood of biking, which provides empirical evidence that traffic collisions constrain PA [56]. The findings concerning walking, however, are mixed. Unlike biking, walking is more likely to occur in areas with more traffic collisions. Furthermore, a higher incidence of battery is significantly associated with higher ORs of walking and biking when compared to in-vehicle status. The higher ORs of walking relative to in-vehicle status for crime and traffic collisions were unexpected, suggesting that walking is more likely than in-vehicle travel in areas with more crime cases and traffic collisions in the immediate surroundings. One possible reason behind these inconsistent associations is that larger buffers included more crimes and traffic collisions around places where walking and biking occurred, and this may have affected the results. Furthermore, with the different findings in the associations between walking and biking and traffic collisions, this study provides empirical evidence that the mechanisms underlying the associations between different travel modes and environmental factors may not work identically. The influence of neighborhood median household income and African American density also indicates that some environmental factors could have opposite effects on the two active modes. For example, relative to in-vehicle status, walking is more likely in neighborhoods with high percentages of African Americans and high median household incomes, while biking shows the opposite pattern.
The optimized travel mode classification algorithm adopted to automatically identify walking, running, biking, and in-vehicle status is one of the innovative parts of this study. This automatic classification uses only GPS trajectories and achieved remarkable accuracy in identifying those four travel modes. With the newly adopted travel mode classification algorithm, this study suggests a novel way of using estimated travel modes in health, transportation, and urban planning research to understand individuals’ dynamic exposures to environmental factors and their impacts on individuals’ PA, taking into account people’s daily trips recorded by GPS trajectories.
This study, however, has some limitations. First, this study did not consider trip chains as an analytical unit. Physical activity research tends to use each GPS point as the analytical unit for exposure estimation, while transportation research uses trips as the analytical unit to estimate exposures around people’s travel routes [57]. Specifically, the point-by-point approach can make the results more sensitive to GPS points with low accuracy. Second, this study does not address the biases associated with selective daily mobility. Because this study focused only on associations rather than causal relationships, it could not ascertain the causal effects of environmental factors on people’s travel (cf. [58]). For example, this study could not identify the reasons why people selected a particular type of environment to walk or bike or why exposure to specific environments led people to walk or bike. Third, correlations between the observations were not addressed in this study. Because the GPS points were recorded at a high frequency (10-s interval), consecutive GPS points for a subject may have had very similar values of environmental characteristics. Observations from different subjects can also be related to each other, since the GPS data were collected from household members, and therefore parents, their children, and siblings may share the same trips. Many statistical tests assume independent observations, and ignoring the correlations between observations can bias the resulting p-values. This issue should be examined in future studies by, for instance, comparing the results to those obtained through random samples of the GPS points. Fourth, the optimal buffer size identified in this study may not be generalizable. The optimal buffer size may extend beyond 200 m for other study areas, and different environmental factors may have different optimal buffer sizes. Hence, such variability should be further investigated. In addition, the sampling rate of the GPS points may affect the consistency of the results. In this study, GPS points were sampled at a 10-s interval to generate a smaller GPS dataset and to increase computational efficiency for generating the buffers. However, coarser (e.g., 60 s) or finer (e.g., 5 s) sampling intervals may have implications for the findings on the associations between ATMs and environmental factors and the effects of different buffer sizes on these associations: an initial analysis with the GPS dataset at a 60-s interval obtained mostly similar results overall but somewhat different results for some buffer sizes. Further, the intensity of the ATMs, which could enrich our understanding of the associations, was not considered in this study. Walking, for instance, can be further divided into light and brisk walking depending on its intensity, and these may be affected differently by different environmental factors, as many studies have demonstrated by using accelerometers to identify the intensity levels of individuals’ PA. Lastly, the sample of subjects used in this study is not representative of the larger population of the study area. The participants in the sample were mostly wealthy, middle-aged whites, which restricts the applicability of our findings about the effects of different buffer sizes on the associations between ATMs and environmental factors to this specific social group. Further, this study did not deal with the temporal aspects of the buffer size effects. Different buffer sizes may have different effects on the study results at different time points, and this should be addressed in future studies in order to better understand the time-sensitive effects of buffer size.
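Two of the checks suggested above, re-sampling the trajectory at a coarser interval and drawing a random sample of GPS points, can be sketched in a few lines. The sketch assumes each GPS point is a tuple of (timestamp in seconds, x, y) sorted by time; the function names, the fixed random seed, and the synthetic track are hypothetical and only illustrate the idea.

```python
import random

def thin_by_interval(points, interval_s):
    """Keep the first point, then each point at least `interval_s` after the last kept one.

    `points` is a time-sorted list of (timestamp_s, x, y) tuples.
    """
    kept = []
    last_time = None
    for point in points:
        if last_time is None or point[0] - last_time >= interval_s:
            kept.append(point)
            last_time = point[0]
    return kept

def random_subsample(points, fraction, seed=0):
    """Draw a reproducible random sample of the points, returned in time order."""
    if not points:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(points) * fraction))
    return sorted(rng.sample(points, k))

# Synthetic two-minute track recorded every 5 s (25 points).
track = [(t, float(t), 0.0) for t in range(0, 121, 5)]

print(len(thin_by_interval(track, 10)))   # 10-s interval
print(len(thin_by_interval(track, 60)))   # 60-s interval
print(len(random_subsample(track, 0.2)))  # 20% random sample
```

Refitting the models on each thinned or subsampled dataset and comparing ORs and significance levels across buffer sizes would indicate how much the findings depend on the sampling rate and on correlated consecutive points.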
Considering these limitations, future work should further investigate different delineation methods and their impacts on research findings. More aggregated methods, such as activity space and kernel density, that are based on trips instead of individual GPS points should be explored to mitigate the problems due to the use of a point-by-point approach and to take into account the correlations between observations. The correlations among observations can also be addressed using specialized statistical tests that consider the hierarchical structure of trips and participants [59]. Nonlinearity that might exist in the associations between ATMs and environmental factors should also be handled in future work using nonlinear statistical models. In addition, the impacts of GPS data sampling rates on research findings will need to be examined. Instead of the 10-s interval, it would be useful to compare the results obtained with the original 5-s intervals and larger intervals to see how the study results vary depending on the sampling rate. Such an investigation will contribute to mobility research in various fields by suggesting a minimum sampling frequency for GPS data. In addition, the categories of ATMs need to be expanded by considering the intensity of ATMs, which can be based on data obtained with accelerometers or on estimations using people’s physiological information, such as age, height, weight, and velocity of walking or biking. Moreover, further research is needed to enhance our understanding of the inconsistencies in the results by focusing on different genders, racial or ethnic groups, and socioeconomic groups. Spatio-temporal analysis will also be needed for exploring some of the predictors, such as parks and open spaces and safety-related factors, which may be time-sensitive and vary between weekdays and weekends.