Lead is an important toxic mental that induces numerous adverse clinical outcomes and antisocial behavior in children and adults [1
]. Lead exposure plays a significant role in life course since prenatal lead exposure could bring forth lasting damage to childhood, adolescence, earlier in adult life or across generations [1
]. Both prenatal and postnatal exposure to lead had relationship with reported antisocial acts [2
]. A number of studies have shown that even low lead exposure in early childhood will cause behavior, hearing, and learning problems in children’s growth and development. No lead level is considered safe [3
]. It is known that the major environmental lead sources include lead-based paint, lead in indoor dust, lead in soil, and lead in water [5
]. Children under six years of age have the highest exposure to lead due to greater hand dust contamination, frequent hand-to-mouth transfer, and higher absorption rates [10
]. The most common physiological effects of lead poisoning include neurological and neurobehavioral, lower IQ, and slowed growth and anemia [1
During the last decades the U.S. federal and state governments have devoted considerable efforts for lead hazard screening, mitigation, and control via intervention regulations and education programs. For example, the Lead Hazard Control Program of New York have assisted in reducing lead paint hazards in houses and provided lead inspection for rental properties and homes built before 1978 in the city of Syracuse (NY, USA), [11
]. Although the national surveys and local surveillance data indicate that children’s blood lead level (BLL) (e.g., ≥10 g/dL) has declined over time [12
], the incident of lead poisoning remains relatively high in some inner cities of USA. For example, in the city of Syracuse, a total of 4903 children under the age of 6 years were tested at least once in 2011. The screening results indicated that 168 (3.43%) of these children had BLL ≥10 μg/dL and 900 (18.36%) had BLL ≥ 5 μg/dL [16
Therefore, lead poisoning is still a concern of public health in the inner and old industrial cities. The urban environment usually suffers from historically emitted leaded gasoline, smelters, iron and steel production, lead-acid-battery manufacturing, nonferrous (brass and bronze) foundries, landfills, waste incinerators, sewage sludge incinerators, hazardous waste sites, power plants (refineries/coal burning), and older homes that contain lead based paint/pipes [18
]. Studies have shown that the incident rate of children’s lead poisoning was associated with environmental lead hazard exposure and socio-economic status such as the population density, proportion of houses built before 1950, percentage of poverty level and minority [19
]. For example, Haley and Talbot [21
] and Morrison et al. [22
] showed that the age of houses was one of the most significant factors for predicting the children’s blood lead levels. However, in reality, the relationships between children’s blood lead levels and environment risk factors are not simple and homogeneous in nature, because the socio-economic status of households may be clustered in geographical areas of the inner cities [20
]. The governments’ lead hazard control projects have also changed the geographical distributions of lead contamination exposure over time. All those factors make the relationships between children’s lead exposure/poisoning and environmental risk factors complicated and heterogeneous over space [19
Regression models have been widely used in statistical and ecological analyses of epidemiological research. However, most published regression models are global in nature, assuming that the relationships between response variable and risk factors are constant or spatially homogenous. This assumption may not be reasonable for many cases in health studies [25
]. From a policy-making viewpoint, a global model is useful to summarize how health outcomes are shaped, but does not provide detailed information for effective and efficient regulations and interventions [26
]. On the other hand, the spatially varying relationships may provide us opportunities to search and identify the key and specific areas across a large geographical study region. Particularly, a localized regression method, geographically weighted regression (GWR), has become popular in different disciplines in recent years (e.g., [11
]). The applications of GWR in epidemiological research helped to identify the “hot spots” of the inner cities where the disease rate is more severe than other areas, which may be more important and useful to policy makers for the purpose of planning and management [24
In addition, most regression models, including both global and GWR models, focus on the “average” relationships between response variable and predictor variables, and provide the prediction on the conditional mean (i.e., central behavior) of the response variable given the values of the predictor variables [35
]. In practice, people may also be interested in the conditional percentiles (or quantiles) and/or extremes of a response variable distribution in a linear model in order to explore more complete and comprehensive relationships between response variable and predictor variables [36
]. For this purpose, quantile regression has been used in different study fields since its development [38
]. Zhang et al. compared quantile regression with other regression methods for modeling the self-thinning upper boundary line of forest stands [39
]. Koenker and Hallock used quantile regression to study re-employment bonus at different quantiles in microeconomics [40
]. Levin applied quantile regression in education to study the class size and peer effects [41
]. Wei et al. utilized quantile regression for growth charts where the percentile curves are commonly used to screen abnormal growth [42
]. The main advantages of quantile regression include: (1) it is flexible to model data with heterogeneous conditional distributions and robust to outliers; (2) it makes no distributional assumption about the error term in the model so that it provides efficient estimates of model coefficients under non-Gaussian conditions; and (3) it can estimate multiple rates of change (slopes) from minimum to maximum of the response variable, providing a more complete view of the relationships between variables. Thus, quantile regression is especially useful in applications where limiting factors or extremes are more important than the averages, e.g., the upper quantiles of air pollution levels are more critical to public healthcare [36
Further, Chen et al. developed geographically weighted quantile regression (GWQR) to integrate the GWR methodology with the quantile regression framework [44
]. This innovative approach provides a foundation for modeling spatially nonstationary relationships between variables at a range of conditional quantiles of the response variable distributions. Since then, however, to our best knowledge there were very limited applications of GWQR. The objective of this study was to apply global quantile regression (QR) and geographical weighted quantile regression (GWQR) to model the relationships between children’s lead poisoning and environmental risk factors referenced at different quantiles, rather than at the conditional mean, of the response variable. Hopefully, quantile regression models enabled us to explore the full distribution of children’s lead poisoning and identify “high risk” locations or areas in the neighborhood of the inner city of Syracuse, NY, USA.
The geographical map of the observed proportion or incident rate of children’s BLL ≥ 5 µg/dL is illustrated in Figure 2
, and the geographical maps of the three predictor variables are shown in Figure 3
. The higher incident rates of children lead poisoning existed in the center of the city, along the northern and southern sides of the non-residential area (commercial centers), where the residential houses were relatively older (built before 1920) with lower taxable values (<40 K, especially on the southern side), as well as relatively higher soil lead concentration (Figure 3
Global quantile regression models were fitted with the three predictor variables at τ = 0.25, 0.50, 0.75 and 0.90 quantiles (Table 3
). It was evident that at each quantile, the regression coefficients of both building year and town taxable values were negative, indicating that the incident rate of children lead poisoning reduced with newer buildings and/or higher taxable values of the houses. In contrast, soil lead concentration was positively related to the incident rate of children lead poisoning, implying that higher soil lead concentration around the house may cause higher risks of children’s lead poisoning. Further, these negative or positive relationships between response variable and three environmental factors became stronger for larger quantiles (Figure 4
In terms of statistical testing on the regression coefficients, the predictor variables taxable values and soil lead were statistically significant at all four selected quantiles, while the building year was significantly related to the incident rate of the children’s lead poisoning at the 0.50, 0.75 and 0.90 quantiles, but not important at the 0.25 quantile (Table 3
). In order to compare the importance of the three predictor variables, all variables in Equation (6) were standardized to mean = 0 and standard deviation = 1 and the four global quantile regression models were re-fitted. The standardized model coefficients (STB) were also listed in Table 3
. It was interesting that the town taxable value was the most important predictor variable at each quantile, followed by the soil lead concentration around the house which had similar level of (positive) impact as the taxable value. In contrast, the building year had much smaller (negative) impact on the incident rate of children lead poisoning than the taxable value of the house (Table 3
We found that the existence of spatial autocorrelation and heterogeneity in the observed incident rate of children’s lead poisoning and the model residuals of the four global quantile regression models (Table 4
). The positive Moran’s I values indicated that the “high risk” census blocks were neighboring with the “high risk” blocks, while the “low risk” census blocks were neighboring with the “low risk” blocks across the inner city of Syracuse, NY, USA. Although the values of the Moran’s I were not particularly large, these spatial autocorrelations were statistically significant. In addition, the Moran’s I became smaller for in the model residuals for larger quantiles, implying that “high risk” census blocks (i.e., more children with BLL ≥ 5 µg/dL) were less spatially clustered (Table 4
The summary statistics of GWQR coefficient estimates were listed for the four selected quantiles in Table 5
, including the mean, median, minimum and maximum of these 1393 sets of GWQR model coefficients. It was clear, in terms of the means and medians of these spatially varying model coefficients, the patterns and trends of the coefficients were similar to those of the global QR model coefficients: (1) the coefficients were negative for the building year and taxable values, but positive for the soil lead concentration; and (2) these negative or positive relationships became stronger for larger quantiles (Table 5
However, across the geographical study area, the GWQR model coefficients varied from location to location. One way of assessing the spatial heterogeneity of the coefficients is to use the interquartile ranges (IQR) of the localized coefficients compared with the standards error of the global model coefficients. If the IQR is twice as large as the standard error, it indicates that spatial non-stationarity exists in the relationships between response variable and accompanying predictor variables [44
]. Table 5
indicated that most IQRs of the localized model coefficients were at least twice the standard errors of the global model coefficients, except for soil lead concentration at τ = 0.25 and 0.90. Our results suggested that the relationships between the incident rate of children lead poisoning and environmental risk factors indeed varied across the inner city of Syracuse.
illustrates the model predictions from both global QR model and GWQR model at the quantile τ = 0.75, because the observed 75th quantile of the response variable was 0.18 (Table 2
), indicating that 25% of these 1393 census blocks (or 348 blocks) had 18% or more children’s BLL ≥ 5 µg/dL among all tested children in the census block. Those census blocks may be considered “high risk” areas of children lead poisoning and should be identified for lead control treatment. The predictions from the global QR (τ = 0.75) model show relatively larger areas of the inner city of Syracuse would have 20% or more children having elevated BLL ≥ 5 µg/dL in those census blocks, or even >40% of children having BLL ≥ 5 µg/dL in those census blocks (Figure 5
a). One the other hand, the predictions from the GWQR (τ = 0.75) model show much smaller areas, or more specific “spatial spots” of 20% or more children having elevated BLL ≥ 5 µg/dL in those census blocks (Figure 5
b), which may provide more useful information to the government agencies for the lead hazard control programs.
In addition, the GWQR modeling method produced the model coefficients for each location in the study area at any specific quantile. Those sets of model coefficients can be mappable using geographic information system like ArcGIS. Following [44
], we constructed the geographic maps of model coefficients where the local t
-test exceed ±1.96 (i.e., statistically significant) at each quantile of τ = 0.25, 0.50, 0.75 and 0.90 for each predictor variable (Figure 6
, Figure 7
and Figure 8
shows the significant model coefficients of the Building Year. Starting on τ = 0.50, it seems that the significant negative model coefficients were associated with the old residential houses. The larger the quantile was, the more negative the model coefficients were, which were more concentrated on the older houses. For example, for τ = 0.90, the larger negative model coefficients were significant on the northern side of the inner city of Syracuse where the houses built before 1920 were located (Figure 3
a). Thus, those areas may have higher risk of children lead poisoning (recall that the observed 90th quantile was 0.40, indicating that 10% of these 1393 census blocks had 40% or more children’s BLL ≥ 5 µg/dL among all tested children in the census block).
shows the significant model coefficients of the Taxable Value, which was more important than the Building Year (Table 3
). It appeared that the significant negative model coefficients of the Taxable Value were strongly related to the residential houses on the southern side of the inner city of Syracuse (Figure 3
b). This local model coefficient was statistically significant across the almost entire inner city area for each quantile. Again, the larger the quantile was, the more negative the model coefficients were.
is the geographic distribution of significant coefficients for the soil lead concentration. Generally, the soil lead was positively related to the incident rate of children’s lead poisoning (Table 3
). The observed high soil lead concentration was related to older houses with lower taxable values (Figure 3
c). It seems that, for τ = 0.75, the larger positive model coefficients were significant on both northern and southern sides of the inner city of Syracuse, while for τ = 0.90, the significant positive model coefficients were more on the southern side of the inner city of Syracuse.
Quantile regression (QR) is aimed at estimating the conditional quantiles (e.g., 5th or 90th quantiles) of response variable given the predictor variables. Compared to the conventional OLS regression which models the average relationships, QR computes several regression curves corresponding to different quantiles or percentiles of the response variable. Therefore, the feature of QR is to offer more complete ranges of statistical models than the classical conditional mean regression [38
]. The regression relationships at the “average” level and at different quantiles are quite different. Shao et al. (2017) concluded that more recently built houses with higher average town taxable values would have lower likelihood of lead poisoning, while the children would have a higher chance of exposure to lead if the surrounding of the house had higher soil lead concentration for the “average” relationship via Poisson models [11
]. In this study, the global quantile models confirmed that influential directions extending to both lower and upper tails. This importance pattern of the three predictor variables was different from those of the four quantile models (Table 3
). In practice, when the upper or lower quantiles of the response variable are more interested to researchers or policy makers, QR is particularly useful for exploring the full range of the variable relationships [36
]. However, the global QR only offers a picture of global regularity. The issues of local spatial constituents, structure, variability, and complexity are still undetected.
The GWQR modeling method developed by [44
], which integrates QR into GWR, accounts for both spatial effects and quantile distribution of the response variable. Specifically, GWQR allows researchers to explore non-stationarity across the study area depending on various quantiles of the response variable. Those two dimensions are particularly important in the fields of epidemiology and public health. We present here a case study of environmental epidemiology, the children’s lead poisoning in Syracuse. It is well known that lead poisoning in the inner cities was caused by lead-based paint, lead in drinking water system, and lead in dust and soil. The census blocks of the inner cities were built at different time periods, so were the water pipeline systems, causing the non-stationary lead hazard exposure and clustered lead poisoning. Therefore, using localized information to study the children lead poisoning would be more precise than using all the data in the study area. Thus, we utilized GWQR to better identify the specific “high risk” spots of children lead poisoning in these at-risk neighborhoods.
The first advantage of GWQR is flexible for modeling data with heterogeneous distributions. Methodologically, it is extended from quantile regression, which makes no distributional assumption about the error term of the model. In other words, GWQR is not stick to the issues which may violate the assumptions of an OLS model such as normality. For example, in this study, about 50 percentages of blocks do not have children’s BLL ≥ 5 µg/dL. Thus, the overdispersion of the response variable violates the assumption of an OLS model, including both global and geographical weighted regression. Therefore, GWQR is unbiased when the tails and central location of the conditional distribution vary differently in the response variable [44
Although an increasing interest in modeling the locally varying relationships has emerged in recent years, most geographically weight models are limited to link the conditional mean of the response variable to the predictor variables. However, our results indicated that the magnitudes of the estimated model coefficients of all predictor variables were very different between the 50th quantile model against the 90th quantile model (Table 5
). It is noticeable that the not only the magnitudes of the model coefficients were dissimilar, but also the locations of important predictors were divergent in terms of significance tests (Figure 6
, Figure 7
and Figure 8
), indicating that the impacts of the environmental factors were spatially varying for different risk levels (i.e., different quantiles) of children lead poisoning. In this study, the observed 75th quantile of response variable was considered “high risk” areas of children’s lead poisoning and should be identified for lead control treatment. The important factors of soil lead consideration (Figure 8
c) were located at the high-risk clusters in central blocks of Syracuse (Figure 1
). This indicated that the soil concentration is strongly associated with high risks of lead poisoning. But in the previous study using a spatial hurdle model [52
], the soil concentration was not strongly associated with high risks of lead poisoning via average level. Therefore, GWQR is able to help researchers to explore the locally detailed relationships simultaneously across different conditional distributions of the response variable and provide deeper insights in terms of the disease prevention and protection. The identification of those “high risk” spots would assist the government agencies to make better decisions on where and what the lead hazard treatment should focus on.
Although GWQR is an improved regression method toward a spatial quantile-based analysis, some issues still need to be further developed and discussion. The first issue is the bandwidth selection. In this study, the variogram model was fitted through the variogram values in order to find the optimal bandwidth and kernel function [53
]. We tried three ways for the variograms, including the observed responsible variable, the residuals of the global OLS regression, and the residuals based on 0.25, 0.50, 0.75 and 0.90 global quantile models. All optimal models were resulted to Gaussian due to the smallest AIC. Estimated bandwidths were 2324, 2712, 2548, 2461, 2640, and 2544 m, respectively. Given the estimation of the bandwidths were very close, we chose 2324 m as the bandwidth referenced on the observed incident rate of the children’s blood lead level for the GWQR analysis. However, optimal bandwidth may be selected in terms of other criterions, such as subjective and smallest cross-validation error [54
]. The better bandwidth selection method for quantile regression, which uses linear programming distinct from OLS, demands further study.
Another issue in this study was the boundary effect of spatial data analysis. In the middle of our study region, the residential census blocks were separated by the non-residential blocks (commercial centers) in the center of the Syracuse and at the edge of map (Figure 2
). Therefore, some census blocks may be surrounded by the blocks defined by the bandwidth where there were no data of children’s BLL available. This could be a potential problem for fitting the GWQR models [53
]. Thus, additional information from other neighborhood cities may be required to correct that edge effect and improve model explanatory range.
Children lead poisoning has been a concern of public health in the USA, especially in some inner cities where the residential houses are much older with poor conditions. Thus, young children may have higher risks to exposure to lead hazard. Over the last two decades researchers have been using regression models, including both global and GWR models, to investigate the relationships between children lead poisoning and environment factors. However, most regression models focus on the “average” variable relationships and provide the prediction on the conditional mean (i.e., central behavior) of the response variable. However, for the studies of environmental epidemiology, the conditional percentiles (or quantiles) of children lead poisoning may be more interesting to researchers and public agencies, because different quantiles represent different levels of “risks” of children’s exposure to lead poisoning. In this study we applied global quantile regression (QR) and geographical weighted quantile regression (GWQR) to model the relationships between children’s lead poisoning and three environmental factors at different quantiles (25th, 50th, 75th and 90th). The response variable was the incident rate of children’s BLL ≥ 5 µg/dL in each census block, and the three predictor variables were (1) building year, (2) natural logarithm of town taxable values (in thousand dollars), and (3) soil lead concentration (ppm).
Our results indicated that at each quantile, the regression coefficients of both global QR and GWQR models were (1) negative for both building year and town taxable values, indicating that the incident rate of children lead poisoning reduced with newer buildings and/or higher taxable values of the houses; and (2) positive for the soil lead concentration, implying that higher soil lead concentration around the house may cause higher risks of children’s lead poisoning. Further, these negative or positive relationships between response variable and three environmental factors became stronger for larger quantiles. The GWQR models enabled us to explore the full distribution of children’s lead poisoning and identify “high risk” locations or areas in the neighborhood of the inner city of Syracuse, NY, USA which would provide useful information to assist the government agencies to make better decisions on where and what the lead hazard treatment should focus on.