Characteristics of the study population are given in Table 1. Out of 31,126 persons screened for the 1999–2004 NHANES cohorts, 49.3% (n = 15,332) were aged 20 or older. Of these, 29.3% (n = 4,496) were included in the laboratory assessment of PCBs. However, 377 had missing values for all PCB analytes and were therefore not included in further analyses, leading to a final sample of 4,119 individuals. In this population, 43.9% met at least one of the three criteria to be categorized as hypertensive. Among these, 72.3% reported having had a physician diagnosis of hypertension, 67.6% reported taking an antihypertensive medication, and 52.9% had elevated blood pressure during the medical exam. Nearly one-fifth of those classified as hypertensive had only elevated blood pressure (no diagnosis or antihypertensive medication; 19.7%). The systolic and diastolic blood pressure readings for these individuals were generally borderline, with means of 152.4 (SD = 78.6, interquartile range [IQR]: 142.0–161.0) and 78.6 (SD = 17.6, IQR: 70.0–90.7), respectively. Among those with a physician diagnosis of hypertension, 82.5% were on antihypertensive medication. On average, those with hypertension were older (mean age of 60.8 [SD = 16.7] years compared to 40.4 [SD = 15.6] years) and had higher BMIs (mean BMI of 29.5 [SD = 6.4] compared to 27.1 [SD = 5.7]). The average total cholesterol among those with hypertension was somewhat higher compared to normotensive individuals (206.3 [SD = 44.2] mg/dL compared to 198.3 [SD = 41.9] mg/dL), as were the average systolic and diastolic blood pressures (141.3 [SD = 22.8] mm Hg compared to 114.6 [SD = 11.3] mm Hg, and 73.1 [SD = 17.0] mm Hg compared to 68.5 [SD = 10.8] mm Hg, respectively).
In multivariate logistic regression analysis, the covariates most strongly associated with risk of hypertension were age, race/ethnicity and BMI (all p-values < 0.0001); strong associations were also noted for serum lipids (p-value = 0.0015) and family history of CVD (p-value = 0.0021). Compared with non-Hispanic whites, non-Hispanic Black race/ethnicity was associated with higher risk (OR = 2.04, 95% CI: 1.64–2.54), while Mexican-American race/ethnicity was associated with reduced risk (OR = 0.78, 95% CI: 0.64–0.96). A 10-year increase in age was associated with an OR of 2.09 (95% CI: 1.98–2.20). Physical activity (p-value = 0.14), total cholesterol (p-value = 0.11), current smoking (p-value = 0.86) and sex (p-value = 0.60) were not strongly associated with risk of hypertension. Regardless, all of these covariates were retained in further analyses as a priori selected potential confounders.
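The per-decade odds ratio for age can be re-expressed for other age increments by working on the log-odds scale. The following is a minimal sketch, assuming that the log-odds are linear in age (as in the fitted logistic model) and that the reported interval is a 95% Wald interval; the function names are illustrative and not part of the original analysis.

```python
import math

def rescale_odds_ratio(or_value, from_units, to_units):
    """Rescale an odds ratio reported per `from_units` of a predictor to one
    per `to_units`, assuming the log-odds are linear in the predictor."""
    beta_per_unit = math.log(or_value) / from_units
    return math.exp(beta_per_unit * to_units)

def se_from_wald_ci(lower, upper, z=1.96):
    """Recover the standard error of a log-odds ratio from a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Reported estimate for age: OR = 2.09 per 10 years (95% CI: 1.98-2.20).
or_per_year = rescale_odds_ratio(2.09, from_units=10, to_units=1)
or_per_20_years = rescale_odds_ratio(2.09, from_units=10, to_units=20)
se_log_or_decade = se_from_wald_ci(1.98, 2.20)

print(round(or_per_year, 3), round(or_per_20_years, 2), round(se_log_or_decade, 4))
```

Under these assumptions, the odds ratio for a 20-year difference is simply the square of the per-decade odds ratio.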
The highest PCB concentrations were seen for PCBs 153, 180 and 138 (geometric means [GMs] of 0.20, 0.15, and 0.14 ng/g, respectively; Table 2). Serum concentrations of total PCBs were similar by gender (GMs of 1.1 ng/g for both males and females), increased with age (GM of 0.6 in those aged 20–39 compared to 2.3 in those aged 70 or older), and were highest among non-Hispanic Black and non-Hispanic White participants (GMs of 1.3 and 1.2 ng/g, respectively). Total PCBs also varied by body mass, with the lowest levels seen among those in the normal weight category (18.5 ≤ BMI < 25: GM of 1.0 ng/g) and the highest among those in the overweight (25 ≤ BMI < 30) and underweight (BMI < 18.5) categories (GM of 1.2 for both); those in the obese category had a GM of 1.1 ng/g. Finally, participants with hypertension had higher total PCB levels compared to non-hypertensive participants (GMs of 1.6 and 0.8 ng/g, respectively).
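Geometric means are used for these summaries because serum concentrations are typically right-skewed. A minimal sketch of the calculation follows; it ignores the NHANES survey weights, and the concentrations shown are hypothetical values rather than study data.

```python
import math

def geometric_mean(values):
    """Geometric mean: exponentiate the arithmetic mean of the natural logs."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geometric mean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical right-skewed serum concentrations (ng/g); not NHANES data.
concentrations = [0.4, 0.6, 0.9, 1.1, 1.5, 2.2, 9.8]
gm = geometric_mean(concentrations)
arithmetic_mean = sum(concentrations) / len(concentrations)

print(round(gm, 2), round(arithmetic_mean, 2))
```

The single large value pulls the arithmetic mean upward much more than the geometric mean, which is why the latter is the more stable summary for skewed exposure data.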
Table 3 presents the results of multivariate logistic regression analysis, in which each PCB or congener grouping was included one at a time in a model with potential confounders. There were increased odds of hypertension among those in the highest compared to the lowest quartile of total PCBs (OR = 1.38, 95% confidence interval [CI]: 1.02–1.87). The association was of borderline significance when total PCB serum concentration was treated as a continuous variable (beta = 0.0689 [SE = 0.0360]). Congener groups based on activity (estrogenic, dioxin-like) and structure (mono-ortho, tri- and tetra-ortho substituted) were also significantly associated with increased risk of hypertension, although results varied with treatment of the exposure variable (continuous or categorical). In the congener-specific models, there was a significant association between risk of hypertension and numerous PCBs, although in some cases the categorical and continuous models gave effect estimates in opposing directions: a decreased risk of hypertension associated with the middle quartiles of exposure, but an increased risk predicted from the continuous exposure model. For example, PCB 66 was significantly associated with increased odds of hypertension when treated as a continuous variable; however, odds ratios for the increasing tertiles of exposure above the LOD were 0.95, 0.77, and 1.13. It is possible that the significant association in the linear exposure model is driven by the change from the second to the third tertile (i.e., from 0.77 to 1.13), which is larger in magnitude than the shift from the referent to the first group, or from the first to the second group. The same may hold for other PCBs with this pattern: an increase in risk in one region of the exposure-response curve may have a large influence on the single slope estimate in the linear exposure model.
We explored potential non-linear or non-monotonic relationships between PCB concentration and risk using generalized additive models (GAMs). The GAM form tested had 4 degrees of freedom; one degree is taken by the parametric, linear part of the model, and 3 remain for the smoothed spline. In nearly all cases, the linear portion of the association was statistically significant at the alpha = 0.05 level. The spline component was statistically significant for both grouped PCB concentrations (estrogenic, mono-ortho substituted) and specific congeners (PCBs 74, 99, 118, 138, 146, 153, 156), indicating non-linearity in the association with risk of hypertension. Most of these showed roughly quadratic relationships, similar to that shown in the partial prediction plot for PCB 138 [Figure 1(a)]. The spline component was not necessarily significant for the PCBs that showed decreased risk in certain exposure categories but overall increased risk in the linear exposure model, such as PCB 101. As seen in the partial prediction plot for PCB 101 in Figure 1(b), the exposure-response relationship is relatively flat over much of the exposure range, then rises in a relatively linear fashion; thus, although the categorical model results indicated decreased risk in lower exposure ranges, the overall curve does not show a significant departure from linearity. The slope estimate corresponding to the linear portion in the GAM analysis was generally similar (in direction and magnitude) to the slope estimated from the simple linear exposure model (for example, slope estimates of 4.05 in the GAM analysis and 4.05 in the linear exposure model for PCB 52). The exceptions were PCBs 156 and 157, for which both the direction and the magnitude changed. The partial prediction plot for PCB 156 (similar in shape to that for PCB 157) in Figure 1(c) shows a relatively flat relationship in the lower exposure range, followed by a steep drop-off; for comparison, the partial prediction plot for PCB 138 (similar to the plots for the other PCBs with significant spline components) shows a more gradual change, with increased risk in the low dose region tailing off as exposure increases. The shaded area in each plot is the Bayesian 95% confidence band around the estimate, and indicates greater uncertainty with increasing PCB level; this is expected, since few individuals had very high serum concentrations.
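Two of the numerical statements in this section can be checked with simple arithmetic: the borderline significance of the continuous total PCB estimate (beta = 0.0689, SE = 0.0360), and the relative size of the changes between successive PCB 66 exposure groups. The sketch below assumes a two-sided Wald test with a normal approximation, which may differ from the survey-adjusted test used in the analysis, and compares the PCB 66 odds ratios on the log-odds scale; the function names are illustrative.

```python
import math

def wald_test(beta, se):
    """Return the Wald z statistic and two-sided normal p-value for beta / se."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p

def log_or_steps(odds_ratios):
    """Successive changes in log-odds, starting from the referent group (OR = 1)."""
    log_ors = [0.0] + [math.log(v) for v in odds_ratios]
    return [b - a for a, b in zip(log_ors, log_ors[1:])]

z, p = wald_test(0.0689, 0.0360)           # continuous total PCBs
steps = log_or_steps([0.95, 0.77, 1.13])   # PCB 66 tertiles above the LOD

print(round(z, 2), round(p, 3))
print([round(s, 3) for s in steps])
```

Under the normal approximation the p-value falls just above 0.05, and the change from the second to the third tertile is the largest of the three steps in absolute value and the only positive one.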
In order to investigate potential multicollinearity among PCB congeners, a logistic regression model was constructed with all PCB congeners included as predictor variables (along with the same potential confounders identified above). In this model, multiple congeners showed an association with hypertension risk: PCBs 99, 118 and 128 had p-values < 0.05, while PCBs 105 and 167 had p-values between 0.05 and 0.10. However, regression diagnostics indicated the presence of multicollinearity between PCBs 157 and 167; PCBs 170 and 180; and PCBs 146 and 153. Therefore, we continued to the next set of analyses to explore potential clustering of variables and data reduction.
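The specific regression diagnostics are not named here. One widely used diagnostic is the variance inflation factor (VIF), which for a pair of predictors reduces to 1 / (1 - r^2), where r is their Pearson correlation. The sketch below is an illustration under that assumption only; the log-concentration values are hypothetical, and the variable names are not taken from the analysis.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x, y):
    """VIF of either predictor when the model holds only these two: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r * r)

# Hypothetical log serum concentrations for three congeners (not study data).
congener_a = [0.10, 0.35, 0.52, 0.80, 1.10, 1.45]
congener_b = [0.15, 0.30, 0.60, 0.75, 1.20, 1.40]  # tracks congener_a closely
congener_c = [0.90, 0.20, 1.10, 0.30, 0.80, 0.50]  # largely unrelated

vif_ab = vif_two_predictors(congener_a, congener_b)
vif_ac = vif_two_predictors(congener_a, congener_c)
print(round(vif_ab, 1), round(vif_ac, 2))
```

A VIF near 1 indicates little shared variance, whereas large values indicate that the two coefficients cannot be estimated independently; commonly quoted thresholds such as 5 or 10 are conventions rather than formal definitions.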
Cluster analysis identified 4 clusters of similarly acting PCBs (Table 4). Based on these results, four new variables were created as the sum of the concentrations of the congeners within each cluster. However, although these clusters explained over 80% of the variance, only two of the cluster variables were significant in multivariate logistic regression (p-values of 0.13, 0.001, 0.09 and 0.02). Discriminant analysis identified 12 congeners as the most informative. When including these 12 in a discriminant analysis, 35.33% of cases were classified correctly. Principal component analysis showed that four components had eigenvalues >1; in the first component (eigenvalue of 13.21), there was no single dominant PCB and all weights were positive. Most congeners had weights between 0.2 and 0.3, with a few (PCBs 52, 66, 101, 128) having weights lower than this range. The remaining three components had much smaller eigenvalues (2.14 to 1.03) and a wider range of weights, including both positive and negative values. These four components were entered into a multivariate logistic regression model along with previously identified potential confounders. All of the factors were associated with odds of hypertension, although with borderline statistical significance (p-values ranging from 0.05 to 0.08).
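The pattern described for the first principal component (all weights positive, no single dominant variable, and lower weights for a few less correlated congeners) is what would be expected when most variables are positively correlated. The sketch below illustrates this with power iteration on a small, hypothetical correlation matrix; it is not the procedure or the data used in the analysis, and the four variables are placeholders.

```python
import math

def first_principal_component(corr, iterations=200):
    """Power iteration: leading eigenvalue and unit-length weights of a
    symmetric, positive semi-definite correlation matrix."""
    p = len(corr)
    weights = [1.0 / math.sqrt(p)] * p
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * weights[j] for j in range(p)) for i in range(p)]
        eigenvalue = math.sqrt(sum(v * v for v in product))
        weights = [v / eigenvalue for v in product]
    return eigenvalue, weights

# Hypothetical correlations: three closely related variables and a fourth
# that is only weakly correlated with the others.
corr = [
    [1.00, 0.80, 0.70, 0.30],
    [0.80, 1.00, 0.75, 0.35],
    [0.70, 0.75, 1.00, 0.25],
    [0.30, 0.35, 0.25, 1.00],
]
eigenvalue, weights = first_principal_component(corr)
share_of_variance = eigenvalue / len(corr)  # standardized variables: total variance = p

print(round(eigenvalue, 2), [round(w, 2) for w in weights])
```

In this toy example the first component exceeds the eigenvalue > 1 retention threshold, all weights are positive, and the weakly correlated variable receives the smallest weight.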
Finally, non-linear optimization was used to construct a maximally informative weighted sum of the standardized PCB concentrations. The initializing parameter values gave equal weight to each congener, and the Newton-Raphson with line search technique was used to determine the weights which maximized the log-likelihood function for the logistic regression model. The congeners with non-zero contributions to the weighted sum were: PCBs 66 (weight = 0.32), 101 (weight = 0.08), 118 (weight = 0.22), 128 (weight = 0.09) and 187 (weight = 0.30). A new variable was constructed to represent this weighted sum of the centered congeners, which was significantly associated with hypertension risk in the multivariate logistic regression model (beta = 0.39 [SE = 0.09], p-value < 0.0001).
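To make the weighted sum concrete, the sketch below applies the reported weights to one participant's standardized concentrations and converts the resulting index to an odds ratio using the reported coefficient (beta = 0.39). The participant values are hypothetical, and interpreting the result as an odds ratio relative to a participant with an index of zero assumes that all other covariates are equal.

```python
import math

# Weights reported for the congeners with non-zero contributions.
reported_weights = {
    "PCB66": 0.32,
    "PCB101": 0.08,
    "PCB118": 0.22,
    "PCB128": 0.09,
    "PCB187": 0.30,
}

def weighted_index(standardized, weights):
    """Weighted sum of standardized concentrations; congeners absent from
    `weights` contribute nothing (weight zero)."""
    return sum(w * standardized.get(name, 0.0) for name, w in weights.items())

total_weight = sum(reported_weights.values())  # about 1.01 because of rounding

# Hypothetical standardized concentrations for one participant.
participant = {
    "PCB66": 1.0,
    "PCB101": 0.0,
    "PCB118": 0.5,
    "PCB128": -0.5,
    "PCB187": 1.0,
    "PCB153": 2.0,  # zero weight, so it does not affect the index
}
index = weighted_index(participant, reported_weights)
odds_ratio_vs_index_zero = math.exp(0.39 * index)

print(round(total_weight, 2), round(index, 3), round(odds_ratio_vs_index_zero, 2))
```

Because the published weights are rounded to two decimals, they sum to 1.01 rather than exactly 1.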
As a sensitivity analysis, these procedures were repeated using lipid-adjusted PCB serum concentrations rather than including serum lipids as a separate covariate. Results were very similar to those for the main analysis, but in most cases the effect estimates were somewhat attenuated.
This study used epidemiologic data to demonstrate an approach to analyzing complex exposures. Classical methods in epidemiology may not be adequate, due to the large number of exposures relative to the sample size, correlated exposures, and exposures of varying ranges and potencies. This example begins with standard epidemiologic regression analyses. Further steps are to investigate potential non-linearity of the association between exposure and outcome using splines, and multicollinearity among predictors using regression diagnostics. Next, cluster analysis, discriminant analysis, and principal component approaches are used to identify the most informative congeners and clusters of congeners. Finally, non-linear optimization is used to construct a maximally informative weighted sum of the multiple congeners. Each of these analytic approaches has strengths and limitations.
Multiple approaches may be used to investigate non-linearity in the exposure-response relationship. We chose to evaluate PCB exposure as a categorical variable and to construct splines to assess non-linearity in relation to hypertension risk. The use of categorical variables is straightforward, but results and interpretation may depend on the cut-points selected (which in turn may depend on sample size and the distribution of exposure among cases and non-cases). Further, if the study population has no unexposed individuals, the choice of a referent group may be problematic. The use of splines offers an advantage in that it is not necessary to select arbitrary cut-points, but results are less easily interpreted and may depend on knot selection and placement.
Evaluating correlation among exposure variables is also important in understanding their cumulative impact on risk of a given outcome. One option is to use regression diagnostics. However, there are no clear definitions or cutoffs to identify multicollinearity, and interpretation may depend on model form.
As an alternative, discriminant analysis may be used to identify the most informative exposure variables, that is, those with the greatest ability to discriminate between cases and non-cases. As with variable selection in regression, discriminant analysis may use forward, backward or stepwise selection of exposure variables. Results may differ depending on the entry method used and the criteria for retention; further, only one variable is entered or removed at a time, which does not account for relationships among variables not already in the model.
Two options to identify related exposures are cluster analysis and principal component analysis. Both techniques are commonly used for data reduction. Cluster analysis groups variables with the goal of finding clusters whose members are as correlated as possible with one another and as uncorrelated as possible with variables outside the cluster. However, while this approach identifies correlated variables, it does not necessarily provide insight into which are the most informative among the cluster members, or the best way to combine information from cluster members. As an alternative, principal component analysis creates new, uncorrelated component variables from linear combinations of the original, correlated variables. These new components are structured to explain as much of the variance as possible, through selection of the variables in each component and the weight assigned to each variable. One limitation of these analyses is sensitivity to scale, for example if variables have different units or ranges of distribution. In addition, the interpretation of the new component variables is not always straightforward. To address this issue of interpretation and provide another alternative, we used a non-linear optimization approach to construct an optimally weighted linear combination of exposure variables.
In contrast to principal component analysis, where the goal is to maximize the proportion of variance explained, this approach maximizes the likelihood function associated with the statistical model. The variables are centered prior to analysis, and a linear constraint is imposed that the weights sum to one. This way, the weights assigned provide a sense of ‘how much’ of the exposure-response relationship is due to any one exposure variable. Taken together, these approaches augment traditional epidemiologic analyses in the situation where there are multiple, possibly correlated predictor variables of varying potencies and ranges.
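The idea of choosing weights that maximize the likelihood, subject to the weights summing to one, can be illustrated in a deliberately simplified setting. The sketch below uses two centered exposures, no confounders, and an exhaustive grid search in place of the Newton-Raphson with line search technique used in the analysis; the data are hypothetical and the grid ranges are arbitrary assumptions.

```python
import math

def log_likelihood(w, b0, b1, x1, x2, y):
    """Bernoulli log-likelihood for logit(p) = b0 + b1 * (w*x1 + (1 - w)*x2)."""
    total = 0.0
    for a, c, outcome in zip(x1, x2, y):
        eta = b0 + b1 * (w * a + (1.0 - w) * c)
        total += outcome * eta - math.log1p(math.exp(eta))
    return total

def grid_search(x1, x2, y):
    """Exhaustive search; the weights (w, 1 - w) sum to one by construction."""
    best = None
    for i in range(21):              # w = 0.00, 0.05, ..., 1.00
        w = i / 20
        for j in range(17):          # b0 = -2.00, -1.75, ..., 2.00
            b0 = -2.0 + 0.25 * j
            for k in range(13):      # b1 = 0.00, 0.25, ..., 3.00
                b1 = 0.25 * k
                ll = log_likelihood(w, b0, b1, x1, x2, y)
                if best is None or ll > best[0]:
                    best = (ll, w, b0, b1)
    return best

# Hypothetical centered exposures and case status (1 = hypertensive).
x1 = [-1.4, -0.9, -0.6, -0.2, 0.0, 0.2, 0.5, 0.7, 0.8, 0.9]
x2 = [0.4, -1.0, 0.9, -0.3, 0.2, -0.7, 1.0, -0.4, 0.5, -0.6]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

best_ll, w, b0, b1 = grid_search(x1, x2, y)
weights = (w, 1.0 - w)
print(weights, b0, b1, round(best_ll, 3))
```

With more than two exposures a grid search becomes impractical, which is one reason a gradient-based optimizer with a linear constraint is the more natural choice in practice.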
The case study examined PCB body burden and risk of hypertension, since previous studies have provided suggestive evidence for this association. We found that in the NHANES, serum concentrations of PCBs did vary by demographic characteristics and hypertensive status. Serum concentrations tended to be higher among older participants. This is likely due to the restrictions of PCB production in the 1970s, as increasing age reflects a longer period of exposure. There was also variation by body mass, and different patterns by BMI category among men compared to women. This variability may be due to the storage of lipophilic PCBs in adipose tissue; individuals gaining adipose tissue may sequester more PCBs (thus lowering serum levels) while those losing adipose tissue release more PCBs into the bloodstream. Such changes may occur during weight gain or loss, pregnancy and lactation [10].
We observed associations between certain PCB congeners and hypertension risk, after controlling for potential confounders. The most informative congeners identified by the weighted sum approach were PCBs 66, 101, 118, 128 and 187, with PCBs 66 and 118 (which have a nearly identical structure) and 187 having the largest weights. Each of these was also statistically significantly associated with risk of hypertension in individual congener-specific models. However, these five congeners do not share a common structure (PCBs 66 and 118 are mono-ortho substituted, PCBs 101 and 128 are di-ortho substituted, PCB 187 is tri-ortho substituted), and two of the five (PCBs 66 and 128) are considered estrogenic. A 2008 study by Everett et al. identified seven PCBs associated with hypertension risk in the 1999–2002 NHANES cohort, including PCBs 118 and 187, both of which were also identified as most informative in these analyses (PCB 66 was not examined in that study). The authors hypothesized that the specific arrangement of chlorine atoms may explain differences in association even among congeners similar in structure or activity [14]. Goncharov et al. examined the relationship between total PCBs and blood pressure among Alabama residents living near a PCB production facility, and found significant associations with not only clinical hypertension, but also systolic and diastolic blood pressure [15]. A subsequent study in the same population further examined the relationship between PCB exposure and systolic and diastolic blood pressure, and reported significant associations even among those in the normotensive range [4]. Total PCBs, di-ortho and tri- and tetra-ortho PCBs were associated with both systolic and diastolic blood pressure, and there were borderline associations between estrogenic PCBs and systolic blood pressure, and between mono-ortho PCBs and diastolic blood pressure. One consideration in comparing these findings to those of the present study is the difference in study populations. The NHANES is a population-based sample, with generally low levels of exposure to many PCB congeners, while the Alabama population has relatively high exposure to a specific industrial PCB mixture (in addition to background sources of exposure). Although those results are not directly comparable to the present study because of differences in the outcome measured, we also found increased risk of hypertension with increasing levels of each of these groups.
There were some limitations in this analysis. Hypertension status was based on several measures, including physician diagnosis, use of an antihypertensive medication, and measured blood pressure. Some of these measures may be more sensitive and specific than others, so as a sensitivity analysis, we re-defined the hypertensive group to include only those with physician diagnosis with medication, or physician diagnosis without medication but with elevated blood pressure. In this group, increased risk was noted for total PCBs based on categorical analysis (ORs for quartiles 2, 3, and 4 vs. quartile 1 of 1.43, 1.59 and 1.40, respectively); this appeared to be due mainly to increased risk associated with the di-ortho substituted PCBs (ORs for quartiles 2, 3, and 4 vs. quartile 1 of 1.54, 1.59 and 1.38, respectively). Since the NHANES is a cross-sectional design, it is not possible to establish the temporality between exposure to PCBs and risk of hypertension. However, PCBs do have a relatively long half-life (on the order of years), so exposure assessment during the NHANES examination may be representative of exposure during a time period relevant to the etiology of hypertension. When an analyte concentration was below the LOD, the value was replaced with LOD/√2. Such substitution is a simplification, but this method has been shown to perform well when the number of non-detects is moderate [16]. The limits of detection have changed with subsequent NHANES cycles (generally becoming lower), which affects the proportion of individuals with non-detectable analyte levels across cycles. We combined three cycles to increase sample size and the number of individuals with detectable levels of PCBs, but it is possible that changing detection limits may affect these results. We did not adjust for multiple hypothesis testing (all p-values are uncorrected), since the purpose of this analysis was to present a methodological approach rather than explore the relationship between PCB body burden and risk of hypertension. However, adjustment for multiple testing would decrease the number of significant findings. Finally, the biological mechanisms for observed associations between PCBs and hypertension risk are not clear; this is underscored by the fact that congeners with similar structure or biological activity may not have similar associations with the health outcome, even after attempts to correct for differences in exposure level. Strengths of this analysis include the use of multiple NHANES cycles to generate robust estimates, consideration of multiple important PCBs, and the ability to control for important covariates.
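The substitution rule for non-detects described above can be written in a few lines. In this sketch, the way non-detects are flagged (a missing value or a reading below the LOD) and the example values are assumptions made for illustration; they do not reflect how the NHANES data files encode these results.

```python
import math

def substitute_below_lod(values, lod):
    """Replace non-detects (None or a value below the LOD) with LOD / sqrt(2)."""
    fill = lod / math.sqrt(2)
    return [fill if v is None or v < lod else v for v in values]

def fraction_non_detect(values, lod):
    """Share of measurements that are non-detects under the same definition."""
    flagged = sum(1 for v in values if v is None or v < lod)
    return flagged / len(values)

# Hypothetical measurements (ng/g) for one congener with an assumed LOD of 0.04.
measurements = [None, 0.01, 0.05, 0.20]
lod = 0.04
imputed = substitute_below_lod(measurements, lod)
share = fraction_non_detect(measurements, lod)

print([round(v, 4) for v in imputed], share)
```

Tracking the share of non-detects for each congener and cycle is useful, since the substitution is considered to perform well only when that share is moderate, and a lower LOD in later cycles changes it.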