Article

Bayesian Group Index Regression for Modeling Chemical Mixtures and Cancer Risk

1
Department of Biostatistics, School of Medicine, Virginia Commonwealth University, Richmond, VA 23298-0032, USA
2
UC Berkeley School of Public Health, University of California, Berkeley, CA 94704-7394, USA
3
Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA
*
Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2021, 18(7), 3486; https://doi.org/10.3390/ijerph18073486
Submission received: 19 February 2021 / Revised: 18 March 2021 / Accepted: 24 March 2021 / Published: 27 March 2021

Abstract

There has been a growing interest in the literature on multiple environmental risk factors for diseases and an increasing emphasis on assessing multiple environmental exposures simultaneously in epidemiologic studies of cancer. One method used to analyze multiple chemical exposures is weighted quantile sum (WQS) regression. While WQS regression has been demonstrated to have good sensitivity and specificity when identifying important exposures, it has limitations, including a two-step model fitting process that decreases power and model stability and a requirement that all exposures in the weighted index have associations in the same direction with the outcome, which is not realistic when chemicals in different classes have different directions and magnitudes of association with a health outcome. Grouped WQS (GWQS) was proposed to allow for multiple groups of chemicals in the model, where a different magnitude and direction of association is possible for each group. However, GWQS shares the WQS limitation of a two-step estimation process and a splitting of data into training and validation sets. In this paper, we propose a Bayesian group index model to avoid the estimation limitation of GWQS while having multiple exposure indices in the model. To evaluate the performance of the Bayesian group index model, we conducted a simulation study with several different exposure scenarios. We also applied the Bayesian group index method to analyze childhood leukemia risk in the California Childhood Leukemia Study (CCLS). The results showed that the Bayesian group index model had slightly better power for exposure effects and slightly better specificity and sensitivity in identifying important chemical exposure components than the existing frequentist method, particularly for small sample sizes. In the application to the CCLS, we found a significant negative association for insecticides, with the most important chemical being carbaryl. In addition, for children who were born and raised in the home where dust samples were taken, there was a significant positive association for herbicides, with dacthal being the most important exposure. In conclusion, the Bayesian group index model appears able to make a substantial contribution to the field of environmental epidemiology.

1. Introduction

There are more than 80,000 chemicals on the market in the United States alone and some are found in many consumer products [1]. Hence, individuals are exposed to chemical mixtures daily. Traditionally, epidemiologic studies of cancer and environmental chemical exposures have evaluated chemicals independently using a single-chemical regression approach [2,3,4,5,6,7,8]. More recently, there has been a growing interest in the literature on understanding the joint effects of multiple environmental risk factors for diseases [9,10,11,12] and an increasing emphasis on assessing multiple environmental exposures simultaneously in epidemiologic studies of cancer [13,14]. In this paper, we focus on exposure to multiple diverse environmental chemicals and develop a statistical method that expands on weighted quantile sum regression (WQS) [14] to model cancer risk. WQS regression was developed to identify the truly “bad actors” when modeling exposure to a chemical mixture in a risk assessment setting. This constrained regression method is designed to accommodate highly correlated data that create collinearity issues with traditional regression methods. In WQS regression, a weighted index of exposures is estimated, where the weights for each chemical exposure are constrained to be between 0 and 1 and sum to 1. This approach has been used in many studies of environmental mixtures and health outcomes. For example, WQS was used to model non-Hodgkin lymphoma risk related to a mixture of 27 chemicals in the NCI-SEER NHL study [15].
While WQS regression has been demonstrated to have good sensitivity and specificity when identifying important exposures [14,16], it has certain limitations. One limitation of WQS regression is that it uses a two-step model fitting process and a splitting of data into training and validation sets that decreases power and stability with small datasets that are common in epidemiology. Another limitation of WQS regression is that all chemical exposures in the weighted index are constrained to have associations in the same direction with the outcome. This constraint does not allow for the realistic situation when chemicals in different classes have different associations with a health outcome in both direction and magnitude. For example, there is evidence that insecticides have a negative association with non-Hodgkin lymphoma (NHL) [2], while organochlorine compounds such as some PCB congeners have a positive association with NHL [3]. Considering the multitude of diverse chemicals to which individuals are exposed daily, more flexible approaches to modeling environmental cancer risk are needed.
To overcome the single-index limitation of WQS regression, we have proposed grouped weighted quantile sum (GWQS) regression to enable multiple groups of chemicals in the model, where each chemical group can have a different magnitude and direction of association with the outcome [17,18]. GWQS moves the analytical approach to environmental risk assessment toward more realistic models of environmental exposures by estimating a weighted index for each group of exposures. A simulation study of GWQS demonstrated that it had better power, sensitivity, specificity, and goodness-of-fit than WQS when there were two or more groups of exposures [19]. GWQS also performed better overall than lasso and the group lasso with a minimax concave penalty. This simulation study showed the inability of both lasso and WQS regression to estimate exposure effects for realistic mixtures with different groups of chemicals that were positively and negatively associated with risk. Both WQS and lasso produced an effect estimate that averaged over positive and negative effects, resulting in effect estimates that were biased toward the null. While this assessment was encouraging for the application of GWQS in studies of environmental cancer risk, GWQS still has the limitation of a two-step estimation process and splitting of data into training and validation sets, which can result in reduced power in small epidemiologic studies.
We propose to use a Bayesian framework to create a more flexible and complex GWQS model that does not require two-step estimation. We have previously used Bayesian index regression to create single-index models of neighborhood deprivation and risk of elevated blood lead levels [20,21] and tobacco retail outlet rates [22]. In this paper, we extend the Bayesian index model to incorporate multiple exposure indices (similar to GWQS regression) and term the approach the Bayesian group index model, a new way to estimate the health effects of chemical mixtures.

2. Materials & Methods

2.1. Bayesian Group Index Regression

The basic Bayesian index regression model for a binary health outcome $y_i \sim \mathrm{Bernoulli}(p_i)$ is specified through the log-odds of disease for the $i$th subject as
$$\operatorname{logit}(p_i) = \beta_0 + \beta_1 \left( \sum_{j=1}^{C} w_j q_{ij} \right) + z_i^{T} \phi$$
where the left-hand side of the equation is the logit of the disease probability $p_i$, $w_j$ is the weight parameter for the $j$th exposure with quantile score $q_{ij}$ for the $i$th individual, $\beta_1$ is the effect for the index, and $z_i$ is a vector of covariates with corresponding effects in vector $\phi$. Quantiles are used instead of raw data to reduce the effect of outliers and account for different concentration scaling for different exposures. Any reasonable definition of quantiles could be used, including deciles. In this regression model there is one weighted index using $C$ exposures. The weights $w_j$ represent the relative importance of the exposures and are constrained to be between 0 and 1 and to sum to 1. Assignment of distributions for the model parameters completes the model specification. The index weights $w_1, \ldots, w_C$ are given a Dirichlet prior with parameters $\alpha = (\alpha_1, \ldots, \alpha_C)$. The Dirichlet prior is convenient because it ensures that the weights satisfy $w_j \in (0, 1)$ and $\sum_{j=1}^{C} w_j = 1$. The intercept, index regression coefficient, and covariate regression coefficients are assigned vague normal priors, for example $\beta_1 \sim \mathrm{Normal}(0, \tau_1)$ with precision $\tau_1 = 1/\sigma_1^2$ and $\sigma_1 \sim \mathrm{Uniform}(0, 100)$. An improper uniform distribution, $\beta_0 \sim \mathrm{dflat}()$, could also be used for the intercept, particularly if random effects are included in the model.
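As an illustration of how the weighted index enters the linear predictor, the following minimal Python sketch scores hypothetical concentrations into quartiles and evaluates the disease probability for fixed, illustrative parameter values. The data, the rank-based quartile rule, and the function names are assumptions made for this example; it does not reproduce the BayesGWQS implementation and performs no posterior estimation.

```python
import math

def quantile_scores(values, n_quantiles=4):
    """Score each value 0..n_quantiles-1 according to its rank-based quantile."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = min(rank * n_quantiles // n, n_quantiles - 1)
    return scores

def disease_probability(q_row, weights, beta0, beta1):
    """Inverse logit of beta0 + beta1 * sum_j w_j * q_ij (no covariates)."""
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0:
        raise ValueError("weights must be non-negative and sum to 1")
    index = sum(w * q for w, q in zip(weights, q_row))
    eta = beta0 + beta1 * index
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical concentrations for C = 3 exposures measured on 8 subjects.
exposures = [
    [0.2, 1.5, 0.9, 3.1, 2.2, 0.4, 5.0, 1.1],
    [10.0, 12.0, 8.0, 20.0, 15.0, 9.0, 30.0, 11.0],
    [0.01, 0.05, 0.02, 0.04, 0.03, 0.06, 0.08, 0.07],
]
q = [quantile_scores(column) for column in exposures]  # q[j][i]
weights = [0.5, 0.5, 0.0]                              # illustrative weights
p = [disease_probability([q[j][i] for j in range(3)], weights, -1.0, 0.4)
     for i in range(8)]
```

Because the weights sum to 1, the index stays on the quantile scale (0 to 3 for quartiles), so $\exp(\beta_1)$ can be read as the odds ratio for a one-unit increase in the index.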
To better model multiple sets of diverse environmental exposures, we extend the Bayesian index model to a Bayesian group index model that allows for multiple exposure groups, each with potentially different direction and magnitude of association with the health outcome. The Bayesian group index model includes a weighted exposure index and associated effect for each exposure group. For example, a model for three groups of exposures is
$$\operatorname{logit}(p_i) = \beta_0 + \beta_1 \left( \sum_{j=1}^{C_1} w_{j1} q_{ij1} \right) + \beta_2 \left( \sum_{j=1}^{C_2} w_{j2} q_{ij2} \right) + \beta_3 \left( \sum_{j=1}^{C_3} w_{j3} q_{ij3} \right) + z_i^{T} \phi$$
where $w_{j1}$ is the weight for the $j$th exposure in the first index, $q_{ij1}$ is the quantile for the $j$th exposure in the first index for the $i$th subject, and the weights and quantiles are defined similarly for the second and third exposure indices. There is a variable number $C_k$ of exposures in each index and each index has a regression coefficient $\beta_k$ ($k = 1, 2, 3$ in this example). This model can identify the most important among the groups of exposures through posterior inference on the index effects $\beta_1, \beta_2, \beta_3$ and the most important variables in each index through posterior inference on the weights.
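The group structure can be made concrete with a short Python sketch that evaluates the three-group linear predictor for a single subject. All quantile scores, weights, and coefficients below are hypothetical values chosen for illustration, and the function names are not part of any package.

```python
import math

def group_index(q_row, weights):
    """Weighted index for one group: sum_j w_jk * q_ijk."""
    return sum(w * q for w, q in zip(weights, q_row))

def group_index_probability(q_groups, w_groups, betas, beta0, z=(), phi=()):
    """Disease probability from the group index model for one subject."""
    eta = beta0
    for q_row, weights, beta in zip(q_groups, w_groups, betas):
        eta += beta * group_index(q_row, weights)
    eta += sum(z_k * phi_k for z_k, phi_k in zip(z, phi))  # covariate term
    return 1.0 / (1.0 + math.exp(-eta))

# One hypothetical subject with three groups of 3, 2, and 4 exposures.
q_groups = [[3, 2, 0], [1, 1], [0, 2, 3, 1]]
w_groups = [[0.5, 0.5, 0.0], [1.0, 0.0], [0.25, 0.25, 0.25, 0.25]]
# Group 1 protective (OR 0.5), group 2 harmful (OR 2.0), group 3 null.
betas = [math.log(0.5), math.log(2.0), 0.0]
p = group_index_probability(q_groups, w_groups, betas, beta0=0.0)
```

Here the first index equals 2.5 and the second equals 1, so the protective group dominates and the probability falls below 0.5; a single-index model would have to average these opposing effects.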
The priors for the parameters in this model are similar to those in the base Bayesian index regression model, with the single-index priors extended to multiple groups. The weights for each index follow a Dirichlet prior and the index effect for each group follows a vague normal prior. Different choices of priors are possible for the index effect parameters. For example, a mixture prior with a penalty could be used for the index effects to overcome any model instability due to collinearity [23]. This model could also be extended to include a subject-level random effect $\psi_i$ to account for residual confounding at the individual level, and a natural choice for the prior would be $\psi_i \sim \mathrm{Normal}(0, \tau_\psi)$ with precision $\tau_\psi = 1/\sigma_\psi^2$ and $\sigma_\psi \sim \mathrm{Uniform}(0, 100)$. Markov chain Monte Carlo (MCMC) is used to estimate the model parameters. Convergence of the MCMC algorithm is assessed using the Gelman-Rubin or Geweke diagnostic statistics. Our implementation of the Bayesian group index regression model is available in an R package titled BayesGWQS [24] to facilitate use by other researchers.

2.2. Simulation Study Design

To assess the performance of the Bayesian group index model, we generated chemical concentration data over several different exposure scenarios, which varied in the amount of chemical correlation, the number of chemical groups, and the strength of association between each group and the outcome. There were four scenario sets (A–D) that varied in total group number and number of chemicals per group. We also considered different exposure effect strengths (Strengths 1–5). For each scenario set, we started with a null effect of odds ratio (OR) = 1.00 and then increased the association in strength for each chemical group (both negative and positive associations). In scenario sets A–C, for positive associations the strengths 2–5 denote ORs of 1.50, 2.00, 2.50, and 3.00, respectively, while negative associations were the reciprocals of these ORs (0.67, 0.50, 0.40, 0.33, respectively). To evaluate the ability of the models to estimate smaller true effects, the positive (negative) scenario set D had effect sizes of 1.00 (1.00), 1.25 (0.80), 1.50 (0.67), 1.75 (0.57), and 2.00 (0.50), respectively. Moreover, the sample size in scenario set D was reduced to 500 from 1000 in scenario sets A-C to evaluate model performance in smaller studies such as the California Childhood Leukemia Study (CCLS).
For the correlation amongst the chemicals, a weak, moderate, and strong correlation structure was considered in each scenario set. These three correlation structures were: weak (W) with correlation of 0.5 within group and 0.1 across group, moderate (M) with correlation of 0.7 within group and 0.3 across group, and strong (S) with correlation of 0.9 within group and 0.5 across group. Correlation structures for the chemicals were specified through a covariance matrix. This covariance matrix was generated via a vector of means and a vector of standard deviations that also allowed for generation of the data as multivariate normal. Four quantiles of the exposures were used in all simulations for computation of the weighted index in each group (e.g., $q_{ij} = 0, 1, 2, 3$).
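A block correlation structure of this kind can be sketched in Python as follows. The example builds the moderate (M) structure for two groups of five and four chemicals and draws multivariate normal exposures through a Cholesky factor. Standard normal margins (mean 0, standard deviation 1) are an assumption here, since the means and standard deviations used in the study are not given in the text, and the function names are hypothetical.

```python
import math
import random

def block_correlation(group_sizes, within, across):
    """Correlation matrix with one value within groups and another across."""
    labels = [g for g, size in enumerate(group_sizes) for _ in range(size)]
    n = len(labels)
    return [[1.0 if a == b else (within if labels[a] == labels[b] else across)
             for b in range(n)] for a in range(n)]

def cholesky(matrix):
    """Lower-triangular factor L with L L^T = matrix (positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def simulate_exposures(n_subjects, corr, rng):
    """Draw rows from a multivariate normal with unit variances (assumed)."""
    lower = cholesky(corr)
    dim = len(corr)
    rows = []
    for _ in range(n_subjects):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        rows.append([sum(lower[i][k] * z[k] for k in range(i + 1))
                     for i in range(dim)])
    return rows

rng = random.Random(1)
corr = block_correlation([5, 4], within=0.7, across=0.3)  # moderate (M)
data = simulate_exposures(1000, corr, rng)
```

Each simulated column would then be converted to quartile scores before the weighted indices are computed.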
For the first scenario set with 9 chemicals (Scenario set A), we generated five different chemicals in one group and four different chemicals in another group. The first group had a negative association with the outcome and the second group had a positive association (except for the null effect scenarios). Through setting of the chemical weights, two chemicals in each group were specified to be important and the remaining were set to be not important. The important chemicals in each group were given equal weight, with the weights summing to 1 for each group (e.g., two important chemicals in a group would lead to each having a weight of 0.5). Unimportant chemicals were assigned a true weight of 0.
For the second scenario set (Scenario set B), there were 14 chemicals allocated among three groups. Group 1 had a negative association and groups 2 and 3 each had a positive association with the outcome (excluding the null effects scenario). There was one important chemical in each group. The third scenario set (Scenario set C) was similar to Scenario set B except that group 2 had two important chemicals and groups 1 and 3 each had three important chemicals. Scenario set D had the same group structure as scenario set C. The different terms used in the simulation scenarios are summarized in Table 1. The terms are used to succinctly present the simulation study results. Each individual scenario is defined in Supplementary Table S1.
Based on the defined exposure scenarios, we replicated a case–control study with a relatively balanced number of cases and controls (50 ± 10% cases) for a binary outcome $y$ in every iteration of the data generation. The outcome was distributed as $y \sim \mathrm{Binomial}(n, p)$, where $p = \frac{1}{1 + e^{-\eta}}$ and $\eta = \beta_0^{*} + \sum_{j=1}^{K} \beta_j^{*} [\mathrm{WQS}_j^{*}]$, and the star notation indicates the true parameter values. As no covariates were used to generate the data, the term $z^{T}\phi = 0$. We simulated 100 data sets for each scenario to replicate 100 studies.
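The outcome generation step can be illustrated with a small Python sketch that draws one binary outcome per subject from the inverse-logit probability. The setup loosely follows scenario set A at strength 2 (ORs of 0.67 and 1.50, two equally weighted important chemicals per group). The independent quartile scores and the intercept of 0, chosen so that roughly half of the subjects are cases, are assumptions made for illustration; the text does not state how the true intercept was set.

```python
import math
import random

def inverse_logit(eta):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def simulate_outcomes(index_rows, betas, beta0, rng):
    """One Bernoulli draw per subject with eta = beta0 + sum_j beta_j * WQS_j."""
    outcomes = []
    for indices in index_rows:
        eta = beta0 + sum(b * wqs for b, wqs in zip(betas, indices))
        outcomes.append(1 if rng.random() < inverse_logit(eta) else 0)
    return outcomes

rng = random.Random(2021)
true_weights = [[0.5, 0.5, 0.0, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0]]
betas = [math.log(1 / 1.5), math.log(1.5)]  # true ORs of 0.67 and 1.50

# True group indices from hypothetical, independently drawn quartile scores.
index_rows = []
for _ in range(1000):
    row = []
    for weights in true_weights:
        quartiles = [rng.randrange(4) for _ in weights]
        row.append(sum(w * q for w, q in zip(weights, quartiles)))
    index_rows.append(row)

y = simulate_outcomes(index_rows, betas, beta0=0.0, rng=rng)
case_fraction = sum(y) / len(y)
```

With opposing effects of equal size and identically distributed indices, the linear predictor is symmetric around the intercept, which is why an intercept of 0 gives an approximately balanced sample in this sketch.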
To assess the performance of the Bayesian group index model in comparison to GWQS, we calculated the power, bias, and mean squared error (MSE) of the exposure effects for each of the groups, as well as the specificity and sensitivity for identifying unimportant versus important chemicals. When calculating power, we examined the proportion of 95% credible (or confidence for GWQS) intervals of the odds ratios of chemical group associations that did not contain 1.00. We calculated sensitivity as the proportion of truly important chemicals that the model identified as important. A chemical was identified as important if its weight was estimated to be greater than or equal to $1/C_j$, where $C_j$ is the number of chemicals in group $j$. Specificity was defined as the proportion of the truly unimportant chemicals that were correctly identified as unimportant by the models. We defined a chemical as unimportant if its weight was estimated to be less than the threshold of $1/C_j$. We fitted the Bayesian group index models using our R package BayesGWQS [24] and fitted the GWQS models using our R package groupWQS [18]. A vignette for groupWQS is available on The Comprehensive R Archive Network [18] that demonstrates use of the package.
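The performance measures described above can be sketched in a few lines of Python. The estimated weights and intervals below are invented for illustration, and the function names are hypothetical.

```python
def classify_important(weights):
    """Flag a chemical as important when its weight is at least 1 / C_j."""
    threshold = 1.0 / len(weights)
    return [w >= threshold for w in weights]

def sensitivity_specificity(est_weights, true_weights):
    """Compare flagged chemicals with the truth (true weight > 0)."""
    flagged = classify_important(est_weights)
    truth = [w > 0 for w in true_weights]
    true_pos = sum(f and t for f, t in zip(flagged, truth))
    true_neg = sum((not f) and (not t) for f, t in zip(flagged, truth))
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    sens = true_pos / n_pos if n_pos else float("nan")
    spec = true_neg / n_neg if n_neg else float("nan")
    return sens, spec

def power(intervals, null_value=1.0):
    """Proportion of odds ratio intervals that exclude the null value."""
    excluded = sum(1 for low, high in intervals
                   if not (low <= null_value <= high))
    return excluded / len(intervals)

# Hypothetical estimates for one group of C_j = 5 chemicals (threshold 0.2).
est = [0.45, 0.15, 0.25, 0.10, 0.05]
true = [0.5, 0.5, 0.0, 0.0, 0.0]
sens, spec = sensitivity_specificity(est, true)

# Hypothetical 95% intervals for a group odds ratio from four simulated studies.
intervals = [(0.40, 0.90), (0.60, 1.10), (1.20, 2.50), (0.80, 1.30)]
estimated_power = power(intervals)
```

In this example one of the two truly important chemicals falls below the 0.2 threshold and one unimportant chemical exceeds it, giving a sensitivity of 0.5 and a specificity of 2/3; two of the four intervals exclude 1.00, giving a power of 0.5.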

2.3. Data Analysis

To apply the Bayesian group index model to observed data, we analyzed childhood leukemia risk in the CCLS, which is a population-based case–control study conducted in 18 counties in the Central Valley and 17 counties within the San Francisco Bay area and designed to assess the relationships between genetic factors and environmental exposures and childhood leukemia [25]. Cases were identified within three days after diagnosis from 1995 to 2012 from nine pediatric clinical centers in the study area. Inclusion criteria were: (1) residence in California at the time of diagnosis, (2) no prior cancer diagnosis, (3) age under 15 years, and (4) having a Spanish- or English-speaking biological parent. Controls were selected from state birth certificate files and matched to cases on sex, date of birth, Hispanic ethnicity, and (maternal) race.
Participating parents were initially interviewed to ascertain information about their children’s exposure to potential risk factors for leukemia. A subset of the families participated in a second in-home interview during which dust samples were collected. Dust was collected from homes of controls and cases who were younger than 8 years at the time of diagnosis (similar reference date for controls) who were living at the diagnosis home. The condition of living in the diagnosis home was used so that the dust sample from carpet would represent the exposures over a significant part of the early life of a child. A total of n = 583 children participated after the second interview, of which 277 were cases and 306 were controls.
Dust samples were collected from a rug or carpet in the room where the child spent the most time while awake (commonly the family room) by a high-volume small surface sampler (HVS3) and/or from the household vacuum cleaner. Colt et al. [26] found the household vacuum to be a valid alternative to the HVS3 for detecting, ranking, and quantifying concentrations of pesticides and other compounds. After extraction, concentrations of 64 organic chemicals were measured using gas chromatography/mass spectrometry [26]. Nine metals were measured using inductively coupled plasma/mass spectrometry (ICP/MS) combined with microwave-assisted acid digestion. After excluding participants due to missing covariate information, 296 controls and 268 cases were included in this analysis (n = 564). We used exposures for 49 chemicals (Supplementary Table S2 in the Supplementary Materials) where at least one-fifth of the measurements were above the detectable limit. The chemical concentrations that were below this limit were imputed between 0 and the limit of detection using univariate imputation with the assumption of a lognormal distribution.
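One way to impute a value between 0 and the limit of detection under a lognormal assumption is inverse-CDF sampling from the truncated distribution, sketched below in Python. This is only an illustration of the general idea, not the imputation procedure used in the study: the log-scale mean `mu` and standard deviation `sigma` are treated as known, hypothetical inputs, and the text does not describe how they were estimated.

```python
import math
import random
from statistics import NormalDist

def impute_below_lod(lod, mu, sigma, rng):
    """Draw one value in (0, lod] from a lognormal(mu, sigma) truncated at lod."""
    std_normal = NormalDist()
    # Probability that an untruncated lognormal value falls below the LOD.
    upper = std_normal.cdf((math.log(lod) - mu) / sigma)
    # Uniform draw on (0, upper); the floor keeps inv_cdf inside its domain.
    u = max(rng.uniform(0.0, upper), 1e-12)
    return math.exp(mu + sigma * std_normal.inv_cdf(u))

rng = random.Random(7)
lod = 0.5  # hypothetical limit of detection
imputed = [impute_below_lod(lod, mu=0.0, sigma=1.0, rng=rng)
           for _ in range(200)]
```

Every imputed value is positive and does not exceed the limit of detection, and the draws concentrate toward the limit or toward zero depending on where the limit sits relative to the assumed lognormal distribution.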
The concentrations for some of the pairs of chemicals measured in dust were observed to be strongly correlated. The chemicals that had the strongest correlation with each other were found to be in the same class of chemical. As an example, several of the PAHs were highly correlated (e.g., r = 0.90 for chrysene and benzo[a]anthracene). In addition, congeners or chemicals within the following classes of chemicals were highly correlated: PCBs, organochlorine insecticides, and pyrethroid insecticides. Such strong correlations between chemicals in these classes make modeling simultaneous chemical exposure effects untenable via traditional regression methods. In this case, the use of mixture analysis methods such as the Bayesian group index model is warranted.
To analyze the association between chemical exposure and childhood leukemia, we put the 49 chemicals into the following groups: insecticides, PAHs, PCBs, metals, herbicides, and the tobacco exposure markers of cotinine and nicotine. These groups were based on their use (e.g., insecticides, herbicides) or structural similarity (PCBs, PAHs). The fungicide ortho-phenylphenol was placed in the herbicide group. We then estimated the risk of childhood leukemia associated with each of the six exposure groups simultaneously using the Bayesian group index model while adjusting for the following covariates: child’s sex, age, ethnicity, annual household income, mother’s age at birth of child, mother’s education level, and whether the child had lived in the dust sampling residence since the time of birth. In addition, we performed a stratified analysis using the binary variable for whether the child had lived in the dust sample home since birth as the stratifying variable instead of an adjustment variable. This was done to determine if the exposure effects were greater for those who had lived in the same house (where the dust samples had been taken) since birth. There were n = 279 children who lived in the same house since birth and n = 285 who did not. In fitting the models, we used quartiles of exposures and 15,000 Markov chain Monte Carlo (MCMC) iterations with two chains and 5000 iterations as a burn-in sample. Convergence of all parameters in the model was verified via the Gelman-Rubin diagnostic statistic (i.e., the upper confidence limit was less than 1.10). We summarized the results through posterior mean estimates of the ORs and 95% credible intervals for each exposure group and also with forest plots. We identified the important chemical exposures in each chemical group using the posterior mean weight estimates and visualized them via weight plots. Model fitting was done using our R package BayesGWQS [24].
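The idea behind the Gelman-Rubin check can be sketched in Python with the basic potential scale reduction factor, which compares between-chain and within-chain variance. The sketch computes only the point estimate in its classic form, not the upper confidence limit referred to in the text, and it uses independent normal draws as a stand-in for MCMC output; chain lengths and values are illustrative assumptions.

```python
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(chain) for chain in chains]
    within = mean(variance(chain) for chain in chains)  # W
    between = n * variance(chain_means)                 # B
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

rng = random.Random(42)
# Two chains sampling the same distribution (stand-in for converged chains).
mixed = [[rng.gauss(0.5, 0.2) for _ in range(10000)] for _ in range(2)]
# Two chains stuck around different values (stand-in for non-convergence).
stuck = [[rng.gauss(0.0, 0.2) for _ in range(10000)],
         [rng.gauss(1.0, 0.2) for _ in range(10000)]]
rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Values close to 1 indicate that the chains agree, whereas the second pair of chains yields a value well above the 1.10 cutoff.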
Study protocols involving research with human participants were approved by the institutional review boards at the University of California, Berkeley, the National Cancer Institute, and Virginia Commonwealth University.

3. Results

3.1. Simulation Study

We present results for scenario set D in the main paper, while results for scenario sets A–C are in the Supplemental Materials. The estimated odds ratios and power for the Bayesian group index model and GWQS regression in scenario set D are listed in Table 2. Both the Bayesian group index and GWQS models accurately estimated the odds ratio across different strengths of association and chemical correlations. In scenario sets A–C, GWQS was slightly closer to the true odds ratio (Supplementary Tables S3–S5), particularly when there was weak or moderate chemical correlation and strong association (true OR = 3.0). In these scenarios, the Bayesian model tended to estimate higher odds ratios for the positive association chemical groups than did GWQS. Type I error rates were similar, but slightly higher for GWQS. The most notable difference between models was in power, where the Bayesian group index model was consistently more powerful than the GWQS model in scenario set D. For example, with moderate correlation structure and OR of 1.50 or 0.67, the power for each of the chemical group coefficients is around 0.91 to 0.95 for the Bayesian group index model while it is between 0.62 and 0.69 for the GWQS regression model. However, when the sample size and effect sizes were larger in scenario sets A–C, differences were smaller and both methods reached a power of (or near) 1.00 with larger true strengths of association (Supplementary Tables S3–S5). As expected, true stronger associations led to higher power in both models. These findings were consistent with the results for scenario sets A and B in Supplementary Tables S3 and S4, respectively. The models in scenario sets C and D had greater power than those in scenario set B due to the increased signal from having multiple important chemicals rather than a single one.
Table 3 compares the bias and MSE of both models for scenario set D. Supplementary Tables S6–S8 in the Supplemental Materials contain the bias and MSE for scenario sets A–C. Most of the associations with true OR > 1 had positive bias in the Bayesian group index model in scenario sets A–C, but this did not occur as often in scenario set D. While the bias appears to be slightly larger in the Bayesian group index model, the MSE is smaller. The slightly larger bias found in the Bayesian group index model reflects the estimated odds ratio findings. As expected, in both models a larger true odds ratio generally led to larger MSE values across scenarios.
Table 4 compares the sensitivity and specificity found in GWQS and the Bayesian group index models for scenario set D, while Supplementary Table S9 displays these for the remaining scenario sets (A–C). We see similar patterns in both models when the correlation structure gets stronger. Outside of the null effect, as the correlation structure gets stronger both the sensitivity and specificity of the Bayesian group index model and GWQS decrease. This is because stronger correlation in the predictors makes identification of the important chemicals more difficult. As expected, a larger odds ratio led to larger values of sensitivity and specificity in both models. Outside of the null effect and small effect sizes, sensitivity and specificity were better with the Bayesian group index model overall. When there were multiple important chemicals in each group (scenario sets C and D) instead of one important chemical per group (scenario set B), the specificity increased. When going from two groups (scenario set A) to three groups (scenario set B) in the mixture, the specificity decreased. This suggests that the model performance decreases with an increasing number of groups in the mixture, but that it increases as the number of important members in each group increases.

3.2. Application to Childhood Leukemia Risk Estimates

A summary of the demographics for children with and without leukemia from the CCLS study is presented in Table 5. Child’s age, sex, and mother’s age were equally distributed between cases and controls, while more control children were in the highest household income bracket (53.7%) compared to cases (39.9%). In addition, mothers of cases had slightly lower rates of post-secondary education (39.2%) compared to controls (45.6%). A larger proportion of controls resided in the same residence since birth (53.7%), compared to cases (44.8%).
The odds ratios for childhood leukemia associated with each of the six exposure groups calculated from the Bayesian group index model for the CCLS are in Table 6. Insecticides had a significant negative effect indicated by an odds ratio of 0.64 (95% CI: 0.40, 0.99). PCBs, PAHs, and herbicides had positive effects that were not significant according to the 95% credible intervals. Metals and tobacco markers had inverse but not statistically significant effects. The pattern in effects is clearly visible in the forest plot of the estimates in Figure 1. The variability was greatest for herbicides according to the credible intervals. The estimated weights of the chemical components for each group are plotted in Figure 2. Among insecticides, carbaryl was overwhelmingly the most important chemical with a posterior mean weight of 0.144. The highest category of household income (USD 75,000 or more) was associated (OR = 0.36) with significantly reduced leukemia risk, while living in the sampling household since birth (OR = 0.69) was associated with lowered likelihood of childhood leukemia. Age (child and mother’s), sex, ethnicity, and mother’s education were not significantly associated with childhood leukemia incidence.
Results of the stratified analysis with only children with a different residence since birth show that no chemical groups were found to have a significant association with childhood leukemia, although household income was still inversely associated with childhood leukemia risk (Supplementary Table S10). However, for children that had the same residence since birth (Table 7), herbicides had a significant positive association with childhood leukemia (OR = 2.22, 95% CI: 1.45, 3.61). In addition, insecticides were found to have a stronger (yet more variable) negative association with childhood leukemia (OR = 0.50, 95% CI: 0.23, 0.99) than with the non-stratified analysis. Forest plots visualize the chemical group associations for childhood leukemia for children who changed residence since birth in Figure S1 and for children with the same residence since birth in Figure 3. The estimated weights of the chemical components for the stratified analyses are plotted in Figure S2 (changed residence since birth) and Figure 4 (same residence since birth). Among the harmful herbicides in the latter stratum, dacthal was the most important with a posterior mean weight of 0.646. For insecticides, weights were evenly distributed across chemicals, as most chemicals had posterior mean weights between 0.035 and 0.060 and cis-Permethrin had the largest posterior mean weight of 0.065.

4. Discussion

In this paper, we proposed the Bayesian group index model for chemical mixture analysis for the realistic situation of multiple groups of exposures, each with a potentially different magnitude and direction of association with the health outcome. We conducted a simulation study to evaluate the relative performance of the Bayesian group index model and the frequentist approach of GWQS regression and found that the two methods performed similarly for larger studies (n = 1000), but that the Bayesian group index model performed better for smaller studies (n = 500) with smaller strengths of association (OR < 2.0). The Bayesian group index model had more power to find significant exposure effects in smaller studies (with power differences of 0.2 or more). In addition, the Bayesian group index model was more sensitive and more specific than GWQS, particularly for studies with small sample sizes. While the Bayesian approach was more powerful, it also had larger positive bias in effect estimates in the larger studies. Based on the sum of the findings, we recommend use of the Bayesian group index model over the GWQS model, particularly for small studies.
For the implementation of the Bayesian group index model in this paper, we used our BayesGWQS R package [24] on The Comprehensive R Archive Network (CRAN). In addition to the BayesGWQS package, our implementation of the GWQS model is also available as an R package entitled groupWQS [18] along with a vignette on CRAN. From the user’s perspective the packages are similar, utilizing the same workflow and providing tools to organize and then analyze data as well as visualize results. However, the estimation is very different in the two packages. The groupWQS package first splits the data into training and validation sets, next estimates the index weights of the GWQS model with bootstrap samples of the training set, and then estimates the other model parameters using the validation set. Parameter estimation is done through nonlinear optimization available in the solnp function of the Rsolnp R package. BayesGWQS estimates model parameters by implementing MCMC available in Just Another Gibbs Sampler (JAGS) using all the data. The two packages each offer distinct advantages to researchers depending on the context of their work. groupWQS tends to have a faster runtime, but uses a two-step estimation process. BayesGWQS has a longer runtime, but allows researchers working with smaller sample sizes to maximize power by avoiding data splitting. Currently, both packages require the user to specify the groups of exposures, which could be done based on chemical family, empirical correlations, or another approach to group similar exposures.
The Bayesian framework for the group index model also allows for more straightforward extension to more complex models that include individual and spatial random effects [20,21,22]. We have previously used both exchangeable and spatially correlated random effects in Bayesian single index models. In addition, imputation of chemical concentrations below the limit of detection can also be accounted for within the Bayesian index model approach. We are currently working on approaches for imputing missing chemical concentrations within the Bayesian group index model.
When applying the Bayesian group index model to observational data from the CCLS, we found a negative and significant association between insecticides and leukemia (OR = 0.64), with carbaryl (weight = 0.14) being the most important chemical. This finding is consistent with our previous analysis using the frequentist GWQS approach, where we found an OR = 0.43 for insecticides and weight = 0.21 for carbaryl [19]. It is also similar to results from individual insecticide logistic regression models; however, in both the individual insecticide model and the model adjusted for multiple insecticides, the inverse association with carbaryl was not statistically significant [27]. Similarly, we found positive but non-significant associations for PCBs (OR = 1.15) and PAHs (OR = 1.16), compared with OR = 1.29 for PCBs and OR = 1.31 for PAHs in our previous analysis. In the current paper, we also conducted a stratified analysis based on the duration of the child's residence in the home from which the dust sample was collected. We found stronger exposure effects for the children who had lived their entire lives in the home where dust samples were taken. In this set of residentially stable children, there was a strong and significant effect for herbicides (OR = 2.22), with dacthal being the most important exposure in this chemical group (weight = 0.65). This adds to our previous findings about the contribution of dacthal exposure (weight = 0.31 in the full study population) among the herbicides to increased risk (OR = 1.79) of childhood leukemia [19]. We did not previously evaluate risk in this subgroup because of the decreased power of GWQS for stratified analyses. In addition to our previous mixture analysis finding, a previous analysis of herbicide exposures in the CCLS found a significantly elevated risk of acute lymphocytic leukemia (ALL) associated with the presence of dacthal in house dust (detected vs. not detected OR = 1.52, 95% CI: 1.03, 2.23) [28]. Logistic regression analyses using individual chemicals yielded a positive but non-significant association between dacthal concentration quantiles and ALL risk [28]. While our results suggest some significant associations between environmental chemicals and childhood leukemia, more studies are needed to determine whether these findings generalize to other geographic areas. In addition, while we adjusted for several potential risk factors and confounders, residual confounding cannot be ruled out in our analysis.
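Statements about significance in the Bayesian model rest on 95% credible intervals for the odds ratios. As a rough sketch of how such an interval can be read off MCMC output, the Python code below exponentiates posterior draws of a log odds ratio and reports a point estimate with the 2.5% and 97.5% percentiles. The use of the posterior median as the point estimate, the function names, and the simulated draws in the demonstration are assumptions for illustration, not the authors' code or data.

```python
import math
import random
import statistics


def percentile(sorted_vals, p):
    """Linear-interpolation percentile of pre-sorted values, with p in [0, 1]."""
    pos = p * (len(sorted_vals) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def summarize_or(log_or_draws):
    """Summarize posterior draws of a log odds ratio on the odds ratio scale."""
    ors = sorted(math.exp(b) for b in log_or_draws)
    lower = percentile(ors, 0.025)
    upper = percentile(ors, 0.975)
    return {
        "or": statistics.median(ors),  # point estimate (assumed: posterior median)
        "lower": lower,                # 2.5% bound of the credible interval
        "upper": upper,                # 97.5% bound of the credible interval
        # "Significant" here means the interval excludes the null value of 1.0.
        "excludes_null": lower > 1.0 or upper < 1.0,
    }


if __name__ == "__main__":
    # Simulated draws standing in for MCMC output; the values are arbitrary.
    random.seed(1)
    draws = [random.gauss(-0.4, 0.2) for _ in range(4000)]
    print(summarize_or(draws))
```

An interval that lies entirely below 1.0 corresponds to a significant inverse association, and one entirely above 1.0 to a significant positive association.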

5. Conclusions

In conclusion, the Bayesian group index model has the potential to make a substantial contribution to environmental epidemiology, particularly for chemical mixture analysis. The method accommodates multiple groups of environmental chemical exposures, each with a potentially different magnitude and direction of association with the health outcome, and so allows for a richer assessment of environmental exposures. Our simulation study shows that it compares favorably with GWQS regression, the existing frequentist method, and it is easily extended to more complex models. While we applied the method in an environmental chemical risk analysis of childhood leukemia considering different classes of chemicals, it should be applicable to many other diseases with suspected environmental causes. We hope that this method will enable investigators to uncover multiple environmental determinants of disease in future studies.

Supplementary Materials

The following are available online at https://0-www-mdpi-com.brum.beds.ac.uk/article/10.3390/ijerph18073486/s1, Table S1: Individual scenarios in the scenario sets of the simulation study, Table S2: List of chemicals and their group used in the CCLS analysis, Table S3: Estimated odds ratio and power from the Bayesian group index model and group weighted quantile sum for Scenario A, Table S4: Estimated odds ratio and power from the Bayesian group index model and group weighted quantile sum for Scenario B, Table S5: Estimated odds ratio and power from the Bayesian group index model and group weighted quantile sum for Scenario C, Table S6: MSE and bias for the Bayesian group index model and group weighted quantile sum for Scenario A, Table S7: MSE and bias for the Bayesian group index model and group weighted quantile sum for Scenario B, Table S8: MSE and bias for effect estimates from the Bayesian group index model and group weighted quantile sum regression for Scenario C, Table S9: Sensitivity and specificity for the Bayesian group index model and group weighted quantile sum regression for simulation scenarios A-C, Table S10: Odds ratio estimates for chemical groups and demographic covariates from the Bayesian group index model for subjects with different residence since birth, Figure S1: Forest plot of chemical group effects for childhood leukemia in children who changed residence since birth, Figure S2: Estimated chemical weights for chemical groups from the Bayesian group index model for childhood leukemia in the CCLS in children who changed residence since birth.

Author Contributions

Conceptualization, D.C.W.; methodology, D.C.W.; software, D.C.W., S.R. and M.C.; validation, D.C.W. and S.R.; formal analysis, D.C.W. and S.R.; investigation, D.C.W. and S.R.; resources, D.C.W., C.M. and M.H.W.; data curation, C.M., T.P.W. and M.H.W.; writing—original draft preparation, S.R. and D.C.W.; writing—review and editing, S.R., M.C., D.C.W., C.M., T.P.W. and M.H.W.; visualization, S.R.; supervision, D.C.W.; project administration, D.C.W.; funding acquisition, D.C.W., C.M., and M.H.W. All authors have read and agreed to the published version of the manuscript.

Funding

Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number R21CA238370. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. The CCLS dust sample study was financially supported by subcontracts 7590-S-04 (University of California, Berkeley) and 7590-S-01 (Battelle Memorial Institute) under National Cancer Institute (NCI) contract N02-CP-11015 (Westat); and National Institute of Environmental Health Sciences grants R01ES009137 and P42ES04705-18 (University of California, Berkeley). This research was also supported by the Intramural Research Program of the National Institutes of Health and the NCI.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board of Virginia Commonwealth University (HM20002035, 31 January 2020).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The CCLS data presented in this study are available on request from the senior author. The data are not publicly available due to privacy restrictions.

Acknowledgments

We thank the CCLS families for their participation. We also thank the clinical investigators at the following collaborating hospitals for help in recruiting patients: University of California Davis Medical Center (Jonathan Ducore), University of California San Francisco (Mignon Loh and Katherine Matthay), Children’s Hospital of Central California (Vonda Crouse), Lucile Packard Children’s Hospital (Gary Dahl), Children’s Hospital Oakland (James Feusner), Kaiser Permanente Oakland (Daniel Kronish and Stacy Month), Kaiser Permanente Roseville (Kent Jolly and Vincent Kiley), Kaiser Permanente Santa Clara (Carolyn Russo, Denah Taggart, and Alan Wong), and Kaiser Permanente San Francisco (Kenneth Leung). Finally, we acknowledge the entire California Childhood Leukemia Study staff for their effort and dedication.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Reuben, S.H. For President’s Cancer Panel. 2008–2009 Annual Report in President’s Cancer Panel. Reducing Environmental Cancer Risk: What We Can Do Now. Available online: https://deainfo.nci.nih.gov/advisory/pcp/annualReports/pcp08-09rpt/PCP_Report_08-09_508.pdf (accessed on 8 January 2021).
  2. Colt, J.S.; Severson, R.K.; Lubin, J.; Rothman, N.; Camann, D.; Davis, S.; Cerhan, J.R.; Cozen, W.; Hartge, P. Organochlorines in Carpet Dust and Non-Hodgkin Lymphoma. Epidemiology 2005, 16, 516–525.
  3. Colt, J.S.; Davis, S.; Severson, R.K.; Lynch, C.F.; Cozen, W.; Camann, D.; Engels, E.A.; Blair, A.; Hartge, P. Residential Insecticide Use and Risk of Non-Hodgkin’s Lymphoma. Cancer Epidemiol. Biomark. Prev. 2006, 15, 251–257.
  4. De Roos, A.J.; Hartge, P.; Lubin, J.H.; Colt, J.S.; Davis, S.; Cerhan, J.R.; Severson, R.K.; Cozen, W.; Patterson, D.G.; Needham, L.L.; et al. Persistent Organochlorine Chemicals in Plasma and Risk of Non-Hodgkin’s Lymphoma. Cancer Res. 2005, 65, 11214–11226.
  5. Brown, L.M.; Blair, A.; Gibson, R.; Everett, G.D.; Cantor, K.P.; Schuman, L.M.; Burmeister, L.F.; Van Lier, S.F.; Dick, F. Pesticide exposures and other agricultural risk factors for leukemia among men in Iowa and Minnesota. Cancer Res. 1990, 50, 6585–6591.
  6. Ward, M.H.; Colt, J.S.; Metayer, C.; Gunier, R.B.; Lubin, J.; Crouse, V.; Nishioka, M.G.; Reynolds, P.; Buffler, P.A. Residential Exposure to Polychlorinated Biphenyls and Organochlorine Pesticides and Risk of Childhood Leukemia. Environ. Health Perspect. 2009, 117, 1007–1013.
  7. Zahm, S.; Ward, M. Pesticides and childhood cancer. Environ. Health Perspect. 1998, 106, 893–908.
  8. Purdue, M.P.; Hoppin, J.A.; Blair, A.; Dosemeci, M.; Alavanja, M.C. Occupational exposure to organochlorine insecticides and cancer incidence in the Agricultural Health Study. Int. J. Cancer 2006, 120, 642–649.
  9. Everett, C.J.; Mainous, A., III; Frithsen, I.L.; Player, M.S.; Matheson, E.M. Association of polychlorinated biphenyls with hypertension in the 1999–2002 National Health and Nutrition Examination Survey. Environ. Res. 2008, 108, 94–97.
  10. Patel, C.J.; Bhattacharya, J.; Butte, A.J. An Environment-Wide Association Study (EWAS) on Type 2 Diabetes Mellitus. PLoS ONE 2010, 5, e10746.
  11. Patel, C.J.; Cullen, M.R.; Ioannidis, J.P.A.; Butte, A.J. Systematic evaluation of environmental factors: Persistent pollutants and nutrients correlated with serum lipid levels. Int. J. Epidemiol. 2012, 41, 828–843.
  12. Park, S.K.; Tao, Y.; Meeker, J.D.; Harlow, S.D.; Mukherjee, B. Environmental Risk Score as a New Tool to Examine Multi-Pollutants in Epidemiologic Research: An Example from the NHANES Study Using Serum Lipid Levels. PLoS ONE 2014, 9, e98632.
  13. Bobb, J.F.; Valeri, L.; Henn, B.C.; Christiani, D.C.; Wright, R.O.; Mazumdar, M.; Godleski, J.J.; Coull, B.A. Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics 2015, 16, 493–508.
  14. Carrico, C.; Gennings, C.; Wheeler, D.C.; Factor-Litvak, P. Characterization of Weighted Quantile Sum Regression for Highly Correlated Data in a Risk Analysis Setting. J. Agric. Biol. Environ. Stat. 2015, 20, 100–120.
  15. Czarnota, J.; Gennings, C.; Colt, J.S.; De Roos, A.J.; Cerhan, J.R.; Severson, R.K.; Hartge, P.; Ward, M.H.; Wheeler, D.C. Analysis of Environmental Chemical Mixtures and Non-Hodgkin Lymphoma Risk in the NCI-SEER NHL Study. Environ. Health Perspect. 2015, 123, 965–970.
  16. Czarnota, J.; Gennings, C.; Wheeler, D.C. Assessment of Weighted Quantile Sum Regression for Modeling Chemical Mixtures and Cancer Risk. Cancer Inform. 2015, 14, 159–171.
  17. Wheeler, D.C.; Czarnota, J. Modeling Chemical Mixture Effects with Grouped Weighted Quantile Sum Regression. In Proceedings of the 28th Annual Conference of the International Society for Environmental Epidemiology (ISEE), Rome, Italy, 1–4 September 2016.
  18. Wheeler, D.C.; Carli, M. groupWQS: Grouped Weighted Quantile Sum Regression. R Package Version 0.0.3. Available online: https://cran.r-project.org/web/packages/groupWQS/index.html (accessed on 8 January 2021).
  19. Wheeler, D.C.; Rustom, S.; Carli, M.; Whitehead, T.; Ward, M.H.; Metayer, C. Assessment of grouped weighted quantile sum regression for modeling chemical mixtures and cancer risk. Int. J. Environ. Res. Public Health 2021, 18, 504.
  20. Wheeler, D.C.; Raman, S.; Jones, R.M.; Schootman, M.; Nelson, E.J. Bayesian deprivation index models for explaining variation in elevated blood lead levels among children in Maryland. Spat. Spatio-Temporal Epidemiol. 2019, 30, 100286.
  21. Wheeler, D.C.; Boyle, J.; Raman, S.; Nelson, E.J. Modeling elevated blood lead level risk across the United States. Sci. Total Environ. 2021, 769, 145237.
  22. Wheeler, D.C.; Do, E.; Hayes, R.; Fugate-Laus, K.; Fallavollita, W.; Hughes, C.; Fuemmeler, B. Neighborhood disadvantage and tobacco retail outlet and vape shop outlet rates. Int. J. Environ. Res. Public Health 2020, 17, 2864.
  23. de Vocht, F.; Cherry, N.; Wakefield, J. A Bayesian mixture modeling approach for assessing the effects of correlated exposures in case-control studies. J. Expo. Sci. Environ. Epidemiol. 2012, 22, 352–360.
  24. Wheeler, D.C.; Carli, M. BayesGWQS: Bayesian Grouped Weighted Quantile Sum Regression. 2020. R Package Version 0.0.2. Available online: https://cran.r-project.org/web/packages/BayesGWQS/index.html (accessed on 11 March 2021).
  25. Whitehead, T.P.; Metayer, C.; Ward, M.H.; Colt, J.S.; Gunier, R.B.; Deziel, N.C.; Rappaport, S.M.; Buffler, P.A. Persistent organic pollutants in dust from older homes: Learning from lead. Am. J. Public Health 2014, 104, 1320–1326.
  26. Colt, J.S.; Gunier, R.B.; Metayer, C.; Nishioka, M.G.; Bell, E.M.; Reynolds, P.; Buffler, P.A.; Ward, M.H. Household vacuum cleaners vs. the high-volume surface sampler for collection of carpet dust samples in epidemiologic studies of children. Environ. Health 2008, 7, 1–9.
  27. Madrigal, J.M.; Jones, R.R.; Gunier, R.B.; Whitehead, T.P.; Reynolds, P.; Metayer, C.; Ward, M.H. Residential exposure to carbamate, organophosphate, and pyrethroid insecticides in house dust and risk of childhood acute lymphoblastic leukemia. Environ. Res. 2021. submitted.
  28. Metayer, C.; Colt, J.S.; Buffler, P.A.; Reed, H.D.; Selvin, S.; Crouse, V.; Ward, M.H. Exposure to herbicides in house dust and risk of childhood acute lymphoblastic leukemia. J. Expo. Sci. Environ. Epidemiol. 2013, 23, 363–370.
Figure 1. Forest plot of odds ratios and 95% credible intervals for chemical groups for childhood leukemia from the Bayesian group index model with a line at the null value of 1.0.
Figure 2. Weights for chemicals in each of the chemical groups from the Bayesian group index model for childhood leukemia in the CCLS.
Figure 3. Forest plot of odds ratios and 95% credible intervals for chemical groups for childhood leukemia in children who lived at the same residence since birth with a line at the null value of 1.0.
Figure 4. Estimated chemical weights for chemical groups from the Bayesian group index model for childhood leukemia in the CCLS in children with same residence since birth.
Table 1. Definition of the terms used in the simulation study exposure scenarios.
| Terms | Levels | Definitions |
|---|---|---|
| Exposure Scenario Set | A | 9 chemicals; 2 groups (5, 4); 2 important in each group; N = 1000 |
| | B | 14 chemicals; 3 groups (5, 4, 5); 1 important in each group; N = 1000 |
| | C | 14 chemicals; 3 groups (5, 4, 5); (3, 2, 3) important in each group; N = 1000 |
| | D | 14 chemicals; 3 groups (5, 4, 5); (3, 2, 3) important per group; N = 500 |
| Strength of Association | Level 1 | OR = 1.00 for all groups (Null effect scenario) |
| | Level 2 | OR = (0.67, 1.50) for A; OR = (0.67, 1.50, 1.50) for B and C; OR = (0.75, 1.25, 1.25) for D |
| | Level 3 | OR = (0.67, 1.50) for A; OR = (0.50, 2.00, 2.00) for B and C; OR = (0.67, 1.50, 1.50) for D |
| | Level 4 | OR = (0.40, 2.50) for A; OR = (0.40, 2.50, 2.50) for B and C; OR = (0.57, 1.75, 1.75) for D |
| | Level 5 | OR = (0.67, 1.50) for A; OR = (0.33, 3.00, 3.00) for B and C; OR = (0.50, 2.00, 2.00) for D |
| Chemical Correlation Structure | Weak | 0.5 within group, 0.1 across group |
| | Moderate | 0.7 within group, 0.3 across group |
| | Strong | 0.9 within group, 0.5 across group |
Table 2. Estimated odds ratio (OR) and power values for the Bayesian group index model and group weighted quantile sum (GWQS) regression for Scenario D.
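| Correlation | Parameter | Bayesian Group Index: Estimated OR | Bayesian Group Index: Power | GWQS: Estimated OR | GWQS: Power |
|---|---|---|---|---|---|
| Weak | exp(β1) = 1.00 | 1.004 | 0.06 | 1.0306 | 0.1 |
| Weak | exp(β2) = 1.00 | 1.0027 | 0.07 | 1.0198 | 0.06 |
| Weak | exp(β3) = 1.00 | 0.9999 | 0.04 | 1.0313 | 0.06 |
| Weak | exp(β1) = 0.80 | 0.8364 | 0.27 | 0.868 | 0.13 |
| Weak | exp(β2) = 1.25 | 1.2438 | 0.4 | 1.2535 | 0.26 |
| Weak | exp(β3) = 1.25 | 1.2506 | 0.38 | 1.2304 | 0.21 |
| Weak | exp(β1) = 0.67 | 0.6971 | 0.77 | 0.7123 | 0.53 |
| Weak | exp(β2) = 1.50 | 1.5288 | 0.88 | 1.5117 | 0.74 |
| Weak | exp(β3) = 1.50 | 1.4756 | 0.81 | 1.4867 | 0.57 |
| Weak | exp(β1) = 0.57 | 0.5842 | 0.98 | 0.6181 | 0.8 |
| Weak | exp(β2) = 1.75 | 1.8097 | 1 | 1.7561 | 0.91 |
| Weak | exp(β3) = 1.75 | 1.6757 | 0.98 | 1.6248 | 0.78 |
| Weak | exp(β1) = 0.50 | 0.5096 | 1 | 0.5383 | 0.93 |
| Weak | exp(β2) = 2.00 | 2.0995 | 1 | 2.0609 | 0.98 |
| Weak | exp(β3) = 2.00 | 1.9448 | 1 | 1.9232 | 0.95 |
| Moderate | exp(β1) = 1.00 | 1.001 | 0.03 | 1.014 | 0.04 |
| Moderate | exp(β2) = 1.00 | 1.0059 | 0.06 | 1.0206 | 0.05 |
| Moderate | exp(β3) = 1.00 | 1.0024 | 0.1 | 1.0079 | 0.07 |
| Moderate | exp(β1) = 0.80 | 0.8236 | 0.44 | 0.8209 | 0.29 |
| Moderate | exp(β2) = 1.25 | 1.2556 | 0.53 | 1.2713 | 0.35 |
| Moderate | exp(β3) = 1.25 | 1.2339 | 0.42 | 1.2313 | 0.25 |
| Moderate | exp(β1) = 0.67 | 0.6873 | 0.91 | 0.7258 | 0.62 |
| Moderate | exp(β2) = 1.50 | 1.5014 | 0.94 | 1.4503 | 0.67 |
| Moderate | exp(β3) = 1.50 | 1.4834 | 0.95 | 1.4836 | 0.69 |
| Moderate | exp(β1) = 0.57 | 0.5748 | 1 | 0.6094 | 0.88 |
| Moderate | exp(β2) = 1.75 | 1.7783 | 1 | 1.7019 | 0.93 |
| Moderate | exp(β3) = 1.75 | 1.7679 | 1 | 1.7434 | 0.94 |
| Moderate | exp(β1) = 0.50 | 0.5056 | 1 | 0.5369 | 0.97 |
| Moderate | exp(β2) = 2.00 | 2.0673 | 1 | 2.0541 | 1 |
| Moderate | exp(β3) = 2.00 | 2.0072 | 1 | 2.0196 | 1 |
| Strong | exp(β1) = 1.00 | 1.0137 | 0.03 | 1.0198 | 0.06 |
| Strong | exp(β2) = 1.00 | 0.9879 | 0.08 | 1.0023 | 0.07 |
| Strong | exp(β3) = 1.00 | 1.0023 | 0.03 | 0.9988 | 0.06 |
| Strong | exp(β1) = 0.80 | 0.8175 | 0.48 | 0.8252 | 0.26 |
| Strong | exp(β2) = 1.25 | 1.249 | 0.52 | 1.2629 | 0.35 |
| Strong | exp(β3) = 1.25 | 1.2494 | 0.56 | 1.2621 | 0.3 |
| Strong | exp(β1) = 0.67 | 0.6705 | 0.94 | 0.6847 | 0.7 |
| Strong | exp(β2) = 1.50 | 1.4993 | 0.96 | 1.4783 | 0.71 |
| Strong | exp(β3) = 1.50 | 1.5217 | 0.99 | 1.5407 | 0.82 |
| Strong | exp(β1) = 0.57 | 0.5797 | 0.99 | 0.6078 | 0.88 |
| Strong | exp(β2) = 1.75 | 1.7978 | 1 | 1.7951 | 1 |
| Strong | exp(β3) = 1.75 | 1.7145 | 1 | 1.6813 | 0.93 |
| Strong | exp(β1) = 0.50 | 0.4987 | 1 | 0.5248 | 0.95 |
| Strong | exp(β2) = 2.00 | 2.0279 | 1 | 1.9909 | 0.98 |
| Strong | exp(β3) = 2.00 | 2.0217 | 1 | 2.0027 | 1 |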
Table 3. MSE and bias for effect estimates from the Bayesian group index model and group weighted quantile sum (GWQS) regression for Scenario D.
| Correlation | Parameter | Bayesian Group Index: MSE | Bayesian Group Index: Bias | GWQS: MSE | GWQS: Bias |
|---|---|---|---|---|---|
| Weak | exp(β1) = 1.00 | 0.0148 | −0.0034 | 0.0411 | 0.0088 |
| Weak | exp(β2) = 1.00 | 0.0146 | −0.0045 | 0.0312 | 0.0041 |
| Weak | exp(β3) = 1.00 | 0.0148 | −0.0076 | 0.0302 | 0.0155 |
| Weak | exp(β1) = 0.80 | 0.0146 | 0.0379 | 0.0354 | 0.0663 |
| Weak | exp(β2) = 1.25 | 0.0148 | −0.0123 | 0.031 | −0.0127 |
| Weak | exp(β3) = 1.25 | 0.0166 | −0.0080 | 0.0349 | −0.0327 |
| Weak | exp(β1) = 0.67 | 0.0231 | 0.0339 | 0.0348 | 0.0503 |
| Weak | exp(β2) = 1.50 | 0.0145 | 0.0119 | 0.0256 | −0.0049 |
| Weak | exp(β3) = 1.50 | 0.019 | −0.0258 | 0.0328 | −0.0257 |
| Weak | exp(β1) = 0.57 | 0.0209 | 0.0119 | 0.0421 | 0.06 |
| Weak | exp(β2) = 1.75 | 0.0142 | 0.0267 | 0.029 | −0.0113 |
| Weak | exp(β3) = 1.75 | 0.0194 | −0.0518 | 0.0459 | −0.0828 |
| Weak | exp(β1) = 0.50 | 0.0203 | 0.009 | 0.041 | 0.0546 |
| Weak | exp(β2) = 2.00 | 0.0168 | 0.041 | 0.028 | 0.0162 |
| Weak | exp(β3) = 2.00 | 0.0213 | −0.03824 | 0.0429 | −0.0598 |
| Moderate | exp(β1) = 1.00 | 0.0086 | −0.0033 | 0.0238 | 0.0015 |
| Moderate | exp(β2) = 1.00 | 0.0118 | 0 | 0.0219 | 0.0097 |
| Moderate | exp(β3) = 1.00 | 0.0111 | −0.0031 | 0.0254 | −0.0049 |
| Moderate | exp(β1) = 0.80 | 0.0122 | 0.0234 | 0.0217 | 0.0151 |
| Moderate | exp(β2) = 1.25 | 0.012 | −0.0016 | 0.0195 | 0.0073 |
| Moderate | exp(β3) = 1.25 | 0.0107 | −0.0183 | 0.0209 | −0.0252 |
| Moderate | exp(β1) = 0.67 | 0.0134 | 0.0241 | 0.0332 | 0.0706 |
| Moderate | exp(β2) = 1.50 | 0.0144 | −0.0063 | 0.0244 | −0.0449 |
| Moderate | exp(β3) = 1.50 | 0.0103 | −0.0161 | 0.0207 | −0.0211 |
| Moderate | exp(β1) = 0.57 | 0.0161 | −0.0020 | 0.0301 | 0.0501 |
| Moderate | exp(β2) = 1.75 | 0.0132 | 0.0095 | 0.0228 | −0.0384 |
| Moderate | exp(β3) = 1.75 | 0.0151 | 0.0026 | 0.0244 | −0.0158 |
| Moderate | exp(β1) = 0.50 | 0.0192 | 0.0016 | 0.0402 | 0.0525 |
| Moderate | exp(β2) = 2.00 | 0.0166 | 0.025 | 0.0288 | 0.0122 |
| Moderate | exp(β3) = 2.00 | 0.0174 | −0.0052 | 0.0372 | −0.0091 |
| Strong | exp(β1) = 1.00 | 0.0089 | 0.0091 | 0.0228 | 0.0084 |
| Strong | exp(β2) = 1.00 | 0.0133 | −0.0187 | 0.0268 | −0.0112 |
| Strong | exp(β3) = 1.00 | 0.009 | −0.0022 | 0.0258 | −0.0142 |
| Strong | exp(β1) = 0.80 | 0.0118 | 0.016 | 0.0253 | 0.0188 |
| Strong | exp(β2) = 1.25 | 0.0114 | −0.0065 | 0.0252 | −0.0023 |
| Strong | exp(β3) = 1.25 | 0.0102 | −0.0055 | 0.0228 | −0.0019 |
| Strong | exp(β1) = 0.67 | 0.0131 | −0.0007 | 0.0241 | 0.0148 |
| Strong | exp(β2) = 1.50 | 0.0105 | −0.0058 | 0.023 | −0.0258 |
| Strong | exp(β3) = 1.50 | 0.0092 | 0.0098 | 0.0198 | 0.0168 |
| Strong | exp(β1) = 0.57 | 0.0123 | 0.0083 | 0.0301 | 0.0477 |
| Strong | exp(β2) = 1.75 | 0.0126 | 0.0208 | 0.0245 | 0.013 |
| Strong | exp(β3) = 1.75 | 0.013 | −0.0266 | 0.0267 | −0.0522 |
| Strong | exp(β1) = 0.50 | 0.0151 | −0.0102 | 0.0327 | 0.0323 |
| Strong | exp(β2) = 2.00 | 0.0132 | 0.0073 | 0.032 | −0.0204 |
| Strong | exp(β3) = 2.00 | 0.0141 | 0.0036 | 0.0258 | 0.0118 |
Table 4. Sensitivity and specificity for the Bayesian group index model and group weighted quantile sum (GWQS) regression for simulation scenario D.
| Effect Size | Correlation | Bayesian Group Index: Sensitivity | Bayesian Group Index: Specificity | GWQS: Sensitivity | GWQS: Specificity |
|---|---|---|---|---|---|
| OR = 1.00 | Weak | 0.381 | 0.62 | 0.419 | 0.618 |
| | Moderate | 0.423 | 0.627 | 0.4 | 0.637 |
| | Strong | 0.425 | 0.542 | 0.38 | 0.605 |
| OR = 1.50 | Weak | 0.523 | 0.793 | 0.493 | 0.697 |
| | Moderate | 0.504 | 0.727 | 0.43 | 0.695 |
| | Strong | 0.496 | 0.677 | 0.429 | 0.693 |
| OR = 2.00 | Weak | 0.625 | 0.903 | 0.618 | 0.827 |
| | Moderate | 0.581 | 0.835 | 0.521 | 0.76 |
| | Strong | 0.525 | 0.753 | 0.47 | 0.718 |
| OR = 2.50 | Weak | 0.685 | 0.945 | 0.698 | 0.873 |
| | Moderate | 0.606 | 0.89 | 0.574 | 0.775 |
| | Strong | 0.543 | 0.813 | 0.523 | 0.753 |
| OR = 3.00 | Weak | 0.729 | 0.963 | 0.739 | 0.907 |
| | Moderate | 0.673 | 0.933 | 0.638 | 0.842 |
| | Strong | 0.583 | 0.853 | 0.534 | 0.757 |
Table 5. Characteristics of childhood leukemia cases (n = 268) and controls (n = 296) with measurements of chemicals in house dust in the CCLS.
| Variable | Controls | Cases |
|---|---|---|
| Child’s age, Mean (SD) | 3.84 (1.90) | 3.77 (1.81) |
| Female, N (%) | 110 (41.0) | 121 (40.9) |
| Child’s Ethnicity, N (%) | | |
| White Non-Hispanic | 130 (43.9) | 119 (44.4) |
| Hispanic | 101 (34.1) | 87 (32.4) |
| Other Non-Hispanic | 65 (22.0) | 62 (23.1) |
| Household Income, N (%) | | |
| Less than USD 15,000 | 6 (2.0) | 37 (13.8) |
| USD 15,000–29,999 | 37 (12.5) | 27 (10.1) |
| USD 30,000–44,999 | 36 (12.2) | 44 (16.4) |
| USD 45,000–59,999 | 29 (9.8) | 33 (12.3) |
| USD 60,000–74,999 | 29 (9.8) | 20 (7.5) |
| USD 75,000 or more | 159 (53.7) | 107 (39.9) |
| Mother’s education, N (%) | | |
| Less than high school | 14 (4.7) | 16 (6.0) |
| High school | 60 (20.3) | 68 (25.4) |
| Some college | 87 (29.4) | 79 (29.5) |
| Bachelor’s or higher | 135 (45.6) | 105 (39.2) |
| Mother’s age, mean (SD) | 30.42 (6.30) | 30.89 (5.80) |
| Lived at residence since birth, N (%) | 159 (53.7) | 120 (44.8) |
Table 6. Bayesian group index model odds ratios and 95% credible intervals for chemical groups and demographic variables for childhood leukemia in the CCLS. Bold indicates significant effects according to 95% credible intervals.
| Variable | Odds Ratio | 2.5% CI | 97.5% CI |
|---|---|---|---|
| PCBs | 1.15 | 0.91 | 1.45 |
| Insecticides | 0.64 | 0.40 | 0.99 |
| Herbicides | 1.19 | 0.87 | 1.67 |
| Metals | 0.89 | 0.68 | 1.15 |
| PAHs | 1.16 | 0.94 | 1.44 |
| Tobacco | 0.85 | 0.69 | 1.03 |
| Child’s age | 1.01 | 0.92 | 1.11 |
| Female | 1.00 | 0.71 | 1.41 |
| Child’s Ethnicity | | | |
| Hispanic vs. White Non-Hispanic | 1.22 | 0.79 | 1.96 |
| Other Non-Hispanic vs. White Non-Hispanic | 1.36 | 0.88 | 2.18 |
| Household Income | | | |
| USD 15,000–29,999 vs. Less than USD 15,000 | 0.93 | 0.42 | 1.94 |
| USD 30,000–44,999 vs. Less than USD 15,000 | 0.77 | 0.35 | 1.56 |
| USD 45,000–59,999 vs. Less than USD 15,000 | 0.71 | 0.30 | 1.51 |
| USD 60,000–74,999 vs. Less than USD 15,000 | 0.42 | 0.17 | 1.02 |
| USD 75,000 or more vs. Less than USD 15,000 | 0.36 | 0.16 | 0.77 |
| Mother’s education | | | |
| High school vs. Less than high school | 1.23 | 0.61 | 2.73 |
| Some college vs. Less than high school | 1.20 | 0.58 | 2.73 |
| Bachelor’s or higher vs. Less than high school | 1.21 | 0.57 | 2.87 |
| Mother’s age | 1.02 | 0.98 | 1.05 |
| Lived at residence since birth | 0.69 | 0.47 | 1.01 |
Table 7. Odds ratios and 95% credible intervals for chemical groups and demographic variables from the Bayesian group index model for subjects with same residence since birth. Bold indicates significant effects according to 95% credible intervals.
| Variable | Odds Ratio | 2.5% CI | 97.5% CI |
|---|---|---|---|
| PCBs | 1.19 | 0.86 | 1.67 |
| Insecticides | 0.50 | 0.23 | 0.99 |
| Herbicides | 2.22 | 1.45 | 3.61 |
| Metals | 0.75 | 0.45 | 1.61 |
| PAHs | 1.15 | 0.83 | 1.61 |
| Tobacco | 0.91 | 0.67 | 1.22 |
| Child’s age | 0.87 | 0.74 | 1.02 |
| Female | 0.99 | 0.57 | 1.71 |
| Child’s Ethnicity | | | |
| Hispanic vs. White Non-Hispanic | 1.27 | 0.63 | 2.69 |
| Other Non-Hispanic vs. White Non-Hispanic | 1.62 | 0.84 | 3.33 |
| Household Income | | | |
| USD 15,000–29,999 vs. Less than USD 15,000 | 1.59 | 0.48 | 5.81 |
| USD 30,000–44,999 vs. Less than USD 15,000 | 0.87 | 0.26 | 2.70 |
| USD 45,000–59,999 vs. Less than USD 15,000 | 0.99 | 0.28 | 3.26 |
| USD 60,000–74,999 vs. Less than USD 15,000 | 0.87 | 0.23 | 3.13 |
| USD 75,000 or more vs. Less than USD 15,000 | 0.37 | 0.11 | 1.14 |
| Mother’s education | | | |
| High school vs. Less than high school | 2.13 | 0.71 | 7.96 |
| Some college vs. Less than high school | 2.25 | 0.73 | 8.79 |
| Bachelor’s or higher vs. Less than high school | 1.66 | 0.51 | 6.76 |
| Mother’s age | 1.04 | 0.99 | 1.10 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Wheeler, D.C.; Rustom, S.; Carli, M.; Whitehead, T.P.; Ward, M.H.; Metayer, C. Bayesian Group Index Regression for Modeling Chemical Mixtures and Cancer Risk. Int. J. Environ. Res. Public Health 2021, 18, 3486. https://0-doi-org.brum.beds.ac.uk/10.3390/ijerph18073486
