Next Article in Journal
The Relationship between Health Literacy, Quality of Life, and Subjective Health: Results of a Cross-Sectional Study in a Rural Region in Germany
Next Article in Special Issue
Spatial Distribution of Land Surface Temperatures in Kuwait: Urban Heat and Cool Islands
Previous Article in Journal
Mechanism of Membrane Fouling Control by HMBR: Effect of Microbial Community on EPS
Previous Article in Special Issue
A Dynamic Spatio-Temporal Analysis of Urban Expansion and Pollutant Emissions in Fujian Province

Multi-Scale Multivariate Models for Small Area Health Survey Data: A Chilean Example

Department of Public Health Sciences, Medical University of South Carolina, Charleston, SC 29466, USA
Luxembourg Institute of Health, 1A-B, rue Thomas Edison, Strassen, L-1445 Luxembourg City, Luxembourg
Public Health Department, School of Medicine, Pontificia Universidad Católica de Chile, Diagonal Paraguay 362, Santiago 8330077, Chile
Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2020, 17(5), 1682;
Received: 17 January 2020 / Revised: 25 February 2020 / Accepted: 3 March 2020 / Published: 5 March 2020
(This article belongs to the Special Issue GIS and Spatial Modelling for Environmental Epidemiology)


Background: We propose a general approach to the analysis of multivariate health outcome data where geo-coding at different spatial scales is available. We propose multiscale joint models which address the links between individual outcomes and also allow for correlation between areas. The models are highly novel in that they exploit survey data to provide multiscale estimates of the prevalences in small areas for a range of disease outcomes. Results The models incorporate both disease specific, and common disease spatially structured components. The multiple scales envisaged is where individual survey data is used to model regional prevalences or risks at an aggregate scale. This approach involves the use of survey weights as predictors within our Bayesian multivariate models. Missingness has to be addressed within these models and we use predictive inference which exploits the correlation between diseases to provide estimates of missing prevalances. The Case study we examine is from the National Health Survey of Chile where geocoding to Province level is available. In that survey, diabetes, Hypertension, obesity and elevated low-density cholesterol (LDL) are available but differential missingness requires that aggregation of estimates and also the assumption of smoothed sampling weights at the aggregate level. Conclusions: The methodology proposed is highly novel and flexibly handles multiple disease outcomes at individual and aggregated levels (i.e., multiscale joint models). The missingness mechanism adopted provides realistic estimates for inclusion in the aggregate model at Provincia level. The spatial structure of four diseases within Provincias has marked spatial differentiation, with diabetes and hypertension strongly clustered in central Provincias and obesity and LDL more clustered in the southern areas.
Keywords: Bayesian modeling; multivariate; multi-scale; spatial correlation; sample weights Bayesian modeling; multivariate; multi-scale; spatial correlation; sample weights

1. Introduction

Often, health outcome data arises where measures are made on individuals who reside within geographical regions of a country. A survey may have been carried out to obtain a ‘snapshot’ of the health of the study region. In addition, the survey could be useful in providing insight into prevalences of disease within aggregated regions of the study window. Recently, Vanderdijck et al., [1] proposed an approach to the modeling of small area health survey data by utilizing the survey weight as a predictor within a conventional generalized linear mixed model, thereby making allowance for the survey design aspect of the study. Watjou et al. [2] extended this approach to situations where there is non-response. Alternative approaches have been proposed whereby the outcome itself is adjusted via sample weights [3,4,5]. However, assuming a conventional generalized linear mixed model (GLMM) with an unadjusted outcome variable has a variety of advantages not least of which is that conventional software can be used for modeling and hence flexible extension to models is straightforward. Multi-scale models have also been developed where models allow borrowing of information across scales [6,7,8,9]. In the most recent examples, models at higher levels of spatial aggregation can inherit effects from finer resolution levels, individual-level models can inherit contextual effects from aggregate regions. In what follows, we assume individual level models for survey participants and aggregate level models for Provincia estimation. We assume that spatial effects can be captured using Markov Random Field (MRF) models as we have discrete spatial units [10].
The Chilean National Health Survey (CNHS) is a large population cross-sectional study representative of the adult Chilean population, which is performed every 6 years. The main objectives of the CNHS are to assess and monitor the prevalence of adult health problems in the general Chilean population and to describe its variation according to sex, age, socioeconomic level, rural-urban area and geographic region. This is to facilitate the planning and evaluation of preventive and curative strategies. The survey provides information about a range of diseases (physical and mental health) and risk factors. Data are collected by self-reported questionnaires, a physical examination visit and biological samples. In the CNHS, type 2 diabetes, hypertension, obesity and high cholesterol, amongst the major outcomes of the metabolic syndrome, are available. After 1980, with economic development, the prevalence of obesity and other chronic diseases continued to increase in Chile. In the period from 2003 to 2017, both type 2 diabetes and obesity prevalence showed marked increases in the CNHS surveys.
One big challenge of the data collection of the CNHS was the particular Chilean geography with an uneven distribution of population across the country in a central-overcrowded and in remote, isolated and depopulated areas that were very difficult to access [11].

1.1. Objectives

Our focus is the modeling of multiple individual survey outcomes and the extension of this to a multi-scale approach to aggregate estimation of disease prevalence or risk. By multi scale, we mean analysis of more than one geographic resolution (scale) level. By aggregate, we mean that data are accumulated into larger geographic regions. This multi-scale objective extends the idea of small area estimation by jointly modeling different geo-scales at the same time adjusting for survey response and biases.

1.2. Data Background and Availability

The aim of this study was to present a methodology that can be used in the analysis of multiple disease outcomes and that can provide combined estimates of prevalence at various spatial scales. The data we used are from a cross-sectional study of the 2009–2010 CNHS. The regions used in the analysis were 52 out of 54 provinces of Chile. The population in each province varies from a few thousand to over 4 million in the largest province Santiago. The sampling frame was constituted from the 2002 Population and Housing Census which was the most recent Census to the study period. The study design was based on a random sample of households of complex type (stratified and multistage by clusters) with national, regional and rural/urban representation. The target population was adults older than or equal to 15 years. The survey had a response rate in the eligible population of 85%. Our data consist of a total of 4780 individuals with differing patterns of missingness. The data from the survey consist of individual clinical, demographic, and behavioral variables for survey participants. Diabetes outcome was defined as fasting glucose 126   m g / d l or self-reported medical diagnosis (not during pregnancy). Obesity was defined as body mass index (BMI) HTA 30 kg/m2, hypertension was defined as systolic blood pressure 140 mmHg or diastolic blood pressure 90 mmHg or self- report of arterial hypertension (HTA) treatment. Elevated low density lipoprotein (LDL) was defined according to the cardiovascular adult treatment panel (ATP) III Update (> 100 mg/dl (if there is already cardiovascular disease), > 130 mg/dl (if moderate cardiovascular reactivity(CVR)) or > 160 mg/dl (if low CVR)). The 2009–010 CNHS included a total of 5293 individuals, although only 4780 participated in the two examinations and the laboratory test. All outcome definitions are based on the official report [12] and data provided by the CNHS. The demographic variables included from the survey were age, gender and their interaction. In addition, the individuals were geo-referenced to various spatial administrative units (such as regions, provinces and communes). In the following analysis, the province level was used throughout. The data are publicly available at
The basic map of Chile and its provinces was taken from It has to be mentioned that the map only represents 52 provinces instead of the current 54. One reason is that Easter Island, which is counted as one province, is not on the map and was not included in the survey. The other missing province is Marga Marga, which was created in 2009 and became operative in 2010. Marga Marga consists of two communes of the Valparaiso province and two communes of the Quillota province. As the survey was carried out during the creation of the new province, it was decided to use the map that was still valid when the survey started.
Ethics approval and consent to participate: The data used in this study are public domain as part of CNHS and the analysis does not require ethical approval as individual cases are not labelled or referenced. Provincial level inference is made only.

2. Methods

Denote a set of surveyed individual responses as y i k ,   i = 1 , .... , m ;   k = 1 , .... , K where the i th person has kth outcome. The vector of outcomes for each person is denoted by y i . The survey consists of m participants and the disease outcomes are defined as y i k   i = 1 , , m ;   m = 4780 . We assume that this binary outcome for the k th disease is Bernoulli distributed as y i k B e r n ( p i k ) and we modeled the logit of the probability of positive outcome. The logit consists of fixed and random effects to account for both observed confounding and unobserved outcome confounding. Hence, logit ( p i k ) = x i t β k + z i t γ k will be assumed where x i t β k is a linear predictor including observed confounders, and z i t γ k is a linear combination of random effects. As the individuals were sampled from within a population, we must include within our specification the sampling weight used for each person. Following the proposal in [1,2], we add the sampling weight ( s w i ) as a fixed effect within the linear predictor.
The random effects included within the model were chosen to represent the different forms of unobserved effects. First, we assume an individual frailty effect could be present and add an uncorrelated disease specific individual level random effect v i k . Next, we include spatial effects to represent the area within which the individual resides (province only). Two effects are estimated for the province level: an uncorrelated effect v j ( i j ) p , where the subscript represents the j th province within which the i th person resides, and a spatially correlated effect u j ( i j ) p .
The overall model used for this analysis is a Bayesian formulation and so all parameters in the model have prior distributions, as follows: v i k N ( 0 , τ v k 1 ) ;   v j p N ( 0 , τ v p 1 ) ;   u j p I C A R ( τ u p 1 ) . The ICAR distribution is a special Markov random field prior distribution which includes spatial correlation (see, e.g., [13]; [10], ch. 5) and essentially captures the clustering tendency of the outcome via neighborhood adjacency. Note that this is a common assumption for the analysis of discrete small area spatial structure, rather than geostatistical models which require distance-based correlation to be specified [10]. The uncorrelated effects have zero mean Gaussian distribution with small precisions which should provide non-informativeness. There are 52 provinces in Chile, so that j = 1 , , n where n = 52 and Table A1 provides a basic summary of the regions.

2.1. Joint Scale Models

The overall model used for this analysis is a binary logistic model at the individual level for each of the k outcomes. A Bayesian formulation is assumed and so all parameters in the model have prior distributions, as follows:
v i k N ( 0 , τ v k 1 ) ;   v j p N ( 0 , τ v p 1 ) ;   u j p I C A R ( τ u p 1 ) . The ICAR distribution is a special prior distribution with
u j k p | u l j , k p N ( u ¯ p ¯ δ j k , τ u p k 1 / n δ j )   where   u ¯ p ¯ δ j k = l δ j u l j , k p / n δ j   and   δ j   is   the   neighborhood   set   of   the   j   th   location   with   n δ j   the   number   of   neighbors .
This can model spatial structure (see, e.g., Besag [13]; Lawson [10], chapter 5) and essentially assumes that neighboring areas are positively correlated. It is assumed that each outcome has a different variance of the spatial field and hence, the precisions have a k subscript. Note that while this ICAR formulation is dependent on neighborhood definition, it is an adaptive prior specification as the variance depends on the number of neighbors of any given region. This allows for edge regions to have larger variance and smaller precision. To estimate the probability of the diseases for each province, a binomial model on aggregated province data was used as in Aregay, Lawson, Faes, Kirby, Carroll and Watjou [8], Aregay, Lawson, Faes and Kirby [9]. The binomial model assumed here is an approximation and we assume that the random effect structure employed makes allowance for this misspecification. The outcome was defined as cases of each disease per province out of the number of individuals sampled per province. The mean sampling weight per province ( s w j p ), the mean age per province ( A g e j p ), the percentage of male individuals per province ( S e x j p ) and their interaction were defined as adjusting variables. The same uncorrelated and correlated random spatial effects from the individual model were used in the aggregated model. This was done to use the spatial information gained from the individual analysis for the aggregated model. Hence, for the k th outcome, with age x sex adjustment,
y i k   B e r n ( p i k ) i n d i v i d u a l   l e v e l
l o g i t ( p i k ) = β 0 k + A g e i   β 1 k + S e x i   β 2 k + A g e i S e x i   β 3 k + f k ( s w i ) + v i k + v j k p + u j k p
y j k p   B i n ( p j k p ,   n j p ) p r o v i n c e   l e v e l
l o g i t ( p j k p ) =   α 0 k + A g e j p α 1 k + S e x j p α + A g e j p S e x j p   α 3 k + f k ( s w j p ) + v j k p + u j k p
where n j p is the sample size in the j t h province and N j p is the population of the province. p j k p is the probability of the outcome and y j k p = i j y i k where the sum is over all the individuals within the j th province. A crude unadjusted estimate of prevalence could be computed as y j k p / n j p .

2.2. Missingness and Imputation

All missing data were imputed within our Bayesian models: outcomes were imputed using predictive distribution and predictors were imputed, where appropriate, from an assumed prior distribution. Any missing predictors were given suitable prior distributions and imputed within the Markov Chain Monte Carlo (McMC) algorithms.
Some provinces had no sampling, in which case they must have their prevalence estimated. We did this as follows. Assume restricted prior distributions for
n j p   P o i s ( 110 ) I ( 1 , )
p j p   B e t a ( 1 , 1 )
and use the predictive distribution to yield
y j p m < b i n ( p j p ,   n j p ) .
For missing sampling weights, the following estimation for province level sampling weights was performed. Denote the average sampling weight for a province as
s w j p =   l   j s w l / n j p
for provinces sampled and
s w j p =   s w ¯
where s w ¯ is the global mean sampling weight (for those areas sampled). Hence,
s w j p =   s w ¯ + I j ( s w j p s w ¯ )
I j =   { 1 ,   n o n m i s s i n g 0 , m i s s i n g
So that the sampling weight for the missing areas is the global average.

2.3. Joint Outcome Modelling

While a multi-scale model is assumed for a given disease outcome, we also have a range of outcomes that are a focus of this study. Outcomes related to the metabolic syndrome were considered important to examine. These include diabetes, obesity, hypertension and elevated LDL. For each of these, a joint multi-scale model was assumed, but to allow linkage between the outcomes at the individual level, a joint model approach was implemented assuming that there is a shared random effect between the k=4 outcome variables. This means that models for each outcome on the individual data level and aggregated data level were run during the same Markov Chain Monte Carlo iterations. The individual data level models included a common random effect v s i for each individual.
Individual Level Models (I1)
y i k B e r n ( p i k ) logit ( p i k ) = β 0 k + β 1 k A g e i + β 2 k S e x i + β 3 k A g e i S e x i + β 4 k s w i + R k i j   model   I 1   and   R k i j = v i k + v s i k + v j k p + u j k p w i t h v i , v j p , v s i , β N ( 0 , τ 1 ) u j k p I C A R ( τ u p k 1 ) .
Note that the random effects are chosen to represent our belief that individuals vary independently but multiple measurements on an individual will have some commonality: Hence, v i k + v s i is a composite effect. We also assume that individuals inherit a contextual effect of province and so v j k p + u j k p are jointly modelled with the aggregate level.
Aggregate (Provincia) Level Models (A1)
At the provincial level, we approximate the model by assuming that the sampled count is
y j k p b i n ( p j k p , n j p ) k and logit ( p j k p ) = α 0 k + α 1 k A g e j p + α 2 k S e x j p + α 3 k A g e j p S e x j p + f k ( s w j p ) + R j k p model   A 1 w h e r e   R j k p = v j k p + u j k p k a n d   f k ( s w j p ) = γ . s w j p
where there are assumed to be different confounder effects for each outcome. This binomial approximation affects the variance of the aggregate count, but the inclusion of random effects at this level allows for the adjustment of variance. Note that there are eight models fitted jointly with some separate and shared random effects.

2.4. Model Fitting and Goodness of fit

Model I1 and AI were fitted jointly for all disease outcomes. Posterior sampling via McMC was chosen as the main tool for estimation. Given that missingness occurs within the outcomes and predictors, we assumed a Bayesian paradigm and decided to use BUGS software as it allows the imputation of missing outcomes using data augmentation, and allows prior specification for missing predictors. All joint models were fitted using WinBUGS14 [14]. Maps of Chile were created using the tmap package in R [15]. Two chains were run with a thinning of 50. Samples of a size of 5000 were taken following burn-in (Lunn, et al. [16], Lunn, et al. [17]). Convergence was visually checked by means of Gelman–Rubin–Brooks plot (Brooks and Gelman [18]), the potential scale reduction factor R ^ (Gelman and Rubin [19]), sample trace and density plots, sample autocorrelations and Markov Chain Monte Carlo error (Lawson, et al. [20]). Convergence can be assumed if R ^ is close to 1. Model fit was visually checked by means of trace plots of the deviance (Spiegelhalter, et al. [21]), mapping the correlated heterogeneity, u p , and uncorrelated heterogeneity, v p . The map of the correlated heterogeneity shows a clustered pattern and the map of uncorrelated heterogeneity shows a random pattern, if an adequate model fit is found.
As is common in other cross-sectional studies the missingness in this study did not display any particular structure and was assumed to be essentially at random (MAR). This form of missingness is handled optimally using predictive inference within the Bayesian modeling framework.

2.5. Posterior Risk Exceedance

As an additional diagnostic to help with the delineation of areas of exceptionally high risk, we employed exceedance probability criteria (Lawson [22]). An exceedance probability can be computed from posterior sampled output from a Bayesian model (see, e.g., Lawson and Rotejanaprasert [23]). The exceedance probability is defined as the probability that the estimated posterior value, in this case prevalence, for each sample, was greater than a chosen threshold.
P ( p k j > c ) =   1 G   g = 1 G   I ( p k j > c )
I = { 1 , p k j > c   0 ,   o t h e r w i s e  
where G is the total number of samples, p is the estimated probability of sample g and c is the chosen threshold. Whenever p exceeds the threshold c , it is recorded as 1, 0 otherwise. Averaged over the sample, this yields an estimate of the upper tail marginal probability of the parameter. This can be used to detect unusually high values and for hot spot clustering by looking for groups of unusually high areas in mapped output (Richardson, et al. [24]).
For diabetes, a threshold of 9.4%, for obesity of 25.1%, for hypertension of 26.9% and for elevated LDL a threshold of 22.7% was chosen based on the estimated national prevalence in the CNHS report ( Exceedance results are reported in Table A1, Table A2, Table A3 and Table A4.

2.6. Descriptive Statistics

The mean age of the study population was 46.3 with no significant difference in age between males (45.7) and females (46.7). More females (59.9%), than males (40.1%) participated in the study (p-value < 0.001). Females showed a significantly higher BMI than males (28.1 vs. 27.4; p-value < 0.001). Women had a higher, but not significant, diabetes rate (10.7% vs. 10.4%; p-value = 0.8331), a significantly higher obesity rate (32.9% vs. 23.4%; p-value < 0.001), a lower hypertension rate (34.3% vs. 37.2%; p-value = 0.03895) and a lower significant elevated LDL rate (24.5% vs. 34.6%; p-value = < 0.001) than men.
Table 1 gives some information concerning which region each province belongs to, square km, Inhabitants per square km, gross domestic product (GDP) per capita and the number of sampled individuals per province. Table A1, Table A2 and Table A3 show estimates of posterior probabilities, their standard deviations (SD), median, 2.5% and 97.5% percentiles of the 95% credible interval, and the probability of exceeding the chosen threshold for each province and each outcome. In Figure 1, Figure 2, Figure 3 and Figure 4, posterior estimates of a range of quantities for the fitted spatial models for diabetes, hypertension, obesity and elevated LDL are found.

3. Results of the Joint Modeling

For Diabetes (Table A1), Cauquenes (29) had the highest posterior probability of diabetes, with 16.27%. Provinces with a 95% or higher probability of exceeding the threshold of 9.4% of diabetes were Limari (12), San Felipe de Aconcagua (19), Melipilla (22), Colchagua (28), Cauquenes (29), Curicó (30), Concepción (34) and Cautín (38). Those diabetes hotspots are all located in the central part of Chile. Table A2 provides the results for the analysis of obesity. The highest posterior probability of obesity was estimated in Antártica Chilena (49) with 40.60%, but it did not appear to be significantly higher than the chosen threshold of 25.1% and the estimate actually mainly depends on spatial prior distributions as no samples were taken in Antártica Chilena (49) itself. The highest posterior probability was for Magallanes (51) with 38.52%. Obesity hotspots are mostly found in the central to southern part of Chile. In Table A2, posterior probabilities of hypertension per province are listed. The highest probability of hypertension was found in Cauquenes (29) with 57.14%. Hypertension hotspots are mostly found in the central to southern part of Chile. Table A4 demonstrates that the highest probability of elevated LDL level was estimated in Ultima Esperanza (50) with 41.79%, though estimation relies on assumed prior distributions as no samples were observed in this province itself. The highest exceedance probability was for Copiaco (9), 37.15%, Cautin (38) 40.06%, and Chiloe (41) 40.28% which all have a 100% exceedance of the national rate.
Overall, it would appear that diabetes and hypertension display a similar distribution in that there are concentrations of both in central metropolitan areas and also in Southern Chile. Obesity, on the other hand, is less marked in central area but shows elevation on the south also. Cholesterol (elevated LDL) demonstrates a similar pattern to obesity in terms of the north–south gradient, but displays less prevalence in central areas and is overall less clustered.
In the joint analyses reported here, it is clear that diabetes and hypertension display a similar spatial distribution of risk in that central and southern areas are the most affected and exceedance probabilities > 0.95 are commonly found. In contrast, elevated LDL and obesity are more marked in the southern parts of the country and show fewer examples of exceedance than for diabetes or hypertension. This suggests a more uniform pattern of risk for elevated LDL over the country than for the other outcomes.
Figure 1, Figure 2, Figure 3 and Figure 4 display the multi-scale model heterogeneity effects for each outcome.
It is notable that for all the outcomes the uncorrelated effects are relatively random, whereas the correlated effects display distinct clustering. For diabetes, the clustering is marked in the northern regions, whereas the clustering is marked in the south for obesity. Hypertension displays a clustering in the central regions whereas elevated LDL shows clustering in the most southerly areas of the Magallanes region, with lower central area effects.

Linear Model Parameter Estimates

Table 2 displays the posterior mean estimates for the predictors included in each of the individual and aggregate models for the four outcomes. Some general features of the analysis should be highlighted. First, in the aggregate models, only the intercept and spatial random effects were well estimated. The aggregate survey weights were not well estimated for any outcome, and neither were the age and gender predictors. For the individual level outcomes, different effects emerged. In all the outcomes, the sampling weight was not found to be well estimated. The intercept was well estimated, as was age for all outcomes. Age was found to be positively related to diabetes, obesity, cholesterol and hypertension. Differences arise in the effect of gender and the age x gender interaction. While gender and the age x gender interaction were not well estimated for diabetes, there is a well estimated negative effect of gender (male) on obesity, and a positive effect for cholesterol and hypertension. In addition, while most age x gender interactions were not well estimated, it was found well estimated for hypertension.
Although hypertension tends to display similar spatial patterning to diabetes, there are marked differences in the individual predictors associated with each outcome. Hypertension has a gender and age x gender association that is not seen in diabetes.

4. Discussion

There are some notable correlations and disparities between the distributions of diabetes, hypertension, obesity and elevated LDL. Both diabetes and hypertension have marked prevalences in the central provinces, including the metropolitan areas around Santiago. Note that while marked differences arise between outcomes, at the individual level, we have included a person-specific and outcome-specific individual effect. This leaves some allowance for correlation between outcomes in the individual. At the province level, we did not include shared province effects, but individuals had multi-scale sharing of province effects. The regression parameter posterior estimates demonstrate that while similar spatial patterning can arise, there are also differences at the individual level that can be marked. The central regions elevation of diabetes and hypertension could be explained by the urbanicity of the areas around Santiago and the associated lifestyle trends. The concentration of obesity and elevated LDL in the southern regions may reflect differentials with northern comparison regions and in particular dietary practices.
In this example, the sampling weight appears to have little impact on any outcome whether at individual or aggregate level. That said, it is important to include the sampling weight as it represents factors affecting the inclusion of participants in the survey.

5. Conclusions

The joint analysis of these four metabolic outcomes demonstrates the benefit of considering the correlation between outcomes at the individual level. It also demonstrates the benefit of a multi-scale analysis in that individual outcomes can be modelled contextually within provinces and they can inherit the grouping effect of the province. In addition, the inclusion of survey weights at different levels is an important feature that allows the analysis to proceed, taking into consideration sampling effects. The joint analysis allowed the estimation of prevalences at the aggregate level while also providing contextual effects for the individuals. We did not assume here that the province level outcomes should share effects but in future work, we could explore sharing further especially for diabetes and hypertension (e.g., shared spatial effects), which appear to have similar patterning.

Author Contributions

A.L. and G.A.A. provide overall direction and management of the work reported. A.S. performed all the analyses. L.V. and G.A.A. provided contextual and interpretative support for the Chilean data. All authors provided input into the final paper. All authors have read and agreed to the published version of the manuscript.


This study was funded by the Ministry of Higher Education and Research, Luxembourg, through an internal project from the Luxembourg Institute of Health (funding GAA). The first author (AL) received contract support from Luxembourg Institute of Health for this collaborative project.

Conflicts of Interest

There are no competing interests with respect to the submission of this paper.

Appendix A

Table A1. Diabetes.
Table A1. Diabetes.
RegionNrProvinceMeanSd2.50%97.50%>= 9.4%
Arica y Parinacota1Parinacota13.12%6.36%4.67%29.04%74.09%
6El Loa9.03%1.77%5.98%12.97%38.03%
Valparaiso14San Antonio9.37%2.09%5.59%13.89%47.25%
18Los Andes10.97%2.13%7.42%15.88%78.08%
19San Felipe de Aconcagua15.45%3.35%10.61%23.80%99.63%
O’Higgins26Cardenal Caro11.54%5.79%4.14%25.96%62.77%
46General Carrera5.84%1.70%3.02%9.65%3.24%
48Capitan Prat10.24%5.30%3.49%23.05%48.79%
Magallanes49Antartica Chilena11.53%10.18%0.79%38.62%46.47%
50Ultima Esperanza10.37%2.18%6.46%14.92%66.17%
52Tierra del Fuego10.02%7.59%0.93%30.10%43.23%
Table A2. Obesity.
Table A2. Obesity.
RegionNrProvinceMeanSd2.50%97.5%>= 25.1%
Arica y Parinacota1Parinacota26.78%8.04%12.72%44.93%55.31%
6El Loa26.58%3.60%19.96%33.88%64.71%
Valparaiso14San Antonio25.54%4.50%17.32%35.01%51.71%
18Los Andes25.62%4.40%17.56%35.02%52.64%
19San Felipe de Aconcagua31.03%5.26%21.90%42.40%87.65%
O’Higgins26Cardenal Caro26.95%7.71%12.97%44.19%57.30%
46General Carrera37.74%4.47%29.34%46.83%99.87%
48Capitan Prat35.46%9.02%18.53%54.36%88.76%
Magallanes49Antartica Chilena40.60%16.12%12.75%74.23%81.67%
50Ultima Esperanza36.02%4.80%26.72%45.82%99.06%
52Tierra del Fuego39.59%12.42%17.18%64.96%87.56%
Table A3. Hypertension.
Table A3. Hypertension.
RegionNrProvinceMeanSd2.50%97.5%>= 6.9%
Arica y Parinacota1Parinacota30.98%10.7%14.0%56.98%60.55%
6El Loa21.82%2.70%16.8%27.56%4.16%
Valparaiso14San Antonio32.69%3.89%25.1%40.47%93.46%
18Los Andes34.27%3.60%26.8%41.02%97.45%
19San Felipe de Aconcagua49.71%4.01%42.3%58.24%100.0%
O’Higgins26Cardenal Caro40.45%11.4%21.1%66.95%90.67%
46General Carrera25.98%3.66%19.2%33.50%38.48%
48Capitan Prat36.50%11.0%17.9%61.74%81.35%
Magallanes49Antartica Chilena21.46%13.1%3.92%53.47%28.10%
50Ultima Esperanza44.71%4.14%36.7%53.02%100.0%
52Tierra del Fuego17.54%9.04%4.24%39.02%15.05%
Table A4. Elevated LDL.
Table A4. Elevated LDL.
RegionNrProvinceMeanSd2.50%97.50%>= 22.7%
Arica y Parinacota1Parinacota31.28%11.00%13.0%57.03%79.12%
6El Loa28.04%5.06%18.9%38.90%85.45%
Valparaiso14San Antonio27.66%6.25%17.0%41.47%78.12%
18Los Andes24.60%5.60%14.4%36.82%61.88%
19San Felipe de Aconcagua28.63%5.91%17.4%40.53%84.15%
O’Higgins26Cardenal Caro27.64%10.16%11.1%51.62%67.34%
46General Carrera27.71%4.70%19.1%37.26%85.61%
48Capitan Prat32.20%11.28%13.3%58.78%81.19%
Magallanes49Antartica Chilena37.17%18.31%7.79%76.28%75.29%
50Ultima Esperanza41.79%6.27%30.2%54.65%99.96%
52Tierra del Fuego34.33%13.49%11.5%62.79%78.95%


  1. Vandendijck, Y.; Faes, C.; Kirby, R.S.; Lawson, A.; Hens, N. Model-based inference for small area estimation with sampling weights. Spat. Stat. 2016, 18, 455–473. [Google Scholar] [CrossRef] [PubMed]
  2. Watjou, K.; Faes, C.; Lawson, A.; Kirby, R.S.; Aregay, M.; Carroll, R.; Vandendijck, Y. Spatial small area smoothing models for handling survey data with nonresponse. Stat. Med. 2017, 36, 3708–3745. [Google Scholar] [CrossRef] [PubMed]
  3. Raghunathan, T.E.; Xie, D.; Schenker, N.; Parsons, V.L.; Davis, W.W.; Dodd, K.W.; Feuer, E.J. Combining information from two surveys to estimate county-level prevalence rates of cancer risk factors and screening. J. Am. Stat. Assoc. 2007, 102, 474–486. [Google Scholar] [CrossRef]
  4. Mercer, L.; Wakefield, J.; Chen, C.; Lumley, T. A comparison of spatial smoothing methods for small area estimation with sampling weights. Spat. Stat. 2014, 8, 69–85. [Google Scholar] [CrossRef] [PubMed]
  5. Chen, C.; Wakefield, J.; Lumley, T. The use of sampling weights in Bayesian hierarchical models for small area estimation. Spat. Patio Temporal Epidemiol. 2014, 11, 33–43. [Google Scholar] [CrossRef] [PubMed]
  6. Kolaczyk, E.D. Multiscale statisticstal models for hierarchical spatal aggregation. Stat. Med. 2001, 33, 95–118. [Google Scholar]
  7. Louie, M.M. A multiscale method for disease mapping in spatial epidemiology. Stat. Med. 2006, 25, 1287–1306. [Google Scholar]
  8. Aregay, M.; Lawson, A.B.; Faes, C.; Kirby, R.S.; Carroll, R.; Watjou, K. Comparing multilevel and multiscale convolution models for small area aggregated health data. Spat. Spatio Temporal Epidemiol. 2017, 22, 39–49. [Google Scholar] [CrossRef] [PubMed]
  9. Aregay, M.; Lawson, A.B.; Faes, C.; Kirby, R.S. Bayesian multi-scale modeling for aggregated disease mapping data. Stat. Methods Med. Res. 2017, 26, 2726–2742. [Google Scholar] [CrossRef] [PubMed]
  10. Lawson, A.B. Bayesian Disease Mapping: Hierarchical Modeling in Spatial Epidemiology, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2018. [Google Scholar]
  11. Clark-Núñez, X. Compendio Estadístico; Instituto Nacional de Estadísticas Chile: Santiago, Chile, 2017.
  12. Salud, M.D. National Health Survey Chile 2009–2010, Results; Departamento de Epidemiología. Ministerio de Salud: Santiago, Chile, 2011.
  13. Besag, J. Bayesian image restoration with two applications in spatial statistics. Ann. Inst. Stat. Math. 1991, 43, 1–59. [Google Scholar] [CrossRef]
  14. Lunn, D.J.; Thomas, A.; Best, N.; Spiegelhalter, D. WinBUGS—A Bayesian modelling framework: Concepts, structure, and extensibility. Stat. Comput. 2000, 10, 325–337. [Google Scholar] [CrossRef]
  15. Tennekes, M. tmap: Thematic Maps. J. Stat. Softw. 2018, 84, 6. [Google Scholar] [CrossRef]
  16. Lunn, D.; Spiegelhalter, D.; Thomas, A.; Best, N. The BUGS project: Evolution, critique and future directions. Stat. Med. 2009, 28, 3049–3067. [Google Scholar] [CrossRef] [PubMed]
  17. Lunn, D.; Jackson, C.; Best, N.; Thomas, A.; Spiegelhalter, D. The BUGS Book: A Practical Introduction to Bayesian Analysis; CRC Press: Boca Raton, FL, USA, 2012. [Google Scholar]
  18. Brooks, S.P.; Gelman, A. General Methods for Monitoring Convergence of Iterative Simulations. J. Comput. Graph. Stat. 1998, 7, 434–455. [Google Scholar] [CrossRef]
  19. Gelman, A.; Rubin, D.B. Inference from Iterative Simulation Using Multiple Sequences. Stat. Sci. 1992, 7, 457–472. [Google Scholar] [CrossRef]
  20. Lawson, A.B.; Browne, W.J.; Rodeiro, C.L.V. Disease Mapping with WinBUGS and MLwiN; John Wiley & Sons: Hoboken, NJ, USA, 2003. [Google Scholar]
  21. Spiegelhalter, D.; Best, N.; Carlin, B.P. Bayesian Deviance, the Effective Number of Parameters, and the Comparison of Arbitrarily Complex Models; Tech Report 98-009 Division of Biostatistics; University of Minnesota: Minneapolis, MN, USA, 1998. [Google Scholar]
  22. Lawson, A.B. Hotspot detection and clustering: Ways and means. Environ. Ecol. Stat. 2010, 17, 231–245. [Google Scholar] [CrossRef]
  23. Lawson, A.B.; Rotejanaprasert, C. Childhood brain cancer in Florida: A Bayesian clustering approach. Stat. Public Policy 2014, 1, 99–107. [Google Scholar] [CrossRef]
  24. Richardson, S.; Thomson, A.; Best, N.; Elliott, P. Interpreting Posterior Relative Risk Estimates in Disease-Mapping Studies. Environ. Health Perspect. 2004, 112, 1016–1025. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Multi-scale model heterogeneity effects for diabetes (u: correlated effect; v: uncorrelated effect).
Figure 1. Multi-scale model heterogeneity effects for diabetes (u: correlated effect; v: uncorrelated effect).
Ijerph 17 01682 g001
Figure 2. Multi-scale model heterogeneity effects for obesity (u: correlated effect; v: uncorrelated effect).
Figure 2. Multi-scale model heterogeneity effects for obesity (u: correlated effect; v: uncorrelated effect).
Ijerph 17 01682 g002
Figure 3. Multi-scale model heterogeneity effects for hypertension (u: correlated effect; v: uncorrelated effect).
Figure 3. Multi-scale model heterogeneity effects for hypertension (u: correlated effect; v: uncorrelated effect).
Ijerph 17 01682 g003
Figure 4. Multi-scale model heterogeneity effects for elevated LDL (u: correlated effect; v: uncorrelated effect).
Figure 4. Multi-scale model heterogeneity effects for elevated LDL (u: correlated effect; v: uncorrelated effect).
Ijerph 17 01682 g004
Table 1. Characteristics of provinces and regions in Chile (a Data obtained from Clark-Núñez X. Compendio estadístico. Instituto Nacional de Estadísticas Chile, 2017).
Table 1. Characteristics of provinces and regions in Chile (a Data obtained from Clark-Núñez X. Compendio estadístico. Instituto Nacional de Estadísticas Chile, 2017).
RegionsArea (km2) aInhabitants/km2 aGDP/Capita USD aProvinces (Number)Sampled
Arica y Parinacota16,87314.69848Parinacota (1)0
Arica (2)311
Tarapacá42,2268.427,604Iquique (3)289
Tamarugal (4)24
Antofagasta126,0495.163,402Tocopilla (5)34
El Loa (6)88
Antofagasta (7)183
Atacama75,1764.327,882Chañaral (8)0
Copiapó (9)226
Huasco (10)81
Coquimbo40,58019.614,800Elqui (11)185
Limarí (12)72
Choapa (13)49
Valparaíso16,396113.417,009San Antonio (14)34
Petorca (15)17
Valparaíso (16)187
Quillota (17)48
Los Andes (18)25
San Felipe de Aconcagua (19)34
Metropolitana15,403485.824,224Chacabuco (20)13
Santiago (21)728
Melipilla (22)26
Talagante (23)18
Maipo (24)49
Cordillera (25)77
O’Higgins16,3875717,985Cardenal Caro (26)0
Cachapoal (27)211
Colchagua (28)102
Maule30,29634.910,620Cauquenes (29)19
Curicó (30)85
Linares (31)108
Talca (32)139
Biobío37,06957.812,582Arauco (33)51
Concepción (34)134
Ñuble (35)57
Biobío (36)49
Araucanía31,84231.58,376Malleco (37)67
Cautín (38)261
Los Ríos18,43022.311,711Ranco (39)71
Valdivia (40)228
Los Lagos48,58417.613,335Chiloé (41)74
Llanquihue (42)151
Palena (43)0
Osorno (44)92
Far South
Aysén108,494119,851Coyhaique (45)185
General Carrera (46)98
Aysén (47)0
Capitan Prat (48)0
Magallanes1,382,2910.118,447Antártica Chilena (49)0
Última Esperanza (50)56
Magallanes (51)243
Tierra del Fuego (52)14
Table 2. Posterior mean estimates for regression parameters in the joint model for diabetes, obesity, cholesterol (elevated LDL), and hypertension for both individual and aggregate level models.
Table 2. Posterior mean estimates for regression parameters in the joint model for diabetes, obesity, cholesterol (elevated LDL), and hypertension for both individual and aggregate level models.
Disease OutcomeModelParametersMeanSD2.50%97.50%
DiabetesIndividual level modelIntercept−4.0880.508−5.242−3.222
Survey weight0.0000.0000.0000.000
Sex male−0.0270.149−0.3480.266
Age * Sex male0.0060.009−0.0100.023
Aggregated model (per province)Intercept−2.3650.336−3.063−1.722
Mean Survey weight0.0000.0000.0000.000
Mean Age0.0970.098−0.0770.303
Proportion Male0.3750.791−1.1612.051
Mean age * proportion male−0.1120.231−0.5990.304
ObesityIndividual level modelIntercept−1.1820.174−1.557−0.894
Survey weight0.0000.0000.0000.000
Sex male−0.7930.148−1.111−0.536
Age * Sex male−0.0050.006−0.0180.006
Aggregated model (per province)Intercept−0.8260.281−1.364−0.224
Mean Survey weight0.0000.0000.0000.000
Mean Age−0.0080.068−0.1480.129
Proportion Male−0.1420.665−1.5671.142
Mean age * proportion male0.0110.159-0.3100.345
CholesterolIndividual level modelIntercept−2.4100.330−3.145−1.839
Survey weight0.0000.0000.0000.000
Sex male1.1180.2270.7091.608
Age * Sex male−0.0050.010−0.0250.013
Aggregated model (per province)Intercept−0.8950.340−1.586−0.204
Mean Survey weight0.0000.0000.0000.000
Mean Age0.0630.090−0.1090.261
Proportion Male0.1440.812−1.5171.813
Mean age * proportion male−0.0430.213−0.5110.363
HypertensionIndividual level modelIntercept−1.5420.137−1.848−1.313
Survey weight0.0000.0000.0000.000
Sex male0.5010.1180.2740.734
Age * Sex male−0.0220.007−0.037−0.008
Aggregated model (per province)Intercept−0.8900.253−1.396−0.424
Mean Survey weight0.0000.0000.0000.000
Mean Age0.0520.063−0.0710.183
Proportion Male0.7420.612−0.3451.982
Mean age * proportion male0.0440.149−0.2680.339
* Interaction.
Back to TopTop