Article

Estimating Endogenous Treatment Effects Using Latent Factor Models with and without Instrumental Variables

¹ Department of Humanities and Social Sciences, Indian Institute of Technology Bombay, Mumbai 400076, India
² The Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute, School of Pharmacy, University of Washington, Seattle, WA 98195, USA
* Author to whom correspondence should be addressed.
Submission received: 6 November 2020 / Revised: 7 March 2021 / Accepted: 9 March 2021 / Published: 17 March 2021
(This article belongs to the Special Issue Health Econometrics)

Abstract

We provide evidence on the least biased ways to identify causal effects in situations where multiple outcomes all depend on the same endogenous regressor and a reasonable, but potentially contaminated, instrumental variable is available. Simulations provide suggestive evidence on the complementarity of instrumental variable (IV) and latent factor methods and on how this complementarity depends on the number of outcome variables and the degree of contamination in the IV. We apply these causal inference methods to assess the impact of mental illness on work absenteeism and disability, using the National Comorbidity Survey Replication.
JEL Classification:
C3; I12; J210

1. Introduction

Treatment-effect estimators that address the endogeneity of treatments can produce causal estimates of treatment effects. Endogeneity of treatment typically arises from differences in the levels of observed and unobserved risk factors for outcomes between the treated and the untreated groups. The literature on endogenous treatment-effect estimators is substantial (Gilleskie and Hoffman 2014; Gilleskie et al. 2017; Gilleskie and Strumpf 2005; Prada and Urzúa 2017; Urzúa 2008; Rodríguez et al. 2016; Heckman and Robb 1985; Angrist et al. 1996; Imbens and Wooldridge 2009; Angrist and Krueger 2001; Heckman et al. 2006; Lewbel 2012; Terza et al. 2008; Vella and Verbeek 1999). One of the most commonly used approaches is the instrumental variables method, in which external instrumental variables (IVs) are used to isolate exogenous variation in the treatment, which is then related to outcomes to estimate the causal effect. The challenge with IV methods is to find good IVs: they should strongly predict treatment receipt and should not affect outcomes through any channel other than their association with treatment receipt. This latter assumption is untestable in the data¹ and relies on theoretical arguments by analysts. In practice, the IV chosen for empirical analysis often has some contamination, i.e., an association with outcomes that is independent of treatment receipt. Although this biases the estimated treatment effects, a case can be made that, under small levels of contamination, the IV estimator remains more robust (lower mean squared error) than naive estimators that do not address unobserved confounding (Basu and Chan 2014).
Another, less popular, class of methods that can address endogenous treatments is latent factor models (Carneiro et al. 2003; Goldberger 1972; Hauser and Goldberger 1971; Heckman et al. 2006; Joreskog and Goldberger 1975). These models use information from multiple outcomes to identify a latent factor that is shared across the outcome and treatment selection equations. The latent factor acts as a proxy for the unobserved confounder; controlling for it in the regressions therefore addresses the endogeneity of treatment selection, much as controlling for observed confounders does. These treatment-effect models can potentially identify treatment effects even in the absence of an exclusion restriction, such as the one provided by an instrumental variable. However, identification often relies on functional form restrictions, and the efficiency of these models relative to an IV estimator is not well known. Whether the precision of causal estimators improves when the IV approach is combined with a latent factor model has also received little study. Even more interesting, and even less studied, are the consequences of using latent factor models when the instrumental variable is contaminated, i.e., when it is not perfectly orthogonal to the error in the outcome equation.
In this paper, we develop the theory of the interaction between latent factor models and contaminated instrumental variable methods and assess the complementarity, if any, between the two estimators in producing consistent estimates of the causal effects of endogenous treatments. We apply these methods to estimate the causal effect of mental illness on work absenteeism and disability, a setting in which this complementarity can best be highlighted.

2. Materials and Methods

2.1. Econometrics of Causal Inference Methods

In this section, we provide a unifying theoretical framework for our alternative model specifications and analytical expressions for the asymptotic bias in each case. We are interested in examining the causal impact of a binary indicator, $D$, on the outcome $Y_j$, for $j = 1, 2, \dots, K$. Henceforth, we drop the subscript $j$ to avoid clutter. For ease of exposition, we assume in what follows that there are no exogenous variables, although the extension to include exogenous variables is straightforward.
The structural model can be written as follows:
$$Y = \alpha_0 + \alpha_D D + \varepsilon \tag{1}$$
where $Y$ denotes the outcome of interest, $D$ is the treatment variable, and $\varepsilon$ is an idiosyncratic error term. Under the assumption $E(D\varepsilon) = 0$, we can obtain an unbiased estimate $\hat{\alpha}_D$ of the effect of $D$ on $Y$ in Equation (1). However, if $E(D\varepsilon) \neq 0$, a naive OLS regression of $Y$ on $D$ will produce a biased treatment effect, with the asymptotic bias given by Bound et al. (1995):
$$\text{Bias}_{OLS} = \frac{\sigma_{D\varepsilon}}{\sigma_D^2} \tag{2}$$
where $\sigma_{D\varepsilon} = \operatorname{cov}(D, \varepsilon)$ and $\sigma_D^2 = \operatorname{var}(D)$.
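A quick numerical check of Equation (2) may help. The sketch below uses a hypothetical data-generating process (not the one used in the simulations of Section 3): a standard-normal confounder `theta` drives both a binary treatment `D` and the error `eps`. Because the OLS slope equals $\operatorname{cov}(D, Y)/\operatorname{var}(D)$, the in-sample gap between the estimate and the true $\alpha_D$ should equal the sample analogue of $\sigma_{D\varepsilon}/\sigma_D^2$ up to floating-point error. The helper name `ols_slope` and all parameter values are illustrative assumptions.

```python
import numpy as np

def ols_slope(y, d):
    """Slope from a regression of y on a constant and d: cov(d, y) / var(d)."""
    return np.cov(d, y)[0, 1] / np.var(d, ddof=1)

rng = np.random.default_rng(0)
n = 100_000
alpha_0, alpha_D = 1.0, 2.0                  # hypothetical true parameters
theta = rng.standard_normal(n)               # unobserved confounder
D = (theta + rng.standard_normal(n) > 0).astype(float)  # binary treatment, partly driven by theta
eps = 1.5 * theta + rng.standard_normal(n)   # error correlated with D through theta
Y = alpha_0 + alpha_D * D + eps

alpha_ols = ols_slope(Y, D)
bias_formula = np.cov(D, eps)[0, 1] / np.var(D, ddof=1)  # sample analogue of Equation (2)
print(f"OLS: {alpha_ols:.3f} (true {alpha_D}); bias: {alpha_ols - alpha_D:.3f}; formula: {bias_formula:.3f}")
```

The bias is positive here because `theta` raises both the probability of treatment and the outcome.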

2.1.1. Instrumental Variable Methods

To mitigate the bias of the naive OLS estimator, one often uses an instrumental variable, $Z$: a variable that is strongly correlated with the endogenous treatment variable $D$ but does not directly affect the outcome variable $Y$. The structural model in this case can be written as follows:
$$D = \beta_0 + \beta_Z Z + \omega \tag{3}$$
$$Y = \alpha_0 + \alpha_D D + \varepsilon \tag{4}$$
In practice, the instrumental variables (IV) estimator is frequently implemented as two-stage least squares (2SLS): in the first stage (Equation (3)), $D$ is regressed on $Z$ to obtain a predicted value of $D$, $\hat{D}_{IV}$, and in the second stage, the outcome variable $Y$ is regressed on $\hat{D}_{IV}$ to obtain the IV estimator. The asymptotic bias of the IV estimator is given by Bound et al. (1995) as follows:
$$\text{Bias}_{IV} = \frac{\sigma_{\hat{D}_{IV}\varepsilon}}{\sigma_{\hat{D}_{IV}}^2} \tag{5}$$
where $\sigma_{\hat{D}_{IV}\varepsilon} = \operatorname{cov}(\hat{D}_{IV}, \varepsilon)$ and $\sigma_{\hat{D}_{IV}}^2 = \operatorname{var}(\hat{D}_{IV}) = \rho_{DZ}^2 \sigma_D^2$.² Here, $\rho_{DZ}^2$ is the square of $\operatorname{corr}(D, Z)$, and $\sigma_D^2$ is as defined earlier. If $Z$ is a valid instrument, then $E(Z\varepsilon) = 0$, implying that $\sigma_{\hat{D}_{IV}\varepsilon} = 0$, and the bias reduces to 0. On the other hand, if $E(Z\varepsilon) \neq 0$, i.e., $Z$ is a contaminated instrument (see Basu and Chan 2014, for more details), then $\sigma_{\hat{D}_{IV}\varepsilon} \neq 0$, and the magnitude of the bias depends on the level of contamination.
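The sketch below, again under a hypothetical DGP with an illustrative contamination level `rho = 0.10` between `Z` and the confounder, implements 2SLS by hand with two `numpy.linalg.lstsq` regressions. It checks two sample identities behind Equation (5): the variance of the first-stage fitted values equals $\rho_{DZ}^2 \sigma_D^2$, and the 2SLS estimate minus the true effect equals $\operatorname{cov}(\hat{D}_{IV}, \varepsilon)/\operatorname{var}(\hat{D}_{IV})$, which is nonzero because $Z$ is contaminated. All parameter values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
alpha_0, alpha_D = 1.0, 2.0    # hypothetical true parameters
rho = 0.10                     # hypothetical contamination: corr(Z, theta)

theta = rng.standard_normal(n)
Z = rho * theta + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # corr(Z, theta) = rho
D = (1.0 + 3.0 * Z + 2.0 * theta + rng.standard_normal(n) > 0).astype(float)
eps = 2.0 * theta + rng.standard_normal(n)   # Z is correlated with eps through theta
Y = alpha_0 + alpha_D * D + eps

# First stage: regress D on a constant and Z; keep the fitted values D_hat.
Xz = np.column_stack([np.ones(n), Z])
b_first, *_ = np.linalg.lstsq(Xz, D, rcond=None)
D_hat = Xz @ b_first

# Second stage: regress Y on a constant and D_hat.
X2 = np.column_stack([np.ones(n), D_hat])
b_second, *_ = np.linalg.lstsq(X2, Y, rcond=None)
alpha_2sls = b_second[1]

var_D_hat = np.var(D_hat)
rho_DZ = np.corrcoef(D, Z)[0, 1]
bias_formula = np.cov(D_hat, eps, bias=True)[0, 1] / var_D_hat   # sample analogue of Equation (5)

print(f"var(D_hat) = {var_D_hat:.4f}; rho_DZ^2 * var(D) = {rho_DZ**2 * np.var(D):.4f}")
print(f"2SLS bias = {alpha_2sls - alpha_D:.4f}; formula = {bias_formula:.4f}")
```

Note that the standard errors from this manual second stage would be incorrect; only the point estimate is used here.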

2.1.2. Latent Factor Methods

Next, we consider a model with a single latent factor ($\theta$) that is shared between the outcome equations and the treatment equation. This model provides an alternative to the IV method for obtaining causal treatment effects, as long as the scale of the latent factor and the factor loadings can be estimated through multiple outcome measurements. The shared latent factor (SLF) model can be written as follows:
$$D = \beta_0 + \omega; \quad \omega = \beta_\theta \theta + v \tag{6}$$
$$Y = \alpha_0 + \alpha_D D + \varepsilon; \quad \varepsilon = \alpha_\theta \theta + u \tag{7}$$
where $Y$ now denotes a $(K \times 1)$ vector of outcomes, $\theta$ is the latent factor (a scalar), and $v$ and $u$ are stochastic error terms.³
Since $\theta$ is a latent variable and does not have a natural scale of measurement, identifying the model requires that one of the coefficients (factor loadings) of $\theta$ in Equations (6) or (7) be normalized to a constant value. Without loss of generality, we set $\beta_\theta = 1$ in this framework. The asymptotic bias of the SLF model is given by the following:
$$\text{Bias}_{SLF} = \frac{\sigma_{\hat{D}_{SLF}\varepsilon \mid \theta}}{\sigma_{\hat{D}_{SLF}}^2} = \frac{\sigma_{\hat{D}_{SLF}u}}{\sigma_{\hat{D}_{SLF}}^2} \tag{8}$$
where $\hat{D}_{SLF}$ denotes the predicted value of $D$, which can be obtained by regressing $D$ on the predicted value (factor score) of $\theta$. The SLF estimator is unbiased if $\omega \mid \theta \perp \varepsilon \mid \theta$, where $\perp$ denotes statistical independence.
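To illustrate the intuition that the latent factor acts as a proxy for the unobserved confounder, the sketch below compares a naive regression with an infeasible oracle regression that controls for the true `theta`. This is a deliberate simplification: the SLF estimator itself replaces `theta` with an estimated factor score obtained from multiple outcomes (by maximum likelihood in the empirical application below), which is not implemented here. The DGP, the helper `ols`, and the parameter values are hypothetical.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients of y on a constant and the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(2)
n = 100_000
alpha_D, alpha_theta = 2.0, 2.0    # hypothetical true parameters
theta = rng.standard_normal(n)
D = (theta + rng.standard_normal(n) > 0).astype(float)   # loading on theta normalized to 1
Y = 1.0 + alpha_D * D + alpha_theta * theta + rng.standard_normal(n)

naive = ols(Y, D)[1]
oracle = ols(Y, D, theta)[1]   # infeasible: uses the true theta instead of an estimated factor score
print(f"naive: {naive:.3f}; controlling for theta: {oracle:.3f}; true: {alpha_D}")
```

Once `theta` is held fixed, the remaining error `u` is independent of the treatment, which is the condition $\omega \mid \theta \perp \varepsilon \mid \theta$ in this toy setting.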

2.1.3. Identification of the SLF Model

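The SLF model with an endogenous treatment equation and four outcome equations can be written as follows:
$$\begin{aligned}
D &= \theta + v \\
y_1 &= \alpha_{1D} D + \alpha_{1\theta}\theta + u_1 \\
y_2 &= \alpha_{2D} D + \alpha_{2\theta}\theta + u_2 \\
y_3 &= \alpha_{3D} D + \alpha_{3\theta}\theta + u_3 \\
y_4 &= \alpha_{4D} D + \alpha_{4\theta}\theta + u_4
\end{aligned}$$
Parameters to be estimated: $\alpha_{1D}, \alpha_{2D}, \alpha_{3D}, \alpha_{4D}$; $\alpha_{1\theta}, \alpha_{2\theta}, \alpha_{3\theta}, \alpha_{4\theta}$; $\sigma_\theta^2$ ($n = 9$).
Denote $\operatorname{var}(D) = \sigma_D^2$ and $\operatorname{var}(\theta) = \sigma_\theta^2$.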
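Now, with $\theta$, $v$, and $u_1, \dots, u_4$ mutually uncorrelated, so that $\operatorname{cov}(D, \theta) = \sigma_\theta^2$:
$$\begin{aligned}
\operatorname{cov}(y_1, y_2) &= \operatorname{cov}(\alpha_{1D} D + \alpha_{1\theta}\theta + u_1,\; \alpha_{2D} D + \alpha_{2\theta}\theta + u_2) \\
&= \alpha_{1D}\alpha_{2D}\sigma_D^2 + (\alpha_{1D}\alpha_{2\theta} + \alpha_{1\theta}\alpha_{2D})\operatorname{cov}(D, \theta) + \alpha_{1\theta}\alpha_{2\theta}\sigma_\theta^2 \\
&= \alpha_{1D}\alpha_{2D}\sigma_D^2 + (\alpha_{1D}\alpha_{2\theta} + \alpha_{1\theta}\alpha_{2D} + \alpha_{1\theta}\alpha_{2\theta})\sigma_\theta^2
\end{aligned} \tag{9}$$
$$\operatorname{cov}(y_1, y_3) = \alpha_{1D}\alpha_{3D}\sigma_D^2 + (\alpha_{1D}\alpha_{3\theta} + \alpha_{1\theta}\alpha_{3D} + \alpha_{1\theta}\alpha_{3\theta})\sigma_\theta^2 \tag{10}$$
$$\operatorname{cov}(y_1, y_4) = \alpha_{1D}\alpha_{4D}\sigma_D^2 + (\alpha_{1D}\alpha_{4\theta} + \alpha_{1\theta}\alpha_{4D} + \alpha_{1\theta}\alpha_{4\theta})\sigma_\theta^2 \tag{11}$$
$$\operatorname{cov}(y_2, y_3) = \alpha_{2D}\alpha_{3D}\sigma_D^2 + (\alpha_{2D}\alpha_{3\theta} + \alpha_{2\theta}\alpha_{3D} + \alpha_{2\theta}\alpha_{3\theta})\sigma_\theta^2 \tag{12}$$
$$\operatorname{cov}(y_2, y_4) = \alpha_{2D}\alpha_{4D}\sigma_D^2 + (\alpha_{2D}\alpha_{4\theta} + \alpha_{2\theta}\alpha_{4D} + \alpha_{2\theta}\alpha_{4\theta})\sigma_\theta^2 \tag{13}$$
$$\operatorname{cov}(y_3, y_4) = \alpha_{3D}\alpha_{4D}\sigma_D^2 + (\alpha_{3D}\alpha_{4\theta} + \alpha_{3\theta}\alpha_{4D} + \alpha_{3\theta}\alpha_{4\theta})\sigma_\theta^2 \tag{14}$$
$$\begin{aligned}
\operatorname{cov}(D, y_1) &= \operatorname{cov}(\theta + v,\; \alpha_{1D} D + \alpha_{1\theta}\theta + u_1) = \alpha_{1D}\operatorname{cov}(\theta, D) + \alpha_{1\theta}\sigma_\theta^2 \\
&= (\alpha_{1D} + \alpha_{1\theta})\sigma_\theta^2
\end{aligned} \tag{15}$$
$$\operatorname{cov}(D, y_2) = (\alpha_{2D} + \alpha_{2\theta})\sigma_\theta^2 \tag{16}$$
$$\operatorname{cov}(D, y_3) = (\alpha_{3D} + \alpha_{3\theta})\sigma_\theta^2 \tag{17}$$
$$\operatorname{cov}(D, y_4) = (\alpha_{4D} + \alpha_{4\theta})\sigma_\theta^2 \tag{18}$$
Number of estimating equations = 10. The ten Equations (9)–(18) can be solved to obtain closed-form solutions for the nine unknown parameters (including the four factor loadings $\alpha_{1\theta}, \dots, \alpha_{4\theta}$) as functions of the observed covariances $\operatorname{cov}(\cdot, \cdot)$.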
A necessary condition for identification of the SLF model is that there be at least four outcomes. Consider the case with three outcomes. Here, the number of parameters to be estimated is 7 ($\alpha_{1D}, \alpha_{2D}, \alpha_{3D}$; $\alpha_{1\theta}, \alpha_{2\theta}, \alpha_{3\theta}$; $\sigma_\theta^2$), while the number of observed covariances is 6 ($\operatorname{cov}(y_1, y_2)$, $\operatorname{cov}(y_1, y_3)$, $\operatorname{cov}(y_2, y_3)$, $\operatorname{cov}(D, y_1)$, $\operatorname{cov}(D, y_2)$, $\operatorname{cov}(D, y_3)$). Similarly, for fewer than three outcomes, it can be shown that the number of unknown parameters exceeds the number of estimating equations; therefore, the structural parameters of the SLF model cannot be identified.
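The counting argument generalizes. With $K$ outcomes, the SLF model has $2K + 1$ unknowns ($K$ treatment effects, $K$ factor loadings, and $\sigma_\theta^2$), while the observed covariances number $K(K-1)/2$ between outcomes plus $K$ between $D$ and each outcome, i.e., $K(K+1)/2$ in total. The small function below (a hypothetical helper, not part of the paper's code) tabulates this order condition; it is necessary, not sufficient, for identification.

```python
def slf_order_condition(k):
    """Count unknowns vs. covariance equations for the SLF model with k outcomes.

    Unknowns: k treatment effects + k factor loadings + var(theta) = 2k + 1.
    Equations: k(k - 1)/2 outcome-outcome covariances + k treatment-outcome covariances.
    Returns (n_params, n_equations, order_condition_met).
    """
    n_params = 2 * k + 1
    n_equations = k * (k - 1) // 2 + k
    return n_params, n_equations, n_equations >= n_params

for k in range(1, 7):
    p, e, ok = slf_order_condition(k)
    print(f"K={k}: parameters={p}, equations={e}, order condition met: {ok}")
```

The loop reports that the order condition first holds at $K = 4$ (9 parameters, 10 equations), matching the text.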

2.1.4. Combined IV and Latent Factor Methods

Finally, we consider a model (SLF + IV) that combines features of the IV model and the SLF model and can be used for causal inference when a suitable instrumental variable is available and the scale of the latent factor and the factor loadings can be estimated via multiple outcomes. The SLF + IV model can be represented as follows:
$$D = \beta_0 + \beta_Z Z + \omega; \quad \omega = \beta_\theta \theta + v \tag{19}$$
$$Y = \alpha_0 + \alpha_D D + \varepsilon; \quad \varepsilon = \alpha_\theta \theta + u \tag{20}$$
As noted earlier, we set $\beta_\theta = 1$ to fix the scale of measurement of the latent factor. The asymptotic bias of the SLF + IV model is given by the following:
$$\text{Bias}_{SLF+IV} = \frac{\sigma_{\hat{D}_{SLF+IV}\varepsilon \mid \theta}}{\sigma_{\hat{D}_{SLF+IV}}^2} = \frac{\sigma_{\hat{D}_{SLF+IV}u}}{\sigma_{\hat{D}_{SLF+IV}}^2} \tag{21}$$
where $\hat{D}_{SLF+IV}$ denotes the predicted value of $D$. The SLF + IV estimator is unbiased if $\omega \mid \theta \perp \varepsilon \mid \theta$.
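Lemma 1.
$$|\text{Bias}_{SLF+IV}| \le |\text{Bias}_{IV}| \quad \forall\, \rho_{Zu} \in (-1, 0) \cup (0, 1).$$
Proof of Lemma 1.
$$\rho_{Z\varepsilon}^2 = \frac{\sigma_{\hat{D}_{IV}\varepsilon}^2}{\sigma_\varepsilon^2\,\sigma_{\hat{D}_{IV}}^2} \;\Rightarrow\; \sigma_{\hat{D}_{IV}\varepsilon}^2 = \rho_{Z\varepsilon}^2\,\sigma_\varepsilon^2\,\sigma_{\hat{D}_{IV}}^2$$
$$\text{and} \quad \rho_{Zu}^2 = \frac{\sigma_{\hat{D}_{SLF+IV}u}^2}{\sigma_u^2\,\sigma_{\hat{D}_{SLF+IV}}^2} \;\Rightarrow\; \sigma_{\hat{D}_{SLF+IV}u}^2 = \rho_{Zu}^2\,\sigma_u^2\,\sigma_{\hat{D}_{SLF+IV}}^2$$
Therefore, substituting these expressions into the bias formulas for the IV and SLF + IV estimators,
$$\left(\frac{\text{Bias}_{IV}}{\text{Bias}_{SLF+IV}}\right)^2 = \left(\frac{\sigma_{\hat{D}_{IV}\varepsilon}/\sigma_{\hat{D}_{IV}}^2}{\sigma_{\hat{D}_{SLF+IV}u}/\sigma_{\hat{D}_{SLF+IV}}^2}\right)^2 = \left(\frac{\rho_{Z\varepsilon}}{\rho_{Zu}}\right)^2 \left(\frac{\sigma_\varepsilon}{\sigma_u}\right)^2 \left(\frac{\sigma_{\hat{D}_{SLF+IV}}}{\sigma_{\hat{D}_{IV}}}\right)^2$$
Now, by construction, $\sigma_{\hat{D}_{SLF+IV}}/\sigma_{\hat{D}_{IV}} \ge 1$, since the latent factor explains additional variation in $D$. Because the latent factor also serves as a proxy for the omitted variables, it reduces the contamination between $Z$ and the error term; that is, $(\rho_{Z\varepsilon}/\rho_{Zu})^2 \ge 1$. Lastly, $\sigma_\varepsilon \ge \sigma_u$ by construction, using Equation (20), since $\operatorname{var}(\varepsilon) = \operatorname{var}(\alpha_\theta\theta + u) = \alpha_\theta^2\operatorname{var}(\theta) + \operatorname{var}(u) \ge \operatorname{var}(u)$. Therefore, $\sigma_\varepsilon/\sigma_u \ge 1$.
Consequently, since each factor on the right-hand side is at least 1,
$$|\text{Bias}_{SLF+IV}| \le |\text{Bias}_{IV}|.$$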
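The mechanism invoked in the proof can be illustrated numerically in the limiting case where all of the contamination in `Z` runs through the confounder. The sketch below compares plain 2SLS with a 2SLS regression that also includes the true `theta` as an exogenous control; conditioning on `theta` removes the contamination. This is an infeasible oracle, not the SLF + IV estimator (which uses an estimated latent factor), and it sets $\rho_{Zu} = 0$, the boundary case excluded from the lemma. The helper `tsls`, the DGP, and the contamination level `rho = 0.10` are hypothetical.

```python
import numpy as np

def tsls(y, d, z, controls=()):
    """Just-identified 2SLS coefficient on d, instrumenting d with z.

    `controls` are included exogenous regressors; they and a constant enter both stages.
    """
    W = np.column_stack([np.ones(len(y)), *controls])
    first = np.column_stack([W, z])
    gamma, *_ = np.linalg.lstsq(first, d, rcond=None)
    d_hat = first @ gamma                       # first-stage fitted values
    second = np.column_stack([W, d_hat])
    coef, *_ = np.linalg.lstsq(second, y, rcond=None)
    return coef[-1]                             # coefficient on d_hat

rng = np.random.default_rng(3)
n = 200_000
alpha_D, rho = 2.0, 0.10                        # hypothetical effect and contamination level
theta = rng.standard_normal(n)
Z = rho * theta + np.sqrt(1 - rho**2) * rng.standard_normal(n)
D = (1.0 + 3.0 * Z + theta + rng.standard_normal(n) > 0).astype(float)  # beta_theta = 1
Y = 1.0 + alpha_D * D + 2.0 * theta + rng.standard_normal(n)

iv_only = tsls(Y, D, Z)
iv_with_theta = tsls(Y, D, Z, controls=(theta,))   # infeasible oracle: true theta as a control
print(f"2SLS: {iv_only:.3f}; 2SLS + true theta: {iv_with_theta:.3f}; true: {alpha_D}")
```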

3. Simulations

3.1. Case I: Uncontaminated IV

In the baseline simulations, we consider continuous outcomes ($Y_j$), with the number of outcomes varying from 4 to 6; a binary treatment indicator ($D$); an observed continuous control variable ($X$); an unobserved confounder ($\theta$); and a continuous instrumental variable ($Z$) that is orthogonal to the unobserved confounder $\theta$, i.e., uncontaminated (Basu and Chan 2014). The data-generating process (DGP) for the treatment variable is as follows:
$$D^* = \beta_0 + \beta_X X + \beta_Z Z + \beta_\theta \theta + v$$
$$D = 1 \text{ if } D^* > 0, \text{ and } D = 0 \text{ otherwise.}$$
Here, $\beta_0 = 1$, $\beta_X = 2$, $\beta_Z = 3$, and $\beta_\theta = 2$, and $X$, $Z$, $\theta$, and $v$ are independently and identically distributed (IID) normal with mean 0 and variance 1.
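The DGP for the outcomes is defined below:
$$\mathbf{Y}_j = \boldsymbol{\alpha}_0^{j} + \boldsymbol{\alpha}_X^{j} X + \boldsymbol{\alpha}_D^{j} D + \boldsymbol{\alpha}_\theta^{j} \theta + \mathbf{u}_j, \quad j = 4, 5, 6$$
where $\mathbf{Y}_j = (Y_1, Y_2, \dots, Y_j)$;
$$\begin{aligned}
&\boldsymbol{\alpha}_0^{4} = (2, 1, 1, 1), \quad \boldsymbol{\alpha}_X^{4} = (2, 1, 1, 2), \quad \boldsymbol{\alpha}_D^{4} = (2, 3, 1, 1), \quad \boldsymbol{\alpha}_\theta^{4} = (2, 2, 2, 2); \\
&\boldsymbol{\alpha}_0^{5} = (2, 1, 1, 1, 2), \quad \boldsymbol{\alpha}_X^{5} = (2, 1, 1, 2, 1), \quad \boldsymbol{\alpha}_D^{5} = (2, 3, 1, 1, 2), \quad \boldsymbol{\alpha}_\theta^{5} = (2, 2, 2, 2, 2); \\
&\boldsymbol{\alpha}_0^{6} = (2, 1, 1, 1, 2, 3), \quad \boldsymbol{\alpha}_X^{6} = (2, 1, 1, 2, 1, 3), \quad \boldsymbol{\alpha}_D^{6} = (2, 3, 1, 1, 2, 1), \quad \boldsymbol{\alpha}_\theta^{6} = (2, 2, 2, 2, 2, 2);
\end{aligned}$$
and $\mathbf{u}_j = (u_1, u_2, \dots, u_j)$, with each $u$ distributed IID normal with mean 0 and variance 1.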
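A minimal numpy sketch of the Case I data-generating process, using the parameter values listed above, follows. The function name `simulate_case1`, the dictionary layout, and the seed are illustrative choices rather than the authors' code; the function also returns `theta` so that the draws can be checked, even though it is unobserved in estimation.

```python
import numpy as np

# Parameter values from Section 3.1 (Case I: uncontaminated IV).
BETA = {"b0": 1.0, "bX": 2.0, "bZ": 3.0, "btheta": 2.0}
ALPHA = {
    4: {"a0": [2, 1, 1, 1], "aX": [2, 1, 1, 2], "aD": [2, 3, 1, 1], "atheta": [2] * 4},
    5: {"a0": [2, 1, 1, 1, 2], "aX": [2, 1, 1, 2, 1], "aD": [2, 3, 1, 1, 2], "atheta": [2] * 5},
    6: {"a0": [2, 1, 1, 1, 2, 3], "aX": [2, 1, 1, 2, 1, 3], "aD": [2, 3, 1, 1, 2, 1], "atheta": [2] * 6},
}

def simulate_case1(n, n_outcomes, rng):
    """Draw one dataset of size n with n_outcomes (4, 5, or 6) outcomes from the Case I DGP."""
    X, Z, theta, v = rng.standard_normal((4, n))        # IID N(0, 1) draws
    d_star = BETA["b0"] + BETA["bX"] * X + BETA["bZ"] * Z + BETA["btheta"] * theta + v
    D = (d_star > 0).astype(float)                       # binary treatment
    p = {k: np.asarray(val, dtype=float) for k, val in ALPHA[n_outcomes].items()}
    U = rng.standard_normal((n, n_outcomes))             # IID N(0, 1) outcome errors
    # Column j of Y is a0_j + aX_j * X + aD_j * D + atheta_j * theta + u_j.
    Y = p["a0"] + np.outer(X, p["aX"]) + np.outer(D, p["aD"]) + np.outer(theta, p["atheta"]) + U
    return X, Z, D, Y, theta

rng = np.random.default_rng(2021)
X, Z, D, Y, theta = simulate_case1(2000, 4, rng)
print(Y.shape, round(D.mean(), 3))
```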

3.2. Case II: Contaminated IV

In the next set of simulations, we consider a framework similar to the baseline case but now draw $Z$ and $\theta$ from a bivariate normal distribution,
$$\begin{pmatrix} Z \\ \theta \end{pmatrix} \sim N\left[\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho_{Z\theta} \\ \rho_{Z\theta} & 1 \end{pmatrix}\right],$$
with $\rho_{Z\theta}$ (the degree of contamination of $Z$ with $\theta$) taking values of 0 (baseline, no contamination), 0.010, 0.015, 0.025, 0.035, 0.050, 0.075, and 0.10.
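One way to draw the contaminated pair $(Z, \theta)$ is via the Cholesky factor of the 2×2 correlation matrix; numpy's `Generator.multivariate_normal` would work equally well. The helper `draw_z_theta` below is a hypothetical sketch that loops over the contamination levels listed above and reports the sample correlation for each.

```python
import numpy as np

def draw_z_theta(n, rho_z_theta, rng):
    """Draw n pairs (Z, theta) from a standard bivariate normal with correlation rho_z_theta."""
    corr = np.array([[1.0, rho_z_theta], [rho_z_theta, 1.0]])
    L = np.linalg.cholesky(corr)                 # corr = L @ L.T
    draws = rng.standard_normal((n, 2)) @ L.T    # each row has covariance L @ L.T
    return draws[:, 0], draws[:, 1]

rng = np.random.default_rng(7)
for rho in (0.0, 0.010, 0.015, 0.025, 0.035, 0.050, 0.075, 0.10):
    Z, theta = draw_z_theta(200_000, rho, rng)
    print(f"target {rho:.3f}  sample corr {np.corrcoef(Z, theta)[0, 1]:.4f}")
```

At the smallest contamination levels, the sampling error of the correlation (about $1/\sqrt{n}$) is of the same order as $\rho_{Z\theta}$ itself unless $n$ is large.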

3.3. Estimators

We compare estimators of the endogenous treatment effects in terms of bias, efficiency, and coverage probability by performing an extensive set of Monte Carlo simulations. The simulations were conducted over 1000 replications of datasets, each of size 2000. The alternative estimators include the following: (i) naive OLS, (ii) instrumental variables (2SLS), (iii) shared latent factor without an IV (SLF), and (iv) shared latent factor with an IV (SLF + IV). In the contaminated IV case, we do not perform the simulations using the SLF estimator, since variations in the degree of contamination have no bearing on the SLF treatment-effect estimates in this setup.⁴
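The sketch below outlines the Monte Carlo loop for the OLS and 2SLS arms only, for the first outcome of the DGP above ($\alpha_0 = 2$, $\alpha_X = 2$, $\alpha_D = 2$, $\alpha_\theta = 2$). The SLF and SLF + IV estimators require maximum likelihood estimation of the latent factor model and are not implemented here. Conventional (homoskedastic) standard errors, the 1.96 normal critical value for coverage, percentage bias defined as $100 \times (\text{mean estimate} - \text{true value})/\text{true value}$, and all function names are assumptions of this sketch. The defaults mirror the 1000 replications of size 2000 described above, while the usage line runs fewer replications for speed.

```python
import numpy as np

def simulate(n, rho_z_theta, rng):
    """Draw X, Z, D and the first outcome Y_1 from the simulation DGP."""
    X = rng.standard_normal(n)
    Z = rng.standard_normal(n)
    # corr(Z, theta) = rho_z_theta; both have unit variance.
    theta = rho_z_theta * Z + np.sqrt(1 - rho_z_theta**2) * rng.standard_normal(n)
    D = (1 + 2 * X + 3 * Z + 2 * theta + rng.standard_normal(n) > 0).astype(float)
    Y = 2 + 2 * X + 2 * D + 2 * theta + rng.standard_normal(n)
    return X, Z, D, Y

def ols_coef_se(y, R):
    """OLS coefficients with conventional (homoskedastic) standard errors."""
    coef, *_ = np.linalg.lstsq(R, y, rcond=None)
    resid = y - R @ coef
    sigma2 = resid @ resid / (len(y) - R.shape[1])
    return coef, np.sqrt(np.diag(sigma2 * np.linalg.inv(R.T @ R)))

def tsls_coef_se(y, R, W):
    """Just-identified 2SLS of y on regressors R using instruments W.

    Residuals for the standard errors use R itself, not the first-stage fits.
    """
    gamma, *_ = np.linalg.lstsq(W, R, rcond=None)
    R_hat = W @ gamma
    coef, *_ = np.linalg.lstsq(R_hat, y, rcond=None)
    resid = y - R @ coef
    sigma2 = resid @ resid / (len(y) - R.shape[1])
    return coef, np.sqrt(np.diag(sigma2 * np.linalg.inv(R_hat.T @ R_hat)))

def monte_carlo(n_reps=1000, n=2000, rho_z_theta=0.0, true_effect=2.0, seed=0):
    """Percentage bias, mean standard error, and 95% coverage for OLS and 2SLS."""
    rng = np.random.default_rng(seed)
    draws = {"OLS": [], "2SLS": []}
    for _ in range(n_reps):
        X, Z, D, Y = simulate(n, rho_z_theta, rng)
        ones = np.ones(n)
        R = np.column_stack([ones, D, X])   # regressors; D is column 1
        W = np.column_stack([ones, Z, X])   # instruments; Z replaces D
        for name, (coef, se) in (("OLS", ols_coef_se(Y, R)), ("2SLS", tsls_coef_se(Y, R, W))):
            draws[name].append((coef[1], se[1]))
    summary = {}
    for name, pairs in draws.items():
        est, se = np.array(pairs).T
        summary[name] = {
            "pct_bias": 100 * (est.mean() - true_effect) / true_effect,
            "mean_se": se.mean(),
            "coverage": np.mean(np.abs(est - true_effect) <= 1.96 * se),
        }
    return summary

for rho in (0.0, 0.05, 0.10):
    print(rho, monte_carlo(n_reps=200, rho_z_theta=rho))
```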

3.4. Results

In Table 1, we present results from the baseline simulations (uncontaminated IV) for the alternative estimators. As expected, the naive OLS estimates are severely biased, whereas the IV estimates are very close to their true values (coverage probability ≈ 0.95) for each of the outcomes. The bias of the SLF estimator is much lower than that of the naive OLS estimator (coverage probability ≈ 0.85), although it is somewhat higher than that of the 2SLS estimator. For the SLF model, the standard errors are also much larger than those of the 2SLS estimator for all outcomes. The SLF + IV estimator performs quite well: the treatment effects are very close to their true values (coverage probability ≈ 0.95), and the standard errors are lower than those of the 2SLS estimator for all outcomes. Our results indicate that an SLF estimator produces much less bias than a naive OLS estimator but may not be as efficient as the 2SLS estimator. Such an estimator may be preferred for causal inference in the absence of an IV, especially with large sample sizes. In addition, we find that an SLF + IV estimator has bias and coverage probability comparable to those of a 2SLS estimator but may be more efficient. In the presence of multiple outcome measurements, investigators may therefore choose such an estimator.
Next, we consider the case of a contaminated IV and display the simulation results in Table 2, Table 3 and Table 4 for four to six outcomes. At all levels of contamination, the 2SLS and SLF + IV estimators have substantially lower percentage bias than the naive OLS estimator. Comparing the two causal estimators, we find that, even at low degrees of contamination, the SLF + IV estimator outperforms the 2SLS estimator in terms of percentage bias and efficiency, with the percentage bias of the 2SLS estimator almost twice as large as that of the SLF + IV estimator, although both estimators have similar coverage probabilities (>0.90) under the parameterizations considered in the Monte Carlo simulations. At higher levels of contamination, the SLF + IV estimator dominates the 2SLS estimator in terms of bias, efficiency, and coverage probability; however, it should be borne in mind that the coverage probabilities of both models deteriorate substantially when $\rho_{Z\theta} = 0.10$. These findings suggest that, in the presence of multiple outcomes and a strong but contaminated IV, an SLF + IV estimator may be preferred.

4. Empirical Example: Effect of Mental Illness on Work Absenteeism and Disability

4.1. Introduction

Psychiatric disorders are widely prevalent: in 2018, an estimated 47.6 million adults (aged 18 and above), or 19.1% of adults, had any mental disorder (excluding developmental disorders and substance use disorders) in the past year (Substance Abuse and Mental Health Services Administration 2019). The debilitating impact of mental illness on work outcomes and disability is well established (Jarl et al. 2020; Diby et al. 2020; Banerjee et al. 2014 and 2017; Chatterji et al. 2007; Chatterji et al. 2011; Doshi et al. 2008; Ettner et al. 1997; OECD 2012). Psychiatric symptoms such as insomnia/hypersomnia, indecisiveness, and fatigue (Banerjee et al. 2014) can impair an individual's ability to obtain and sustain employment and can also lower on-the-job productivity (Chatterji et al. 2011). In addition, employers may be unable or unwilling to make suitable arrangements for employees with mental health needs, thereby worsening their mental health and, in turn, adversely affecting their work and/or raising their level of disability in one or more domains.
The primary challenge in the literature to assess the impact of mental illness on labor market outcomes and disability has been to obtain relatively “clean” causal estimates of the treatment effect, i.e., whether and to what extent mental illness is a causal factor for poor work outcomes and disability. Mental illness may be endogenous in a model of work outcomes/disability if (1) the mental illness variable is measured with error, (2) there exists a reverse causal pathway, i.e., poor employment outcomes/disability lead to mental health problems, and (3) there are unobserved (to the analyst/researcher) confounders that are correlated with treatment assignment and the outcomes. The first source of endogeneity can be addressed by using a latent mental health construct, whereby the latent variable is generated from varied psychiatric symptoms (multiple indicators), using a multiple indicator model (Joreskog and Goldberger 1975). The second source of confounding can be mitigated by using lagged value(s) of the mental health variable. The third source of confounding is what we address in this paper.
In an ideal situation, a well-designed randomized controlled trial (RCT), adequately powered for its outcomes, would randomly assign individuals to psychiatric disorders (or not) and subsequently assess the impact of mental illness on the outcomes. Clearly, however, such a trial would be unethical; therefore, one must rely on quasi-experimental methods to make any claim about the causal effect of psychiatric illness. As noted earlier, instrumental variables can provide an exogenous source of variation in the treatment variable (IVs are correlated with the treatment variable but do not affect the outcome variable directly) and can be used to identify the treatment effect. Good IVs are difficult to find and to defend on conceptual and theoretical grounds. Chatterji et al. (2011) summarize the identifying instruments that have been used in the context of mental health and labor market outcomes: parental history of psychiatric problems (Chatterji et al. 2011; Ettner et al. 1997; Marcotte and Wilcox-Gök 2003), early onset of psychiatric disorders (Banerjee et al. 2017; Chatterji et al. 2011; Ettner et al. 1997), and religiosity (MacDonald and Shields 2004), to name a few. Although each of these instruments is correlated with mental illness, it is hard to make a case that they have no direct impact on work. For example, early onset of disorders may be correlated with individual personality traits or characteristics, such as indolence, that may be associated with one's ability to obtain or maintain employment. Similarly, religiosity may be an imperfect IV, since participation in religious services can aid social networking, which, in turn, can foster employment opportunities. To the extent that these instruments have only a small degree of contamination, the IV method would be preferred (in terms of lower mean squared error) to the naive OLS estimator (Basu and Chan 2014).
However, if the instruments are moderately or severely contaminated, one would have to rely on the OLS estimates, which are themselves often substantially biased, to make a causal claim.
The methods proposed in this paper, which use a shared latent factor framework, can be used to obtain "improved" causal estimates of the effect of mental illness on work absenteeism and disability among working individuals under both circumstances: (1) when no IV is available and (2) when only a contaminated IV is available.

4.2. Data

We use data from the National Comorbidity Survey Replication (NCS-R) (Kessler and Merikangas 2004), which is a part of the Collaborative Psychiatric Epidemiology Surveys (CPES)—collected by the University of Michigan Survey Research Center. Data collected for the NCS-R are based on a multi-stage area probability sample. The NCS-R is a nationally representative household survey of the non-institutionalized, English-speaking population of people who are 18 years and older. The survey, conducted between February 2001 and April 2003, comprised two parts—Part I included a core diagnostic assessment, with a sample size of 9282, and Part II was administered to all the respondents from Part I of the survey who met lifetime criteria for any disorder, as well as a probability sample of new respondents (N = 5692).
The dependent variables are measures of labor-market outcome and disability for employed individuals: (a) number of full days of work missed in the last 30 days, (b) mobility impairment score, (c) cognitive impairment score, (d) social interaction interference score, and (e) role impairment score. The labor market outcome (a) is generated from the stem question regarding the number of full days of work missed in the last 30 days. The measures of disability on the domains of mobility, cognition, social interaction interference, and role impairment are based on the World Health Organization Disability Assessment Schedule (WHO DAS II) (Chwastiak and Von Korff 2003) and are on a continuous scale of 0 to 100, with 0 indicating no disability, and 100 indicating full disability.
The treatment variable is a binary indicator for meeting diagnostic criteria for major depressive episode (MDE) or generalized anxiety disorder (GAD) in the past 12 months. The diagnosis of the aforementioned psychiatric disorders is based on the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) (American Psychiatric Association 1994).
The instrumental variable for mental illness is parent/parental figure’s experience of a period of sadness for at least two weeks or a period of constant anxiety/nervousness for at least one month during most of the respondent’s childhood (see Chatterji et al. 2011; Ettner et al. 1997, for empirical examples in a similar context).
The control variables comprise age (25–34, 35–44, 45–54, and 55–64; reference group 18–24), race/ethnicity (Asian, Latino, African American, with non-Latino Whites as the reference category), marital status (married/cohabiting, widowed/divorced/separated, with single as the reference), education (12 years, 13–15 years, 16 or more years, with less than 12 years as the reference category), chronic physical conditions (arthritis/rheumatism, stroke, heart attack, diabetes, ulcer, and cancer at any point during their lifetime), and region of residence (Midwest, South, and West, with Northeast as the baseline).
Our preliminary sample comprised respondents from Part II of the NCS-R (N = 5692). We restricted the analysis to individuals who were currently employed (N = 3766) and used list-wise deletion for missing covariates: arthritis/rheumatism (N = 3; 0.08%), heart attack (N = 1; 0.03%), diabetes (N = 1; 0.03%), and ulcer (N = 5; 0.13%), yielding a final analytic sample of 3756 individuals.

4.3. Methods

The summary statistics are presented for the overall sample and stratified by mental health status. We present means and standard deviations for continuous variables and percentages for categorical variables. To test for statistical differences between those with and those without mental illness, we used t-tests for continuous variables and chi-squared tests for categorical variables. To adjust for the complex survey design, we used the appropriate survey weights.
In our baseline model with an OLS estimator, we used a linear specification to model the labor market outcome and the measures of mobility, cognitive, social interaction, and role impairment as functions of the endogenous mental illness treatment indicator and other covariates hypothesized to affect the outcomes. In the next specification, we use the 2SLS estimator with a history of parental psychiatric problems as an instrument for the endogenous mental illness treatment variable. To the extent that the instrument is (moderately or severely) contaminated, for example, if genetic traits that predispose an individual to higher levels of disability are also correlated with parental mental health, one would expect the treatment effect to be biased. Next, we estimate the SLF model, which does not rely on any instrumental variable to identify the structural parameters. The bias of the SLF estimator is expected to be lower than that of the naive OLS estimator but may be larger or smaller than that of the 2SLS and SLF + IV estimators, depending on the degree of contamination of the instrumental variable. Finally, we estimate the SLF + IV model, which incorporates elements of both the SLF and 2SLS estimators. The structure of and details about the SLF and SLF + IV estimators were presented earlier, in Section 2 and Section 3. As mentioned earlier, the bias of the SLF + IV estimator is expected to be lower than that of the naive OLS and 2SLS estimators but may be greater or smaller than that of the SLF estimator, depending on the level of contamination of the instrument. We estimated both the SLF and SLF + IV models using Stata 15 and its "gsem" command.⁵ The outcomes were modeled using a linear specification, whereas the endogenous treatment dummy was modeled using a probit specification.
A maximum likelihood estimation approach was used for the shared latent factor models, both with and without the instrumental variable.

4.4. Results

4.4.1. Weighted Summary Statistics

To describe the underlying features of our sample, we present weighted descriptive statistics in Table 5. The prevalence of mental illness in our sample was 9.93%. Individuals with mental illness had significantly more days of work absence than those without a diagnosed mental health condition (mean = 2.02 vs. 1.09; p value < 0.001). Similarly, disability scores were significantly higher among individuals with mental illness than among those without a mental illness diagnosis in each domain of disability: mobility impairment (mean = 4.22 vs. 1.61; p value < 0.001), cognitive impairment (mean = 2.87 vs. 0.35; p value < 0.001), social interaction interference (mean = 1.78 vs. 0.16; p value < 0.001), and role impairment (mean = 9.55 vs. 2.91; p value < 0.001). The prevalence of sustained parental psychiatric illness during the respondent's childhood was significantly higher among those with mental illness than among those without mental illness (53.82% vs. 27.57%; p value < 0.001). Nearly half of the individuals in our sample were between 35 and 54 years old, with 26.27% in the 35–44 age group and 23.62% in the 45–54 group. The prevalence of arthritis/rheumatism was 20.3% in our sample and did not differ significantly between individuals with and those without mental illness. A significantly higher proportion of individuals in the mentally ill group had ulcer than in the group without mental illness (13.52% vs. 7.43%; p value < 0.001).

4.4.2. Effect of Mental Illness on Work Absenteeism in the Past Month

Table 6 presents the results for the impact of mental illness on work absenteeism. The SLF and SLF + IV estimates, as well as the OLS estimates, suggest that mental illness has a detrimental effect on work absenteeism; in particular, mental illness significantly increases work absenteeism in the past month by four-fifths of a day. We also noted a negative education gradient in work absenteeism: individuals with 13–15 years or 16 or more years of education had significantly fewer work absences than those with fewer than 12 years of education. Moreover, individuals who reported having had a heart attack had significantly more work absences in the past month than those who did not.

4.4.3. Effect of Mental Illness on Mobility Impairment Score in the Past Month

The results in Table 7 show the impact of mental illness on the mobility impairment score. Individuals with mental illness had a significantly higher mobility impairment score than those without a mental health diagnosis. The OLS, SLF, and SLF + IV estimates indicate an increase in the mobility impairment score of 2.2 (OLS) to 2.3 (SLF, SLF + IV) percentage points as a result of mental illness. Since the average mobility impairment score among individuals without mental illness was 1.6 (Table 5), we can infer that a change from no mental illness to psychiatric illness resulted in a significant increase in the mobility score of 138% (OLS)⁶ to 144% (SLF, SLF + IV). We also observed that individuals who reported the chronic conditions of arthritis, heart attack, diabetes, and cancer had significantly higher mobility impairment scores. The 2SLS point estimate (and standard error) of the effect of mental illness on mobility impairment was substantially larger than those obtained from the other estimators.
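The percentages follow from dividing the estimated increases by the mean score among individuals without mental illness (1.6, Table 5):

$$\frac{2.2}{1.6} = 1.375 \approx 138\%, \qquad \frac{2.3}{1.6} = 1.4375 \approx 144\%$$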

4.4.4. Effect of Mental Illness on Cognitive Impairment Score in the Past Month

Table 8 shows the estimated impact of mental illness on cognitive impairment. The OLS, SLF, and SLF + IV estimates were comparable and indicated a significantly higher cognitive impairment score among those with mental illness than among those without a psychiatric diagnosis. Given an average cognitive impairment score of 0.35 among individuals without any psychiatric disorder (Table 5), the increase of 2.3 (OLS) to 2.4 (SLF, SLF + IV) percentage points attributable to mental illness (Table 8) is more than six times the baseline mean. Women and individuals with arthritis or ulcer had significantly higher cognitive impairment scores, whereas older age (65 years or above) was a protective factor (see OLS, SLF, and SLF + IV results in Table 8). The 2SLS estimate of the treatment effect of mental illness on the cognitive impairment score was more than three times as large as the OLS, SLF, and SLF + IV estimates.

4.4.5. Effect of Mental Illness on Social Interaction Interference Score in the Past Month

Table 9 presents the estimated impact of mental illness on the social interaction interference score. Consistent with the significant debilitating effects of mental illness on the mobility and cognitive impairment scores, we found that mental illness significantly increases the social interaction interference score. The OLS, SLF, and SLF + IV estimates point to a 1.5 percentage point increase in this score resulting from mental illness. This is a large effect: the mean social interaction interference score among individuals without a mental health condition was only 0.16 (Table 5), so the estimated increase is more than nine times the baseline mean. Asians and individuals in the 55–64 age group had significantly lower social interaction interference scores, whereas ulcer was a risk factor for social interaction impairment (see OLS, SLF, and SLF + IV results in Table 9). The 2SLS estimate of the mental illness treatment effect was twice as large as those obtained with the OLS, SLF, and SLF + IV estimators, and its standard error was also much larger.

4.4.6. Effect of Mental Illness on Role Impairment Score

Finally, Table 10 displays the estimated impact of mental illness on the role impairment score. Psychiatric illness significantly increased the role impairment score by 5.9 (OLS) to 6.9 (SLF, SLF + IV) percentage points. Since the mean role impairment score among individuals with no mental health issues was 2.9 (Table 5), these estimates imply that a change from no mental illness to a diagnosable mental health condition increases the role impairment score by 203% (OLS) to 238% (SLF, SLF + IV). The 2SLS point estimate and standard error of the treatment effect were substantially larger than those obtained with the OLS, SLF, and SLF + IV estimators. Women and individuals with arthritis, diabetes, or ulcer had significantly higher role impairment scores, whereas individuals in the 55–64 age group had lower role impairment scores (see OLS, SLF, and SLF + IV results in Table 10).
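The relative effects quoted in Sections 4.4.3 to 4.4.6 divide each treatment-effect estimate by the mean score of respondents without mental illness (Table 5). The following is a minimal sketch of that arithmetic, assuming the rounded baselines used in the text and the estimate ranges summarized in the Discussion; the names `DISABILITY_EFFECTS`, `relative_change`, and `summarize` are illustrative and are not taken from the authors' code.

```python
"""Relative size of the mental-illness effects on WHO DAS II disability scores.

Each effect is divided by the mean score among respondents without mental
illness (Table 5, rounded as in the text). The (low, high) pairs are the
estimate ranges summarized in the Discussion (OLS to SLF/SLF + IV).
"""

# domain: (baseline mean without mental illness, low estimate, high estimate)
DISABILITY_EFFECTS = {
    "mobility impairment": (1.6, 2.2, 2.3),
    "cognitive impairment": (0.35, 2.3, 2.4),
    "social interaction interference": (0.16, 1.5, 1.6),
    "role impairment": (2.9, 5.9, 6.9),
}


def relative_change(effect: float, baseline: float) -> float:
    """Return the effect as a percentage of the (positive) baseline mean."""
    if baseline <= 0:
        raise ValueError("baseline mean must be positive")
    return 100.0 * effect / baseline


def summarize(effects: dict) -> dict:
    """Map each domain to its (low %, high %) relative change."""
    return {
        domain: (relative_change(low, base), relative_change(high, base))
        for domain, (base, low, high) in effects.items()
    }


if __name__ == "__main__":
    for domain, (lo, hi) in summarize(DISABILITY_EFFECTS).items():
        print(f"{domain}: +{lo:.0f}% to +{hi:.0f}% of the baseline mean")
```

For mobility this reproduces the 138% to 144% range reported above, and for role impairment the 203% to 238% range. Using the unrounded baselines from Table 5 (for example, 1.61 and 2.91) would give slightly smaller percentages.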

4.5. Discussion

Overall, our findings indicate that mental illness has a detrimental impact on work and leads to higher levels of impairment across multiple domains of disability. Notably, mental illness increases work absenteeism in the past month by four-fifths of a day. Banerjee et al. (2017) used a combined sample of the NCS-R and the National Latino and Asian American Study (NLAAS) and identified the treatment effect of mental illness on work absenteeism using heteroskedasticity-based moment conditions; they estimated that mental illness increases absenteeism by about two days. The mental illness measure in that study was a continuous latent variable constructed from a range of psychiatric symptoms of four mental disorders: major depressive episode, social phobia, panic attack, and generalized anxiety disorder. The exclusion of panic attack and social phobia from the binary mental illness indicator in the present study may explain the more modest estimated effect of psychiatric illness on absenteeism. We also found that mental illness leads to greater disability in terms of mobility impairment (2.2 to 2.3 percentage points), cognitive impairment (2.3 to 2.4 percentage points), social interaction interference (1.5 to 1.6 percentage points), and role impairment (5.9 to 6.9 percentage points).
The OLS estimates of the treatment effect of mental illness on work and disability outcomes are very similar to those of the shared latent factor models, both with and without the instrumental variable. This suggests that, in our analytic sample of working individuals, endogeneity of mental illness may not be a major source of bias when estimating its causal effects on work absenteeism and on disability in the domains of mobility impairment, cognitive impairment, social interaction interference, and role impairment. In contrast, the 2SLS estimates (and standard errors) of the treatment effects of mental illness on the disability outcomes are much larger than those obtained with the shared latent factor estimators (and the OLS estimator), which raises concern about the validity of the instrumental variable in our context and, therefore, about the potential for biased and inefficient 2SLS estimates. In the absence of any additional identifying instrument, however, we cannot directly test the validity of the instrumental variable; we can only infer the magnitude and direction of the bias arising from a potentially contaminated IV by comparing the 2SLS findings with those of the shared latent factor and OLS models.

5. Conclusions

In this paper, we compared two causal inference methods that, to our knowledge, have not previously been compared directly in the presence of multiple outcomes. The first is the traditional instrumental variables method, which is widely used in empirical research when one of the covariates of interest is endogenous. With valid instrument(s), one can obtain consistent estimates of the treatment effects, although IV estimators can be substantially biased in finite samples when instruments are weak (Bound et al. 1995); moreover, the IV estimator is less efficient than the OLS estimator. The second is the latent factor model, which, in the absence of an instrumental variable, can be used to obtain a causal treatment effect as long as the scale of the factor and the factor loadings can be estimated from multiple outcome measurements. We first provided a unifying theoretical framework for comparing the treatment-effect bias of alternative estimators: naive OLS, instrumental variables (2SLS), SLF, and SLF + IV. We then compared these estimators of the endogenous treatment effects in terms of bias, efficiency, and coverage probability in an extensive set of simulations. Finally, we illustrated the applicability of the alternative estimators in an empirical application on the impact of mental illness on work absenteeism and disability among working individuals, using the NCS-R.
In the presence of an uncontaminated IV, our simulation results suggest that the SLF estimator has lower bias than a naive OLS estimator but may not be as efficient as the 2SLS estimator. The SLF estimator may therefore be preferred for causal inference in the absence of an IV, especially with large samples. The SLF + IV estimator has bias and coverage probability comparable to those of the 2SLS estimator but may be more efficient and, therefore, may be preferred when investigators estimate treatment effects on a series of outcomes. In the case of a contaminated IV, we find that, at low levels of contamination, the SLF + IV estimator outperforms the naive OLS, SLF, and 2SLS estimators in terms of percentage bias, efficiency, and coverage probability; at higher levels of contamination, however, the SLF estimator outperforms all others, given the parameterizations considered in this study.
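To make the OLS versus contaminated-IV comparison concrete, the sketch below simulates a deliberately simplified, hypothetical data-generating process (one outcome, linear equations, standard normal errors, first-stage coefficient 0.5). It is not the authors' simulation design and does not implement the SLF or SLF + IV estimators. It shows why naive OLS is biased when an unobserved factor θ drives both treatment and outcome, and how the just-identified IV (2SLS) estimate drifts away from the true effect as the instrument's correlation with θ, ρ_Zθ, grows. Under this assumed DGP the probability limits are α + (1 + 0.5ρ_Zθ)/(2.25 + ρ_Zθ) for OLS and α + ρ_Zθ/(0.5 + ρ_Zθ) for IV; all function names are illustrative.

```python
"""Hypothetical illustration: OLS vs. just-identified IV (2SLS) with a shared
unobserved factor and an increasingly contaminated instrument.

Assumed DGP (not the paper's design):
    theta ~ N(0, 1)                                  unobserved factor
    Z = rho * theta + sqrt(1 - rho^2) * N(0, 1)      instrument, corr(Z, theta) = rho
    D = 0.5 * Z + theta + N(0, 1)                    treatment
    Y = alpha * D + theta + N(0, 1)                  outcome
"""
import numpy as np


def simulate(n: int, alpha: float, rho: float, rng: np.random.Generator):
    """Draw one sample (y, d, z) from the assumed DGP."""
    theta = rng.standard_normal(n)
    z = rho * theta + np.sqrt(1.0 - rho**2) * rng.standard_normal(n)
    d = 0.5 * z + theta + rng.standard_normal(n)
    y = alpha * d + theta + rng.standard_normal(n)
    return y, d, z


def ols_slope(y: np.ndarray, d: np.ndarray) -> float:
    """Naive OLS slope of Y on D (with intercept); ignores theta."""
    return np.cov(y, d)[0, 1] / np.var(d, ddof=1)


def iv_slope(y: np.ndarray, d: np.ndarray, z: np.ndarray) -> float:
    """IV slope cov(Y, Z) / cov(D, Z); identical to 2SLS with one instrument."""
    return np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]


def mean_estimates(rho: float, alpha: float = 2.0, n: int = 2000,
                   reps: int = 200, seed: int = 0) -> tuple:
    """Average OLS and IV estimates over `reps` simulated samples."""
    rng = np.random.default_rng(seed)
    ols, iv = [], []
    for _ in range(reps):
        y, d, z = simulate(n, alpha, rho, rng)
        ols.append(ols_slope(y, d))
        iv.append(iv_slope(y, d, z))
    return float(np.mean(ols)), float(np.mean(iv))


def plims(rho: float, alpha: float = 2.0) -> tuple:
    """Probability limits of the OLS and IV slopes implied by the assumed DGP."""
    ols = alpha + (1.0 + 0.5 * rho) / (2.25 + rho)
    iv = alpha + rho / (0.5 + rho)
    return ols, iv


if __name__ == "__main__":
    for rho in (0.0, 0.010, 0.050, 0.100):
        ols, iv = mean_estimates(rho)
        ols_lim, iv_lim = plims(rho)
        print(f"rho={rho:.3f}  OLS={ols:.3f} (plim {ols_lim:.3f})  "
              f"IV={iv:.3f} (plim {iv_lim:.3f})  true alpha=2")
```

Because θ enters the outcome equation, any correlation between Z and θ violates the exclusion restriction, which is one simple way to mimic the "contamination" studied in Tables 2 to 4; the sample size of 2000 matches the N used there, but the remaining parameters are arbitrary.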
In our empirical example, we find that mental illness significantly increases work absenteeism and impairment scores in the domains of mobility, cognition, social interaction, and role.
Future work should explore the relationship between the strength of an instrumental variable and its level of contamination in the SLF + IV framework, to help applied researchers and practitioners determine the optimal causal inference strategy in their particular real-world setting. Future work should also consider extending the shared latent factor models to capture unobservables along multiple domains, and assessing the performance of these causal estimators relative to single shared latent factor models and other well-known causal inference methods in the presence of multiple outcome measurements.

Author Contributions

Conceptualization, S.B. and A.B.; data curation, S.B.; formal analysis, S.B. and A.B.; investigation, S.B. and A.B.; methodology, S.B. and A.B.; project administration, A.B.; supervision, A.B.; writing—original draft, S.B. and A.B.; writing—review and editing, S.B. and A.B. Both authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

A publicly available dataset was analyzed in this study. These data can be found here: https://www.icpsr.umich.edu/web/ICPSR/studies/20240.

Acknowledgments

The authors thank Edward Norton and Steven Pizer as well as participants at the 6th Biennial Conference of the American Society of Health Economists (ASHEcon) in Philadelphia, PA, June 2016; International Health Economics Association (iHEA) 2017 Boston Congress, Boston, MA, July 2017; Applied Quantitative Methods Seminar at the Health Law, Policy & Management, Boston University School of Public Health, Boston, MA, November 2019; and Department of Humanities and Social Sciences, Indian Institute of Technology Bombay, Mumbai, India, March 2020.

Conflicts of Interest

The authors declare no conflict of interest.

Notes

  • Some necessary tests do exist, for example, examining the balance of observed confounders across levels of the IV. However, these tests are not sufficient to establish the validity of the IV.
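  • $\mathrm{var}(\hat{D}_{IV}) = \sigma^2_{\hat{D}_{IV}} = \hat{\beta}_1^2 \, \mathrm{var}(Z)$
    $\sigma^2_{\hat{D}_{IV}} = \left[\frac{\mathrm{cov}(D,Z)}{\mathrm{var}(Z)}\right]^2 \cdot \mathrm{var}(Z)$
    $\sigma^2_{\hat{D}_{IV}} = \rho_{DZ}^2 / \sigma_Z^2$.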
  • See (Carneiro et al. 2003, pp. 372–73) for the necessary conditions for identification of latent factor models.
  • In the contaminated IV case, the naive OLS treatment-effect estimates are also unaffected by the degree of contamination of the instrumental variable, because OLS does not use the instrument; we nonetheless present these results, since they provide a benchmark for comparison with the 2SLS and SLF + IV estimates.
  • ‘gsem’ stands for generalized structural equation model.
  • Percentage change in mobility score = $(2.2 / 1.6) \times 100 = 137.5\% \approx 138\%$.
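The first two equalities in Note 2 can be checked numerically. The following is a minimal sketch, assuming a hypothetical first stage D = 0.7Z + noise (the coefficient, sample size, seed, and the function name `first_stage_variance_terms` are illustrative): the sample variance of the first-stage fitted values equals β̂₁² var(Z), which in turn equals cov(D, Z)²/var(Z).

```python
import numpy as np


def first_stage_variance_terms(d: np.ndarray, z: np.ndarray) -> tuple:
    """Return var(D_hat), beta1^2 * var(Z), and cov(D, Z)^2 / var(Z)."""
    var_z = np.var(z, ddof=1)
    cov_dz = np.cov(d, z)[0, 1]
    beta1 = cov_dz / var_z                  # first-stage OLS slope of D on Z
    beta0 = d.mean() - beta1 * z.mean()     # first-stage intercept
    d_hat = beta0 + beta1 * z               # fitted (instrument-driven) part of D
    return np.var(d_hat, ddof=1), beta1**2 * var_z, cov_dz**2 / var_z


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    z = rng.standard_normal(5000)
    d = 0.7 * z + rng.standard_normal(5000)  # hypothetical first stage
    print(first_stage_variance_terms(d, z))  # three values equal up to rounding
```

The identity is purely algebraic, so it holds in any sample, not only asymptotically.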

References

  1. American Psychiatric Association. 1994. Diagnostic and Statistical Manual of Mental Disorders, 4th ed. Washington, DC: American Psychiatric Association.
  2. Angrist, Joshua D., and Alan B. Krueger. 2001. Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments. NBER Technical Working Paper 8456: 1–29.
  3. Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996. Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association 91: 444–55.
  4. Banerjee, Souvik, Pinka Chatterji, and Kajal Lahiri. 2014. Identifying the Mechanisms for Workplace Burden of Psychiatric Illness. Medical Care 52: 112–20.
  5. Banerjee, Souvik, Pinka Chatterji, and Kajal Lahiri. 2017. Effects of Psychiatric Disorders on Labor Market Outcomes: A Latent Variable Approach Using Multiple Clinical Indicators. Health Economics 26: 184–205.
  6. Basu, Anirban, and Kwun Chuen Gary Chan. 2014. Can we make smart choices between OLS and contaminated IV methods? Health Economics 23: 462–72.
  7. Bound, John, David A. Jaeger, and Regina M. Baker. 1995. Problems with Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable is Weak. Journal of the American Statistical Association 90: 443–50.
  8. Carneiro, Pedro, Karsten T. Hansen, and James J. Heckman. 2003. Estimating distributions of treatment effects with an application to the returns to schooling and measurement of the effects of uncertainty on college choice. International Economic Review 44: 361–422.
  9. Chatterji, Pinka, Margarita Alegria, and David Takeuchi. 2011. Psychiatric disorders and labor market outcomes: Evidence from the National Comorbidity Survey-Replication. Journal of Health Economics 30: 858–68.
  10. Chatterji, Pinka, Margarita Alegria, Mingshan Lu, and David Takeuchi. 2007. Psychiatric disorders and labor market outcomes: Evidence from the National Latino and Asian American Study. Health Economics 16: 1069–90.
  11. Chwastiak, Lydia A., and Michael Von Korff. 2003. Disability in depression and back pain: Evaluation of the World Health Organization Disability Assessment Schedule (WHO DAS II) in a primary care setting. Journal of Clinical Epidemiology 56: 507–14.
  12. Diby, Akissi S., Pascale Lengagne, and Camille Regaert. 2020. Employment Vulnerability of People with Severe Mental Illness. Health Policy 125: 269–75.
  13. Doshi, Jalpa A., Liyi Cen, and Daniel Polsky. 2008. Depression and Retirement in Late Middle-Aged U.S. Workers. Health Services Research 43: 693–713.
  14. Ettner, Susan L., Richard G. Frank, and Ronald C. Kessler. 1997. The Impact of Psychiatric Disorders on Labor Market Outcomes. Industrial and Labor Relations Review 51: 64–81.
  15. Gilleskie, Donna, and Denise Hoffman. 2014. Health Capital and Human Capital as Explanations for Health-Related Wage Disparities. Journal of Human Capital 8: 235–79.
  16. Gilleskie, Donna B., and Koleman S. Strumpf. 2005. The Behavioral Dynamics of Youth Smoking. Journal of Human Resources 40: 822–66.
  17. Gilleskie, Donna B., Euna Han, and Edward C. Norton. 2017. Disentangling the Contemporaneous and Dynamic Effects of Human and Health Capital on Wages over the Life Cycle. Review of Economic Dynamics 25: 350–83.
  18. Goldberger, Arthur S. 1972. Structural Equation Methods in the Social Sciences. Econometrica 40: 979–1001.
  19. Hauser, Robert M., and Arthur S. Goldberger. 1971. The Treatment of Unobservable Variables in Path Analysis. Sociological Methodology 3: 17.
  20. Heckman, James J., and Richard Robb Jr. 1985. Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics 30: 239–67.
  21. Heckman, James J., Jora Stixrud, and Sergio Urzua. 2006. The Effects of Cognitive and Noncognitive Abilities on Labor Market Outcomes and Social Behavior. Journal of Labor Economics 24: 411–82.
  22. Imbens, Guido W., and Jeffrey M. Wooldridge. 2009. Recent Developments in the Econometrics of Program Evaluation. Journal of Economic Literature 47: 5–86.
  23. Jarl, Johan, Anna Linder, Hillevi Busch, Anja Nyberg, and Ulf-G. Gerdtham. 2020. Heterogeneity in the associations between common mental disorders and labour outcomes—A population study from southern Sweden. BMC Public Health 20: 1285.
  24. Jöreskog, Karl G., and Arthur S. Goldberger. 1975. Estimation of a Model with Multiple Indicators and Multiple Causes of a Single Latent Variable. Journal of the American Statistical Association 70: 631–39.
  25. Kessler, Ronald C., and Kathleen R. Merikangas. 2004. The National Comorbidity Survey Replication (NCS-R): Background and aims. International Journal of Methods in Psychiatric Research 13: 60–68.
  26. Lewbel, Arthur. 2012. Using Heteroscedasticity to Identify and Estimate Mismeasured and Endogenous Regressor Models. Journal of Business & Economic Statistics 30: 67–80.
  27. MacDonald, Ziggy, and Michael A. Shields. 2004. Does problem drinking affect employment? Evidence from England. Health Economics 13: 139–55.
  28. Marcotte, Dave E., and Virginia Wilcox-Gok. 2003. Estimating earnings losses due to mental illness: A quantile regression approach. The Journal of Mental Health Policy and Economics 6: 123–34.
  29. OECD. 2012. Sick on the Job?: Myths and Realities about Mental Health and Work, Mental Health and Work. Paris: OECD Publishing.
  30. Prada, María F., and Sergio Urzúa. 2017. One Size Does Not Fit All: Multiple Dimensions of Ability, College Attendance and Earnings. Journal of Labor Economics 35: 953–91.
  31. Rodríguez, Jorge, Sergio Urzúa, and Loreto Reyes. 2016. Heterogeneous Economic Returns to Post-Secondary Degrees: Evidence from Chile. Journal of Human Resources 51: 416–60.
  32. Substance Abuse and Mental Health Services Administration. 2019. Key Substance Use and Mental Health Indicators in the United States: Results from the 2018 National Survey on Drug Use and Health (HHS Publication No. PEP19-5068, NSDUH Series H-54); Rockville: Center for Behavioral Health Statistics and Quality, Substance Abuse and Mental Health Services Administration. Available online: https://www.samhsa.gov/data/ (accessed on 27 September 2020).
  33. Terza, Joseph V., Anirban Basu, and Paul J. Rathouz. 2008. Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics 27: 531–43.
  34. Urzúa, Sergio. 2008. Racial Labor Market Gaps: The Role of Abilities and Schooling Choices. Journal of Human Resources 43: 919–71.
  35. Vella, Francis, and Marno Verbeek. 1999. Estimating and Interpreting Models with Endogenous Treatment Effects. Journal of Business & Economic Statistics 17: 473–78.
Table 1. Treatment effects with uncontaminated instrumental variable (IV).
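y is the number of outcomes. For each parameter, the four rows report the treatment effect, {% bias}, (SE), and [coverage probability].

| Parameter | y = 4: Naive OLS | y = 4: 2SLS † | y = 4: SLF ⍕ | y = 4: SLF + IV ‡ | y = 5: Naive OLS | y = 5: 2SLS † | y = 5: SLF ⍕ | y = 5: SLF + IV ‡ | y = 6: Naive OLS | y = 6: 2SLS † | y = 6: SLF ⍕ | y = 6: SLF + IV ‡ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| α_D1 (true value: 2) | 3.759 | 1.993 | 2.170 | 1.996 | 3.762 | 2.003 | 2.172 | 2.000 | 3.766 | 1.993 | 2.225 | 1.998 |
| | {87.97%} | {−0.35%} | {8.51%} | {−0.19%} | {88.11%} | {0.15%} | {8.61%} | {0.02%} | {88.29%} | {−0.33%} | {11.24%} | {−0.09%} |
| | (0.10) | (0.18) | (0.48) | (0.12) | (0.10) | (0.18) | (0.47) | (0.12) | (0.10) | (0.18) | (0.48) | (0.12) |
| | [0.00] | [0.95] | [0.86] | [0.94] | [0.00] | [0.96] | [0.87] | [0.94] | [0.00] | [0.95] | [0.86] | [0.94] |
| α_D2 (true value: 3) | 4.760 | 2.991 | 3.175 | 2.998 | 4.759 | 3.001 | 3.174 | 2.998 | 4.764 | 2.996 | 3.222 | 2.996 |
| | {58.67%} | {−0.31%} | {5.83%} | {−0.08%} | {58.65%} | {0.04%} | {5.78%} | {−0.07%} | {58.80%} | {−0.15%} | {7.41%} | {−0.12%} |
| | (0.10) | (0.18) | (0.48) | (0.12) | (0.10) | (0.18) | (0.47) | (0.12) | (0.10) | (0.18) | (0.48) | (0.12) |
| | [0.00] | [0.96] | [0.86] | [0.95] | [0.00] | [0.95] | [0.87] | [0.94] | [0.00] | [0.95] | [0.85] | [0.94] |
| α_D3 (true value: 1) | 2.758 | 0.991 | 1.168 | 0.995 | 2.761 | 1.000 | 1.173 | 0.999 | 2.767 | 0.995 | 1.225 | 0.999 |
| | {175.75%} | {−0.88%} | {16.84%} | {−0.46%} | {176.11%} | {−0.04%} | {17.28%} | {−0.11%} | {176.68%} | {−0.47%} | {22.50%} | {−0.11%} |
| | (0.10) | (0.18) | (0.48) | (0.12) | (0.10) | (0.18) | (0.47) | (0.12) | (0.10) | (0.18) | (0.48) | (0.12) |
| | [0.00] | [0.96] | [0.86] | [0.96] | [0.00] | [0.95] | [0.86] | [0.95] | [0.00] | [0.95] | [0.85] | [0.95] |
| α_D4 (true value: 1) | 2.759 | 0.991 | 1.175 | 0.996 | 2.762 | 0.998 | 1.170 | 0.999 | 2.765 | 0.996 | 1.226 | 0.997 |
| | {175.90%} | {−0.93%} | {17.48%} | {−0.37%} | {176.17%} | {−0.21%} | {17.02%} | {−0.09%} | {176.54%} | {−0.45%} | {22.56%} | {−0.28%} |
| | (0.10) | (0.18) | (0.48) | (0.12) | (0.10) | (0.18) | (0.47) | (0.12) | (0.10) | (0.18) | (0.48) | (0.12) |
| | [0.00] | [0.95] | [0.85] | [0.95] | [0.00] | [0.95] | [0.87] | [0.94] | [0.00] | [0.94] | [0.86] | [0.96] |
| α_D5 (true value: 2) | | | | | 3.762 | 2.004 | 2.173 | 2.001 | 3.766 | 1.999 | 2.226 | 1.999 |
| | | | | | {88.09%} | {0.19%} | {8.66%} | {0.05%} | {88.30%} | {−0.04%} | {11.29%} | {−0.04%} |
| | | | | | (0.10) | (0.18) | (0.47) | (0.12) | (0.10) | (0.18) | (0.48) | (0.12) |
| | | | | | [0.00] | [0.95] | [0.87] | [0.95] | [0.00] | [0.95] | [0.86] | [0.95] |
| α_D6 (true value: 1) | | | | | | | | | 2.766 | 0.998 | 1.224 | 0.998 |
| | | | | | | | | | {176.62%} | {−0.24%} | {22.39%} | {−0.20%} |
| | | | | | | | | | (0.10) | (0.18) | (0.48) | (0.12) |
| | | | | | | | | | [0.00] | [0.95] | [0.86] | [0.94] |
| F-stat on excluded instrument | | 1749 | | | | 1578 | | | | 1525 | | |
| N | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 |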
Notes: † denotes two-stage least-squares model. ⍕ denotes shared latent factor model without IV. ‡ denotes shared latent factor model with IV. OLS, ordinary least squares; 2SLS, two-stage least squares estimator; SLF, shared latent factor estimator; SLF + IV, shared latent factor + instrumental variables estimator.
Table 2. Treatment effects with contaminated IV: outcomes = 4.
| Treatment effect {% bias} (SE) [coverage probability] | (1) Naive OLS | (2) SLF | (3) 2SLS † | (4) SLF + IV ‡ | (5) 2SLS † | (6) SLF + IV ‡ | (7) 2SLS † | (8) SLF + IV ‡ | (9) 2SLS † | (10) SLF + IV ‡ | (11) 2SLS † | (12) SLF + IV ‡ | (13) 2SLS † | (14) SLF + IV ‡ | (15) 2SLS † | (16) SLF + IV ‡ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Degree of contamination ρ_Zθ | 0.000 | 0.000 | 0.010 | 0.010 | 0.015 | 0.015 | 0.025 | 0.025 | 0.035 | 0.035 | 0.050 | 0.050 | 0.075 | 0.075 | 0.100 | 0.100 |
| α_D1 (true value: 2) | 3.759 | 2.170 | 2.065 | 2.035 | 2.101 | 2.050 | 2.173 | 2.082 | 2.244 | 2.114 | 2.351 | 2.163 | 2.525 | 2.246 | 2.697 | 2.333 |
| | {87.97%} | {8.51%} | {3.25%} | {1.74%} | {5.06%} | {2.52%} | {8.65%} | {4.09%} | {12.22%} | {5.69%} | {17.53%} | {8.13%} | {26.26%} | {12.28%} | {34.85%} | {16.63%} |
| | (0.10) | (0.48) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.13) | (0.17) | (0.13) |
| | [0.00] | [0.86] | [0.93] | [0.94] | [0.91] | [0.92] | [0.82] | [0.89] | [0.70] | [0.85] | [0.49] | [0.73] | [0.17] | [0.50] | [0.02] | [0.28] |
| α_D2 (true value: 3) | 4.760 | 3.175 | 3.066 | 3.034 | 3.102 | 3.049 | 3.174 | 3.081 | 3.245 | 3.113 | 3.351 | 3.161 | 3.526 | 3.245 | 3.698 | 3.331 |
| | {58.67%} | {5.83%} | {2.20%} | {1.13%} | {3.40%} | {1.65%} | {5.80%} | {2.70%} | {8.18%} | {3.76%} | {11.71%} | {5.38%} | {17.53%} | {8.16%} | {23.26%} | {11.05%} |
| | (0.10) | (0.48) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.13) | (0.17) | (0.13) |
| | [0.00] | [0.86] | [0.92] | [0.94] | [0.91] | [0.94] | [0.81] | [0.90] | [0.70] | [0.85] | [0.49] | [0.73] | [0.16] | [0.49] | [0.03] | [0.28] |
| α_D3 (true value: 1) | 2.758 | 1.168 | 1.070 | 1.038 | 1.106 | 1.054 | 1.178 | 1.085 | 1.249 | 1.117 | 1.355 | 1.165 | 1.530 | 1.249 | 1.702 | 1.335 |
| | {175.75%} | {16.84%} | {7.00%} | {3.82%} | {10.61%} | {5.37%} | {17.79%} | {8.49%} | {24.92%} | {11.68%} | {35.53%} | {16.54%} | {52.98%} | {24.89%} | {70.15%} | {33.54%} |
| | (0.10) | (0.48) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.13) | (0.17) | (0.13) |
| | [0.00] | [0.86] | [0.93] | [0.93] | [0.91] | [0.91] | [0.83] | [0.88] | [0.69] | [0.83] | [0.47] | [0.73] | [0.15] | [0.50] | [0.02] | [0.26] |
| α_D4 (true value: 1) | 2.759 | 1.175 | 1.066 | 1.035 | 1.102 | 1.051 | 1.174 | 1.082 | 1.245 | 1.114 | 1.351 | 1.163 | 1.526 | 1.247 | 1.698 | 1.333 |
| | {175.90%} | {17.48%} | {6.62%} | {3.55%} | {10.23%} | {5.09%} | {17.41%} | {8.24%} | {24.54%} | {11.42%} | {35.15%} | {16.32%} | {52.60%} | {24.67%} | {69.77%} | {33.31%} |
| | (0.10) | (0.48) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.13) | (0.17) | (0.13) |
| | [0.00] | [0.85] | [0.94] | [0.94] | [0.90] | [0.92] | [0.82] | [0.88] | [0.70] | [0.83] | [0.49] | [0.72] | [0.16] | [0.51] | [0.02] | [0.28] |
| F-stat on excluded instrument | | | 1779 | | 1793 | | 1839 | | 1863 | | 1904 | | 2005 | | 1967 | |
| N | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 |
Notes: † denotes two-stage least-squares model. ‡ denotes shared latent factor model with IV.
Table 3. Treatment effects with contaminated IV: outcomes = 5.
| Treatment effect {% bias} (SE) [coverage probability] | (1) Naive OLS | (2) SLF | (3) 2SLS † | (4) SLF + IV ‡ | (5) 2SLS † | (6) SLF + IV ‡ | (7) 2SLS † | (8) SLF + IV ‡ | (9) 2SLS † | (10) SLF + IV ‡ | (11) 2SLS † | (12) SLF + IV ‡ | (13) 2SLS † | (14) SLF + IV ‡ | (15) 2SLS † | (16) SLF + IV ‡ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Degree of contamination ρ_Zθ | 0.000 | 0.000 | 0.010 | 0.010 | 0.015 | 0.015 | 0.025 | 0.025 | 0.035 | 0.035 | 0.050 | 0.050 | 0.075 | 0.075 | 0.100 | 0.100 |
| α_D1 (true value: 2) | 3.762 | 2.172 | 2.068 | 2.027 | 2.104 | 2.042 | 2.176 | 2.073 | 2.247 | 2.104 | 2.354 | 2.152 | 2.528 | 2.234 | 2.700 | 2.319 |
| | {88.11%} | {8.61%} | {3.38%} | {1.34%} | {5.19%} | {2.10%} | {8.79%} | {3.64%} | {12.37%} | {5.20%} | {17.68%} | {7.59%} | {26.42%} | {11.68%} | {35.02%} | {15.93%} |
| | (0.10) | (0.47) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.13) | (0.17) | (0.13) |
| | [0.00] | [0.87] | [0.93] | [0.95] | [0.92] | [0.94] | [0.84] | [0.91] | [0.70] | [0.86] | [0.48] | [0.77] | [0.15] | [0.54] | [0.03] | [0.29] |
| α_D2 (true value: 3) | 4.759 | 3.174 | 3.073 | 3.030 | 3.109 | 3.045 | 3.181 | 3.076 | 3.253 | 3.108 | 3.359 | 3.155 | 3.534 | 3.237 | 3.705 | 3.322 |
| | {58.65%} | {5.78%} | {2.44%} | {1.01%} | {3.65%} | {1.51%} | {6.05%} | {2.54%} | {8.43%} | {3.58%} | {11.97%} | {5.17%} | {17.79%} | {7.90%} | {23.51%} | {10.74%} |
| | (0.10) | (0.47) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.12) | (0.17) | (0.13) |
| | [0.00] | [0.87] | [0.93] | [0.94] | [0.90] | [0.93] | [0.82] | [0.91] | [0.70] | [0.87] | [0.46] | [0.76] | [0.14] | [0.53] | [0.03] | [0.29] |
| α_D3 (true value: 1) | 2.761 | 1.173 | 1.070 | 1.030 | 1.107 | 1.045 | 1.178 | 1.076 | 1.250 | 1.107 | 1.356 | 1.155 | 1.531 | 1.238 | 1.703 | 1.322 |
| | {176.11%} | {17.28%} | {7.04%} | {3.00%} | {10.65%} | {4.53%} | {17.85%} | {7.62%} | {24.99%} | {10.74%} | {35.62%} | {15.51%} | {53.09%} | {23.75%} | {70.28%} | {32.24%} |
| | (0.10) | (0.47) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.13) | (0.17) | (0.13) |
| | [0.00] | [0.86] | [0.93] | [0.95] | [0.90] | [0.93] | [0.82] | [0.90] | [0.70] | [0.86] | [0.48] | [0.76] | [0.14] | [0.53] | [0.03] | [0.29] |
| α_D4 (true value: 1) | 2.762 | 1.170 | 1.069 | 1.029 | 1.105 | 1.044 | 1.177 | 1.075 | 1.248 | 1.106 | 1.354 | 1.154 | 1.529 | 1.236 | 1.701 | 1.321 |
| | {176.17%} | {17.02%} | {6.86%} | {2.91%} | {10.48%} | {4.40%} | {17.67%} | {7.52%} | {24.82%} | {10.63%} | {35.44%} | {15.41%} | {52.92%} | {23.59%} | {70.11%} | {32.07%} |
| | (0.10) | (0.47) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.13) | (0.17) | (0.13) |
| | [0.00] | [0.87] | [0.93] | [0.95] | [0.90] | [0.92] | [0.82] | [0.90] | [0.71] | [0.86] | [0.47] | [0.76] | [0.15] | [0.53] | [0.03] | [0.30] |
| α_D5 (true value: 2) | 3.762 | 2.173 | 2.070 | 2.030 | 2.106 | 2.046 | 2.178 | 2.077 | 2.250 | 2.108 | 2.356 | 2.156 | 2.531 | 2.238 | 2.703 | 2.323 |
| | {88.09%} | {8.66%} | {3.51%} | {1.52%} | {5.32%} | {2.28%} | {8.91%} | {3.84%} | {12.49%} | {5.39%} | {17.80%} | {7.78%} | {26.54%} | {11.90%} | {35.13%} | {16.16%} |
| | (0.10) | (0.47) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.12) | (0.17) | (0.13) |
| | [0.00] | [0.87] | [0.93] | [0.95] | [0.90] | [0.93] | [0.82] | [0.89] | [0.70] | [0.84] | [0.48] | [0.76] | [0.16] | [0.52] | [0.02] | [0.27] |
| F-stat on excluded instrument | | | 1779 | | 1793 | | 1839 | | 1863 | | 1904 | | 2005 | | 1967 | |
| N | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 |
Notes: † denotes two-stage least-squares model. ‡ denotes shared latent factor model with IV.
Table 4. Treatment effects with contaminated IV: outcomes = 6.
| Treatment effect {% bias} (SE) [coverage probability] | (1) Naive OLS | (2) SLF | (3) 2SLS † | (4) SLF + IV ‡ | (5) 2SLS † | (6) SLF + IV ‡ | (7) 2SLS † | (8) SLF + IV ‡ | (9) 2SLS † | (10) SLF + IV ‡ | (11) 2SLS † | (12) SLF + IV ‡ | (13) 2SLS † | (14) SLF + IV ‡ | (15) 2SLS † | (16) SLF + IV ‡ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Degree of contamination ρ_Zθ | 0.000 | 0.000 | 0.010 | 0.010 | 0.015 | 0.015 | 0.025 | 0.025 | 0.035 | 0.035 | 0.050 | 0.050 | 0.075 | 0.075 | 0.100 | 0.100 |
| α_D1 (true value: 2) | 3.766 | 2.225 | 2.059 | 2.031 | 2.096 | 2.046 | 2.168 | 2.076 | 2.239 | 2.107 | 2.345 | 2.155 | 2.520 | 2.236 | 2.692 | 2.320 |
| | {88.29%} | {11.24%} | {2.97%} | {1.55%} | {4.78%} | {2.29%} | {8.38%} | {3.82%} | {11.95%} | {5.35%} | {17.27%} | {7.73%} | {26.02%} | {11.78%} | {34.62%} | {15.99%} |
| | (0.10) | (0.48) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.12) | (0.17) | (0.13) |
| | [0.00] | [0.86] | [0.94] | [0.95] | [0.92] | [0.94] | [0.85] | [0.91] | [0.72] | [0.87] | [0.51] | [0.77] | [0.15] | [0.53] | [0.02] | [0.28] |
| α_D2 (true value: 3) | 4.764 | 3.222 | 3.060 | 3.030 | 3.096 | 3.046 | 3.168 | 3.076 | 3.239 | 3.107 | 3.346 | 3.155 | 3.521 | 3.236 | 3.693 | 3.321 |
| | {58.80%} | {7.41%} | {1.99%} | {1.01%} | {3.20%} | {1.52%} | {5.60%} | {2.53%} | {7.98%} | {3.58%} | {11.53%} | {5.16%} | {17.36%} | {7.87%} | {23.10%} | {10.70%} |
| | (0.10) | (0.48) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.12) | (0.17) | (0.13) |
| | [0.00] | [0.85] | [0.94] | [0.94] | [0.92] | [0.94] | [0.85] | [0.91] | [0.73] | [0.85] | [0.51] | [0.76] | [0.15] | [0.53] | [0.03] | [0.27] |
| α_D3 (true value: 1) | 2.767 | 1.225 | 1.062 | 1.032 | 1.098 | 1.047 | 1.170 | 1.078 | 1.242 | 1.109 | 1.348 | 1.156 | 1.523 | 1.237 | 1.694 | 1.321 |
| | {176.68%} | {22.50%} | {6.21%} | {3.21%} | {9.82%} | {4.74%} | {17.02%} | {7.81%} | {24.16%} | {10.91%} | {34.79%} | {15.62%} | {52.26%} | {23.69%} | {69.45%} | {32.07%} |
| | (0.10) | (0.48) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.12) | (0.17) | (0.13) |
| | [0.00] | [0.85] | [0.94] | [0.94] | [0.92] | [0.93] | [0.85] | [0.89] | [0.72] | [0.85] | [0.49] | [0.76] | [0.14] | [0.54] | [0.02] | [0.28] |
| α_D4 (true value: 1) | 2.765 | 1.226 | 1.064 | 1.032 | 1.101 | 1.047 | 1.173 | 1.078 | 1.244 | 1.109 | 1.350 | 1.157 | 1.525 | 1.238 | 1.697 | 1.322 |
| | {176.54%} | {22.56%} | {6.45%} | {3.24%} | {10.06%} | {4.75%} | {17.26%} | {7.81%} | {24.41%} | {10.93%} | {35.04%} | {15.66%} | {52.53%} | {23.77%} | {69.72%} | {32.22%} |
| | (0.10) | (0.48) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.12) | (0.17) | (0.13) |
| | [0.00] | [0.86] | [0.94] | [0.95] | [0.91] | [0.94] | [0.84] | [0.90] | [0.71] | [0.85] | [0.48] | [0.75] | [0.15] | [0.52] | [0.02] | [0.27] |
| α_D5 (true value: 2) | 3.766 | 2.226 | 2.062 | 2.032 | 2.098 | 2.047 | 2.170 | 2.078 | 2.241 | 2.108 | 2.348 | 2.156 | 2.522 | 2.237 | 2.694 | 2.322 |
| | {88.30%} | {11.29%} | {3.09%} | {1.59%} | {4.89%} | {2.35%} | {8.49%} | {3.88%} | {12.06%} | {5.42%} | {17.38%} | {7.79%} | {26.12%} | {11.87%} | {34.71%} | {16.09%} |
| | (0.10) | (0.48) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.12) | (0.17) | (0.13) |
| | [0.00] | [0.86] | [0.93] | [0.95] | [0.91] | [0.93] | [0.84] | [0.90] | [0.72] | [0.85] | [0.48] | [0.76] | [0.16] | [0.52] | [0.03] | [0.28] |
| α_D6 (true value: 1) | 2.766 | 1.224 | 1.065 | 1.032 | 1.101 | 1.047 | 1.173 | 1.078 | 1.245 | 1.109 | 1.351 | 1.156 | 1.526 | 1.237 | 1.698 | 1.321 |
| | {176.62%} | {22.39%} | {6.50%} | {3.20%} | {10.11%} | {4.72%} | {17.31%} | {7.76%} | {24.46%} | {10.86%} | {35.10%} | {15.60%} | {52.59%} | {23.74%} | {69.78%} | {32.14%} |
| | (0.10) | (0.48) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.18) | (0.12) | (0.17) | (0.12) | (0.17) | (0.13) |
| | [0.00] | [0.86] | [0.94] | [0.95] | [0.91] | [0.94] | [0.84] | [0.90] | [0.72] | [0.85] | [0.49] | [0.76] | [0.16] | [0.52] | [0.01] | [0.29] |
| F-stat on excluded instrument | | | 1779 | | 1793 | | 1839 | | 1863 | | 1904 | | 2005 | | 1967 | |
| N | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 | 2000 |
Notes: † denotes two-stage least-squares model. ‡ denotes shared latent factor model with IV.
Table 5. Weighted summary statistics for empirical example.
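| Variable | Observations | Overall (N = 3756): Mean/% | Overall: SD | Mental illness § (N = 604): Mean/% | Mental illness §: SD | No mental illness (N = 3152): Mean/% | No mental illness: SD | p value †† |
|---|---|---|---|---|---|---|---|---|
| Mental illness § (%) | 3756 | 9.93 | | 100.00 | | 0.00 | | |
| Outcomes † | | | | | | | | |
| Number of full days of work missed in past 30 days | 3710 | 1.18 | 3.82 | 2.02 | 6.33 | 1.09 | 3.54 | <0.001 |
| Mobility impairment score | 3756 | 1.87 | 8.55 | 4.22 | 15.91 | 1.61 | 7.68 | <0.001 |
| Cognitive impairment score | 3756 | 0.60 | 3.72 | 2.87 | 10.55 | 0.35 | 2.58 | <0.001 |
| Social interaction interference score | 3756 | 0.32 | 2.88 | 1.78 | 8.88 | 0.16 | 1.82 | <0.001 |
| Role impairment score | 3702 | 3.56 | 10.52 | 9.55 | 21.41 | 2.91 | 9.05 | <0.001 |
| Parental psychiatric problems (%) | 3468 | 30.23 | | 53.82 | | 27.57 | | <0.001 |
| Age (%) | 3756 | | | | | | | <0.001 |
| 18–24 | | 15.07 | | 17.79 | | 14.77 | | |
| 25–34 | | 19.99 | | 24.88 | | 19.45 | | |
| 35–44 | | 26.27 | | 26.80 | | 26.21 | | |
| 45–54 | | 23.62 | | 22.76 | | 23.71 | | |
| 55–64 | | 10.88 | | 6.96 | | 11.31 | | |
| 65+ | | 4.18 | | 0.81 | | 4.55 | | |
| Race/ethnicity (%) | 3756 | | | | | | | 0.132 |
| Non-Latino Whites | | 72.93 | | 76.07 | | 72.59 | | |
| Asian | | 1.96 | | 1.24 | | 2.04 | | |
| Latino | | 11.40 | | 10.65 | | 11.48 | | |
| African American | | 11.02 | | 8.38 | | 11.31 | | |
| Others | | 2.70 | | 3.66 | | 2.59 | | |
| Marital status (%) | 3756 | | | | | | | <0.001 |
| Single | | 24.80 | | 31.36 | | 24.07 | | |
| Married | | 57.92 | | 45.32 | | 59.31 | | |
| Divorced | | 17.29 | | 23.32 | | 16.62 | | |
| Educational attainment (%) | 3756 | | | | | | | 0.676 |
| <12 years | | 10.90 | | 9.97 | | 11.00 | | |
| 12 years | | 31.13 | | 33.05 | | 30.92 | | |
| 13–15 years | | 30.29 | | 31.24 | | 30.19 | | |
| ≥16 years | | 27.67 | | 25.74 | | 27.89 | | |
| Chronic physical conditions (%) | | | | | | | | |
| Arthritis/rheumatism | 3756 | 20.30 | | 24.18 | | 19.87 | | 0.055 |
| Stroke | 3756 | 1.12 | | 1.63 | | 1.06 | | 0.271 |
| Heart attack | 3756 | 3.11 | | 3.99 | | 3.01 | | 0.410 |
| Diabetes | 3756 | 4.66 | | 5.63 | | 4.55 | | 0.370 |
| Ulcer | 3756 | 8.04 | | 13.52 | | 7.43 | | <0.001 |
| Cancer | 3756 | 4.18 | | 5.21 | | 4.07 | | 0.286 |
| Region (%) | 3756 | | | | | | | 0.6741 |
| Northeast | | 17.76 | | 19.34 | | 17.58 | | |
| Midwest | | 24.00 | | 23.76 | | 24.02 | | |
| South | | 35.83 | | 33.05 | | 36.14 | | |
| West | | 22.41 | | 23.85 | | 22.25 | | |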
Notes: Mean reported for continuous variables; percentage reported for categorical variables. SD, standard deviation, reported for continuous variables only. †, continuous variable; ††, p value for the difference between individuals diagnosed with mental illness and those without any diagnosed mental health condition, from a t-test for continuous variables and a chi-square test for categorical variables. §, includes major depressive episode (MDE) and generalized anxiety disorder (GAD). The mobility impairment, cognitive impairment, social interaction interference, and role impairment scores are based on the World Health Organization Disability Assessment Schedule (WHO DAS II) and range from 0 to 100, with 0 indicating no disability and 100 indicating full disability.
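Table 5 reports survey-weighted means, SDs, and percentages, and compares groups with t-tests and chi-square tests. The following minimal stdlib Python sketch computes those weighted quantities, plus the Pearson chi-square statistic for a table of counts. It assumes simple probability weights and ignores the NCS-R complex sample design (strata and clusters), and it stops at the test statistic without computing a p value. For these reasons it will not reproduce the p values above. All function names are hypothetical.

```python
import math


def weighted_mean(x, w):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(wi * xi for xi, wi in zip(x, w)) / sum(w)


def weighted_sd(x, w):
    """Weighted standard deviation, normalizing by sum(w_i).

    No small-sample or survey-design correction is applied (an assumption).
    """
    m = weighted_mean(x, w)
    var = sum(wi * (xi - m) ** 2 for xi, wi in zip(x, w)) / sum(w)
    return math.sqrt(var)


def weighted_percent(flags, w):
    """Weighted percentage of observations whose 0/1 flag equals 1."""
    return 100.0 * weighted_mean(flags, w)


def pearson_chi2(table):
    """Pearson chi-square statistic for an r x c table of (weighted) counts.

    Expected count in cell (i, j) = row_total_i * col_total_j / grand_total.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Toy (hypothetical) data: days missed and survey weights for three people.
days, weights = [0, 2, 4], [1.0, 1.0, 2.0]
print(weighted_mean(days, weights), weighted_sd(days, weights))
```

With these toy inputs the weighted mean is 2.5, because the third respondent counts twice. A design-based analysis would instead use a survey-adjusted variance and test (for example, a Rao-Scott corrected chi-square).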
Table 6. Effect of mental illness on work absenteeism in the past month.
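
Dependent variable: Number of Full Days of Work Missed in the Past 30 Days

| Coefficient (SE) | OLS | 2SLS | SLF | SLF + IV |
|---|---|---|---|---|
| Mental illness | 0.781 ** | 0.769 | 0.824 ** | 0.821 ** |
| | (0.235) | (1.462) | (0.234) | (0.235) |
| Female | 0.109 | 0.167 | 0.114 | 0.114 |
| | (0.164) | (0.172) | (0.163) | (0.163) |
| Age: 25–34 | 0.009 | −0.028 | 0.008 | 0.008 |
| | (0.345) | (0.283) | (0.345) | (0.345) |
| Age: 35–44 | −0.268 | −0.343 | −0.271 | −0.271 |
| | (0.343) | (0.267) | (0.343) | (0.343) |
| Age: 45–54 | −0.198 | −0.160 | −0.191 | −0.191 |
| | (0.382) | (0.344) | (0.381) | (0.381) |
| Age: 55–64 | −0.253 | −0.304 | −0.254 | −0.254 |
| | (0.449) | (0.424) | (0.449) | (0.449) |
| Age: 65+ | −0.760 | −0.847 | −0.753 | −0.753 |
| | (0.483) | (0.488) | (0.491) | (0.491) |
| Asian | 0.095 | 0.141 | 0.098 | 0.098 |
| | (0.704) | (0.657) | (0.704) | (0.704) |
| Latino | −0.122 | −0.069 | −0.131 | −0.131 |
| | (0.289) | (0.269) | (0.291) | (0.291) |
| African American | 0.285 | 0.275 | 0.279 | 0.279 |
| | (0.303) | (0.269) | (0.303) | (0.302) |
| Other race/ethnicity | 1.269 | 0.149 | 1.262 | 1.262 |
| | (1.059) | (0.449) | (1.059) | (1.059) |
| Married | −0.227 | −0.096 | −0.229 | −0.229 |
| | (0.282) | (0.262) | (0.282) | (0.282) |
| Divorced | −0.034 | 0.216 | −0.024 | −0.024 |
| | (0.437) | (0.402) | (0.437) | (0.436) |
| Education 12 years | −0.366 | −0.499 | −0.359 | −0.359 |
| | (0.312) | (0.416) | (0.310) | (0.310) |
| Education 13–15 years | −0.733 * | −1.066 ** | −0.729 * | −0.729 * |
| | (0.333) | (0.400) | (0.333) | (0.333) |
| Education 16+ years | −0.801 * | −0.975 * | −0.797 * | −0.797 * |
| | (0.312) | (0.402) | (0.311) | (0.311) |
| Arthritis | 0.209 | 0.263 | 0.219 | 0.219 |
| | (0.246) | (0.231) | (0.248) | (0.248) |
| Stroke | 0.314 | 0.325 | 0.312 | 0.312 |
| | (0.781) | (0.735) | (0.779) | (0.779) |
| Heart attack | 1.270 ** | 1.335 * | 1.266 ** | 1.266 ** |
| | (0.407) | (0.636) | (0.408) | (0.407) |
| Diabetes | 0.212 | 0.151 | 0.207 | 0.207 |
| | (0.387) | (0.346) | (0.386) | (0.387) |
| Ulcer | 0.221 | 0.265 | 0.204 | 0.204 |
| | (0.359) | (0.361) | (0.358) | (0.358) |
| Cancer | 0.666 | 0.650 | 0.636 | 0.636 |
| | (0.393) | (0.383) | (0.388) | (0.388) |
| Midwest | 0.187 | 0.058 | 0.185 | 0.185 |
| | (0.269) | (0.211) | (0.270) | (0.270) |
| South | 0.029 | 0.040 | 0.026 | 0.026 |
| | (0.220) | (0.213) | (0.220) | (0.220) |
| West | 0.413 | 0.404 | 0.406 | 0.406 |
| | (0.224) | (0.304) | (0.223) | (0.223) |
| First stage F-statistic on instrument | | 82.24 | | |
| p value | | <0.001 | | |
| N | 3710 | 3424 | 3710 | 3710 |
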
Notes: Mental illness is a binary variable = 1 if individual meets DSM-IV diagnostic criteria for major depressive episode (MDE) or generalized anxiety disorder (GAD), and 0 otherwise. Reference categories include the following: male, age 18–24, non-Latino White, single, <12 years of education, and northeast. SE, standard error; OLS, ordinary least squares; 2SLS, two-stage least squares estimator; SLF, shared latent factor estimator; SLF + IV, shared latent factor + instrumental variables estimator. * p < 0.05; ** p < 0.01.
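Each coefficient on a categorical covariate in Tables 6–10 is a contrast with the omitted reference group listed in the notes. For example, "Age: 65+" compares respondents aged 65 or older with those aged 18–24. The following minimal Python sketch shows that indicator (dummy) coding. The function name and the ASCII category labels are illustrative assumptions, not the authors' code.

```python
def dummy_code(values, categories, reference):
    """Build 0/1 indicator columns for every category except `reference`.

    Rows in the reference category get 0 in every column, so a regression
    coefficient on column c estimates the difference between c and the reference.
    """
    if reference not in categories:
        raise ValueError(f"reference category {reference!r} not in {categories!r}")
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"unexpected categories: {sorted(unknown)!r}")
    return {
        c: [1 if v == c else 0 for v in values]
        for c in categories
        if c != reference
    }


# Hypothetical age-group labels (ASCII hyphens) mirroring Table 5.
AGE_GROUPS = ["18-24", "25-34", "35-44", "45-54", "55-64", "65+"]
ages = ["18-24", "25-34", "65+", "18-24", "45-54"]
age_dummies = dummy_code(ages, AGE_GROUPS, reference="18-24")
for name, column in age_dummies.items():
    print(f"Age: {name}", column)
```

Dropping one category avoids perfect collinearity with the intercept. This is why the tables report five age coefficients for six age groups.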
Table 7. Effect of mental illness on mobility impairment score in the past month.
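
Dependent variable: Mobility Impairment Score

| Coefficient (SE) | OLS | 2SLS | SLF | SLF + IV |
|---|---|---|---|---|
| Mental illness | 2.180 *** | 7.687 * | 2.329 *** | 2.317 *** |
| | (0.429) | (3.593) | (0.440) | (0.464) |
| Female | 0.169 | −0.243 | 0.160 | 0.161 |
| | (0.273) | (0.340) | (0.273) | (0.273) |
| Age: 25–34 | −0.162 | −0.170 | −0.165 | −0.164 |
| | (0.297) | (0.349) | (0.297) | (0.297) |
| Age: 35–44 | 0.846 * | 0.857 | 0.847 * | 0.847 * |
| | (0.378) | (0.504) | (0.378) | (0.378) |
| Age: 45–54 | 0.762 | 0.984 | 0.766 | 0.765 |
| | (0.453) | (0.562) | (0.453) | (0.453) |
| Age: 55–64 | −0.053 | −0.025 | −0.042 | −0.043 |
| | (0.737) | (0.959) | (0.738) | (0.738) |
| Age: 65+ | −1.466 | −0.975 | −1.446 | −1.448 |
| | (0.971) | (1.266) | (0.971) | (0.971) |
| Asian | −0.914 * | −0.749 | −0.909 * | −0.910 * |
| | (0.428) | (0.445) | (0.429) | (0.429) |
| Latino | −0.411 | −0.427 | −0.408 | −0.409 |
| | (0.470) | (0.510) | (0.469) | (0.468) |
| African American | −0.195 | −0.029 | −0.188 | −0.188 |
| | (0.592) | (0.673) | (0.592) | (0.591) |
| Other race/ethnicity | 0.452 | 0.457 | 0.452 | 0.452 |
| | (0.998) | (1.055) | (0.997) | (0.997) |
| Married | −0.071 | 0.138 | −0.065 | −0.066 |
| | (0.375) | (0.405) | (0.374) | (0.374) |
| Divorced | 0.249 | 0.342 | 0.247 | 0.247 |
| | (0.637) | (0.709) | (0.637) | (0.637) |
| Education 12 years | 0.685 | 0.919 | 0.686 | 0.686 |
| | (0.524) | (0.483) | (0.524) | (0.524) |
| Education 13–15 years | 0.509 | 0.554 | 0.510 | 0.510 |
| | (0.501) | (0.447) | (0.500) | (0.500) |
| Education 16+ years | 0.286 | 0.405 | 0.288 | 0.288 |
| | (0.557) | (0.484) | (0.557) | (0.557) |
| Arthritis | 3.077 *** | 3.015 *** | 3.072 *** | 3.072 *** |
| | (0.617) | (0.708) | (0.617) | (0.617) |
| Stroke | −0.185 | −0.390 | −0.193 | −0.192 |
| | (1.376) | (1.470) | (1.374) | (1.375) |
| Heart attack | 3.829 * | 3.950 | 3.821 * | 3.822 * |
| | (1.813) | (2.221) | (1.812) | (1.813) |
| Diabetes | 2.962 * | 2.458 | 2.956 * | 2.956 * |
| | (1.321) | (1.409) | (1.321) | (1.321) |
| Ulcer | 1.111 | 0.819 | 1.101 | 1.102 |
| | (0.647) | (0.722) | (0.647) | (0.648) |
| Cancer | 1.766 * | 1.659 | 1.760 * | 1.760 * |
| | (0.863) | (0.913) | (0.861) | (0.861) |
| Midwest | −0.432 | −0.369 | −0.430 | −0.430 |
| | (0.517) | (0.448) | (0.518) | (0.518) |
| South | −0.358 | −0.192 | −0.355 | −0.355 |
| | (0.581) | (0.491) | (0.582) | (0.582) |
| West | −0.477 | −0.318 | −0.475 | −0.475 |
| | (0.591) | (0.517) | (0.593) | (0.593) |
| First stage F-statistic on instrument | | 81.15 | | |
| p value | | <0.001 | | |
| N | 3756 | 3468 | 3756 | 3756 |
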
Notes: Mental illness is a binary variable = 1 if individual meets DSM-IV diagnostic criteria for major depressive episode (MDE) or generalized anxiety disorder (GAD), and 0 otherwise. Reference categories include the following: male, age 18–24, non-Latino White, single, <12 years of education, and northeast. SE, standard error; OLS, ordinary least squares; 2SLS, two-stage least squares estimator; SLF, shared latent factor estimator; SLF + IV, shared latent factor + instrumental variables estimator. * p < 0.05; *** p < 0.001.
Table 8. Effect of mental illness on cognitive impairment score in the past month.
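
Dependent variable: Cognitive Impairment Score

| Coefficient (SE) | OLS | 2SLS | SLF | SLF + IV |
|---|---|---|---|---|
| Mental illness | 2.334 *** | 8.592 *** | 2.403 *** | 2.398 *** |
| | (0.392) | (1.710) | (0.396) | (0.383) |
| Female | 0.271 * | −0.070 | 0.267 * | 0.267 * |
| | (0.107) | (0.144) | (0.106) | (0.106) |
| Age: 25–34 | −0.102 | −0.285 | −0.103 | −0.103 |
| | (0.250) | (0.303) | (0.250) | (0.250) |
| Age: 35–44 | −0.101 | −0.101 | −0.101 | −0.101 |
| | (0.262) | (0.293) | (0.262) | (0.262) |
| Age: 45–54 | −0.240 | −0.119 | −0.239 | −0.239 |
| | (0.259) | (0.287) | (0.259) | (0.259) |
| Age: 55–64 | −0.509 | −0.132 | −0.504 | −0.504 |
| | (0.285) | (0.326) | (0.285) | (0.285) |
| Age: 65+ | −0.819 ** | −0.002 | −0.810 ** | −0.811 ** |
| | (0.289) | (0.329) | (0.289) | (0.288) |
| Asian | −0.184 | −0.001 | −0.182 | −0.182 |
| | (0.102) | (0.261) | (0.103) | (0.103) |
| Latino | −0.002 | 0.004 | −0.001 | −0.001 |
| | (0.182) | (0.262) | (0.182) | (0.182) |
| African American | 0.091 | 0.314 | 0.094 | 0.094 |
| | (0.178) | (0.243) | (0.179) | (0.178) |
| Other race/ethnicity | 0.710 | 0.759 | 0.710 | 0.710 |
| | (0.576) | (0.559) | (0.574) | (0.574) |
| Married | −0.055 | 0.147 | −0.052 | −0.053 |
| | (0.154) | (0.196) | (0.154) | (0.154) |
| Divorced | 0.011 | −0.106 | 0.010 | 0.010 |
| | (0.203) | (0.263) | (0.203) | (0.203) |
| Education 12 years | −0.109 | −0.053 | −0.109 | −0.109 |
| | (0.241) | (0.272) | (0.241) | (0.241) |
| Education 13–15 years | −0.131 | −0.041 | −0.131 | −0.131 |
| | (0.207) | (0.274) | (0.207) | (0.207) |
| Education 16+ years | −0.177 | −0.069 | −0.176 | −0.176 |
| | (0.235) | (0.286) | (0.235) | (0.235) |
| Arthritis | 0.461 * | 0.293 | 0.459 * | 0.459 * |
| | (0.181) | (0.221) | (0.181) | (0.181) |
| Stroke | 0.996 | 0.789 | 0.992 | 0.993 |
| | (1.064) | (1.168) | (1.065) | (1.064) |
| Heart attack | 0.214 | −0.136 | 0.210 | 0.210 |
| | (0.413) | (0.494) | (0.414) | (0.413) |
| Diabetes | 0.296 | 0.095 | 0.293 | 0.293 |
| | (0.305) | (0.337) | (0.305) | (0.305) |
| Ulcer | 0.718 * | 0.229 | 0.713 * | 0.713 * |
| | (0.289) | (0.348) | (0.289) | (0.289) |
| Cancer | 0.388 | 0.124 | 0.385 | 0.385 |
| | (0.262) | (0.339) | (0.261) | (0.261) |
| Midwest | 0.147 | 0.356 | 0.149 | 0.149 |
| | (0.132) | (0.198) | (0.132) | (0.132) |
| South | −0.111 | 0.085 | −0.110 | −0.110 |
| | (0.131) | (0.196) | (0.131) | (0.131) |
| West | −0.109 | 0.017 | −0.108 | −0.108 |
| | (0.109) | (0.186) | (0.110) | (0.110) |
| First stage F-statistic on instrument | | 81.15 | | |
| p value | | <0.001 | | |
| N | 3756 | 3468 | 3756 | 3756 |
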
Notes: Mental illness is a binary variable = 1 if individual meets DSM-IV diagnostic criteria for major depressive episode (MDE) or generalized anxiety disorder (GAD), and 0 otherwise. Reference categories include the following: male, age 18–24, non-Latino White, single, <12 years of education, and northeast. SE, standard error; OLS, ordinary least squares; 2SLS, two-stage least squares estimator; SLF, shared latent factor estimator; SLF + IV, shared latent factor + instrumental variables estimator. * p < 0.05; ** p < 0.01; *** p < 0.001.
Table 9. Effect of mental illness on social interaction interference score in the past month.
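
Dependent variable: Social Interaction Interference Score

| Coefficient (SE) | OLS | 2SLS | SLF | SLF + IV |
|---|---|---|---|---|
| Mental illness | 1.518 *** | 3.386 ** | 1.549 *** | 1.546 *** |
| | (0.266) | (1.218) | (0.268) | (0.265) |
| Female | −0.008 | −0.120 | −0.010 | −0.010 |
| | (0.078) | (0.141) | (0.078) | (0.078) |
| Age: 25–34 | −0.222 | −0.277 | −0.223 | −0.223 |
| | (0.128) | (0.184) | (0.128) | (0.128) |
| Age: 35–44 | −0.017 | −0.034 | −0.017 | −0.017 |
| | (0.129) | (0.184) | (0.129) | (0.129) |
| Age: 45–54 | −0.149 | −0.134 | −0.148 | −0.148 |
| | (0.099) | (0.203) | (0.099) | (0.099) |
| Age: 55–64 | −0.371 * | −0.276 | −0.369 * | −0.369 * |
| | (0.142) | (0.268) | (0.143) | (0.143) |
| Age: 65+ | −0.303 | −0.055 | −0.299 | −0.300 |
| | (0.210) | (0.350) | (0.210) | (0.210) |
| Asian | −0.158 * | −0.121 | −0.157 * | −0.157 * |
| | (0.077) | (0.141) | (0.077) | (0.077) |
| Latino | 0.072 | 0.090 | 0.073 | 0.073 |
| | (0.185) | (0.206) | (0.185) | (0.185) |
| African American | −0.139 | −0.099 | −0.137 | −0.137 |
| | (0.106) | (0.157) | (0.106) | (0.106) |
| Other race/ethnicity | 0.306 | 0.332 | 0.306 | 0.306 |
| | (0.356) | (0.354) | (0.357) | (0.357) |
| Married | −0.059 | 0.015 | −0.058 | −0.058 |
| | (0.089) | (0.123) | (0.089) | (0.090) |
| Divorced | 0.106 | 0.093 | 0.106 | 0.106 |
| | (0.175) | (0.212) | (0.175) | (0.175) |
| Education 12 years | −0.024 | −0.013 | −0.024 | −0.024 |
| | (0.199) | (0.230) | (0.199) | (0.199) |
| Education 13–15 years | 0.078 | 0.094 | 0.079 | 0.079 |
| | (0.207) | (0.251) | (0.207) | (0.207) |
| Education 16+ years | −0.040 | −0.019 | −0.040 | −0.040 |
| | (0.189) | (0.230) | (0.189) | (0.189) |
| Arthritis | 0.095 | −0.013 | 0.094 | 0.094 |
| | (0.122) | (0.117) | (0.121) | (0.121) |
| Stroke | 1.512 | 1.493 | 1.510 | 1.510 |
| | (1.896) | (1.993) | (1.897) | (1.897) |
| Heart attack | −0.254 | −0.358 | −0.256 | −0.256 |
| | (0.298) | (0.261) | (0.298) | (0.298) |
| Diabetes | 0.596 | 0.499 | 0.595 | 0.595 |
| | (0.507) | (0.509) | (0.507) | (0.507) |
| Ulcer | 0.735 * | 0.644 | 0.733 * | 0.733 * |
| | (0.333) | (0.378) | (0.334) | (0.334) |
| Cancer | 0.401 | 0.332 | 0.400 | 0.400 |
| | (0.321) | (0.319) | (0.321) | (0.321) |
| Midwest | 0.130 | 0.173 | 0.131 | 0.131 |
| | (0.066) | (0.104) | (0.066) | (0.066) |
| South | 0.102 | 0.151 | 0.103 | 0.103 |
| | (0.089) | (0.098) | (0.089) | (0.089) |
| West | 0.156 | 0.187 | 0.156 | 0.156 |
| | (0.099) | (0.149) | (0.099) | (0.099) |
| First stage F-statistic on instrument | | 81.15 | | |
| p value | | <0.001 | | |
| N | 3756 | 3468 | 3756 | 3756 |
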
Notes: Mental illness is a binary variable = 1 if individual meets DSM-IV diagnostic criteria for major depressive episode (MDE) or generalized anxiety disorder (GAD), and 0 otherwise. Reference categories include the following: male, age 18–24, non-Latino White, single, <12 years of education, and northeast. SE, standard error; OLS, ordinary least squares; 2SLS, two-stage least squares estimator; SLF, shared latent factor estimator; SLF + IV, shared latent factor + instrumental variables estimator. * p < 0.05; ** p < 0.01; *** p < 0.001.
Table 10. Effect of mental illness on role impairment score.
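
Dependent variable: Role Impairment Score

| Coefficient (SE) | OLS | 2SLS | SLF | SLF + IV |
|---|---|---|---|---|
| Mental illness | 5.908 *** | 20.491 *** | 6.939 *** | 6.912 *** |
| | (0.891) | (4.252) | (0.883) | (0.835) |
| Female | 1.612 *** | 0.503 | 1.639 *** | 1.640 *** |
| | (0.326) | (0.400) | (0.343) | (0.342) |
| Age: 25–34 | −0.103 | 0.079 | −0.121 | −0.121 |
| | (0.630) | (0.705) | (0.673) | (0.673) |
| Age: 35–44 | −0.264 | 0.375 | −0.039 | −0.039 |
| | (0.751) | (0.737) | (0.800) | (0.800) |
| Age: 45–54 | −0.534 | 0.383 | −0.485 | −0.486 |
| | (0.623) | (0.717) | (0.693) | (0.692) |
| Age: 55–64 | −2.171 ** | −1.023 | −2.014 * | −2.016 * |
| | (0.748) | (0.835) | (0.829) | (0.829) |
| Age: 65+ | −1.789 | −0.862 | −1.954 | −1.957 |
| | (1.808) | (1.207) | (1.839) | (1.832) |
| Asian | −0.747 | −0.175 | −0.915 | −0.916 |
| | (0.788) | (1.120) | (0.795) | (0.794) |
| Latino | 0.581 | −0.015 | 0.467 | 0.466 |
| | (1.126) | (0.835) | (1.125) | (1.124) |
| African American | −0.884 | −0.121 | −0.837 | −0.838 |
| | (0.542) | (0.651) | (0.578) | (0.575) |
| Other race/ethnicity | 0.539 | 0.846 | 1.036 | 1.036 |
| | (1.217) | (1.276) | (1.186) | (1.186) |
| Married | 0.362 | 0.361 | 0.326 | 0.325 |
| | (0.516) | (0.539) | (0.534) | (0.534) |
| Divorced | 1.049 | 0.329 | 1.035 | 1.035 |
| | (0.655) | (0.700) | (0.699) | (0.699) |
| Education 12 years | −0.612 | −0.177 | −0.522 | −0.522 |
| | (0.773) | (0.682) | (0.754) | (0.754) |
| Education 13–15 years | −0.036 | 0.070 | −0.110 | −0.110 |
| | (0.765) | (0.689) | (0.696) | (0.695) |
| Education 16+ years | −0.316 | −0.014 | −0.457 | −0.457 |
| | (0.692) | (0.744) | (0.659) | (0.659) |
| Arthritis | 2.596 *** | 2.000 ** | 2.660 *** | 2.661 *** |
| | (0.622) | (0.619) | (0.641) | (0.642) |
| Stroke | 0.746 | −0.058 | 1.430 | 1.432 |
| | (1.382) | (1.370) | (1.670) | (1.670) |
| Heart attack | 1.438 | 1.285 | 1.357 | 1.358 |
| | (1.310) | (1.331) | (1.340) | (1.336) |
| Diabetes | 2.939 * | 1.516 | 3.109 * | 3.110 * |
| | (1.249) | (1.276) | (1.337) | (1.338) |
| Ulcer | 1.725 * | 1.086 | 1.902 ** | 1.904 ** |
| | (0.669) | (0.800) | (0.670) | (0.669) |
| Cancer | 1.817 | 1.746 | 2.049 | 2.051 |
| | (1.045) | (1.020) | (1.114) | (1.113) |
| Midwest | 0.515 | 0.944 | 0.448 | 0.448 |
| | (0.540) | (0.551) | (0.613) | (0.611) |
| South | −0.608 | −0.255 | −0.669 | −0.669 |
| | (0.410) | (0.519) | (0.495) | (0.494) |
| West | 0.342 | 0.445 | 0.469 | 0.468 |
| | (0.524) | (0.581) | (0.601) | (0.601) |
| First stage F-statistic on instrument | | 75.89 | | |
| p value | | <0.001 | | |
| N | 3702 | 3416 | 3702 | 3702 |
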
Notes: Mental illness is a binary variable = 1 if individual meets DSM-IV diagnostic criteria for major depressive episode (MDE) or generalized anxiety disorder (GAD), and 0 otherwise. Reference categories include the following: male, age 18–24, non-Latino White, single, <12 years of education, and northeast. SE, standard error; OLS, ordinary least squares; 2SLS, two-stage least squares estimator; SLF, shared latent factor estimator; SLF + IV, shared latent factor + instrumental variables estimator. * p < 0.05; ** p < 0.01; *** p < 0.001.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Banerjee, S.; Basu, A. Estimating Endogenous Treatment Effects Using Latent Factor Models with and without Instrumental Variables. Econometrics 2021, 9, 14. https://0-doi-org.brum.beds.ac.uk/10.3390/econometrics9010014

AMA Style

Banerjee S, Basu A. Estimating Endogenous Treatment Effects Using Latent Factor Models with and without Instrumental Variables. Econometrics. 2021; 9(1):14. https://0-doi-org.brum.beds.ac.uk/10.3390/econometrics9010014

Chicago/Turabian Style

Banerjee, Souvik, and Anirban Basu. 2021. "Estimating Endogenous Treatment Effects Using Latent Factor Models with and without Instrumental Variables" Econometrics 9, no. 1: 14. https://0-doi-org.brum.beds.ac.uk/10.3390/econometrics9010014

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.