Article

A Binary Choice Model with Sample Selection and Covariate-Related Misclassification

by
Jorge González Chapela
Academia General Militar, Centro Universitario de la Defensa de Zaragoza, 50090 Zaragoza, Spain
Submission received: 18 April 2021 / Revised: 30 July 2021 / Accepted: 20 March 2022 / Published: 23 March 2022

Abstract

Misclassification of a binary response variable and nonrandom sample selection are data issues frequently encountered by empirical researchers. For cases in which both issues feature simultaneously in a data set, we formulate a sample selection model for a misclassified binary outcome in which the conditional probabilities of misclassification are allowed to depend on covariates. Assuming the availability of validation data, the pseudo-maximum likelihood technique can be used to estimate the model. The performance of the estimator accounting for misclassification and sample selection is compared to that of estimators offering partial corrections. An empirical example illustrates the proposed framework.

1. Introduction

Misclassification (or miscategorization) of a binary response variable and nonrandom sample selection are data issues frequently encountered by empirical researchers. It is by now well-known that errors in binary variables cannot be classical and thus lead to bias (see, for example, Hausman et al. 1998; Meyer and Mittag 2017). Additionally, regressions estimated on selected samples do not in general estimate the population parameters of interest (Heckman 1974, 1979). While there are methods for estimating models of misclassified binary responses (surveyed in Meyer and Mittag 2017), as a rule they assume the availability of a random sample. Likewise, the majority of the literature dealing with sample selection bias (surveyed in Vella 1998) assumes that the response variable of interest is measured accurately.
For cases in which misclassification of a binary response variable and nonrandom sample selection feature simultaneously in a data set, Arezzo and Guagnano (2019) formulated a model in which the probabilities of misclassification depend on the true values of the binary response but are otherwise independent of the covariates. In many cases, however, the assumption of conditionally random misclassification does not hold. Examples include classification errors varying across demographic groups or economic conditions in survey reports of unemployment status, participation in welfare programs, voter turnout, and debt repayment status (Levine 1993; Poterba and Summers 1995; Bollinger and David 1997, 2001; Bound et al. 2001; Davern et al. 2009; Katz and Katz 2010; Aller and González 2013). Applying an estimator that assumes misclassification to be conditionally random when this assumption is false yields biased estimates, sometimes even more so than estimators that do not correct for misclassification (Meyer and Mittag 2017).
This paper generalizes Arezzo and Guagnano’s (2019) formulation by allowing the misclassification probabilities to depend on covariates. When misclassification is not conditionally random, one can still obtain consistent estimates if validation data or a model of misclassification are available. Assuming the availability of validation data (or other out-of-sample information), a two-step approach can be used to estimate the parameters of interest (Bollinger and David 1997). In the first step, the misclassification probabilities are predicted for specific subgroups from cross-tabulations or models of misclassification run on validation data. Second, the predicted probabilities of misclassification are incorporated into a model of a mismeasured binary response variable estimated on the observed data by the method of (pseudo-)maximum likelihood (ML) (e.g., Gourieroux et al. 1984). To this “predicted probabilities estimator” (PPE) (Meyer and Mittag 2017), we add the property that estimation in the second step can be conducted on a nonrandomly selected sample. If validation data are not available, then a model of misclassification dependent on covariates (or other observables) is needed (e.g., Aller and González 2013). This model can be incorporated into the formulation developed by Arezzo and Guagnano (2019).
The sample selection mechanism considered in this paper is of the probit type. This setup applies to problems where units can be randomly drawn from the population, but due to some variables taking on particular values (incidental truncation) or unit/item nonresponse, we have missing data on the binary outcome of interest. For choice-based samples, Ramalho (2002) developed an estimator for a miscategorized discrete response variable under the assumption of conditionally random misclassification. Extending Ramalho’s (2002) estimator to the case where misclassification is related to the covariates is left for future research.
Our model specification is based on the sample selection model introduced by Heckman (1974) under the assumption of bivariate normality between the outcome of interest and the selection propensity. Van de Ven and van Praag (1981) and Dubin and Rivers (1989) extended Heckman’s framework to deal with binary outcomes. For a misclassified binary outcome observed in a panel survey, Bollinger and David (2001) developed a model dealing with survey participation missing at random and misclassification dependent on covariates. Similarly, Katz and Katz’s (2010) Bayesian procedure can account for data missing at random and covariate-related misclassification. On the other hand, Arezzo and Guagnano’s (2019) model accounts for nonrandom selection but assumes conditionally random misclassification.
The rest of the paper is organized as follows. Section 2 formulates a bivariate probit model dealing with nonrandom sample selection and conditionally nonrandom misclassification and develops the proposed PPE. The finite-sample performance of the proposed and other related estimators is evaluated through simulation in Section 3. Section 4 contains an empirical application to a model of lifetime migration. Section 5 concludes the paper. Appendix A presents the technical material.

2. Model Specification

We start by specifying the probit model with sample selection of Van de Ven and van Praag (1981). Then, we adapt it to manage covariate-related misclassification in the binary outcome of interest.
The probit model with sample selection can be written as
$$y_i^T = 1[X_{1i}\beta_1 + \varepsilon_{1i} > 0] \tag{1}$$
$$s_i = 1[X_{2i}\beta_2 + \varepsilon_{2i} > 0] \tag{2}$$
where $1[\cdot]$ is the indicator function, $X_{1i}$ and $X_{2i}$ are vectors of observed regressors, $\beta_1$ and $\beta_2$ are unknown vectors of parameters, and $\varepsilon_{1i}$ and $\varepsilon_{2i}$ are error terms from a bivariate normal distribution
$$\begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \end{pmatrix} \sim N_2\left\{ \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right\} \tag{3}$$
and independent of $X_i = (X_{1i}, X_{2i})$. The binary outcome $y_i^T$ is observed only when $s_i = 1$. A sample selection bias arises when $\rho \neq 0$ and probit is used to estimate $\beta_1$ in (1). Identification of $\beta = (\beta_1, \beta_2)$ is assured whenever $X_{2i}$ includes at least one variable that is not also in $X_{1i}$.
When $y_i^T$ is observed, the density of $(y_i^T, s_i)$ is given by
$$P(y_i^T = y^T, s_i = 1) = \Phi_2(w_{1i}, X_{2i}\beta_2, \rho_i^*) = P(y_i^T = y^T \mid s_i = 1)\,\Phi(X_{2i}\beta_2) \tag{4}$$
where $\Phi(\cdot)$ and $\Phi_2(\cdot)$ are the cdfs of the standard univariate and bivariate normal distributions, respectively. In expression (4), the first equality is written using notation adapted from Greene (2003): $w_{1i} = q_{1i}X_{1i}\beta_1$, $q_{1i} = 2y_i^T - 1$, and $\rho_i^* = q_{1i}\rho$. The second equality (utilized below) is the multiplication rule of probabilities. When $y_i^T$ is not observed, the density is simply
$$P(s_i = 0) = \Phi(-X_{2i}\beta_2) \tag{5}$$
so the log-likelihood function for a sample of size $N$ is
$$l(\beta, \rho) = \sum_{i=1}^{N} \left\{ s_i \ln \Phi_2(w_{1i}, X_{2i}\beta_2, \rho_i^*) + (1 - s_i) \ln \Phi(-X_{2i}\beta_2) \right\}. \tag{6}$$
The estimator of ( β , ρ ) obtained by maximizing (6) is called Heckprobit (StataCorp 2019).
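As a rough illustration of how the log-likelihood (6) can be evaluated for given parameter values, consider the following Python sketch. It is not the Stata implementation: the function names `bvn_cdf` and `heckprobit_loglik`, the list-based data layout, and the Simpson's-rule approximation of $\Phi_2$ are assumptions made here to keep the example self-contained.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal


def bvn_cdf(a, b, rho, steps=400):
    """Approximate P(Z1 <= a, Z2 <= b) for a standard bivariate normal
    with correlation rho, integrating phi(x) * Phi((b - rho*x)/sqrt(1-rho^2))
    over x in [-8, a] with Simpson's rule (steps must be even)."""
    lo = -8.0
    if a <= lo:
        return 0.0
    scale = math.sqrt(1.0 - rho * rho)
    h = (a - lo) / steps

    def f(x):
        return _N.pdf(x) * _N.cdf((b - rho * x) / scale)

    total = f(lo) + f(a)
    for j in range(1, steps):
        total += (4 if j % 2 else 2) * f(lo + j * h)
    return total * h / 3.0


def heckprobit_loglik(beta1, beta2, rho, y_true, s, X1, X2):
    """Log-likelihood (6). y_true[i] is only used when s[i] == 1."""
    ll = 0.0
    for yi, si, x1, x2 in zip(y_true, s, X1, X2):
        z2 = sum(b * v for b, v in zip(beta2, x2))
        if si == 1:
            q1 = 2 * yi - 1                      # +1 if y = 1, -1 if y = 0
            w1 = q1 * sum(b * v for b, v in zip(beta1, x1))
            ll += math.log(bvn_cdf(w1, z2, q1 * rho))
        else:
            ll += math.log(_N.cdf(-z2))          # P(s_i = 0)
    return ll
```

In practice the function would be handed to a numerical optimizer; a library routine for the bivariate normal cdf would normally replace the quadrature used here.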
Now, let us suppose $y_i^T$ may be mismeasured, that is, a true 1 may be misclassified as a 0 and a true 0 may be misclassified as a 1. We let $y_i$ denote the mismeasured outcome indicator, which is observed only when $s_i = 1$.
Although we are interested in the probabilities of misclassification in the selected sample, if the selection mechanism is not operating in the validation data, only the probabilities of misclassification in the population can be calculated. To handle this case, we assume
$$P(y_i = 1 \mid s_i = 1, y_i^T = 0) = P(y_i = 1 \mid y_i^T = 0) = \alpha_{0i} \tag{7}$$
$$P(y_i = 0 \mid s_i = 1, y_i^T = 1) = P(y_i = 0 \mid y_i^T = 1) = \alpha_{1i}. \tag{8}$$
In these expressions, the first equality means that misclassification is conditionally independent from sample selection, or in other words that the misclassification mechanism operating in the population operates in the selected sample as well. The second equality defines what are usually called the conditional probabilities of misclassification, where conditional here refers to the true response. Overall, expressions (7) and (8) allow the conditional probabilities of misclassification in the selected sample to be calculated from validation data even if these data are not affected by selection.
The total probability theorem is used to derive the conditional (on selection) probability of an observed mismeasured outcome:
$$P(y_i = y \mid s_i = 1) = y_i \alpha_{0i} + (1 - y_i)\alpha_{1i} + (1 - \alpha_{0i} - \alpha_{1i})\,P(y_i^T = y^T \mid s_i = 1). \tag{9}$$
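The total-probability step can be checked with a few lines of Python. In this minimal sketch the function name `prob_observed` and its argument `p_true_one` (standing for $P(y_i^T = 1 \mid s_i = 1)$) are illustrative choices, and the true-outcome probability is evaluated at the reported value $y$.

```python
def prob_observed(y, p_true_one, a0, a1):
    """P(y_i = y | s_i = 1) by the law of total probability.

    p_true_one : P(y_i^T = 1 | s_i = 1)
    a0, a1     : false-positive and false-negative probabilities
    """
    # Probability that the true outcome equals the reported value y.
    p_true_match = p_true_one if y == 1 else 1.0 - p_true_one
    return y * a0 + (1 - y) * a1 + (1.0 - a0 - a1) * p_true_match


# With p = 0.3, a0 = 0.05, a1 = 0.20, a reported 1 arises either as a
# false positive (0.05 * 0.7) or a correctly reported 1 (0.80 * 0.3).
p1 = prob_observed(1, 0.3, 0.05, 0.20)
p0 = prob_observed(0, 0.3, 0.05, 0.20)
```

The two probabilities sum to one, as they must for a binary report.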
The log-likelihood function based on observations $(y_i, s_i)$ is
$$\begin{aligned} l(\alpha, \beta, \rho) &= \sum_{i=1}^{N} \left\{ s_i \ln\left[ P(y_i = y \mid s_i = 1)\,\Phi(X_{2i}\beta_2) \right] + (1 - s_i)\ln \Phi(-X_{2i}\beta_2) \right\} \\ &= \sum_{i=1}^{N} \left\{ s_i \ln\left[ \left( y_i\alpha_{0i} + (1 - y_i)\alpha_{1i} \right)\Phi(X_{2i}\beta_2) + (1 - \alpha_{0i} - \alpha_{1i})\,\Phi_2(w_{1i}, X_{2i}\beta_2, \rho_i^*) \right] + (1 - s_i)\ln \Phi(-X_{2i}\beta_2) \right\} \end{aligned} \tag{10}$$
where $\alpha$ is the vector with $(\alpha_{0i}, \alpha_{1i})$, $i = 1, \ldots, N$, and the second equality follows from (9) and the second equality in (4).
The estimation framework provided by expression (10) encompasses several models that correct for misclassification and/or sample selection. When the binary outcome of interest is correctly classified, that is, $\alpha_{0i} = \alpha_{1i} = 0$, then (10) collapses to (6). When $\alpha_{0i}$ and $\alpha_{1i}$ are constants, that is, $\alpha_{0i} = \alpha_0$ and $\alpha_{1i} = \alpha_1$, then (10) becomes the log-likelihood function of Arezzo and Guagnano (2019). When $P(s_i = 1) = 1$, then (10) conforms to the log-likelihood function of a probit model accounting for misclassification (e.g., Meyer and Mittag 2017):
$$l(\alpha, \beta_1) = \sum_{i=1}^{N} \ln\left[ \left( y_i\alpha_{0i} + (1 - y_i)\alpha_{1i} \right) + (1 - \alpha_{0i} - \alpha_{1i})\,\Phi(w_{1i}) \right]. \tag{11}$$
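The special case (11) is simple enough to write out directly. The sketch below is only an illustration: the function name and the list-based inputs are assumptions, and $w_{1i}$ is built from the reported outcome $y_i$ (since the true outcome is not observed), which is our reading of the notation.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal


def misclass_probit_loglik(beta1, y, X1, a0, a1):
    """Log-likelihood (11): probit with observation-specific
    misclassification probabilities a0[i] (false positive) and
    a1[i] (false negative), treated as known or predicted."""
    ll = 0.0
    for yi, x, a0i, a1i in zip(y, X1, a0, a1):
        index = sum(b * v for b, v in zip(beta1, x))
        w1 = (2 * yi - 1) * index          # sign flips for y = 0
        cc = yi * a0i + (1 - yi) * a1i
        ll += math.log(cc + (1.0 - a0i - a1i) * _N.cdf(w1))
    return ll
```

Setting every `a0[i]` and `a1[i]` to zero reduces the function to the ordinary probit log-likelihood, mirroring the way (10) collapses to (6).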
In expression (10), the unknown parameters $(\alpha, \beta, \rho)$ are unidentified, as there are $2N + \dim(\beta) + 1$ of them. Assuming the availability of validation data (or other out-of-sample information), Bollinger and David’s (1997) two-step procedure can be used to estimate $(\beta, \rho)$. First, the conditional probabilities of misclassification are predicted from cross-tabulations or models of classification errors run on validation data. The researcher estimating the model does not need access to the validation data themselves: the prediction step may be conducted by other researchers with direct access to those data. Second, after replacing $\alpha_{0i}$ and $\alpha_{1i}$ in (10) with their estimates, the resulting expression is maximized with respect to $\beta$ and $\rho$. If $\alpha_{0i}$ and $\alpha_{1i}$ are consistently estimated, this PPE of $(\beta, \rho)$ is consistent and asymptotically efficient. Standard errors need to be adjusted for the estimation of the conditional probabilities of misclassification unless these probabilities are precisely estimated or assumed to be known. Bollinger and David (1997) provided the formula for the asymptotic variance matrix. The bootstrap can also be used.
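The first step of the two-step procedure can be as simple as a cross-tabulation. The following sketch assumes a hypothetical validation file reduced to triples `(group, y_true, y_reported)`; the function name and the data layout are illustrative and are not taken from any of the cited studies.

```python
from collections import defaultdict


def misclassification_rates(records):
    """Estimate (alpha0, alpha1) by subgroup from validation data.

    records : iterable of (group, y_true, y_reported)
    returns : {group: (alpha0_hat, alpha1_hat)}, where alpha0_hat is the
              share of true 0s reported as 1 and alpha1_hat is the share
              of true 1s reported as 0.
    """
    # Per group: [number of true 0s, false positives,
    #             number of true 1s, false negatives]
    counts = defaultdict(lambda: [0, 0, 0, 0])
    for group, y_true, y_rep in records:
        c = counts[group]
        if y_true == 0:
            c[0] += 1
            c[1] += int(y_rep == 1)
        else:
            c[2] += 1
            c[3] += int(y_rep == 0)

    rates = {}
    for group, (n0, fp, n1, fn) in counts.items():
        a0 = fp / n0 if n0 else float("nan")
        a1 = fn / n1 if n1 else float("nan")
        rates[group] = (a0, a1)
    return rates
```

Each observation in the estimation sample would then be assigned the pair of rates of the subgroup it belongs to before the second-step maximization.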
Function (10) is not globally concave in $(\beta, \rho)$, and since estimators corresponding to local maxima may have no useful properties, a number of steps are taken to increase the chance that the maximum obtained is global.$^{1}$ Maximizations are conducted using the Newton–Raphson algorithm combined with steepest ascent, supplying the maximization routine with analytical first and second derivatives of (10).$^{2}$ Probit estimates of the selection Equation (2), $\hat{\beta}_2^{Probit}$, provide the initial values for $\beta_2$ and are also utilized to calculate the inverse Mills ratio for the $n$ observations with data available on $y_i$:
$$\hat{\lambda}_i = \frac{\phi(X_{2i}\hat{\beta}_2^{Probit})}{\Phi(X_{2i}\hat{\beta}_2^{Probit})}, \quad i = 1, \ldots, n \tag{12}$$
where $\phi$ is the standard normal pdf. Initial values for $\beta_1$ and $\rho$ are obtained from Heckman’s (1979) two-step method applied to the linear probability model (LPM)
$$P(y_i^T = 1 \mid s_i = 1) = X_{1i}\beta_1^{LPM} + \beta_\lambda \hat{\lambda}_i \tag{13}$$
augmented with misclassification:
$$P(y_i = 1 \mid s_i = 1) = \alpha_{0i} + (1 - \alpha_{0i} - \alpha_{1i})\left( X_{1i}\beta_1^{LPM} + \beta_\lambda \hat{\lambda}_i \right). \tag{14}$$
After replacing $\alpha_{0i}$ and $\alpha_{1i}$ with their estimates, Equation (14) is estimated by ordinary least squares without an intercept and constraining the coefficient of $\alpha_{0i}$ to unity. The initial value of $\beta_1$ is $\hat{\beta}_1^{LPM} \times 2.5$,$^{3}$ while that of $\rho$ is
$$\hat{\rho}^{LPM} = \frac{\hat{\beta}_\lambda}{\hat{\sigma}} \tag{15}$$
where
$$\hat{\sigma}^2 = \frac{e'e + \beta_\lambda^2 \sum_{i=1}^{n} \omega_i}{n} \tag{16}$$
$$\omega_i = \hat{\lambda}_i \left( \hat{\lambda}_i + X_{2i}\hat{\beta}_2^{Probit} \right) \tag{17}$$
and $e'e$ is the sum of squared residuals of regression (14) (Cameron and Trivedi 2005, p. 550).
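The starting value for $\rho$ involves only a few arithmetic steps once the first-stage probit index and the least-squares output are available. The sketch below assumes those inputs are already computed; the function names and arguments are illustrative, and the least-squares step itself is not shown.

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal


def inverse_mills(index):
    """phi(index) / Phi(index), where index is the fitted probit index
    of the selection equation for one selected observation."""
    return _N.pdf(index) / _N.cdf(index)


def rho_start(beta_lambda_hat, ssr, indices):
    """Starting value for rho.

    beta_lambda_hat : estimated coefficient on the inverse Mills ratio
    ssr             : sum of squared residuals of the LPM regression
    indices         : fitted probit indices for the n selected observations
    """
    n = len(indices)
    lambdas = [inverse_mills(v) for v in indices]
    omega = [lam * (lam + v) for lam, v in zip(lambdas, indices)]
    sigma2 = (ssr + beta_lambda_hat ** 2 * sum(omega)) / n
    return beta_lambda_hat / math.sqrt(sigma2)
```

Nothing in this calculation forces the result into $(-1, 1)$, so a value outside that interval would have to be pulled back before being used as a starting point.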

3. Monte Carlo Study

This section reports the results of a Monte Carlo study designed to investigate the properties of the estimator proposed above and of other related estimators in situations characterized by different models of misclassification, probabilities of selection, and correlation between $\varepsilon_{1i}$ and $\varepsilon_{2i}$. We used the data generating process considered in Arezzo and Guagnano (2019):
$$y_i^T = 1[\beta_{10} + \beta_{11}X_{11i} + \beta_{12}X_{12i} + \beta_{13}X_{13i} + \varepsilon_{1i} > 0] \tag{18}$$
$$s_i = 1[\beta_{20} + 0.8X_{21i} - 0.5X_{22i} + \varepsilon_{2i} > 0] \tag{19}$$
where $X_{11}$ ~.., $X_{12}$ is a dummy variable equal to one with probability of 1/3, $X_{13} \sim U(0, 1)$, $X_{21}$ and $X_{22}$ are standard normal variates, and the parameters $(\beta_{10}, \beta_{11}, \beta_{12}, \beta_{13})$ are set equal to $(1, 0.2, 1.5, 0.6)$, respectively. The parameter $\beta_{20}$ is set alternatively at $\{0.5, 2.18\}$, producing a moderate or a low amount of incidental truncation in the data (about 35% and 5%, respectively). The value of $\rho$ is set alternatively at $\{0.2, 0.8\}$.
Three models of misclassification were considered. Misclassification Model 1 (MM1) is conditionally random misclassification with $\alpha_{0i} = 0.05$ and $\alpha_{1i} = 0.20$. Misclassification Models 2 and 3 (MM2 and MM3) allow $\alpha_{0i}$ and $\alpha_{1i}$ to depend on covariates. In MM2, the values taken by $\alpha_{0i}$ and $\alpha_{1i}$ are listed in Table 1, where the first (second) entry in a cell with numerical entries gives $\alpha_{0i}$ ($\alpha_{1i}$). MM2 represents a case in which the conditional probabilities of misclassification are calculated from tabulations of validation data or other out-of-sample information, as in, for example, Levine (1993) and Poterba and Summers (1995). The values of $\alpha_{0i}$ and $\alpha_{1i}$ given in Table 1 ensure that the average probability of false positives (false negatives) is 0.05 (0.20).
MM3 has $\alpha_{0i}$ and $\alpha_{1i}$ predicted from probit models of classification errors run on validation data:
$$\alpha_{0i} = \Phi(-1.5 - 0.1X_{11i} - 0.1X_{12i}) \tag{20}$$
$$\alpha_{1i} = \Phi(-0.5 - 0.2X_{11i} - 0.3X_{12i}) \tag{21}$$
as in, for example, Bollinger and David (1997). The values of the parameters in expressions (20) and (21) ensure that the average probability of false positives (false negatives) is close to 0.05 (0.20). Note that in any of these three models of misclassification, $\alpha_{0i} + \alpha_{1i} < 1 \ \forall i$, implying that the outcome of interest is not so mismeasured that its analysis should probably be abandoned (Hausman et al. 1998).
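To make the mechanics of covariate-related misclassification concrete, the sketch below draws a reported outcome from a true outcome given the two conditional probabilities. The function names, the use of Python's `random.Random`, and the coefficient arguments of `probit_rate` are illustrative assumptions; this is not the Stata code used for the simulations.

```python
import random
from statistics import NormalDist

_N = NormalDist()  # standard normal


def probit_rate(const, b11, b12, x11, x12):
    """Misclassification probability of the probit form
    Phi(const + b11 * x11 + b12 * x12)."""
    return _N.cdf(const + b11 * x11 + b12 * x12)


def misclassify(y_true, a0, a1, rng):
    """Return the reported outcome: a true 0 is reported as 1 with
    probability a0, and a true 1 is reported as 0 with probability a1."""
    u = rng.random()
    if y_true == 0:
        return 1 if u < a0 else 0
    return 0 if u < a1 else 1


rng = random.Random(0)  # fixed seed, for reproducibility only
reported = [misclassify(1, 0.05, 0.20, rng) for _ in range(10)]
```

Under MM1 the same pair `(a0, a1)` is used for every observation, whereas under MM3 each observation would receive its own pair computed from its covariates with `probit_rate`.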
Seven estimators were compared in the Monte Carlo study.$^{4}$ Probit of $y_i$ on $X_{1i}$ run on the selected sample corrects neither for misclassification nor for sample selection. The estimator called HAS-Probit in Meyer and Mittag (2017) and the one we denote PP-Probit correct for misclassification only. Following Hausman et al. (1998), HAS-Probit assumes that $\alpha_{0i} = \alpha_0$ and $\alpha_{1i} = \alpha_1$, so $\alpha_0$, $\alpha_1$, and $\beta_1$ can be estimated by maximizing (11) on the selected sample. PP-Probit maximizes (11) with respect to $\beta_1$ on the selected sample after replacing $\alpha_{0i}$ and $\alpha_{1i}$ with $\hat{\alpha}_{0i}$ and $\hat{\alpha}_{1i}$ estimated in the first step. The fourth estimator, Heckprobit, corrects for sample selection only.
The other three estimators can correct for misclassification and sample selection. The estimators developed in Arezzo and Guagnano (2019) and this paper are denoted HAS-Heckprobit1 and PP-Heckprobit, respectively. HAS-Heckprobit2 is a modification of the former, allowing the misclassification probabilities to depend on $X_{11}$ and $X_{12}$ as specified in Table 1. If $r = \{1, 2\}$ and $c = \{1, 2\}$ are sets of indices for the rows and columns with numerical entries of Table 1, the conditional probability of an observed mismeasured outcome for an individual with values of $X_{11}$ and $X_{12}$ corresponding to the cell $r,c$ is
$$P(y_{irc} = y \mid s_i = 1) = y_{irc}\alpha_{0rc} + (1 - y_{irc})\alpha_{1rc} + (1 - \alpha_{0rc} - \alpha_{1rc})\,P(y_{irc}^T = y^T \mid s_i = 1). \tag{22}$$
Thus, the log-likelihood based on observations $(y_{irc}, s_i)$ can be written as
$$l(\alpha, \beta, \rho) = \sum_{i=1}^{N} \left\{ s_i \ln\left[ \left( y_{irc}\alpha_{0rc} + (1 - y_{irc})\alpha_{1rc} \right)\Phi(X_{2i}\beta_2) + (1 - \alpha_{0rc} - \alpha_{1rc})\,\Phi_2(w_{1irc}, X_{2i}\beta_2, \rho_i^*) \right] + (1 - s_i)\ln \Phi(-X_{2i}\beta_2) \right\}. \tag{23}$$
HAS-Heckprobit2 maximizes (23) over $((\alpha_{0rc}, \alpha_{1rc}), \beta, \rho)$ (a total of $8 + 7 + 1$ parameters) under the condition $\alpha_{0rc} + \alpha_{1rc} < 1$ for each cell $r,c$.
The analysis focused on the mean bias and the standard deviation across simulations. Bias is the relative difference between the estimate and the parameter value, averaged across simulations. For each estimator, the results of 12 different experiments are presented, combining three models of misclassification, two different $\beta_{20}$, and two different $\rho$. Each experiment consisted of 500 simulations with $N = 5000$.
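The two summary measures can be computed as in the short sketch below. The function name is illustrative, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which divisor was used.

```python
from statistics import mean, stdev


def summarize(estimates, true_value):
    """Return (relative bias, standard deviation) across simulations.

    Relative bias is the mean of (estimate - true) / true; the standard
    deviation uses the n - 1 divisor (an assumption of this sketch)."""
    rel_bias = mean([(est - true_value) / true_value for est in estimates])
    return rel_bias, stdev(estimates)


# Three hypothetical estimates of a parameter whose true value is 1.0.
bias, sd = summarize([0.9, 1.1, 1.0], 1.0)
```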
The results of the Monte Carlo experiments for $\hat{\beta}$ and $\hat{\rho}$ are presented in Table 2, Table 3 and Table 4. These tables also list the number of convergences achieved in each experiment.$^{5}$ As expected (e.g., Hausman et al. 1998; Meyer and Mittag 2017), Probit presented significantly downward biased means and standard deviations. In terms of both bias and variance, Heckprobit offered little improvement. Under MM1, correcting for misclassification greatly reduced the bias, especially when $\rho$ was small. PP-Probit tended to outperform HAS-Probit in terms of both bias and variance, but we stacked the deck in favor of PP estimators by using the true misclassification probabilities and by treating them as known parameters. Under MM2 and MM3, PP-Probit performed quite well, especially when $\rho$ was small. In contrast, HAS-Probit was biased, sometimes significantly so. The bias of HAS-Probit tended to be upward under MM2 and was not clear-cut under MM3.
Both HAS-Heckprobit1 and PP-Heckprobit performed well when misclassification was random. Under this scenario, HAS-Heckprobit1 outperformed HAS-Heckprobit2, especially when the degree of incidental truncation in the data was moderate. The use of additional information in the form of $\hat{\alpha}_{0i}$ and $\hat{\alpha}_{1i}$ positively affected the accuracy of PP-Heckprobit vis-à-vis HAS-Heckprobit estimators. Under MM2 and MM3, HAS-Heckprobit1 generally obtained worse results in terms of bias for $\hat{\beta}_1$ and sometimes produced substantial convergence failures. However, it still yielded good results in terms of bias for $\hat{\beta}_2$. HAS-Heckprobit2 produced many convergence failures. This estimator outperformed HAS-Heckprobit1 under MM2 (which is not surprising) and MM3, in the latter case especially when the degree of incidental truncation in the data was moderate. In nearly all settings, PP-Heckprobit performed markedly better than the other estimators and showed few convergence failures. When estimation was conducted with PP-Heckprobit, the difference between the simulation mean and the true value of the parameter was usually below 1%.
The results for the alpha parameters are presented in Table 5. Results are limited to cases where the estimator and the misclassification model match. HAS-Probit and HAS-Heckprobit1 performed well in terms of both bias and variance. HAS-Heckprobit2 performed well only in a few cases. Estimates for $\hat{\alpha}_{0\,12}$ and $\hat{\alpha}_{0\,22}$ were heavily biased and imprecise. A possible explanation for this behavior is the existence of local maxima, as only one set of starting values was tried in each simulation. (Using an alternative set of starting values for $\alpha_{0\,12}$ and $\alpha_{0\,22}$ did not offer an improvement.) The degree of bias and imprecision seemed to diminish as the number of convergences achieved by HAS-Heckprobit2 increased. However, increasing the number of simulations to 1000 hardly reduced the bias of $\hat{\alpha}_{0\,12}$ and $\hat{\alpha}_{0\,22}$ (results not shown).
Table 6 shows additional results for PP-Heckprobit when the value of $\rho$ was set alternatively at $\{0.9, 0.8, 0.9\}$. PP-Heckprobit’s good properties were preserved. However, when $\beta_{20} = 2.18$ (low incidental truncation), the number of convergence failures increased somewhat at higher values of $\rho$.

4. Application to a Lifetime Migration Model

Data from the Survey of Financial Competences (referred to here by its Spanish abbreviation ECF) were used by González (2022) to investigate whether cross-region migrants in Spain are less impatient than individuals who choose to remain in their birth region. Previous studies on the empirical link between time preference and migration involve small samples or do not control for cognitive skills (Gibson and McKenzie 2011; Arcand and Mbaye 2013; Nowotny 2014; Goldbach and Schlüter 2018). The ECF (Banco de España and National Securities Market Commission 2018), conducted in Spain in 2016, enables the empirical link between time preference and migration to be investigated on the basis of a large sample and purged of the influence of cognitive skills.
The residential history available in the ECF is limited to the region of birth and the region of residence at the time of the survey, thus raising misclassification of lifetime cross-region migrant status. As argued by Molloy et al. (2011), some true migrants will have returned to their birth region after having spent time elsewhere, whereas individuals who moved when they were still a member of their parents’ household are indistinguishable in the data from individuals who moved during their adult lives. The 2011 edition of the Spanish Population Census, conducted by the National Statistics Institute (INE, www.ine.es, accessed on 8 July 2020), provides a validation sample for migrant status. In addition to information on residence at birth and at the Census date, the Census indicates the year of arrival in the region of residence, which reveals interim moves between birth and the census date, and, in conjunction with information on minimum working age, provides a basis for inferring the autonomy of migration decisions.
The conditional probabilities of misclassifying migrant status, estimated using Census data, are incorporated into (11), which is intended to be maximized on a sample of 7129 Spanish natives drawn from the ECF. However, 433 of them have their birth region undisclosed by the ECF to preserve confidentiality, so maximization was conducted on the subsample of 6696 individuals with observable migrant status ($s_i = 1$). Individuals in the subsample are significantly different from individuals with undisclosed birth region in some observables. If individuals in the subsample were also selected in terms of unobservables affecting their true migrant status, the results would be contaminated by sample selection bias. The estimator developed in this paper makes it possible to conduct a sensitivity analysis of the results reported in González (2022) to sample selection bias.
We let $y_i$ take on value 1 if individual $i$ resides in a region other than that where he/she was born and value 0 if $i$ resides in the same region where he/she was born. The population proportion of out-of-birth region residents is 14.5% in the ECF and 19.3% in the Census.$^{6}$ The proportion of true migrants, calculated in the Census, is 17.4%. The lower proportion of true migrants is the result of 27% of true migrants returning to their birth region (false nonmigrants) and 8% of true nonmigrants migrating nonautonomously as a child (false migrants). The probabilities of misclassification are not constant but vary with observed personal attributes. These probabilities are estimated on very large samples and hence are considered as known parameters.
The ECF included a Money Earlier or Later (MEL) task to measure time preferences (e.g., Cohen et al. 2020). Respondents were presented sequentially with two hypothetical binary choices between immediate and delayed monetary rewards. According to their choices, they were sorted into four groups, which are described in terms of required rates of return (RRRs):$^{7}$ below 4.9% (22.7% of the sample), between 4.9% and 9.8% (10.7%), between 9.8% and 44.9% (29.6%), and above 44.9% (37.0%). Higher levels of RRR reflect greater impatience.
Results were developed for six specifications of $X_{1i}$, corresponding to three functions of RRR and two sets of controls. Time preference was measured alternatively with indicators for RRR group, an indicator for an RRR > 9.8%, and a quadratic function of RRR.$^{8}$ The first set of controls comprised sex, single-year age group, and birth region. These characteristics are conceivably exogenous to an individual’s mobility decisions and appear related in the data (in the case of age and sex) to time preference. The second set of controls added attained education, the number of books at home at the age of 10, cognitive skills, willingness to take risks in financial matters, and the marginal propensity to consume (MPC) from windfall income (in percentage terms). Education and the number of books read might simultaneously increase a person’s ability to appreciate the future (Becker and Mulligan 1997) and to live in other places, confounding the relation of interest. Likewise, impatience appears systematically related to risk aversion and cognitive ability (e.g., Dohmen et al. 2010), which are strong predictors of geographic mobility (e.g., Jaeger et al. 2010; Bütikofer and Peri 2021). Krupka and Stephens (2013) found that measured rates of time preference are responsive to individuals’ immediate economic conditions. In this respect, the MPC may control for individuals’ economic resources at the interview date. $X_{2i}$ comprises these same regressors except birth region (which is unknown for individuals with unobservable migrant status) and the individual’s age (which was entered as a linear trend), plus the population of the region of residence on 1 January 2017 (taken from INE Cifras Oficiales de Población).
Before presenting the results, an issue requires some discussion. For the six specifications given above, PP-Heckprobit produced $\hat{\rho}$ near +1.0 with no or an implausible standard error. As demonstrated in Butler (1996), this problem is caused by a sample with no observations for which $s_i = 1$ and $X_{1i}\hat{\beta}_1$ $X_{2i}\hat{\beta}_2$. A value of $\rho$ on the boundary of the parameter space invalidates ML standard errors. Following Butler (1996), the standard errors for $\hat{\beta}$ can be calculated by re-estimating the model under the assumption that $\rho = +1.0$. The components of the likelihood function for $\rho = +1.0$ are presented in Table 7. These components are simple adaptations of those derived by Butler (1996) to a mismeasured response variable.
The estimation output yielded by PP-Heckprobit is presented in Table 8 for the six specifications pointed out above. For comparison purposes, Table 9 shows Probit estimates of the reduced-form Equation (2) and PP-Probit estimates of the outcome Equation (1). If $\rho$ were zero, the sum of the log-likelihood values from these two equations would equal the log likelihood of the probit model with sample selection and misclassification.
As the selection equation was affected neither by sample selection nor by misclassification of the dependent variable, it is not surprising that $\hat{\beta}_2$ is similar in the two tables. The population of the region of residence was a strong predictor of having the birth region disclosed by the ECF, with the probability of disclosing the birth region increasing with population size. Attained education had a negative effect on this probability, with having a higher education showing the strongest effect. Having more than 200 books at home at the age of 10 was negatively associated with the probability of disclosing the birth region. Individuals’ numeracy skills and (in the full specification) age exerted a negative effect as well.
Correcting for possible sample selection bias using PP-Heckprobit had little effect on the estimated coefficients and associated standard errors yielded by PP-Probit, thus leaving the main results of González (2022) almost unchanged. The RRR for financial flows and the probability of ever migrating appeared to be inversely related even after accounting for individuals’ cognitive skills. Of course, the sample selection correction might be important in other contexts with different probabilities of selection and different correlation between error terms.

5. Conclusions

In many cases, the assumption that misclassification of a binary outcome is conditionally random does not hold. Applying an estimator that assumes conditionally random misclassification can make estimates worse when this assumption is false. This paper has extended Arezzo and Guagnano’s (2019) formulation for a misclassified binary response variable affected by sample selection to incorporate misclassification related to the covariates. Assuming the availability of validation data, the proposed pseudo-maximum likelihood estimator is consistent and asymptotically efficient. A Monte Carlo study documents the good performance of the proposed estimator under different models of misclassification, probabilities of selection, and correlation between error terms. We have also illustrated our method with an empirical example, examining the impact of time preference on the propensity to migrate.

Supplementary Materials

The following supporting information can be downloaded at: https://0-www-mdpi-com.brum.beds.ac.uk/article/10.3390/econometrics10020013/s1. Program files (Stata 16) that generated the data used in the Monte Carlo study and that analyzed the dataset used in the illustrative study.

Funding

This research was funded by the Government of Aragón, grant number S32−20R.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The program files that generated the data used in the Monte Carlo study and that analyzed the dataset used in the illustrative study are included in the electronic Supplementary Materials. The dataset analyzed in the illustrative study was constructed from publicly available data published by the Banco de España, Spain’s National Securities Market Commission, and Spain’s National Statistics Institute. Instructions for how other researchers can obtain these data can be found in the Supplementary Materials.

Acknowledgments

Two anonymous reviewers provided helpful comments and suggestions.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Maximum Likelihood Evaluator

In this appendix, we derive the analytic first and second derivatives of the log-likelihood function (10) and write out our method-lf2 likelihood evaluator (see Gould et al. 2010) implemented in Stata (version 16).
The log-likelihood function (10) can be written more compactly as
$$\ell(\beta,\rho)=\sum_{i=1}^{N}\left\{s_i\ln\left[\mathrm{cc}_i\,\Phi(w_{2i})+\mathrm{om}_i\,\Phi_2(w_{1i},w_{2i},\rho_i)\right]+(1-s_i)\ln\Phi(w_{2i})\right\}\tag{A1}$$
where $\mathrm{cc}_i=y_i\alpha_{0i}+(1-y_i)\alpha_{1i}$, $w_{1i}=q_{1i}z_{1i}$, $w_{2i}=q_{2i}z_{2i}$, $z_{1i}=X_{1i}\beta_1$, $z_{2i}=X_{2i}\beta_2$, $q_{1i}=2y_i-1$, $q_{2i}=2s_i-1$, $\rho_i=q_{1i}q_{2i}\rho$, and $\mathrm{om}_i=1-\alpha_{0i}-\alpha_{1i}$. To accommodate the restriction that in general $-1<\rho<1$, maximization is not conducted directly with respect to $\rho$, but with respect to its inverse hyperbolic tangent ($\operatorname{atanh}\rho$):
$$a\equiv\operatorname{atanh}\rho=\frac{1}{2}\ln\left(\frac{1+\rho}{1-\rho}\right).$$
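As a small numerical illustration of this reparameterization, the following Python sketch (not part of the author's Stata program; the function names are hypothetical) maps between $\rho$ and $a$ and checks the chain-rule factor $1-(\tanh a)^2$ that multiplies every derivative taken with respect to $a$:

```python
import math


def a_from_rho(rho: float) -> float:
    """Inverse hyperbolic tangent written out as 0.5 * ln((1 + rho) / (1 - rho))."""
    return 0.5 * math.log((1.0 + rho) / (1.0 - rho))


def rho_from_a(a: float) -> float:
    """Map the unconstrained parameter a back to rho = tanh(a)."""
    return math.tanh(a)


def drho_da(a: float) -> float:
    """Chain-rule factor d rho / d a = 1 - tanh(a)^2."""
    return 1.0 - math.tanh(a) ** 2


if __name__ == "__main__":
    for rho in (-0.9, 0.2, 0.8):
        a = a_from_rho(rho)
        # Round trip rho -> a -> rho, plus the derivative factor at a.
        print(rho, a, rho_from_a(a), drho_da(a))
```

Because the optimizer works on $a$, any finite value it visits corresponds to an admissible correlation, so no explicit bound on $\rho$ has to be imposed.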
We first obtain the first derivatives of (A1) for $s_i=1$, using Greene (2003, p. 711) as a reference:
$$\frac{\partial\ell}{\partial z_{1i}}=\sum_{i=1}^{N}\frac{q_{1i}\,\mathrm{om}_i\,k_{1i}}{d_i}$$
$$\frac{\partial\ell}{\partial z_{2i}}=\sum_{i=1}^{N}q_{2i}\,\frac{\mathrm{cc}_i\,\phi(w_{2i})+\mathrm{om}_i\,k_{2i}}{d_i}$$
$$\frac{\partial\ell}{\partial a}=\sum_{i=1}^{N}\left[\frac{q_{1i}q_{2i}\,\mathrm{om}_i\,\phi_{2i}}{d_i}\right]\left(1-(\tanh a)^2\right)$$
where
$$k_{1i}=\phi(w_{1i})\,\Phi\left[\frac{w_{2i}-\rho_i w_{1i}}{\sqrt{1-\rho_i^2}}\right],$$
the subscripts 1 and 2 in $k_{1i}$ are reversed to obtain $k_{2i}$, $d_i=\mathrm{cc}_i\,\Phi(w_{2i})+\mathrm{om}_i\,\Phi_2(w_{1i},w_{2i},\rho_i)$, $\phi_{2i}=\phi_2(w_{1i},w_{2i},\rho_i)$, $\phi_2$ being the bivariate normal density, and
$$\tanh a=\frac{e^{a}-e^{-a}}{e^{a}+e^{-a}}.$$
The first derivatives for $s_i=0$ are all zero except
$$\frac{\partial\ell}{\partial z_{2i}}=\sum_{i=1}^{N}q_{2i}\,\frac{\phi(w_{2i})}{\Phi(w_{2i})}.$$
To obtain the second derivatives for $s_i=1$, we use the result
$$\delta_i\,\phi(w_{1i})\,\phi(v_{1i})=\delta_i\,\phi(w_{2i})\,\phi(v_{2i})=\phi_{2i}$$
where $\delta_i=1/\sqrt{1-\rho_i^2}$, $v_{1i}=\delta_i(w_{2i}-\rho_i w_{1i})$, and $v_{2i}=\delta_i(w_{1i}-\rho_i w_{2i})$; see Greene (2003). Thus:
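The factorization of the bivariate normal density used here can be verified numerically. The following Python sketch is illustrative only (`phi`, `phi2`, and `factored` are hypothetical helper names): it evaluates $\phi_2(w_1,w_2,\rho)$ directly and through the two products $\delta\,\phi(w_1)\,\phi(v_1)$ and $\delta\,\phi(w_2)\,\phi(v_2)$.

```python
import math


def phi(x: float) -> float:
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def phi2(w1: float, w2: float, rho: float) -> float:
    """Standard bivariate normal density with correlation rho."""
    one_minus = 1.0 - rho * rho
    quad = (w1 * w1 + w2 * w2 - 2.0 * rho * w1 * w2) / one_minus
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(one_minus))


def factored(w1: float, w2: float, rho: float):
    """Return delta*phi(w1)*phi(v1) and delta*phi(w2)*phi(v2)."""
    delta = 1.0 / math.sqrt(1.0 - rho * rho)
    v1 = delta * (w2 - rho * w1)
    v2 = delta * (w1 - rho * w2)
    return delta * phi(w1) * phi(v1), delta * phi(w2) * phi(v2)


if __name__ == "__main__":
    # Arbitrary illustrative values of (w1, w2, rho).
    for w1, w2, rho in [(0.3, -1.2, 0.5), (-0.7, 0.4, -0.8)]:
        print(phi2(w1, w2, rho), factored(w1, w2, rho))
```

All three numbers printed on each line should agree up to floating-point rounding.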
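$$\frac{\partial^2\ell}{\partial z_{1i}^2}=\sum_{i=1}^{N}-\mathrm{om}_i\left[\frac{w_{1i}k_{1i}}{d_i}+\frac{\rho_i\phi_{2i}}{d_i}+\frac{\mathrm{om}_i k_{1i}^2}{d_i^2}\right]$$
$$\frac{\partial^2\ell}{\partial z_{1i}\,\partial z_{2i}}=\sum_{i=1}^{N}\mathrm{om}_i\,q_{1i}q_{2i}\left[\frac{\phi_{2i}}{d_i}-\frac{\mathrm{om}_i k_{1i}k_{2i}}{d_i^2}-\frac{\mathrm{cc}_i\,\phi(w_{2i})\,k_{1i}}{d_i^2}\right]$$
$$\frac{\partial^2\ell}{\partial z_{1i}\,\partial a}=\sum_{i=1}^{N}\frac{\mathrm{om}_i\,q_{2i}\,\phi_{2i}}{d_i}\left[\rho_i\delta_i v_{1i}-w_{1i}-\frac{\mathrm{om}_i k_{1i}}{d_i}\right]\left(1-(\tanh a)^2\right)$$
$$\frac{\partial^2\ell}{\partial z_{2i}^2}=\sum_{i=1}^{N}\left[-\frac{\mathrm{om}_i w_{2i}k_{2i}}{d_i}-\frac{\mathrm{om}_i\rho_i\phi_{2i}}{d_i}-\frac{\mathrm{cc}_i w_{2i}\phi(w_{2i})}{d_i}-\frac{\left(\mathrm{om}_i k_{2i}+\mathrm{cc}_i\phi(w_{2i})\right)^2}{d_i^2}\right]$$
$$\frac{\partial^2\ell}{\partial z_{2i}\,\partial a}=\sum_{i=1}^{N}\frac{\mathrm{om}_i\,q_{1i}\,\phi_{2i}}{d_i}\left[\rho_i\delta_i v_{2i}-w_{2i}-\frac{\mathrm{om}_i k_{2i}+\mathrm{cc}_i\phi(w_{2i})}{d_i}\right]\left(1-(\tanh a)^2\right)$$
$$\frac{\partial^2\ell}{\partial a^2}=\sum_{i=1}^{N}\left\{\frac{\mathrm{om}_i\phi_{2i}}{d_i}\left[\delta_i^2\rho_i\left(1-\delta_i^2\left(w_{1i}^2+w_{2i}^2-2\rho_i w_{1i}w_{2i}\right)\right)+\delta_i^2 w_{1i}w_{2i}-\frac{\mathrm{om}_i\phi_{2i}}{d_i}\right]\left(1-(\tanh a)^2\right)^2-2\tanh a\,\frac{\mathrm{om}_i\,q_{1i}q_{2i}\,\phi_{2i}}{d_i}\left(1-(\tanh a)^2\right)\right\}$$
The second derivatives for $s_i=0$ are all zero except
$$\frac{\partial^2\ell}{\partial z_{2i}^2}=\sum_{i=1}^{N}-\frac{\phi(w_{2i})}{\Phi(w_{2i})}\left[w_{2i}+\frac{\phi(w_{2i})}{\Phi(w_{2i})}\right].$$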
Referring to α 0 i , α 1 i , and om i as alpha0_i, alpha1_i, and om_i, respectively, our method-lf2 likelihood evaluator can be written as:
program PP_Heckprobit_lf2
    version 16.1
    args todo b lnfj g1 g2 g3 H
    tempvar z1 z2 q1 q2 w1 w2 rhos delta v1 v2 k1 k2 bden qf cc d
    tempname a
    // Linear indices z1 = X1*b1 and z2 = X2*b2, and the scalar a = atanh(rho)
    mleval `z1'=`b', eq(1)
    mleval `z2'=`b', eq(2)
    mleval `a'=`b', eq(3) scalar
    quietly {
        gen double `q1'=(2*$ML_y1)-1
        gen double `q2'=(2*$ML_y2)-1
        gen double `w1'=`q1'*`z1'
        gen double `w2'=`q2'*`z2'
        gen double `rhos'=`q1'*`q2'*tanh(`a')
        gen double `delta'=1/sqrt(1-(`rhos'^2))
        gen double `v1'=`delta'*(`w2'-(`rhos'*`w1'))
        gen double `v2'=`delta'*(`w1'-(`rhos'*`w2'))
        gen double `k1'=normalden(`w1')*normal(`v1')
        gen double `k2'=normalden(`w2')*normal(`v2')
        // Bivariate normal density phi2(w1, w2, rhos)
        gen double `bden'=normalden(`w2')*normalden(`v2')*`delta'
        gen double `qf'=(`delta'^2)*`rhos'*(1-((`delta'^2)*((`w1'^2) +(`w2'^2)-(2*`rhos'*`w1'*`w2'))))
        gen double `cc'=(alpha0_i*$ML_y1) + (alpha1_i*(1-$ML_y1))
        gen double `d'=(om_i*binormal(`w1',`w2',`rhos')) + (`cc'*normal(`w2'))
        // Observation-level log likelihood
        replace `lnfj'=ln(`d') if $ML_y2==1
        replace `lnfj'=ln(normal(`w2')) if $ML_y2==0
        if (`todo'==0) exit
        // Scores with respect to z1, z2, and a
        replace `g1'=om_i*`q1'*`k1'/`d' if $ML_y2==1
        replace `g1'=0 if $ML_y2==0
        replace `g2'=`q2'*((om_i*`k2')+(`cc'*normalden(`w2')))/`d' if $ML_y2==1
        replace `g2'=`q2'*normalden(`w2')/normal(`w2') if $ML_y2==0
        replace `g3'=(om_i*`q1'*`q2'*`bden'/`d')*(1-(tanh(`a')^2)) if $ML_y2==1
        replace `g3'=0 if $ML_y2==0
        if (`todo'==1) exit
        // Second derivatives
        tempvar h11 h12 h13 h22 h23 h33
        gen double `h11'=-om_i*((`w1'*`k1'/`d')+(`rhos'*`bden'/`d') + (om_i*((`k1'/`d')^2))) ///
            if $ML_y2==1
        replace `h11'=0 if $ML_y2==0
        gen double `h12'=om_i*`q1'*`q2'*((`bden'/`d')-(om_i*`k1'*`k2'/(`d'^2)) ///
            -(`cc'*normalden(`w2')*`k1'/(`d'^2))) if $ML_y2==1
        replace `h12'=0 if $ML_y2==0
        gen double `h13'=(om_i*`q2'*`bden'*((`rhos'*`delta'*`v1')-`w1' ///
            -(om_i*`k1'/`d'))/`d')*(1-(tanh(`a')^2)) if $ML_y2==1
        replace `h13'=0 if $ML_y2==0
        gen double `h22'=-(om_i*`w2'*`k2'/`d')-(om_i*`rhos'*`bden'/`d') ///
            -(`cc'*`w2'*normalden(`w2')/`d')-((((om_i*`k2')+(`cc'*normalden(`w2')))/`d')^2) ///
            if $ML_y2==1
        replace `h22'=-normalden(`w2')*(`w2'+(normalden(`w2')/normal(`w2')))/normal(`w2') ///
            if $ML_y2==0
        gen double `h23'=(om_i*`q1'*`bden'*((`rhos'*`delta'*`v2')-`w2'-(((om_i*`k2') ///
            +(`cc'*normalden(`w2')))/`d'))/`d')*(1-(tanh(`a')^2)) if $ML_y2==1
        replace `h23'=0 if $ML_y2==0
        gen double `h33'=(((om_i*`bden'*(`qf'+((`delta'^2)*`w1'*`w2') ///
            -(om_i*`bden'/`d'))/`d')*(1-(tanh(`a')^2)))-(2*tanh(`a')*(om_i*`q1'*`q2'*`bden'/`d'))) ///
            *(1-(tanh(`a')^2)) if $ML_y2==1
        replace `h33'=0 if $ML_y2==0
        // Assemble the Hessian from its blocks
        tempname d11 d12 d13 d22 d23 d33
        mlmatsum `lnfj' `d11'=`h11', eq(1)
        mlmatsum `lnfj' `d12'=`h12', eq(1,2)
        mlmatsum `lnfj' `d13'=`h13', eq(1,3)
        mlmatsum `lnfj' `d22'=`h22', eq(2)
        mlmatsum `lnfj' `d23'=`h23', eq(2,3)
        mlmatsum `lnfj' `d33'=`h33', eq(3)
        matrix `H'=(`d11',`d12',`d13'\`d12'',`d22',`d23'\`d13'',`d23'',`d33')
    }
end
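
The analytic scores coded in the evaluator can be cross-checked against finite differences. The sketch below is written in Python rather than Stata and is only an illustration under stated assumptions: it treats a single selected observation ($s_i=1$), approximates $\Phi_2$ by Simpson's rule instead of Stata's binormal(), and uses hypothetical function names and parameter values.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def Phi2(w1, w2, rho, n=4000, lower=-10.0):
    """Bivariate normal cdf via Simpson's rule on the integral of
    phi(x) * Phi((w2 - rho * x) / sqrt(1 - rho^2)) over [lower, w1]."""
    if w1 <= lower:
        return 0.0
    delta = 1.0 / math.sqrt(1.0 - rho * rho)
    f = lambda x: phi(x) * Phi(delta * (w2 - rho * x))
    h = (w1 - lower) / n  # n must be even for Simpson's rule
    total = f(lower) + f(w1)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * f(lower + j * h)
    return total * h / 3.0


def pieces(z1, z2, a, y, alpha0, alpha1):
    """Quantities shared by the log likelihood and the scores when s_i = 1."""
    q1, q2 = 2 * y - 1, 1
    rho_i = q1 * q2 * math.tanh(a)
    w1, w2 = q1 * z1, q2 * z2
    cc = y * alpha0 + (1 - y) * alpha1
    om = 1.0 - alpha0 - alpha1
    d = cc * Phi(w2) + om * Phi2(w1, w2, rho_i)
    return q1, q2, rho_i, w1, w2, cc, om, d


def loglik(z1, z2, a, y, alpha0, alpha1):
    """Log-likelihood contribution ln(d_i) of a selected observation."""
    return math.log(pieces(z1, z2, a, y, alpha0, alpha1)[-1])


def scores(z1, z2, a, y, alpha0, alpha1):
    """Analytic derivatives with respect to z1, z2, and a (g1, g2, g3)."""
    q1, q2, rho_i, w1, w2, cc, om, d = pieces(z1, z2, a, y, alpha0, alpha1)
    delta = 1.0 / math.sqrt(1.0 - rho_i * rho_i)
    v1 = delta * (w2 - rho_i * w1)
    v2 = delta * (w1 - rho_i * w2)
    k1 = phi(w1) * Phi(v1)
    k2 = phi(w2) * Phi(v2)
    bden = delta * phi(w2) * phi(v2)
    g1 = om * q1 * k1 / d
    g2 = q2 * (om * k2 + cc * phi(w2)) / d
    g3 = (om * q1 * q2 * bden / d) * (1.0 - math.tanh(a) ** 2)
    return g1, g2, g3


def numerical_scores(z1, z2, a, y, alpha0, alpha1, h=1e-5):
    """Central finite differences of loglik in z1, z2, and a."""
    base = [z1, z2, a]
    out = []
    for j in range(3):
        up, dn = list(base), list(base)
        up[j] += h
        dn[j] -= h
        out.append((loglik(*up, y, alpha0, alpha1)
                    - loglik(*dn, y, alpha0, alpha1)) / (2.0 * h))
    return tuple(out)


if __name__ == "__main__":
    # Hypothetical index values and misclassification probabilities.
    for y in (0, 1):
        print(y, scores(0.3, 0.8, 0.4, y, 0.05, 0.20),
              numerical_scores(0.3, 0.8, 0.4, y, 0.05, 0.20))
```

For each value of $y$, the analytic and numerical triples should agree to several decimal places. A similar check on the Hessian would difference the analytic scores.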

Notes

1
See Train (2009) for a good treatment of numerical maximization.
2
These derivatives are shown in Appendix A along with the maximum likelihood evaluator.
3
Except the coefficient on the intercept included in $X_{1i}$, which is computed as 2.5 × (b − 0.5), where b is the LPM estimate of the coefficient on $(1-\alpha_{0i}-\alpha_{1i})$ in (14) (Amemiya 1981).
4
Probit and Heckprobit were implemented using the homonymous Stata commands. The other five estimators were implemented in Stata using programs written by the author and available in the supplementary material.
5
Convergence is accepted if the Hessian is negative definite and the scaled gradient is lower than $10^{-8}$.
6
If all individuals with undisclosed birth region were out-of-birth region residents, the proportion of migrants (in this sense) in the ECF would rise to 17.6%. Given that the ECF is an individual survey, its lower migration rates might be the consequence of a greater probability of survey noncontact among movers, reducing the proportion of migrants in the sample. However, results in Imbens (1992) suggest that small amounts of endogenous sampling are unlikely to substantially alter estimated parameters. In addition, to guard against possible misspecification, our inference is based on robust estimators of variance.
7
When a respondent is indifferent between €$m_1$ today and €$m_2$ in a year’s time, the RRR necessary to induce her/him to forgo €$m_1$ immediately is $2((m_2/m_1)^{1/2}-1)$. This definition assumes semiannual compounding of the annual interest rate as a natural compromise between the types of compounding that Spaniards are most familiar with.
8
RRR is treated as a continuous variable by predicting the conditional mean for each RRR group from a lognormal curve fitted to the distribution of RRR data.
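
The RRR definition in Note 7 can be illustrated with a short Python sketch. The function name and the monetary amounts are hypothetical and chosen only for illustration; they are not taken from the survey.

```python
import math


def required_rate_of_return(m1: float, m2: float) -> float:
    """Annual RRR under semiannual compounding: 2 * ((m2 / m1) ** 0.5 - 1)."""
    if m1 <= 0 or m2 <= 0:
        raise ValueError("amounts must be positive")
    return 2.0 * (math.sqrt(m2 / m1) - 1.0)


if __name__ == "__main__":
    # Hypothetical indifference between 100 euros today and 110 euros in a year.
    rrr = required_rate_of_return(100.0, 110.0)
    print(f"RRR = {rrr:.4f}")  # prints RRR = 0.0976
    # Compounding twice at rrr / 2 recovers the future amount (up to rounding).
    print(100.0 * (1.0 + rrr / 2.0) ** 2)
```

The second line printed confirms the semiannual-compounding interpretation: applying half the annual rate twice turns the immediate amount into the delayed one.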

References

  1. Aller, Carlos, and Jorge González Chapela. 2013. Misclassification of the dependent variable in a debt–repayment behavior context. Journal of Empirical Finance 23: 162–72.
  2. Amemiya, Takeshi. 1981. Qualitative response models: A survey. Journal of Economic Literature 19: 1483–536.
  3. Arcand, Jean-Louis, and Linguere Mbaye. 2013. Braving the Waves: The Role of Time and Risk Preferences in Illegal Migration from Senegal. IZA Discussion Paper No. 7517. Bonn: Institute for the Study of Labor.
  4. Arezzo, Maria Felice, and Giuseppina Guagnano. 2019. Misclassification in binary choice models with sample selection. Econometrics 7: 32.
  5. Banco de España and National Securities Market Commission. 2018. Survey of Financial Competences (ECF) 2016. Available online: https://app.bde.es/pmk/en/ecf/2016 (accessed on 17 January 2020).
  6. Becker, Gary S., and Casey B. Mulligan. 1997. The endogenous determination of time preference. Quarterly Journal of Economics 112: 729–58.
  7. Bollinger, Christopher R., and Martin H. David. 1997. Modeling discrete choice with response error: Food Stamp participation. Journal of the American Statistical Association 92: 827–35.
  8. Bollinger, Christopher R., and Martin H. David. 2001. Estimation with response error and nonresponse: Food-Stamp participation in the SIPP. Journal of Business & Economic Statistics 19: 129–41.
  9. Bound, John, Charles Brown, and Nancy Mathiowetz. 2001. Measurement error in survey data. In Handbook of Econometrics. Edited by James J. Heckman and Edward Leamer. Amsterdam: Elsevier, vol. 5, pp. 3705–843.
  10. Bütikofer, Aline, and Giovanni Peri. 2021. How cognitive ability and personality traits affect geographic mobility. Journal of Labor Economics 39: 559–95.
  11. Butler, John S. 1996. Estimating the correlation in censored probit models. Review of Economics and Statistics 78: 356–58.
  12. Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics. Methods and Applications. Cambridge: CUP.
  13. Cohen, Jonathan, Keith Marzilli Ericson, David Laibson, and John Myles White. 2020. Measuring time preferences. Journal of Economic Literature 58: 299–347.
  14. Davern, Michael, Jacob A. Klerman, Jeanette Ziegenfuss, Victoria Lynch, and George Greenberg. 2009. A partially corrected estimate of Medicaid enrollment and uninsurance: Results from an imputational model developed off linked survey and administrative data. Journal of Economic and Social Measurement 34: 219–40.
  15. Dohmen, Thomas, Armin Falk, David Huffman, and Uwe Sunde. 2010. Are risk aversion and impatience related to cognitive ability? American Economic Review 100: 1238–60.
  16. Dubin, Jeffrey A., and Douglas Rivers. 1989. Selection bias in linear regression, logit and probit models. Sociological Methods & Research 18: 360–90.
  17. Gibson, John, and David McKenzie. 2011. The microeconomic determinants of emigration and return migration of the best and brightest: Evidence from the Pacific. Journal of Development Economics 95: 18–29.
  18. Goldbach, Carina, and Achim Schlüter. 2018. Risk aversion, time preferences, and out-migration. Experimental evidence from Ghana and Indonesia. Journal of Economic Behavior and Organization 150: 132–48.
  19. González Chapela, Jorge. 2022. Is There a Patience Premium on Migration? Empirical Economics. Available online: https://0-doi-org.brum.beds.ac.uk/10.1007/s00181-021-02196-z (accessed on 10 January 2022).
  20. Gould, William, Jeffrey Pitblado, and William Sribney. 2010. Maximum Likelihood Estimation with Stata, 4th ed. College Station: Stata Press.
  21. Gourieroux, Christian, Alain Monfort, and Alain Trognon. 1984. Pseudo maximum likelihood methods: Theory. Econometrica 52: 681–700.
  22. Greene, William H. 2003. Econometric Analysis, 4th ed. Upper Saddle River: Prentice Hall.
  23. Hausman, Jerry A., Jason Abrevaya, and Fiona M. Scott-Morton. 1998. Misclassification of the dependent variable in a discrete-response setting. Journal of Econometrics 87: 239–69.
  24. Heckman, James. 1974. Shadow prices, market wages, and labor supply. Econometrica 42: 679–94.
  25. Heckman, James. 1979. Sample selection bias as a specification error. Econometrica 47: 153–62.
  26. Imbens, Guido W. 1992. An efficient method of moments estimator for discrete choice models with choice-based sampling. Econometrica 60: 1187–214.
  27. Jaeger, David, Thomas Dohmen, Armin Falk, David Huffman, Uwe Sunde, and Holger Bonin. 2010. Direct evidence on risk attitudes and migration. Review of Economics and Statistics 92: 684–89.
  28. Katz, Jonathan N., and Gabriel Katz. 2010. Correcting for survey misreports using auxiliary information with an application to estimating turnout. American Journal of Political Science 54: 815–35.
  29. Krupka, Erin L., and Melvin Stephens Jr. 2013. The stability of measured time preferences. Journal of Economic Behavior and Organization 85: 11–19.
  30. Levine, Phillip B. 1993. CPS contemporaneous and retrospective unemployment compared. Monthly Labor Review 116: 33–39.
  31. Meyer, Bruce D., and Nikolas Mittag. 2017. Misclassification in binary choice models. Journal of Econometrics 200: 295–311.
  32. Molloy, Raven, Christopher L. Smith, and Abigail Wozniak. 2011. Internal migration in the United States. Journal of Economic Perspectives 25: 173–96.
  33. Nowotny, Klaus. 2014. Cross-border commuting and migration intentions: The roles of risk aversion and time preference. Contemporary Economics 8: 137–56.
  34. Poterba, James M., and Lawrence H. Summers. 1995. Unemployment benefits and labor market transitions: A multinomial logit model with errors in classification. Review of Economics and Statistics 77: 207–16.
  35. Ramalho, Esmeralda A. 2002. Regression models for choice-based samples with misclassification in the response variable. Journal of Econometrics 106: 171–201.
  36. StataCorp. 2019. Stata: Release 16. Statistical Software. College Station: StataCorp LLC.
  37. Train, Kenneth E. 2009. Discrete Choice Methods with Simulation, 2nd ed. New York: CUP.
  38. Van de Ven, Wynand, and Bernard van Praag. 1981. The demand for deductibles in private health insurance. Journal of Econometrics 17: 229–52.
  39. Vella, Francis. 1998. Estimating models with sample selection bias: A survey. Journal of Human Resources 33: 127–69.

Table 1. Misclassification probabilities in Misclassification Model 2.

| $X_{11}$ values | $X_{12}=0$ | $X_{12}=1$ |
|---|---|---|
| $X_{11}<1$ | 0.06, 0.20 | 0.08, 0.16 |
| $X_{11}\geq 1$ | 0.03, 0.18 | 0.04, 0.28 |

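Table 2. Monte Carlo simulation results for Misclassification Model 1. Entries are Bias (SD).

$\beta_{20}=0.5$, $\rho=0.2$

| | Probit | HAS-Probit | PP-Probit | Heckprobit | HAS-Heckprobit1 | HAS-Heckprobit2 | PP-Heckprobit |
|---|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | −0.146 (0.058) | −0.034 (0.216) | −0.081 (0.080) | −0.092 (0.066) | 0.036 (0.229) | 0.011 (0.260) | −0.004 (0.092) |
| $\hat\beta_{11}$ | −0.416 (0.018) | 0.071 (0.055) | 0.005 (0.024) | −0.418 (0.017) | 0.042 (0.053) | 0.087 (0.070) | −0.002 (0.024) |
| $\hat\beta_{12}$ | −0.273 (0.052) | 0.053 (0.302) | 0.005 (0.081) | −0.275 (0.051) | 0.035 (0.291) | −0.146 (0.494) | −0.002 (0.079) |
| $\hat\beta_{13}$ | −0.314 (0.084) | 0.093 (0.210) | 0.010 (0.123) | −0.316 (0.083) | 0.067 (0.212) | 0.128 (0.257) | 0.003 (0.122) |
| $\hat\beta_{20}$ | | | | 0.001 (0.049) | 0.002 (0.048) | −0.001 (0.048) | 0.001 (0.049) |
| $\hat\beta_{21}$ | | | | 0.004 (0.026) | 0.003 (0.026) | 0.003 (0.024) | 0.004 (0.026) |
| $\hat\beta_{22}$ | | | | 0.002 (0.023) | 0.001 (0.023) | 0.003 (0.025) | 0.002 (0.023) |
| $\hat\rho$ | | | | −0.328 (0.076) | −0.023 (0.124) | 0.152 (0.152) | −0.008 (0.109) |
| Convergence | 493 | 491 | 493 | 493 | 409 | 186 | 493 |

$\beta_{20}=0.5$, $\rho=0.8$

| | Probit | HAS-Probit | PP-Probit | Heckprobit | HAS-Heckprobit1 | HAS-Heckprobit2 | PP-Heckprobit |
|---|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | −0.286 (0.059) | −0.190 (0.212) | −0.258 (0.080) | −0.096 (0.060) | 0.004 (0.126) | −0.024 (0.175) | 0.000 (0.075) |
| $\hat\beta_{11}$ | −0.402 (0.018) | 0.187 (0.059) | 0.109 (0.026) | −0.432 (0.017) | 0.004 (0.031) | 0.017 (0.044) | 0.005 (0.024) |
| $\hat\beta_{12}$ | −0.230 (0.052) | 0.168 (0.330) | 0.113 (0.090) | −0.262 (0.051) | 0.001 (0.154) | −0.060 (0.301) | 0.003 (0.082) |
| $\hat\beta_{13}$ | −0.273 (0.085) | 0.216 (0.238) | 0.106 (0.126) | −0.305 (0.082) | 0.011 (0.138) | 0.028 (0.140) | 0.005 (0.114) |
| $\hat\beta_{20}$ | | | | −0.006 (0.044) | 0.002 (0.044) | 0.009 (0.044) | 0.001 (0.044) |
| $\hat\beta_{21}$ | | | | 0.003 (0.025) | 0.002 (0.025) | 0.002 (0.024) | 0.002 (0.025) |
| $\hat\beta_{22}$ | | | | 0.002 (0.022) | 0.000 (0.023) | 0.002 (0.024) | 0.000 (0.023) |
| $\hat\rho$ | | | | −0.345 (0.061) | −0.007 (0.092) | −0.000 (0.103) | −0.005 (0.072) |
| Convergence | 485 | 483 | 485 | 485 | 476 | 169 | 485 |

$\beta_{20}=2.18$, $\rho=0.2$

| | Probit | HAS-Probit | PP-Probit | Heckprobit | HAS-Heckprobit1 | HAS-Heckprobit2 | PP-Heckprobit |
|---|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | −0.098 (0.048) | 0.012 (0.184) | −0.017 (0.067) | −0.089 (0.049) | 0.021 (0.185) | −0.059 (0.203) | −0.004 (0.069) |
| $\hat\beta_{11}$ | −0.411 (0.014) | 0.043 (0.045) | 0.005 (0.019) | −0.412 (0.014) | 0.038 (0.045) | 0.025 (0.056) | 0.001 (0.019) |
| $\hat\beta_{12}$ | −0.270 (0.042) | 0.031 (0.249) | 0.004 (0.064) | −0.272 (0.042) | 0.028 (0.246) | −0.135 (0.375) | −0.000 (0.065) |
| $\hat\beta_{13}$ | −0.310 (0.070) | 0.065 (0.171) | 0.017 (0.102) | −0.311 (0.069) | 0.065 (0.169) | 0.025 (0.205) | 0.013 (0.102) |
| $\hat\beta_{20}$ | | | | 0.001 (0.091) | 0.001 (0.091) | 0.001 (0.088) | 0.002 (0.091) |
| $\hat\beta_{21}$ | | | | 0.003 (0.045) | 0.004 (0.045) | 0.002 (0.044) | 0.003 (0.045) |
| $\hat\beta_{22}$ | | | | 0.005 (0.038) | 0.005 (0.038) | −0.002 (0.039) | 0.004 (0.038) |
| $\hat\rho$ | | | | −0.323 (0.156) | 0.112 (0.258) | 0.108 (0.246) | 0.038 (0.239) |
| Convergence | 495 | 496 | 496 | 496 | 466 | 218 | 496 |

$\beta_{20}=2.18$, $\rho=0.8$

| | Probit | HAS-Probit | PP-Probit | Heckprobit | HAS-Heckprobit1 | HAS-Heckprobit2 | PP-Heckprobit |
|---|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | −0.116 (0.048) | 0.008 (0.185) | −0.039 (0.068) | −0.087 (0.048) | 0.009 (0.163) | −0.041 (0.162) | −0.003 (0.066) |
| $\hat\beta_{11}$ | −0.402 (0.014) | 0.079 (0.046) | 0.033 (0.020) | −0.413 (0.014) | 0.020 (0.039) | −0.005 (0.038) | 0.005 (0.020) |
| $\hat\beta_{12}$ | −0.255 (0.043) | 0.064 (0.255) | 0.032 (0.066) | −0.265 (0.043) | 0.010 (0.207) | −0.083 (0.352) | 0.003 (0.066) |
| $\hat\beta_{13}$ | −0.297 (0.071) | 0.103 (0.176) | 0.044 (0.104) | −0.306 (0.070) | 0.030 (0.153) | −0.014 (0.145) | 0.012 (0.101) |
| $\hat\beta_{20}$ | | | | 0.002 (0.090) | 0.003 (0.090) | 0.005 (0.091) | 0.003 (0.089) |
| $\hat\beta_{21}$ | | | | 0.007 (0.044) | 0.006 (0.044) | 0.007 (0.045) | 0.006 (0.045) |
| $\hat\beta_{22}$ | | | | −0.002 (0.040) | −0.004 (0.040) | −0.004 (0.039) | −0.003 (0.040) |
| $\hat\rho$ | | | | −0.356 (0.151) | −0.060 (0.198) | −0.116 (0.210) | −0.039 (0.186) |
| Convergence | 493 | 494 | 494 | 494 | 424 | 162 | 462 |
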
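Table 3. Monte Carlo simulation results for Misclassification Model 2. Entries are Bias (SD).

$\beta_{20}=0.5$, $\rho=0.2$

| | Probit | HAS-Probit | PP-Probit | Heckprobit | HAS-Heckprobit1 | HAS-Heckprobit2 | PP-Heckprobit |
|---|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | −0.177 (0.057) | −0.067 (0.251) | −0.079 (0.081) | −0.123 (0.066) | 0.016 (0.262) | −0.014 (0.244) | −0.002 (0.094) |
| $\hat\beta_{11}$ | −0.513 (0.016) | 0.266 (0.086) | 0.005 (0.023) | −0.514 (0.016) | 0.210 (0.080) | 0.098 (0.067) | −0.001 (0.023) |
| $\hat\beta_{12}$ | −0.300 (0.052) | 0.412 (0.792) | 0.007 (0.083) | −0.302 (0.052) | 0.345 (0.640) | −0.227 (0.558) | −0.001 (0.081) |
| $\hat\beta_{13}$ | −0.321 (0.080) | 0.595 (0.420) | 0.005 (0.118) | −0.323 (0.080) | 0.518 (0.387) | 0.187 (0.299) | −0.001 (0.117) |
| $\hat\beta_{20}$ | | | | 0.001 (0.049) | 0.004 (0.048) | −0.007 (0.049) | 0.002 (0.049) |
| $\hat\beta_{21}$ | | | | 0.004 (0.026) | 0.004 (0.026) | 0.003 (0.026) | 0.004 (0.026) |
| $\hat\beta_{22}$ | | | | 0.002 (0.023) | 0.002 (0.023) | 0.004 (0.023) | 0.002 (0.023) |
| $\hat\rho$ | | | | −0.329 (0.075) | 0.282 (0.169) | 0.206 (0.186) | −0.013 (0.106) |
| Convergence | 500 | 499 | 500 | 500 | 443 | 197 | 500 |

$\beta_{20}=0.5$, $\rho=0.8$

| | Probit | HAS-Probit | PP-Probit | Heckprobit | HAS-Heckprobit1 | HAS-Heckprobit2 | PP-Heckprobit |
|---|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | −0.323 (0.058) | −0.179 (0.257) | −0.258 (0.080) | −0.133 (0.060) | −0.056 (0.127) | −0.005 (0.145) | −0.000 (0.076) |
| $\hat\beta_{11}$ | −0.503 (0.017) | 0.493 (0.088) | 0.109 (0.025) | −0.528 (0.016) | −0.008 (0.032) | 0.021 (0.040) | 0.005 (0.022) |
| $\hat\beta_{12}$ | −0.267 (0.051) | 0.709 (0.979) | 0.114 (0.091) | −0.297 (0.050) | 0.094 (0.185) | −0.100 (0.313) | 0.003 (0.083) |
| $\hat\beta_{13}$ | −0.279 (0.083) | 0.837 (0.407) | 0.105 (0.123) | −0.309 (0.080) | 0.128 (0.151) | 0.037 (0.140) | 0.006 (0.111) |
| $\hat\beta_{20}$ | | | | −0.006 (0.044) | 0.002 (0.044) | 0.005 (0.045) | 0.002 (0.044) |
| $\hat\beta_{21}$ | | | | 0.004 (0.025) | 0.002 (0.025) | 0.002 (0.025) | 0.002 (0.025) |
| $\hat\beta_{22}$ | | | | 0.002 (0.023) | −0.000 (0.023) | 0.003 (0.023) | 0.000 (0.023) |
| $\hat\rho$ | | | | −0.351 (0.060) | 0.086 (0.079) | 0.023 (0.092) | −0.006 (0.071) |
| Convergence | 500 | 498 | 500 | 500 | 480 | 254 | 500 |

$\beta_{20}=2.18$, $\rho=0.2$

| | Probit | HAS-Probit | PP-Probit | Heckprobit | HAS-Heckprobit1 | HAS-Heckprobit2 | PP-Heckprobit |
|---|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | −0.129 (0.047) | −0.020 (0.213) | −0.015 (0.067) | −0.119 (0.048) | −0.017 (0.211) | −0.066 (0.173) | −0.003 (0.069) |
| $\hat\beta_{11}$ | −0.508 (0.013) | 0.212 (0.070) | 0.005 (0.019) | −0.509 (0.013) | 0.189 (0.068) | 0.015 (0.046) | 0.001 (0.019) |
| $\hat\beta_{12}$ | −0.296 (0.041) | 0.335 (0.484) | 0.005 (0.065) | −0.298 (0.041) | 0.311 (0.470) | −0.185 (0.395) | 0.000 (0.065) |
| $\hat\beta_{13}$ | −0.320 (0.068) | 0.521 (0.338) | 0.011 (0.101) | −0.321 (0.068) | 0.490 (0.335) | 0.109 (0.209) | 0.006 (0.101) |
| $\hat\beta_{20}$ | | | | 0.001 (0.091) | 0.001 (0.091) | 0.003 (0.093) | 0.001 (0.091) |
| $\hat\beta_{21}$ | | | | 0.003 (0.045) | 0.003 (0.045) | 0.006 (0.044) | 0.003 (0.045) |
| $\hat\beta_{22}$ | | | | 0.005 (0.038) | 0.005 (0.038) | 0.009 (0.038) | 0.005 (0.038) |
| $\hat\rho$ | | | | −0.330 (0.155) | 0.309 (0.349) | 0.106 (0.287) | 0.006 (0.234) |
| Convergence | 500 | 500 | 500 | 500 | 493 | 203 | 499 |

$\beta_{20}=2.18$, $\rho=0.8$

| | Probit | HAS-Probit | PP-Probit | Heckprobit | HAS-Heckprobit1 | HAS-Heckprobit2 | PP-Heckprobit |
|---|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | −0.148 (0.047) | −0.018 (0.215) | −0.037 (0.068) | −0.118 (0.047) | −0.063 (0.184) | −0.019 (0.176) | −0.001 (0.066) |
| $\hat\beta_{11}$ | −0.501 (0.013) | 0.267 (0.072) | 0.033 (0.019) | −0.509 (0.013) | 0.085 (0.055) | 0.032 (0.044) | 0.007 (0.019) |
| $\hat\beta_{12}$ | −0.282 (0.041) | 0.388 (0.489) | 0.032 (0.066) | −0.292 (0.041) | 0.203 (0.375) | −0.104 (0.326) | 0.004 (0.067) |
| $\hat\beta_{13}$ | −0.307 (0.068) | 0.589 (0.339) | 0.037 (0.101) | −0.315 (0.068) | 0.304 (0.265) | 0.075 (0.171) | 0.010 (0.098) |
| $\hat\beta_{20}$ | | | | 0.002 (0.091) | 0.003 (0.093) | 0.007 (0.089) | 0.003 (0.090) |
| $\hat\beta_{21}$ | | | | 0.006 (0.045) | 0.010 (0.045) | 0.012 (0.042) | 0.005 (0.044) |
| $\hat\beta_{22}$ | | | | −0.002 (0.040) | 0.002 (0.039) | −0.003 (0.039) | −0.002 (0.039) |
| $\hat\rho$ | | | | −0.365 (0.152) | 0.027 (0.176) | −0.023 (0.206) | −0.047 (0.186) |
| Convergence | 500 | 499 | 500 | 500 | 399 | 197 | 466 |
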
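Table 4. Monte Carlo simulation results for Misclassification Model 3. Entries are Bias (SD).

$\beta_{20}=0.5$, $\rho=0.2$

| | Probit | HAS-Probit | PP-Probit | Heckprobit | HAS-Heckprobit1 | HAS-Heckprobit2 | PP-Heckprobit |
|---|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | −0.041 (0.061) | 0.155 (0.218) | −0.075 (0.083) | 0.016 (0.067) | 0.244 (0.221) | 0.146 (0.201) | 0.003 (0.090) |
| $\hat\beta_{11}$ | −0.037 (0.014) | 0.090 (0.032) | 0.016 (0.018) | −0.041 (0.014) | 0.100 (0.031) | 0.094 (0.029) | 0.008 (0.018) |
| $\hat\beta_{12}$ | −0.184 (0.053) | −0.081 (0.172) | 0.009 (0.075) | −0.187 (0.053) | −0.075 (0.166) | −0.137 (0.192) | 0.002 (0.076) |
| $\hat\beta_{13}$ | −0.273 (0.087) | −0.151 (0.125) | 0.013 (0.123) | −0.277 (0.087) | −0.143 (0.125) | −0.022 (0.149) | 0.005 (0.122) |
| $\hat\beta_{20}$ | | | | −0.002 (0.046) | −0.003 (0.045) | −0.004 (0.041) | −0.002 (0.046) |
| $\hat\beta_{21}$ | | | | 0.003 (0.026) | 0.003 (0.026) | 0.001 (0.027) | 0.003 (0.026) |
| $\hat\beta_{22}$ | | | | 0.001 (0.023) | 0.001 (0.023) | −0.002 (0.025) | 0.001 (0.023) |
| $\hat\rho$ | | | | −0.276 (0.076) | −0.138 (0.092) | −0.097 (0.094) | 0.005 (0.103) |
| Convergence | 500 | 399 | 500 | 500 | 407 | 92 | 500 |

$\beta_{20}=0.5$, $\rho=0.8$

| | Probit | HAS-Probit | PP-Probit | Heckprobit | HAS-Heckprobit1 | HAS-Heckprobit2 | PP-Heckprobit |
|---|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | −0.155 (0.059) | −0.135 (0.104) | −0.254 (0.082) | 0.049 (0.060) | 0.172 (0.125) | 0.127 (0.138) | 0.004 (0.076) |
| $\hat\beta_{11}$ | 0.074 (0.016) | 0.085 (0.018) | 0.117 (0.020) | 0.021 (0.015) | 0.107 (0.021) | 0.064 (0.022) | 0.007 (0.018) |
| $\hat\beta_{12}$ | −0.121 (0.052) | −0.110 (0.073) | 0.114 (0.081) | −0.167 (0.052) | −0.108 (0.096) | −0.101 (0.172) | 0.002 (0.077) |
| $\hat\beta_{13}$ | −0.230 (0.089) | −0.216 (0.091) | 0.106 (0.130) | −0.271 (0.085) | −0.191 (0.105) | −0.131 (0.105) | 0.003 (0.116) |
| $\hat\beta_{20}$ | | | | 0.004 (0.048) | 0.004 (0.049) | 0.003 (0.050) | 0.002 (0.049) |
| $\hat\beta_{21}$ | | | | 0.002 (0.024) | 0.002 (0.024) | 0.001 (0.021) | 0.002 (0.024) |
| $\hat\beta_{22}$ | | | | −0.000 (0.022) | 0.001 (0.022) | 0.004 (0.022) | 0.000 (0.022) |
| $\hat\rho$ | | | | −0.265 (0.062) | −0.201 (0.081) | −0.141 (0.085) | −0.000 (0.068) |
| Convergence | 500 | 350 | 500 | 500 | 477 | 75 | 499 |

$\beta_{20}=2.18$, $\rho=0.2$

| | Probit | HAS-Probit | PP-Probit | Heckprobit | HAS-Heckprobit1 | HAS-Heckprobit2 | PP-Heckprobit |
|---|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | 0.001 (0.051) | 0.054 (0.137) | −0.011 (0.070) | 0.012 (0.052) | 0.112 (0.175) | 0.098 (0.149) | 0.003 (0.071) |
| $\hat\beta_{11}$ | −0.054 (0.012) | −0.019 (0.019) | 0.008 (0.015) | −0.057 (0.012) | 0.009 (0.026) | 0.043 (0.027) | 0.003 (0.015) |
| $\hat\beta_{12}$ | −0.189 (0.041) | −0.163 (0.100) | 0.004 (0.059) | −0.191 (0.041) | −0.137 (0.140) | −0.152 (0.135) | −0.001 (0.059) |
| $\hat\beta_{13}$ | −0.282 (0.072) | −0.252 (0.088) | 0.003 (0.101) | −0.284 (0.072) | −0.215 (0.100) | −0.108 (0.129) | −0.002 (0.101) |
| $\hat\beta_{20}$ | | | | 0.003 (0.093) | 0.003 (0.092) | 0.006 (0.096) | 0.004 (0.093) |
| $\hat\beta_{21}$ | | | | 0.001 (0.044) | 0.001 (0.044) | −0.010 (0.039) | 0.001 (0.044) |
| $\hat\beta_{22}$ | | | | 0.001 (0.039) | 0.002 (0.039) | 0.002 (0.035) | 0.001 (0.040) |
| $\hat\rho$ | | | | −0.200 (0.173) | −0.286 (0.181) | −0.067 (0.198) | 0.077 (0.230) |
| Convergence | 500 | 356 | 500 | 500 | 387 | 86 | 499 |

$\beta_{20}=2.18$, $\rho=0.8$

| | Probit | HAS-Probit | PP-Probit | Heckprobit | HAS-Heckprobit1 | HAS-Heckprobit2 | PP-Heckprobit |
|---|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | −0.011 (0.050) | 0.084 (0.182) | −0.031 (0.069) | 0.022 (0.049) | 0.095 (0.148) | 0.139 (0.148) | 0.003 (0.066) |
| $\hat\beta_{11}$ | −0.025 (0.012) | 0.032 (0.025) | 0.039 (0.015) | −0.046 (0.012) | 0.010 (0.022) | 0.082 (0.027) | 0.007 (0.015) |
| $\hat\beta_{12}$ | −0.168 (0.042) | −0.119 (0.134) | 0.033 (0.060) | −0.185 (0.042) | −0.143 (0.117) | −0.134 (0.185) | 0.002 (0.061) |
| $\hat\beta_{13}$ | −0.266 (0.071) | −0.213 (0.094) | 0.033 (0.101) | −0.281 (0.070) | −0.224 (0.096) | −0.085 (0.117) | 0.000 (0.098) |
| $\hat\beta_{20}$ | | | | 0.003 (0.090) | 0.006 (0.095) | 0.008 (0.087) | 0.003 (0.090) |
| $\hat\beta_{21}$ | | | | −0.003 (0.044) | 0.003 (0.046) | 0.003 (0.041) | 0.002 (0.044) |
| $\hat\beta_{22}$ | | | | −0.003 (0.040) | 0.009 (0.040) | 0.009 (0.040) | 0.004 (0.040) |
| $\hat\rho$ | | | | −0.186 (0.168) | −0.170 (0.200) | −0.139 (0.201) | −0.017 (0.165) |
| Convergence | 500 | 377 | 500 | 499 | 294 | 58 | 460 |
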
Table 5. Monte Carlo simulation results for alpha parameters. Entries are Bias (SD).

HAS-Probit under Misclassification Model 1

| | $\beta_{20}=0.5$, $\rho=0.2$ | $\beta_{20}=0.5$, $\rho=0.8$ | $\beta_{20}=2.18$, $\rho=0.2$ | $\beta_{20}=2.18$, $\rho=0.8$ |
|---|---|---|---|---|
| $\hat\alpha_0$ | 0.040 (0.043) | 0.170 (0.051) | −0.011 (0.037) | 0.054 (0.037) |
| $\hat\alpha_1$ | −0.025 (0.070) | −0.041 (0.054) | −0.032 (0.064) | −0.042 (0.060) |
| Convergence | 491 | 483 | 496 | 494 |

HAS-Heckprobit1 under Misclassification Model 1

| | $\beta_{20}=0.5$, $\rho=0.2$ | $\beta_{20}=0.5$, $\rho=0.8$ | $\beta_{20}=2.18$, $\rho=0.2$ | $\beta_{20}=2.18$, $\rho=0.8$ |
|---|---|---|---|---|
| $\hat\alpha_0$ | 0.047 (0.044) | −0.058 (0.033) | −0.011 (0.037) | −0.075 (0.034) |
| $\hat\alpha_1$ | −0.033 (0.070) | −0.036 (0.041) | −0.014 (0.063) | −0.045 (0.057) |
| Convergence | 409 | 476 | 466 | 424 |

HAS-Heckprobit2 under Misclassification Model 2

| | $\beta_{20}=0.5$, $\rho=0.2$ | $\beta_{20}=0.5$, $\rho=0.8$ | $\beta_{20}=2.18$, $\rho=0.2$ | $\beta_{20}=2.18$, $\rho=0.8$ |
|---|---|---|---|---|
| $\hat\alpha_0^{11}$ | 0.025 (0.048) | −0.007 (0.032) | −0.111 (0.038) | −0.042 (0.036) |
| $\hat\alpha_1^{11}$ | 0.179 (0.220) | −0.016 (0.152) | 0.128 (0.206) | 0.063 (0.192) |
| $\hat\alpha_0^{12}$ | 1.771 (0.176) | 0.953 (0.156) | 1.237 (0.162) | 0.885 (0.135) |
| $\hat\alpha_1^{12}$ | −0.285 (0.124) | −0.154 (0.068) | −0.206 (0.114) | −0.172 (0.103) |
| $\hat\alpha_0^{21}$ | 0.061 (0.048) | 0.004 (0.035) | −0.280 (0.039) | −0.087 (0.037) |
| $\hat\alpha_1^{21}$ | 0.112 (0.136) | −0.010 (0.100) | 0.046 (0.111) | 0.010 (0.106) |
| $\hat\alpha_0^{22}$ | 3.738 (0.172) | 2.249 (0.148) | 2.451 (0.147) | 2.137 (0.130) |
| $\hat\alpha_1^{22}$ | −0.088 (0.081) | −0.041 (0.045) | −0.086 (0.063) | −0.052 (0.059) |
| Convergence | 197 | 254 | 203 | 197 |

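Table 6. Monte Carlo simulation results for PP-Heckprobit. Entries are Bias (SD).

Misclassification Model 1

| | $\beta_{20}=0.5$, $\rho=-0.9$ | $\beta_{20}=0.5$, $\rho=-0.8$ | $\beta_{20}=0.5$, $\rho=0.9$ | $\beta_{20}=2.18$, $\rho=-0.9$ | $\beta_{20}=2.18$, $\rho=-0.8$ | $\beta_{20}=2.18$, $\rho=0.9$ |
|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | 0.005 (0.100) | 0.004 (0.105) | −0.001 (0.072) | 0.002 (0.071) | 0.001 (0.072) | −0.001 (0.066) |
| $\hat\beta_{11}$ | 0.008 (0.022) | 0.006 (0.023) | 0.010 (0.023) | 0.000 (0.019) | 0.000 (0.019) | 0.008 (0.019) |
| $\hat\beta_{12}$ | 0.005 (0.086) | 0.003 (0.086) | 0.004 (0.079) | 0.003 (0.066) | 0.001 (0.067) | 0.005 (0.065) |
| $\hat\beta_{13}$ | 0.012 (0.125) | 0.006 (0.126) | 0.013 (0.110) | 0.007 (0.104) | 0.005 (0.104) | 0.015 (0.102) |
| $\hat\beta_{20}$ | 0.001 (0.047) | −0.003 (0.045) | 0.003 (0.044) | 0.001 (0.089) | 0.002 (0.094) | 0.005 (0.089) |
| $\hat\beta_{21}$ | 0.002 (0.024) | 0.002 (0.023) | 0.002 (0.026) | 0.007 (0.045) | 0.008 (0.044) | 0.009 (0.044) |
| $\hat\beta_{22}$ | 0.003 (0.022) | 0.003 (0.022) | 0.002 (0.022) | 0.011 (0.040) | 0.012 (0.040) | 0.001 (0.040) |
| $\hat\rho$ | −0.001 (0.044) | −0.000 (0.060) | −0.009 (0.057) | −0.012 (0.094) | −0.007 (0.117) | −0.055 (0.152) |
| Convergence | 498 | 500 | 490 | 432 | 482 | 441 |

Misclassification Model 2

| | $\beta_{20}=0.5$, $\rho=-0.9$ | $\beta_{20}=0.5$, $\rho=-0.8$ | $\beta_{20}=0.5$, $\rho=0.9$ | $\beta_{20}=2.18$, $\rho=-0.9$ | $\beta_{20}=2.18$, $\rho=-0.8$ | $\beta_{20}=2.18$, $\rho=0.9$ |
|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | 0.004 (0.102) | 0.004 (0.106) | −0.001 (0.073) | 0.003 (0.072) | 0.001 (0.071) | −0.001 (0.066) |
| $\hat\beta_{11}$ | 0.006 (0.022) | 0.004 (0.023) | 0.008 (0.022) | 0.001 (0.019) | 0.002 (0.019) | 0.007 (0.019) |
| $\hat\beta_{12}$ | 0.005 (0.086) | 0.004 (0.087) | 0.005 (0.083) | 0.004 (0.065) | 0.002 (0.066) | 0.006 (0.064) |
| $\hat\beta_{13}$ | 0.012 (0.123) | 0.003 (0.124) | 0.010 (0.108) | 0.005 (0.100) | 0.005 (0.102) | 0.014 (0.099) |
| $\hat\beta_{20}$ | 0.001 (0.047) | −0.003 (0.045) | 0.003 (0.045) | 0.000 (0.091) | 0.001 (0.094) | 0.005 (0.091) |
| $\hat\beta_{21}$ | 0.002 (0.024) | 0.002 (0.023) | 0.002 (0.026) | 0.007 (0.045) | 0.008 (0.044) | 0.011 (0.043) |
| $\hat\beta_{22}$ | 0.003 (0.022) | 0.003 (0.022) | 0.002 (0.022) | 0.010 (0.040) | 0.010 (0.040) | 0.002 (0.039) |
| $\hat\rho$ | −0.001 (0.043) | −0.001 (0.057) | −0.008 (0.056) | −0.016 (0.088) | −0.010 (0.110) | −0.064 (0.157) |
| Convergence | 499 | 500 | 488 | 425 | 479 | 430 |

Misclassification Model 3

| | $\beta_{20}=0.5$, $\rho=-0.9$ | $\beta_{20}=0.5$, $\rho=-0.8$ | $\beta_{20}=0.5$, $\rho=0.9$ | $\beta_{20}=2.18$, $\rho=-0.9$ | $\beta_{20}=2.18$, $\rho=-0.8$ | $\beta_{20}=2.18$, $\rho=0.9$ |
|---|---|---|---|---|---|---|
| $\hat\beta_{10}$ | 0.009 (0.101) | 0.010 (0.104) | 0.004 (0.072) | 0.007 (0.076) | 0.008 (0.075) | 0.004 (0.067) |
| $\hat\beta_{11}$ | 0.006 (0.017) | 0.008 (0.018) | 0.006 (0.018) | 0.005 (0.015) | 0.005 (0.015) | 0.006 (0.015) |
| $\hat\beta_{12}$ | 0.005 (0.080) | 0.004 (0.080) | 0.002 (0.076) | 0.004 (0.059) | 0.003 (0.061) | 0.003 (0.060) |
| $\hat\beta_{13}$ | 0.005 (0.127) | 0.002 (0.127) | 0.001 (0.111) | 0.003 (0.102) | −0.003 (0.101) | 0.002 (0.097) |
| $\hat\beta_{20}$ | 0.001 (0.045) | −0.002 (0.046) | 0.004 (0.047) | 0.001 (0.098) | 0.000 (0.095) | 0.002 (0.092) |
| $\hat\beta_{21}$ | 0.003 (0.026) | 0.004 (0.026) | 0.003 (0.024) | 0.007 (0.045) | 0.005 (0.044) | 0.005 (0.043) |
| $\hat\beta_{22}$ | 0.003 (0.022) | 0.002 (0.022) | 0.002 (0.021) | 0.010 (0.041) | 0.006 (0.042) | 0.003 (0.040) |
| $\hat\rho$ | −0.001 (0.041) | −0.003 (0.056) | −0.000 (0.052) | −0.016 (0.091) | −0.017 (0.120) | −0.022 (0.127) |
| Convergence | 500 | 500 | 489 | 439 | 477 | 429 |
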
Table 7. Likelihood function for $\rho=+1.0$ in the probit model with sample selection and misclassification.

| Condition | Likelihood function |
|---|---|
| $s_i=0$ | $\Phi(-X_{2i}\beta_2)$ |
| $s_i=1$ and $y_i=0$, $X_{1i}\beta_1\geq X_{2i}\beta_2$ | $\alpha_{1i}\,\Phi(X_{2i}\beta_2)$ |
| $s_i=1$ and $y_i=0$, $X_{1i}\beta_1<X_{2i}\beta_2$ | $\alpha_{1i}\,\Phi(X_{2i}\beta_2)+(1-\alpha_{0i}-\alpha_{1i})\left(\Phi(X_{2i}\beta_2)-\Phi(X_{1i}\beta_1)\right)$ |
| $s_i=1$ and $y_i=1$, $X_{1i}\beta_1\geq X_{2i}\beta_2$ | $(1-\alpha_{1i})\,\Phi(X_{2i}\beta_2)$ |
| $s_i=1$ and $y_i=1$, $X_{1i}\beta_1<X_{2i}\beta_2$ | $\alpha_{0i}\,\Phi(X_{2i}\beta_2)+(1-\alpha_{0i}-\alpha_{1i})\,\Phi(X_{1i}\beta_1)$ |

Notes: Entries for $s_i=1$ show $P(y_i=y\mid s_i=1)\,P(s_i=1)$.
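
The piecewise likelihood in Table 7 can be sketched in Python as follows. This is an illustration only: the function and argument names are hypothetical, the unselected case is assumed to contribute the usual probit probability $\Phi(-X_{2i}\beta_2)$, and `xb1`, `xb2` stand for the indices $X_{1i}\beta_1$ and $X_{2i}\beta_2$.

```python
import math


def Phi(x: float) -> float:
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def lik_rho_plus_one(s, y, xb1, xb2, alpha0, alpha1):
    """Likelihood contribution of one observation when rho = +1."""
    if s == 0:
        return Phi(-xb2)
    om = 1.0 - alpha0 - alpha1
    if xb1 >= xb2:
        # Every selected unit has a true outcome of one in this region.
        return (1.0 - alpha1) * Phi(xb2) if y == 1 else alpha1 * Phi(xb2)
    if y == 1:
        return alpha0 * Phi(xb2) + om * Phi(xb1)
    return alpha1 * Phi(xb2) + om * (Phi(xb2) - Phi(xb1))


if __name__ == "__main__":
    # Hypothetical index values and misclassification probabilities.
    for xb1, xb2 in [(0.9, 0.4), (-0.2, 0.4)]:
        parts = [lik_rho_plus_one(0, 0, xb1, xb2, 0.05, 0.20),
                 lik_rho_plus_one(1, 0, xb1, xb2, 0.05, 0.20),
                 lik_rho_plus_one(1, 1, xb1, xb2, 0.05, 0.20)]
        print(parts, sum(parts))
```

In both regions the three contributions (not selected, selected with $y_i=0$, selected with $y_i=1$) add up to one, which is a quick consistency check on the table entries.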
Table 8. PP-Heckprobit estimates. Entries are coefficients with robust standard errors in parentheses.

| Explanatory variables | (1) Selection | (1) Outcome | (2) Selection | (2) Outcome | (3) Selection | (3) Outcome | (4) Selection | (4) Outcome | (5) Selection | (5) Outcome | (6) Selection | (6) Outcome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1(4.9% < RRR ≤ 9.8%) | −0.040 (0.089) | 0.009 (0.184) | −0.062 (0.091) | 0.001 (0.191) | | | | | | | | |
| 1(9.8% < RRR ≤ 44.9%) | 0.057 (0.070) | −0.250 * (0.150) | 0.018 (0.071) | −0.250 (0.158) | | | | | | | | |
| 1(44.9% < RRR) | 0.059 (0.067) | −0.166 (0.152) | −0.039 (0.071) | −0.141 (0.165) | | | | | | | | |
| 1(9.8% < RRR) | | | | | 0.071 (0.053) | −0.207 * (0.117) | 0.008 (0.055) | −0.194 (0.120) | | | | |
| RRR | | | | | | | | | 0.340 (0.337) | −1.334 * (0.720) | 0.155 (0.345) | −1.328 * (0.744) |
| RRR² | | | | | | | | | −0.068 (0.070) | 0.273 * (0.148) | −0.034 (0.071) | 0.273 * (0.153) |
| Male | 0.015 (0.050) | −0.037 (0.108) | 0.013 (0.053) | 0.023 (0.111) | 0.016 (0.050) | −0.040 (0.112) | 0.015 (0.053) | 0.020 (0.112) | 0.016 (0.050) | −0.038 (0.108) | 0.014 (0.053) | 0.023 (0.111) |
| Lower secondary education | | | −0.364 *** (0.099) | −0.108 (0.186) | | | −0.364 *** (0.099) | −0.114 (0.186) | | | −0.363 *** (0.099) | −0.108 (0.186) |
| Upper secondary | | | −0.239 ** (0.108) | −0.006 (0.199) | | | −0.235 ** (0.108) | −0.011 (0.198) | | | −0.239 ** (0.108) | −0.005 (0.196) |
| Higher education | | | −0.434 *** (0.112) | 0.279 (0.219) | | | −0.424 *** (0.111) | 0.264 (0.217) | | | −0.431 *** (0.112) | 0.278 (0.216) |
| 11–25 books at home | | | −0.016 (0.081) | −0.149 (0.183) | | | −0.014 (0.081) | −0.147 (0.183) | | | −0.015 (0.081) | −0.146 (0.180) |
| 26–100 | | | 0.008 (0.083) | 0.045 (0.179) | | | 0.009 (0.083) | 0.046 (0.183) | | | 0.009 (0.083) | 0.050 (0.174) |
| 101–200 | | | −0.165 (0.102) | −0.130 (0.262) | | | −0.164 (0.101) | −0.125 (0.267) | | | −0.166 (0.102) | −0.128 (0.259) |
| >200 | | | −0.235 ** (0.096) | −0.200 (0.270) | | | −0.236 ** (0.096) | −0.202 (0.268) | | | −0.235 ** (0.096) | −0.201 (0.265) |
| Numeracy skills | | | −0.129 ** (0.055) | 0.076 (0.146) | | | −0.127 ** (0.055) | 0.081 (0.147) | | | −0.129 ** (0.055) | 0.076 (0.145) |
| Reading comprehension | | | −0.041 (0.036) | 0.047 (0.068) | | | −0.040 (0.036) | 0.045 (0.068) | | | −0.040 (0.036) | 0.047 (0.068) |
| Cognitive reflection | | | 0.046 (0.062) | −0.059 (0.133) | | | 0.048 (0.062) | −0.068 (0.131) | | | 0.047 (0.062) | −0.060 (0.132) |
| Risk score | | | −0.032 (0.020) | −0.092 * (0.049) | | | −0.031 (0.020) | −0.093 * (0.049) | | | −0.033 (0.020) | −0.093 * (0.048) |
| MPC (÷10) | | | 0.001 (0.008) | −0.044 * (0.024) | | | 0.001 (0.008) | −0.043 * (0.024) | | | 0.001 (0.008) | −0.044 * (0.024) |
| Age | −0.001 (0.001) | | −0.006 *** (0.002) | | −0.001 (0.001) | | −0.006 *** (0.002) | | −0.001 (0.001) | | −0.006 *** (0.002) | |
| Region population ($10^6$) | 0.213 *** (0.019) | | 0.215 *** (0.019) | | 0.213 *** (0.019) | | 0.215 *** (0.019) | | 0.213 *** (0.019) | | 0.215 *** (0.019) | |
| Intercept | 1.014 *** (0.093) | −0.897 *** (0.320) | 1.877 *** (0.202) | −0.671 (0.425) | 1.002 *** (0.090) | −0.886 *** (0.309) | 1.850 *** (0.199) | −0.638 (0.407) | 0.993 *** (0.093) | −0.847 *** (0.320) | 1.857 *** (0.201) | −0.625 (0.426) |

Log-likelihood by specification: (1) −3653.97; (2) −3609.54; (3) −3654.31; (4) −3610.53; (5) −3654.28; (6) −3609.91.

Notes: The number of observations is 7129. Regressors in the outcome equation include indicators for single-year age group and birth region. 1(·) is the indicator function. Robust standard errors are in parentheses. *: Significant at 10%. **: Significant at 5%. ***: Significant at 1%.
Table 9. Probit (selection equation) and PP-Probit (outcome equation) estimates.
Table 9. Probit (selection equation) and PP-Probit (outcome equation) estimates.
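| Explanatory Variables | (1) Selection | (1) Outcome | (2) Selection | (2) Outcome | (3) Selection | (3) Outcome | (4) Selection | (4) Outcome | (5) Selection | (5) Outcome | (6) Selection | (6) Outcome |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1(4.9% < RRR ≤ 9.8%) | −0.038 | 0.015 | −0.061 | 0.013 | | | | | | | | |
| | (0.089) | (0.181) | (0.090) | (0.192) | | | | | | | | |
| 1(9.8% < RRR ≤ 44.9%) | 0.053 | −0.255 * | 0.011 | −0.248 | | | | | | | | |
| | (0.069) | (0.150) | (0.071) | (0.156) | | | | | | | | |
| 1(44.9% < RRR) | 0.055 | −0.173 | −0.045 | −0.131 | | | | | | | | |
| | (0.067) | (0.155) | (0.071) | (0.166) | | | | | | | | |
| 1(9.8% < RRR) | | | | | 0.066 | −0.216 * | 0.002 | −0.192 | | | | |
| | | | | | (0.053) | (0.126) | (0.055) | (0.120) | | | | |
| RRR | | | | | | | | | 0.318 | −1.371 * | 0.118 | −1.325 * |
| | | | | | | | | | (0.336) | (0.738) | (0.344) | (0.744) |
| RRR² | | | | | | | | | −0.064 | 0.281 * | −0.026 | 0.273 * |
| | | | | | | | | | (0.070) | (0.152) | (0.071) | (0.153) |
| Male | 0.014 | −0.040 | 0.008 | 0.023 | 0.015 | −0.045 | 0.011 | 0.018 | 0.015 | −0.044 | 0.009 | 0.023 |
| | (0.050) | (0.109) | (0.053) | (0.113) | (0.050) | (0.117) | (0.053) | (0.113) | (0.050) | (0.111) | (0.053) | (0.114) |
| Lower secondary education | | | −0.364 *** | −0.074 | | | −0.364 *** | −0.081 | | | −0.363 *** | −0.075 |
| | | | (0.098) | (0.189) | | | (0.098) | (0.189) | | | (0.098) | (0.190) |
| Upper secondary | | | −0.237 ** | 0.010 | | | −0.234 ** | 0.004 | | | −0.236 ** | 0.009 |
| | | | (0.108) | (0.192) | | | (0.107) | (0.190) | | | (0.108) | (0.192) |
| Higher education | | | −0.432 *** | 0.328 | | | −0.423 *** | 0.310 | | | −0.430 *** | 0.325 |
| | | | (0.111) | (0.216) | | | (0.111) | (0.210) | | | (0.111) | (0.214) |
| 11–25 books at home | | | −0.016 | −0.146 | | | −0.013 | −0.144 | | | −0.014 | −0.142 |
| | | | (0.081) | (0.174) | | | (0.081) | (0.173) | | | (0.081) | (0.173) |
| 26–100 | | | 0.015 | 0.052 | | | 0.016 | 0.054 | | | 0.015 | 0.059 |
| | | | (0.082) | (0.158) | | | (0.083) | (0.158) | | | (0.082) | (0.156) |
| 101–200 | | | −0.155 | −0.119 | | | −0.155 | −0.113 | | | −0.156 | −0.116 |
| | | | (0.101) | (0.252) | | | (0.101) | (0.255) | | | (0.101) | (0.253) |
| >200 | | | −0.231 ** | −0.178 | | | −0.232 ** | −0.180 | | | −0.231 ** | −0.178 |
| | | | (0.096) | (0.268) | | | (0.096) | (0.262) | | | (0.096) | (0.265) |
| Numeracy skills | | | −0.127 ** | 0.096 | | | −0.125 ** | 0.101 | | | −0.127 ** | 0.097 |
| | | | (0.055) | (0.148) | | | (0.055) | (0.148) | | | (0.055) | (0.148) |
| Reading comprehension | | | −0.041 | 0.049 | | | −0.040 | 0.047 | | | −0.040 | 0.049 |
| | | | (0.036) | (0.070) | | | (0.036) | (0.070) | | | (0.036) | (0.070) |
| Cognitive reflection | | | 0.047 | −0.070 | | | 0.049 | −0.080 | | | 0.048 | −0.071 |
| | | | (0.062) | (0.136) | | | (0.062) | (0.135) | | | (0.062) | (0.136) |
| Risk score | | | −0.031 | −0.090 * | | | −0.030 | −0.091 * | | | −0.031 | −0.091 * |
| | | | (0.020) | (0.050) | | | (0.020) | (0.050) | | | (0.020) | (0.050) |
| MPC (÷10) | | | 0.001 | −0.042 * | | | 0.001 | −0.042 * | | | 0.001 | −0.043 * |
| | | | (0.008) | (0.025) | | | (0.008) | (0.025) | | | (0.008) | (0.025) |
| Age | −0.001 | | −0.006 *** | | −0.001 | | −0.006 *** | | −0.001 | | −0.006 *** | |
| | (0.001) | | (0.002) | | (0.001) | | (0.002) | | (0.001) | | (0.002) | |
| Region population (10⁶) | 0.208 *** | | 0.210 *** | | 0.208 *** | | 0.210 *** | | 0.208 *** | | 0.210 *** | |
| | (0.019) | | (0.019) | | (0.019) | | (0.019) | | (0.019) | | (0.019) | |
| Intercept | 1.025 *** | −0.927 *** | 1.875 *** | −0.767 * | 1.014 *** | −0.912 *** | 1.848 *** | −0.724 * | 1.006 *** | −0.872 *** | 1.856 *** | −0.717 |
| | (0.094) | (0.332) | (0.202) | (0.448) | (0.090) | (0.330) | (0.198) | (0.428) | (0.093) | (0.338) | (0.201) | (0.451) |
| Log-likelihood | −1470.68 | −2189.85 | −1438.48 | −2178.17 | −1470.77 | −2190.05 | −1439.06 | −2178.58 | −1470.86 | −2189.96 | −1438.75 | −2178.27 |
| Observations | 7129 | 6696 | 7129 | 6696 | 7129 | 6696 | 7129 | 6696 | 7129 | 6696 | 7129 | 6696 |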
Notes: Regressors in the outcome equation include indicators for single-year age group and birth region. 1(·) is the indicator function. Robust standard errors are in parentheses. *: Significant at 10%. **: Significant at 5%. ***: Significant at 1%.
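As a reading aid for the significance markers, the sketch below recomputes the stars from a reported coefficient and its robust standard error. It assumes the stars come from a two-sided test based on the standard normal distribution, which the notes do not state explicitly. The function names `two_sided_p` and `stars` are illustrative, not taken from the article. Because the table reports rounded values, borderline cases may not reproduce exactly.

```python
import math


def two_sided_p(coef: float, se: float) -> float:
    """Two-sided p-value for H0: coefficient = 0, using z = coef / se.

    Assumes the test statistic is asymptotically standard normal.
    """
    z = coef / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def stars(coef: float, se: float) -> str:
    """Map the p-value to the table's markers (10%, 5%, 1%)."""
    p = two_sided_p(coef, se)
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# Outcome-equation coefficients on 1(9.8% < RRR), columns (3) and (4) of Table 9.
print(stars(-0.216, 0.126))  # z is about -1.71
print(stars(-0.192, 0.120))  # z is -1.60
```

Under this assumption the first coefficient is significant at the 10% level and the second is not, which matches the markers printed in the table.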
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

González Chapela, J. A Binary Choice Model with Sample Selection and Covariate-Related Misclassification. Econometrics 2022, 10, 13. https://0-doi-org.brum.beds.ac.uk/10.3390/econometrics10020013
