Communication

Identification in Parametric Models: The Minimum Hellinger Distance Criterion

David Pacini, School of Economics, University of Bristol, Bristol BS8 1TU, UK
Submission received: 5 October 2021 / Revised: 3 February 2022 / Accepted: 15 February 2022 / Published: 21 February 2022

Abstract

This note studies the criterion for identifiability in parametric models based on the minimization of the Hellinger distance and exhibits its relationship to the identifiability criterion based on the Fisher matrix. It shows that the Hellinger distance criterion serves to establish identifiability of parameters of interest, or lack of it, in situations where the criterion based on the Fisher matrix does not apply, like in models where the support of the observed variables depends on the parameter of interest or in models with irregular points of the Fisher matrix. Several examples illustrating this result are provided.

1. Introduction

There are values of unknown parameters of interest in data analysis that cannot be determined even in the most favorable situation where the maximum amount of data is available, i.e., when the distribution of the population is known. This difficulty has been tackled either by introducing criteria securing that the parameter of interest is (locally) identifiable or by delineating the set of observationally equivalent values of the parameter of interest; for a review of these approaches, see, e.g., (Paulino and Pereira 1994) or (Lewbel 2019). This note contributes to these efforts by studying the criterion for identifiability based on the minimization of the Hellinger distance, which was introduced by Beran (1977), and exhibiting its relationship to the criterion for local identifiability based on the non-singularity of the Fisher matrix, which was introduced by Rothenberg (1971). The similarities and differences between these two criteria for identifiability have so far not been studied.
The main result in this note is to show that the Hellinger distance criterion can be used to verify the (local) identifiability of a parameter of interest, or lack of it, in models or at points in the parameter space where the Fisher matrix criterion does not apply. This note illustrates this result with several examples, including a parametric procurement auction model and the uniform, normal squared, and Laplace location models. These models are irregular either because the support of the observed variables depends on the parameter of interest or because the parameter space contains irregular points of the Fisher matrix. Additional examples of irregular models and models with irregular points of the Fisher matrix are referenced below after defining the concepts of a regular point of the Fisher matrix and a regular model according to conventional usage, see, e.g., Rothenberg (1971).
Let $Y$ be a vector-valued random variable in $\mathcal{Y} \subseteq \mathbb{R}^L$ with probability function $P_o$. Let the available data be a sample $\{Y_i\}_{i=1}^{N}$ of independent and identically distributed replications of $Y$. Consider a family $\mathcal{F}$ of probability density functions $f : \mathcal{Y} \to [0, \infty)$ defined with respect to a common dominating measure $\mu$, which will allow us to dispense with the need to distinguish between continuous and discrete random variables.1 Let $\mathcal{F}_{\Theta}$ denote a subset of densities in $\mathcal{F}$ indexed by $\theta \in \Theta$, where the parameter space $\Theta$ is a subset of $\mathbb{R}^K$, with $K$ a positive integer. Let $f_{\theta}$ denote an element of $\mathcal{F}_{\Theta}$.
Definition 1
(Identifiability). A parameter point $\theta_o$ in $\Theta$ is said to be identifiable if there is no other $\theta$ in $\Theta$ such that $f_{\theta}(y) = f_{\theta_o}(y)$ for $\mu$-a.s. $y$.
Definition 2
(Local Identifiability). A parameter point $\theta_o$ in $\Theta$ is said to be locally identifiable if there exists an open neighborhood $\Theta' \subseteq \Theta$ of $\theta_o$ containing no other $\theta$ such that $f_{\theta}(y) = f_{\theta_o}(y)$ for $\mu$-a.s. $y$.
Definition 3
(Regular Points). The Fisher matrix $I(\theta)$ is the variance-covariance matrix of the score $\nabla_{\theta}\ell(\theta)$, where $\ell(\theta) := \ln f_{\theta}$,
$$ I(\theta) := E\left[\nabla_{\theta}\ell(\theta)\,\nabla_{\theta}\ell(\theta)'\right] - E\left[\nabla_{\theta}\ell(\theta)\right]E\left[\nabla_{\theta}\ell(\theta)\right]'. $$
The point $\theta_o \in \Theta$ is said to be a regular point of the Fisher matrix if there exists an open neighborhood of $\theta_o$ in which $I(\theta)$ has constant rank.
The (local) identifiability of regular points of the Fisher matrix in parametric models has been extensively studied, see, e.g., Rothenberg (1971). In contrast, the identifiability of irregular points has received less attention, and the literature is rather unclear about the (local) identifiability of irregular points of the Fisher matrix. The study of irregular points is worthy of consideration because, first, there are several models of interest with this type of point in the parameter space (see the list below), and second, because irregular points may correspond to either:
  • points in the parameter space that are not locally identifiable and for which a consistent estimator cannot exist, e.g., the measurement error model studied by Reiersol (1950), or a consistent estimator can only exist after a normalization; or
  • points in the parameter space that are locally identifiable and for which a $\sqrt{N}$-consistent estimator cannot exist (and some algorithms, e.g., the Newton–Raphson method based on the Fisher matrix, will face difficulties in converging) or a $\sqrt{N}$-consistent estimator can only exist after a reparametrization of the model, see, e.g., the bivariate probit model in Han and McCloskey (2019).
Hinkley (1973) noted that an irregular point of the Fisher matrix arises in the normal unsigned location model when the location parameter is zero. Sargan (1983) constructed simultaneous equation models with irregular points of the Fisher matrix. Lee and Chesher (1986) showed that the normal regression model with non-ignorable non-response has irregular points of the Fisher matrix in the vicinity of ignorable non-response. Li et al. (2009) noted that finite-mixture density models have irregular points of the Fisher matrix in the vicinity of homogeneity. Hallin and Ley (2012) showed that skew-symmetric density models have irregular points of the Fisher matrix in the vicinity of symmetry. We use below the normal squared location model (see Example 3) to illustrate in a transparent way the notion of an irregular point of the Fisher matrix.
The next section shows that the criterion for local identifiability based on minimizing the Hellinger distance, unlike the criterion based on the non-singularity of the Fisher matrix, applies to both regular and irregular points of the Fisher matrix and to regular and irregular models, to be defined below in Section 3. Section 3 shows that, for regular points of the Fisher matrix in the class of regular models studied by Rothenberg (1971), the criterion based on the Fisher matrix is a particular case of the criterion based on minimizing the Hellinger distance (but not for irregular models or irregular points of the Fisher matrix). Section 4 relates the minimum Hellinger distance criterion to the criterion based on the reversed Kullback–Leibler divergence, introduced by Bowden (1973), by showing that both are particular cases of the criterion for identifiability based on the minimization of a $\varphi$-divergence.

2. The Minimum Hellinger Distance Criterion

Identifying $\theta_o$ is the problem of distinguishing $f_{\theta_o}$ from the other members of $\mathcal{F}_{\Theta}$. It is then convenient to begin by introducing a notion of how densities differ from each other. The squared Hellinger distance for the pair of densities $f_{\theta}, f_{\theta_o}$ in $\mathcal{F}_{\Theta}$ is the square of the $L_2(\mu)$-norm of the difference between the square roots of the densities:
$$ \rho(\theta) := \frac{1}{2}\left\| f_{\theta}^{1/2} - f_{\theta_o}^{1/2} \right\|_{L_2(\mu)}^{2} = \frac{1}{2}\int \left( f_{\theta}^{1/2} - f_{\theta_o}^{1/2} \right)^{2} d\mu = 1 - \int f_{\theta}^{1/2} f_{\theta_o}^{1/2}\, d\mu. $$
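As a quick illustration of this definition (a sketch of my own, not part of the paper), the squared Hellinger distance can be computed by numerical quadrature and checked against the closed form $1 - \exp(-(\theta-\theta_o)^{2}/8)$ that holds for the normal location model $N(\theta, 1)$; the model choice and integration limits below are illustrative assumptions.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

def squared_hellinger(f, g, lo, hi):
    # 0.5 * integral of (sqrt(f) - sqrt(g))^2 over [lo, hi]
    integrand = lambda y: 0.5 * (np.sqrt(f(y)) - np.sqrt(g(y))) ** 2
    val, _ = integrate.quad(integrand, lo, hi)
    return val

theta_o, theta = 0.0, 1.5
rho_numeric = squared_hellinger(lambda y: norm.pdf(y, loc=theta),
                                lambda y: norm.pdf(y, loc=theta_o), -30, 30)
rho_closed = 1 - np.exp(-(theta - theta_o) ** 2 / 8)
print(rho_numeric, rho_closed)   # both approximately 0.245
```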
The squared Hellinger distance has the following well-known properties (see, e.g., Pardo 2005, p. 51), which are going to be used later.
Lemma 1.
$\rho$ can take values from 0 to 1, which are independent of the choice of the dominating measure $\mu$, and $\rho(\theta) = 0$ if and only if $f_{\theta}(y) = f_{\theta_o}(y)$ for $\mu$-a.s. $y$.
(All the proofs are in Appendix A.) Alternative notions of divergence between densities, other than the squared Hellinger distance, are studied in Section 4. Since $\rho(\theta)$ is equal to zero if and only if $f_{\theta}$ and $f_{\theta_o}$ are equal, one has the following characterization of identifiability.
Lemma 2.
The parameter $\theta_o \in \Theta$ is identifiable in the model $\mathcal{F}_{\Theta}$ if and only if, for all $\theta \in \Theta$ such that $\rho(\theta) = 0$, $\theta = \theta_o$.
Moreover, since θ ρ ( θ ) is non-negative and reaches a minimum at θ = θ o , one obtains the following criterion for identifiability based on minimizing the squared Hellinger distance.
Proposition 1.
The parameter $\theta_o \in \Theta$ is identifiable in the model $\mathcal{F}_{\Theta}$ if and only if
$$ \theta_o = \arg\min_{\theta \in \Theta} \rho(\theta). $$
This criterion applies to models where:
  • the support of Y depends on the parameter of interest (see Examples 1 and 2 below);
  • θ o is not a regular point of the Fisher matrix (see Example 3 below);
  • some elements of the Fisher matrix I ( θ o ) are not defined (see Example 5 below);
  • $\theta \mapsto I(\theta)$ is not continuous (see Example 6 below);
  • Θ is infinite-dimensional, as in semiparametric models (which are out of the scope of this note).2
The following examples illustrate the use of Proposition 1 and the definitions introduced so far. They are also going to illustrate, in the next section, the regularity conditions employed by Rothenberg (1971) to obtain a criterion for local identifiability based on the Fisher matrix. In these examples, μ denotes the Lebesgue measure. The Supplementary Materials presents step-by-step calculations of the squared Hellinger distance in Examples 1–5.
Example 1
(Uniform Location Model). Set $\mathcal{Y} = (0, \infty)$ and $\Theta = (0, \infty)$. Consider the uniform location model
$$ f_{\theta}(y) = \theta^{-1}\,\mathbb{1}(0 \le y \le \theta). $$
The Hellinger distance is
$$ \rho(\theta) = 1 - \frac{\theta + \theta_o - |\theta - \theta_o|}{2\sqrt{\theta\,\theta_o}}. $$
Since the unique solution to $\frac{\theta + \theta_o - |\theta - \theta_o|}{2\sqrt{\theta\,\theta_o}} = 1$ is $\theta = \theta_o$, see Figure 1a, one has $\arg\min_{\theta \in \Theta}\rho(\theta) = \theta_o$.
The Fisher matrix is I ( θ o ) = 0 , which is a singular matrix.
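A minimal numerical sketch of my own (not from the paper) for Example 1: the displayed distance equals $1 - \min(\theta,\theta_o)/\sqrt{\theta\,\theta_o}$, and a grid search confirms that its unique minimizer is $\theta_o$ even though the Fisher matrix is singular.

```python
import numpy as np

theta_o = 4.0
thetas = np.linspace(0.5, 10.0, 2001)
# 1 - (theta + theta_o - |theta - theta_o|)/(2*sqrt(theta*theta_o)) = 1 - min(theta, theta_o)/sqrt(theta*theta_o)
rho = 1 - np.minimum(thetas, theta_o) / np.sqrt(thetas * theta_o)
print(thetas[np.argmin(rho)], rho.min())   # approximately 4.0 and 0.0
```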
Example 2
(First-Price Auction Model). Consider the first-price procurement auction model with $m$ bidders introduced in (Paarsch 1992, Section 4.2.2). For bidders with independent private valuations following an exponential distribution with parameter $\theta$, (Paarsch 1992, Display 4.18) shows that the density of the winning bid $Y_i$ in the $i$-th auction is
$$ f_{\theta}(y) = \frac{m}{\theta}\exp\!\left(\frac{m}{m-1} - \frac{m}{\theta}y\right)\mathbb{1}\!\left(y \ge \frac{\theta}{m-1}\right). $$
Set $\mathcal{Y} = \mathbb{R}_+$ and $\Theta = (0, \infty)$. The Hellinger distance in this case is
$$ \rho(\theta) = 1 - \frac{2\sqrt{\theta\,\theta_o}}{\theta + \theta_o}\exp\!\left(\left[1 - \frac{\theta + \theta_o}{2\,\theta\,\theta_o}\max(\theta, \theta_o)\right]\frac{m}{m-1}\right). $$
For $\theta < \theta_o$ (resp. $\theta > \theta_o$), one has $\rho'(\theta) < 0$ (resp. $\rho'(\theta) > 0$), see Figure 1b. Hence, by continuity of $\theta \mapsto \rho(\theta)$, $\arg\min_{\theta \in \Theta}\rho(\theta) = \theta_o$. The Fisher matrix is $I(\theta_o) = 0$, which is a singular matrix.
Example 3
(Normal Squared Location Model). Set $\mathcal{Y} = \mathbb{R}$ and $\Theta = \mathbb{R}$. Consider the normal squared location model
$$ f_{\theta}(y) = (2\pi)^{-1/2}\exp\!\left(-(y - \theta^{2})^{2}/2\right). $$
This model would arise, for example, if Y is the difference between a matched pair of random variables whose control and treatment labels are not observed. The Hellinger distance is
$$ \rho(\theta) = 1 - \exp\!\left(-(\theta^{2} - \theta_o^{2})^{2}/8\right). $$
The parameter point $\theta_o = 0$ is identifiable because $\theta \mapsto 1 - \exp(-\theta^{4}/8)$ is a strictly convex function, see Figure 2a. The parameter points $\theta_o \neq 0$ are not identifiable because $\rho(\theta_o) = \rho(-\theta_o) = 0$, see Figure 2b. The Fisher matrix is $I(\theta_o) = 4\theta_o^{2}$, which implies that $I(0) = 0$ is a singular matrix and $\theta_o = 0$ is an irregular point of the Fisher matrix.
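A short numerical sketch of my own for Example 3: evaluating $\rho(\theta) = 1 - \exp(-(\theta^{2} - \theta_o^{2})^{2}/8)$ on a grid shows a single minimizer when $\theta_o = 0$ and the pair of minimizers $\{-\theta_o, \theta_o\}$ otherwise; the grid and tolerance are illustrative choices.

```python
import numpy as np

def rho(theta, theta_o):
    return 1 - np.exp(-(theta ** 2 - theta_o ** 2) ** 2 / 8)

thetas = np.linspace(-3, 3, 601)
for theta_o in (0.0, 1.0):
    r = rho(thetas, theta_o)
    print(theta_o, np.round(thetas[r <= r.min() + 1e-12], 3))
# theta_o = 0.0 -> [0.] ; theta_o = 1.0 -> [-1.  1.]
```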
Example 4
(Demand-and-Supply Model). Let Y = ( P , Q ) denote the observed price and quantity of a good transacted in a market at a given period of time. Linear approximations to the demand and supply functions are
$$ \begin{aligned} D &= \alpha + \beta P + V && \text{(Demand)} \\ S &= \gamma + \delta P + U && \text{(Supply)} \\ Q &= D = S, && \text{(Equilibrium)} \end{aligned} $$
where $\alpha, \beta, \gamma, \delta$ are unknown parameters and $(U, V)$ is an unobserved random vector. Assume that $U$ and $V$ are independent and jointly normally distributed with mean zero and unknown variances $\sigma_{11}$ and $\sigma_{22}$, respectively. Set $\theta = (\alpha, \beta, \gamma, \delta, \sigma_{11}, \sigma_{22})$. The density of the observed variables is then
$$ f_{\theta}(y) = \frac{\exp\!\left(-\tfrac{1}{2}(y - \mu)'\Omega^{-1}(y - \mu)\right)}{2\pi\sqrt{\det(\Omega)}}, $$
where $\det(\cdot)$ is the determinant of the matrix in parentheses and
$$ \mu = \begin{pmatrix} \dfrac{\alpha - \gamma}{\delta - \beta} \\[1ex] \dfrac{\delta\alpha - \beta\gamma}{\delta - \beta} \end{pmatrix} \quad \text{and} \quad \Omega = \begin{pmatrix} \dfrac{\sigma_{11} + \sigma_{22}}{(\delta - \beta)^{2}} & \dfrac{\delta\sigma_{11} + \beta\sigma_{22}}{(\delta - \beta)^{2}} \\[1ex] \dfrac{\delta\sigma_{11} + \beta\sigma_{22}}{(\delta - \beta)^{2}} & \dfrac{\delta^{2}\sigma_{11} + \beta^{2}\sigma_{22}}{(\delta - \beta)^{2}} \end{pmatrix}. $$
The squared Hellinger distance is
$$ \rho(\theta) = 1 - \frac{\det(\Omega)^{1/4}\det(\Omega_o)^{1/4}}{\det\!\left(\dfrac{\Omega + \Omega_o}{2}\right)^{1/2}}\exp\!\left(-\frac{1}{8}(\mu - \mu_o)'\left(\frac{\Omega + \Omega_o}{2}\right)^{-1}(\mu - \mu_o)\right). $$
To show that θ o is not identifiable, by Proposition 1, it suffices to verify that arg min θ ρ ( θ ) is not a singleton. We elaborate on this point in the Supplementary Material.
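A small numerical sketch of my own (not the calculation in the Supplementary Material): the observed bivariate normal is described by only five reduced-form quantities (two means and three distinct covariance entries), while $\theta$ has six components, so the Jacobian of the reduced form has rank at most five, consistent with the lack of identifiability claimed in the example. The reduced-form map below is derived from the demand-and-supply system; the assignment of $\sigma_{11}$ to $U$ and $\sigma_{22}$ to $V$ and the numerical values are my own assumptions.

```python
import numpy as np

def reduced_form(theta):
    # mean and covariance of (P, Q) implied by the linear demand-and-supply system
    a, b, g, d, s11, s22 = theta
    mu_p = (a - g) / (d - b)
    mu_q = (d * a - b * g) / (d - b)
    var_p = (s11 + s22) / (d - b) ** 2
    cov_pq = (d * s22 + b * s11) / (d - b) ** 2
    var_q = (d ** 2 * s22 + b ** 2 * s11) / (d - b) ** 2
    return np.array([mu_p, mu_q, var_p, cov_pq, var_q])

theta_o = np.array([10.0, -1.0, 2.0, 1.5, 1.0, 2.0])
eps = 1e-6
jac = np.column_stack([(reduced_form(theta_o + eps * e) - reduced_form(theta_o - eps * e)) / (2 * eps)
                       for e in np.eye(6)])
# rank 5 < 6: different values of theta can generate the same (mu, Omega)
print(np.linalg.matrix_rank(jac, tol=1e-6))
```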
Example 5
(Laplace Location Model). Set $\mathcal{Y} = \mathbb{R}$, $\Theta = \mathbb{R}$. Consider the Laplace location model
$$ f_{\theta}(y) = \frac{1}{2}\exp\!\left(-|y - \theta|\right). $$
The squared Hellinger distance is
$$ \rho(\theta) = 1 - \exp\!\left(-\frac{|\theta - \theta_o|}{2}\right) - \frac{|\theta - \theta_o|}{2}\left[\mathbb{1}(\theta - \theta_o > 0)\exp\!\left(-\frac{\theta - \theta_o}{2}\right) + \mathbb{1}(\theta - \theta_o < 0)\exp\!\left(\frac{\theta - \theta_o}{2}\right)\right]. $$
For any $\theta - \theta_o < 0$ (resp. $\theta - \theta_o > 0$), one has $\rho'(\theta) < 0$ (resp. $\rho'(\theta) > 0$). By continuity, $\theta \mapsto \rho(\theta)$ then has a unique minimizer at $\theta = \theta_o$, and, by Proposition 1, $\theta_o$ is identifiable. The Fisher matrix is $I(\theta) = 1$, which is a non-singular matrix.
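As a quick check of my own (not in the paper), the display above collapses to $\rho(\theta) = 1 - \exp(-|\theta-\theta_o|/2)\,(1 + |\theta-\theta_o|/2)$, and this closed form agrees with direct numerical integration of the squared Hellinger distance; the evaluation points are illustrative.

```python
import numpy as np
from scipy import integrate

def rho_numeric(theta, theta_o):
    f = lambda y, t: 0.5 * np.exp(-np.abs(y - t))
    integrand = lambda y: 0.5 * (np.sqrt(f(y, theta)) - np.sqrt(f(y, theta_o))) ** 2
    val, _ = integrate.quad(integrand, -40, 40)
    return val

def rho_closed(theta, theta_o):
    d = np.abs(theta - theta_o)
    return 1 - np.exp(-d / 2) * (1 + d / 2)

theta_o = 0.5
for theta in (-1.0, 0.5, 2.0):
    print(theta, rho_numeric(theta, theta_o), rho_closed(theta, theta_o))
# the two columns agree and vanish only at theta = theta_o
```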
Example 6
(Exponential Mixture). Set $\mathcal{Y} = [0, \infty)$, $\theta = (\theta_1, \theta_2, \theta_3)$ and $\Theta = [0,1]\times[0,\infty)\times[0,\infty)$. Consider the finite mixture of exponentials model
$$ f_{\theta}(y) = (1 - \theta_1)\exp\!\left(\ln\theta_2 - \theta_2 y\right) + \theta_1\exp\!\left(\ln\theta_3 - \theta_3 y\right). $$
Consider $\theta_o = (1/2, 1, 2)$ and $\theta = (1/2, 2, 1)$. Since $f_{\theta_o}^{1/2} = f_{\theta}^{1/2}$, one has $\rho(\theta) = 0$ and $\theta \in \arg\min_{\theta \in \Theta}\rho(\theta)$. Since $\theta_o \neq \theta$, it follows from Proposition 1 that $\theta_o$ is not identifiable.
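A tiny numerical sketch of my own of the label swapping behind Example 6: the two parameter points generate the same density, so the squared Hellinger distance between them is zero.

```python
import numpy as np

def f(y, theta):
    t1, t2, t3 = theta
    return (1 - t1) * np.exp(np.log(t2) - t2 * y) + t1 * np.exp(np.log(t3) - t3 * y)

y = np.linspace(0.0, 20.0, 2001)
print(np.max(np.abs(f(y, (0.5, 1.0, 2.0)) - f(y, (0.5, 2.0, 1.0)))))   # 0.0
```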
The previous examples also illustrate the difference between identifiable and locally identifiable points in the parameter space.
Example 7
(Normal Squared Location Model, Continued). In this example, any $\theta_o \in \Theta$ is locally identifiable—even the irregular point $\theta_o = 0$ of the Fisher matrix—and only $\theta_o = 0$ is identifiable, see Figure 2.
We also have the following criterion for local identifiability based on minimizing the squared Hellinger distance.
Proposition 2.
The parameter $\theta_o \in \Theta$ is locally identifiable in the model $\mathcal{F}_{\Theta}$ if and only if there exists an open set $\Theta' \ni \theta_o$ such that
$$ \theta_o = \arg\min_{\theta \in \Theta'} \rho(\theta). $$
This criterion, unlike the criterion based on the Fisher matrix by Rothenberg (1971) and re-stated below as Lemma 3 for the sake of completeness, applies to the case when:
  • the support of Y depends on the parameter of interest;
  • θ o is not a regular point of the Fisher matrix;
  • some elements of the Fisher matrix I ( θ o ) are not defined;
  • $\theta \mapsto I(\theta)$ is not continuous;
  • Θ is infinite-dimensional.
Proposition 2 reduces local identifiability to the uniqueness of the solution of a well-defined minimization problem. One general criterion to check in advance for the uniqueness of a minimizer of an optimization problem, and, as argued in, e.g., (Rockafellar and Wets 1998), virtually the only available one, is the strict convexity of the objective function. The application of this general criterion to the characterization of local identifiability in Proposition 2 yields the following result:
Proposition 3.
If $\theta \mapsto \rho(\theta)$ is a locally strictly convex function around $\theta_o$ (i.e., if there is an open convex set $\Theta' \ni \theta_o$ such that $\rho : \Theta' \to [0,1]$ is a strictly convex function), then $\theta_o$ is locally identifiable.
Proposition 3 shows that local identifiability is related to the local strict convexity of the Hellinger distance. As with our earlier propositions, it holds when the support of $Y$ depends on the parameter of interest, $\theta_o$ is not a regular point of the Fisher matrix, some elements of the Fisher matrix $I(\theta_o)$ are not defined, or $\theta \mapsto I(\theta)$ is not continuous.
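A brief numerical sketch of my own of Proposition 3 applied to Example 3 at the irregular point $\theta_o = 0$: on a neighborhood of zero, $\theta \mapsto \rho(\theta) = 1 - \exp(-\theta^{4}/8)$ satisfies the strict midpoint-convexity inequality, so $\theta_o = 0$ is locally identifiable even though $I(0) = 0$. The neighborhood and the sampling scheme are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = lambda t: -np.expm1(-t ** 4 / 8)   # 1 - exp(-t^4/8) computed without cancellation

# random pairs in the open convex set (-0.5, 0.5) around theta_o = 0
a, b = rng.uniform(-0.5, 0.5, size=(2, 10000))
keep = np.abs(a - b) > 1e-6
a, b = a[keep], b[keep]
print(np.all(rho((a + b) / 2) < 0.5 * (rho(a) + rho(b))))   # True: strict midpoint convexity
```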

3. The Fisher Matrix Criterion

Rothenberg (1971) gives a criterion for local identifiability in terms of the non-singularity of the Fisher matrix. Additional insight about the relevance—and limitations—of the Fisher matrix criterion for local identifiability may then be gained by relating it to the criterion based on minimizing the Hellinger distance. To study this relationship, we now focus on the regular models studied by Rothenberg (1971).
Assumption 1
(R: Regular Models). $\mathcal{F}_{\Theta}$ is such that:
(A1) $\Theta$ is an open set in $\mathbb{R}^K$.
(A2) $f_{\theta} \ge 0$ and $\int f_{\theta}\, d\mu = 1$ for all $\theta \in \Theta$.
(A3) $\mathrm{supp}(f_{\theta}) := \{y \in \mathcal{Y} : f_{\theta}(y) > 0\}$ is the same for all $\theta \in \Theta$.
(A4) For all $\theta$ in a convex set containing $\Theta$ and for all $y \in \mathrm{supp}(f_{\theta})$, the functions $\theta \mapsto f_{\theta}$ and $\theta \mapsto \ell(\theta) := \ln f_{\theta}$ are continuously differentiable.
(A5) The elements of the matrix $E[\nabla_{\theta}\ell(\theta)\,\nabla_{\theta}\ell(\theta)']$ are finite and are continuous functions of $\theta$ everywhere in $\Theta$.
We now replicate the characterization of local identifiability by Rothenberg (1971) Theorem 1 based on the non-singularity of the Fisher matrix.
Lemma 3.
Let the regularity conditions in Assumption R hold. Let θ o be a regular point of the Fisher matrix I ( θ o ) . Then, θ o is locally identifiable if and only if I ( θ o ) is non-singular.
This characterization of local identifiability only applies to the regular models defined by Assumption R and to the regular points of the Fisher matrix, which may be only a subset of the parameter space (see Example 3). These conditions do not have themselves any direct statistical or economic interpretation: their role is just to permit a characterization of local identifiability.3 We have already referenced in the introduction a list of models with irregular points of the Fisher matrix, for which the characterization in Lemma 3 does not apply. We now use Examples 1–5 to illustrate the notions of regular and irregular models and their implications for the analysis of identifiability. The richness of the possibilities that follow is a reminder of the care needed in using the Fisher matrix criterion for showing local identifiability (or lack of it). It also highlights the convenience of the identifiability criterion based on minimizing the Hellinger distance as a unifying approach to study the identifiability of regular or irregular points of the Fisher matrix in either regular or irregular models. Specifically:
  • The uniform location model in Example 1 and the first-price auction model in Example 2 have, respectively, $\mathrm{supp}(f_{\theta}) = [0, \theta]$ and $\mathrm{supp}(f_{\theta}) = [\theta/(m-1), \infty)$, which means that these models violate the regularity condition (A3). We have seen that $\theta_o$ is identifiable in Examples 1 and 2, which implies that (A3) is not necessary for identifiability. These models also have a singular Fisher matrix, which implies that, in irregular models violating (A3), the non-singularity of the Fisher matrix is not a necessary condition for (local) identifiability.
  • One can verify that the normal squared location model in Example 3 and the normal demand-and-supply model in Example 4 both satisfy the regularity conditions in Assumption R. We have seen that in Example 3 the parameter of interest is locally identifiable while in Example 4 it is not, which means that the regularity conditions in Assumption R are neither sufficient nor necessary for (local) identifiability; they are just convenient. In Example 3, moreover, $\theta_o = 0$ is not a regular point of the Fisher matrix and is locally identifiable, which implies that, for irregular points of the Fisher matrix, the non-singularity of the Fisher matrix is not a necessary condition for (local) identifiability.
  • In Example 5, the function $\theta \mapsto \ln(1/2) - |y - \theta|$ is not differentiable when $y = \theta$, which means that the Laplace location model is an irregular model because it violates (A4).
  • To illustrate a failure of (A1) and (A5), consider the finite mixture of exponentials model in Example 6 with $\theta_1 = 0$, $\theta_2 = 1$ and $\theta_3 = 0.5$. In this case, the element of $E[\nabla_{\theta}\ell(\theta)\,\nabla_{\theta}\ell(\theta)']$ corresponding to the mixing weight $\theta_1$ equals $(\theta_2 - \theta_3)^{2}/[\theta_2(2\theta_3 - \theta_2)]$, which is not finite at these parameter values.
We also have the following result linking the Hellinger distance to the Fisher matrix, which we are going to use to show that, in regular models with irregular points of the Fisher matrix, the non-singularity of the Fisher matrix is only a sufficient condition for local identifiability.
Lemma 4.
Let the regularity conditions in Assumption R hold and assume that $\theta \mapsto f_{\theta}^{1/2}$ is continuously differentiable $\mu$-a.e. Then, the Hellinger distance and the Fisher matrix are related by
$$ \nabla^{2}\rho(\theta_o) = c\, I(\theta_o), \quad \text{where } c = 1/4. $$
Though this result is known, see, e.g., (Borovkov 1998), its implications for local identifiability have so far not been drawn.
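A numerical sketch of my own illustrating Lemma 4 for the normal location model $N(\theta, 1)$ (an assumption chosen for illustration): there $\rho(\theta) = 1 - \exp(-(\theta - \theta_o)^{2}/8)$ and $I(\theta) = 1$, so the second derivative of $\rho$ at $\theta_o$ should equal $I(\theta_o)/4 = 0.25$.

```python
import numpy as np

theta_o, h = 2.0, 1e-4
# rho for the N(theta, 1) location model; -expm1(x) = -(exp(x) - 1) avoids cancellation
rho = lambda t: -np.expm1(-(t - theta_o) ** 2 / 8)
hessian = (rho(theta_o + h) - 2 * rho(theta_o) + rho(theta_o - h)) / h ** 2
print(hessian)   # approximately 0.25 = I(theta_o) / 4
```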
Since the Fisher matrix is a variance-covariance matrix, one has that $I(\theta)$ is, under (A5), a real symmetric positive semi-definite matrix for every $\theta \in \Theta$, and then the following result follows from Lemma 4 and the characterization of a convex function in terms of its Hessian, see, e.g., (Rockafellar and Wets 1998, Theorem 2.14).
Proposition 4.
Let the regularity conditions in Assumption R hold and assume that $\theta \mapsto f_{\theta}^{1/2}$ is continuously differentiable $\mu$-a.e. Then, $\theta \mapsto \rho(\theta)$ is a locally convex function around $\theta_o$. Furthermore, if $I(\theta_o)$ is non-singular, then $\theta \mapsto \rho(\theta)$ is a locally strictly convex function around $\theta_o$ and $\theta_o$ is locally identifiable.
Two remarks are in order. First, notice that, unlike Lemma 3, the result in Proposition 4 also applies when $\theta_o$ is not a regular point of the Fisher matrix, in which case the non-singularity of the Fisher matrix becomes only sufficient for local identifiability. Second, if $I(\theta_o)$ is singular, the function $\rho : \Theta \to [0,1]$ is still locally convex (because $I(\theta_o)$ is positive semi-definite) and $\arg\min_{\theta \in \Theta}\rho(\theta)$ is a convex, but not necessarily bounded, set; this result can be used to delineate the set of observationally equivalent values of $\theta_o$. This note does not pursue this interesting direction.
Table 1 summarizes the information in this note about the necessity and sufficiency of the non-singularity of the Fisher matrix for local identifiability.
We conclude this section by mentioning that, in response to the misbehavior of the Fisher matrix when informing about the difficulty of estimating parameters of interest in parametric models, alternative notions of information, other than the Fisher matrix, have been proposed in the literature (see, e.g., Donoho and Liu 1987). Without further elaboration, these alternative notions of information are not directly applicable to the construction of new criteria for identifiability. In particular, the geometric information based on the modulus of continuity of $\theta_o \mapsto \arg\min_{\theta}\rho(\theta)$ with respect to the Hellinger distance, introduced by Donoho and Liu (1987) to geometrize convergence rates, cannot be used to construct a criterion for local identifiability because this modulus of continuity, in its current format, is not defined for parameters that are not locally identifiable.4

4. The Kullback–Leibler Divergence and Other Divergences

Some of the examples where we have had success in using the Hellinger distance to analyze identifiability share the same structure: the Hellinger distance is a locally convex function, see Figure 2, and so the results from convex optimization become available. If the Hellinger distance proves to be difficult to analyze, one can set out a criterion for identifiability based on another divergence function, such as the reversed Kullback–Leibler divergence (see, e.g., Bowden 1973)
$$ \kappa(\theta) = -H(\theta), \quad \text{where } H(\theta) := \int \ln\!\left(\frac{f_{\theta}}{f_{\theta_o}}\right) f_{\theta_o}\, d\mu. $$
One can unify the identification criteria based on the Hellinger distance and the reversed Kullback–Liebler divergence by using the family of φ -divergences defined as
$$ \delta_{\varphi}(\theta) = \int \varphi\!\left(\frac{f_{\theta}}{f_{\theta_o}}\right) f_{\theta_o}\, d\mu, $$
where $f_{\theta}/f_{\theta_o}$ is the likelihood ratio and $\varphi : \mathbb{R} \to [0, +\infty]$ is a proper closed convex function with $\varphi(1) = 0$ and such that $x \mapsto \varphi(x)$ is strictly convex on a neighborhood of $x = 1$. The squared Hellinger distance corresponds to the member of this family with $\varphi(x) = \frac{1}{2}(1 - x^{1/2})^{2}$, whereas the reversed Kullback–Leibler divergence corresponds to $\varphi(x) = -\ln x + x - 1$. The following result is an immediate consequence of the property that $\delta_{\varphi}$ is non-negative and equal to zero if and only if $f_{\theta} = f_{\theta_o}$ (see, e.g., Pardo 2005, Proposition 1.1).
Proposition 5.
The parameter $\theta_o \in \Theta$ is locally identifiable in the model $\mathcal{F}_{\Theta}$ if and only if there exists an open set $\Theta' \ni \theta_o$ such that
$$ \theta_o = \arg\min_{\theta \in \Theta'} \delta_{\varphi}(\theta). $$
This result, which is a generalization of Proposition 2, shows that the choice of a $\varphi$-divergence for analyzing the identifiability of a parameter of interest only hinges on the difficulty of characterizing the set $\arg\min_{\theta}\delta_{\varphi}(\theta)$ for a given $\varphi$-divergence. The choice of the Hellinger distance over the reversed Kullback–Leibler divergence is, however, not inconsequential when choosing a $\varphi$-divergence to construct an estimator of the parameter of interest. The use of the Hellinger distance may lead to an estimator that is more robust than the maximum likelihood estimator and equally efficient, see, e.g., Beran (1977) and Jimenez and Shao (2002).5
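The following is a minimal numerical sketch of my own (not from the paper) of the two members of the $\varphi$-divergence family named above, computed by quadrature for the normal location model $N(\theta, 1)$: with $\varphi(x) = \frac{1}{2}(1-\sqrt{x})^{2}$ one recovers the squared Hellinger distance and with $\varphi(x) = -\ln x + x - 1$ the reversed Kullback–Leibler divergence, and both vanish only at $\theta = \theta_o$. The model and evaluation points are illustrative assumptions.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

theta_o = 1.0

def phi_divergence(theta, phi):
    # integral of phi(f_theta / f_theta_o) * f_theta_o dy
    integrand = lambda y: phi(norm.pdf(y, theta) / norm.pdf(y, theta_o)) * norm.pdf(y, theta_o)
    val, _ = integrate.quad(integrand, -20, 20)
    return val

hellinger = lambda x: 0.5 * (1 - np.sqrt(x)) ** 2
reversed_kl = lambda x: -np.log(x) + x - 1

for theta in (0.0, 1.0, 2.5):
    print(theta, phi_divergence(theta, hellinger), phi_divergence(theta, reversed_kl))
# both divergences are zero at theta = theta_o = 1.0 and strictly positive otherwise
```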
We conclude this section with the following result showing that, for the regular models analyzed by Rothenberg (1971), the Hellinger distance and the reversed Kullback–Liebler divergence are both locally convex around a minimizer.
Lemma 5.
Let the regularity conditions in Assumption R hold and assume that $\theta \mapsto f_{\theta}^{1/2}$ is continuously differentiable $\mu$-a.e. Assume, furthermore, that, in a neighborhood of $\theta_o$, $f_{\theta}$ and $\ln f_{\theta}$ are twice differentiable in $\theta$, with derivatives continuous in $y \in \mathrm{supp}(f_{\theta})$. Then, the Hellinger distance and the Kullback–Leibler divergence are related by
$$ \nabla^{2}\rho(\theta_o) = c\,\nabla^{2}\kappa(\theta_o) \quad \text{for } c = 1/4. $$

Supplementary Materials

The following are available at https://0-www-mdpi-com.brum.beds.ac.uk/article/10.3390/econometrics10010010/s1, Auxiliary calculations in Examples 1–5.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

I would like to thank Sami Stouli and Vincent Han for offering constructive suggestions on previous versions of this paper. All errors are mine.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Proofs

Proof of Lemma 1.
Write
$$ \rho(\theta) = \frac{1}{2}\left\| f_{\theta}^{1/2} - f_{\theta_o}^{1/2} \right\|_{L_2(\mu)}^{2} = \frac{1}{2}\int \left( f_{\theta}^{1/2} - f_{\theta_o}^{1/2} \right)^{2} d\mu = 1 - \int f_{\theta}^{1/2} f_{\theta_o}^{1/2}\, d\mu. $$
Hence, $\rho(\theta) = 0$ if and only if $f_{\theta} = f_{\theta_o}$ $\mu$-a.s., and $\rho(\theta) = 1$ if and only if $f_{\theta} f_{\theta_o} = 0$ $\mu$-a.s. To show that $\rho(\theta)$ does not depend on the choice of the dominating measure $\mu$, let $g_{\theta}$ and $g_{\theta_o}$ denote the densities of $P_{\theta}$ and $P_{\theta_o}$ relative to another dominating measure $\nu$. Let $h$ and $k$ denote the densities of $\mu$ and $\nu$ relative to $\mu + \nu$. The density of $P_{\theta}$ relative to $\mu + \nu$ is $f_{\theta}h$ and also $g_{\theta}k$. Thus, $f_{\theta}h = g_{\theta}k$ and also $f_{\theta_o}h = g_{\theta_o}k$. Hence, $(f_{\theta}f_{\theta_o})^{1/2}h = (g_{\theta}g_{\theta_o})^{1/2}k$ and
$$ \int (g_{\theta}g_{\theta_o})^{1/2}\, d\nu = \int (g_{\theta}g_{\theta_o})^{1/2} k\, d(\nu + \mu) = \int (f_{\theta}f_{\theta_o})^{1/2} h\, d(\nu + \mu) = \int (f_{\theta}f_{\theta_o})^{1/2}\, d\mu, $$
which completes the proof. □
Proof of Lemma 2.
In the text. □
Proof of Lemma 3.
We replicate the proof of Rothenberg (1971), Theorem 1. By the mean value theorem, there is a $\theta^{*}$ between $\theta$ and $\theta_o$ such that
$$ \ell(\theta) - \ell(\theta_o) = \nabla_{\theta}\ell(\theta^{*})'(\theta - \theta_o). $$
Assume that $\theta_o$ is not locally identifiable. Then, there is a sequence $\{\theta_j\}_j$ converging to $\theta_o$, with $\theta_j \neq \theta_o$, such that $\ell(\theta_j) = \ell(\theta_o)$ $\mu$-a.s. This implies $\nabla_{\theta}\ell(\theta_j^{*})'q_j = 0$, where $q_j = (\theta_j - \theta_o)/\|\theta_j - \theta_o\|$. The sequence $\{q_j\}_j$ belongs to the unit sphere and therefore has a subsequence converging to a limit $q_o$. As $\theta_j$ approaches $\theta_o$, $q_j$ approaches $q_o$ along this subsequence and in the limit $q_o'\nabla_{\theta}\ell(\theta_o) = 0$ $\mu$-a.s. However, this implies that
$$ q_o' I(\theta_o) q_o = q_o' E\left[\nabla_{\theta}\ell(\theta_o)\,\nabla_{\theta}\ell(\theta_o)'\right] q_o = 0, $$
and, hence, $I(\theta_o)$ must be singular.
To show the converse, suppose that $I(\theta)$ has constant rank $r < K$ in a neighborhood of $\theta_o$. Consider then the eigenvector $v_{\theta}$ associated with one of the zero eigenvalues of $I(\theta)$. Since $0 = v_{\theta}' I(\theta) v_{\theta}$, we have for all $\theta$ near $\theta_o$
$$ v_{\theta}'\,\nabla_{\theta}\ell(\theta) = 0 \quad \mu\text{-a.s.} $$
Since $I(\theta)$ is continuous and has constant rank, the function $\theta \mapsto v_{\theta}$ is continuous in a neighborhood of $\theta_o$. Consider now the curve $\gamma : [0, \bar{t}] \to \mathbb{R}^K$ defined by the function $\theta(t)$, which solves the differential equation $\frac{\partial \theta(t)}{\partial t} = v_{\theta(t)}$ with $\theta(0) = \theta_o$ for $0 \le t \le \bar{t}$. The log density function is differentiable in $t$ with
$$ \frac{\partial \ell(\theta(t))}{\partial t} = v_{\theta(t)}'\,\nabla_{\theta}\ell(\theta(t)). $$
However, by the preceding display this is zero for all $0 \le t \le \bar{t}$. Thus $\theta \mapsto \ell(\theta)$ is constant on the curve $\gamma$ and $\theta_o$ is not locally identifiable. □
Proof of Lemma 4.
Assume first that $\theta$ is a scalar, i.e., $K = 1$. Re-write
$$ \rho(\theta) := \frac{1}{2}\left\| f_{\theta}^{1/2} - f_{\theta_o}^{1/2} \right\|_{L_2(\mu)}^{2} = \frac{1}{2}\int \left( f_{\theta}^{1/2} - f_{\theta_o}^{1/2} \right)^{2} d\mu = 1 - \int f_{\theta}^{1/2} f_{\theta_o}^{1/2}\, d\mu. $$
Differentiating $\theta \mapsto \rho(\theta)$, one has that
$$ \nabla\rho(\theta) = -\frac{1}{2}\int \frac{f_{\theta_o}^{1/2}}{f_{\theta}^{1/2}}\,\nabla f_{\theta}(\theta)\, d\mu = \frac{1}{2}\int \left( f_{\theta}^{1/2} - f_{\theta_o}^{1/2} \right)\frac{\nabla f_{\theta}(\theta)}{f_{\theta}^{1/2}}\, d\mu, $$
where Assumptions (A3) and (A4) allow us to pass the derivative under the integral sign. Since $\theta \mapsto \rho(\theta)$ reaches a minimum at $\theta_o$, one has $\nabla\rho(\theta_o) = 0$ and so
$$ \frac{\nabla\rho(\theta) - \nabla\rho(\theta_o)}{\theta - \theta_o} = \frac{1}{2}\int \frac{\left( f_{\theta}^{1/2} - f_{\theta_o}^{1/2} \right)\nabla f_{\theta}(\theta)}{(\theta - \theta_o)\, f_{\theta}^{1/2}}\, d\mu, $$
which, by the Lebesgue dominated convergence theorem, satisfies
$$ \nabla^{2}\rho(\theta_o) := \lim_{\theta \to \theta_o} \frac{\nabla\rho(\theta) - \nabla\rho(\theta_o)}{\theta - \theta_o} = \frac{1}{4} I(\theta_o), $$
because the integrand converges point-wise,
$$ \frac{\left( f_{\theta}^{1/2} - f_{\theta_o}^{1/2} \right)\nabla f_{\theta}(\theta)}{(\theta - \theta_o)\, f_{\theta}^{1/2}} \;\longrightarrow\; \frac{\nabla f_{\theta}(\theta_o)\,\nabla f_{\theta}(\theta_o)}{2 f_{\theta_o}} = \frac{f_{\theta_o}\,\nabla\ln f_{\theta}(\theta_o)\,\nabla\ln f_{\theta}(\theta_o)\, f_{\theta_o}}{2 f_{\theta_o}} = \frac{1}{2}\,\nabla\ln f_{\theta}(\theta_o)\,\nabla\ln f_{\theta}(\theta_o)\, f_{\theta_o}, $$
and it is dominated by a sum of functions that are, under (A5), integrable,
$$ \left| \frac{\left( f_{\theta}^{1/2} - f_{\theta_o}^{1/2} \right)\nabla f_{\theta}(\theta)}{(\theta - \theta_o)\, f_{\theta}^{1/2}} \right| \le \frac{\left( f_{\theta}^{1/2} - f_{\theta_o}^{1/2} \right)^{2}}{(\theta - \theta_o)^{2}} + \frac{\nabla f_{\theta}(\theta)\,\nabla f_{\theta}(\theta)}{f_{\theta}}. $$
To extend this proof to the case when $\theta$ is a vector, one applies the argument above element-wise to the components of $\nabla^{2}\rho(\theta_o)$. □
Proof of Lemma 5.
If $\nabla^{2}H(\theta_o) = -I(\theta_o)$, the claim then follows from Lemma 4 after replacing $c\,I(\theta_o) = \nabla^{2}\rho(\theta_o)$. To verify that $\nabla^{2}H(\theta_o) = -I(\theta_o)$, we follow (Bowden 1973, Section 2). Recall that $\nabla_{\theta}\ell(\theta) = \nabla_{\theta}\ln f_{\theta} = \nabla_{\theta}f_{\theta}/f_{\theta}$ and, since we have assumed that $\int f_{\theta}\, d\mu = 1$ for any $\theta \in \Theta$, one has that
$$ \int \nabla_{\theta} f_{\theta}\, d\mu = 0_{K \times 1} \quad \text{and} \quad \int \nabla^{2}_{\theta} f_{\theta}\, d\mu = 0_{K \times K} \quad \text{for any } \theta \in \Theta. $$
Differentiating $\theta \mapsto H(\theta)$, one obtains
$$ \nabla H(\theta) = \int \frac{\nabla_{\theta} f_{\theta}}{f_{\theta}}\, f_{\theta_o}\, d\mu, $$
and differentiating again
$$ \nabla^{2} H(\theta) = \int \nabla_{\theta}\!\left( \frac{\nabla_{\theta} f_{\theta}}{f_{\theta}} \right) f_{\theta_o}\, d\mu = \int \left( \frac{\nabla^{2}_{\theta} f_{\theta}}{f_{\theta}} - \nabla_{\theta}\ln f_{\theta}\,\nabla_{\theta}\ln f_{\theta}' \right) f_{\theta_o}\, d\mu. $$
Evaluating at $\theta = \theta_o$, and using $\int \nabla^{2}_{\theta} f_{\theta}(\theta_o)\, d\mu = 0$, one obtains $\nabla^{2}H(\theta_o) = -I(\theta_o)$. □
Proof of Proposition 1.
The sufficiency has already been established by Beran (1977), Theorem 1(iii) and it is an immediate consequence of the definition of identifiability. The necessity is in the text and it follows immediately from Lemmas 1 and 2. □
Proof of Proposition 2.
It is immediate from Proposition 1 and the definition of local identifiability (Definition 2). □
Proof of Proposition 3.
This result follows from the uniqueness of a solution for strictly convex problems (see, e.g., Rockafellar and Wets 1998, Theorem 2.6) after noticing, from Lemma 1, that θ ρ ( θ ) is bounded, and hence a proper function. □
Proof of Proposition 4.
The proof of the claim that $\theta \mapsto \rho(\theta)$ is a locally convex function around $\theta_o$ is in the text. It only remains to show that, if the Fisher matrix is non-singular, then $\theta_o$ is locally identifiable. When the Fisher matrix is non-singular, by Lemma 4 and the characterization of convex functions in (Rockafellar and Wets 1998, Theorem 2.14), the Hellinger distance is a locally strictly convex function. The claim then follows from Proposition 3. □
Proof of Proposition 5.
In the text. □

Appendix B. Variational Representation and Estimation

It is well-known, see, e.g., Beran (1977), that the estimator based on minimizing the Hellinger distance between the density postulated by the model for the observed variables and a kernel nonparametric estimator of the density of these variables can be more robust (to $\rho$-perturbations of the density of the observed variables) than the maximum likelihood estimator and still asymptotically efficient in regular models. This minimum Hellinger distance-to-kernel estimator requires smoothing, which becomes an inconvenient requirement in models with observable variables with mixed support, such as the normal regression model with non-response in the dependent variable, or with support depending on the unknown parameter, such as the parametric auction model in Example 2, or in models with high-dimensional observable variables, due to the curse of dimensionality. This Appendix derives the variational representation of the Hellinger distance. This variational representation serves to construct the minimum dual Hellinger distance estimator, which, unlike the minimum Hellinger distance-to-kernel estimator, does not require the use of a smooth estimator of the density of the observable variables.
Recall first that the squared Hellinger distance is
$$ \rho(\theta) = \frac{1}{2}\int \left( f_{\theta}^{1/2} - f_{\theta_o}^{1/2} \right)^{2} d\mu. \tag{A1} $$
We are going to verify that
$$ \rho(\theta) = \frac{1}{2}\sup_{\tilde{\theta} \in \Theta}\left[ \int \left( f_{\theta} - f_{\theta}^{1/2} f_{\tilde{\theta}}^{1/2} \right) d\mu - \left( \int \sqrt{\frac{f_{\theta}}{f_{\tilde{\theta}}}}\, dP_{\theta_o} - 1 \right) \right]. \tag{A2} $$
The expression in the last display, unlike (A1), admits, under a bracketing number condition on the family of likelihood ratios $\{ y \mapsto f_{\theta}(y)/f_{\tilde{\theta}}(y),\ \theta, \tilde{\theta} \in \Theta \}$, a consistent sample analog estimator not depending on smoothing parameters. The minimum dual Hellinger distance estimator of $\theta_o$ is the set of minimizers of the sample analog of (A2):
$$ \hat{\theta} = \underbrace{\arg\min_{\theta \in \Theta}}_{\text{generator}}\ \underbrace{\sup_{\tilde{\theta} \in \Theta}}_{\text{discriminator}}\left[ \int \left( f_{\theta} - f_{\theta}^{1/2} f_{\tilde{\theta}}^{1/2} \right) d\mu - N^{-1}\sum_{i=1}^{N} \sqrt{\frac{f_{\theta}(Y_i)}{f_{\tilde{\theta}}(Y_i)}} \right]. \tag{A3} $$
One could use a simulator to approximate $f_{\theta}$ or $f_{\tilde{\theta}}$ if these densities have an intractable form.6
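The following is a self-contained sketch of my own (not the paper's code) of this minimum dual Hellinger distance estimator for the normal location model $N(\theta, 1)$, where the deterministic integral in (A3) is available in closed form; the grid, sample size and data-generating value are illustrative assumptions, and the inner supremum and outer minimum are computed by brute-force grid search.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_o = 1.0
y = rng.normal(theta_o, 1.0, size=500)          # observed sample from f_theta_o

grid = np.linspace(-2.0, 4.0, 121)              # common grid for generator and discriminator

def dual_objective(theta, theta_t, y):
    # integral of (f_theta - f_theta^{1/2} f_theta_t^{1/2}) d(mu) for N(theta,1) and N(theta_t,1)
    integral_term = 1.0 - np.exp(-(theta - theta_t) ** 2 / 8)
    # sample analog of the integral of sqrt(f_theta / f_theta_t) with respect to P_theta_o
    ratio_term = np.mean(np.exp(-((y - theta) ** 2 - (y - theta_t) ** 2) / 4))
    return integral_term - ratio_term

# profile out the discriminator, then minimize over the generator
profile = np.array([max(dual_objective(t, tt, y) for tt in grid) for t in grid])
theta_hat = grid[np.argmin(profile)]
print(theta_hat)   # close to theta_o = 1.0 for this draw
```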
To verify (A2), define the functions
$$ \varphi(x) := \frac{1}{2}\left(1 - x^{1/2}\right)^{2} \quad \text{and} \quad \bar{\varphi}(\bar{x}) := \sup_{x \in \mathbb{R}}\left[ \bar{x}\, x - \varphi(x) \right], $$
and write the squared Hellinger distance in (A1) as
$$ \rho(\theta) = \int \varphi\!\left( \frac{f_{\theta}}{f_{\theta_o}} \right) f_{\theta_o}\, d\mu. $$
The function $\bar{\varphi}$ is the convex conjugate of $\varphi$. We first show that
$$ \rho(\theta) = \sup_{\tilde{\theta} \in \Theta}\left[ \int \dot{\varphi}\!\left( \frac{f_{\theta}}{f_{\tilde{\theta}}} \right) f_{\theta}\, d\mu - \int \bar{\varphi}\!\left( \dot{\varphi}\!\left( \frac{f_{\theta}}{f_{\tilde{\theta}}} \right) \right) f_{\theta_o}\, d\mu \right], \tag{A4} $$
where $G_{\theta} = \{ y \mapsto \dot{\varphi}(f_{\theta}(y)/f_{\tilde{\theta}}(y)),\ \tilde{\theta} \in \Theta \}$ and $\dot{\varphi}(x) = \frac{1}{2}(1 - x^{-1/2})$ is the derivative of $x \mapsto \varphi(x) = \frac{1}{2}(1 - x^{1/2})^{2}$. For all $x \in (0, +\infty)$, one has that $\ddot{\varphi}(x) = \frac{1}{4}x^{-3/2} > 0$ and then $x \mapsto \varphi(x) = \frac{1}{2}(1 - x^{1/2})^{2}$ is strictly convex on $(0, +\infty)$. By strict convexity, for any $x, \tilde{x} \in (0, +\infty)$, it holds that
$$ \varphi(x) \ge \varphi(\tilde{x}) + \dot{\varphi}(\tilde{x})(x - \tilde{x}), $$
with equality if and only if $x = \tilde{x}$. Fix two values $\theta, \tilde{\theta}$ in the parameter space and set
$$ x = \frac{dP_{\theta}}{dP_{\theta_o}} \quad \text{and} \quad \tilde{x} = \frac{dP_{\theta}}{dP_{\tilde{\theta}}}. $$
Inserting these values in the last inequality and integrating with respect to $P_{\theta_o}$ yields
$$ \rho(\theta) \ge \int \dot{\varphi}\!\left( \frac{dP_{\theta}}{dP_{\tilde{\theta}}} \right) dP_{\theta} - \int \tilde{\varphi}\!\left( \frac{dP_{\theta}}{dP_{\tilde{\theta}}} \right) dP_{\theta_o}, $$
where $\tilde{\varphi}(x) := \dot{\varphi}(x)x - \varphi(x)$, which in turn implies
$$ \rho(\theta) \ge \sup_{\tilde{\theta} \in \Theta}\left[ \int \dot{\varphi}\!\left( \frac{dP_{\theta}}{dP_{\tilde{\theta}}} \right) dP_{\theta} - \int \tilde{\varphi}\!\left( \frac{dP_{\theta}}{dP_{\tilde{\theta}}} \right) dP_{\theta_o} \right]. $$
When $\tilde{\theta} = \theta_o$, this inequality turns into an equality, which yields (A4) after noticing that $\dot{\varphi}(x)x - \varphi(x) = \bar{\varphi}(\dot{\varphi}(x))$.
To conclude the verification, since $x \mapsto \varphi(x) = \frac{1}{2}(1 - x^{1/2})^{2}$ is differentiable for all $x \in (0, +\infty)$, one has
$$ \bar{\varphi}(\dot{\varphi}(x)) = \dot{\varphi}(x)x - \varphi(x) = \frac{1}{2}\left(1 - x^{-1/2}\right)x - \frac{1}{2}\left(1 - x^{1/2}\right)^{2} = \frac{1}{2}\left[ \left(x - x^{1/2}\right) - \left(1 - x^{1/2}\right)^{2} \right] = \frac{1}{2}\left(x^{1/2} - 1\right). \tag{A5} $$
By replacing (A5) back in (A4), one obtains (A2).

Notes

1
We use '$\to$' in '$f : \mathcal{Y} \to [0, \infty)$' to declare the domain ($\mathcal{Y}$) and codomain ($[0, \infty)$) of the function $f$, and we use the arrow notation '$\mapsto$' to define the rule of a function inline. We use ':=' to indicate that an expression is 'defined to be equal to'. This notation is in line with conventional usage.
2
See, e.g., Escanciano (2021) for a systematic approach to identification in semiparametric models.
3
As a referee has pointed out, necessary and sufficient conditions for (local) identification require different assumptions. Some of the conditions in R are not necessary if we only seek sufficient conditions: differentiability of the score function and non-singularity of the Fisher matrix would suffice.
4
A related modulus of continuity has been introduced by Escanciano (2021, Online Supplementary Materials, Lemma 1.3) to provide sufficient conditions for (local) identification in semiparametric models. The analysis of these models is out of the scope of this paper.
5
Appendix B elaborates more on this point by using the variational representation of the Hellinger distance to construct a minimum distance estimator which does not require a non-parametric estimator of the density of the data.
6
One could also replace the space Θ in the discriminator model by a family of compositional functions—as in neural network models—to gain, if needed, flexibility when fitting f θ ˜ by introducing, again, tuning parameters.

References

  1. Beran, R. 1977. Minimum Hellinger Distance Estimates for Parametric Models. The Annals of Statistics 5: 445–63. [Google Scholar] [CrossRef]
  2. Borovkov, A. 1998. Mathematical Statistics. Amsterdam: Gordon and Breach. [Google Scholar]
  3. Bowden, R. 1973. The Theory of Parametric Identification. Econometrica 41: 1069. [Google Scholar] [CrossRef]
  4. Donoho, D., and R. Liu. 1987. Geometrizing Rates of Convergence, I. Technical Report 137. Berkeley: University of California. [Google Scholar]
  5. Escanciano, J. C. 2021. Semiparametric Identification and Fisher Information. Econometric Theory, 1–38. [Google Scholar]
  6. Hallin, M., and C. Ley. 2012. Skew-Symmetric Distributions and Fisher Information—A Tale of Two Densities. Bernoulli 18: 747–63. [Google Scholar] [CrossRef]
  7. Han, S., and A. McCloskey. 2019. Estimation and Inference with a (Nearly) Singular Jacobian. Quantitative Economics 10: 1019–68. [Google Scholar] [CrossRef]
  8. Hinkley, D. 1973. Two-Sample Tests with Unordered Pairs. Journal of the Royal Statistical Society (Series B: Methodological) 36: 2466–80. [Google Scholar] [CrossRef]
  9. Jimenez, R., and Y. Shao. 2002. On Robustness and Efficiency of Minimum Divergence Estimators. Test 10: 241–48. [Google Scholar] [CrossRef]
  10. Lee, L., and A. Chesher. 1986. Specification Testing when Score Test Statistics are Identically Zero. Journal of Econometrics 31: 121–49. [Google Scholar] [CrossRef]
  11. Lewbel, A. 2019. The Identification Zoo: Meanings of Identification in Econometrics. Journal of Economic Literature 57: 835–903. [Google Scholar] [CrossRef]
  12. Li, P., J. Chen, and P. Marriott. 2009. Non-Finite Fisher Information and Homogeneity: An EM Approach. Biometrika 96: 411–26. [Google Scholar] [CrossRef]
  13. Paarsch, H. 1992. Deciding between the Common and Private Value Paradigms in Empirical Models of Auctions. Journal of Econometrics 51: 191–215. [Google Scholar] [CrossRef]
  14. Pardo, L. 2005. Statistical Inference Based on Divergence Measures. New York: Chapman & Hall/CRC Press. [Google Scholar]
  15. Paulino, C., and C. Pereira. 1994. On Identifiability of Parametric Statistical Models. Journal of the Italian Statistical Society 3: 125–51. [Google Scholar] [CrossRef]
  16. Reiersol, O. 1950. The Identifiability of a Linear Relation between Variables Which Are Subject to Error. Econometrica 18: 375–89. [Google Scholar] [CrossRef]
  17. Rockafellar, T., and R. Wets. 1998. Variational Analysis. Berlin: Springer. [Google Scholar]
  18. Rothenberg, T. 1971. Identification in Parametric Models. Econometrica 39: 577–91. [Google Scholar] [CrossRef]
  19. Sargan, J. 1983. Identification and Lack of Identification. Econometrica 51: 1605–33. [Google Scholar] [CrossRef]
Figure 1. The Hellinger distance in Examples 1 and 2. (a) Example 1 ($\theta_o = 4$); (b) Example 2 ($\theta_o = 4$, $m = 5$).
Figure 2. The Hellinger distance in Example 3. (a) Example 3 ($\theta_o = 0$); (b) Example 3 ($\theta_o = 1$).
Table 1. For local identifiability, the non-singularity of the Fisher matrix is ...

                     Regular Points                            Irregular Points
Regular Models       necessary and sufficient (Lemma 3)        only sufficient (Proposition 4 and Example 3)
Irregular Models     not necessary (Examples 1, 2, and 5)