Article

Exponential Families with External Parameters

Marco Favretti
Dipartimento di Matematica Tullio Levi-Civita, Università degli Studi di Padova, 35131 Padova, Italy
Submission received: 25 April 2022 / Revised: 11 May 2022 / Accepted: 12 May 2022 / Published: 14 May 2022
(This article belongs to the Collection Advances in Applied Statistical Mechanics)

Abstract

In this paper we introduce a class of statistical models consisting of exponential families depending on additional parameters, called external parameters. The main source for these statistical models resides in the Maximum Entropy framework, where we have thermal parameters, corresponding to the natural parameters of an exponential family, and mechanical parameters, here called external parameters. In the first part we study the geometry of these models, introducing a fibration of parameter space over external parameters. In the second part we investigate a class of evolution problems driven by a Fokker-Planck equation whose stationary distribution is an exponential family with external parameters. We discuss applications of these statistical models to thermodynamic length and isentropic evolution of thermodynamic systems, and to a problem in the dynamics of quantitative traits in genetics.

1. Introduction

This work is a first attempt to study the geometrical properties and potential applications of a class of statistical models consisting of exponential families depending on additional parameters, called external parameters. The main source for these statistical models comes from the application of E.T. Jaynes' Maximum Entropy framework [1] to thermodynamical systems, where we can identify in a natural way thermal parameters (corresponding to natural parameters in an exponential family) and mechanical parameters, here called external parameters. While the construction of equilibrium Statistical Mechanics from the Maximum Entropy principle is a well established domain of science, little attention is paid in the literature to the intrinsic geometrical structure of these statistical models. Given the widespread application of the Maximum Entropy principle to disparate fields of science, it is reasonable to assume that a closer scrutiny of these models can pave the way to further applications outside statistical thermodynamics.
Here is the plan of the paper: in Section 2 we recall the definitions of regular statistical model and of exponential family. The main point is that we are dealing with a finite dimensional Riemannian manifold with respect to the Fisher metric. In Section 3 we introduce the exponential families with external parameters, we state the conditions that render them a regular statistical model and we compute the Fisher metric. The additional geometrical structure that we get with these exponential families is a fibration over the space of external parameters U, in the sense that for every fixed u ∈ U the fiber is a standard exponential family. The notion of Ehresmann connection on a fiber bundle and of parallel transport is recalled in Section 4. In Section 5 we outline some applications of these parameterized exponential families: we give a formula for the thermodynamic length of a process described by a path in both natural and external parameters and we give conditions for the isentropic evolution of the system. Section 6 is motivated by a model problem in quantitative genetics (briefly recalled in Appendix A) where the dynamics of the system is given by a Fokker-Planck equation with gradient drift and the equilibrium or stationary distribution is an exponential family with external parameters. We recast the dynamic approximation procedure exposed in [2,3,4] in the framework of exponential families with external parameters and we give a generalization of the ODE that drives the approximating dynamics. We think that considering the problem exposed in [2,3,4] from the present point of view may shed light on some still poorly understood aspects of the model.

Exponential Families in Statistical Thermodynamics

To help locate the contribution of the present paper in the scientific literature, we briefly review and compare some of the geometrical approaches to statistical mechanics that are most relevant for our argument. A line of research initiated by the influential papers of Weinhold [5] and Ruppeiner [6] investigates the Riemannian metric structure on parameter space related to the Boltzmann-Gibbs canonical distribution. This Riemannian metric is the one defined by the Hessian matrix of the free energy ψ = log Z (which coincides with the Fisher metric) with respect to the canonical parameters, or by its inverse, which is the Hessian of the entropy S, related to ψ by the Legendre transform. The Levi-Civita connection with respect to this metric allows one to define the Riemannian curvature tensor and its sectional and scalar curvature. For a two-dimensional parameter space the divergence of the scalar curvature is a signal of the existence of a phase transition in the underlying physical system. This theory has been applied to Ising and Potts lattice systems, to the ideal and Van der Waals gas and to black hole thermodynamics (see e.g., [7,8,9,10]). However, in dimension greater than two the scalar curvature has a less stringent role and care must be taken in the interpretation of the results.
In this work we also start from the Boltzmann-Gibbs distribution, but we stress the different role of natural or thermal parameters θ, which occur linearly in (15), and external parameters u, which may enter nonlinearly in the Boltzmann-Gibbs distribution. In particular we are interested in using the external parameters as control parameters on the evolution of the system. The related geometrical framework exposed in Section 4 adopts the connection and curvature associated to the Ehresmann connection on the fibration locally described by (θ, u) ↦ u, which is suited to describing the isentropic evolution of the system or the dependence of the work control protocol on the global geometric structure, i.e., the holonomy of a path in the external control space.
A second line of research relating information geometry and statistical thermodynamics concerns the notion of thermodynamic length (see [11,12,13]), which is important in the design of optimal driving protocols for the non-equilibrium evolution of (small) thermodynamic systems, see [14,15], both for classical and quantum descriptions. In this work (see Section 5.1) we investigate the notion of thermodynamic length using our geometric framework and we give a formula for thermodynamic length that highlights the contribution of natural and external (controlled) parameters.
For the sake of completeness we cite the statistical models introduced by J. Naudts (see [16,17]), called generalized exponential families, and the q-exponential families of Amari-Ohara [18]. In these models the exponential function is generalized by introducing the so-called q-deformed exponential. In practice one considers simultaneously two elements of an exponential family, the second one being called the escort distribution. These deformed exponential families are useful for describing Tsallis thermostatistics [19], which gives a more accurate description of thermodynamic systems where the extensivity of the classical definition of entropy is violated. However, this highly debated topic is not relevant for the present work.
This paper is a first attempt to study the exponential families with external parameters using geometrical tools. Even if we were inspired by the Maximum Entropy formalism, our results are completely general. In particular we investigated the case where the family (with respect to the natural and external parameters) is a regular statistical model. This is only a first step in the analysis of these parameterized models; a further step would be in the direction of singular (as opposed to regular) statistical models (see [20]), a domain that is currently attracting increasing attention in the information geometry community. A drawback of this work is that most of the results are presented in a coordinate-dependent way and have a local character. We hope to resolve these issues in a subsequent work. Some of the results presented here were introduced in a less refined form in [21].

2. Statistical Models and Exponential Families

Before introducing their generalization in Section 3 below, we recall the definitions of regular statistical model and of exponential family (see [22,23,24]). Let (X, B, dx) be a measure space, where X may be a discrete or continuous set. We stipulate that in the case of a discrete set the integrals over X with respect to the measure dx are replaced by sums. Let

\mathcal{P}(X) = \{\, p : X \to [0,+\infty),\ p(x) \geq 0,\ \int_X p\, dx = 1 \,\} \subset L^1(X)

be the infinite dimensional space of probability densities over X. Let Z ⊆ R^d be the open set of the parameters, f : Z → P(X) be a given smooth map and consider the subset of P(X)

S = \{\, p = f(z) \ :\ z \in Z \,\} \subset \mathcal{P}(X).

To avoid technicalities, we stipulate that the support of p, i.e., the set where p > 0, is the same for all p ∈ S and that it coincides with X. We now state the conditions under which S is a regular d-dimensional statistical model (see [22,24,25]).
Definition 1
(Regular statistical model). S is a regular statistical model if the following conditions are satisfied:
1.
(injectivity) the map f : Z → S, z ↦ f(z) = p(z) is one-to-one;
2.
(regularity) the d functions defined on X

p_i(x; z) = \frac{\partial p}{\partial z^i}(x; z), \qquad i = 1, \dots, d

are linearly independent as functions on X for every z ∈ Z.
A statistical model which is not regular is called singular (see [20] for a comprehensive discussion of singular models). If condition 1. holds the model is called identifiable, otherwise it is called unidentifiable. If condition 2. fails, the main consequence is that the Fisher metric (22) is only positive semidefinite, because condition (23) fails. Many statistical models, e.g., Boltzmann machines, Bayes networks, hidden Markov models, are singular. Note that for a regular statistical model the inverse φ of the map f, φ(p) = z, defines a global coordinate system for S.
To check regularity condition 2. it is convenient to introduce the so-called log-likelihood l = ln p of p and the score base

l_i(x; z) = \frac{\partial l}{\partial z^i} = \frac{\partial \ln p}{\partial z^i} = \frac{1}{p}\, p_i(x; z).

Since l_i and p_i are proportional, the regularity condition 2. holds if and only if the elements of the score base are linearly independent on X.

Exponential Family

Fundamental examples of statistical models are the exponential families. Let us introduce a vector of observable functions h : X → R^a, h = (h_1, …, h_a), and suppose that the a + 1 functions

h_1(x), \dots, h_a(x), 1,

are linearly independent as functions over X, where 1 denotes the constant function over X. Moreover, let k = k(x) be a function defined on X and let us introduce the free energy ψ : Θ ⊆ R^a → R, ψ = ψ(θ), as (here θ · h denotes the scalar product in R^a)

\psi(\theta) = \ln \int_X e^{\theta \cdot h(x) + k(x)}\, dx

where the parameter space Θ is the subset of R^a where e^{ψ(θ)} < +∞. The a real numbers θ are called natural parameters. It is known that the set Θ is open and convex in θ and that ψ is a convex function in the θ variable (see [23,26]).
The following subset of the infinite dimensional space P(X)

\mathcal{E} = \{\, p(x; \theta) = e^{\theta \cdot h(x) - \psi(\theta) + k(x)},\ \theta \in \Theta \,\} \subset \mathcal{P}(X)

is called an exponential family. We show that E is an a-dimensional regular statistical model. For p ∈ E we have

l = \ln p = \theta \cdot h(x) - \psi(\theta) + k(x)

therefore the injectivity condition 1. above holds if and only if for all θ, θ′ ∈ Θ

(\theta - \theta') \cdot h(x) + 1 \cdot \big(\psi(\theta') - \psi(\theta)\big) = 0 \quad \forall x \in X \;\Longrightarrow\; \theta = \theta'

holds, and this is true by the independence condition (2) above. To check regularity condition 2 above, we compute the elements of the score base. They are (here we use the shorthand notation ∂_i f = ∂f/∂z^i and ⟨f⟩ = ∫ f p dx; moreover, summation over repeated indices is understood)

l_\alpha = h_\alpha - \partial_\alpha \psi = h_\alpha - \langle h_\alpha \rangle, \qquad \alpha = 1, \dots, a.

The last equality ∂_α ψ = ⟨h_α⟩ holds if we assume that the integrability condition ⟨|h_α|⟩ < +∞ is satisfied for every α. It is not restrictive to assume that ⟨h_α⟩ = 0, therefore the regularity condition 2. holds if and only if the a functions h_α are linearly independent over X, which again follows from (2).
One can show (see [22,27]) that every smooth diffeomorphism θ ↦ m(θ) gives an equivalent parameterization of the elements of the exponential family. In this sense E has the structure of a smooth manifold, called a statistical manifold. Another coordinate system for E (we will denote it with p = p(x; η)) is provided by the so-called expectation parameters η ∈ E ⊆ R^a defined by (here (∇_θ ψ)_i = ∂ψ/∂θ^i)

\eta = \nabla_\theta \psi(\theta) = \int_X h(x)\, p(x; \theta)\, dx.

Since ψ is a convex function, the gradient map θ ↦ ∇_θ ψ(θ) is globally invertible with inverse θ = θ̂(η), which is also a gradient map θ̂(η) = ∇_η φ(η), where

\varphi(\eta) = \hat\theta(\eta) \cdot \eta - \psi(\hat\theta(\eta))

is the Legendre transform of ψ (see [22]).
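For instance, for the exponential densities on X = [0, +∞) generated by the single observable h(x) = x with k = 0, one has ψ(θ) = ln ∫_0^∞ e^{θx} dx = −ln(−θ) for θ < 0, and the Legendre pair can be written explicitly:

\eta = \psi'(\theta) = -\frac{1}{\theta} > 0, \qquad \hat\theta(\eta) = -\frac{1}{\eta}, \qquad \varphi(\eta) = \hat\theta(\eta)\,\eta - \psi(\hat\theta(\eta)) = -1 - \ln\eta,

and indeed ∇_η φ(η) = −1/η = θ̂(η), as prescribed by the Legendre duality; here η is simply the mean of the exponential density p(x; θ) = (−θ)e^{θx}.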

3. Exponential Families Depending on External Control Parameters

These statistical models are introduced by supposing that the observables h that define an exponential family depend on so-called external parameters u ∈ U ⊆ R^b, which are to be distinguished from the natural parameters θ. These generalized exponential families arise naturally when one applies the Maximum Entropy formalism to equilibrium Statistical Mechanics, which we briefly recall here (see E.T. Jaynes' books [1,28]).
It is well known that when the information consists of the average values of some random variables h_α describing observables of interest for the system, the maximum entropy probability densities are exponential families. Indeed, if we introduce the Shannon entropy functional for a probability density p ∈ P(X)

H(p) = -\int_X p \ln p\, dx,

then the probability density that maximizes H on the set of probability densities satisfying the constraints ⟨h⟩ = ∫_X h p dx = c ∈ R^a has the form of an exponential family as in (4) with k = 0. If the observables of interest for the system h = h(x, u) depend on extra parameters, the exponential family inherits naturally a dependence on the external parameters, see (15) below. Typical examples of external parameters are the magnetic or electric field applied to the system or the length of a polymer chain (see [12,29]). Also, for a quantum system confined in an infinite square well potential, the discrete energy levels h_i depend on the width L of the well. Another typical example of a thermodynamic system subject to an external parameter is an ideal gas in a container of variable volume V; however, in this case the parameter V affects the state space X = X(V) and not the observables h, therefore this important system is not described by a generalized exponential family (see [21] for a discussion of this point).
An important difference between the natural parameters θ and the external ones u is that the former are the Lagrange multipliers associated to the constraints when one solves the constrained extremization problem for H using the Lagrange multipliers method, while the latter are parameters in the problem formulation that can be controlled by an agent external to the system under consideration. This difference is displayed when we consider the variation of ⟨h⟩ for p = p(x; θ, u). If we suppose, as we will always do, that we can exchange the order of integration and differentiation with respect to a parameter, we have

d\langle h \rangle = \int_X h\, \partial_\theta p\, dx\; d\theta + \int_X p\, \partial_u h\, dx\; du = dQ + \langle \partial_u h \rangle\, du

where dQ has the meaning of generalized heat exchanged and ⟨∂_u h⟩ du of generalized work exchanged (see [28]). Moreover, while the value of the external parameters u is controlled and can be varied by an agent external to the system, the value of the natural parameters θ can be varied only by putting the system in contact with a heat bath at a prescribed value of the inverse temperature θ (see again [28]).
The Kullback-Leibler divergence, also called relative entropy (see [27]), is defined for p, q ∈ P(X) and q > 0 as

D(p\,|\,q) = \int_X p(x) \ln \frac{p(x)}{q(x)}\, dx.

It is well known that the probability density p̂ that minimizes D on the set of probability densities satisfying the constraints ⟨h⟩ = ∫_X h p dx = c ∈ R^a has the form of an exponential family as in (4) with q = e^k > 0,

\hat p(x; \theta) = e^{\theta \cdot h(x) - \psi(\theta) + k(x)}.
The probability distribution p̂ is the distribution that gives the minimum information gain when one wants to update the current statistical description of the system given by q using the new available information ⟨h⟩ = c. We will refer in the sequel to this as the minimum Relative Entropy principle. The parameters θ of p̂(x; θ) in (4) are uniquely determined as θ = θ̂(c) by the constraint conditions

\langle h \rangle = \nabla_\theta \psi(\theta) = c

since the gradient map θ ↦ ∇_θ ψ is invertible. Note that for θ = 0 we have p(x; 0) = q(x), therefore the case θ̂(c) = 0 corresponds uniquely to the constraint value c = ∫_X h q dx, meaning that the constraints do not represent a new piece of information on the system. We will use this fact in the following.
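As a minimal numerical sketch of this inversion (an assumed toy setup, not taken from the text: a six-sided die with h(x) = x and uniform reference density q), the value θ̂(c) can be recovered by solving ∇_θψ(θ) = c with a standard root finder:

import numpy as np
from scipy.optimize import brentq

# Toy discrete MaxEnt problem: X = {1,...,6}, h(x) = x, q uniform.
# grad psi(theta) = <h>_theta; we invert it numerically to get theta_hat(c).
x = np.arange(1, 7)
mean_h = lambda th: np.sum(x * np.exp(th * x)) / np.sum(np.exp(th * x))

c = 4.5                                     # prescribed average <h> = c
theta_hat = brentq(lambda th: mean_h(th) - c, -5.0, 5.0)
p_hat = np.exp(theta_hat * x); p_hat /= p_hat.sum()
print(theta_hat, p_hat)                     # minimum relative entropy density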
Having exposed the motivations for considering these probability distributions, in the sequel we will investigate the geometrical properties of exponential families with external parameters or controlled exponential families for short.

3.1. Exponential Families with External Parameters

Let U ⊆ R^b be the external parameter space and consider the a observables

h_\alpha : X \times U \to \mathbb{R}.

Let k(x) be a function on X and define the free energy ψ : Z ⊆ R^d → R, ψ = ψ(θ, u), d = a + b,

\psi(\theta, u) = \ln \int_X e^{\theta \cdot h(x, u) + k(x)}\, dx

where the parameter space Z is the subset of R^d where e^ψ < +∞. We suppose that
(i)
Z is open, and we introduce the map

\pi : Z \to U, \qquad \pi(\theta, u) = u.

We consider the following subset of the infinite dimensional space P(X)

\mathcal{F} = \{\, p(x; \theta, u) = e^{\theta \cdot h(x, u) - \psi(\theta, u) + k(x)},\ (\theta, u) \in Z \,\} \subset \mathcal{P}(X)
and we suppose that
(ii)
for every fixed u ∈ π(Z) the set E(u) ⊂ F,

\mathcal{E}(u) = \{\, p(x; \theta) = e^{\theta \cdot h(x, u) - \psi(\theta, u) + k(x)},\ (\theta, u) \in \pi^{-1}(u) \,\} \subset \mathcal{P}(X),

is an exponential family. As a consequence, π^{-1}(u) is a convex subset in θ and h_α(x, u), 1 are a + 1 functions linearly independent over X.
A natural question is to ask whether the set F can be seen as a foliated manifold whose leaves are the statistical manifolds E(u). Note however that if θ = 0 is allowed (that is, ∫_X e^k dx < +∞), we have for θ = 0 in (13) ψ(0) = ψ(0, u) and p(x; 0, u) = e^{k(x) − ψ(0)} for every u ∈ π(Z), therefore

e^{k(x) - \psi(0)} \in \mathcal{E}(u) \cap \mathcal{E}(u') \qquad \forall\, u, u' \in \pi(Z).

So the statistical manifold leaves are not disjoint.
A second natural question to ask is whether F can be given the structure of a regular statistical model. To this end we need to check conditions 1. and 2. in Definition 1 above. Concerning injectivity condition 1. for the map z ↦ f(z), we have that f(0, u) = e^{k − ψ(0)} for all u ∈ U, so injectivity condition 1. may fail for controlled exponential families at θ = 0. However, if we recall the statistical mechanics interpretation of controlled exponential families made in Section 3, and in particular in (12), we can consider the point of singularity θ = 0 to be outside the domain of application of the statistical model (see however [20] for a discussion of this point). If we assume θ ≠ 0, due to the possibly nonlinear dependence of h(x, u) on u, condition (6) to assess injectivity for a controlled exponential family becomes

p_z = p_{z'} \;\Longleftrightarrow\; \theta \cdot h(x, u) - \psi(\theta, u) = \theta' \cdot h(x, u') - \psi(\theta', u') \quad \forall x \in X \;\Longrightarrow\; \theta = \theta',\ u = u'.
Condition (17) seems hard to satisfy even if we assume hypothesis (ii), as the following example shows. Suppose that the observables h depend linearly on u,

h_\alpha(x, u) = A_{\alpha k}(x)\, u^k,

and (see (ii)) suppose that the a + 1 functions h_α(x, u), 1 in (18) are linearly independent over X for every fixed u. Note that the elements of F in (15) depend on θ, u through the scalar quantity θ · A u. To prove injectivity of the map z ↦ p_z, z = (θ, u), we need to prove that if z ≠ z′ then

\theta \cdot A(x)\, u \neq \theta' \cdot A(x)\, u'

as functions on X. But this is not true if, for example, θ′ = λθ and u′ = u/λ for λ ≠ 0. So the model (18) is singular. This should not be a surprise, because elements of the family F are not characterized by the observables A_{αk}(x) but by the linear space spanned by the A_{αk}(x). Indeed, if we set θ′ = Bθ and u′ = Cu, where B and C are nonsingular square matrices, then

\theta' \cdot A u' = \theta'^{\,T} A u' = (B\theta)^T A\, C u = \theta^T B^T A C\, u = \theta^T A' u = \theta \cdot A' u

hence the family F is equally described by A′ = B^T A C with respect to the parameters (θ, u). Another lesson we can draw from this example is that for an exponential family linearly dependent on the external parameters, the distinction between natural and external ones is lost, as their roles can be interchanged.
All that said, we stipulate that
Definition 2.
F in (15) is an exponential family depending on the parameters z = (θ, u) if (i) for every fixed u the set E(u) is an exponential family and (ii) F is a regular d = a + b dimensional statistical model for a suitable choice of the open parameter set Z ⊆ R^a × U.
In the case of an exponential family (15) depending on natural and external parameters, in addition to the a natural-parameter score base vectors

l_\alpha = \frac{\partial \ln p}{\partial \theta^\alpha} = h_\alpha - \partial_\alpha \psi = h_\alpha - \langle h_\alpha \rangle

we have b external-parameter score base vectors

l_k = \frac{\partial \ln p}{\partial u^k} = \theta^\alpha \partial_k h_\alpha - \partial_k \psi = \theta^\alpha \big( \partial_k h_\alpha - \langle \partial_k h_\alpha \rangle \big) = \theta^\alpha L_{\alpha k}.

Note that ⟨l_α⟩ = 0 and ⟨l_k⟩ = 0 because ⟨L_{αk}⟩ = 0. Moreover, one can always assume that ⟨h_α⟩ = 0 and ⟨∂_k h_α⟩ = 0, therefore the regularity condition 2. above holds if and only if the a + b functions

h_\alpha(x, u), \qquad \theta^\alpha \frac{\partial h_\alpha}{\partial u^k}(x, u)

are linearly independent over X.

3.2. Fisher Metric for an Exponential Family with External Parameters

Regular statistical models can be endowed with a Riemannian metric defined on their parameter space Z. This is called the Fisher metric [30] and it has the form

g_{ij}(z) = \langle l_i\, l_j \rangle = \int_X \frac{\partial l}{\partial z^i} \frac{\partial l}{\partial z^j}\, p\, dx.

The Fisher matrix is symmetric and positive definite, therefore it defines a Riemannian metric on Z (see [24], p. 24). In fact we have

g_{ij} v^i v^j = \langle l_i l_j \rangle v^i v^j = \langle (l_i v^i)^2 \rangle = 0 \;\Longrightarrow\; l_i v^i = 0 \;\Longrightarrow\; v^i = 0 \ \forall i

since the score vectors l_i are linearly independent over X. Note also (see [24]) that g is invariant with respect to changes of coordinates in the state space X and covariant (as an order 2 tensor) with respect to changes of coordinates in the parameter space Z.
The elements of the Fisher matrix (22) relative to an exponential family with external parameters (15) can be detailed as follows: using (19)

g_{\alpha\beta} = \langle l_\alpha l_\beta \rangle = \big\langle (h_\alpha - \langle h_\alpha \rangle)(h_\beta - \langle h_\beta \rangle) \big\rangle = \mathrm{cov}(h_\alpha, h_\beta);

we also have from (20)

g_{\alpha k} = \langle l_\alpha l_k \rangle = \big\langle (h_\alpha - \langle h_\alpha \rangle)\, \theta^\beta (\partial_k h_\beta - \langle \partial_k h_\beta \rangle) \big\rangle = \theta^\beta \mathrm{cov}(h_\alpha, \partial_k h_\beta)

and

g_{km} = \langle l_k l_m \rangle = \big\langle \theta^\alpha (\partial_k h_\alpha - \langle \partial_k h_\alpha \rangle)\, \theta^\beta (\partial_m h_\beta - \langle \partial_m h_\beta \rangle) \big\rangle = \theta^\alpha \theta^\beta \mathrm{cov}(\partial_k h_\alpha, \partial_m h_\beta).

It is useful to set

A_{\alpha\beta} = g_{\alpha\beta}, \qquad M_{\alpha k} = g_{\alpha k}, \qquad B_{km} = g_{km}

and introduce a block representation of the symmetric (a + b)-dimensional Fisher matrix g as

g(z) = \begin{pmatrix} A & M \\ M^T & B \end{pmatrix}.
We now give the expression of the Fisher metric coefficients using the free energy function ψ in (13), which is also called the moment generating function because its derivatives with respect to the θ parameters give the various moments of the random variables h. We thus have the well known relation

\partial_\beta \partial_\alpha \psi = \mathrm{cov}(h_\alpha, h_\beta) = g_{\alpha\beta}.

By direct computation on (13) we also have

\partial_k \partial_\alpha \psi = \theta^\beta \mathrm{cov}(h_\alpha, \partial_k h_\beta) + \langle \partial_k h_\alpha \rangle

hence

g_{\alpha k} = \partial_k \partial_\alpha \psi - \langle \partial_k h_\alpha \rangle.

Moreover we have

\partial_k \partial_m \psi = \theta^\alpha \theta^\beta \mathrm{cov}(\partial_k h_\alpha, \partial_m h_\beta) + \theta^\alpha \langle \partial_k \partial_m h_\alpha \rangle

hence

g_{km} = \partial_k \partial_m \psi - \theta^\alpha \langle \partial_k \partial_m h_\alpha \rangle.

We see that, unlike the case of natural parameters θ, second order derivatives of the free energy ψ with respect to mixed or external parameters do not coincide with the elements of the Fisher matrix.
Example 1.
As a toy model, we introduce the following example of a controlled exponential family. Let X = [0, +∞) and U = (0, +∞) and consider the two observables, for x ∈ X, u ∈ U,

h_1(x) = \ln x, \qquad h_2(x, u) = \ln(x + u).

For this example we set k(x) = −ln x. We check that we have an integrable free energy function:

e^{\psi} = \int_X e^{\theta \cdot h + k}\, dx = \int_0^{+\infty} e^{(\theta_1 - 1)\ln x + \theta_2 \ln(x + u)}\, dx = \int_0^{+\infty} x^{\theta_1 - 1} (x + u)^{\theta_2}\, dx = u^{\theta_1 + \theta_2}\, \frac{\Gamma(\theta_1)\, \Gamma(-\theta_2 - \theta_1)}{\Gamma(-\theta_2)},

which is finite if θ_1 > 0, u > 0 and θ_2 + θ_1 < 0. Here Γ(z) is the Gamma function defined as

\Gamma(z) = \int_0^{+\infty} t^{z-1} e^{-t}\, dt.

Note that since e^{k(x)} = 1/x is non integrable over X, θ_1 = θ_2 = 0 is a non feasible value. By inspection h_1, h_2, 1 are linearly independent over X for every fixed u, and the map

(\theta_1, \theta_2, u) \mapsto \theta_1 \ln x + \theta_2 \ln(x + u) - \ln x

is injective. From the likelihood

l = \theta_1 \ln x + \theta_2 \ln(x + u) - \ln x - \psi(\theta, u),

the elements of the score base are

l_1 = \ln x - \partial_1 \psi, \qquad l_2 = \ln(x + u) - \partial_2 \psi, \qquad l_u = \frac{\theta_2}{x + u} - \partial_u \psi,

which are linearly independent over X. So the statistical model defined by (32) is a 2 + 1 dimensional controlled exponential family. Note that the probability density

p(x; \theta_1, \theta_2, u) = e^{\theta_1 h_1 + \theta_2 h_2 - \psi + k} = \frac{x^{\theta_1 - 1} (x + u)^{\theta_2}}{Z(\theta_1, \theta_2, u)}

is known as (a possible formulation of) a compound Gamma distribution; moreover, for u = 1, this is the Beta distribution of the second kind [31].
We now compute the Fisher matrix elements for this example. Let us introduce the Polygamma functions Φ_m for m ∈ N:

\Phi_m(z) = \frac{d^m}{dz^m} \Phi_0(z), \qquad \Phi_0(z) = \frac{d}{dz} \ln \Gamma(z) = \frac{\Gamma'(z)}{\Gamma(z)}.

We have

\psi(\theta_1, \theta_2, u) = (\theta_1 + \theta_2) \ln u + \ln \frac{\Gamma(\theta_1)\, \Gamma(-\theta_1 - \theta_2)}{\Gamma(-\theta_2)}

and from relation (28) above we have

g_{11} = \partial_1 \partial_1 \psi = \Phi_1(\theta_1) + \Phi_1(-\theta_1 - \theta_2), \qquad g_{12} = \partial_1 \partial_2 \psi = \Phi_1(-\theta_1 - \theta_2), \qquad g_{22} = \partial_2 \partial_2 \psi = \Phi_1(-\theta_1 - \theta_2) - \Phi_1(-\theta_2),

so the A block of g depends only on θ. Moreover, from (29) and (31) we have

g_{1u} = \partial_u \partial_1 \psi - \langle \partial_u h_1 \rangle = \partial_u \partial_1 \psi = \frac{1}{u}, \qquad g_{2u} = \partial_u \partial_2 \psi - \langle \partial_u h_2 \rangle = \frac{1}{u} - \frac{\theta_1 + \theta_2}{\theta_2\, u} = -\frac{\theta_1}{u\, \theta_2}, \qquad g_{uu} = \partial_u \partial_u \psi - \theta^\alpha \langle \partial_u \partial_u h_\alpha \rangle = \frac{(\theta_1 + \theta_2)\, \theta_1}{u^2 (\theta_2 - 1)}.
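These closed-form expressions can be cross-checked numerically; the following minimal sketch (a verification script written for this example, with parameter values chosen arbitrarily in the feasible set) computes the Fisher matrix directly as ⟨l_i l_j⟩ by quadrature and compares it with the formulas above.

import numpy as np
from scipy.integrate import quad
from scipy.special import gamma, digamma, polygamma

# Toy model of Example 1: p(x; th1, th2, u) ~ x^(th1-1) (x+u)^th2 on (0, +inf)
th1, th2, u = 2.0, -4.0, 1.5               # feasible: th1 > 0, th1 + th2 < 0, u > 0

lnZ = (th1 + th2)*np.log(u) + np.log(gamma(th1)*gamma(-th1 - th2)/gamma(-th2))
p = lambda x: np.exp((th1 - 1)*np.log(x) + th2*np.log(x + u) - lnZ)

# score components l_1, l_2, l_u (the psi derivatives are known in closed form)
d1psi = np.log(u) + digamma(th1) - digamma(-th1 - th2)
d2psi = np.log(u) - digamma(-th1 - th2) + digamma(-th2)
dupsi = (th1 + th2)/u
scores = [lambda x: np.log(x) - d1psi,
          lambda x: np.log(x + u) - d2psi,
          lambda x: th2/(x + u) - dupsi]

g_num = np.array([[quad(lambda x: si(x)*sj(x)*p(x), 0, np.inf)[0]
                   for sj in scores] for si in scores])

tri = lambda z: polygamma(1, z)            # closed-form entries derived above
g_exact = np.array([
    [tri(th1) + tri(-th1 - th2), tri(-th1 - th2),             1/u],
    [tri(-th1 - th2),            tri(-th1 - th2) - tri(-th2), -th1/(u*th2)],
    [1/u,                        -th1/(u*th2),                (th1 + th2)*th1/(u**2*(th2 - 1))]])

print(np.max(np.abs(g_num - g_exact)))     # small (quadrature accuracy)

The agreement of the two matrices also gives a direct numerical confirmation that g is positive definite at this point, i.e., that the model is regular there.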

4. A Synopsis of Ehresmann Connections

On a smooth fibration π : M → N, where M, N are smooth manifolds with dim M = m, dim N = n, the set VM = ker Tπ of the vectors that project onto the zero vector of TN is an integrable subbundle of TM called the vertical bundle.
An Ehresmann connection (see e.g., [32]) on π : M → N is the assignment of a distribution HM transversal to VM, so that HM ⊕ VM = TM. The elements of HM are the horizontal vectors; since Tπ restricted to HM is an isomorphism, it has a fiberwise defined inverse, the horizontal lift: hor : T_{π(z)}N → T_zM, hor(X) ∈ H_zM. Let X = X^h + X^v be the splitting of a vector in T_zM into its horizontal and vertical components. The projection on VM with respect to the horizontal subspace defines the vector-valued connection one-form

\omega : TM \to VM, \qquad \omega(z)(X) = X^v,

whose kernel is the horizontal distribution. The assignment of a horizontal distribution, of a horizontal lift operator or of a connection one-form are equivalent ways to define a connection on π : M → N. The curvature of the connection is the VM-valued two-form defined as

\Omega(X, Y) = \omega([X^h, Y^h])

which shows that the curvature measures the failure of the horizontal distribution to be integrable. Moreover, the curvature relates the Lie brackets of vector fields X, Y on the base manifold N with the Lie bracket of their horizontal lifts through the formula

\Omega(\mathrm{hor}\, X, \mathrm{hor}\, Y) = [\mathrm{hor}\, X, \mathrm{hor}\, Y]_M - \mathrm{hor}\, [X, Y]_N.

Again, we find that if the curvature vanishes the horizontal distribution, spanned by vectors of the type hor X, is involutive hence integrable. Next we give the local expressions of a connection in a fibered chart. Let z = (x, y) be a fibered chart on U ⊆ M, π(x, y) = y. Then the vertical space is, for α = 1, …, a = dim M − dim N,

V_z U = \ker T_z \pi = \mathrm{span}\{ \partial_{x^\alpha} \}

and the connection one-form ω is

\omega = \omega^\alpha\, \partial_{x^\alpha}, \qquad \omega^\alpha = dx^\alpha + A^\alpha_l(z)\, dy^l.

The A^\alpha_l(z) are the connection's coefficients. The horizontal vectors have the coordinate expression

X \in HM \;\Longleftrightarrow\; \omega(X) = 0, \qquad X = X^l \big( \partial_{y^l} - A^\alpha_l\, \partial_{x^\alpha} \big)

while the horizontal lift of a base vector U = U^l ∂_{y^l} ∈ T_{π(z)}N has the form

\mathrm{hor}\, U\,(z) = U^l \big( \partial_{y^l} - A^\alpha_l\, \partial_{x^\alpha} \big).
We now specialize the above relations to the important case where the horizontal distribution H_zM is defined to be the g-orthogonal complement of V_zM with respect to a Riemannian metric g on M. Referring to a block representation of the metric g in the coordinates (x, y) like the one in (27) for (θ, u), we ask that every X^h ∈ H_zM, X^h = (−A^α_l U^l, U^l), be orthogonal to all X^v ∈ V_zM, X^v = (W, 0). As a consequence

g(X^v, X^h) = W \cdot \big( -A\, (A^\alpha_l U^l) + M U \big) = 0 \quad \forall\, W \;\Longrightarrow\; A^\alpha_l = (A^{-1} M)^\alpha_{\ l}.

The connection one-form (37) becomes, from (39),

\omega^\alpha = dx^\alpha + A^\alpha_l(z)\, dy^l, \qquad \text{where } A^\alpha_l = (A^{-1} M)^\alpha_{\ l},

and it is called the mechanical connection in the control theory of mechanical systems, where g is the kinetic energy of a mechanical system. In the orthogonal splitting case the metric g has the simpler form, by (39),

g(X, Y) = g(X^v + X^h, Y^v + Y^h) = g(X^v, Y^v) + g(X^h, Y^h).

Since X^v = (ω(X), 0), and using again the block representation (27) of g, we have

g(X^v, Y^v) = A_{\alpha\beta}(z)\, \omega^\alpha(X)\, \omega^\beta(Y)

and

g(X^h, Y^h) = (-A^{-1}M\, U,\ U)^T\, g\, (-A^{-1}M\, V,\ V) = U^T K V

where K = B − M^T A^{-1} M = K^T, hence

g(z)\, dz \otimes dz = A(z)\, \omega \otimes \omega + K(z)\, dy \otimes dy

and, in the frame adapted to the splitting,

g = \begin{pmatrix} A & 0 \\ 0 & K \end{pmatrix}.

Parallel Transport Equation

Let γ : [0, T] → N be a smooth path in the base manifold and let z_0 ∈ π^{-1}(γ(0)). The parallel transport equation is the following ODE for the horizontal lift vector field

\dot z = \frac{dz}{dt} = \mathrm{hor}(\dot\gamma), \qquad z(0) = z_0

with local expression

\dot x^\alpha = -A^\alpha_l(x, \gamma)\, \dot\gamma^l, \qquad \dot y^l = \dot\gamma^l.

The connection is called complete if the parallel transport equation has a solution defined on the whole of [0, T]. If in (41) we have K = K(y), then the metric g is called a bundle-like metric. The main geometric consequence is that if we introduce the Riemannian manifold (N, K), then the horizontal lift is an isometry and the solution z(t) of the parallel transport equation is a curve that projects onto γ and has the same length.
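As a concrete (assumed) illustration, the parallel transport equation for the mechanical connection A^{-1}M can be integrated numerically. The sketch below does this for the Fisher metric blocks of the toy model of Example 1, lifting a path u(t) of the external parameter to a path (θ(t), u(t)) with ω(ż) = 0; the base path, the initial fiber point and the time interval are chosen arbitrarily inside the feasible set.

import numpy as np
from scipy.integrate import solve_ivp
from scipy.special import polygamma

tri = lambda z: polygamma(1, z)

def A_block(th1, th2):                       # natural-parameter block, cf. (28)
    return np.array([[tri(th1) + tri(-th1 - th2), tri(-th1 - th2)],
                     [tri(-th1 - th2), tri(-th1 - th2) - tri(-th2)]])

def M_block(th1, th2, u):                    # mixed block, cf. (29)
    return np.array([1.0/u, -th1/(u*th2)])

u_path = lambda t: 1.0 + t                   # base path in U
u_dot = 1.0

def rhs(t, theta):                           # horizontal lift: theta' = -A^{-1} M u'
    u = u_path(t)
    return -np.linalg.solve(A_block(*theta), M_block(theta[0], theta[1], u)) * u_dot

sol = solve_ivp(rhs, (0.0, 0.5), [2.0, -4.0], rtol=1e-8)
print(sol.y[:, -1])                          # parallel transport of (2, -4) along u(t)

Moving only the external parameter changes the natural parameters of the lifted point, which is the hallmark of a nontrivial connection.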

5. Some Applications of Exponential Families with External Parameters

In this Section we apply the geometric framework of the previous Section 4 to the fibration π : Z → U, π(θ, u) = u, introduced in (14). We can also consider the inverse φ of the map z ↦ f(z) = p_z and introduce the fibration

\mathcal{F} \xrightarrow{\ \varphi\ } Z \xrightarrow{\ \pi\ } U, \qquad \tilde\pi = \pi \circ \varphi.

Since π̃^{-1}(u) = E(u), the fibers of π̃ are exponential families for every fixed value of the external parameters. One can show that the orthogonal splitting of TZ induces an orthogonal splitting of TF with respect to the Fisher metric (see [21]).

5.1. Thermodynamic Length

Let t ↦ z(t) = (θ(t), u(t)) ∈ Z, t ∈ [0, T], be a path in parameter space. Define the time-dependent relative entropy along the path as D(t) = D(p(z(t)) | p(z(0))) and compute the Taylor expansion of D(t) at t = 0. A direct computation shows that D(0) = 0, D′(0) = 0, hence

D(dt) = D(0) + D'(0)\, dt + \tfrac{1}{2} D''(0)\, dt^2 + O(dt^3) = \tfrac{1}{2} D''(0)\, dt^2 + O(dt^3) = \tfrac{1}{2} \| \dot z \|_g^2\, dt^2 + O(dt^3)

where ‖ż‖²_g is the squared norm with respect to the Fisher metric in (Z, g). It holds that

\| \dot z \|_g^2 = \dot z \cdot g(z)\, \dot z = \langle l_i l_j \rangle\, \dot z^i \dot z^j = \int_X p\; \partial_i \ln p\; \partial_j \ln p\; \dot z^i \dot z^j\, dx = \int_X p\, \frac{1}{p}\partial_i p\, \frac{1}{p}\partial_j p\; \dot z^i \dot z^j\, dx = \int_X \frac{1}{p} \left( \frac{dp}{dt}(z(t)) \right)^2 dx.

The quantity 2 D(dt)/dt² = ‖ż‖²_g can be related to the entropy change rate dσ/dt of the heat bath and to the total system entropy production rate dF/dt in a non quasi-static evolution of the system by the formula (see [14])

\| \dot z \|_g^2 = \frac{d\sigma}{dt} - \frac{dF}{dt} \geq 0.
Therefore ‖ż‖²_g is a measure of the system entropy production rate dσ_sys/dt in a non quasi-static evolution of the system. When integrated along the finite time evolution protocol z(t), the quantity

2\,\mathcal{C} = \int_0^T \| \dot z \|_g^2\, dt = \int_0^T \left[ \frac{d\sigma}{dt} - \frac{dF}{dt} \right] dt

is called the action of the path and can be interpreted as the thermodynamic cost (loss in the entropy transfer due to the system entropy production) associated to the protocol; therefore it is a measure of the dissipated (non available) work. The quantity (see [11,12,15])

\mathcal{L}(z) = \int_0^T \| \dot z \|_g(t)\, dt

is called the thermodynamic length of the path z(·). By the Cauchy-Schwarz inequality one obtains the inequality (see [14])

T \cdot 2\,\mathcal{C} \geq \big( \mathcal{L}(z) \big)^2

showing that the thermodynamic length (TL) gives a lower bound on the dissipated work in a non quasi-static evolution of the system [11,15]. The above relation is used when studying the controlled evolution of classical and quantum small thermodynamic systems, e.g., molecular motors (see [15]).
Using the representation (41) of the scalar product with respect to the Fisher metric g, we have the following formula for the TL of a controlled exponential family:

\mathcal{L}(z) = \int_0^T \| \dot z \|_g(t)\, dt = \int_0^T \Big[ A_{\alpha\beta}(z)\, \omega^\alpha(\dot z)\, \omega^\beta(\dot z) + K_{lm}(z)\, \dot u^l \dot u^m \Big]^{1/2} dt.

In particular, if the path z is the horizontal lift of a path u in the external parameter space, then ż = hor(u̇) and ω(ż) = 0. If moreover the metric g is bundle-like with respect to the fibration π, we have K(z) = K(π(z)) and the thermodynamic length can be expressed as

\mathcal{L}(z) = \mathcal{L}(u) = \int_0^T \sqrt{ K(u)\, \dot u \cdot \dot u }\; dt

showing that the TL depends solely on the external parameter evolution u = u(t).
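The split form of the TL is easy to evaluate numerically. The following sketch (written for the toy model of Example 1, with an arbitrarily chosen feasible path) computes the length of a path z(t) both directly from ż · g(z) ż and from the decomposition A ω(ż) ω(ż) + K u̇ u̇, which must agree by (41).

import numpy as np
from scipy.special import polygamma

tri = lambda z: polygamma(1, z)
def blocks(th1, th2, u):                     # Fisher blocks of Example 1
    A = np.array([[tri(th1) + tri(-th1 - th2), tri(-th1 - th2)],
                  [tri(-th1 - th2), tri(-th1 - th2) - tri(-th2)]])
    M = np.array([1.0/u, -th1/(u*th2)])
    B = (th1 + th2)*th1/(u**2*(th2 - 1.0))
    return A, M, B

path = lambda t: np.array([2.0 + 0.2*np.sin(t), -4.0, 1.0 + t])   # (th1, th2, u)(t)
dt, ts = 5e-4, np.arange(0.0, 1.0, 5e-4)
L_direct = L_split = 0.0
for t in ts:
    z, zdot = path(t), (path(t + dt) - path(t))/dt
    A, M, B = blocks(*z)
    g = np.block([[A, M[:, None]], [M[None, :], np.array([[B]])]])
    L_direct += np.sqrt(zdot @ g @ zdot) * dt
    w = zdot[:2] + np.linalg.solve(A, M)*zdot[2]                  # omega(z_dot)
    K = B - M @ np.linalg.solve(A, M)
    L_split += np.sqrt(w @ A @ w + K*zdot[2]**2) * dt
print(L_direct, L_split)                                          # the two values agree

For a horizontal path the first term under the square root vanishes and only K u̇² contributes, in line with the last formula above.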

5.2. Isentropic Evolution Driven by External Parameters

We have recalled in Section 3 that the elements of a controlled exponential family F, where q = e^k ∈ P(X), are the solutions of the constrained minimization problem for the relative entropy D(p|q) of the form (11),

\hat p(x; c, u) = e^{\hat\theta \cdot h(x, u) - \psi(\hat\theta, u) + k(x)}

where θ̂ = θ̂(c, u) is uniquely determined by inverting the gradient map ∇_θ ψ(θ, u) = c. We have that

D(c, u) = D(\hat p\, |\, q) = \int_X \hat p \ln \frac{\hat p}{q}\, dx = \hat\theta \cdot \langle h \rangle - \psi = \hat\theta \cdot \nabla_\theta \psi - \psi = -S(c, u)

where S(c, u) = ψ − θ · ∇_θ ψ is the entropy of the statistical system when the information on the system is described by the constraint ⟨h⟩ = c. In the following we consider D(c, u) as a function of (θ, u), knowing that θ is in a one-to-one correspondence with c.
Let us compute the differential of D(θ, u) corresponding to an infinitesimal variation of the parameters z = (θ, u). We have

dD(\theta, u) = \nabla_\theta D \cdot d\theta + \nabla_u D \cdot du.

More in detail, using (28) we obtain

\partial_\beta D = \partial_\beta (\theta^\alpha \partial_\alpha \psi - \psi) = \partial_\beta \psi + \theta^\alpha \partial_\beta \partial_\alpha \psi - \partial_\beta \psi = \theta^\alpha g_{\alpha\beta}

and using (29) we obtain

\partial_k D = \partial_k (\theta^\alpha \partial_\alpha \psi - \psi) = \theta^\alpha \partial_k \partial_\alpha \psi - \partial_k \psi = \theta^\alpha \big( g_{\alpha k} + \langle \partial_k h_\alpha \rangle \big) - \theta^\alpha \langle \partial_k h_\alpha \rangle = \theta^\alpha g_{\alpha k}

so, collecting the results and using (40), we have

dD(z) = \theta^\alpha g_{\alpha\beta}\, d\theta^\beta + \theta^\alpha g_{\alpha k}\, du^k = \theta^\alpha \big( A_{\alpha\beta}\, d\theta^\beta + M_{\alpha k}\, du^k \big) = \theta^\alpha A_{\alpha\beta}\, \omega^\beta
and the following proposition holds
Proposition 1.
(1) The variation of entropy for an infinitesimal change in the parameters z = (θ, u) can be expressed using the g-orthogonal Ehresmann connection ω on π : Z → U as

dS = -dD = -\theta^\alpha A_{\alpha\beta}\, \omega^\beta = -\theta^\alpha \partial_\alpha \partial_\beta \psi\, \omega^\beta;

(2) the change in entropy for the system along a given path z = z(t), t ∈ [0, T], in parameter space is given by

\Delta S(z) = \int_0^T dS(\dot z)\, dt = -\int_0^T \theta^\alpha(t)\, A_{\alpha\beta}(z(t))\, \omega^\beta(\dot z)\, dt;

(3) since ω(ż) = 0 for a horizontal path, the horizontal lift ż = hor γ̇ of a path γ in the external parameter space U gives an isentropic (ΔS = 0) evolution of the system.
Note that horizontal lifts do not represent all the possible isentropic evolutions of the system. These are characterized by the weaker (with respect to ω(ż) = 0) condition θ · A ω(ż) = 0. Let us investigate this condition using the general relation (10), which we can now write as

d\langle h_\alpha \rangle = d(\partial_\alpha \psi) = \partial_\alpha \partial_\beta \psi\, d\theta^\beta + \partial_k \partial_\alpha \psi\, du^k = A_{\alpha\beta}\, \omega^\beta + \langle \partial_k h_\alpha \rangle\, du^k = dQ_\alpha + dW_\alpha.

If we want to gain insight into the above relation using a thermodynamic analogy, then ⟨h_α⟩ is the α-type energy, dQ_α = A_{αβ} ω^β is the α-type heat exchanged and dW_α the α-type work exchanged. If we interpret the natural parameters θ as (minus) the α-type inverse temperatures, θ_α = −1/T_α, then (46) reads

dS = -\theta^\alpha A_{\alpha\beta}\, \omega^\beta = \sum_\alpha \frac{dQ_\alpha}{T_\alpha}.

Therefore a horizontal path corresponds to the condition dQ_α = 0 for all α and certainly represents an isentropic evolution of the system, but we can have an isentropic evolution even if dQ_α ≠ 0, provided the heat fluxes divided by their temperatures have a zero sum. As a final remark, note that in the exponential family we have the scalar product θ · h, hence the inverse temperature vector θ ∈ R^a should be seen as an element of the dual space of the h ∈ R^a vector space and not as a point in a local coordinate chart. See [33] on this point.

6. Information Geometry of Gradient Systems

In this Section we consider a class of evolution problems described by a Fokker-Planck type equation (FPE) on a regular connected domain X ⊂ R^n which is open and bounded. We write the FPE as in [34] (i, j = 1, …, n, repeated indices are summed)

\partial_t p = -\partial_i (D_i\, p) + \partial_i \partial_j (D_{ij}\, p) = -\nabla \cdot S

where D_i(x) is the drift field, D_{ij}(x) is the symmetric diffusion matrix, ∇· denotes the divergence and S is the probability current

S_i(p) = D_i\, p - \partial_j (D_{ij}\, p), \qquad i = 1, \dots, n.

To ensure that a solution p(x, t) is normalized to one for all t ≥ 0 we need to ask

0 = \frac{d}{dt} \int_X p\, dx = \int_X \partial_t p\, dx = -\int_X \nabla \cdot S\, dx = -\int_{\partial X} S \cdot \nu\, d\sigma

that is, S · ν = 0 on ∂X. We restrict to the case where the diffusion matrix is diagonal, D_{ij} = d_i(x) δ_{ij}, and positive definite (d_i > 0), and therefore we rewrite S as

S_i(p) = D_i\, p - \partial_i (d_i\, p) = p \big[ D_i - d_i\, \partial_i \ln (p\, d_i) \big].
Moreover, we suppose that the drift field is of the form

D_i(x) = d_i(x)\, \partial_i \phi(x)

where ϕ is a function defined on X. A stationary solution p_∞ of the FPE is obtained if we have S_i(p_∞) = 0 for all i, that is, from (48),

d_i \big[ \partial_i \phi - \partial_i \ln (p_\infty\, d_i) \big] = 0.

One can show ([34], Chapter 6) that in this setting the stationary solution to FPE (47) is unique. We can rewrite the FPE using p_∞ from (48) and (49) as follows

\partial_t p = -\partial_i S_i = \partial_i \Big[ p\, d_i\, \partial_i \big( -\phi + \ln(p\, d_i) \big) \Big] = \partial_i \Big[ p\, d_i\, \partial_i \big( -\ln(p_\infty d_i) + \ln(p\, d_i) \big) \Big] = \partial_i \Big[ p\, d_i\, \partial_i \ln \frac{p}{p_\infty} \Big]

or in compact notation as

\partial_t p = \nabla \cdot \Big( p\, D\, \nabla \ln \frac{p}{p_\infty} \Big)
where D is the diagonal diffusion matrix. The trend to equilibrium can be studied by computing the “distance in entropy” between a solution p of (47) and p_∞. Setting λ(p) = ln(p/p_∞) we have from (51)

\frac{d}{dt} D(p\, |\, p_\infty) = \frac{d}{dt} \int_X p \ln \frac{p}{p_\infty}\, dx = \int_X (\partial_t p)\, \lambda(p)\, dx = \int_X \nabla \cdot \big( p\, D\, \nabla \lambda \big)\, \lambda\, dx

and using the relation λ ∇ · X = ∇ · (λ X) − X · ∇λ with X = p D ∇λ we get

\frac{d}{dt} D(p\, |\, p_\infty) = \int_X \big[ \nabla \cdot (\lambda\, p\, D\, \nabla \lambda) - p\, D\, \nabla \lambda \cdot \nabla \lambda \big] dx = \int_{\partial X} \lambda\, p\, D\, \nabla \lambda \cdot \nu\, d\sigma - \int_X p\, D\, \nabla \lambda \cdot \nabla \lambda\, dx = -\int_X p\, D\, \nabla \ln \frac{p}{p_\infty} \cdot \nabla \ln \frac{p}{p_\infty}\, dx < 0

because S · ν = −p D ∇λ · ν = 0 on ∂X. So the distance in entropy tends to zero independently of the initial conditions. One can show that for the FPE we have (Csiszar-Kullback-Pinsker inequality, see [35], Chapter 9)

D(p\, |\, p_\infty) \geq \tfrac{1}{2}\, \| p - p_\infty \|_{L^1}^2.
If we have a constant diffusion matrix D_{ij} = d δ_{ij}, d > 0, the above relation (52) can be rewritten as

\frac{d}{dt} D(p\, |\, p_\infty) = -d \int_X p\, \Big| \nabla \ln \frac{p}{p_\infty} \Big|^2 dx = -d\, R(p\, |\, p_\infty)

where R(p | p_∞) is called the relative Fisher information (see [35], Chapter 9, or [36]).
A probability density p_∞ satisfies a logarithmic Sobolev inequality (LSI) with positive constant σ > 0 if

2\sigma\, D(p\, |\, p_\infty) \leq R(p\, |\, p_\infty) \qquad \forall\, p \in \mathcal{P}(X).

If p_∞ satisfies a logarithmic Sobolev inequality we can prove exponential speed of convergence to equilibrium in entropy; indeed we have R(p | p_∞) ≥ 2σ D(p | p_∞) and by substitution in (53) we obtain

\frac{d}{dt} D(p\, |\, p_\infty) \leq -2 d \sigma\, D(p\, |\, p_\infty) \;\Longrightarrow\; D(p\, |\, p_\infty) \leq D(p_0\, |\, p_\infty)\, e^{-2 d \sigma t}.
A sufficient condition for LSI is the following one (see [35], Chapter 9):
(LSI condition) Let V be a C² function on X with ∫_X e^{-V} dx = 1 and Hess(V) ≥ σ I for some σ > 0. Then e^{-V} satisfies LSI with constant σ.
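The exponential decay in relative entropy is easy to observe numerically. The sketch below (an illustrative setup assumed here: a Gaussian stationary density on a truncated interval, a zero-flux finite-volume discretization and explicit time stepping) integrates the FPE in the form (51) and compares D(p | p_∞) at the final time with the LSI bound D(p_0 | p_∞) e^{−2dσt}.

import numpy as np

d, sigma = 1.0, 1.0                          # diffusion constant and Hess(V) >= sigma
x = np.linspace(-5.0, 5.0, 401); dx = x[1] - x[0]
p_inf = np.exp(-sigma*x**2/2); p_inf /= p_inf.sum()*dx     # e^{-V} (truncated Gaussian)
p = np.exp(-(x - 2.0)**2);     p /= p.sum()*dx             # initial density

kl = lambda a, b: np.sum(a*np.log(a/b))*dx
D0, dt, T = kl(p, p_inf), 0.2*dx**2/d, 2.0
n = int(T/dt)
for _ in range(n):                                          # dp/dt = (d p_inf (p/p_inf)')'
    r = p/p_inf
    flux = d*0.5*(p_inf[1:] + p_inf[:-1])*np.diff(r)/dx
    p = p + dt*np.diff(np.concatenate(([0.0], flux, [0.0])))/dx   # zero flux at the ends
print(kl(p, p_inf), D0*np.exp(-2*d*sigma*n*dt))             # decay vs. the LSI bound

Up to discretization and domain-truncation error, the computed relative entropy stays below the bound, and the scheme leaves p_∞ exactly stationary.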

6.1. A Dynamic Approximation Problem

This section is motivated by a problem in quantitative genetics which has been dealt with in a series of papers [2,3,4]; see also Appendix A for a brief account. Here we introduce a slightly simplified version of the original model problem, which has the advantage of greater generality. Let us consider the FPE (51) with a gradient drift field of the form (49) with x = (x_1, …, x_n) and

d_i(x) = d(x_i), \qquad \phi(x) = \theta_\infty \cdot h(x), \qquad D_{ij} = d(x_i)\, \delta_{ij}, \qquad D_i(x) = d(x_i)\, \partial_i \big( \theta_\infty \cdot h(x) \big).

In this case one can prove that the stationary solution of the FPE (51) satisfying (50) has the form of an exponential family,

p_\infty(x) = p_{\theta_\infty} = e^{\theta_\infty \cdot h(x) - \psi + k(x)} \qquad \text{where} \qquad k(x) = -\sum_i \ln d(x_i).

We are free to set the value of the natural parameters and we set θ_∞ = θ_1, where θ_1 is a feasible value.
The explicit solution of the FPE (51) is difficult to study, and one could be content with the study of the time evolution of the average values of the observables h_α, that is, of the functions

t \mapsto \langle h_\alpha \rangle_p(t) = \int_X h_\alpha(x)\, p(x, t)\, dx

along the unknown solution of the FPE. With this aim, it is natural to consider the following:
(Approximation problem): find the time evolution of the natural parameters θ = θ(t) such that the density

p_\theta(x, t) = e^{\theta(t) \cdot h(x) - \psi(\theta(t)) + k(x)}

has the same average values as the unknown solution of the FPE, i.e.,

\langle h_\alpha \rangle_p(t) = \langle h_\alpha \rangle_{p_\theta}(t).

This strategy (called the Dynamic Maximum Entropy method in [3]) seems reasonable because the exponential density (55) is the maximum entropy distribution satisfying the constraints of the form ⟨h⟩ = c, therefore it contains exactly the required amount of information needed to satisfy the average value constraint. In the following we will investigate the interplay between the following three densities: (1) the unknown solution p of the FPE, (2) the approximating exponential density p_θ and (3) the exponential equilibrium density p_∞ = p_{θ_1}.

Triangular Relation

To start with, note that for (56) (dropping the explicit time dependence in θ and p)

(b) \qquad D(p\, |\, p_\theta) = \int_X p \ln \frac{p}{p_\theta}\, dx = -H(p) - \theta \cdot \langle h \rangle_p + \psi - \langle k \rangle_p

therefore condition (57) can be rewritten as

\nabla_\theta D(p\, |\, p_\theta) = -\langle h \rangle_p + \nabla_\theta \psi(\theta) = \langle h \rangle_\theta - \langle h \rangle_p = 0.

Note that the equation ∇_θ ψ(θ) = ⟨h⟩_p has a unique solution θ = θ(t) for all t ≥ 0.
Next, let us compute the distance in entropy between the solution of the FPE and its stationary solution (55) with θ_∞ = θ_1,

(a) \qquad D(p\, |\, p_{\theta_1}) = -H(p) - \theta_1 \cdot \langle h \rangle_p + \psi(\theta_1) - \langle k \rangle_p,

and the distance between p_θ and p_{θ_1} (here θ = θ(t) is the value of the approximating solution satisfying (59)),

(c) \qquad D(p_\theta\, |\, p_{\theta_1}) = (\theta - \theta_1) \cdot \langle h \rangle_\theta + \psi(\theta_1) - \psi(\theta) = (\theta - \theta_1) \cdot \nabla_\theta \psi(\theta) + \psi(\theta_1) - \psi(\theta),

which coincides with the Bregman divergence (see [27]) of the convex function ψ,

D_\psi(\theta_1, \theta) = \psi(\theta_1) - \psi(\theta) - (\theta_1 - \theta) \cdot \nabla_\theta \psi(\theta) = D(p_\theta\, |\, p_{\theta_1}).

Collecting the above results (58), (60), (61) and summing the right hand sides, we obtain the triangular relation (see [27], Theorem 1.2, or [22], Theorem 3.7)

D(p\, |\, p_\theta) + D(p_\theta\, |\, p_{\theta_1}) = D(p\, |\, p_{\theta_1}) + (\theta - \theta_1) \cdot \big( \langle h \rangle_\theta - \langle h \rangle_p \big).
It follows that
Proposition 2.
The function θ(t) satisfies condition (57) of the Approximation problem if and only if the following relation (called the generalized Pythagorean theorem in [27]) holds (see Figure 1)

b + c = D(p\, |\, p_\theta) + D(p_\theta\, |\, p_{\theta_1}) = D(p\, |\, p_{\theta_1}) = a

meaning that p_{θ(t)} is the geodesic projection of p on the exponential family (flat submanifold) E, satisfying

D(p\, |\, p_{\theta(t)}) = \min \{\, D(p\, |\, p_{\theta'}) \, : \, p_{\theta'} \in \mathcal{E} \,\},

that is, p_{θ(t)} is the best approximation of p on E with respect to the information gain.
Note that this relation is exact and does not need the hypothesis that θ be close to θ_1. Relation (63) characterizes θ(t) from a geometrical point of view.
We now take the time derivative of (63) with a double aim: on the one hand, to find a differential relation (ODE) for θ and, on the other hand, to find an upper bound for the distance in entropy b = D(p | p_θ) knowing that a = D(p | p_{θ_1}) tends to 0, possibly with exponential speed. Note that taking the time derivative of the relation (63), b + c = a, is equivalent to taking the time derivative of the relation (59), since the two are equivalent conditions on θ(t). We have from (52)

\dot a = \frac{d}{dt} D(p\, |\, p_{\theta_1}) = -\int_X p\, D\, \nabla \ln \frac{p}{p_{\theta_1}} \cdot \nabla \ln \frac{p}{p_{\theta_1}}\, dx < 0.

Moreover

\dot b = \frac{d}{dt} D(p\, |\, p_\theta) = \frac{d}{dt} \int_X p \ln \frac{p}{p_\theta}\, dx = \int_X \Big[ \dot p \ln \frac{p}{p_\theta} + \dot p - \frac{p}{p_\theta} \dot p_\theta \Big] dx = \int_X \Big[ \dot p \ln \frac{p}{p_\theta} - \frac{p}{p_\theta} \dot p_\theta \Big] dx.
Since

\frac{\dot p_\theta}{p_\theta} = \frac{d}{dt} \ln p_\theta = \dot\theta \cdot h - \nabla_\theta \psi \cdot \dot\theta = \dot\theta \cdot \big( h - \langle h \rangle_\theta \big)

we have from (59)

\int_X p\, \frac{\dot p_\theta}{p_\theta}\, dx = \int_X p\; \dot\theta \cdot \big( h - \langle h \rangle_\theta \big)\, dx = \dot\theta \cdot \big( \langle h \rangle_p - \langle h \rangle_\theta \big) = 0

and, recalling that p is a solution of the FPE (51), we obtain

\dot b = \frac{d}{dt} D(p\, |\, p_\theta) = \int_X \nabla \cdot \Big( p\, D\, \nabla \ln \frac{p}{p_{\theta_1}} \Big) \ln \frac{p}{p_\theta}\, dx = -\int_X p\, D\, \nabla \ln \frac{p}{p_{\theta_1}} \cdot \nabla \ln \frac{p}{p_\theta}\, dx

since we can get rid of the boundary term as done above.
The side c = D(p_θ | p_{θ_1}) does not contain the solution p of the FPE, therefore its time derivative can be computed, see (45), as

\dot c = \frac{d}{dt} D(p_\theta\, |\, p_{\theta_1}) = \int_X \dot p_\theta \ln \frac{p_\theta}{p_{\theta_1}}\, dx = \int_X p_\theta\, \frac{d}{dt} \big( \theta \cdot h - \psi(\theta) + k \big) \ln \frac{p_\theta}{p_{\theta_1}}\, dx = \int_X p_\theta\; \dot\theta \cdot \big( h - \langle h \rangle_\theta \big) \big[ (\theta - \theta_1) \cdot h + \psi(\theta_1) - \psi(\theta) \big] dx = \big( \langle h_\alpha h_\beta \rangle - \langle h_\alpha \rangle \langle h_\beta \rangle \big)\, \dot\theta^\alpha (\theta - \theta_1)^\beta = \mathrm{cov}_\theta(h_\alpha, h_\beta)\, \dot\theta^\alpha (\theta - \theta_1)^\beta = g_{\alpha\beta}(\theta)\, \dot\theta^\alpha (\theta - \theta_1)^\beta.

On the other hand, ċ can also be computed from (64) and (65) as

\dot c = \dot a - \dot b = -\int_X p\, D\, \nabla \ln \frac{p}{p_{\theta_1}} \cdot \Big( \nabla \ln \frac{p}{p_{\theta_1}} - \nabla \ln \frac{p}{p_\theta} \Big) dx = -\int_X p\, D\, \nabla \ln \frac{p}{p_{\theta_1}} \cdot \nabla \ln \frac{p_\theta}{p_{\theta_1}}\, dx = -V(p, \theta, \theta_1).
By equating (66) and (67) we obtain an ODE for the evolution of θ(t),

g_{\alpha\beta}(\theta)\, \dot\theta^\alpha (\theta - \theta_1)^\beta = -V(p, \theta, \theta_1),

which depends on the unknown solution p of the FPE. In the paper [3] the following approximation is made: if we substitute p with p_θ in V(p, θ, θ_1) in (67) we get

V(p_\theta, \theta, \theta_1) = \int_X p_\theta\, D\, \nabla \ln \frac{p_\theta}{p_{\theta_1}} \cdot \nabla \ln \frac{p_\theta}{p_{\theta_1}}\, dx = \int_X p_\theta\, D\, \nabla \big( h \cdot (\theta - \theta_1) \big) \cdot \nabla \big( h \cdot (\theta - \theta_1) \big)\, dx = \int_X p_\theta\, d_i\, \partial_i h_\alpha\, \partial_i h_\beta\, dx\; (\theta - \theta_1)^\alpha (\theta - \theta_1)^\beta = B^\theta_{\alpha\beta}\, (\theta - \theta_1)^\alpha (\theta - \theta_1)^\beta

where we have introduced the symmetric matrix

B_{\alpha\beta} = \langle D\, \nabla h_\alpha \cdot \nabla h_\beta \rangle = \langle d_i\, \partial_i h_\alpha\, \partial_i h_\beta \rangle.

By comparing (66) and (68) we obtain a closed form ODE for θ, since θ − θ_1 ≠ 0,

A_{\alpha\beta}\, \dot\theta^\alpha = g_{\alpha\beta}(\theta)\, \dot\theta^\alpha = B^\theta_{\alpha\beta}\, (\theta_1 - \theta)^\alpha,

which is equation (5.2) in [3] or equation (12) in [4]. It can be given normal form since g_{αβ} is invertible. In these papers the above equation is solved numerically and it is shown that it gives very good (sometimes surprisingly good) estimates of ⟨h⟩_p using ⟨h⟩_θ even if θ is far from θ_1.
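A minimal numerical sketch of the closed ODE (69) is given below for an assumed Ornstein-Uhlenbeck-type example (not taken from the cited papers): d(x) = 1 and a single observable h(x) = −x²/2 on X = R, so that p_θ is a centered Gaussian with variance 1/θ, g(θ) = Var_θ(h) = 1/(2θ²) and B(θ) = ⟨(h′)²⟩_θ = 1/θ. For this Gaussian case the moment equation for ⟨x²⟩ closes exactly, so the dynamic MaxEnt approximation can be compared with the exact moment evolution.

import numpy as np
from scipy.integrate import solve_ivp

theta1 = 2.0                                      # equilibrium natural parameter
g = lambda th: 1.0/(2.0*th**2)                    # Fisher information of p_theta
B = lambda th: 1.0/th                             # <d h' h'> under p_theta

rhs = lambda t, th: B(th)*(theta1 - th)/g(th)     # (69): g theta' = B (theta1 - theta)
sol = solve_ivp(rhs, (0.0, 3.0), [0.5], dense_output=True, rtol=1e-8)

t = np.linspace(0.0, 3.0, 7)
m_dynmaxent = 1.0/sol.sol(t)[0]                   # <x^2>_theta = 1/theta
m_exact = 1.0/theta1 + (1.0/0.5 - 1.0/theta1)*np.exp(-2.0*theta1*t)
print(np.max(np.abs(m_dynmaxent - m_exact)))      # agreement up to solver tolerance

In this linear-Gaussian case the dynamic MaxEnt ODE reproduces the exact average, which is consistent with the good behavior reported in [3,4] for more general models.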
Using information geometry tools we have shown that the above triangular relation (62) holds independently of the assumption that θ be close to θ_1 (called the quasi equilibrium approximation in [2]). Moreover, it is evident from inspection of (65) that the substitution of p with p_θ renders ḃ = 0, therefore ȧ = ċ. Note that if D_{ij} = d δ_{ij} and we substitute p with p_θ in the above formula (67) we get

\dot c = \frac{d}{dt} D(p_\theta\, |\, p_{\theta_1}) = -d\, R(p_\theta\, |\, p_{\theta_1}).

Hence, if p_{θ_1} satisfies an LSI, we have exponential speed of convergence of p_θ to the equilibrium distribution p_{θ_1}, which explains the good behavior of the approximation.

6.2. A Dynamic Approximation Problem with External Parameters

We now suppose that the drift field (54), D_i(x) = d(x_i) ∂_i(θ_∞ · h(x, u)), which defines the FPE depends on external parameters, because h = h(x, u). We consider the same dynamic approximation problem of Section 6.1 with the extra degrees of freedom given by the external parameters u. The approximation condition (57) now reads

\langle h \rangle_p = \langle h \rangle_{p_z}

where z = (θ, u). We take the time derivative of the above relation to find an ODE for z. We have

\frac{d}{dt} \langle h_\alpha \rangle_{p_z} = \frac{d}{dt} \int_X h_\alpha\, p_z\, dx = \langle \partial_k h_\alpha \rangle_z\, \dot u^k + \int_X h_\alpha\, p_z\, \frac{d}{dt} \ln p_z\, dx = \langle \partial_k h_\alpha \rangle_z\, \dot u^k + \int_X h_\alpha\, p_z\, \frac{d}{dt} \big( \theta \cdot h - \psi + k \big)\, dx = \langle \partial_k h_\alpha \rangle_z\, \dot u^k + \int_X p_z\, h_\alpha \Big[ \big( h_\beta - \langle h_\beta \rangle \big) \dot\theta^\beta + \theta^\beta \big( \partial_k h_\beta - \langle \partial_k h_\beta \rangle \big) \dot u^k \Big] dx = \langle \partial_k h_\alpha \rangle_z\, \dot u^k + \big( A\, \omega(\dot z) \big)_\alpha
and

\frac{d}{dt} \langle h_\alpha \rangle_p = \frac{d}{dt} \int_X h_\alpha\, p\, dx = \langle \partial_k h_\alpha \rangle_p\, \dot u^k + \int_X h_\alpha\, \partial_t p\, dx = \langle \partial_k h_\alpha \rangle_p\, \dot u^k + \int_X h_\alpha\, \nabla \cdot \Big( p\, D\, \nabla \ln \frac{p}{p_{\theta_1}} \Big)\, dx = \langle \partial_k h_\alpha \rangle_p\, \dot u^k - \int_X p\, D\, \nabla \ln \frac{p}{p_{\theta_1}} \cdot \nabla h_\alpha\, dx

since the boundary term vanishes. If we substitute p with p_z in the last line we get

\frac{d}{dt} \langle h_\alpha \rangle_p = \langle \partial_k h_\alpha \rangle_z\, \dot u^k - \int_X p_z\, D\, \nabla \ln \frac{p_z}{p_{\theta_1}} \cdot \nabla h_\alpha\, dx = \langle \partial_k h_\alpha \rangle_z\, \dot u^k - B^z_{\alpha\beta}\, (\theta - \theta_1)^\beta

and therefore we have the ODE for z, which is a direct generalization of (69),

A\, \omega(\dot z) = A\, \dot\theta + M\, \dot u = B_z\, (\theta_1 - \theta).
Note that in this case we have considerably more freedom, because we have a system of a ODEs for the d = a + b variables z = (θ, u); therefore we can assign the evolution u(t) of the external parameters in order to control the evolution θ(t).
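Once the blocks A, M and B_z are available for a specific model, the controlled ODE (72) can be integrated along any prescribed control. The sketch below is purely schematic (the callables A, M, B and the control u_path are placeholders to be supplied by the model at hand) and uses an explicit Euler step.

import numpy as np

def evolve(A, M, B, theta0, theta1, u_path, dt, n_steps):
    # Explicit Euler integration of (72): A theta' + M u' = B (theta1 - theta),
    # with u(t) prescribed by u_path and u' approximated by finite differences.
    theta = np.array(theta0, dtype=float)
    for n in range(n_steps):
        t = n*dt
        u = u_path(t)
        u_dot = (u_path(t + dt) - u)/dt
        rhs = B(theta, u) @ (theta1 - theta) - M(theta, u) @ u_dot
        theta = theta + dt*np.linalg.solve(A(theta, u), rhs)
    return theta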

Funding

This research received no external funding.

Acknowledgments

The author thanks the anonymous referees for valuable comments and suggestions on this work.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. An Approximation Problem in Quantitative Genetics

We give a brief account of a classical problem in the dynamics of quantitative traits whose approach to equilibrium, described by a Fokker-Planck equation, has been investigated in [2,3,4]. See also [37] for a gentle introduction to the dynamics of populations. We consider a polygenic trait located at n biallelic loci (A, a). If we consider a sufficiently large population, the frequencies of genotype AA at locus i are described by n independent random variables x = (x_1, …, x_n), x_i ∈ [0, 1]. The dynamics of these allele frequencies can be described by a diffusion process under the action of stochastic forces which represent the effect of directional selection, dominance, mutation and random drift. This diffusion process is described by a linear Fokker-Planck equation whose equilibrium distribution has been known for a long time (see [38,39]):

p_\infty(x) = e^{\theta \cdot h(x, u) - \psi(\theta, u) + k(x)}, \qquad x = (x_1, \dots, x_n) \in X = [0, 1]^n.

Below we show that it can be seen as a generalized exponential distribution with natural parameters θ = (θ_1, θ_2, θ_3, θ_4) and external parameters u = (v, w) ∈ R^{2n}_+. Setting d(x_i) = x_i(1 − x_i), we introduce the following four observables:
h_1(x, v) = \sum_{i=1}^n v_i\, (2 x_i - 1);

h_1 is the directional selection and v = (v_1, …, v_n) are external parameters describing the effects on the loci;

h_2(x, w) = \sum_{i=1}^n w_i^2\, d(x_i) = \sum_{i=1}^n w_i^2\, x_i (1 - x_i);

h_2 is the dominance, and w = (w_1, …, w_n) are external parameters describing the effects on the loci;

h_3(x) = \sum_{i=1}^n \ln x_i \qquad \text{and} \qquad h_4(x) = \sum_{i=1}^n \ln(1 - x_i);

h_3 and h_4 describe forward and backward mutations, and

k(x) = -\sum_{i=1}^n \ln \big( x_i (1 - x_i) \big)
is the (non integrable) neutral distribution of allele frequencies in the absence of selection and mutation. Since the random variables x_i are independent, the free energy can be factorized using Fubini's theorem:

e^{\psi} = \int_X e^{\theta \cdot h + k}\, dx = \prod_{i=1}^n \int_0^1 e^{\theta_1 v_i (2 x_i - 1) + \theta_2 w_i^2 x_i (1 - x_i)}\, x_i^{\theta_3 - 1} (1 - x_i)^{\theta_4 - 1}\, dx_i.
We have e^ψ < +∞ if θ_3, θ_4 > 0, so θ = 0 is a non feasible value. Moreover, p_∞(x) is integrable but tends to +∞ as x tends to ∂X if 0 < θ_3, θ_4 < 1, while p_∞ = 0 on ∂X if θ_3, θ_4 > 1. Since the observables h_1 and h_2 have a linear dependence on the external parameters v and w, this statistical model fails the injectivity condition 1., therefore it is not identifiable. The Fokker-Planck equation can be given the form (51) where D_{ij} = d(x_i) δ_{ij} = x_i(1 − x_i) δ_{ij} is diagonal, but D = 0 on ∂X, therefore the inequalities like (52) that govern the trend to equilibrium are not strict ones, due to the degeneracy of the diffusion matrix D on ∂X. This is a major source of difficulties in the analysis of this model, as reported in [3]. A final remark is that the LSI sufficient condition Hess(V) ≥ σ I becomes in this case (with V = −(θ · h − ψ + k))

\partial_i \partial_j V = -\partial_i \partial_j (\theta \cdot h + k) = \Big[\, 2 \theta_2 w_j^2 + \frac{\theta_3 - 1}{x_j^2} + \frac{\theta_4 - 1}{(1 - x_j)^2} \,\Big] \delta_{ij}.

Therefore, Hess(V) is not bounded below (in the relevant regime 0 < θ_3, θ_4 < 1) and the LSI condition fails. In the above cited papers the dynamic approximation procedure exposed in Section 6 is introduced to compute the evolution of the averages of the observables h without solving the FPE, which is a computationally hard problem.
In this paper we have shown that this model problem is partly amenable to the controlled exponential family framework, and this brings some insight; however, peculiar difficulties remain that prevent a complete analysis of the model. We refer the interested reader to the specific literature, see [3].

References

  1. Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: New York, NY, USA, 2003.
  2. De Vladar, H.P.; Barton, N.H. The statistical mechanics of a polygenic character under stabilizing selection, mutation and drift. J. R. Soc. Interface 2011, 8, 720–739.
  3. Bodova, K.; Haskovec, J.; Markowich, P. Well posedness and maximum entropy approximation for the dynamics of quantitative traits. Phys. D Nonlinear Phenom. 2018, 376, 108–120.
  4. Bodova, K.; Szep, E.; Barton, N.H. Dynamic maximum entropy provides accurate approximation of structured population dynamics. PLoS Comput. Biol. 2021, 17, e1009661.
  5. Weinhold, F. Metric geometry of equilibrium thermodynamics. J. Chem. Phys. 1975, 63, 2479–2483.
  6. Ruppeiner, G. Thermodynamics: A Riemannian geometric model. Phys. Rev. A 1979, 20, 1608.
  7. Janke, W.; Johnston, D.A.; Kenna, R. Information geometry and phase transitions. Phys. A Stat. Mech. Its Appl. 2004, 336, 181–186.
  8. Brody, D.; Rivier, N. Geometrical aspects of statistical mechanics. Phys. Rev. E 1995, 51, 1006.
  9. Prokopenko, M.; Lizier, J.T.; Obst, O.; Wang, X.R. Relating Fisher information to order parameters. Phys. Rev. E 2011, 84, 041116.
  10. Brody, D.C.; Ritz, A. Information geometry of finite Ising models. J. Geom. Phys. 2003, 47, 207–220.
  11. Salamon, P.; Berry, R.S. Thermodynamic length and dissipated availability. Phys. Rev. Lett. 1983, 51, 1127–1130.
  12. Crooks, G.E. Fisher information and statistical mechanics. Tech. Rep. 2011, 1–3.
  13. Crooks, G.E. Measuring thermodynamic length. Phys. Rev. Lett. 2007, 99, 100602.
  14. Ito, S. Stochastic thermodynamic interpretation of information geometry. Phys. Rev. Lett. 2018, 121, 030605.
  15. Abiuso, P.; Miller, H.J.; Perarnau-Llobet, M.; Scandi, M. Geometric optimisation of quantum thermodynamic processes. Entropy 2020, 22, 1076.
  16. Naudts, J. The q-exponential family in statistical physics. Cent. Eur. J. Phys. 2009, 7, 405–413.
  17. Naudts, J. Generalised exponential families and associated entropy functions. Entropy 2008, 10, 131–149.
  18. Amari, S.I.; Ohara, A. Geometry of q-exponential family of probability distributions. Entropy 2011, 13, 1170–1185.
  19. Tsallis, C. Possible generalization of Boltzmann-Gibbs statistics. J. Stat. Phys. 1988, 52, 479–487.
  20. Watanabe, S. Algebraic Geometry and Statistical Learning Theory; No. 25; Cambridge University Press: Cambridge, UK, 2009.
  21. Favretti, M. Geometry and Control of Thermodynamic Systems described by Generalized Exponential Families. J. Geom. Phys. 2022, 176, 104497.
  22. Amari, S.; Nagaoka, H. Methods of Information Geometry; AMS; Oxford University Press: Oxford, UK, 2000.
  23. Barndorff-Nielsen, O. Information and Exponential Families: In Statistical Theory; John Wiley & Sons: Hoboken, NJ, USA, 2014.
  24. Calin, O.; Udriste, C. Geometric Modeling in Probability and Statistics; Springer: Berlin, Germany, 2014.
  25. Murray, M.K.; Rice, J.W. Differential Geometry and Statistics; CRC Press: Boca Raton, FL, USA, 1993; Volume 48.
  26. Souriau, J.-M. Structure of Dynamical Systems: A Symplectic View of Physics; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; Volume 149.
  27. Amari, S. Information Geometry and Its Applications; Springer: Berlin/Heidelberg, Germany, 2016; Volume 194.
  28. Jaynes, E.T. Brandeis Lectures. In Papers on Probability, Statistics and Statistical Physics; Rosenkrantz, E.D., Ed.; Springer: Dordrecht, The Netherlands, 1989; pp. 39–76.
  29. Rubinstein, M.; Colby, R.H. Polymer Physics; Oxford University Press: New York, NY, USA, 2003; Volume 23.
  30. Fisher, R.A. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, Series A; Royal Society of London: London, UK, 1922.
  31. Dubey, S.D. Compound gamma, beta and F distributions. Metrika 1970, 16, 27–31.
  32. Marsden, J.E.; Montgomery, R.; Ratiu, T.S. Reduction, Symmetry, and Phases in Mechanics; American Mathematical Soc.: Providence, RI, USA, 1990; Volume 436.
  33. Souriau, J.-M. Mécanique statistique, groupes de Lie et cosmologie. Géométrie Symplectique et Physique Mathématique 1974, 237.
  34. Risken, H. The Fokker-Planck Equation. Methods of Solutions and Applications; Springer: Berlin/Heidelberg, Germany, 1984.
  35. Villani, C. Topics in Optimal Transportation; American Mathematical Soc.: Providence, RI, USA, 2021; Volume 58.
  36. Yamano, T. Phase space gradient of dissipated work and information: A role of relative Fisher information. J. Math. Phys. 2013, 54, 113301.
  37. Rice, S. Evolutionary Theory: Mathematical and Conceptual Foundations; Sinauer Associates: Sunderland, MA, USA, 2004.
  38. Wright, S. Evolution in Mendelian populations. Genetics 1931, 16, 97.
  39. Kimura, M. Solution of a process of random genetic drift with a continuous model. Proc. Natl. Acad. Sci. USA 1955, 41, 144.
Figure 1. Triangular relation between a = D(p | p_{θ_1}), b = D(p | p_θ) and c = D(p_θ | p_{θ_1}). E is the exponential family submanifold.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
