Article

Entropic Dynamics on Gibbs Statistical Manifolds

Pedro Pessoa, Felipe Xavier Costa and Ariel Caticha
Department of Physics, University at Albany (SUNY), Albany, NY 12222, USA
* Authors to whom correspondence should be addressed.
Submission received: 2 March 2021 / Revised: 15 April 2021 / Accepted: 19 April 2021 / Published: 21 April 2021
(This article belongs to the Special Issue Entropy: The Scientific Tool of the 21st Century)

Abstract

Entropic dynamics is a framework in which the laws of dynamics are derived as an application of entropic methods of inference. Its successes include the derivation of quantum mechanics and quantum field theory from probabilistic principles. Here, we develop the entropic dynamics of a system, the state of which is described by a probability distribution. Thus, the dynamics unfolds on a statistical manifold that is automatically endowed with a metric structure provided by information geometry. The curvature of the manifold has a significant influence. We focus our dynamics on the statistical manifold of Gibbs distributions (also known as canonical distributions or the exponential family). The model includes an “entropic” notion of time that is tailored to the system under study; the system is its own clock. As one might expect, entropic time is intrinsically directional; there is a natural arrow of time that is led by entropic considerations. As illustrative examples, we discuss dynamics on a space of Gaussians and the discrete three-state system.

1. Introduction

The original method of Maximum Entropy (MaxEnt) is usually associated with the names of Shannon [1] and Jaynes [2,3,4,5], although its roots can be traced to Gibbs [6]. The method was designed to assign probabilities on the basis of partial information in the form of expected value constraints; the central quantity, called entropy, was interpreted as a measure of uncertainty or as an amount of missing information. In a series of developments starting with Shore and Johnson [7], with further contributions from other authors [8,9,10,11,12], the range of applicability of the method was significantly extended. In its new incarnation, the purpose of the method of Maximum Entropy, which will be referred to as ME to distinguish it from the older version, is to update probabilities from arbitrary priors when new information in the form of constraints is considered [13]. Highlights of the new method include: (1) A unified treatment of Bayesian and entropic methods which demonstrates their mutual consistency. (2) A new concept of entropy as a tool for reasoning that requires no interpretation in terms of heat, multiplicities, disorder, uncertainty, or amount of information. Indeed, entropy in ME needs no interpretation; it is a tool designed to perform a certain function—to update probabilities to accommodate new information. (3) A Bayesian concept of information defined in terms of its effects on the beliefs of rational agents—the constraints are the information. (4) The possibility of information that is not in the form of expected value constraints (we shall see an example below).
The old MaxEnt was sufficiently versatile to provide the foundations of equilibrium statistical mechanics [2] and to find application in a wide variety of fields such as economics [14], ecology [15,16], cellular biology [17,18], network science [19,20], and opinion dynamics [21,22]. As is the case with thermodynamics, all these applications are essentially static. MaxEnt has also been deployed in non-equilibrium statistical mechanics (see [23,24] and the subsequent literature on maximum caliber, e.g., [25,26,27]), but there the dynamics is not intrinsic to the probabilities; it is induced by the underlying Hamiltonian dynamics of the molecules. For problems beyond physics there is a need for more general dynamical frameworks based on information theory.
The ME version of the maximum entropy method offers the possibility of developing a true dynamics of probabilities. It is a dynamics driven by entropy—an Entropic Dynamics (ED)—which is automatically consistent with the principles for updating probabilities. ED naturally leads to an “entropic” notion of time. Entropic time is a device designed to keep track of the accumulation of changes. Its construction involves three ingredients: one must introduce the notion of an instant, verify that these instants are suitably ordered, and finally one must define a convenient notion of duration or interval between successive instants. A welcome feature is that entropic time is tailored to the system under study; the system is its own clock. Another welcome feature is that such an entropic time is intrinsically directional—an arrow of time is generated automatically.
ED has been successful in reconstructing dynamical models in physics such as quantum mechanics [28,29], quantum field theory [30], and the renormalization group [31]. Beyond physics, it has recently been applied to the fields of finance [32,33] and neural networks [34]. Here, we aim for a different class of applications of ED: to describe the dynamics of Gibbs distributions, also known as canonical distributions in statistical physics (or as the exponential family in statistics), since they are the distributions defined by a set of expected value constraints on sufficient statistics. Unlike the other cited papers on ED, here we will not focus on what the distributions are meant to represent. Other assumptions that would be specific to the modeled system are beyond the scope of the present article.
The goal is to study the ED that is generated by transitions from one distribution to another. The main assumptions are that changes happen and that they are not discontinuous. We do not explain why changes happen—this is a mechanics without a mechanism. Our goal is to venture an educated estimate of what changes one expects to happen. The second assumption is that systems evolve along continuous trajectories in the space of probability distributions. It also implies that the study of motion involves two tasks. The first is to describe how a single infinitesimal step occurs. The second requires a scheme for keeping track of how a large number of these short steps accumulate to produce a finite motion. It is the latter task that involves the introduction of the concept of time.
The fact that the space of macrostates is a statistical manifold—each point in the space is a probability distribution—has a profound effect on the dynamics. The reason is that statistical manifolds are naturally endowed with a Riemannian metric structure that is given by the Fisher–Rao information metric (FRIM) [35,36]; this structure is known as information geometry [37,38,39]. The particular case of Gibbs distributions leads to additional interesting geometrical properties (see e.g., [40,41]), which have been explored in the extensive work relating statistical mechanics to information geometry [42,43,44,45,46,47,48,49]. Information geometry has also been used as a fundamental concept for complexity measures [50,51,52].
In this paper, we tackle the more formal aspects of an ED on Gibbs manifolds and offer a couple of illustrative examples. The formalism is applied to two important sets of probability distributions: the space of Gaussians and the space of distributions for a three-state system, both of which can be written in the exponential form. Because these distributions are both well-studied and scientifically relevant, they can give us a good insight into how the dynamics work.
It is important to emphasize that Gibbs distributions are not restricted to the description of a system in thermal equilibrium. While it is true that, if one chooses the conserved quantities of Hamiltonian motion as the sufficient statistics, the resulting Gibbs distributions are the ones associated with equilibrium statistical mechanics, Gibbs distributions can be defined for arbitrary choices of sufficient statistics, and the modeling endeavour includes identifying the ones that are relevant to the problem at hand. On the same note, the dynamics developed here is not a form of nonequilibrium statistical mechanics: the latter is driven by an underlying physical molecular dynamics, while ED is completely agnostic about any microstate dynamics.
The article is organized, as follows: the next section discusses the space of Gibbs distributions and its geometric properties; Section 3 considers the ideas of ED; Section 4 tackles the difficulties associated with formulating ED on the curved space of probability distributions; Section 5 introduces the notion of entropic time; Section 6 describes the evolution of the system in the form of a differential equation; in Section 7, we offer two illustrative examples of ED on a Gaussian manifold and on a two-simplex.

2. The Statistical Manifold of Gibbs Distributions

2.1. Gibbs Distributions

The canonical or Gibbs probability distributions are the macrostates of a system. They describe a state of uncertainty regarding the microstate x ∈ X of the macroscopic system. A canonical distribution ρ(x) is assigned by maximizing the entropy

$$S[\rho|q] = -\int dx\, \rho(x)\log\frac{\rho(x)}{q(x)} \qquad (1)$$

relative to the prior q(x) subject to n expected value constraints

$$\int dx\, \rho(x)\, a^i(x) = A^i, \quad\text{with}\quad i = 1 \ldots n, \qquad (2)$$

and the normalization of ρ(x). Typically, the prior q(x) is chosen to be a uniform distribution over the space X so that it is maximally non-informative, but this is not strictly necessary. The n constraints, on the other hand, reflect the information that happens to be relevant to the problem. The resulting canonical distribution is

$$\rho(x|\lambda) = \frac{q(x)}{Z(\lambda)}\,\exp\!\left[-\lambda_i\, a^i(x)\right], \qquad (3)$$

where λ = {λ_1 … λ_n} are the Lagrange multipliers associated to the expected value constraints, and we adopt the Einstein summation convention. The normalization constant is

$$Z(\lambda) = \int dx\, q(x)\exp\!\left[-\lambda_i\, a^i(x)\right] = e^{-F(\lambda)}, \qquad (4)$$

where F = −log Z plays a role analogous to the free energy. The Lagrange multipliers λ_i(A) are implicitly defined by

$$\frac{\partial F}{\partial \lambda_i} = A^i. \qquad (5)$$

Evaluating the entropy (1) at its maximum yields

$$S(A) = -\int dx\, \rho(x|\lambda(A))\log\frac{\rho(x|\lambda(A))}{q(x)} = \lambda_i(A)\,A^i - F(\lambda(A)), \qquad (6)$$

which we shall call the macrostate entropy or (when there is no risk of confusion) just the entropy. Equation (6) shows that S(A) is the Legendre transform of F(λ); considering a small change dA^i in the constraints confirms that S(A) is indeed a function of the expected values A^i,

$$dS = \lambda_i\, dA^i \quad\text{so that}\quad \lambda_i = \frac{\partial S}{\partial A^i}. \qquad (7)$$

One might think that defining dynamics on the family of canonical distributions might be too restricted to be of interest; however, this family has widespread applicability. Here, it has been derived using the method of maximum entropy, but historically it has also been known as the exponential family, namely the only family of distributions that possesses sufficient statistics. Interestingly, this was a problem that was proposed by Fisher [53] in the early days of statistics and later proved independently by Pitman [54], Darmois [55], and Koopman [56]. The sufficient statistics turn out to be the functions a^i(x) in (2). In Table 1, we give a short list of the priors q(x) and the functions a^i(x) that lead to well-known distributions [41,57].
Naturally, the method of maximum entropy assumes that the various constraints are compatible with each other, so that the set of multipliers λ exists. It is further assumed that the constraints reflect physically relevant information, so that the various functions appearing in the formalism, such as A^i(λ) = ∂F/∂λ_i and λ_i(A) = ∂S/∂A^i, are both invertible and differentiable, and the space of Gibbs distributions is indeed a manifold. However, the manifold may include singularities of various kinds that are of particular interest, as they may describe phenomena such as phase transitions [42,46].
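To make this concrete, the following is a minimal numerical sketch (not from the paper) of Equations (3)–(7) for a hypothetical discrete three-state system: given a sufficient statistic a(x), a uniform prior q(x), and a target expected value A (all arbitrary choices for illustration), it solves (5) for the multiplier λ and evaluates the macrostate entropy (6).

```python
import numpy as np
from scipy.optimize import fsolve

# Hypothetical three-state example (arbitrary choices for illustration).
a = np.array([0.0, 1.0, 2.0])   # sufficient statistic a(x) for x = 1, 2, 3
q = np.ones(3) / 3.0            # uniform prior q(x)
A_target = 1.3                  # desired constraint <a> = A

def rho(lam):
    """Canonical distribution of Eq. (3): rho(x|lambda) ~ q(x) exp(-lambda a(x))."""
    w = q * np.exp(-lam * a)
    return w / w.sum()

def constraint(l):
    # Eq. (5): the multiplier is fixed implicitly by <a> = A
    return rho(l[0]) @ a - A_target

lam = float(fsolve(constraint, 0.0)[0])
Z = (q * np.exp(-lam * a)).sum()       # Eq. (4), with F = -log Z
S = lam * A_target + np.log(Z)         # Eq. (6): S(A) = lambda A - F
print(f"lambda = {lam:.4f}, rho = {rho(lam)}, S(A) = {S:.4f}")
```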

2.2. Information Geometry

We offer a brief review of well known results concerning the information geometry of Gibbs distributions in order to establish the notation and recall some results that will be needed in later sections [38,40].
To each set of expected values A = { A 1 , A 2 , , A n } , or to the associated set of Lagrange multipliers λ = { λ 1 , λ 2 , , λ n } , there corresponds a canonical distribution. Therefore the set of distributions ρ ( x | λ ) or, equivalently, ρ ( x | A ) is a statistical manifold in which each point can be labelled by the coordinates λ or by A. Whether we choose λ or A as coordinates is purely a matter of convenience. The change of coordinates is implemented using
$$\frac{\partial A^i}{\partial \lambda_k} = -\frac{\partial^2 \log Z}{\partial \lambda_k\,\partial \lambda_i} = A^k A^i - \left\langle a^k a^i\right\rangle, \qquad (8)$$

where we recognize the covariance tensor,

$$C^{ij} = \left\langle (a^i - A^i)(a^j - A^j)\right\rangle = -\frac{\partial A^i}{\partial \lambda_j}. \qquad (9)$$

Its inverse is given by

$$C_{jk} = -\frac{\partial \lambda_j}{\partial A^k} = -\frac{\partial^2 S}{\partial A^j\,\partial A^k}, \qquad (10)$$

which means that the inverse covariance matrix C_{ij} is the Hessian of the negative entropy (6). This implies

$$C^{ij}\, C_{jk} = \frac{\partial A^i}{\partial A^k} = \delta^i_k. \qquad (11)$$
Statistical manifolds are endowed with an essentially unique quantity to measure the extent to which two neighboring distributions ρ(x|A) and ρ(x|A + dA) can be distinguished from each other. This measure of distinguishability provides a statistical notion of distance, which is given by FRIM, dℓ² = g_ij dA^i dA^j, where

$$g_{ij} = \int dx\, \rho(x|A)\,\frac{\partial \log\rho(x|A)}{\partial A^i}\,\frac{\partial \log\rho(x|A)}{\partial A^j}. \qquad (12)$$

For a broader discussion on the existence, derivation, and consistency of this metric, as well as its properties, see [38,39,40]. Here, it suffices to say that FRIM is the unique metric structure that is invariant under Markov embeddings [58,59] and, therefore, is the only way of assigning a differential geometry structure that is in accordance with the grouping property of probability distributions.
To calculate g_ij for canonical distributions, we use

$$g_{ij} = \frac{\partial \lambda_k}{\partial A^i}\,\frac{\partial \lambda_l}{\partial A^j}\int dx\, \rho\,\frac{\partial \log\rho}{\partial \lambda_k}\,\frac{\partial \log\rho}{\partial \lambda_l} \qquad (13)$$

and

$$\frac{\partial \log\rho(x|A)}{\partial \lambda_k} = A^k - a^k(x), \qquad (14)$$

so that, using (8)–(11), we have

$$g_{ij} = C_{ik}\, C_{lj}\, C^{kl} = C_{ij}. \qquad (15)$$

Therefore, the metric tensor g_ij is the inverse of the covariance matrix C^{ij}, which, by (10), is the Hessian of the negative entropy.
As mentioned above, instead of A^i, we could use the Lagrange multipliers λ_i as coordinates. Then, the information metric is the covariance matrix,

$$g^{ij} = \int dx\, \rho(x|\lambda)\,\frac{\partial \log\rho(x|\lambda)}{\partial \lambda_i}\,\frac{\partial \log\rho(x|\lambda)}{\partial \lambda_j} = C^{ij}. \qquad (16)$$

Therefore, the distance dℓ between neighboring distributions can be written in either of two equivalent forms,

$$d\ell^2 = g_{ij}\, dA^i dA^j = g^{ij}\, d\lambda_i\, d\lambda_j. \qquad (17)$$

Incidentally, the availability of a unique measure of volume, dV = (det g_ij)^{1/2} d^nA, implies that there is a uniquely defined notion of the uniform distribution over the space of macrostates. The uniform distribution P_u assigns equal probabilities to equal volumes, so that

$$P_u(A)\, d^n A \propto g^{1/2}\, d^n A \quad\text{where}\quad g = \det g_{ij}. \qquad (18)$$

To conclude this overview section, we note that the metric tensor g_ij can be used to lower the contravariant indices of a vector to produce its dual covector. Using (10) and (15), the covector dA_i dual to the infinitesimal vector with components dA^i is

$$dA_i = g_{ij}\, dA^j = -\frac{\partial \lambda_i}{\partial A^j}\, dA^j = -d\lambda_i, \qquad (19)$$

which shows that the coordinates A and λ are related not only through a Legendre transformation, which is a consequence of entropy maximization, but also through a vector–covector duality, i.e., −dλ_i is the covector dual to dA^i, which is a consequence of information geometry.
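As a further illustration (again not from the paper), the sketch below checks the relations (15)–(19) numerically for a hypothetical discrete exponential family: the metric in the A-coordinates is the inverse of the covariance matrix of the sufficient statistics, and the squared length of a small displacement is the same whether computed from dA or from dλ. The statistics and multiplier values are arbitrary.

```python
import numpy as np

# Hypothetical discrete exponential family on three microstates with two
# sufficient statistics a^i(x); the multiplier values are arbitrary.
a = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])          # a[i, x]
q = np.ones(3) / 3.0
lam = np.array([0.4, -0.2])

w = q * np.exp(-lam @ a)
rho = w / w.sum()                         # canonical distribution, Eq. (3)
A = a @ rho                               # expected values A^i

# Covariance of the sufficient statistics, Eq. (9); the metric in the
# A-coordinates is its inverse, Eq. (15).
C = ((a - A[:, None]) * rho) @ (a - A[:, None]).T
g = np.linalg.inv(C)

dA = np.array([1e-3, -2e-3])              # small displacement in A
dlam = -g @ dA                            # Eq. (19): d lambda_i = -g_ij dA^j

# Eq. (17): the squared length agrees in either coordinate system
print(np.isclose(dA @ g @ dA, dlam @ C @ dlam))
```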

3. Entropic Dynamics

Having established the necessary background, we can now develop an entropic framework to describe the dynamics on the space of macrostates.

3.1. Change Happens

Our starting assumption is that changes happen continuously, which is supported by observation in nature. Therefore, the dynamics that we wish to formulate assumes that the system evolves along continuous paths. This assumption of continuity represents a significant simplification, because it implies that a finite motion can be analyzed as the accumulation of a large number of infinitesimally short steps. Thus, our first goal will be to find the probability P ( A | A ) that the system takes a short step from the macrostate A to the neighboring macrostate A = A + d A . The transition probability P ( A | A ) will be assigned by maximizing an entropy. This first requires that we identify the particular entropy that is relevant to our problem. Next, we must decide on the prior distribution: what short steps we might expect before we know the specifics of the motion. Finally, we stipulate the constraints that are meant to capture the information that is relevant to the particular problem at hand.
To settle the first item—the choice of entropy—we note that not only are we uncertain about the macrostate A, but we are also uncertain about the microstates x ∈ X. This means that the actual universe of discourse is the joint space A × X and the appropriate statistical description of the system is in terms of the joint distribution

$$P(x, A) = P(x|A)\,P(A) = \rho(x|A)\,P(A), \qquad (20)$$

where ρ is of the form (3), which means that we impose P(x|A) to be canonical, and the distribution P(A) represents our lack of knowledge about the macrostates. Note that what we did in (20) is nothing more than assuming a probability distribution for the macrostates. This description is sometimes referred to as superstatistics [60].
Our immediate task is to find the transition probability of a change P(x′, A′|x, A) by maximizing the entropy

$$\mathcal{S}[P|Q] = -\int dA'\, dx'\; P(x', A'|x, A)\,\log\frac{P(x', A'|x, A)}{Q(x', A'|x, A)}, \qquad (21)$$

relative to the prior Q(x′, A′|x, A) and subject to constraints to be discussed below (to simplify the notation in multidimensional integrals we write d^nA = dA and d^nx = dx).
Although S in (6) and 𝒮 in (21) are both entropies in the information theory sense, they represent two very distinct statistical objects. The S(A) in (6) is the entropy of the macrostate—which is what one may be used to from statistical mechanics—while the 𝒮[P|Q] in (21) is the entropy to be maximized so that we find the transition probability that best matches the information at hand; that is, 𝒮 is a tool to select the dynamics of the macrostates.

3.2. The Prior

We adopt a prior that implements the idea that the system evolves by taking short steps A → A′ = A + ΔA at the macrostate level, but is otherwise maximally uninformative. We write

$$Q(x', A'|x, A) = Q(x'|x, A, A')\,Q(A'|x, A), \qquad (22)$$

and analyze the two factors in turn. We shall assume that a priori, that is, before we know the relation between the microstates and the macrostates, the prior distribution for x′ is the same uniform underlying measure q(x) introduced in (1),

$$Q(x'|x, A, A') = q(x'). \qquad (23)$$

Next, we tackle the second factor Q(A′|x, A). As shown in Appendix A, using the method of maximum entropy, the prior that enforces short steps, but is otherwise maximally uninformative, is spherically symmetric,

$$Q(A'|x, A) = Q(A'|A) \propto g^{1/2}(A')\,\exp\!\left[-\frac{1}{2\tau}\,g_{ij}\,\Delta A^i \Delta A^j\right], \qquad (24)$$

so the joint prior is

$$Q(x', A'|x, A) \propto q(x')\,g^{1/2}(A')\,\exp\!\left[-\frac{1}{2\tau}\,g_{ij}\,\Delta A^i \Delta A^j\right]. \qquad (25)$$

We see that steps of length

$$\Delta\ell = \left(g_{ij}\,\Delta A^i \Delta A^j\right)^{1/2} \gg \tau^{1/2} \qquad (26)$$

have negligible probability. Eventually, we will take the limit τ → 0 to enforce short steps. The prefactor g^{1/2}(A′) ensures that Q(A′|A) is a probability density. Later, we will show how this choice of prior, which comes only from the assumption of continuous motion, leads to a diffusion structure.

3.3. The Constraints

The piece of information that we wish to codify through the constraints is the simple geometric idea that the dynamics remains confined to the statistical manifold A . This is implemented by writing
$$P(x', A'|x, A) = P(x'|x, A, A')\,P(A'|x, A) \qquad (27)$$

and imposing that the distribution for x′ is a canonical distribution,

$$P(x'|x, A, A') = \rho(x'|A'). \qquad (28)$$

This means that, given A′, the distribution of x′ is independent of the initial microstate x and macrostate A. The second factor in (27), P(A′|x, A), is the transition probability we seek, which leads to the constraint

$$P(x', A'|x, A) = \rho(x'|A')\,P(A'|x, A). \qquad (29)$$

We note that this constraint is not, as is usual in applications of the method of maximum entropy, in the form of an expected value. It may appear from (29) that the transition probability P(A′|x, A) will be largely unaffected by the underlying space of microstates. To the contrary, as we shall see below—(31) and (32)—the macrostate dynamics turns out to be dominated by the entropy of the microstate distribution ρ(x|A).
Depending on the particular system under consideration, one could formulate richer forms of dynamics by imposing additional constraints. To give just one example, one could introduce some drift relative to the direction specified by a covector F_i by imposing a constraint of the form ⟨ΔA^i⟩F_i = κ (see [29,30]). However, in this paper, we shall limit ourselves to what is perhaps the simplest case, the minimal ED described by the single constraint (29).

3.4. Maximizing the Entropy

Substituting (25) and (29) into (21) and rearranging, we find

$$\mathcal{S}[P|Q] = -\int dA'\; P(A'|x, A)\left[\log\frac{P(A'|x, A)}{Q(A'|A)} - S(A')\right], \qquad (30)$$

where S(A′) is the macrostate entropy given in (6). Maximizing 𝒮 subject to normalization gives

$$P(A'|x, A) \propto Q(A'|A)\,e^{S(A')} \propto g^{1/2}(A')\,\exp\!\left[-\frac{1}{2\tau}\,g_{ij}\,\Delta A^i\Delta A^j + S(A')\right]. \qquad (31)$$

It is noteworthy that P(A′|x, A) turned out to be independent of x, which is not surprising, since neither the prior nor the constraints indicate any correlation between A′ and x.
Because the transition from A to A′ must be an arbitrarily small continuous change, we expand S(A′) to linear order in ΔA; the exponent in (31) then becomes quadratic in ΔA,

$$P(A'|A) = \frac{g^{1/2}(A')}{Z'}\,\exp\!\left[\frac{\partial S}{\partial A^i}\,\Delta A^i - \frac{1}{2\tau}\,g_{ij}\,\Delta A^i \Delta A^j\right], \qquad (32)$$

where e^{S(A)} was absorbed into the normalization factor Z′. This is the transition probability found by maximizing the entropy (21). However, some mathematical difficulties arise from the fact that (32) is defined over a curved manifold. We explore these mathematical issues and their consequences for the motion in the following section.

4. The Transition Probability

Because the statistical manifold is a curved space, we must understand how the transition probability (32) behaves under a change of coordinates. Because (25) and (32) describe an arbitrarily small step, we wish to express the transition probability, as well as the quantities derived from it, up to order τ. Because the exponent in (32) is manifestly invariant, one can complete squares and obtain

$$P(A'|A) = \frac{g^{1/2}(A')}{\tilde Z}\,\exp\!\left[-\frac{1}{2\tau}\,g_{ij}\left(\Delta A^i - \tau\,g^{ik}\frac{\partial S}{\partial A^k}\right)\left(\Delta A^j - \tau\,g^{jl}\frac{\partial S}{\partial A^l}\right)\right]. \qquad (33)$$

If g(A′) were uniform, (33) would be a Gaussian and the first two moments ⟨ΔA^i⟩ and ⟨ΔA^iΔA^j⟩ would follow immediately, both being of order τ. For an arbitrary metric tensor, however, the transition is affected by curvature effects that survive even in the limit τ → 0. This can be verified by a direct calculation of the first moment,

$$\langle\Delta A^i\rangle = \int dA'\,\Delta A^i\,P(A'|A) = \frac{1}{\tilde Z}\int dA'\,g^{1/2}(A')\,\Delta A^i\,\exp\!\left[-\frac{g_{kl}}{2\tau}\left(\Delta A^k - \tau V^k\right)\left(\Delta A^l - \tau V^l\right)\right], \qquad (34)$$

where V^i = g^{ij}∂S/∂A^j, and of the second moment,

$$\langle\Delta A^i\Delta A^j\rangle = \int dA'\,\Delta A^i\Delta A^j\,P(A'|A) = \frac{1}{\tilde Z}\int dA'\,g^{1/2}(A')\,\Delta A^i\Delta A^j\,\exp\!\left[-\frac{g_{kl}}{2\tau}\left(\Delta A^k - \tau V^k\right)\left(\Delta A^l - \tau V^l\right)\right]. \qquad (35)$$

It is convenient to write (32) in normal coordinates at A in order to facilitate the calculation of the integrals in (34) and (35). For a smooth manifold, one can always find a change of coordinates A → Â(A)—we will label the normal coordinates with Greek indices (μ, ν, …)—such that, at the point A,

$$\hat g_{\mu\nu}(A) = \delta_{\mu\nu} \quad\text{and}\quad \left.\frac{\partial \hat g_{\mu\nu}}{\partial \hat A^\sigma}\right|_{A} = 0, \qquad (36)$$

allowing us to approximate g(A′) ≈ 1 for a short step. For a general discussion and rigorous proof of the existence of normal coordinates, see [61]. Although normal coordinates are a valuable tool for geometrical analysis at this point, it is not clear whether they can be given a deeper statistical interpretation—unlike other applications of differential geometry, such as general relativity, where the physical interpretation of normal coordinates turns out to be of central importance. A displacement in these coordinates, ΔÂ^μ, can be related to the original coordinates by a Taylor expansion in terms of ΔA^i as (see [62,63])

$$\Delta\hat A^\mu = \frac{\partial \hat A^\mu}{\partial A^i}\,\Delta A^i + \frac{1}{2}\,\frac{\partial^2 \hat A^\mu}{\partial A^j\,\partial A^k}\,\Delta A^j\Delta A^k + o(\tau). \qquad (37)$$
To proceed, it is interesting to recall the Christoffel symbols Γ^i_{jk},

$$\Gamma^i_{jk} = \frac{1}{2}\,g^{il}\left(\frac{\partial g_{jl}}{\partial A^k} + \frac{\partial g_{lk}}{\partial A^j} - \frac{\partial g_{jk}}{\partial A^l}\right), \qquad (38)$$

which transform as

$$\Gamma^i_{jk} = \frac{\partial A^i}{\partial \hat A^\mu}\,\frac{\partial \hat A^\nu}{\partial A^j}\,\frac{\partial \hat A^\sigma}{\partial A^k}\,\hat\Gamma^\mu_{\nu\sigma} + \frac{\partial A^i}{\partial \hat A^\mu}\,\frac{\partial^2 \hat A^\mu}{\partial A^j\,\partial A^k}. \qquad (39)$$

Because, in normal coordinates, we have Γ̂^μ_{νσ} = 0, this allows us to isolate ΔA^i up to order τ, obtaining

$$\Delta A^i = \frac{\partial A^i}{\partial \hat A^\mu}\,\Delta\hat A^\mu - \frac{1}{2}\,\Gamma^i_{jk}\,\Delta A^j\Delta A^k. \qquad (40)$$

By squaring (40), we have

$$\Delta A^i\Delta A^j = \frac{\partial A^i}{\partial \hat A^\mu}\,\frac{\partial A^j}{\partial \hat A^\nu}\,\Delta\hat A^\mu\Delta\hat A^\nu + o(\tau). \qquad (41)$$
Because the exponent in (34) is invariant and, under a coordinate transformation, dA′ P(A′|A) = dÂ′ P̂(Â′|A), substituting (40) into (34) separates the first moment into two terms,

$$\langle\Delta A^i\rangle = \frac{\partial A^i}{\partial \hat A^\mu}\,\frac{1}{\hat Z}\int d\hat A'\,\Delta\hat A^\mu\,\exp\!\left[-\frac{\delta_{\nu\sigma}}{2\tau}\left(\Delta\hat A^\nu - \tau V^\nu\right)\left(\Delta\hat A^\sigma - \tau V^\sigma\right)\right] - \frac{1}{2}\,\Gamma^i_{jk}\,\frac{\partial A^j}{\partial \hat A^\mu}\,\frac{\partial A^k}{\partial \hat A^\nu}\,\frac{1}{\hat Z}\int d\hat A'\,\Delta\hat A^\mu\Delta\hat A^\nu\,\exp\!\left[-\frac{\delta_{\rho\sigma}}{2\tau}\left(\Delta\hat A^\rho - \tau V^\rho\right)\left(\Delta\hat A^\sigma - \tau V^\sigma\right)\right]. \qquad (42)$$

The integrals can be evaluated from the known properties of a Gaussian. The integral in the first term gives ⟨ΔÂ^μ⟩ = τ δ^{μν} ∂S/∂Â^ν and the integral in the second term gives ⟨ΔÂ^μΔÂ^ν⟩ = τ δ^{μν}, so that

$$\langle\Delta A^i\rangle = \frac{\partial A^i}{\partial \hat A^\mu}\,\tau\,\delta^{\mu\nu}\,\frac{\partial S}{\partial \hat A^\nu} - \frac{1}{2}\,\Gamma^i_{jk}\,\frac{\partial A^j}{\partial \hat A^\mu}\,\frac{\partial A^k}{\partial \hat A^\nu}\,\tau\,\delta^{\mu\nu}. \qquad (43)$$

Therefore, back in the original coordinates, the first two moments up to order τ are

$$\langle\Delta A^i\rangle = \tau\left(g^{ij}\,\frac{\partial S}{\partial A^j} - \frac{1}{2}\,\Gamma^i\right) \quad\text{and}\quad \langle\Delta A^i\Delta A^j\rangle = \tau\,g^{ij}, \qquad (44)$$

where Γ^i = Γ^i_{jk} g^{jk}. Here, we see the dependence of ⟨ΔA^i⟩ on curvature through the Christoffel symbol term. Note that it is a consequence of the coupling between ΔA^i and the quadratic term ΔA^jΔA^k in (40), which, per (44), does not vanish even for small steps. Hence, fluctuations in A^i cannot be ignored in the ED motion, and this is the reason why the motion probes curvature. It also follows from (44) that, even in the limit τ → 0, the average change ⟨ΔA^i⟩ does not transform covariantly.
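Because the contracted Christoffel symbol Γ^i enters the drift in (44), it is convenient to be able to evaluate it for a given metric. The helper below is a rough numerical sketch (not from the paper) that estimates Γ^i by central finite differences from any user-supplied function returning g_ij; the function name, interface, and step size are arbitrary choices.

```python
import numpy as np

def contracted_christoffel(metric, A, eps=1e-5):
    """Estimate Gamma^i = g^{jk} Gamma^i_{jk} of Eq. (44) by central finite
    differences of a user-supplied callable metric(A) -> g_ij (n x n array).
    A rough numerical sketch; exact or autodiff derivatives may be preferable."""
    n = len(A)
    g_inv = np.linalg.inv(metric(A))
    # dg[k, i, j] = d g_ij / d A^k
    dg = np.zeros((n, n, n))
    for k in range(n):
        step = np.zeros(n)
        step[k] = eps
        dg[k] = (metric(A + step) - metric(A - step)) / (2 * eps)
    # Christoffel symbols of Eq. (38)
    Gamma = 0.5 * (np.einsum('il,jlk->ijk', g_inv, dg)
                   + np.einsum('il,klj->ijk', g_inv, dg)
                   - np.einsum('il,ljk->ijk', g_inv, dg))
    return np.einsum('jk,ijk->i', g_inv, Gamma)
```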
Note that we used several words, such as “transitions”, “short step”, “continuous”, and “dynamics” without any established notion of time. In the following section, we will discuss time not as an external parameter, but as an emergent parameter from the maximum entropy transition (32) and its moments (44).

5. Entropic Time

Having described a short step transition, the next challenge is to study how these short steps accumulate.

5.1. Introducing Time

In order to introduce time, we note that A and A′ are elements of the same manifold; therefore, P(A) and P(A′) are two probability distributions over the same space. Our established solution for describing the accumulation of changes (see [28]) is to introduce a "book-keeping" parameter t that distinguishes these distributions by different labels, i.e., P(A) = P_t(A) and P(A′) = P_{t′}(A′).
In this formalism, we will refer to these different labels as a description of the system at particular instants t and t′. This allows us to call P(A′|A) a transition probability,

$$P_{t'}(A') \equiv P(A') = \int dA\; P_{\Delta t}(A'|A)\,P_t(A), \qquad (45)$$

where Δt = t′ − t.
As the system changes from A to A′ and then to A″, the probability P(A″) will be constructed from P(A′) and does not depend explicitly on P(A). This means that (45) represents a Markovian process: conditioned on the present P_{t′}(A′), the "future" P_{t″}(A″) is independent of the "past" P_t(A), where t″ > t′ > t. It is important to notice that, under this formalism, (45) is not used to show that the process is Markovian in an existing time; rather, the concept of time developed here makes the dynamics Markovian by design.
It is also important to notice that the parameter t presented here is not necessarily the "physical" time (as it appears in Newton's laws of motion or the Schrödinger equation). Our parameter t, which we call entropic time, is an epistemic, well-ordered parameter in terms of which the dynamics is defined.

5.2. The Entropic Arrow of Time

It is important to note that the marginalization process from (20) to (45) could also lead to

$$P(A) = \int dA'\; P(A|A')\,P(A'), \qquad (46)$$

where the conditional probabilities are related by Bayes' theorem,

$$P(A|A') = \frac{P(A)}{P(A')}\,P(A'|A), \qquad (47)$$

showing that a change "forward" will not happen in the same way as a change "backward" unless the system is in some form of stationary state, P(A) = P(A′). Another way to put this is that probability theory alone gives no intrinsic distinction between the change "forward" and the change "backward". The fact that we assigned the change "forward" by ME implies that, in general, the change "backward" is not an entropy maximum. Therefore, the preferential direction of the flow of time arises naturally from the entropic dynamics.

5.3. Calibrating the Clock

In order to connect entropic time to the transition probability, one needs to define the duration Δt with respect to the motion. Time in entropic dynamics is defined so as to simplify the description of the motion; this notion of time is tailored to the system under discussion. The time interval will be chosen so that the parameter τ that first appeared in the prior (25) takes the role of a time interval,

$$\tau = \eta\,\Delta t, \qquad (48)$$

where η is a constant that gives t the units of time. For the remainder of this article, we will adopt η = 1. In principle, any monotonic function t(τ) serves as an ordering parameter; our choice is a matter of convenience, as required by simplicity. Here, this is implemented so that for a short transition we have the dimensionless time interval

$$\Delta t = \left\langle g_{ij}\,\Delta A^i\Delta A^j\right\rangle. \qquad (49)$$

This means that the system's fluctuations measure the entropic time. Rather than having the changes in the system represented in terms of given time intervals (as measured by an external clock), here the system is its own clock.
The moments in (44) can then be written, up to order Δt, as

$$\frac{\langle\Delta A^i\rangle}{\Delta t} = g^{ij}\,\frac{\partial S}{\partial A^j} - \frac{1}{2}\,\Gamma^i \quad\text{and}\quad \frac{\langle\Delta A^i\Delta A^j\rangle}{\Delta t} = g^{ij}. \qquad (50)$$

With this, we have established a concept of time, and it is now convenient to write the trajectory of the expected values in terms of a differential equation.
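For illustration (not part of the paper), a single short step consistent with the moments (50) can be simulated by an Euler-type update: a deterministic drift g^{ij}∂S/∂A^j − Γ^i/2 plus Gaussian noise with covariance g^{ij}Δt. In the sketch below, grad_S, metric, and gamma are hypothetical user-supplied callables; gamma could be the finite-difference helper sketched in Section 4.

```python
import numpy as np

def ed_step(A, grad_S, metric, gamma, dt, rng):
    """One Euler-type update consistent with the moments of Eq. (50):
    drift g^{ij} dS/dA^j - Gamma^i/2, fluctuations with covariance g^{ij} dt.
    grad_S, metric and gamma are hypothetical user-supplied callables that
    return dS/dA_i, g_ij and the contracted Christoffel symbols Gamma^i at A."""
    g_inv = np.linalg.inv(metric(A))
    drift = g_inv @ grad_S(A) - 0.5 * gamma(A)
    noise = rng.multivariate_normal(np.zeros(len(A)), dt * g_inv)
    return A + drift * dt + noise
```

Iterating ed_step over many independent trajectories and histogramming the results provides a Monte Carlo check of the Fokker–Planck evolution derived in the next section.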

6. Diffusion and the Fokker–Planck Equation

Our goal of designing the dynamics from entropic methods is accomplished. The entropic dynamics equation of evolution is written in integral form as a Chapman–Kolmogorov Equation (45) with a transition probability given by (32). In this section, we will conveniently rewrite it in differential form. The computed drift ⟨ΔA^i⟩ and fluctuation ⟨ΔA^iΔA^j⟩ in (50) describe the dynamical process as a smooth diffusion—meaning, as defined by [63], a stochastic process in which the first two moments are of order Δt, ⟨ΔA^i⟩ = b^iΔt and ⟨ΔA^iΔA^j⟩ = η g^{ij}Δt, while ⟨ΔA^iΔA^jΔA^k⟩ vanishes at that order. Therefore, for a short transition, it is possible to write the evolution of P_t(A) as a Fokker–Planck (diffusion) equation,

$$\partial_t P = -\frac{\partial}{\partial A^i}\left(P\,v^i\right), \qquad (51)$$

where

$$v^i = g^{ij}\,\frac{\partial S}{\partial A^j} - \frac{1}{2}\,g^{ij}\,\frac{\partial}{\partial A^j}\log\frac{P}{g^{1/2}}. \qquad (52)$$

The derivation of (51) and (52), which takes into account the fact that the space in which the diffusion happens is curved, is given in Appendix B. In Equation (52), we see that the current velocity v^i consists of two components. The first term is the drift velocity guided by the entropy gradient, and the second is an osmotic velocity, a term driven by differences in probability density. The examples presented in the following section will show how these terms interact and the dynamical properties that derive from each.

Derivatives and Divergence

Because the entropy S is a scalar, the velocity defined in (52) is a contravariant vector. However, (51) is not a manifestly invariant equation. To check its consistency, it is convenient to write it in terms of the invariant object p, defined as

$$p(A) = \frac{P(A)}{g^{1/2}(A)}, \qquad (53)$$

meaning that p is the probability of A divided by the volume element, in terms of which (51) becomes

$$\partial_t p = -\frac{1}{g^{1/2}}\,\frac{\partial}{\partial A^i}\left(g^{1/2}\,p\,v^i\right). \qquad (54)$$

We recognize, on the right-hand side, the covariant divergence of the contravariant vector p v^i, which can be written in the manifestly covariant form

$$\partial_t p = -D_i\left(p\,v^i\right), \qquad (55)$$

where D_i is the covariant derivative. The fact that the covariant derivative arises from the dynamical process is a direct indication that, even when evolving the invariant object p, the curvature of the space is taken into account. We can identify (55) as a continuity equation—generalized to parallel transport in a curved space, as evidenced by the covariant divergence—where the flux, j^i = p v^i, can be written from (52) and (53) as

$$j^i = p\,g^{ij}\,\frac{\partial S}{\partial A^j} - \frac{1}{2}\,g^{ij}\,\frac{\partial p}{\partial A^j}. \qquad (56)$$

The second term, which is related to the osmotic velocity, is a Fick's law with diffusion tensor D^{ij} = g^{ij}/2. Note that this is identified from purely probabilistic arguments, rather than by assuming a repulsive interaction arising from the microstate dynamics.
Having the dynamics fully described, we can now study its consequences, as will be done in the following section.

7. Examples

We established the entropic dynamics by finding the transition probability (32), writing it as the differential equations (51) and (52), and recasting it in the invariant form (55). We now present examples of how the formalism is applied and of the results it yields. Our present goal is not to search for realistic models, but for models that are mathematically simple yet general enough to give insight into how the formalism is used.
We will be particularly interested in two properties: the drift velocity,

$$v_D^i = g^{ij}\,\frac{\partial S}{\partial A^j}, \qquad (57)$$

which is the first term in (52), and the static states, v^i = 0, which are a particular subset of the dynamical equilibria ∂_t P = 0. These are obtained from (52) as

$$v^i = 0 \;\Longrightarrow\; \frac{\partial S}{\partial A^i} - \frac{1}{2}\,\frac{\partial}{\partial A^i}\log\frac{P}{g^{1/2}} = 0, \qquad (58)$$

allowing one to write the static probability

$$P(A) \propto g^{1/2}(A)\,\exp\!\left[2\,S(A)\right], \qquad (59)$$

where the factor of 2 in the exponent comes from the diffusion tensor D^{ij} = g^{ij}/2 explained in Section 6. This result shows that the invariant stationary probability density (53) is

$$p(A) \propto \exp\!\left[2\,S(A)\right]. \qquad (60)$$
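As a quick symbolic check (not from the paper), one can verify that a density of the form p ∝ exp[2S] makes the flux (56) vanish for an arbitrary entropy and metric; the one-dimensional sympy sketch below does this, with S and h standing for arbitrary placeholder functions.

```python
import sympy as sp

A = sp.symbols('A', real=True)
S = sp.Function('S')(A)        # arbitrary macrostate entropy
ginv = sp.Function('h')(A)     # arbitrary inverse metric component g^{AA}

p = sp.exp(2 * S)              # candidate invariant density, Eq. (60)
flux = p * ginv * sp.diff(S, A) - sp.Rational(1, 2) * ginv * sp.diff(p, A)
print(sp.simplify(flux))       # prints 0: the flux of Eq. (56) vanishes
```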

7.1. A Gaussian Manifold

The statistical manifold defined by the mean values and correlations of a random variable, the microstate x, is the space of Gaussian distributions, which is an example of a canonical distribution. Here, we consider the dynamics of a two-dimensional, spherically symmetric Gaussian with a non-uniform variance, σ(A) = σ(A^1, A^2), as defined by

$$\langle x^1\rangle = A^1, \quad \langle x^2\rangle = A^2, \quad\text{and}\quad \left\langle (x^i - A^i)(x^j - A^j)\right\rangle = \sigma^2(A)\,\delta^{ij}. \qquad (61)$$

These Gaussians are of the form

$$\rho(x|A) = \frac{1}{2\pi\sigma^2(A)}\,\exp\!\left(-\frac{1}{2\sigma^2(A)}\sum_{i=1}^{2}\left(x^i - A^i\right)^2\right). \qquad (62)$$

The entropy of (62) relative to a uniform background measure is, up to an additive constant,

$$S(A) = \log\!\left(2\pi\,\sigma^2(A)\right). \qquad (63)$$

The space of Gaussians with a uniform variance, σ(A) = constant, is flat, and the dynamics turn out to be a rather trivial spherically symmetric diffusion. Choosing the variance to be non-uniform yields richer and more interesting dynamics. Because this example is pursued for purely illustrative purposes, we restrict ourselves to two dimensions and spherically symmetric Gaussians; the generalization is immediate.
The FRIM for a Gaussian distribution is found using (12) (see also [13]) to be

$$d\ell^2 = \frac{4}{\sigma^2}\,(d\sigma)^2 + \frac{\delta_{ij}}{\sigma^2}\,dA^i dA^j, \qquad (64)$$

so that, using

$$d\sigma = \frac{\partial\sigma}{\partial A^i}\,dA^i, \qquad (65)$$

the induced metric dℓ² = g_ij dA^i dA^j leads to

$$g_{ij} = \frac{1}{\sigma^2}\left(4\,\frac{\partial\sigma}{\partial A^i}\,\frac{\partial\sigma}{\partial A^j} + \delta_{ij}\right). \qquad (66)$$

Gaussian Submanifold around an Entropy Maximum

We present an example of our dynamical model that illustrates the motion around an entropy maximum. A simple way to do so is to recognize that, in (52), S plays a role analogous to a potential. A rotationally symmetric quadratic entropy is then obtained by substituting in (63)

$$\sigma(A) = \exp\!\left(-\frac{(A^1)^2 + (A^2)^2}{4}\right), \qquad (67)$$

which, substituted in (66), yields the metric

$$g_{ij} = \begin{pmatrix} (A^1)^2 + \sigma^{-2} & A^1 A^2 \\ A^1 A^2 & (A^2)^2 + \sigma^{-2} \end{pmatrix}, \qquad (68)$$

so that

$$g^{1/2} = \left(\frac{(A^1)^2 + (A^2)^2}{\sigma^2} + \frac{1}{\sigma^4}\right)^{1/2}. \qquad (69)$$
The scalar curvature of the Gaussian submanifold can be calculated from (68); in terms of ϕ = (A^1)^2 + (A^2)^2 it reads

$$R = -\frac{\phi^2 + 4\phi + 4\sigma^{-2} - 4}{2\left(\phi + \sigma^{-2}\right)^2}. \qquad (70)$$
From (57), the drift velocity (Figure 1) is

$$v_D^1 = -\frac{A^1}{\sigma^2 g} \quad\text{and}\quad v_D^2 = -\frac{A^2}{\sigma^2 g}, \qquad (71)$$

and, from (59), the static probability (Figure 2) is

$$P(A) \propto 4\pi^2\, g^{1/2}\, \sigma^4. \qquad (72)$$

The static distribution results from the dynamical equilibrium between two opposite tendencies. One is the drift velocity field that drives the distribution along the entropy gradient towards the entropy maximum at the origin. The other is the osmotic diffusive force that we identified earlier as the ED analogue of Fick's law. This osmotic force drives the distribution against the direction of the probability gradient and prevents it from becoming infinitely concentrated at the origin. At equilibrium, the cancellation between these two opposing forces results in the equilibrium distribution of Equation (72).
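The vector field of Figure 1 can be reproduced with a few lines of code. The sketch below (not from the paper) evaluates the reconstructed drift (71) with σ(A) as in (67); the grid range and resolution are arbitrary choices.

```python
import numpy as np
import matplotlib.pyplot as plt

# Drift velocity field of Eq. (71) with sigma(A) as in Eq. (67);
# this mimics the content of Figure 1 (grid range is arbitrary).
A1, A2 = np.meshgrid(np.linspace(-2, 2, 25), np.linspace(-2, 2, 25))
phi = A1**2 + A2**2
sigma2 = np.exp(-phi / 2)                   # sigma^2(A)
detg = phi / sigma2 + 1 / sigma2**2         # det g_ij, cf. Eq. (69)

vD1 = -A1 / (sigma2 * detg)
vD2 = -A2 / (sigma2 * detg)

plt.quiver(A1, A2, vD1, vD2)
plt.xlabel("$A^1$")
plt.ylabel("$A^2$")
plt.show()
```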

7.2. 2-Simplex Manifold

Here, we discuss an example with discrete microstates. The macrostate coordinates, being expected values, are still continuous variables. Our subject matter will be a three-state system, x ∈ {1, 2, 3}, such as, for example, a three-sided die. The statistical manifold is the two-dimensional simplex and the natural coordinates are the probabilities themselves,

$$S^2 = \left\{\rho(x)\;\middle|\;\rho(x)\geq 0,\ \sum_{x=1}^{3}\rho(x) = 1\right\}. \qquad (73)$$

The distributions on the two-simplex are Gibbs distributions defined by the sufficient statistics

$$a^i(x) = \delta^i_x \quad\text{so that}\quad A^i = \langle a^i\rangle = \rho(i). \qquad (74)$$

The entropy relative to the uniform discrete measure is

$$S = -\sum_{i=1}^{3}\rho(i)\log\rho(i) = -\sum_{i=1}^{3}A^i\log A^i, \qquad (75)$$

and the information metric is given by

$$g_{ij} = \sum_{k=1}^{3}\rho_k\,\frac{\partial\log\rho_k}{\partial A^i}\,\frac{\partial\log\rho_k}{\partial A^j}. \qquad (76)$$

The two-simplex arises naturally from probability theory, due to normalization, when one identifies the macrostates of interest to be the probabilities themselves. The choice of sufficient statistics (74) implies that the manifold is a two-dimensional surface, since, due to the normalization, one can write A^3 = 1 − A^1 − A^2. We will use the pair (A^1, A^2) as our coordinates and treat A^3 as a function of them. In this scenario, one finds the metric tensor

$$g_{ij} = \begin{pmatrix} \dfrac{1}{A^3} + \dfrac{1}{A^1} & \dfrac{1}{A^3} \\[2mm] \dfrac{1}{A^3} & \dfrac{1}{A^3} + \dfrac{1}{A^2} \end{pmatrix}, \qquad (77)$$

which induces the volume element

$$g^{1/2} = \frac{1}{\sqrt{A^1 A^2 A^3}}. \qquad (78)$$
As is well known, the simplex is characterized by a constant curvature R = 1/2; the two-simplex is the positive octant of a sphere. From (57), the drift velocity (Figure 3) is

$$v_D^1 = A^1\left[A^2\log\frac{A^2}{A^3} + (A^1 - 1)\log\frac{A^1}{A^3}\right], \qquad v_D^2 = A^2\left[A^1\log\frac{A^1}{A^3} + (A^2 - 1)\log\frac{A^2}{A^3}\right], \qquad (79)$$

and the static probability is

$$P(A) \propto g^{1/2}\prod_{i=1}^{3}\left(A^i\right)^{-2A^i}. \qquad (80)$$
From the determinant of the metric, we note that the static probability (80) diverges at the boundary of the two-simplex. This reflects the fact that a two-state system (say, i = 1 , 2 ) is easily distinguishable from a three-state system ( i = 1 , 2 , 3 ). Indeed, a single datum i = 3 will tell us that we are dealing with a three-state system.
On the other hand, we can see (Figure 4) that this divergence is not present in the invariant stationary probability (53).
As in the Gaussian case discussed in the previous section, the static equilibrium results from the cancellation of two opposing forces: the entropic force along the drift velocity field towards the center of the simplex is cancelled by the osmotic diffusive force away from the center.
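A corresponding sketch (not from the paper) evaluates the drift (79) directly in the coordinates (A^1, A^2); at the barycenter of the simplex, where the entropy is maximal, the drift vanishes, consistent with the equilibrium just described.

```python
import numpy as np

def drift_simplex(A1, A2):
    """Drift velocity of Eq. (79) on the two-simplex, with A3 = 1 - A1 - A2."""
    A3 = 1.0 - A1 - A2
    v1 = A1 * (A2 * np.log(A2 / A3) + (A1 - 1.0) * np.log(A1 / A3))
    v2 = A2 * (A1 * np.log(A1 / A3) + (A2 - 1.0) * np.log(A2 / A3))
    return v1, v2

# At the barycenter the entropy is maximal and the drift vanishes.
print(drift_simplex(1/3, 1/3))   # -> (0.0, 0.0) up to rounding
```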

8. Conclusions

We conclude with a summary of the main results. In this paper, the entropic dynamics framework has been extended to describe dynamics on a statistical manifold. ME played an instrumental role in that it allowed us to impose constraints that are not in the standard form of expected values.
The resulting dynamics, which follow from purely entropic considerations, take the form of a diffusive process on a curved space. The effects of curvature turn out to be significant. We found that the probability flux is the result of two components. One describes a flux along the entropy gradient, and the other is a diffusive or osmotic component that turns out to be the curved-space analogue of Fick's law with a diffusion tensor D^{ij} = g^{ij}/2 given by information geometry.
A highlight of the model is that it includes an “entropic” notion of time that is tailored to the system under study; the system is its own clock. This opens the door to the introduction of a notion of time that transcends physics and it might be useful for social and ecological systems. The emerging notion of entropic time is intrinsically directional. There is a natural arrow of time that manifests itself in a simple description of the approach to equilibrium.
The model developed here is rather minimal in the sense that the dynamics could be extended by taking additional relevant information into account. For example, it is rather straightforward to enrich the dynamics by imposing additional constraints
$$\langle\Delta A^i\rangle\, F_i(A) = \kappa, \qquad (81)$$

involving system-specific functions F_i(A) that carry information regarding correlations. This is the kind of further development that we envisage for future work.
As illustrative examples, the dynamics were applied to two spaces of probability distributions: a submanifold of the space of two-dimensional Gaussians, and the space of probability distributions for a three-state system (the two-simplex). In each of these, we were able to provide insight into the dynamics by presenting the drift velocity (57) and the equilibrium stationary states (59). Additionally, as future work, we intend to apply the dynamics developed here to the distributions found in network science [65].

Author Contributions

All authors contributed equally. All authors have read and agreed to the published version of the manuscript.

Funding

P. Pessoa was financed in part by CNPq—Conselho Nacional de Desenvolvimento Científico e Tecnológico– (scholarship GDE 249934/2013-2).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article. Code producing the graphs in Figure 1, Figure 2, Figure 3 and Figure 4 is available upon request.

Acknowledgments

We would like to thank N. Caticha, C. Cafaro, S. Ipek, N. Carrara, and M. Abedi for valuable discussions.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MaxEnt — Maximum Entropy, according to Jaynes [2]
ME — (modern) Maximum Entropy, since Shore and Johnson [7]
FRIM — Fisher–Rao Information Metric
ED — Entropic Dynamics

Appendix A. Obtaining the Prior

In this appendix, we derive the prior transition probability from A to A′ seen in (25). This is achieved by maximizing the entropy

$$\mathcal{S}[Q] = -\int dA'\; Q(A'|x, A)\,\log\frac{Q(A'|x, A)}{R(A'|x, A)}, \qquad (A1)$$

where R(A′|x, A), the prior for (A1), encodes information about an unbiased transition of the system. The posterior of (A1), Q(A′|x, A), becomes the prior in (21).
At this stage A could evolve into any A′, and the only assumption is that the prior assigned for (A1) leads to equal probabilities for equal volumes, thus ignoring biases. That is achieved by a prior proportional to the volume element, R(A′|x, A) ∝ g^{1/2}(A′), where g(A′) = det g_ij(A′). There is no need to address the normalization of R since it will have no effect on the posterior.
The chosen constraint represents an isotropic and continuous motion on the manifold. This is imposed by

$$\int dA'\; Q(A'|x, A)\,g_{ij}\,\Delta A^i\Delta A^j = K, \qquad (A2)$$

where K is a small quantity, since g_ij ΔA^i ΔA^j is invariant only in the limit of short steps, ΔA^i → 0. Therefore, eventually, K → 0.
The result of maximizing (A1) under (A2) and normalization is

$$Q(A'|x, A) \propto g^{1/2}(A')\,\exp\!\left[-\frac{\alpha}{2}\,g_{ij}\,\Delta A^i\Delta A^j\right], \qquad (A3)$$

where α is the Lagrange multiplier associated to (A2). As the result requires K → 0 to make it geometrically invariant, the conjugate Lagrange multiplier must be allowed to go to infinity. This allows us to define τ = 1/α, such that the short step limit corresponds to τ → 0.
Note that, since no motion in x and no correlation between x and A′ is induced by the constraints, the result does not depend on the previous microstate x, Q(A′|x, A) = Q(A′|A).

Appendix B. Derivation of the Fokker-Planck Equation

The goal of this appendix is to show that a dynamics that is a smooth diffusion in a curved space can be written as a Fokker–Planck equation, and to obtain its velocity (52) from the moments of the motion (50). In order to do so, it is convenient to define the drift velocity

$$b^i = \lim_{\Delta t\to 0}\frac{\langle\Delta A^i\rangle}{\Delta t} = g^{ij}\,\frac{\partial S}{\partial A^j} - \frac{1}{2}\,\Gamma^i. \qquad (A4)$$

First, let us analyze the change of a smooth integrable function f(A) as the system transitions from A to A′. A smooth change in the function f(A) will be

$$\Delta f(A) = \frac{\partial f}{\partial A^i}\,\Delta A^i + \frac{1}{2}\,\frac{\partial^2 f}{\partial A^i\partial A^j}\,\Delta A^i\Delta A^j + o(\Delta t), \qquad (A5)$$

since a cubic term ΔA^iΔA^jΔA^k would be o(Δt). In a smooth diffusion, we can take the expected value of (A5) with respect to P(A′|A) as

$$\langle\Delta f(A)\rangle = \int dA'\; P(A'|A)\left(f(A') - f(A)\right) = \left[b^i\,\frac{\partial}{\partial A^i} + \frac{1}{2}\,g^{ij}\,\frac{\partial^2}{\partial A^i\partial A^j}\right] f(A)\,\Delta t, \qquad (A6)$$

which can be further averaged over P(A). The left-hand side will be

$$\int dA\; P(A)\int dA'\; P(A'|A)\left(f(A') - f(A)\right) = \int dA\, dA'\; P(A', A)\left(f(A') - f(A)\right)$$
$$= \int dA'\; P(A')\,f(A') - \int dA\; P(A)\,f(A), \qquad (A7)$$

while the right-hand side is

$$\int dA\; P(A)\left[b^i\,\frac{\partial}{\partial A^i} + \frac{1}{2}\,g^{ij}\,\frac{\partial^2}{\partial A^i\partial A^j}\right] f(A)\,\Delta t, \qquad (A8)$$

such that they equate to

$$\int dA'\; P(A')\,f(A') - \int dA\; P(A)\,f(A) = \int dA\; P(A)\left[b^i\,\frac{\partial}{\partial A^i} + \frac{1}{2}\,g^{ij}\,\frac{\partial^2}{\partial A^i\partial A^j}\right] f(A)\,\Delta t. \qquad (A9)$$

As established in Section 5, P(A) and P(A′) are the distributions at the instants t and t′, respectively, so that

$$\int dA\;\frac{P_{t'}(A) - P_t(A)}{\Delta t}\,f(A) = \int dA\; P(A)\left[b^i\,\frac{\partial}{\partial A^i} + \frac{1}{2}\,g^{ij}\,\frac{\partial^2}{\partial A^i\partial A^j}\right] f(A), \qquad (A10)$$
which, upon integrating by parts on the right-hand side and taking the limit of small steps, gives

$$\int dA\;\frac{\partial P(A)}{\partial t}\,f(A) = \int dA\left[-\frac{\partial}{\partial A^i}\left(b^i P(A)\right) + \frac{1}{2}\,\frac{\partial^2}{\partial A^i\partial A^j}\left(g^{ij} P(A)\right)\right] f(A). \qquad (A11)$$

Because f is an arbitrary test function, we can identify the integrands,

$$\frac{\partial P(A)}{\partial t} = -\frac{\partial}{\partial A^i}\left[b^i P(A) - \frac{1}{2}\,\frac{\partial}{\partial A^j}\left(g^{ij} P(A)\right)\right], \qquad (A12)$$

and substitute b^i (A4) in general coordinates,

$$\frac{\partial P(A)}{\partial t} = -\frac{\partial}{\partial A^i}\left[g^{ij}\,\frac{\partial S}{\partial A^j}\,P(A) - \frac{1}{2}\,\Gamma^i P(A) - \frac{1}{2}\,\frac{\partial g^{ij}}{\partial A^j}\,P(A) - \frac{1}{2}\,g^{ij}\,\frac{\partial P(A)}{\partial A^j}\right], \qquad (A13)$$

where the contracted Christoffel symbols satisfy the identity

$$\Gamma^i = -\frac{1}{g^{1/2}}\,\frac{\partial}{\partial A^j}\left(g^{1/2}\, g^{ij}\right) = -\frac{\partial g^{ij}}{\partial A^j} - g^{ij}\,\frac{\partial \log g^{1/2}}{\partial A^j}. \qquad (A14)$$

Here we see the effect of curvature encoded in the Christoffel symbols. Substituting (A14) into the differential Equation (A13), we obtain

$$\frac{\partial P(A)}{\partial t} = -\frac{\partial}{\partial A^i}\left[\left(g^{ij}\,\frac{\partial S}{\partial A^j} - \frac{1}{2}\,g^{ij}\,\frac{\partial}{\partial A^j}\log\frac{P(A)}{g^{1/2}}\right) P(A)\right], \qquad (A15)$$

where the second term inside the parentheses above is the result of taking the curvature into account. The result is a Fokker–Planck equation that is usefully written in the continuity form

$$\partial_t P = -\frac{\partial}{\partial A^i}\left(P\,v^i\right), \qquad (A16)$$

where

$$v^i = g^{ij}\,\frac{\partial S}{\partial A^j} - \frac{1}{2}\,g^{ij}\,\frac{\partial}{\partial A^j}\log\frac{P}{g^{1/2}}, \qquad (A17)$$

completing the derivation.

References

  1. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef] [Green Version]
  2. Jaynes, E.T. Information theory and statistical mechanics: I. Phys. Rev. 1957, 106, 620. [Google Scholar] [CrossRef]
  3. Jaynes, E.T. Information theory and statistical mechanics. II. Phys. Rev. 1957, 108, 171. [Google Scholar] [CrossRef]
  4. Rosenkrantz, R.D. (Ed.) E. T. Jaynes: Papers on Probability, Statistics and Statistical Physics; Reidel: Dordrecht, The Netherlands, 1983. [Google Scholar] [CrossRef]
  5. Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
  6. Gibbs, J. Elementary Principles in Statistical Mechanics; Yale University Press: New Haven, Connecticut, 1902; Reprinted by Ox Bow Press: Woodbridge, Connecticut, 1981. [Google Scholar]
  7. Shore, J.; Johnson, R. Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy. IEEE Trans. Inf. Theory 1980, 26, 26–37. [Google Scholar] [CrossRef] [Green Version]
  8. Skilling, J. The Axioms of Maximum Entropy. In Maximum-Entropy and Bayesian Methods in Science and Engineering; Erickson, G.J., Smith, C.R., Eds.; Springer: Dordrecht, The Netherlands, 1988; Volumes 31–32, pp. 173–187. [Google Scholar] [CrossRef]
  9. Caticha, A. Relative Entropy and Inductive Inference. AIP Conf. Proc. Am. Inst. Phys. 2004, 707, 75–96. [Google Scholar] [CrossRef]
  10. Caticha, A. Information and Entropy. AIP Conf. Proc. Am. Inst. Phys. 2007, 954, 11–22. [Google Scholar] [CrossRef] [Green Version]
  11. Caticha, A.; Giffin, A. Updating Probabilities. AIP Conf. Proc. Am. Inst. Phys. 2006, 872, 31–42. [Google Scholar] [CrossRef] [Green Version]
  12. Vanslette, K. Entropic Updating of Probabilities and Density Matrices. Entropy 2017, 19, 664. [Google Scholar] [CrossRef] [Green Version]
  13. Caticha, A. Entropic Physics: Probability, Entropy, and the Foundations of Physics. Available online: https://www.albany.edu/physics/faculty/ariel-caticha (accessed on 19 April 2021).
  14. Caticha, A.; Golan, A. An entropic framework for modeling economies. Phys. A Stat. Mech. Appl. 2014, 408, 149–163. [Google Scholar] [CrossRef]
  15. Harte, J. Maximum Entropy and Ecology: A Theory of Abundance, Distribution, and Energetics; OUP Oxford: Oxford, UK, 2011. [Google Scholar]
  16. Banavar, J.R.; Maritan, A.; Volkov, I. Applications of the principle of maximum entropy: From physics to ecology. J. Phys. Condens. Matter 2010, 22, 063101. [Google Scholar] [CrossRef] [Green Version]
  17. De Martino, A.; De Martino, D. An introduction to the maximum entropy approach and its application to inference problems in biology. Heliyon 2018, 4, e00596. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  18. Dixit, P.D.; Lyashenko, E.; Niepel, M.; Vitkup, D. Maximum entropy framework for predictive inference of cell population heterogeneity and responses in signaling networks. Cell Syst. 2020, 10, 204–212. [Google Scholar] [CrossRef]
  19. Cimini, G.; Squartini, T.; Saracco, F.; Garlaschelli, D.; Gabrielli, A.; Caldarelli, G. The statistical physics of real-world networks. Nat. Rev. Phys. 2019, 1, 58–71. [Google Scholar] [CrossRef] [Green Version]
  20. Radicchi, F.; Krioukov, D.; Hartle, H.; Bianconi, G. Classical information theory of networks. J. Phys. Complex. 2020, 1, 025001. [Google Scholar] [CrossRef]
  21. Vicente, R.; Susemihl, A.; Jericó, J.P.; Caticha, N. Moral foundations in an interacting neural networks society: A statistical mechanics analysis. Phys. A Stat. Mech. Its Appl. 2014, 400, 124–138. [Google Scholar] [CrossRef] [Green Version]
  22. Alves, F.; Caticha, N. Sympatric Multiculturalism in Opinion Models; AIP Conference Proceedings; AIP Publishing LLC.: New York, NY, USA, 2016; Volume 1757, p. 060005. [Google Scholar] [CrossRef] [Green Version]
  23. Jaynes, E.T. Where do we stand on maximum entropy? In The Maximum Entropy Principle; Levine, R.D., Tribus, M., Eds.; MIT Press: Cambridge, MA, USA, 1979. [Google Scholar] [CrossRef]
  24. Balian, R. From Microphysics to Macrophysics: Methods and Applications of Statistical Mechanics. Volumes I and II; Springer: Heidelberg, Germany, 1991–1992. [Google Scholar]
  25. Pressé, S.; Ghosh, K.; Lee, J.; Dill, K.A. Principles of maximum entropy and maximum caliber in statistical physics. Rev. Mod. Phys. 2013, 85, 1115–1141. [Google Scholar] [CrossRef] [Green Version]
  26. Davis, S.; González, D. Hamiltonian formalism and path entropy maximization. J. Phys. A Math. Theor. 2015, 48, 425003. [Google Scholar] [CrossRef] [Green Version]
  27. Cafaro, C.; Ali, S.A. Maximum caliber inference and the stochastic Ising model. Phys. Rev. E 2016, 94. [Google Scholar] [CrossRef] [Green Version]
  28. Caticha, A. Entropic dynamics, time and quantum theory. J. Phys. A Math. Theor. 2011, 44, 225303. [Google Scholar] [CrossRef] [Green Version]
  29. Caticha, A. The Entropic Dynamics Approach to Quantum Mechanics. Entropy 2019, 21, 943. [Google Scholar] [CrossRef] [Green Version]
  30. Ipek, S.; Abedi, M.; Caticha, A. Entropic dynamics: Reconstructing quantum field theory in curved space-time. Class. Quantum Gravity 2019, 36, 205013. [Google Scholar] [CrossRef] [Green Version]
  31. Pessoa, P.; Caticha, A. Exact renormalization groups as a form of entropic dynamics. Entropy 2018, 20, 25. [Google Scholar] [CrossRef] [Green Version]
  32. Abedi, M.; Bartolomeo, D. Entropic Dynamics of Exchange Rates and Options. Entropy 2019, 21, 586. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  33. Abedi, M.; Bartolomeo, D. Entropic Dynamics of Stocks and European Options. Entropy 2019, 21, 765. [Google Scholar] [CrossRef] [Green Version]
  34. Caticha, N. Entropic Dynamics in Neural Networks, the Renormalization Group and the Hamilton-Jacobi-Bellman Equation. Entropy 2020, 22, 587. [Google Scholar] [CrossRef]
  35. Fisher, R.A. Theory of Statistical Estimation. Proc. Camb. Philos. Soc. 1925, 122, 700. [Google Scholar] [CrossRef] [Green Version]
  36. Rao, C.R. Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 1945, 37, 81. [Google Scholar] [CrossRef]
  37. Amari, S.; Nagaoka, H. Methods of Information Geometry; American Mathematical Society: Providence, RI, USA, 2000. [Google Scholar]
  38. Amari, S. Information Geometry and Its Applications; Springer International Publishing: Berlin, Germany, 2016. [Google Scholar] [CrossRef]
  39. Ay, N.; Jost, J.; Lê, H.V.; Schwachhöfer, L. Information Geometry; Springer International Publishing: Berlin, Germany, 2017. [Google Scholar] [CrossRef]
  40. Caticha, A. The basics of information geometry. AIP Conf. Proc. Am. Inst. Phys. 2015, 1641, 15–26. [Google Scholar] [CrossRef] [Green Version]
  41. Nielsen, F.; Garcia, V. Statistical exponential families: A digest with flash cards. arXiv 2009, arXiv:cs.LG/0911.4863. [Google Scholar]
  42. Ruppeiner, G. Riemannian geometry in thermodynamic fluctuation theory. Rev. Mod. Phys. 1995, 67, 605. [Google Scholar] [CrossRef]
  43. Janyszek, H.; Mrugala, R. Riemannian geometry and stability of ideal quantum gases. J. Phys. A Math. Gen. 1990, 23, 467. [Google Scholar] [CrossRef]
  44. Brody, D.; Rivier, N. Geometrical aspects of statistical mechanics. Phys. Rev. E 1995, 51, 1006. [Google Scholar] [CrossRef]
  45. Oshima, H.; Obata, T.; Hara, H. Riemann scalar curvature of ideal quantum gases obeying Gentiles statistics. J. Phys. A Math. Gen. 1999, 32, 6373–6383. [Google Scholar] [CrossRef]
  46. Brody, D.; Hook, D.W. Information geometry in vapour–liquid equilibrium. J. Phys. A Math. Theor. 2008, 42, 023001. [Google Scholar] [CrossRef]
  47. Yapage, N.; Nagaoka, H. An information geometrical approach to the mean-field approximation for quantum Ising spin models. J. Phys. A Math. Theor. 2008, 41, 065005. [Google Scholar] [CrossRef]
  48. Tanaka, S. Information geometrical characterization of the Onsager-Machlup process. Chem. Phys. Lett. 2017, 689, 152–155. [Google Scholar] [CrossRef]
  49. Nicholson, S.B.; del Campo, A.; Green, J.R. Nonequilibrium uncertainty principle from information geometry. Phys. Rev. E 2018, 98, 032106. [Google Scholar] [CrossRef] [Green Version]
  50. Ay, N.; Olbrich, E.; Bertschinger, N.; Jost, J. A geometric approach to complexity. Chaos Interdiscip. J. Nonlinear Sci. 2011, 21, 037103. [Google Scholar] [CrossRef] [PubMed]
  51. Felice, D.; Mancini, S.; Pettini, M. Quantifying networks complexity from information geometry viewpoint. J. Math. Phys. 2014, 55, 043505. [Google Scholar] [CrossRef] [Green Version]
  52. Felice, D.; Cafaro, C.; Mancini, S. Information geometric methods for complexity. Chaos Interdiscip. J. Nonlinear Sci. 2018, 28, 032101. [Google Scholar] [CrossRef]
  53. Fisher, R.A. On the mathematical foundations of theoretical statistics. Philos. Trans. R. Soc. Lond. 1922, 222, 309–368. [Google Scholar] [CrossRef] [Green Version]
  54. Pitman, E.J.G. Sufficient statistics and intrinsic accuracy. Mathematical Proceedings of the Cambridge Philosophical Society; Cambridge University Press: Cambridge, UK, 1936; Volume 32, pp. 567–579. [Google Scholar] [CrossRef]
  55. Darmois, G. Sur les lois de probabilitéa estimation exhaustive. CR Acad. Sci. Paris 1935, 260, 85. [Google Scholar]
  56. Koopman, B.O. On distributions admitting a sufficient statistic. Trans. Am. Math. Soc. 1936, 39, 399–409. [Google Scholar] [CrossRef]
  57. Brody, D. A note on exponential families of distributions. J. Phys. A Math. Theor. 2007, 40, F691. [Google Scholar] [CrossRef] [Green Version]
  58. Cencov, N.N. Statistical decision rules and optimal inference. Am. Math. Soc. 1981, 53. [Google Scholar] [CrossRef]
  59. Campbell, L.L. An extended Cencov characterization of the information metric. Proc. Am. Math. Soc. 1986, 98, 135–141. [Google Scholar] [CrossRef] [Green Version]
  60. Beck, C.; Cohen, E.G.D. Superstatistics. Phys. A Stat. Mech. Appl. 2003, 322, 267–275. [Google Scholar] [CrossRef] [Green Version]
  61. Kobayashi, S.; Nomizu, K. Foundations of Differential Geometry (Wiley Classics Library); John Wiley and Sons: New York, NY, USA, 1963; Volume 1. [Google Scholar]
  62. Nawaz, S.; Abedi, M.; Caticha, A. Entropic Dynamics on Curved Spaces; AIP Conference Proceedings; AIP Publishing LLC.: New York, NY, USA, 2016; Volume 1757, p. 030004. [Google Scholar] [CrossRef] [Green Version]
  63. Nelson, E. Quantum Fluctuations; Princeton University Press: Princeton, NJ, USA, 1985. [Google Scholar]
  64. Python-ternary: Ternary Plots in Python. GitHub Repository. Available online: https://github.com/marcharper/python-ternary/ (accessed on 19 April 2021).
  65. Costa, F.X.; Pessoa, P. Entropic dynamics of networks. Northeast J. Complex Syst. 2021, 3, 5. [Google Scholar] [CrossRef]
Figure 1. The drift velocity field (71) drives the flux along the entropy gradient.
Figure 2. Equilibrium stationary probability (72).
Figure 3. Drift velocity field for the two-simplex in (79). The ternary plots were created using the python-ternary library [64].
Figure 4. Static invariant stationary probability for the three-state system.
Table 1. Identification of sufficient statistics, priors and Lagrange multipliers for some well-known probability distributions.
Distribution | Probability density | λ parameters | Sufficient statistics | Prior
Exponent Polynomial | ρ(x|β) = [β^{1/k}/Γ(1 + 1/k)] e^{−βx^k} | λ = β | a(x) = x^k | uniform
Gaussian | ρ(x|μ, σ) = (2πσ²)^{−1/2} exp[−(x − μ)²/(2σ²)] | λ = (−μ/σ², 1/(2σ²)) | a(x) = (x, x²) | uniform
Multinomial (k) | ρ(x|θ) = [n!/(x₁!⋯x_k!)] θ₁^{x₁}⋯θ_k^{x_k} | λ_i = −log θ_i | a = (x₁, …, x_k) | q(x) = (∏_{i=1}^{k} x_i!)^{−1}
Poisson | ρ(x|m) = (m^x/x!) e^{−m} | λ = −log m | a(x) = x | q(x) = 1/x!
Mixed power laws | ρ(x|α, β) = x^{−α} e^{−βx} β^{1−α}/Γ(1 − α) | λ = (α, β) | a = (log x, x) | uniform
