Article

Causal Information Rate

by Eun-jin Kim * and Adrian-Josue Guel-Cortez
Center for Fluid and Complex Systems, Coventry University, Priory St., Coventry CV1 5FB, UK
* Author to whom correspondence should be addressed.
Submission received: 24 July 2021 / Revised: 13 August 2021 / Accepted: 18 August 2021 / Published: 21 August 2021

Abstract

Information processing is common in complex systems, and information geometric theory provides a useful tool to elucidate the characteristics of non-equilibrium processes, such as rare, extreme events, from the perspective of geometry. In particular, their time-evolutions can be viewed by the rate (information rate) at which new information is revealed (a new statistical state is accessed). In this paper, we extend this concept and develop a new information-geometric measure of causality by calculating the effect of one variable on the information rate of the other variable. We apply the proposed causal information rate to the Kramers equation and compare it with the entropy-based causality measure (information flow). Overall, the causal information rate is a sensitive method for identifying causal relations.

1. Introduction

Entropy-related concepts and information theory [1,2,3,4,5,6,7,8,9] are useful for understanding complex dynamics in equilibrium and out of equilibrium. Examples include information (Shannon) entropy (measuring disorder, or lack of information) [1], Fisher information [2], relative entropy [3], mutual information [10], and their microscopic versions (e.g., trajectory entropy) [11,12], etc. In particular, while, in equilibrium, the Shannon entropy has a unique thermodynamic meaning, this is no longer the case in non-equilibrium, with different proposals for generalized entropies (e.g., see the review paper of Reference [13] and references therein). Recent years have witnessed the increased awareness of information as a useful physical concept, for instance, in resolving the famous Maxwell’s demon paradox [14], setting various thermodynamic inequality/uncertainty relations [15,16,17], and establishing theoretical and conceptual links between physics and biology [18]. Information-related ideas are also useful to uncover unexpected relations between apparently unrelated problems, for instance, the connections between Fisher information and Schrödinger equation, inspiring new development in non-equilibrium statistical mechanics [19].
We have recently proposed information-geometric theory as a powerful tool to understand non-equilibrium stochastic processes that often involve high temporal variabilities and large fluctuations [20,21,22,23,24,25,26,27,28,29,30,31,32], as is often the case for rare, extreme events. This is based on the surprisal rate, $r(x,t) = \partial_t s(x,t) = -\partial_t \ln p(x,t)$, where $p(x,t)$ is a probability density function (PDF) of a random variable $x$ at time $t$, and $s(x,t) = -\ln p(x,t)$ is a local entropy. $r(x,t)$, informing how rapidly $p(x,t)$ or $s(x,t)$ changes in time, is especially useful for understanding time-varying non-equilibrium processes. As the name indicates, the surprisal rate $r$ measures the degree of surprise when $p(x,t)$ changes in time (no surprise in equilibrium with $r = 0$). We can easily show that the average of the surprisal rate vanishes, $\int dx\, p(x,t)\, r(x,t) = 0$, since $\int dx\, p(x,t) = 1$. We note that, in this paper, averages refer to ensemble averages, which vary with time. A non-zero value is obtained from the second moment of $r(x,t)$ as
$E(t) = \Gamma^2(t) = \int dx\, p(x,t)\,(r(x,t))^2 = \int dx\, p(x,t)\,(\partial_t \ln p(x,t))^2,$  (1)
where $\Gamma(t)$ represents the information rate at which new information is revealed (a new statistical state is accessed) due to the time-evolution. Alternatively, $\tau(t) = \Gamma(t)^{-1}$ is the characteristic time scale over which information changes, linked to the smallest time scale of fluctuations [17].
It is important to highlight that $E$, $\Gamma$, and $\tau$ have the dimensions of (time)$^{-2}$, (time)$^{-1}$, and (time), respectively. In addition, we note that $E$ is proportional to the average of an infinitesimal relative entropy (Kullback–Leibler divergence) (e.g., see Reference [20]),
$E = \lim_{dt \to 0} \frac{2}{(dt)^2} \int dx\, p(x,t+dt) \ln \frac{p(x,t+dt)}{p(x,t)} = \lim_{dt \to 0} \frac{2}{(dt)^2} \int dx\, p(x,t) \ln \frac{p(x,t)}{p(x,t+dt)}.$  (2)
The total change in information between the initial time 0 and the time $t$ is then obtained by integrating $1/\tau(t_1) = \Gamma(t_1)$ over time as $L(t) = \int_0^t \frac{dt_1}{\tau(t_1)}$, which is the information length, quantifying the total number of statistically different states that a system passes through in time. In the limit of a Gaussian PDF whose variance is constant in time, one statistically distinguishable state is generated when a PDF peak moves by one standard deviation, since the latter provides the uncertainty in measuring the peak position of the PDF. In a nutshell, $L$ is an information-geometric measure, enabling us to quantify how the “information” unfolds in time as a dimensionless distance. Unlike other information measures, $E$, $\tau$, and $L$ are invariant under (time-independent) changes of variables and are not system-specific. This non-system-specificity is especially useful for comparing the evolution of different variables/systems having different units.
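As an illustration of how Equation (1) and $L(t)$ can be estimated in practice, the following minimal Python sketch (our own construction, not part of the original work) builds time-dependent PDFs by histogramming an ensemble of trajectories and approximates $\partial_t \ln p$ by a finite difference; tail bins with few samples are handled only crudely here.

```python
import numpy as np

def information_rate(p_prev, p_next, dt, dx, eps=1e-12):
    """Estimate E = Gamma^2 = int dx p (d_t ln p)^2 via a finite difference in time.

    p_prev, p_next : normalised histogram PDFs at times t and t+dt on a common grid.
    """
    p_mid = 0.5 * (p_prev + p_next)
    dlnp_dt = (np.log(p_next + eps) - np.log(p_prev + eps)) / dt
    return np.sum(p_mid * dlnp_dt**2) * dx

# Illustrative ensemble: an Ornstein-Uhlenbeck-like relaxation (parameters are arbitrary).
rng = np.random.default_rng(0)
n_samples, n_steps, dt = 200_000, 200, 1e-3
x = rng.normal(1.0, 0.1, n_samples)            # initial ensemble
edges = np.linspace(-2, 2, 401)
dx = edges[1] - edges[0]

gamma_t, L = [], 0.0
p_prev, _ = np.histogram(x, bins=edges, density=True)
for _ in range(n_steps):
    x += (-x) * dt + np.sqrt(2 * 0.01 * dt) * rng.normal(size=n_samples)
    p_next, _ = np.histogram(x, bins=edges, density=True)
    E = information_rate(p_prev, p_next, dt, dx)
    gamma_t.append(np.sqrt(E))
    L += np.sqrt(E) * dt                        # L(t) = int_0^t Gamma(t1) dt1
    p_prev = p_next

print(f"Gamma(t_final) = {gamma_t[-1]:.3f}, L(t_final) = {L:.3f}")
```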
Furthermore, $L$ is a path-dependent dimensionless distance and is uniquely defined as a function of time for fixed parameters and initial conditions. These properties are advantageous for quantifying correlation in time-varying data and for understanding self-organization, long memory, and hysteresis involved in phase transitions [20,24,27,28,29,30,32]. In particular, we recently investigated a non-autonomous Kramers equation by including a sudden perturbation to the system to mimic the onset of a sudden event [32], demonstrating that our information rate predicts the onset of a sudden event better than one of the entropy-based measures (information flow) (see Section 5.4 for details).
The purpose of this paper is to develop an information-geometric measure of causality (the causal information rate) by generalizing $E$ ($\Gamma$). As in Reference [32], our intention here is not to model the appearance of rare, extreme events (which are nonlinear and non-Gaussian) themselves, but to develop a new information-geometric causality method that is useful for predicting and understanding those events. The remainder of this paper is organized as follows. We propose the causal information rate in Section 2 and apply it to the Kramers equation in Section 3. One of the entropy-based methods (the information flow) is calculated in Section 4 and is compared with our proposed method in Section 5. Conclusions are provided in Section 6. Appendix A, Appendix B and Appendix C show some detailed steps involved in our calculations. We note that, while the usual convention in statistics and probability theory is to use upper-case letters for random variables and lower-case letters for their realizations, we do not make such distinctions in this paper as the meanings should be clear from the context.

2. Causal Information Rate

Information-theoretical measures of causality are often based on entropy, joint entropy, conditional entropy, or mutual information, where causality is measured by the improvement in the predictability of one variable at a future time given knowledge of the other variable, the improved predictability being measured by the decrease in entropy [16,33,34,35,36,37,38,39,40,41]. However, questions have been raised as to whether predictability improvement (e.g., as measured by the Granger causality or transfer entropy) is directly linked to causality (e.g., Reference [39]), with the suggestion that causality is better understood by performing an intervention experiment to measure the effect of a change (some type of perturbation or intervention) in one variable on another. In particular, spurious causalities between two (observed) state variables can arise through unobserved state variables that interact with both, calling for care in dealing with a system with more than two variables. On the other hand, to deal with strongly time-dependent data, the concepts of the transfer entropy rate [34], information flow [16,40,41], etc., have been proposed.
It is not the aim of this paper to provide detailed discussions about these methods, but to introduce a new information geometric measure of causality (see below) and to compare our new method with one of them (information flow) (see Section 4). Our new information geometric measure of causality focuses on how one variable affects the information rate of another variable. To this end, we generalize Γ in Equation (1) and define the causal information rate for multiple variables.
In order to demonstrate the basic idea, it is instructive to consider a stochastic system consisting of two variables $X_1$ and $X_2$, which have a bivariate joint PDF $p(X_1,t_1; X_2,t_2)$ at different times $t_1$ and $t_2$, its equal-time joint PDF $p(X_1,t; X_2,t) \equiv p(X_1,X_2,t)$, the conditional PDF $p(X_2,t_2\,|\,X_1,t_1) = p(X_2,t_2; X_1,t_1)/p(X_1,t_1)$, as well as the marginal PDFs $p(X_1,t) = \int dX_2\, p(X_1,X_2,t)$ and $p(X_2,t) = \int dX_1\, p(X_1,X_2,t)$. Using the index notation $i,j = 1,2$, we then define the causal information rate $\Gamma_{i\to j}$ for $i \neq j$ from the variable $X_i$ to $X_j$ as follows:
$\Gamma_{i \to j} \equiv \Gamma_j^* - \Gamma_j,$  (3)
$E_j \equiv \Gamma_j(t)^2 = \int dX_j\, p(X_j,t)\, [\partial_t \ln p(X_j,t)]^2,$  (4)
$E_j^* \equiv \Gamma_j^*(t)^2 = \lim_{t_1 \to t^+} \int dX_i\, dX_j\, p(X_j,t_1; X_i,t)\, [\partial_{t_1} \ln p(X_j,t_1 \,|\, X_i,t)]^2 = \lim_{t_1 \to t^+} \int dX_i\, dX_j\, p(X_j,t_1; X_i,t)\, [\partial_{t_1} \ln p(X_j,t_1; X_i,t)]^2.$  (5)
Here, $\partial_{t_1} p(X_i,t) = 0$ for $t_1 \neq t$ was used. $\Gamma_i = 1/\tau_i(t)$ represents the information rate of $X_i$ with its characteristic timescale $\tau_i(t)$. Note that the subscript $j$ in $\Gamma_j$ denotes that the information rate is calculated for the variable $X_j$. Since $\Gamma_j$ contains the contribution from the variable $X_j$ itself and from the other variable $X_i$ ($i \neq j$), we denote the (auto) contribution from the $j$-th variable itself (where the other variable $X_i$, $i \neq j$, is frozen in time) by using the superscript $*$. That is, $\Gamma_j^*$ represents the information rate of $X_j$ for given (frozen) $X_i$. Subtracting $\Gamma_j$ from $\Gamma_j^*$ in Equation (3) then gives us the contribution of the dynamic (time-evolving) $X_i$ to $\Gamma_j$, signifying how $X_i$ instantaneously influences the information rate of $X_j$.
It is important to note that, as in the case of the information rate $\Gamma$ or $L$, the calculation of $\Gamma_{i\to j}$, $\Gamma_j$, and $\Gamma_j^*$ in Equations (3)–(5) does not require knowledge of the governing equations (stochastic differential equations). This is because Equations (3)–(5) can be calculated from any (numerical or experimental) data as long as time-dependent (marginal, joint) PDFs can be constructed. For instance, we used a time-sliding window method to construct time-dependent PDFs of different variables and then calculated $E$ and $L$ to analyze numerically generated time-series data for fusion turbulence [26], time-series music data [20], and numerically generated time-series data for a global circulation model [28]. However, it is not always clear how many hidden variables are present in a given data set.
It is also useful to note that, as in the case of Equation (2), Equation (5) can be shown to be related to the infinitesimal relative entropy as
$E_j^* = \Gamma_j^{*2} = 2 \lim_{dt \to 0} \frac{1}{(dt)^2} \int dX_i\, dX_j\, p(X_j,t+dt; X_i,t) \ln \frac{p(X_j,t+dt; X_i,t)}{p(X_j, X_i, t)}.$  (6)
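Since Equation (6) expresses $E_j^*$ through an infinitesimal relative entropy, it can be checked numerically whenever the joint PDFs are Gaussian, using the closed-form Kullback–Leibler divergence between two bivariate Gaussians. The sketch below is our own illustration with arbitrary (hypothetical) mean and covariance drifts; it is not taken from the paper.

```python
import numpy as np

def kl_gaussian(mu0, S0, mu1, S1):
    """KL divergence D(N(mu0,S0) || N(mu1,S1)) between multivariate Gaussians."""
    k = len(mu0)
    S1_inv = np.linalg.inv(S1)
    dmu = mu1 - mu0
    return 0.5 * (np.trace(S1_inv @ S0) + dmu @ S1_inv @ dmu
                  - k + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

# Joint PDF p(X_j, t+dt; X_i, t): X_i is frozen, so only the j-related entries evolve.
dt = 1e-5
mu_t = np.array([0.5, 0.5])
S_t = np.array([[0.02, 0.005], [0.005, 0.01]])
# Hypothetical drifts of the j-th mean and of the j-related covariance entries.
mu_dt = mu_t + dt * np.array([0.3, 0.0])
S_dt = S_t + dt * np.array([[0.04, 0.01], [0.01, 0.0]])

E_star = 2.0 * kl_gaussian(mu_dt, S_dt, mu_t, S_t) / dt**2   # Equation (6)
print(f"E_j^* ~ {E_star:.4f}, Gamma_j^* ~ {np.sqrt(E_star):.4f}")
```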
The method presented above is for a stochastic process with two variables. For stochastic processes involving three or more variables ($i, j = 1, 2, \ldots, n$, $n \ge 3$), one way to proceed is to calculate multivariate PDFs, then the bivariate joint PDFs $p(X_i, t_1; X_j, t_2)$, their equal-time joint PDFs $p(X_i, t; X_j, t) \equiv p(X_i, X_j, t)$, and the marginal PDFs $p(X_i, t)$ and $p(X_j, t)$, and then to calculate the information rate from $X_i$ to $X_j$, where $i \neq j$, via Equations (3)–(5). This gives us an effective causal information rate. Another way is to deal with the multivariate PDFs directly (to be reported in future work).

3. Kramers Equation

To demonstrate how the methods in Equations (3)–(5) work, in this section, we investigate the analytically solvable Kramers equation, governed by the following Langevin equations [31,42]:
$\frac{dx}{dt} = v,$  (7)
$\frac{dv}{dt} = -\gamma v - \omega^2 x + \xi.$  (8)
Here, $\xi$ is a short (delta) correlated Gaussian noise with a zero average (mean) $\langle \xi \rangle = 0$ and strength $D$, with the following property:
$\langle \xi(t)\, \xi(t') \rangle = 2D\, \delta(t - t'),$  (9)
where the angular brackets denote the average over $\xi$ ($\langle \xi \rangle = 0$).
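For reference, a minimal Euler–Maruyama integration of Equations (7)–(9) can be written as follows (a sketch; the parameter values are placeholders rather than those used later in the paper):

```python
import numpy as np

def simulate_kramers(x0, v0, gamma, omega, D, dt, n_steps, n_samples, seed=0):
    """Euler-Maruyama integration of dx/dt = v, dv/dt = -gamma v - omega^2 x + xi,
    with <xi(t) xi(t')> = 2 D delta(t - t')."""
    rng = np.random.default_rng(seed)
    x = np.full(n_samples, x0, dtype=float)
    v = np.full(n_samples, v0, dtype=float)
    traj = np.empty((n_steps + 1, 2, n_samples))
    traj[0] = x, v
    for n in range(n_steps):
        noise = np.sqrt(2 * D * dt) * rng.normal(size=n_samples)
        x, v = x + v * dt, v + (-gamma * v - omega**2 * x) * dt + noise
        traj[n + 1] = x, v
    return traj

traj = simulate_kramers(x0=0.5, v0=0.5, gamma=1.0, omega=1.0, D=0.01,
                        dt=1e-3, n_steps=5000, n_samples=10_000)
print("final <x>, <v>:", traj[-1].mean(axis=1))
```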
Assuming an initial Gaussian PDF, time-dependent PDFs remain Gaussian for all time. Thus, the bivariate joint PDF p ( x , t 1 ; v , t 2 ) and the marginal PDFs p ( x , t ) and p ( v , t ) are completely determined by covariance and mean values as
$p(x,t_1; v,t_2) = \frac{1}{2\pi \sqrt{|\Sigma(t_1,t_2)|}} \exp\!\left(-\tfrac{1}{2} \Sigma_{ij}^{-1}(t_1,t_2)\, (X_i - \langle X_i \rangle)(X_j - \langle X_j \rangle)\right),$  (10)
$p(x,t) = \sqrt{\frac{\beta_x}{\pi}} \exp\!\left(-\beta_x (x - \langle x(t) \rangle)^2\right),$  (11)
$p(v,t) = \sqrt{\frac{\beta_v}{\pi}} \exp\!\left(-\beta_v (v - \langle v(t) \rangle)^2\right).$  (12)
Here, $(X_1, X_2) = (x(t_1), v(t_2))$; $\langle x \rangle$ and $\langle v \rangle$ are the mean values. $\Sigma(t_1,t_2)$ is the covariance matrix with the elements $\Sigma_{11} = \Sigma_{xx}(t_1) = \langle (\delta x(t_1))^2 \rangle = \frac{1}{2\beta_x(t_1)}$, $\Sigma_{12} = \Sigma_{21} = \Sigma_{xv}(t_1,t_2) = \langle \delta x(t_1)\, \delta v(t_2) \rangle$, and $\Sigma_{22} = \Sigma_{vv}(t_2) = \langle (\delta v(t_2))^2 \rangle = \frac{1}{2\beta_v(t_2)}$, where $\delta x(t_1) = x(t_1) - \langle x(t_1) \rangle$ and $\delta v(t_2) = v(t_2) - \langle v(t_2) \rangle$. $\Sigma^{-1}$ is the inverse of $\Sigma$, while $|\Sigma| = \Sigma_{11}\Sigma_{22} - \Sigma_{12}^2$ is the determinant. Appendix A shows how to calculate the mean values and the elements of the covariance matrix.
Entropy of the joint PDF p ( x , t 1 ; v , t 2 ) and marginal PDFs p ( x , t ) and p ( v , t ) can easily be shown to be
$S(t_1,t_2) = -\int dx\, dv\, p(x,t_1; v,t_2) \ln p(x,t_1; v,t_2) = \tfrac{1}{2}\left[2 + \ln\!\left((2\pi)^2 |\Sigma(t_1,t_2)|\right)\right],$  (13)
$S_x(t) = -\int dx\, p(x,t) \ln p(x,t) = \tfrac{1}{2}\left[1 + \ln\!\left(2\pi \Sigma_{xx}(t)\right)\right],$  (14)
$S_v(t) = -\int dv\, p(v,t) \ln p(v,t) = \tfrac{1}{2}\left[1 + \ln\!\left(2\pi \Sigma_{vv}(t)\right)\right].$  (15)
On the other hand, the information rates for the equal-time joint PDF and the marginal PDFs are given by
$E = \Gamma^2 = \int dx\, dv\, p(x,v,t)\,(\partial_t \ln p(x,v,t))^2 = \partial_t \langle X_i \rangle\, \Sigma_{ij}^{-1}\, \partial_t \langle X_j \rangle + \tfrac{1}{2} \mathrm{Tr}\!\left[(\Sigma^{-1}\dot{\Sigma})^2\right],$  (16)
$E_x = \Gamma_x^2 = \int dx\, p(x,t)\,(\partial_t \ln p(x,t))^2 = \frac{1}{\Sigma_{xx}}\left(\frac{d\langle x \rangle}{dt}\right)^2 + \frac{1}{2\Sigma_{xx}^2}\left(\frac{d\Sigma_{xx}}{dt}\right)^2,$  (17)
$E_v = \Gamma_v^2 = \int dv\, p(v,t)\,(\partial_t \ln p(v,t))^2 = \frac{1}{\Sigma_{vv}}\left(\frac{d\langle v \rangle}{dt}\right)^2 + \frac{1}{2\Sigma_{vv}^2}\left(\frac{d\Sigma_{vv}}{dt}\right)^2.$  (18)
It is useful to note that the first term on the RHS of Equations (17) and (18) is caused by the temporal change in the mean values of x and v, respectively, while the second term is due to that in the variance. Equation (16) for the joint PDF contains the contribution from the temporal changes in the mean values of x and v and in the covariance matrix. The derivation of Equations (16)–(18) is provided in Appendix B (also see Reference [31]).
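For Gaussian PDFs, Equations (16)–(18) only require the means, the covariance matrix, and their time derivatives; for the Kramers Equations (7)–(9) these obey the standard moment equations. The short helper below is a sketch with our own naming and illustrative parameter values (not taken from the paper):

```python
import numpy as np

def info_rate_joint(dmu_dt, Sigma, dSigma_dt):
    """E = Gamma^2 for the equal-time joint Gaussian PDF, Equation (16)."""
    Sinv = np.linalg.inv(Sigma)
    A = Sinv @ dSigma_dt
    return dmu_dt @ Sinv @ dmu_dt + 0.5 * np.trace(A @ A)

def info_rate_marginal(dmean_dt, var, dvar_dt):
    """E_x or E_v for a marginal Gaussian PDF, Equations (17)-(18)."""
    return dmean_dt**2 / var + 0.5 * (dvar_dt / var)**2

# Illustrative values (assumed, not from the paper): gamma = omega = 1, D = 0.01.
gamma, omega, D = 1.0, 1.0, 0.01
mu = np.array([0.5, 0.5])                      # <x>, <v>
Sigma = np.array([[0.02, 0.0], [0.0, 0.01]])
dmu_dt = np.array([mu[1], -gamma * mu[1] - omega**2 * mu[0]])
# Standard moment equations of the Kramers model for the covariance entries.
dSxv = Sigma[1, 1] - gamma * Sigma[0, 1] - omega**2 * Sigma[0, 0]
dSigma_dt = np.array([[2 * Sigma[0, 1], dSxv],
                      [dSxv, -2 * gamma * Sigma[1, 1] - 2 * omega**2 * Sigma[0, 1] + 2 * D]])

print("E   =", info_rate_joint(dmu_dt, Sigma, dSigma_dt))
print("E_x =", info_rate_marginal(dmu_dt[0], Sigma[0, 0], dSigma_dt[0, 0]))
print("E_v =", info_rate_marginal(dmu_dt[1], Sigma[1, 1], dSigma_dt[1, 1]))
```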
To clarify the key idea behind the causal information rate, we provide the detailed mathematical steps involved in the definition and calculation of $\Gamma_{v\to x}$ and $\Gamma_{x\to v}$ in Section 3.1 and Section 3.2, respectively.

3.1. $\Gamma_{v \to x}$

We start with the Kramers process Equations (7)–(9), where $X_2 = v$ is frozen for the time interval $(t, t_1)$:
$\frac{dx}{dt} = v,$  (19)
$\frac{dv}{dt} = 0.$  (20)
Then, the bivariate Gaussian PDF in Equation (10) for a fixed v takes the following form:
$p(x,t_1; v,t) = \frac{1}{2\pi\sqrt{|\Sigma(t_1,t)|}} \exp\!\left(-\tfrac{1}{2}\Sigma_{ij}^{-1}(t_1,t)\,(X_i - \langle X_i\rangle)(X_j - \langle X_j\rangle)\right),$  (21)
$\Sigma(t_1,t) = \begin{pmatrix} \Sigma_{xx}(t_1) & \Sigma_{xv}(t_1,t) \\ \Sigma_{xv}(t_1,t) & \Sigma_{vv}(t) \end{pmatrix} = \begin{pmatrix} \langle (\delta x(t_1))^2 \rangle & \langle \delta x(t_1)\,\delta v(t) \rangle \\ \langle \delta x(t_1)\,\delta v(t) \rangle & \langle (\delta v(t))^2 \rangle \end{pmatrix},$  (22)
where $X_1 = x(t_1)$ and $X_2 = v(t)$, $\delta x(t_1) = x(t_1) - \langle x(t_1)\rangle$, and $\delta v(t) = v(t) - \langle v(t)\rangle$.
For $i = 2$ and $j = 1$ in Equations (3) and (5), we have
$\Gamma_{v \to x}(t) = \Gamma_x^* - \Gamma_x,$  (23)
$E_x^*(t) = \Gamma_x^*(t)^2 = \lim_{t_1 \to t^+} \int dx\, dv\, p(x,t_1; v,t)\,(\partial_{t_1} \ln p(x,t_1; v,t))^2,$  (24)
where $\Gamma_x = \sqrt{E_x}$. In Appendix B, we show that $E_x^*$ is given by
$E_x^* = \lim_{t_1 \to t^+}\left[\partial_{t_1}\langle X_i \rangle\, \Sigma_{ij}^{-1}(t_1,t)\, \partial_{t_1}\langle X_j \rangle + \tfrac{1}{2}\mathrm{Tr}\!\left[\left(\Sigma^{-1}(t_1,t)\, \partial_{t_1}\Sigma(t_1,t)\right)^2\right]\right].$  (25)
Since $v$ is frozen during the time interval $(t, t_1)$, $\lim_{t_1\to t}\partial_{t_1}\langle X_i \rangle = \delta_{i1}\, \partial_t \langle x(t) \rangle$; $\Sigma_{vv}$ remains constant, while $\Sigma_{xx}$ and $\Sigma_{xv}$ change, as follows:
$\lim_{t_1\to t} \partial_{t_1} \Sigma_{vv}(t) = 0,$  (26)
$\lim_{t_1\to t} \partial_{t_1} \Sigma_{xv}(t_1,t) = \lim_{t_1\to t} \langle \partial_{t_1}(\delta x(t_1))\, \delta v(t) \rangle = \Sigma_{vv}(t) \equiv \dot{\Sigma}_{xv},$  (27)
$\lim_{t_1\to t} \partial_{t_1} \Sigma_{xx}(t_1) = 2\lim_{t_1\to t} \langle \partial_{t_1}(\delta x(t_1))\, \delta x(t_1) \rangle = 2\Sigma_{vx}(t) \equiv \dot{\Sigma}_{xx}.$  (28)
Then, to calculate the two terms on the RHS of Equation (25), we note
$\Sigma^{-1}(t_1,t) = \frac{1}{|\Sigma|}\begin{pmatrix} \Sigma_{vv}(t) & -\Sigma_{xv}(t_1,t) \\ -\Sigma_{xv}(t_1,t) & \Sigma_{xx}(t_1) \end{pmatrix}, \qquad \lim_{t_1\to t}\partial_{t_1}\Sigma(t_1,t) = \begin{pmatrix} \dot{\Sigma}_{xx} & \dot{\Sigma}_{xv} \\ \dot{\Sigma}_{xv} & 0 \end{pmatrix},$  (29)
where $\dot{\Sigma}_{xx}$ and $\dot{\Sigma}_{xv}$ are given in Equations (27) and (28), while $\Sigma^{-1} = \Sigma^{-1}(t)$ is the inverse of the equal-time covariance matrix. Using Equation (29), we can show that the second term on the RHS of Equation (25) becomes
$\lim_{t_1\to t^+} \tfrac{1}{2}\mathrm{Tr}\!\left[\left(\Sigma^{-1}(t_1,t)\,\partial_{t_1}\Sigma(t_1,t)\right)^2\right] = \frac{1}{2|\Sigma|^2}\left[(\Sigma_{vv}\dot{\Sigma}_{xx} - \Sigma_{xv}\dot{\Sigma}_{xv})^2 + 2(\Sigma_{vv}\dot{\Sigma}_{xv})(-\Sigma_{xv}\dot{\Sigma}_{xx} + \Sigma_{xx}\dot{\Sigma}_{xv}) + (\Sigma_{xv}\dot{\Sigma}_{xv})^2\right] = \frac{1}{2|\Sigma|^2}\left[2|\Sigma|\dot{\Sigma}_{xv}^2 + (\Delta_{v\to x})^2\right].$  (30)
Here,
$\Delta_{v\to x} \equiv \Sigma_{vv}\dot{\Sigma}_{xx} - 2\dot{\Sigma}_{xv}\Sigma_{xv} = 2\Sigma_{vv}\Sigma_{xv} - 2\Sigma_{vv}\Sigma_{xv} = 0,$  (31)
where $\dot{\Sigma}_{xx} = 2\Sigma_{xv}$ and $\dot{\Sigma}_{xv} = \Sigma_{vv}$ from Equations (27) and (28) are used. It is useful to note that $\Delta_{v\to x}$ represents the rate at which the determinant of the covariance matrix changes in time for a fixed $v$ and becomes zero. This is because, for a fixed $v$ (essentially for $\gamma = D = 0$, as seen below in regard to Equation (52)), the evolution is conservative (reversible), with the phase-space volume conserved. Thus, the contribution from the (co)variance to the information rate of $x$ for a given $v$ is solely determined by the temporal change in the cross-correlation $\Sigma_{xv}$.
Finally, by using Equations (30) and (31) in Equation (25), we have
$E_x^* = \Sigma^{-1}_{xx}\, \langle v(t) \rangle^2 + \frac{\dot{\Sigma}_{xv}^2}{|\Sigma|}.$  (32)
It is interesting to compare the first term (caused by the mean motion $\frac{d\langle x\rangle}{dt}$) on the RHS of Equation (32) with that in Equation (17). For instance, if $\Sigma_{xv} = 0$, $\Sigma^{-1}_{xx} = \Sigma_{vv}/|\Sigma| = 1/\Sigma_{xx}$, and they take the same value. It should also be noted that, even when both $\Sigma_{xv} = 0$ and $\frac{d}{dt}\Sigma_{xv} = \Sigma_{vv} - \omega^2\Sigma_{xx} = 0$ (as in equilibrium), $\dot{\Sigma}_{xv} = \Sigma_{vv} \neq 0$ (unless $\Sigma_{vv} = 0$).
Putting Equations (17) and (32) and $\lim_{t_1\to t}\partial_{t_1}\langle X_i \rangle = \delta_{i1}\,\partial_t\langle x(t)\rangle$ in Equation (23) gives us
$\Gamma_{v\to x} = \sqrt{E_x^*} - \sqrt{E_x} = \left[\frac{\Sigma_{vv}\langle v(t)\rangle^2}{|\Sigma|} + \frac{\Sigma_{vv}^2}{|\Sigma|}\right]^{1/2} - \left[\frac{\langle v(t)\rangle^2}{\Sigma_{xx}} + \frac{2\Sigma_{xv}^2}{\Sigma_{xx}^2}\right]^{1/2},$  (33)
where we used $\dot{\Sigma}_{xv} = \Sigma_{vv}$ and $\dot{\Sigma}_{xx} = 2\Sigma_{xv}$ (see Appendix A for the values of the means and covariance matrix). Therefore, even when $\Sigma_{xv} = 0$, Equation (33) can have a non-trivial contribution from a non-zero mean velocity.
To understand the difference between $E_x^*$ and $E_x$, it is useful to define the following quantity:
$E_{v\to x} = E_x^* - E_x = \frac{\Sigma_{xv}^2}{|\Sigma|\Sigma_{xx}}\langle v(t)\rangle^2 + \frac{|\Sigma|^2 + \Sigma_{xv}^4}{|\Sigma|\Sigma_{xx}^2}.$  (34)
The cross-correlation $\Sigma_{xv}$ plays a more important role in Equation (34) than in Equation (33). For instance, $\Sigma_{xv} = 0$ reduces Equation (34) to the simple form $E_{v\to x} = \Sigma_{vv}/\Sigma_{xx}$, with no contribution from the mean velocity $\langle v\rangle$. As noted above, such a simplification does not occur for $\Gamma_{v\to x}$ in Equation (33).
Nevertheless, if $\frac{d\langle x\rangle}{dt} = \frac{d\langle v\rangle}{dt} = 0$ and $\Sigma_{xv} = 0$, Equations (33) and (34) become
$E_{v\to x} = \frac{\Sigma_{vv}}{\Sigma_{xx}}, \qquad \Gamma_{v\to x} = \sqrt{\frac{\Sigma_{vv}}{\Sigma_{xx}}}.$  (35)
For instance, in equilibrium, $\Sigma_{xv} = 0$, $\Sigma_{xx} = \frac{D}{\gamma\omega^2}$, $\Sigma_{vv} = \frac{D}{\gamma}$; thus, $\Gamma_{v\to x} = \omega$. In Section 3.2 below, we show that the equality $\Gamma_{v\to x} = \Gamma_{x\to v} = \omega$ holds in equilibrium (see the discussion below Equation (57)).
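As a minimal numerical sketch (our own variable names, not the authors' code), Equations (17), (32) and (33) can be transcribed directly, with the equilibrium limit providing a quick consistency check:

```python
import numpy as np

def gamma_v_to_x(mean_v, Sxx, Sxv, Svv):
    """Causal information rate Gamma_{v->x}, Equation (33), for the Kramers equation."""
    detS = Sxx * Svv - Sxv**2
    E_x_star = (Svv * mean_v**2 + Svv**2) / detS          # Equation (32)
    E_x = mean_v**2 / Sxx + 2 * Sxv**2 / Sxx**2           # Equation (17) with dSxx/dt = 2 Sxv
    return np.sqrt(E_x_star) - np.sqrt(E_x)

# Equilibrium check: Sxv = 0, Sxx = D/(gamma omega^2), Svv = D/gamma, <v> = 0.
gamma_, omega, D = 1.0, 2.0, 0.01
print(gamma_v_to_x(0.0, D / (gamma_ * omega**2), 0.0, D / gamma_))  # -> omega = 2.0
```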

3.2. $\Gamma_{x \to v}$

We now consider the Kramers Equations (7)–(9), where $X_1 = x$ is frozen for the time interval $(t, t_1)$:
$\frac{dx}{dt} = 0,$  (36)
$\frac{dv}{dt} = -\gamma v - \omega^2 x + \xi.$  (37)
Then, during ( t , t 1 ) , the bivariate Gaussian PDF in Equation (10) for a fixed x takes the following form:
$p(x,t; v,t_1) = \frac{1}{2\pi\sqrt{|\Sigma(t,t_1)|}} \exp\!\left(-\tfrac{1}{2}\Sigma_{ij}^{-1}(t,t_1)\,(X_i - \langle X_i\rangle)(X_j - \langle X_j\rangle)\right),$  (38)
$\Sigma(t,t_1) = \begin{pmatrix} \Sigma_{xx}(t) & \Sigma_{xv}(t,t_1) \\ \Sigma_{xv}(t,t_1) & \Sigma_{vv}(t_1) \end{pmatrix} = \begin{pmatrix} \langle (\delta x(t))^2 \rangle & \langle \delta x(t)\,\delta v(t_1) \rangle \\ \langle \delta x(t)\,\delta v(t_1) \rangle & \langle (\delta v(t_1))^2 \rangle \end{pmatrix},$  (39)
where $X_1 = x(t)$, $X_2 = v(t_1)$, $\delta x(t) = x(t) - \langle x(t)\rangle$, and $\delta v(t_1) = v(t_1) - \langle v(t_1)\rangle$.
We define $E_v^*$, $\Gamma_v^* = \sqrt{E_v^*}$, $\Gamma_v = \sqrt{E_v}$, and $E_{x\to v}$ as
$\Gamma_{x\to v} = \Gamma_v^* - \Gamma_v,$  (40)
$E_{x\to v} = E_v^* - E_v,$  (41)
$E_v^*(t) = \Gamma_v^*(t)^2 = \lim_{t_1\to t^+} \int dx\, dv\, p(x,t; v,t_1)\,(\partial_{t_1}\ln p(x,t; v,t_1))^2$  (42)
$= \lim_{t_1\to t^+}\left[\partial_{t_1}\langle X_i\rangle\, \Sigma_{ij}^{-1}(t,t_1)\, \partial_{t_1}\langle X_j\rangle + \tfrac{1}{2}\mathrm{Tr}\!\left[\left(\Sigma^{-1}(t,t_1)\,\partial_{t_1}\Sigma(t,t_1)\right)^2\right]\right],$  (43)
where $E_v$ is given in Equation (18). Note that Equation (43) simply follows by replacing $\Sigma(t_1,t)$ by $\Sigma(t,t_1)$ in Equation (25).
Since we are considering the evolution of the joint PDF of $v$ for a given $x$ over an infinitesimal time interval $(t, t_1)$ through Equations (36) and (37), $\Sigma_{xx}$ remains constant, while $\Sigma_{xv}$ and $\Sigma_{vv}$ evolve in time as follows:
$\lim_{t_1\to t}\partial_{t_1}\Sigma_{xx}(t) = \lim_{t_1\to t}\langle \partial_{t_1}(\delta x(t))\,\delta x(t)\rangle = 0,$  (44)
$\lim_{t_1\to t}\partial_{t_1}\Sigma_{xv}(t,t_1) = \lim_{t_1\to t}\langle \delta x(t)\,\partial_{t_1}(\delta v(t_1))\rangle \equiv \dot{\Sigma}_{xv},$  (45)
$\lim_{t_1\to t}\partial_{t_1}\Sigma_{vv}(t_1) = 2\lim_{t_1\to t}\langle (\partial_{t_1}\delta v(t_1))\,\delta v(t_1)\rangle \equiv \dot{\Sigma}_{vv}.$  (46)
(In this subsection, $\dot{\Sigma}_{xv}$ and $\dot{\Sigma}_{vv}$ denote these frozen-$x$ limits, which differ from the quantities of the same name in Section 3.1.) Here, by using Equations (7)–(9), we can show (see Appendix C for comments):
$\dot{\Sigma}_{vv}(t) = 2\langle (\partial_t(\delta v(t)))\,\delta v(t)\rangle = 2\langle(-\gamma\,\delta v(t) - \omega^2\delta x(t) + \xi)\,\delta v(t)\rangle = 2(-\gamma\Sigma_{vv} - \omega^2\Sigma_{xv} + D),$  (47)
$\dot{\Sigma}_{xv}(t) = \langle (\partial_t(\delta v(t)))\,\delta x(t)\rangle = \langle(-\gamma\,\delta v(t) - \omega^2\delta x(t) + \xi)\,\delta x(t)\rangle = -\gamma\Sigma_{xv} - \omega^2\Sigma_{xx}.$  (48)
We now need to calculate the two terms on the RHS of Equation (43). First, since $X_1 = x(t)$ is frozen,
$\lim_{t_1\to t^+}\partial_{t_1}\langle X_i\rangle\, \Sigma_{ij}^{-1}(t,t_1)\, \partial_{t_1}\langle X_j\rangle = \Sigma_{vv}^{-1}(t)\,(\partial_t\langle v(t)\rangle)^2,$  (49)
$\lim_{t_1\to t^+}\partial_{t_1}\Sigma(t,t_1) = \begin{pmatrix} 0 & \dot{\Sigma}_{xv} \\ \dot{\Sigma}_{xv} & \dot{\Sigma}_{vv} \end{pmatrix}.$  (50)
Secondly, using Equations (50) and (39), we can show that the second term on the RHS of Equation (43) becomes
$\lim_{t_1\to t^+}\tfrac{1}{2}\mathrm{Tr}\!\left[\left(\Sigma^{-1}(t,t_1)\,\partial_{t_1}\Sigma(t,t_1)\right)^2\right] = \frac{1}{2|\Sigma|^2}\left[(\Sigma_{xx}\dot{\Sigma}_{vv} - \Sigma_{xv}\dot{\Sigma}_{xv})^2 + 2(\Sigma_{xx}\dot{\Sigma}_{xv})(-\Sigma_{xv}\dot{\Sigma}_{vv} + \Sigma_{vv}\dot{\Sigma}_{xv}) + (\Sigma_{xv}\dot{\Sigma}_{xv})^2\right] = \frac{1}{2|\Sigma|^2}\left[2|\Sigma|\dot{\Sigma}_{xv}^2 + (\Delta_{x\to v})^2\right].$  (51)
Here,
$\Delta_{x\to v} \equiv \Sigma_{xx}\dot{\Sigma}_{vv} - 2\dot{\Sigma}_{xv}\Sigma_{xv} = -2\gamma|\Sigma| + 2D\Sigma_{xx},$  (52)
where we used Equations (47) and (48). It is useful to note that $\Delta_{x\to v}$ in Equation (52), representing the rate at which the determinant of the covariance changes in time for a fixed $x$, contains the two terms involving $\gamma$ (damping) and $D$ (stochasticity) due to irreversibility. We also note that, in equilibrium, $\Sigma_{vv} = \frac{D}{\gamma}$ and $\Sigma_{xv} = 0$; thus, $\Delta_{x\to v} = 0$ and $\frac{d}{dt}\Sigma_{xv} = 0$, but $\dot{\Sigma}_{xv} = -\gamma\Sigma_{xv} - \omega^2\Sigma_{xx} = -\omega^2\Sigma_{xx} \neq 0$ in Equation (51), in general contributing to $E_v^*$ in Equation (43).
We use Equations (49), (51) and (52) in Equation (43) to obtain
$E_v^* = \Sigma_{vv}^{-1}\,(\partial_t\langle v(t)\rangle)^2 + \frac{1}{2|\Sigma|^2}\left[2|\Sigma|\dot{\Sigma}_{xv}^2 + (\Delta_{x\to v})^2\right],$  (53)
where $\Sigma_{vv}^{-1} = \Sigma_{xx}/|\Sigma|$. Using Equations (53) and (18) in Equation (40) gives us
$\Gamma_{x\to v} = \sqrt{E_v^*} - \sqrt{E_v} = \left[\frac{\Sigma_{xx}(\partial_t\langle v(t)\rangle)^2}{|\Sigma|} + \frac{2|\Sigma|\dot{\Sigma}_{xv}^2 + (\Delta_{x\to v})^2}{2|\Sigma|^2}\right]^{1/2} - \left[\frac{(\partial_t\langle v(t)\rangle)^2}{\Sigma_{vv}} + \frac{1}{2\Sigma_{vv}^2}\left(\frac{d\Sigma_{vv}}{dt}\right)^2\right]^{1/2},$  (54)
where $\frac{d\Sigma_{vv}}{dt} = 2(-\gamma\Sigma_{vv} - \omega^2\Sigma_{xv} + D)$, $\dot{\Sigma}_{xv} = -\gamma\Sigma_{xv} - \omega^2\Sigma_{xx}$, and $\Delta_{x\to v} = -2\gamma|\Sigma| + 2D\Sigma_{xx}$.
Again, to understand the difference between $E_v$ and $E_v^*$, we perform straightforward but lengthy calculations using Equations (18), (41), (52), and (53) and find the following:
$E_{x\to v} = \Sigma_{vv}^{-1}\,(\partial_t\langle v(t)\rangle)^2 + \frac{1}{2|\Sigma|^2}\left[2|\Sigma|\dot{\Sigma}_{xv}^2 + (\Delta_{x\to v})^2\right] - \left[\frac{(\partial_t\langle v(t)\rangle)^2}{\Sigma_{vv}} + \frac{\dot{\Sigma}_{vv}^2}{2\Sigma_{vv}^2}\right] = \frac{\Sigma_{xv}^2}{|\Sigma|\Sigma_{vv}}\,(\partial_t\langle v(t)\rangle)^2 + Q(t).$  (55)
Here, $Q$ is defined by
$Q = \frac{1}{2|\Sigma|^2}\left[2|\Sigma|\dot{\Sigma}_{xv}^2 + (\Delta_{x\to v})^2\right] - \frac{\dot{\Sigma}_{vv}^2}{2\Sigma_{vv}^2} = \frac{1}{|\Sigma|\Sigma_{vv}^2}\left[\left(2D\Sigma_{xv} - \gamma\Sigma_{xv}\Sigma_{vv} + |\Sigma|\omega^2\right)^2 + 2\gamma\omega^2\Sigma_{xv}^3\Sigma_{vv} + \Sigma_{xv}^4\omega^4 + \frac{2D^2\Sigma_{xv}^4}{|\Sigma|}\right].$  (56)
We again note that $\Sigma_{xv}$ plays a key role in Equation (55), as in the case of Equation (34). In particular, if $\Sigma_{xv} = 0$, $Q = |\Sigma|\omega^4/\Sigma_{vv}^2$. If both $\frac{d\langle x\rangle}{dt} = \frac{d\langle v\rangle}{dt} = 0$ and $\Sigma_{xv} = 0$, Equations (40) and (55) become
$\Gamma_{x\to v} = \omega^2\sqrt{\frac{\Sigma_{xx}}{\Sigma_{vv}}}, \qquad E_{x\to v} = \omega^4\,\frac{\Sigma_{xx}}{\Sigma_{vv}}.$  (57)
In equilibrium, where $\Sigma_{xx} = \frac{D}{\gamma\omega^2}$ and $\Sigma_{vv} = \frac{D}{\gamma}$, $\Gamma_{x\to v} = \omega$ and $E_{x\to v} = \omega^2$. Thus, we have the equalities $\Gamma_{v\to x} = \Gamma_{x\to v} = \omega$ and $E_{x\to v} = E_{v\to x} = \omega^2$, as alluded to in the discussion following Equation (35).
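Similarly, Equations (47), (48), (52) and (54) translate into the following sketch (again with our own variable names, not the authors' code); the equilibrium limit again gives $\Gamma_{x\to v} = \omega$:

```python
import numpy as np

def gamma_x_to_v(dmean_v_dt, Sxx, Sxv, Svv, gamma, omega, D):
    """Causal information rate Gamma_{x->v}, Equation (54), for the Kramers equation."""
    detS = Sxx * Svv - Sxv**2
    dSvv_dt = 2 * (-gamma * Svv - omega**2 * Sxv + D)     # Equation (47)
    dot_Sxv = -gamma * Sxv - omega**2 * Sxx               # Equation (48), frozen-x limit
    Delta = -2 * gamma * detS + 2 * D * Sxx               # Equation (52)
    E_v_star = (Sxx * dmean_v_dt**2 / detS
                + (2 * detS * dot_Sxv**2 + Delta**2) / (2 * detS**2))   # Equation (53)
    E_v = dmean_v_dt**2 / Svv + 0.5 * (dSvv_dt / Svv)**2                # Equation (18)
    return np.sqrt(E_v_star) - np.sqrt(E_v)

# Equilibrium check: Sxv = 0, Sxx = D/(gamma omega^2), Svv = D/gamma, d<v>/dt = 0.
gamma_, omega, D = 1.0, 2.0, 0.01
print(gamma_x_to_v(0.0, D / (gamma_ * omega**2), 0.0, D / gamma_,
                   gamma_, omega, D))  # -> omega = 2.0
```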

4. Entropy-Based Causality Measures

As noted previously, most information-theoretical measures of causality are based on entropy, joint entropy, conditional entropy, or mutual information. Specifically, for the two dependent stochastic variables $X_1$ and $X_2$ with the marginal PDFs $p(X_1,t)$ and $p(X_2,t)$ and the joint PDF $p(X_1,t_1; X_2,t_2)$, the entropy $S(X_1)$, joint entropy $S(X_1,X_2)$, conditional entropy $S(X_1|X_2)$, and mutual information $I(X_1,X_2)$ are defined by
$S(X_1(t_1)) = -\int dX_1\, p(X_1,t_1)\ln p(X_1,t_1),$  (58)
$S(X_2(t_2)) = -\int dX_2\, p(X_2,t_2)\ln p(X_2,t_2),$  (59)
$S(X_1(t_1), X_2(t_2)) = -\int dX_1\, dX_2\, p(X_1,t_1; X_2,t_2)\ln p(X_1,t_1; X_2,t_2),$  (60)
$S(X_1(t_1)\,|\,X_2(t_2)) = S(X_1(t_1), X_2(t_2)) - S(X_2(t_2)),$  (61)
$I(X_1(t_1) : X_2(t_2)) = S(X_1(t_1)) - S(X_1(t_1)\,|\,X_2(t_2)) = S(X_1(t_1)) + S(X_2(t_2)) - S(X_1(t_1), X_2(t_2)).$  (62)
For Gaussian processes, Equations (13)–(15) show that the entropy depends only on the variance/covariance, being independent of the mean value. This can be problematic as entropy fails to capture the effect of one variable on the mean value of another variable, for instance, caused by rare events associated with coherent structures, such as vortices, shear flows, etc. This is explicitly shown in Section 5.4 (see Figure 5) in regard to causality. Although not widely recognized, it is important to point out the limitation of entropy-based measures in measuring perturbations (in particular, caused by abrupt events) that do not affect entropy, as shown in Reference [32]. In addition, entropy has shortcomings, such as being non-invariant under coordinate transformations and insensitive to the local arrangement (shape) of p ( x , t ) for fixed t. Similar comments are applicable to other entropy-based measures. To demonstrate this point, in this section, we provide a detailed analysis of information flow based on conditional entropy [16,41].

4.1. $T_{v \to x}$

Information flow is based on the gain (or loss) in predicting the future of subsystem 1 from the present state of subsystem 2 and is defined as
$T_{v\to x} = \lim_{t_1\to t^+}\partial_{t_1} I(x(t_1) : v(t)) = \frac{dS(x(t))}{dt} - \lim_{t_1\to t^+}\partial_{t_1} S(x(t_1)\,|\,v(t)) = \frac{dS(x(t))}{dt} - \lim_{t_1\to t^+}\partial_{t_1} S(x(t_1), v(t)),$  (63)
where Equation (62) is used. Here, the first and second terms on the RHS represent the rate of change of the marginal entropy of $x(t)$ and the rate of change of the entropy of $x(t_1)$ conditional on $v(t)$ (i.e., frozen $v$), respectively. The difference between these two rates then quantifies the effect of the evolution of $v$ on the entropy of $x$. Note that $T_{v\to x}$ can be both negative and positive; a negative $T_{v\to x}$ means that $v$ acts to reduce the marginal entropy of $x$ ($S_1$), as numerically observed in Reference [32].
Using Equation (13), we have $\partial_{t_1}S(x(t_1), v(t)) = \frac{\partial_{t_1}|\Sigma(t_1,t)|}{2|\Sigma(t_1,t)|}$. Then, by using Equations (26)–(29), we obtain
$\lim_{t_1\to t}\partial_{t_1}|\Sigma(t_1,t)| = \dot{\Sigma}_{xx}\Sigma_{vv} - 2\dot{\Sigma}_{xv}\Sigma_{xv} = 2\Sigma_{xv}\Sigma_{vv} - 2\Sigma_{vv}\Sigma_{xv} = 0,$  (64)
and
$T_{v\to x} = \frac{dS(x(t))}{dt} = \frac{1}{2}\frac{\dot{\Sigma}_{xx}}{\Sigma_{xx}} = \frac{\Sigma_{xv}}{\Sigma_{xx}}.$  (65)
As can be seen from Equation (65), $T_{v\to x}$ depends only on the (co)variance, being independent of the mean values. Furthermore, $T_{v\to x}$ is proportional to the cross-correlation $\Sigma_{xv}$, becoming zero for $\Sigma_{xv} = 0$, as in the case of equilibrium. (Note that Equation (65) is derived using a different method in Reference [32] for the Kramers equation.)

4.2. $T_{x \to v}$

Similarly, the information flow based on the gain (or loss) in predicting the future of subsystem 2 from the present state of subsystem 1 is defined as
$T_{x\to v} = \lim_{t_1\to t^+}\partial_{t_1} I(x(t) : v(t_1)) = \frac{dS(v(t))}{dt} - \lim_{t_1\to t^+}\partial_{t_1} S(v(t_1)\,|\,x(t)) = \frac{dS(v(t))}{dt} - \lim_{t_1\to t^+}\partial_{t_1} S(x(t), v(t_1)),$  (66)
where Equation (62) is used. Here, the first and second terms on the RHS represent the rate of change of the marginal entropy of $v(t)$ and the rate of change of the entropy of $v(t_1)$ conditional on $x(t)$ (i.e., frozen $x$), respectively. The difference between these two rates then quantifies the effect of the evolution of $x$ on the entropy of $v$. Note again that $T_{x\to v}$ can be both negative and positive; a negative $T_{x\to v}$ means that $x$ acts to reduce the marginal entropy of $v$ ($S_2$), as numerically observed in Reference [32].
For $\partial_{t_1}S(x(t), v(t_1)) = \frac{\partial_{t_1}|\Sigma(t,t_1)|}{2|\Sigma(t,t_1)|}$, we use Equations (44)–(48) and (50) to obtain
$\lim_{t_1\to t}\partial_{t_1}|\Sigma(t,t_1)| = \dot{\Sigma}_{vv}(t)\Sigma_{xx}(t) - 2\Sigma_{xv}(t)\dot{\Sigma}_{xv} = \gamma\left(-2\Sigma_{xx}\Sigma_{vv} + 2\Sigma_{xv}^2\right) + 2D\Sigma_{xx} = -2\gamma|\Sigma| + 2D\Sigma_{xx}.$  (67)
Thus,
$T_{x\to v} = \frac{dS(v(t))}{dt} - \lim_{t_1\to t^+}\partial_{t_1}S(x(t), v(t_1)) = \frac{1}{2}\frac{\dot{\Sigma}_{vv}}{\Sigma_{vv}} - \frac{-2\gamma|\Sigma| + 2D\Sigma_{xx}}{2|\Sigma|} = -\frac{\omega^2\Sigma_{xv}}{\Sigma_{vv}} - \frac{D\,\Sigma_{xv}^2}{\Sigma_{vv}|\Sigma|}.$  (68)
Again, Equation (68) is derived using a different method in Reference [32]. As in the case of $T_{v\to x}$ in Equation (65), $T_{x\to v}$ depends only on the (co)variance, being independent of the mean values, while vanishing with the cross-correlation $\Sigma_{xv}$, becoming zero for $\Sigma_{xv} = 0$, as in the case of equilibrium.
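Both information flows, Equations (65) and (68), depend only on the covariance matrix, so a minimal sketch (our naming) is:

```python
def information_flows(Sxx, Sxv, Svv, omega, D):
    """Information flows T_{v->x} (Eq. 65) and T_{x->v} (Eq. 68) for the Kramers equation."""
    detS = Sxx * Svv - Sxv**2
    T_v_to_x = Sxv / Sxx
    T_x_to_v = -omega**2 * Sxv / Svv - D * Sxv**2 / (Svv * detS)
    return T_v_to_x, T_x_to_v

# Both vanish when the cross-correlation Sxv is zero, e.g., in equilibrium.
print(information_flows(Sxx=0.01, Sxv=0.0, Svv=0.01, omega=1.0, D=0.01))
```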

5. Comparisons between Causal Information Rate and Information Flow

In this section, we compare the causal information rate in Equations (33) and (54) with the information flow in Equations (65) and (68) for the Kramers equation by focusing on several interesting cases. We start by noting that, if $\gamma \neq 0$, $\omega \neq 0$, and $D \neq 0$, $x$ and $v$ evolve to equilibrium, where the covariance matrix takes the values:
$\Sigma_{xx} = \frac{D}{\gamma\omega^2}, \qquad \Sigma_{vv} = \frac{D}{\gamma}, \qquad \Sigma_{xv} = 0.$  (69)
We use the same initial conditions
$\langle x(0)\rangle = 0.5, \qquad \langle v(0)\rangle = 0.5, \qquad \Sigma_{xx}(0) = \Sigma_{vv}(0) = 0.01, \qquad \Sigma_{xv}(0) = 0,$  (70)
and present various statistical quantities in Figure 1, Figure 2, Figure 3, Figure 4 and Figure 5, including snapshots of PDFs, $\Sigma_{xx}(t)$, $\Sigma_{vv}(t)$, and $\Sigma_{xv}(t)$ in panel (a) and $T_{x\to v}$, $T_{v\to x}$, $\Gamma_{x\to v}$, $\Gamma_{v\to x}$ in panel (b). Note that PDF snapshots are shown at the one-standard-deviation level, using different colors for different times.

5.1. No Stochastic Noise D = 0

It is useful to look at the deterministic case without the stochastic noise ($\xi = 0$) in Equations (7) and (8), where a time-dependent PDF evolves due to non-zero initial conditions. Specifically, the two cases $\gamma = \omega = 0$ and $\gamma = 0$, $\omega = 1$ are considered in Figure 1 and Figure 2, respectively.
To gain a key insight into the meaning of our causal information rate, we start with the simplest case in Figure 1, where $\gamma = 0 = \omega$, with $v$ being fixed to its initial value $v(0)$ so that $\langle x(t)\rangle = \langle x(0)\rangle + \langle v(0)\rangle t$. The snapshots of the PDF and the covariance matrix in Figure 1a show that the PDF center (peak) undergoes a drift according to $\langle x(t)\rangle = \langle x(0)\rangle + \langle v(0)\rangle t$, while the PDF broadens with time in the $x$-direction since $\delta x(t) = \delta x(0) + t\,\delta v(0)$. As a result, $\Sigma_{xv}$ increases linearly with time. $\Sigma_{xv} \neq 0$ causes a rapid initial increase in the information flow $T_{v\to x} \neq 0$ in Figure 1b. However, as $t \to \infty$, $T_{v\to x} \to 0$ since $\Sigma_{xx}$ increases faster than $\Sigma_{xv}$ in time, leading to $T_{v\to x} = \Sigma_{xv}/\Sigma_{xx} \to 0$ (see Equation (65)). Thus, $T_{v\to x}$ fails to reflect the feedback from $v$ to $x$ in the long-time limit. In contrast, $\Gamma_{v\to x}$ monotonically increases with time, approaching a constant value ($\Gamma_{v\to x} \approx 5$) as $t \to \infty$. On the other hand, $\Gamma_{x\to v} = T_{x\to v} = 0$ at all times, reflecting the lack of coupling from $x$ to $v$, consistent with our expectation. That is, the lack of feedback from $x$ on $v$ is reflected in both $\Gamma_{x\to v} = T_{x\to v} = 0$, while the one-way coupling of $v$ to $x$ is captured at all times only by $\Gamma_{v\to x} \neq 0$.
To include the feedback of $x$ on $v$, we now consider the case $\gamma = 0$ and $\omega = 1$ in Figure 2. The non-zero value of $\omega$ ($=1$), the only difference from Figure 1, now establishes the two-way (mutual) communication between $x$ and $v$, leading to harmonic motion (see Equations (7) and (8)). Figure 2a shows how the PDF center drifts according to this harmonic motion, while the cross-correlation $\Sigma_{xv}(t) = 0$ at all times. The latter leads to the information flows $T_{x\to v} = T_{v\to x} = 0$ shown in Figure 2b. In contrast, $\Gamma_{x\to v}$ and $\Gamma_{v\to x}$ in Figure 2b exhibit oscillations with a 90-degree phase shift between the two due to the harmonic motion, capturing the two-way feedback between $x$ and $v$. To highlight an exact symmetry between $\Gamma_{x\to v}$ and $\Gamma_{v\to x}$, we can consider a global, path-dependent measure of causality by integrating $\Gamma_{x\to v}$ and $\Gamma_{v\to x}$ over the same integer multiples of the period ($2\pi/\omega$), which would give the same value. These results reveal that our causal information rate captures the dependence between $x$ and $v$ even in the absence of their cross-correlation. (We recall that zero cross-correlation does not imply independence.)

5.2. Equilibrium: v

We now consider the Ornstein-Uhlenbeck (O-U) process of $v$ by choosing $\gamma = 1$, $\omega = 0$ and $D \neq 0$ in Figure 3. In this case, $v$ asymptotically approaches its equilibrium distribution, where $\Sigma_{vv} = D/\gamma$, while $p(x,t)$ continues to evolve in time. Specifically, we choose $D = \Sigma_{xx}(0) = \Sigma_{vv}(0) = 0.01$ (see also Equation (70)). Figure 3a shows that the PDF broadens in the $x$-direction ($\Sigma_{xx} \propto t$), while keeping its original width in the $v$-direction. On the other hand, due to the non-zero $D \neq 0$, the cross-correlation $\Sigma_{xv}(t) > 0$ is seen to grow in time, approaching a constant value $0.01$ as $t \to \infty$. As in the case of Figure 1, $T_{x\to v}$ and $T_{v\to x}$ take finite values due to non-zero $\Sigma_{xv} \neq 0$ but become zero asymptotically as $t \to \infty$ (due to $\Sigma_{xx} \propto t$), as shown in Figure 3b. On the other hand, the behavior of $\Gamma_{x\to v}$ and $\Gamma_{v\to x}$ is quite similar.

5.3. Equilibrium: x and v

To ensure a quick evolution to the equilibrium distribution in time, we choose $D = 0.01$, $\omega = 1$, and $\gamma = 1$ so that the initial and final (equilibrium) PDFs have the same variance in Figure 4. Figure 4a shows that the PDF center undergoes a damped oscillation without changing its shape since $\Sigma_{xx}(t) = \Sigma_{vv}(t) = 0.01$ and $\Sigma_{xv}(t) = 0$ at all times. $\Sigma_{xv}(t) = 0$ leads to $T_{x\to v} = T_{v\to x} = 0$ in Figure 4b, as in the case of Figure 2b. In contrast, the drift of the PDF center leads to non-zero values of $\Gamma_{v\to x}$ and $\Gamma_{x\to v}$, which asymptotically approach $\omega = 1$ (see Equation (57)). The latter reveals that the equilibrium is maintained through the two-way communication between $x$ and $v$.

5.4. An Abrupt Change Introduced by an Impulse

In Reference [32], we showed that an abrupt event (modeled by an impulse function, with a peak at a certain time) caused a sudden increase in $E = \Gamma^2$ in all cases, while it caused a sudden increase in the magnitude of the information flow $|T_{i\to j}|$ ($i \neq j$) only when the perturbation affected entropy (variance). Furthermore, it was shown that the peak of $|T_{i\to j}|$ ($i \neq j$) followed (not preceded) the actual impulse peak, while the peak of $E = \Gamma^2$ tended to precede the impulse peak. This means that, by measuring the temporal change in $E$, especially when its peak appears, we can forecast the onset of an abrupt event (whose peak appears later than the $E$-peak in time). We now look at the effect of a sudden impulse on the causal information rate.
To this end, we introduce a sudden perturbation to the Kramers equation by adding an impulse function $u(t)$ as a time-dependent additive force to Equation (8) as follows:
$\frac{dv}{dt} = -\gamma v - \omega^2 x + u(t) + \xi, \qquad u(t) = \frac{1}{c\sqrt{\pi}}\, e^{-\left(\frac{t - t_0}{c}\right)^2}.$  (71)
We use the analytical expressions for the mean values and covariance in Reference [32] and choose the parameter values $c = 0.1$ and $t_0 = 4$ in Equation (71) and $D = 0.01$, $\omega = \gamma = 1$ (the same as in Figure 4). The results are shown in Figure 5, where the impulse function $u(t)$ localized around $t = 4$ is shown as a red dotted line, using the right $y$-axis, in the bottom panels of Figure 5b.
Figure 5a shows that $u(t)$ causes a sudden drift of the PDF center with no change in variance. Therefore, $T_{x\to v}(t) = T_{v\to x}(t) = 0$ in Figure 5b, with no influence of $u(t)$. In sharp contrast, $\Gamma_{x\to v}$ and $\Gamma_{v\to x}$ exhibit an abrupt change around the time $t = 4$. Furthermore, the peak in $\Gamma_{x\to v}$ (or $\Gamma_{v\to x}$) tends to precede the impulse peak (in the red dotted line). We observe similar results when an impulse is applied to the covariance matrix (results not shown). These results thus suggest that our causal information rate is sensitive to the perturbation (in both mean and variance) and predicts the onset of a sudden event very well, especially in comparison with the information flow. We emphasize that the information flow (and other entropy-based methods) cannot detect the onset of a sudden event that does not affect entropy (e.g., Reference [32] for different examples).
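To include the impulse of Equation (71) in a numerical computation, only the mean-velocity equation changes, since the deterministic forcing $u(t)$ does not affect the covariance equations. A hedged sketch of the modification, mirroring the moment equations used earlier in this section, is:

```python
import numpy as np

def impulse(t, c=0.1, t0=4.0):
    """Equation (71): u(t) = exp(-((t - t0)/c)^2) / (c sqrt(pi))."""
    return np.exp(-((t - t0) / c)**2) / (c * np.sqrt(np.pi))

def moments_rhs_with_impulse(t, y, gamma=1.0, omega=1.0, D=0.01):
    """Moment equations of the Kramers model with the impulse added to d<v>/dt only."""
    mx, mv, Sxx, Sxv, Svv = y
    return [mv,
            -gamma * mv - omega**2 * mx + impulse(t),   # u(t) enters the mean of v only
            2 * Sxv,
            Svv - gamma * Sxv - omega**2 * Sxx,
            -2 * gamma * Svv - 2 * omega**2 * Sxv + 2 * D]
```

When integrating this with an adaptive solver, a small maximum step (e.g., `max_step=0.01` in `solve_ivp`) is advisable so that the narrow pulse around $t_0$ is not stepped over.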

6. Conclusions

Information geometry in general concerns the distinguishability between two PDFs (e.g., constructed from data) and is sensitive to the local dynamics (e.g., Reference [27]), depending on the local arrangement (shape) of the PDFs. This is different from entropy, which is a global measure of a PDF, insensitive to such a local arrangement. When a PDF is a continuous function of time, the information rate and information length are helpful in understanding far-from-equilibrium phenomena in terms of the number of distinguishable statistical states that a system evolves through in time. Being very sensitive to the evolving dynamics, they enable us to compare different far-from-equilibrium processes using the same dimensionless distance, as well as to quantify the relations (correlation, self-regulation, etc.) among variables (e.g., References [27,28,29,30]).
In this paper, by extending our previous work [20,21,22,23,24,25,26,27,28,29,30,31], we introduced the causal information rate as a general information-geometric method that can elucidate causality relations in stochastic processes involving temporal variabilities and strong fluctuations. The key idea was to quantify the effect of one variable on the information rate of the other variable. The cross-correlation between the variables was shown to play a key role in the information flow, zero cross-correlation leading to zero information flow. In comparison, the causal information rate can take a non-zero value in the absence of cross-correlation. Since zero cross-correlation (measuring only the linear dependence) does not imply independence in general, this means that the causal information rate captures the (directional) dependence between two variables even when they are uncorrelated with each other.
Furthermore, the causal information rate captures the temporal change in both covariance matrix and mean value. In comparison, the information flow depends only on the temporal change in the covariance matrix. Thus, the causal information rate is a sensitive method for predicting an abrupt event and quantifying causal relations. These properties are welcome for predicting rare, large-amplitude events. Application has been made to the Kramers equation to highlight these points. Although the analysis in this paper is limited to the Gaussian variables that are entirely characterized by the mean and variance, similar results are likely to hold for non-Gaussian variables because the information rate captures the temporal changes of a PDF itself, while entropy-based measures (e.g., information flow) depend only on variance.
Given that causality (directional dependence) plays a crucial role in science and engineering (e.g., References [43,44]), our method could be useful in a wide range of problems. In particular, it could be utilized to elucidate causal relations among different players in nonlinear dynamical systems, fluid/plasma dynamics, laboratory plasmas, astrophysical systems, environmental science, finance, etc. For instance, in fluid/plasma turbulence, it could help resolve the controversy over causality in the low-to-high (L-H) confinement transition [29,30,45,46], as well as contribute to identifying a causal relationship among different players responsible for the onset of sudden abrupt events (e.g., fusion plasma eruptions) (e.g., References [47,48]), with a better chance of control. It could also elucidate causal relationships among different physiological signals, how different parts of the human body (e.g., the brain–heart connection) are self-regulated to maintain homeostasis (the optimal living condition for survival), and how this homeostasis degrades with the onset of diseases.
Finally, it will be interesting to investigate the effects of coarse-graining in future works. In Reference [49], for the information geometry given by the Fisher metric, relevant directions were shown to be exactly maintained under coarse-graining, while irrelevant directions contract. The analysis for more than two variables will also be addressed in future work.

Author Contributions

Conceptualization, E.-j.K.; Investigation E.-j.K.; Methodology, E.-j.K., A.-J.G.-C.; Software, A.-J.G.-C.; Visualization, A.-J.G.-C.; Writing—original draft, E.-j.K. All authors have read and agreed to the submitted version of the manuscript.

Funding

This research received no funding.

Acknowledgments

E.-j.K. acknowledges the Leverhulme Trust Research Fellowship (RF- 2018-142-9) and Fei He for helpful discussion.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Solutions to the Kramers Equations

The general solution to the Kramers Equations (7)–(9) is
$x(t) = x_h(t) + \frac{1}{\Delta}\int_0^t dt_1\left[e^{\alpha_+(t - t_1)} - e^{\alpha_-(t - t_1)}\right]\xi(t_1),$  (A1)
$v(t) = v_h(t) + \frac{1}{\Delta}\int_0^t dt_1\left[\alpha_+ e^{\alpha_+(t - t_1)} - \alpha_- e^{\alpha_-(t - t_1)}\right]\xi(t_1),$  (A2)
where $x_h(t)$ and $v_h(t)$ are the homogeneous solutions and
$\alpha_\pm = \frac{1}{2}\left(-\gamma \pm \sqrt{\gamma^2 - 4\omega^2}\right) \equiv \frac{1}{2}\left(-\gamma \pm \Delta\right),$  (A3)
where $\Delta = \sqrt{\gamma^2 - 4\omega^2}$. For the initial conditions $x(t=0) = x(0)$ and $v(t=0) = v(0)$, they are
$x_h(t) = \frac{1}{\Delta}\left[\left(-\alpha_- e^{\alpha_+ t} + \alpha_+ e^{\alpha_- t}\right)x(0) + \left(e^{\alpha_+ t} - e^{\alpha_- t}\right)v(0)\right],$  (A4)
$v_h(t) = \frac{1}{\Delta}\left[\alpha_+\alpha_-\left(-e^{\alpha_+ t} + e^{\alpha_- t}\right)x(0) + \left(\alpha_+ e^{\alpha_+ t} - \alpha_- e^{\alpha_- t}\right)v(0)\right].$  (A5)
By taking the averages of Equations (A1) and (A2) with the help of Equations (A4) and (A5), we obtain the evolution of the mean values as
$\langle x(t)\rangle = \frac{1}{\Delta}\left[\left(-\alpha_- e^{\alpha_+ t} + \alpha_+ e^{\alpha_- t}\right)\langle x(0)\rangle + \left(e^{\alpha_+ t} - e^{\alpha_- t}\right)\langle v(0)\rangle\right],$  (A6)
$\langle v(t)\rangle = \frac{1}{\Delta}\left[\alpha_+\alpha_-\left(-e^{\alpha_+ t} + e^{\alpha_- t}\right)\langle x(0)\rangle + \left(\alpha_+ e^{\alpha_+ t} - \alpha_- e^{\alpha_- t}\right)\langle v(0)\rangle\right].$  (A7)
The elements of the equal-time covariance matrix $\Sigma(t)$ are obtained by subtracting Equations (A6) and (A7) from Equations (A1) and (A2), respectively, and then by multiplying and taking averages using Equation (9):
$\Sigma_{xx} = \langle(\delta x(t))^2\rangle = \langle(\delta x_h(t))^2\rangle + \frac{2D}{\Delta^2}\int_0^t dt_1\left[e^{\alpha_+(t-t_1)} - e^{\alpha_-(t-t_1)}\right]^2 = \langle(\delta x_h(t))^2\rangle + \frac{D}{\Delta^2}\left[\frac{1}{\alpha_+}\left(e^{2\alpha_+ t} - 1\right) + \frac{1}{\alpha_-}\left(e^{2\alpha_- t} - 1\right) - \frac{4}{\alpha_+ + \alpha_-}\left(e^{(\alpha_+ + \alpha_-)t} - 1\right)\right],$  (A8)
$\Sigma_{vv} = \langle(\delta v(t))^2\rangle = \langle(\delta v_h(t))^2\rangle + \frac{2D}{\Delta^2}\int_0^t dt_1\left[\alpha_+ e^{\alpha_+(t-t_1)} - \alpha_- e^{\alpha_-(t-t_1)}\right]^2 = \langle(\delta v_h(t))^2\rangle + \frac{D}{\Delta^2}\left[\alpha_+\left(e^{2\alpha_+ t} - 1\right) + \alpha_-\left(e^{2\alpha_- t} - 1\right) - \frac{4\alpha_+\alpha_-}{\alpha_+ + \alpha_-}\left(e^{(\alpha_+ + \alpha_-)t} - 1\right)\right],$  (A9)
$\Sigma_{xv} = \langle\delta x(t)\,\delta v(t)\rangle = \langle\delta x_h(t)\,\delta v_h(t)\rangle + \frac{2D}{\Delta^2}\int_0^t dt_1\left[e^{\alpha_+(t-t_1)} - e^{\alpha_-(t-t_1)}\right]\left[\alpha_+ e^{\alpha_+(t-t_1)} - \alpha_- e^{\alpha_-(t-t_1)}\right] = \langle\delta x_h(t)\,\delta v_h(t)\rangle + \frac{4D e^{-\gamma t}}{\Delta^2}\sinh^2\!\left(\frac{\Delta t}{2}\right).$  (A10)
Here,
$\langle(\delta x_h(t))^2\rangle = \frac{1}{\Delta^2}\Big[\left(-\alpha_- e^{\alpha_+ t} + \alpha_+ e^{\alpha_- t}\right)^2\Sigma_{xx}(0) + \left(e^{\alpha_+ t} - e^{\alpha_- t}\right)^2\Sigma_{vv}(0) + 2\left(e^{\alpha_+ t} - e^{\alpha_- t}\right)\left(-\alpha_- e^{\alpha_+ t} + \alpha_+ e^{\alpha_- t}\right)\Sigma_{xv}(0)\Big],$  (A11)
$\langle(\delta v_h(t))^2\rangle = \frac{1}{\Delta^2}\Big[\alpha_+^2\alpha_-^2\left(-e^{\alpha_+ t} + e^{\alpha_- t}\right)^2\Sigma_{xx}(0) + \left(\alpha_+ e^{\alpha_+ t} - \alpha_- e^{\alpha_- t}\right)^2\Sigma_{vv}(0) + 2\alpha_+\alpha_-\left(-e^{\alpha_+ t} + e^{\alpha_- t}\right)\left(\alpha_+ e^{\alpha_+ t} - \alpha_- e^{\alpha_- t}\right)\Sigma_{xv}(0)\Big],$  (A12)
$\langle\delta x_h(t)\,\delta v_h(t)\rangle = \frac{1}{\Delta^2}\Big[\alpha_+\alpha_-\left(-\alpha_- e^{\alpha_+ t} + \alpha_+ e^{\alpha_- t}\right)\left(-e^{\alpha_+ t} + e^{\alpha_- t}\right)\Sigma_{xx}(0) + \left\{-\alpha_+\alpha_-\left(e^{\alpha_+ t} - e^{\alpha_- t}\right)^2 + \left(\alpha_+ e^{\alpha_+ t} - \alpha_- e^{\alpha_- t}\right)\left(-\alpha_- e^{\alpha_+ t} + \alpha_+ e^{\alpha_- t}\right)\right\}\Sigma_{xv}(0) + \left(e^{\alpha_+ t} - e^{\alpha_- t}\right)\left(\alpha_+ e^{\alpha_+ t} - \alpha_- e^{\alpha_- t}\right)\Sigma_{vv}(0)\Big],$  (A13)
where $\Sigma_{xx}(0) = \langle(\delta x(0))^2\rangle$, $\Sigma_{vv}(0) = \langle(\delta v(0))^2\rangle$, and $\Sigma_{xv}(0) = \langle\delta x(0)\,\delta v(0)\rangle$.
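As an illustration, Equations (A3), (A6) and (A7) can be evaluated directly; the transcription below is our own and uses complex arithmetic so that the underdamped case $\gamma^2 < 4\omega^2$, where $\Delta$ and $\alpha_\pm$ are complex, is handled automatically (the imaginary parts cancel in the final result):

```python
import numpy as np

def mean_xv(t, x0, v0, gamma, omega):
    """<x(t)>, <v(t)> from Equations (A6)-(A7); complex alpha_+- handle gamma^2 < 4 omega^2."""
    Delta = np.sqrt(complex(gamma**2 - 4 * omega**2))
    ap, am = 0.5 * (-gamma + Delta), 0.5 * (-gamma - Delta)
    ep, em = np.exp(ap * t), np.exp(am * t)
    mx = ((-am * ep + ap * em) * x0 + (ep - em) * v0) / Delta
    mv = (ap * am * (-ep + em) * x0 + (ap * ep - am * em) * v0) / Delta
    return mx.real, mv.real

print(mean_xv(t=1.0, x0=0.5, v0=0.5, gamma=1.0, omega=1.0))
```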

Appendix B. Derivation of Equation (25)

We let $\phi = \frac{1}{2}\Sigma_{ij}^{-1}(t_1,t)\,(X_i - \langle X_i\rangle)(X_j - \langle X_j\rangle)$ so that Equation (21) becomes
$p(x,t_1; v,t) = \frac{1}{2\pi\sqrt{|\Sigma(t_1,t)|}}\exp(-\phi).$  (A14)
Then, we have
$\frac{\partial_{t_1}p}{p} = -\frac{\partial_{t_1}|\Sigma|}{2|\Sigma|} - \partial_{t_1}\phi,$  (A15)
$\left(\frac{\partial_{t_1}p}{p}\right)^2 = \frac{(\partial_{t_1}|\Sigma|)^2}{4|\Sigma|^2} + \partial_{t_1}\phi\,\frac{\partial_{t_1}|\Sigma|}{|\Sigma|} + (\partial_{t_1}\phi)^2,$  (A16)
$\int dx\, dv\, p\left(\frac{\partial_{t_1}p}{p}\right)^2 = \frac{(\partial_{t_1}|\Sigma|)^2}{4|\Sigma|^2} + \langle\partial_{t_1}\phi\rangle\frac{\partial_{t_1}|\Sigma|}{|\Sigma|} + \langle(\partial_{t_1}\phi)^2\rangle,$  (A17)
where $\dot{\Sigma} = \partial_{t_1}\Sigma$. Using $\partial_{t_1}\partial_{t_1} e^{-\phi} = -(\partial_{t_1}\partial_{t_1}\phi)\,e^{-\phi} + (\partial_{t_1}\phi)^2 e^{-\phi}$ and $\int dx\, dv\, e^{-\phi} = 2\pi\sqrt{|\Sigma|}$, we have
$\langle\partial_{t_1}\phi\rangle = -\frac{1}{2\pi\sqrt{|\Sigma|}}\,\partial_{t_1}\!\int dx\, dv\, e^{-\phi} = -\frac{\partial_{t_1}|\Sigma|}{2|\Sigma|},$  (A18)
$\langle(\partial_{t_1}\phi)^2\rangle = -\frac{1}{4}\frac{(\partial_{t_1}|\Sigma|)^2}{|\Sigma|^2} + \frac{1}{2}\frac{\partial_{t_1}^2|\Sigma|}{|\Sigma|} + \langle\partial_{t_1}\partial_{t_1}\phi\rangle.$  (A19)
Thus, by using Equations (A18) and (A19) in Equation (A17), we have
$E_x^*(t) = \lim_{t_1\to t^+}\left[\frac{1}{2}\partial_{t_1}\!\left(\frac{\partial_{t_1}|\Sigma|}{|\Sigma|}\right) + \langle\partial_{t_1}\partial_{t_1}\phi\rangle\right] = \lim_{t_1\to t^+}\left[\frac{1}{2}\mathrm{Tr}\!\left[\partial_{t_1}\!\left(\Sigma^{-1}\dot{\Sigma}\right)\right] + \langle\partial_{t_1}\partial_{t_1}\phi\rangle\right],$  (A20)
where we used $\partial_{t_1}|\Sigma| = |\Sigma|\,\mathrm{Tr}\!\left[\Sigma^{-1}\dot{\Sigma}\right]$. To calculate $\langle\partial_{t_1}\partial_{t_1}\phi\rangle$, we use
$\partial_{t_1}\phi = \partial_{t_1}(\delta X_i)\,\Sigma_{ij}^{-1}\,\delta X_j + \frac{1}{2}\delta X_i\,(\partial_{t_1}\Sigma_{ij}^{-1})\,\delta X_j,$  (A21)
$\partial_{t_1}\partial_{t_1}\phi = \partial_{t_1}\partial_{t_1}(\delta X_i)\,\Sigma_{ij}^{-1}\,\delta X_j + \frac{1}{2}\delta X_i\,(\partial_{t_1}\partial_{t_1}\Sigma_{ij}^{-1})\,\delta X_j + \partial_{t_1}(\delta X_i)\,\Sigma_{ij}^{-1}\,\partial_{t_1}(\delta X_j) + 2\,\partial_{t_1}(\delta X_i)\,\partial_{t_1}(\Sigma_{ij}^{-1})\,\delta X_j,$  (A22)
where $i,j = 1,2$ and the symmetry $\Sigma_{ij} = \Sigma_{ji}$ is used. Since $\partial_{t_1}(\delta X_i) = -\partial_{t_1}\langle X_i\rangle$, etc., the average of Equation (A22) is simplified as
$\langle\partial_{t_1}\partial_{t_1}\phi\rangle = \partial_{t_1}\langle X_i\rangle\,\Sigma_{ij}^{-1}\,\partial_{t_1}\langle X_j\rangle + \frac{1}{2}\langle\delta X_i\,(\partial_{t_1}\partial_{t_1}\Sigma_{ij}^{-1})\,\delta X_j\rangle = \partial_{t_1}\langle X_i\rangle\,\Sigma_{ij}^{-1}\,\partial_{t_1}\langle X_j\rangle + \frac{1}{2}\mathrm{Tr}\!\left[(\partial_{t_1}\partial_{t_1}\Sigma^{-1})\,\Sigma\right],$  (A23)
where $\langle\delta X_i\, G_{ij}\,\delta X_j\rangle = \mathrm{Tr}[G\Sigma]$ is used. Then, by using $\partial_{t_1}\Sigma^{-1} = -\Sigma^{-1}\dot{\Sigma}\Sigma^{-1}$ twice in Equation (A23) and putting the results in Equation (A20), we obtain Equation (25) in the text.
The derivation above is general. For the equal-time joint PDF, a similar analysis, or simply putting $t_1 = t$ in Equation (25), gives us Equation (16). Furthermore, by taking $\langle X_i\rangle$ and $\Sigma$ to be scalars as $\langle X_i\rangle = \langle x\rangle\,\delta_{i1}$ and $\Sigma = \Sigma_{xx}$ in Equation (16), we obtain Equation (17). Similarly, taking $\langle X_i\rangle = \langle v\rangle\,\delta_{i2}$ and $\Sigma = \Sigma_{vv}$ in Equation (16) gives us Equation (18).

Appendix C. Comment on Equation (47)

In Equation (47), $\delta v(t)$ consists of the homogeneous part $\delta v_h(t)$ and the inhomogeneous part $\delta v_I(t)$, due to the initial conditions and $\xi$, respectively, evolving according to
$\partial_t\,\delta v_h(t) = -\gamma\,\delta v_h(t) - \omega^2\,\delta x_h(t),$  (A24)
$\partial_t\,\delta v_I(t) = -\gamma\,\delta v_I(t) - \omega^2\,\delta x_I(t) + \xi.$  (A25)
Thus, by using Equations (A24) and (A25), we calculate $\dot{\Sigma}_{vv}$ in Equation (47) as follows:
$\dot{\Sigma}_{vv}(t) = 2\langle(\partial_t\,\delta v(t))\,\delta v(t)\rangle = 2\langle\partial_t[\delta v_h(t) + \delta v_I(t)]\,(\delta v_h(t) + \delta v_I(t))\rangle = 2\langle\partial_t(\delta v_h(t))\,\delta v_h(t)\rangle + 2\langle\partial_t(\delta v_I(t))\,\delta v_I(t)\rangle = 2\left[-\gamma\Sigma_{vv,h} - \omega^2\Sigma_{xv,h} - \gamma\Sigma_{vv,I} - \omega^2\Sigma_{xv,I} + D\right] = 2\left(-\gamma\Sigma_{vv} - \omega^2\Sigma_{xv} + D\right).$  (A26)
Here, we used $\langle\delta v_h(t)\,\delta v_I(t)\rangle = 0$; $\Sigma_{vv,h} = \langle(\delta v_h(t))^2\rangle$, $\Sigma_{vv,I} = \langle(\delta v_I(t))^2\rangle$, $\Sigma_{vv} = \Sigma_{vv,h} + \Sigma_{vv,I}$, and similarly for $\Sigma_{xv}$.

References

1. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 623.
2. Frieden, B.R. Science from Fisher Information; Cambridge University Press: Cambridge, UK, 2004.
3. Kullback, S.; Leibler, R.A. On Information Theory and Statistics. Ann. Math. Stat. 1951, 22, 79–86.
4. Parr, T.; Da Costa, L.; Friston, K. Markov blankets, information geometry and stochastic thermodynamics. Philos. Trans. Royal Soc. A 2019, 378, 20190159.
5. Oizumi, M.; Tsuchiya, N.; Amari, S. Unified framework for information integration based on information geometry. Proc. Natl. Acad. Sci. USA 2016, 113, 14817.
6. Landauer, R. Irreversibility and heat generation in the computing process. IBM J. Res. Dev. 1961, 5, 183–191.
7. Leff, H.S.; Rex, A.F. Maxwell’s Demon: Entropy, Information, Computing; Princeton University Press: Princeton, NJ, USA, 1990.
8. Bekenstein, J.D. How does the entropy/information bound work? Found. Phys. 2005, 35, 1805.
9. Capozziello, S.; Luongo, O. Information entropy and dark energy evolution. Int. J. Mod. Phys. 2018, 27, 1850029.
10. Sagawa, T. Thermodynamics of Information Processing in Small Systems; Springer: Berlin, Germany, 2012.
11. Haas, K.R.; Yang, H.; Chu, J.-W. Trajectory Entropy of Continuous Stochastic Processes at Equilibrium. J. Phys. Chem. Lett. 2014, 5, 999.
12. Van den Broeck, C. Stochastic thermodynamics: A brief introduction. Phys. Complex Colloids 2013, 184, 155–193.
13. Amigó, J.M.; Balogh, S.G.; Hernández, M. A Brief Review of Generalized Entropies. Entropy 2018, 20, 813.
14. Bérut, A.; Arakelyan, A.; Petrosyan, A.; Ciliberto, S.; Dillenschneider, R.; Lutz, E. Experimental verification of Landauer’s principle linking information and thermodynamics. Nature 2012, 483, 187.
15. Jarzynski, C. Nonequilibrium Equality for Free Energy Differences. Phys. Rev. Lett. 1997, 78, 2690–2693.
16. Horowitz, J.M.; Sandberg, H. Second-law-like inequalities with information and their interpretations. N. J. Phys. 2014, 16, 125007.
17. Nicholson, S.B.; García-Pintos, L.P.; del Campo, A.; Green, J.R. Time-information uncertainty relations in thermodynamics. Nat. Phys. 2020, 16, 1211–1215.
18. Davies, P. Does new physics lurk inside living matter? Phys. Today 2020, 73, 34–41.
19. Flego, S.P.; Frieden, B.R.; Plastino, A.; Plastino, A.R.; Soffer, B.H. Nonequilibrium thermodynamics and Fisher information: Sound wave propagation in a dilute gas. Phys. Rev. E 2003, 68, 016105.
20. Kim, E. Investigating Information Geometry in Classical and Quantum Systems through Information Length. Entropy 2018, 20, 574.
21. Heseltine, J.; Kim, E. Novel mapping in non-equilibrium stochastic processes. J. Phys. A 2016, 49, 175002.
22. Kim, E.; Lee, U.; Heseltine, J.; Hollerbach, R. Geometric structure and geodesic in a solvable model of nonequilibrium process. Phys. Rev. E 2016, 93, 062127.
23. Kim, E.; Hollerbach, R. Signature of nonlinear damping in geometric structure of a nonequilibrium process. Phys. Rev. E 2017, 95, 022137.
24. Kim, E.; Hollerbach, R. Geometric structure and information change in phase transitions. Phys. Rev. E 2017, 95, 062107.
25. Kim, E.; Jacquet, Q.; Hollerbach, R. Information geometry in a reduced model of self-organised shear flows without the uniform coloured noise approximation. J. Stat. Mech. 2019, 2019, 023204.
26. Anderson, J.; Kim, E.; Hnat, B.; Rafiq, T. Elucidating plasma dynamics in Hasegawa-Wakatani turbulence by information geometry. Phys. Plasmas 2020, 27, 022307.
27. Heseltine, J.; Kim, E. Comparing information metrics for a coupled Ornstein-Uhlenbeck process. Entropy 2019, 21, 775.
28. Kim, E.; Heseltine, J.; Liu, H. Information length as a useful index to understand variability in the global circulation. Mathematics 2020, 8, 299.
29. Kim, E.; Hollerbach, R. Time-dependent probability density functions and information geometry of the low-to-high confinement transition in fusion plasma. Phys. Rev. Res. 2020, 2, 023077.
30. Hollerbach, R.; Kim, E.; Schmitz, L. Time-dependent probability density functions and information diagnostics in forward and backward processes in a stochastic prey-predator model of fusion plasmas. Phys. Plasmas 2020, 27, 102301.
31. Guel-Cortez, A.J.; Kim, E. Information Length Analysis of Linear Autonomous Stochastic Processes. Entropy 2020, 22, 1265.
32. Guel-Cortez, A.J.; Kim, E. Information geometric theory in the prediction of abrupt changes in system dynamics. Entropy 2021, 23, 694.
33. Bossomaier, T.; Barnett, L.; Harré, M.; Lizier, J.T. An Introduction to Transfer Entropy: Information Flow in Complex Systems; Springer International: Cham, Switzerland, 2016; ISBN 9783319432212.
34. Shorten, D.P.; Spinney, R.E.; Lizier, J.T. Estimating Transfer Entropy in Continuous Time Between Neural Spike Trains or Other Event-Based Data. bioRxiv 2020.
35. Granger, C.J. Investigating Causal Relations by Econometric Models and Cross-Spectral Methods. Econometrica 1967, 37, 424.
36. Barnet, L.; Barrett, A.B.; Seth, A.K. Granger Causality and Transfer Entropy Are Equivalent for Gaussian Variables. Phys. Rev. Lett. 2009, 103, 238701.
37. He, F.; Billings, S.A.; Wei, H.L.; Sarrigiannis, P.G. A nonlinear causality measure in the frequency domain: Nonlinear partial directed coherence with applications to EEG. J. Neurosci. Meth. 2014, 225, 71.
38. Zhao, Y.; Billings, S.A.; Wei, H.; He, F.; Sarrigiannis, P.G. A new NARX-based Granger linear and nonlinear casual influence detection method with applications to EEG data. J. Neurosci. Meth. 2013, 212, 79.
39. Smirnov, D.A. Spurious causalities with transfer entropy. Phys. Rev. E 2013, 87, 042917.
40. Liang, X.S. Information flow and causality as rigorous notions ab initio. Phys. Rev. E 2016, 94, 052201.
41. Allahverdyan, A.E.; Janzing, D.; Mahler, G. Thermodynamic efficiency of information and heat flow. J. Stat. Mech. Theory Exp. 2009, 2009, 09011.
42. Risken, H. The Fokker-Planck Equation: Methods of Solutions and Applications; Springer: Berlin, Germany, 2013.
43. Friston, K.J.; Parr, T.; Zeidman, P.; Razi, A.; Flandin, G.; Daunizeau, J.; Hulme, O.J.; Billig, A.J.; Litvak, V.; Moran, R.J.; et al. Dynamic causal modelling of COVID-19. [version 2; peer review: 2 approved]. Wellcome Open Res. 2020, 5, 89.
44. Kathpalia, A.; Nagaraj, N. Measuring Causality: The Science of Cause and Effect. Resonance 2021, 26, 191–210.
45. Schmitz, L. The role of turbulence-flow interactions in L- to H-mode transition dynamics: Recent progress. Nuclear Fusion 2017, 57, 025003.
46. Maggi, C.F.; Delabie, E.; Biewer, T.M.; Groth, M.; Hawkes, N.C.; Lehnen, M.; de la Luna, E.; McCormick, K.; Reux, C.; Rimini, F.; et al. L-H power threshold studies in JET with Be/W and C wall. Nuclear Fusion 2014, 54, 023007.
47. De Vries, P.C.; Johnson, M.F.; Alper, B.; Buratti, P.; Hender, T.C.; Koslowski, H.R.; Riccardo, V.; JET-EFDA Contributors. Survey of disruption causes at JET. Nuclear Fusion 2011, 51, 053018.
48. Kates-Harbeck, J.; Svyatkovskiy, A.; Tang, W. Predicting disruptive instabilities in controlled fusion plasmas through deep learning. Nature 2019, 568, 527.
49. Raju, A.; Machta, B.B.; Sethna, J.P. Information loss under coarse graining: A geometric approach. Phys. Rev. E 2018, 98, 052112.
Figure 1. Snapshots of PDFs, $\Sigma_{xx}(t)$, $\Sigma_{vv}(t)$, and $\Sigma_{xv}(t)$ in panel (a) (PDF snapshots are shown at the one-standard-deviation level, using different colors for different times); $T_{x\to v}$, $T_{v\to x}$, $\Gamma_{x\to v}$, $\Gamma_{v\to x}$ in panel (b). Parameter values are $D = 0$, $\gamma = 0$, and $\omega = 0$.
Figure 2. Snapshots of PDFs, $\Sigma_{xx}(t)$, $\Sigma_{vv}(t)$, and $\Sigma_{xv}(t)$ in panel (a) (PDF snapshots are shown at the one-standard-deviation level, using different colors for different times); $T_{x\to v}$, $T_{v\to x}$, $\Gamma_{x\to v}$, $\Gamma_{v\to x}$ in panel (b). Parameter values are $D = 0$, $\gamma = 0$, and $\omega = 1$.
Figure 3. Snapshots of PDFs, $\Sigma_{xx}(t)$, $\Sigma_{vv}(t)$, and $\Sigma_{xv}(t)$ in panel (a) (PDF snapshots are shown at the one-standard-deviation level, using different colors for different times); $T_{x\to v}$, $T_{v\to x}$, $\Gamma_{x\to v}$, $\Gamma_{v\to x}$ in panel (b). Parameter values are $D = 0.01$, $\gamma = 1$, and $\omega = 0$.
Figure 4. Snapshots of PDFs, $\Sigma_{xx}(t)$, $\Sigma_{vv}(t)$, and $\Sigma_{xv}(t)$ in panel (a) (PDF snapshots are shown at the one-standard-deviation level, using different colors for different times); $T_{x\to v}$, $T_{v\to x}$, $\Gamma_{x\to v}$, $\Gamma_{v\to x}$ in panel (b). Parameter values are $D = 0.01$, $\gamma = 1$, $\omega = 1$.
Figure 5. Snapshots of PDFs, $\Sigma_{xx}(t)$, $\Sigma_{vv}(t)$, and $\Sigma_{xv}(t)$ in panel (a) (PDF snapshots are shown at the one-standard-deviation level, using different colors for different times); $T_{x\to v}$, $T_{v\to x}$, $\Gamma_{x\to v}$, $\Gamma_{v\to x}$ in panel (b). Parameter values are $D = 0.01$, $\gamma = 1$, $\omega = 1$, with an impulse $u(t) \neq 0$ with $c = 0.1$ and $t_0 = 4$.