Article

Generalized Ordinal Patterns and the KS-Entropy

Institute of Mathematics, University of Lübeck, D-23562 Lübeck, Germany
*
Author to whom correspondence should be addressed.
Submission received: 1 July 2021 / Revised: 18 August 2021 / Accepted: 20 August 2021 / Published: 23 August 2021

Abstract

Ordinal patterns classifying real vectors according to the order relations between their components are an interesting basic concept for determining the complexity of a measure-preserving dynamical system. In particular, as shown by C. Bandt, G. Keller and B. Pompe, the permutation entropy based on the probability distributions of such patterns is equal to Kolmogorov–Sinai entropy in simple one-dimensional systems. The general reason for this is that, roughly speaking, the system of ordinal patterns obtained for a real-valued “measuring arrangement” has high potential for separating orbits. Starting from a slightly different approach of A. Antoniouk, K. Keller and S. Maksymenko, we discuss the generalizations of ordinal patterns providing enough separation to determine the Kolmogorov–Sinai entropy. For defining these generalized ordinal patterns, the idea is to substitute the basic binary relation ≤ on the real numbers by another binary relation. Generalizing the former results of I. Stolz and K. Keller, we establish conditions that the binary relation and the dynamical system have to fulfill so that the obtained generalized ordinal patterns can be used for estimating the Kolmogorov–Sinai entropy.

1. Introduction

In 2002, Bandt and Pompe introduced the so-called permutation entropy [1]. This entropy has become well established in non-linear dynamical systems theory and time series analysis, including applications in many fields from biomedicine to econophysics (compare with Zanin et al. [2]). It is a crucial point that permutation entropy is theoretically justified by asymptotic results relating it to Kolmogorov–Sinai entropy (KS entropy, also called metric entropy), which is the central complexity measure for dynamical systems. The important relationship between permutation entropy and KS entropy was first observed and mathematically founded for piecewise monotone dynamical systems by Bandt et al. [3].
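The Bandt–Pompe procedure described above can be sketched in a few lines of code. The following is a minimal illustration only; the function names and the toy series are our own and not taken from [1]:

```python
import math
from collections import Counter

def ordinal_pattern(window):
    """The permutation sorting the window (ties broken by time index)."""
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))

def permutation_entropy(series, n=3):
    """Empirical Shannon entropy (in nats) of the ordinal patterns of length n."""
    counts = Counter(ordinal_pattern(series[i:i + n])
                     for i in range(len(series) - n + 1))
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# A monotone series realizes a single ordinal pattern, hence zero entropy.
print(permutation_entropy([1, 2, 3, 4, 5, 6], n=3))  # -> 0.0
```

More irregular series distribute their windows over several patterns and therefore yield positive values.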
The (empirical) concept of permutation entropy is based upon analyzing the distribution of ordinal patterns in a time series or the underlying system. In this paper, we concentrate on a measure-preserving dynamical system $(\Omega, \mathcal{A}, \mu, T)$, i.e., a probability space $(\Omega, \mathcal{A}, \mu)$ equipped with a measurable map $T: \Omega \to \Omega$ satisfying $\mu(T^{-1}(A)) = \mu(A)$ for all $A \in \mathcal{A}$.
Given a random variable $X: \Omega \to \mathbb{R}$, in this paper, an ordinal pattern of length $n \in \mathbb{N}$ with respect to $X$ is considered as a subset of the state space $\Omega$. It is indicated by a permutation $\pi = (\pi_0, \pi_1, \dots, \pi_{n-1})$ of $\{0, 1, \dots, n-1\}$ and defined by
$$P_\pi := \{\omega \in \Omega \mid X(T^{\pi_0}(\omega)) \le X(T^{\pi_1}(\omega)) \le \dots \le X(T^{\pi_{n-1}}(\omega))\}. \quad (1)$$
(Usually, ordinal patterns are defined in the range of $X$, i.e., for the vectors $(X(\omega), X(T(\omega)), \dots, X(T^{n-1}(\omega)))$.) The collection of ordinal patterns
$$OP(n) := \{P_\pi \mid \pi \text{ is a permutation of length } n\}$$
is a partition of $\Omega$.
In the rest of this section, we assume that $X$ preserves enough information about the given system in a certain sense. This is particularly the case if $\Omega$ is contained in $\mathbb{R}$ and $X$ is the identity map. A precise general description of this assumption is given when presenting the results of this paper. It was shown in [4] that, under not too restrictive further conditions, the probability distributions on the partitions $OP(n)$ for $n \in \mathbb{N}$ can be used for determining the KS entropy of the given system. The reason is that, roughly speaking, under these conditions, $OP(n)$ is able to separate the orbits of the system as $n \to \infty$.
In order to address the problem that this paper is concerned with, we give a description of ordinal patterns slightly different from the above. One can determine to which ordinal pattern $P_\pi$ of length $n$ a point $\omega$ belongs if, for all $(s,t)$ in
$$E_n = \{(s,t) \in \mathbb{N}_0^2 \mid 0 \le s < t \le n-1\}, \quad (2)$$
one knows whether $X(T^s(\omega)) \le X(T^t(\omega))$ holds true or not. In other words, there exists a set $A \subseteq E_n$ such that
$$P_\pi = \bigcap_{(s,t) \in A} \{\omega \in \Omega \mid (X(T^s(\omega)), X(T^t(\omega))) \in R\} \;\cap \bigcap_{(s,t) \in E_n \setminus A} \{\omega \in \Omega \mid (X(T^s(\omega)), X(T^t(\omega))) \in \mathbb{R}^2 \setminus R\}, \quad (3)$$
where
$$R := \{(x,y) \in \mathbb{R}^2 \mid x > y\}. \quad (4)$$
The above set contains all the points $\omega \in \Omega$ that satisfy $X(T^s(\omega)) > X(T^t(\omega))$ for $(s,t) \in A$ and $X(T^s(\omega)) \le X(T^t(\omega))$ for $(s,t) \in E_n \setminus A$. Note that, given some arbitrary $A \subseteq E_n$, the set on the right hand side of (3) can be empty. In the case that it is non-empty, it coincides with some ordinal pattern $P_\pi$ of length $n$.
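The pairwise-comparison description can be made concrete: a window of measured values determines the set $A \subseteq E_n$ of time pairs whose value pair lies in $R = \{(x,y) \mid x > y\}$, and two windows realize the same ordinal pattern exactly if they determine the same set. A small sketch (the names are ours, for illustration only):

```python
from itertools import combinations

def comparison_set(window):
    """The subset A of E_n: time pairs (s, t), s < t, with
    (window[s], window[t]) in R = {(x, y) : x > y}."""
    n = len(window)
    return frozenset((s, t) for s, t in combinations(range(n), 2)
                     if window[s] > window[t])

# Two windows of the same "shape" determine the same comparison set.
print(comparison_set([0.2, 0.9, 0.5]))   # only the pair (1, 2) decreases
print(comparison_set([1.0, 3.0, 2.0]))   # same ordinal pattern, same set
```

Replacing the membership test `window[s] > window[t]` by membership in an arbitrary Borel set $R$ is exactly the generalization pursued below.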
While Equation (3) might be a bit more abstract than (1), it shows a way to generalize the concept of ordinal patterns by replacing the set $R$ in (4) by some arbitrary Borel subset $R$ of $\mathbb{R}^2$, and also a way to investigate why ordinal patterns are so successful.
Definition 1.
We call a non-empty Borel subset $R$ of $\mathbb{R}^2$ a discriminating relation.
The figures given in this paper show different discriminating relations $R$. In each case, only the part of $R$ contained in $[0,1[^2$ is presented. Note that, in the case that $X$ maps $\Omega$ into $[0,1[$, the restriction of $R$ itself to this part would not change anything. Figure 1a illustrates $R$ as given in (4), again only on $[0,1[^2$. In the case of such an $R$, note that $\tan(\pi(X - 1/2))$ mapping $[0,1[$ into $[-\infty, \infty[$ would not make a difference to a given $X$ for our considerations, since order relations and associated partitions are preserved.
Given some discriminating relation, generalized ordinal patterns of length $n$ with respect to $X$ are given as the non-empty sets defined by the right hand side of (3) for some $A \subseteq E_n$. Obviously, they also form a partition of $\Omega$. The question that arises is what a discriminating relation $R$ should look like such that those generalized ordinal patterns exhibit the same nice properties the original ordinal patterns had. More precisely, we ask the following question:
Main Question.
Under what conditions on a discriminating relation $R$ do the partitions given by the generalized ordinal patterns determine the KS entropy of a dynamical system?
Why is this determination of entropy, which is precisely described by formula (10) in Theorem 1, interesting? For answering this question, interpret $X$ as an observable, $\omega$ as the initial state of the given system and $X(\omega), X(T(\omega)), X(T^2(\omega)), X(T^3(\omega)), \dots$ as the measured values at times $0, 1, 2, 3, \dots$. Determining (generalized) ordinal patterns on the basis of those values is a symbolization, where a symbol obtained is the (generalized) ordinal pattern containing $\omega$. Generally, symbolization means a coarse-graining of the state space underlying a system, where each point is assigned one of finitely many given symbols. Instead of considering the precise development of the system, one is interested in the change of symbols in the course of time, justifying the name symbolic dynamics for the method. Note that a symbolization is equivalent to partitioning the state space into classes of states (with the same symbol).
The reason for obtaining the full entropy from the (generalized) ordinal patterns is, roughly speaking, that the symbol system obtained has high potential for separating orbits. Such kinds of successful symbolizations are important, for example, in big data analysis, see, e.g., Smith et al. [5].
The above question was first considered in [6], where the authors basically showed that sets of the form
$$R = \{(x,y) \in \mathbb{R}^2 \mid g(x) \ge y\} \quad (5)$$
lead to generalized ordinal patterns that, under some conditions, can be used to determine the entropy if $g: \mathbb{R} \to \mathbb{R}$ is measurable and one-to-one. Such an $R$ is shown in Figure 1b and will be discussed in Section 4, as well as another $R$ illustrated in Figure 1.
In this paper, we consider general sets $R \subseteq \mathbb{R}^2$ that cannot necessarily be described by functions and inequalities and establish some conditions under which the entropy can be determined using those sets. As in [6], the discussion also includes a generalization of the sets $E_n$ given by (2) and is conducted in a multidimensional framework. In particular, the results give insights as to why the basic ordinal approach and its generalizations are working.
It is instructive to discuss the partition of $\mathbb{R}^2$ into $R$ and $\mathbb{R}^2 \setminus R$ from the viewpoint of symbolic dynamics. In contrast to classical symbolization approaches, which symbolize only in the range of single “measurements” $x$, the symbolization of pairs $(x,y)$ via the partition $\{R, \mathbb{R}^2 \setminus R\}$ also regards some kind of link between $x$ and $y$ if $R$ lies “diagonal” in a certain sense. We will discuss this constellation, which explains the success of ordinal patterns in a wider context, more precisely in Section 5.
A completely different constellation is given for the sets $R$ shown in Figure 2. Here, $R$ is obtained as a half-plane from a “horizontal” division of $\mathbb{R}^2$. If, for example, $\Omega = [0,1[$, $\mathcal{A}$ is the Borel σ-algebra and $\mu$ the Lebesgue measure on $\Omega$, and if $T$ is the tent map on $\Omega$, meaning that
$$T(\omega) = \begin{cases} 2\omega & \text{if } 0 \le \omega < \frac{1}{2}, \\ 2 - 2\omega & \text{if } \frac{1}{2} \le \omega < 1, \end{cases} \quad (6)$$
and $X$ is the identity map, then the location of the horizontal cut is substantial.
On the one hand, $R = \{(x,y) \in \mathbb{R}^2 \mid x \ge 2/3\}$ (Figure 2b) does not discriminate enough to obtain the KS entropy of the given system; on the other hand, there is enough discrimination by $R = \{(x,y) \in \mathbb{R}^2 \mid x \ge 1/2\}$ (Figure 2a) due to the fact that $\{[0, \frac{1}{2}[\,, [\frac{1}{2}, 1[\,\}$ is a generating partition for $T$. In the situation considered, there is no additional information given by the measurements $(x,y)$ relative to the measurement $x$; hence, $R$ provides nothing more than a classical symbolization. For a detailed discussion of these facts, see [6].
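The difference between the two cuts can also be observed numerically. The following Monte Carlo sketch (parameter choices are our own assumptions, not taken from the paper) estimates the block entropy rate of the tent map symbolized by a cut at $c$:

```python
import math
import random
from collections import Counter

def tent(x):
    """The tent map on [0, 1[."""
    return 2 * x if x < 0.5 else 2 - 2 * x

def block_entropy_rate(cut, m=8, samples=20000, seed=0):
    """Estimate H(m-block)/m (in nats) for the symbols s_t = [T^t(x) >= cut],
    sampling many short orbits from Lebesgue-random initial points."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(samples):
        x = rng.random()
        word = []
        for _ in range(m):
            word.append(x >= cut)
            x = tent(x)
        counts[tuple(word)] += 1
    return -sum(c / samples * math.log(c / samples) for c in counts.values()) / m

print(block_entropy_rate(0.5))    # close to log 2 ≈ 0.693 (generating cut)
print(block_entropy_rate(2 / 3))  # noticeably smaller (non-generating cut)
```

Many short orbits are used instead of one long orbit because, in floating point, long tent-map orbits degenerate; the cut at $1/2$ yields values near $\log 2$, the KS entropy of the tent map, while the cut at $2/3$ stays strictly below.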
The rest of this paper is organized as follows. Section 2 provides the notions and concepts necessary for formulating the main statement of this paper in Section 3. This statement is rather abstract and general and has to be considered in relation to some special cases discussed in Section 4, which make our ideas and findings transparent. Section 5 is devoted to the proof of our main statement.

2. Preliminaries

Throughout this paper, $(\Omega, \mathcal{A}, \mu, T)$ will be a measure-preserving dynamical system.

2.1. Some Notions

We will write $\mathcal{B} = \mathcal{B}(\mathbb{R})$ or $\mathcal{B}(\mathbb{R}^d)$ for the Borel σ-algebra on $\mathbb{R}$ or $\mathbb{R}^d$; $d \in \mathbb{N}$, respectively. Given a random variable $X: \Omega \to \mathbb{R}$, by $\mu_X$ we denote the push-forward measure of $\mu$ with regard to $X$, i.e., $\mu_X(A) := \mu(X^{-1}(A))$ for all $A \in \mathcal{B}(\mathbb{R})$. The measure $\mu_X \times \mu_X = \mu_X^2$ is the product measure, i.e., $\mu_X^2(A \times B) = \mu_X(A)\,\mu_X(B)$ for all $A, B \in \mathcal{B}(\mathbb{R})$.
For some Borel set $R \in \mathcal{B}(\mathbb{R}^2)$, we define the function $f_X^R: \mathbb{R} \to [0,1]$ by
$$f_X^R(x) := \mu(\{\omega \in \Omega \mid (x, X(\omega)) \in R\}).$$
If it is clear from the context which set $R \in \mathcal{B}(\mathbb{R}^2)$ is considered, we simply write $f_X$ instead of $f_X^R$. The function $f_X$ can be represented as the integral
$$f_X(x) = \int \mathbb{1}_R(x,y) \, d\mu_X(y).$$
Since $\mathbb{1}_R$ is integrable with regard to $\mu_X^2$, $f_X$ is integrable and, therefore, also measurable by Fubini’s Theorem.
The complement $\mathbb{R}^2 \setminus R$ of a set $R \subseteq \mathbb{R}^2$ will be denoted by $R^c$. The notation $\partial R$ will be used for the boundary of a set $R$, i.e., the closure of $R$ without its interior.

2.2. Entropy

The Shannon entropy of a finite partition $\mathcal{P} \subseteq \mathcal{A}$ of $\Omega$ is defined as
$$H(\mathcal{P}) := -\sum_{P \in \mathcal{P}} \mu(P) \log(\mu(P)).$$
The refinement of two partitions $\mathcal{P}, \mathcal{Q} \subseteq \mathcal{A}$ of $\Omega$ is given by
$$\mathcal{P} \vee \mathcal{Q} := \{P \cap Q \mid P \in \mathcal{P},\, Q \in \mathcal{Q}\}.$$
For a finite collection of partitions $\mathcal{P}_i \subseteq \mathcal{A}$, $i \in \{1, 2, \dots, n\}$ of $\Omega$, one analogously defines
$$\bigvee_{i=1}^n \mathcal{P}_i := \left\{\bigcap_{i=1}^n P_i \,\middle|\, P_i \in \mathcal{P}_i \text{ for all } i \in \{1, 2, \dots, n\}\right\}.$$
The entropy rate of a finite partition $\mathcal{P} \subseteq \mathcal{A}$ of $\Omega$ is defined as
$$h(T, \mathcal{P}) := \lim_{n \to \infty} \frac{1}{n} H\left(\bigvee_{t=0}^{n-1} T^{-t}(\mathcal{P})\right),$$
where $T^{-t}(\mathcal{P}) = \{T^{-t}(P) \mid P \in \mathcal{P}\}$. For the existence of the limit in the formula, see, e.g., [7]. We are interested in determining the Kolmogorov–Sinai entropy of a system, which is defined as
$$h(T) := \sup_{\mathcal{P}} h(T, \mathcal{P}),$$
where the supremum is taken over all finite partitions of $\Omega$ in $\mathcal{A}$.
Note that the Kolmogorov–Sinai entropy serves as the central complexity measure for dynamical systems and can be considered as a reference for other complexity measures, including in data analysis. Roughly speaking, it measures the mean information obtained by each iteration step. Since the Kolmogorov–Sinai entropy is the supremum of the entropy rates of all finite partitions, its determination and its estimation in a practical context are not easy, and it is of some interest to find natural finite partitions for which the entropy rate is near the Kolmogorov–Sinai entropy. This is also a motivation for considering ordinal patterns and their generalizations in this paper.
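The definitions above can be illustrated on a toy example: for two independent partitions, the entropy of the refinement is the sum of the entropies (a sketch with made-up cell measures):

```python
import math

def H(probs):
    """Shannon entropy (in nats) of a finite partition given its cell measures."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# cell measures of two partitions P and Q of a probability space
P = [0.5, 0.5]
Q = [0.25, 0.75]
# cell measures of the refinement P ∨ Q when P and Q are independent
PQ = [p * q for p in P for q in Q]
print(H(PQ), H(P) + H(Q))   # the two values coincide
```

For dependent partitions, $H(\mathcal{P} \vee \mathcal{Q})$ is strictly smaller than $H(\mathcal{P}) + H(\mathcal{Q})$, which is what the entropy rate exploits when refining along the orbit.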

2.3. σ -Algebras

Given a family of sets $A_i \in \mathcal{A}$, $i \in I$, by $\sigma(A_i \mid i \in I)$ we denote the smallest σ-algebra containing all sets $A_i$. Analogously, for a family of partitions $\mathcal{P}_i \subseteq \mathcal{A}$, $i \in I$ of $\Omega$, we define $\sigma(\mathcal{P}_i \mid i \in I)$ as the smallest σ-algebra containing all partitions $\mathcal{P}_i$ as subsets. Given $d$-dimensional random vectors $(Y_{i,1}, Y_{i,2}, \dots, Y_{i,d}): \Omega \to \mathbb{R}^d$, $i \in I$, we define
$$\sigma((Y_{i,1}, Y_{i,2}, \dots, Y_{i,d}) \mid i \in I) := \sigma((Y_{i,1}, Y_{i,2}, \dots, Y_{i,d})^{-1}(B) \mid i \in I,\, B \in \mathcal{B}(\mathbb{R}^d))$$
as the smallest σ-algebra containing all preimages of Borel sets. When comparing two σ-algebras $\mathcal{A}_1, \mathcal{A}_2 \subseteq \mathcal{A}$, we ignore sets of measure 0, i.e., we write
$$\mathcal{A}_1 \subseteq_\mu \mathcal{A}_2$$
if for all $A_1 \in \mathcal{A}_1$ there exists an $A_2 \in \mathcal{A}_2$ with $\mu((A_1 \setminus A_2) \cup (A_2 \setminus A_1)) = 0$. In this sense, $\mathcal{A}_1 =_\mu \mathcal{A}_2$ means that $\mathcal{A}_1 \subseteq_\mu \mathcal{A}_2$ and $\mathcal{A}_2 \subseteq_\mu \mathcal{A}_1$.

3. The Main Statement

Recall that a measure-preserving dynamical system $(\Omega, \mathcal{A}, \mu, T)$ is ergodic if for each $B \in \mathcal{A}$ with $T^{-1}(B) = B$ it holds that $\mu(B) \in \{0, 1\}$, meaning that the system does not divide into proper parts.
Referring to Section 1, we give some preparation for stating our main result. Recall that, for defining (generalized) ordinal patterns, it was basic to know whether $(X(T^s(\omega)), X(T^t(\omega))) \in R$ or $(X(T^s(\omega)), X(T^t(\omega))) \in \mathbb{R}^2 \setminus R$ for $\omega \in \Omega$ and a random variable $X$ on $\Omega$, where “time” pairs $(s,t)$ were taken from the sets $E_n$ (see (2)). In order to also allow reducing the number of necessary “comparisons”, we relax the definition of the sets $E_n$, leading to the following concept.
Definition 2.
We call a sequence $(E_n)_{n \in \mathbb{N}}$ with $E_1 \subseteq E_2 \subseteq E_3 \subseteq \dots \subseteq \mathbb{N}_0^2$ a timing, if each $E_n$ contains finitely many elements and if there exists a sequence $(a_n)_{n \in \mathbb{N}} \in \mathbb{N}_0^{\mathbb{N}}$ with
$$\limsup_{N \to \infty} \frac{\#\left\{n \in \{1, 2, \dots, N\} \,\middle|\, (a_n, a_n + n) \in \bigcup_{k=1}^\infty E_k\right\}}{N} = 1. \quad (7)$$
Formula (7) roughly says that nearly each “temporal” distance is available for “comparisons”. It guarantees that enough time-pairs are considered to not have any loss of information contained in the “thinned out” generalized ordinal patterns relative to the “full” generalized ordinal patterns.
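As a simple consistency check, the full sets $E_n$ from (2) form a timing: with the constant sequence $a_n = 0$, the pair $(0, n)$ lies in $E_{n+1}$ for every $n$, so the fraction in (7) equals 1. A sketch of this check (names are ours):

```python
def full_timing(n):
    """The set E_n = {(s, t) : 0 <= s < t <= n - 1} from (2)."""
    return {(s, t) for s in range(n) for t in range(s + 1, n)}

# With a_n = 0, the pair (a_n, a_n + n) = (0, n) lies in E_{n+1}.
N = 100
union = set().union(*(full_timing(k) for k in range(1, N + 2)))
hits = sum((0, n) in union for n in range(1, N + 1))
print(hits / N)   # -> 1.0
```

Thinned-out timings only need the fraction of available temporal distances to approach 1 along a subsequence.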
Remark 1.
In the first paper on generalized ordinal patterns ([6]), a timing $(E_n)_{n \in \mathbb{N}}$ was defined differently: the authors of that paper called a sequence of finite sets $E_1 \subseteq E_2 \subseteq E_3 \subseteq \dots \subseteq \mathbb{N}_0^2$ a timing, if there exists an increasing sequence $(a_n)_{n \in \mathbb{N}}$ such that for all $n \in \mathbb{N}$:
(i)
$E_n \subseteq \{a_0, a_1, \dots, a_n\}^2$,
(ii)
for all $s \in \{a_0, a_1, \dots, a_n\}$, there exists a $t \in \{a_0, a_1, \dots, a_n\}$ with $s \ne t$ and $(s,t) \in E_n$,
(iii)
$(\mathrm{id}, T^n)^{-1}(\mathcal{R}_i) \subseteq_\mu \sigma\left(\bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(\mathcal{R}_i) \,\middle|\, k \in \mathbb{N}\right)$ for all $i \in \{1, 2, \dots, d\}$
hold true. Note that the last condition does not only depend on the timing $(E_n)_{n \in \mathbb{N}}$ but also on $T$ and $X = (X_1, X_2, \dots, X_d)$. Instead of those three conditions, we simply require that almost all differences can be found in the timing.
Given a random vector $(X_1, X_2, \dots, X_d)$ and $R \in \mathcal{B}(\mathbb{R}^2)$, we define the partition
$$\mathcal{R}_i := (X_i, X_i)^{-1}(\{R, R^c\}) = \{(X_i, X_i)^{-1}(R), (X_i, X_i)^{-1}(R^c)\}$$
for all $i \in \{1, 2, \dots, d\}$, which is equal to
$$\mathcal{R}_i = \{\{(\omega_1, \omega_2) \in \Omega^2 \mid (X_i(\omega_1), X_i(\omega_2)) \in R\},\, \{(\omega_1, \omega_2) \in \Omega^2 \mid (X_i(\omega_1), X_i(\omega_2)) \notin R\}\}.$$
Then, for $E_n$ as given in (2), the partition $\bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(\mathcal{R}_i)$ is nothing other than the partition of generalized ordinal patterns with respect to $X_i$ defined in Section 1, and $\bigvee_{i=1}^d \bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(\mathcal{R}_i)$ can be considered as the partition of generalized ordinal patterns with respect to $(X_1, X_2, \dots, X_d)$.
The proof of the following main theorem of the paper is given in Section 5.
Theorem 1.
Let $(\Omega, \mathcal{A}, \mu, T)$ be an ergodic measure-preserving dynamical system, $X = (X_1, X_2, \dots, X_d): \Omega \to \mathbb{R}^d$ be a random vector, $R \in \mathcal{B}(\mathbb{R}^2)$ be a discriminating relation and $(E_n)_{n \in \mathbb{N}}$ be a timing. Assume that the following conditions are valid:
$$\text{There exists a countable set } C \subseteq \mathbb{R}^2 \text{ with } \mu_{X_i}^2(\partial R \setminus C) = 0 \text{ for all } i \in \{1, 2, \dots, d\}. \quad (8)$$
$$\text{There exists a random variable } Y: \Omega \to \mathbb{R} \text{ with } \#Y(\Omega) < \infty \text{ and } \mathcal{A} =_\mu \sigma(f_{X_i}^R \circ X_i \circ T^t, Y \mid t \in \mathbb{N}_0, i \in \{1, 2, \dots, d\}). \quad (9)$$
Then,
$$h(T) = \lim_{k \to \infty} h\left(T, \bigvee_{i=1}^d \bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(\mathcal{R}_i)\right) \quad (10)$$
holds true.
At first glance, conditions (8) and (9), being sufficient for (10), look very special. The considerations in the following section will, however, elucidate their role and show that they are relatively general. Roughly speaking, (8) says that the distribution of pairs of “independent measurements” with respect to $X_i$ is discrete on the boundary of $R$. Condition (9) is an orbit separation condition based on the involved “measurements” and the functions $f_{X_i}^R$. In general,
$$\mathcal{A} \supseteq_\mu \sigma(f_{X_i}^R \circ X_i \circ T^t, Y \mid t \in \mathbb{N}_0, i \in \{1, 2, \dots, d\})$$
holds true because all functions involved in (9) are $\mathcal{A}$-measurable. Therefore, (9) is equivalent to
$$\mathcal{A} \subseteq_\mu \sigma(f_{X_i}^R \circ X_i \circ T^t, Y \mid t \in \mathbb{N}_0, i \in \{1, 2, \dots, d\}).$$
The inclusion of the random variable $Y$ provides some further separation and allows the above inclusion to hold true for a wider class of dynamical systems than, for example, the ones considered in [6]. In the case that $Y$ is constant, it can also be omitted. In theory, $Y$ should be chosen to take different values on those sets on which $f_{X_i}^R \circ X_i \circ T^t$ takes the same values for $i \in \{1, 2, \dots, d\}$ and $t \in \mathbb{N}_0$. In practice, the fact that such a random variable $Y$ exists is sufficient, and $Y$ does not need to be explicitly specified. An example is given in Section 4.5.

4. Special Cases

In the following, we discuss some special situations where the assumptions of Theorem 1, i.e., (8) and (9), are satisfied. Lemma 2 provides an easy-to-check condition for when (8) holds true. It is more difficult to see when condition (9) is satisfied. Roughly speaking, this condition is fulfilled if $X = (X_1, X_2, \dots, X_d)$ together with $Y$ can uniquely describe the outcomes of the whole dynamical system and applying $f_{X_i}$ to the results of $X$ is, in some sense, “reversible” for all $i \in \{1, 2, \dots, d\}$. In other words, $X = (X_1, X_2, \dots, X_d)$ together with $Y$ preserve the information of the system and there is no information loss in the symbolization. The first means that
$$\mathcal{A} =_\mu \sigma(X_i \circ T^t, Y \mid t \in \mathbb{N}_0, i \in \{1, 2, \dots, d\}),$$
which obviously follows from
$$\mathcal{A} =_\mu \sigma(f_{X_i}^R \circ X_i \circ T^t, Y \mid t \in \mathbb{N}_0, i \in \{1, 2, \dots, d\}).$$
To describe the range of outcomes of a random variable $X$ on a probability space $(\Omega, \mathcal{A}, \mu)$, we will use its cumulative distribution function $F_X: \mathbb{R} \to [0,1]$ defined by
$$F_X(x) := \mu_X(\,]-\infty, x]\,).$$
When applying the cumulative distribution function $F_X$ to the outcomes $X$ of a system, we do not lose any essential information about the system, according to the following lemma. This lemma is a simple modification of Lemma A.3 in [6].
Lemma 1.
Let $(\Omega, \mathcal{A}, \mu)$ be a probability space, $X: \Omega \to \mathbb{R}$ be a random variable and $g: \mathbb{R} \to \mathbb{R}$ be a $\mathcal{B}$-$\mathcal{B}$-measurable function which maps Borel sets to Borel sets and satisfies the following property:
$$\text{For all } x \in \mathbb{R} \text{ it holds that } \mu(X^{-1}(g^{-1}(g(\,]-\infty, x]\,)) \setminus\, ]-\infty, x]\,)) = 0. \quad (11)$$
Then, $\sigma(g \circ X) =_\mu \sigma(X)$. In particular, $\sigma(g \circ X) =_\mu \sigma(X)$ if $g = F_X$ or $g$ is injective on $X(\Omega)$.
Condition (11) in the above lemma is slightly weaker than injectivity of $g$. If $g$ is injective, then $g^{-1}(g(\,]-\infty, x]\,)) = \,]-\infty, x]$ holds true for all $x \in \mathbb{R}$ and condition (11) is satisfied. More generally, condition (11) can still hold if $g$ is not injective, provided that all sets on which $g$ fails to be injective, which are given by $X^{-1}(g^{-1}(g(\,]-\infty, x]\,)) \setminus\, ]-\infty, x]\,)$ for $x \in \mathbb{R}$, have measure 0. For example, this is true if $g$ is equal to the cumulative distribution function.

4.1. On the Boundary of R

Condition (8) in Theorem 1, stating that the boundary of $R$, apart from countably many points, has measure 0, holds true for all “simple” sets $R$. In the following lemma, we specify what we mean by “simple”.
Lemma 2.
Let $(\Omega, \mathcal{A}, \mu)$ be a probability space, $(X_1, X_2, \dots, X_d)$ be a random vector and $R \in \mathcal{B}(\mathbb{R}^2)$. If, for all $i \in \{1, 2, \dots, d\}$,
$$\partial R \cap (\{x\} \times \mathbb{R}) \text{ is countable for } \mu_{X_i}\text{-almost all } x \in \mathbb{R} \quad \text{or} \quad \partial R \cap (\mathbb{R} \times \{y\}) \text{ is countable for } \mu_{X_i}\text{-almost all } y \in \mathbb{R}, \quad (12)$$
then $R$ satisfies (8), i.e., there exists a countable set $C \subseteq \mathbb{R}^2$ with
$$\mu_{X_i}^2(\partial R \setminus C) = 0 \text{ for all } i \in \{1, 2, \dots, d\}.$$
Proof. 
Consider the sets
$$A_i := \{x \in \mathbb{R} \mid \mu_{X_i}(\{x\}) > 0\}$$
for $i \in \{1, 2, \dots, d\}$, which, obviously, are countable. Set
$$C := \bigcup_{i=1}^d A_i \times A_i.$$
Let $i \in \{1, 2, \dots, d\}$. If $\{y \in \mathbb{R} \mid (x,y) \in \partial R\}$ is countable for $\mu_{X_i}$-almost all $x \in \mathbb{R}$, Fubini’s theorem implies
$$\mu_{X_i}^2(\partial R \setminus C) = \int \int \mathbb{1}_{\partial R \setminus C}(x,y) \, d\mu_{X_i}(y) \, d\mu_{X_i}(x) = \int \mu_{X_i}(\{y \in \mathbb{R} \mid (x,y) \in \partial R \setminus C\}) \, d\mu_{X_i}(x) = \int \sum_{y \in \mathbb{R}:\, (x,y) \in \partial R \setminus C} \mu_{X_i}(\{y\}) \, d\mu_{X_i}(x) = \int 0 \, d\mu_{X_i}(x) = 0.$$
Analogously, one can show the same if $\{x \in \mathbb{R} \mid (x,y) \in \partial R\}$ is countable for $\mu_{X_i}$-almost all $y \in \mathbb{R}$. □
Remark 2.
The patterns visualized in Figure 1 could also be defined on the whole real axis instead of a bounded interval by, for example, applying the transformation $\varphi(x) = \tan(\pi(x - 1/2))$. Then, $\tilde{R} = (\varphi \times \varphi)(R)$ is a pattern defined on $\mathbb{R}^2$ if $R$ is defined on $[0,1[^2$.
In the following three subsections, Y is assumed to be constant, and hence can be omitted.

4.2. Basic Ordinal Patterns

If
$$R = \{(x,y) \in \mathbb{R}^2 \mid x \ge y\}$$
(see Figure 1a), then $f_{X_i}^R$ is just the distribution function of $\mu_{X_i}$, i.e., $f_{X_i}^R = F_{X_i}$. Since $\partial R \cap (\{x\} \times \mathbb{R}) = \{(x,x)\}$ is finite for all $x \in \mathbb{R}$, (8) holds true by Lemma 2. According to Lemma 1, one has
$$\sigma(X_i \circ T^t) =_\mu \sigma(F_{X_i} \circ X_i \circ T^t)$$
for all $t \in \mathbb{N}_0$ and $i \in \{1, 2, \dots, d\}$. Therefore,
$$\mathcal{A} =_\mu \sigma(f_{X_i}^R \circ X_i \circ T^t \mid t \in \mathbb{N}_0, i \in \{1, 2, \dots, d\}) = \sigma(F_{X_i} \circ X_i \circ T^t \mid t \in \mathbb{N}_0, i \in \{1, 2, \dots, d\}) \quad (13)$$
is equivalent to
$$\mathcal{A} =_\mu \sigma(X_i \circ T^t \mid t \in \mathbb{N}_0, i \in \{1, 2, \dots, d\}). \quad (14)$$
By Theorem 1, for ergodic systems, condition (14) implies (10). A more general statement, which also includes a large class of non-ergodic systems, was shown in [4]. Condition (14) is, for example, satisfied if $\Omega \in \mathcal{B}(\mathbb{R}^d)$ and $X_i$ is the projection on the $i$-th coordinate for all $i \in \{1, 2, \dots, d\}$, or if $\Omega$ is a compact Hausdorff space and $X = (X_1, X_2, \dots, X_d)$ is injective and continuous. One can also use Takens’ theorem to argue that the set of maps $X: \Omega \to \mathbb{R}^d$ satisfying (14) is large in a certain topological sense. For both, see Keller [8].

4.3. Patterns Defined by “Injective” Functions

Let $X = (X_1, X_2, \dots, X_d)$ be a random vector and consider now
$$R = \{(x,y) \in \mathbb{R}^2 \mid g(x) \ge y\} \quad (15)$$
for a $\mathcal{B}$-$\mathcal{B}$-measurable function $g: \mathbb{R} \to \mathbb{R}$ (see Figure 1b). Since $\partial R \cap (\{x\} \times \mathbb{R}) = \{(x, g(x))\}$ is finite for all $x \in \mathbb{R}$, (8) holds true by Lemma 2. Moreover, one easily sees that $f_{X_i}^R = F_{X_i} \circ g$.
Now, suppose that
$$\sigma(F_{X_i}) \subseteq \sigma(F_{X_i} \circ g) \quad (16)$$
holds true for all $i \in \{1, 2, \dots, d\}$. This directly yields
$$\sigma(F_{X_i} \circ X_i \circ T^t) \subseteq \sigma(F_{X_i} \circ g \circ X_i \circ T^t)$$
for all $i \in \{1, 2, \dots, d\}$ and $t \in \mathbb{N}_0$. Remember that $\sigma(X_i \circ T^t) =_\mu \sigma(F_{X_i} \circ X_i \circ T^t)$ holds true according to Lemma 1. Thus, (14) and (16) imply (10). When considering basic ordinal patterns in Section 4.2, we stated some conditions under which (14) holds true. It remains to consider when (16) is satisfied.
Assume that $g$ maps Borel sets to Borel sets and is injective. This implies
$$\sigma(g \circ X_i \circ T^t) = \sigma(X_i \circ T^t)$$
for all $t \in \mathbb{N}_0$ and $i \in \{1, 2, \dots, d\}$. Now, suppose that
$$\mathcal{A} = \sigma(F_{X_i} \circ X_i \circ T^t \mid t \in \mathbb{N}_0, i \in \{1, 2, \dots, d\})$$
holds true. This would then imply (16). However, the above equation only holds true $\mu$-almost surely (see (13)). This can be a problem when applying the function $g$ because there could exist sets $B \in \mathcal{B}$ with $\mu_{X_i}(B) = 0$ but $\mu_{X_i}(g(B)) > 0$. We therefore additionally need to require that $\mu_{X_i}(g^{-1}(B)) = 0$ implies $\mu_{X_i}(B) = 0$ for all $B \in \mathcal{B}$.
Theorem 1 then provides the following statement:
Corollary 1.
Let $(\Omega, \mathcal{A}, \mu, T)$ be an ergodic measure-preserving dynamical system, $X = (X_1, X_2, \dots, X_d): \Omega \to \mathbb{R}^d$ be a random vector and $(E_n)_{n \in \mathbb{N}}$ be a timing. Let further $g: \mathbb{R} \to \mathbb{R}$ be a $\mathcal{B}$-$\mathcal{B}$-measurable function which maps Borel sets to Borel sets, is injective on $X_i(\Omega)$ and satisfies $\mu_{X_i}(g^{-1}(B)) = 0 \Rightarrow \mu_{X_i}(B) = 0$ for all $B \in \mathcal{B}$ and $i \in \{1, 2, \dots, d\}$. Let $R = \{(x,y) \in \mathbb{R}^2 \mid g(x) \ge y\}$.
Then, (14) implies (10). Moreover, (10) holds true if $\Omega \in \mathcal{B}(\mathbb{R}^d)$ and $X_i$ is the projection on the $i$-th coordinate for all $i \in \{1, 2, \dots, d\}$, or if $\Omega$ is a compact Hausdorff space and $X = (X_1, X_2, \dots, X_d)$ is injective and continuous.
Note that the statements in Corollary 1, in principle, were shown in [6]. The case of basic ordinal patterns is included by g ( x ) = x for all x R .

4.4. Patterns Defined by “Surjective” Functions

Swapping coordinates in (15) yields the set
$$R = \{(x,y) \in \mathbb{R}^2 \mid x < g(y)\} \quad (17)$$
(see Figure 1c) with (8) following from Lemma 2 and with
$$f_{X_i}^R(x) = \mu(\{\omega \in \Omega \mid (x, X_i(\omega)) \in R\}) = \mu(\{\omega \in \Omega \mid x < g(X_i(\omega))\}) = \mu(\{\omega \in \Omega \mid g(X_i(\omega)) \in \,]x, \infty[\,\}) = 1 - F_{g \circ X_i}(x).$$
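The identity $f_{X_i}^R = 1 - F_{g \circ X_i}$ can be verified exactly on a discrete distribution; the support points and the function $g$ below are illustrative assumptions of ours, not taken from the paper:

```python
# X uniform on five points; R = {(x, y) : x < g(y)} with g(y) = y^2
support = [0.1, 0.3, 0.5, 0.7, 0.9]

def g(y):
    return y * y

def f_R(x):
    """f_X^R(x) = mu({omega : x < g(X(omega))})."""
    return sum(1 for v in support if x < g(v)) / len(support)

def F_gX(x):
    """Distribution function of g∘X."""
    return sum(1 for v in support if g(v) <= x) / len(support)

for x in [0.0, 0.2, 0.4, 0.8]:
    assert abs(f_R(x) - (1 - F_gX(x))) < 1e-12
print("f_X^R agrees with 1 - F_{g∘X} on all test points")
```

The agreement is immediate here because $\{x < g(v)\}$ and $\{g(v) \le x\}$ are complementary events.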
Corollary 2.
Let $(\Omega, \mathcal{A}, \mu, T)$ be an ergodic measure-preserving dynamical system, $X = (X_1, X_2, \dots, X_d): \Omega \to \mathbb{R}^d$ be a random vector and $(E_n)_{n \in \mathbb{N}}$ be a timing. Let further $g: \mathbb{R} \to \mathbb{R}$ be a $\mathcal{B}$-$\mathcal{B}$-measurable function and let $R = \{(x,y) \in \mathbb{R}^2 \mid x < g(y)\}$. Then, the following holds:
(i)
If $F_{g \circ X_i}$ is injective on $X_i(\Omega)$ for $i \in \{1, 2, \dots, d\}$, then (14) implies (10).
(ii)
If $\Omega \in \mathcal{B}(\mathbb{R}^d)$ and $X_i$ is the projection on the $i$-th coordinate for all $i \in \{1, 2, \dots, d\}$, or if $\Omega$ is a compact Hausdorff space and $X = (X_1, X_2, \dots, X_d)$ is injective and continuous, and if further $\mu(U) > 0$ for every non-empty open set $U \subseteq \Omega$, $g$ is continuous and $X_i(\Omega) \subseteq g(X_i(\Omega))$, then (10) is valid in each of the following two cases:
(1)
For each $i \in \{1, 2, \dots, d\}$ and all $x_1, x_2 \in X_i(\Omega)$ with $x_1 < x_2$, there exists some $y \in X_i(\Omega)$ with $x_1 < y < x_2$;
(2)
$\Omega$ is connected.
Proof.
(i): If the above assumptions are satisfied and $F_{g \circ X_i}$ is injective on $X_i(\Omega)$ for all $i \in \{1, 2, \dots, d\}$, then by Lemma 1 it holds that $\sigma(X_i \circ T^t) =_\mu \sigma(F_{g \circ X_i} \circ X_i \circ T^t)$ for all $t \in \mathbb{N}_0$ and $i \in \{1, 2, \dots, d\}$. This implies (9) and, hence, by Theorem 1, the statement (10).
(ii): Given the assumptions of (ii), we have to show that $F_{g \circ X_i}$ is injective on $X_i(\Omega)$ for all $i \in \{1, 2, \dots, d\}$. If $\Omega$ is connected, then (1) is obviously satisfied. We can thus start from (1). Take $x_1, x_2 \in X_i(\Omega)$ with $x_1 < x_2$. Then, $g^{-1}(\,]x_1, x_2[\,)$ is non-empty and, because $g \circ X_i$ is continuous, $X_i^{-1}(g^{-1}(\,]x_1, x_2]\,))$ contains a non-empty open set. This implies that $F_{g \circ X_i}(x_1) < F_{g \circ X_i}(x_2)$, because every non-empty open set was assumed to have strictly positive measure. □
Notice that, unlike in (15), it is not necessary that g is one-to-one.

4.5. Piecewise Patterns

The previous subsections illustrate that (9) is fulfilled if, roughly speaking, $(X_1, X_2, \dots, X_d)$ preserves all information and $f_{X_i}^R$ is a $\mu_{X_i}$-almost surely invertible function for all $i \in \{1, 2, \dots, d\}$. The finite-valued random variable $Y$ in (9) can be used to weaken the condition of invertibility in the sense that only piecewise invertibility is needed, where the different pieces are induced by the random variable $Y$.
For $\Omega = [0,1[$ and an absolutely continuous measure $\mu$, one could, for example, consider
$$R_{\mathrm{circles}} = \{(x,y) \in \Omega^2 \mid \|(kx \bmod 1,\, ky \bmod 1) - (0.5, 0.5)\|_2 \le 0.5\}$$
for any $k \in \mathbb{N}$, as shown for $k = 5$ in Figure 1d. The set $R_{\mathrm{circles}}$ satisfies condition (9) with $Y(\omega) = i$ for $\omega \in [\,(i-1)/(2k), i/(2k)\,[$ and $i \in \{1, 2, \dots, 2k\}$. The set $R_{\mathrm{circles}}$ is a pattern with $k^2$ circles of diameter $1/k$ distributed in $[0,1]^2$ on a square grid.
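A membership test for this circle pattern is a one-liner; the reconstruction below (Euclidean norm, closed disks) is our reading of Figure 1d:

```python
import math

def in_circles(x, y, k=5):
    """Membership in R_circles: k^2 closed disks of diameter 1/k on a square grid."""
    u, v = (k * x) % 1, (k * y) % 1
    return math.hypot(u - 0.5, v - 0.5) <= 0.5

print(in_circles(0.1, 0.1))   # centre of the first grid cell -> True
print(in_circles(0.0, 0.0))   # corner of a grid cell -> False
```

Unlike the half-plane relations of the previous subsections, the section $\{y \mid (x,y) \in R_{\mathrm{circles}}\}$ is a union of intervals, so $f_X^{R_{\mathrm{circles}}}$ is only piecewise invertible, which is exactly where $Y$ is needed.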

4.6. A Remark on the Work of Amigó et al.

Consider the discriminating relation
$$R_k = \{(x,y) \in \mathbb{R}^2 \mid \lfloor k x \rfloor \ge \lfloor k y \rfloor\}$$
shown in Figure 3 for $k \in \mathbb{N}$. Assume for simplicity that the dynamical system is defined on $\Omega = [0,1[$ and that $X$ is the identity map $\mathrm{id}$. It is easy to see that
$$\sigma\left(\bigvee_{t=0}^n T^{-t}(\mathcal{P}_k) \,\middle|\, n \in \mathbb{N}\right) =_\mu \sigma(f_{\mathrm{id}}^{R_k} \circ T^t \mid t \in \mathbb{N}_0)$$
holds true, where $\mathcal{P}_k := \{[\,(i-1)/k, i/k\,[\,\}_{i=1}^k$. Therefore, (9) in Theorem 1 holds true if $\mathcal{P}_k$ is a generating partition.
Additionally, one could consider the quantity
$$\lim_{k \to \infty} \liminf_{n \to \infty} \frac{1}{n} H\left(\bigvee_{s=0}^{n-1} \bigvee_{t=s+1}^{n-1} (T^s, T^t)^{-1}(\{R_k, \mathbb{R}^2 \setminus R_k\})\right), \quad (18)$$
which was introduced by Amigó et al. [9]. They used finite-valued random variables to quantize the dynamical system into $k$ parts and considered the ordinal patterns of the quantized systems, while we directly apply the quantization to the discriminating relation. The two approaches only differ in their notation. They showed in their paper that the limit in (18) is equal to the Kolmogorov–Sinai entropy.
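The quantized comparison can be sketched as a discriminating relation that compares cell indices of the partition $\mathcal{P}_k$; the use of $\lfloor k \cdot \rfloor$ below is our reconstruction of the staircase relation in Figure 3:

```python
import math

def in_R_k(x, y, k):
    """Membership in R_k: compare the P_k-cell indices of x and y
    (reconstruction: floor(k*x) >= floor(k*y))."""
    return math.floor(k * x) >= math.floor(k * y)

# Points in the same cell of P_k are "tied" in both directions; across cells,
# the order of the cells decides.
print(in_R_k(0.31, 0.29, 3), in_R_k(0.29, 0.31, 3))  # same cell -> True True
print(in_R_k(0.2, 0.5, 3))                           # lower cell vs higher -> False
```

For $k \to \infty$ the cells shrink to points and the relation approaches the basic ordinal relation of Section 4.2.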

5. Proof of the Main Statement

We first recall some definitions and statements related to partitions and the conditional entropy. For two partitions $\mathcal{P}, \mathcal{Q} \subseteq \mathcal{A}$ of $\Omega$, the conditional entropy is defined as
$$H(\mathcal{P} \mid \mathcal{Q}) := H(\mathcal{P} \vee \mathcal{Q}) - H(\mathcal{Q}).$$
Roughly speaking, the conditional entropy $H(\mathcal{P} \mid \mathcal{Q})$ describes how much uncertainty is left in the outcomes described by the sets given in $\mathcal{P}$ if one already has information about the outcomes described by the sets given in $\mathcal{Q}$. For example, if $\mathcal{P} = \mathcal{Q}$, then $H(\mathcal{P} \mid \mathcal{Q}) = 0$. However, if $\mathcal{P}$ and $\mathcal{Q}$ are independent, meaning that $\mu(P \cap Q) = \mu(P) \cdot \mu(Q)$ for all $P \in \mathcal{P}$ and $Q \in \mathcal{Q}$, then $H(\mathcal{P} \mid \mathcal{Q}) = H(\mathcal{P})$.
Without explicitly referencing them, we will use the following properties of the conditional entropy:
(i)
$H(T^{-1}(\mathcal{P}) \mid T^{-1}(\mathcal{Q})) = H(\mathcal{P} \mid \mathcal{Q})$,
(ii)
$H\left(\bigvee_{i=1}^n \mathcal{P}_i \,\middle|\, \mathcal{Q}\right) \le \sum_{i=1}^n H(\mathcal{P}_i \mid \mathcal{Q})$,
(iii)
$H(\mathcal{P} \mid \mathcal{Q}_1 \vee \mathcal{Q}_2) \le H(\mathcal{P} \mid \mathcal{Q}_1)$.
See, for example, [7] for proofs.
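A small numerical illustration of the definition $H(\mathcal{P} \mid \mathcal{Q}) = H(\mathcal{P} \vee \mathcal{Q}) - H(\mathcal{Q})$ (the joint cell measures below are made-up numbers):

```python
import math

def H(probs):
    """Shannon entropy (in nats) of a partition given its cell measures."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# cell measures of P ∨ Q for a two-by-two example with dependent partitions
joint = [0.4, 0.1, 0.1, 0.4]
HP, HQ, HPQ = H([0.5, 0.5]), H([0.5, 0.5]), H(joint)
cond = HPQ - HQ                  # H(P | Q)
print(cond)                      # strictly between 0 (P = Q) and H(P) (independence)
```

The dependence between the two partitions makes the conditional entropy drop below $H(\mathcal{P}) = \log 2$.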
A sequence of partitions $(\mathcal{P}_i)_{i \in \mathbb{N}}$ in $\mathcal{A}$ of $\Omega$ is said to be generating (the σ-algebra $\mathcal{A}$) if $\sigma(\mathcal{P}_i) \subseteq \sigma(\mathcal{P}_{i+1})$ for all $i \in \mathbb{N}$ and
$$\sigma(\mathcal{P}_i \mid i \in \mathbb{N}) =_\mu \mathcal{A}$$
holds true. As a consequence of this property,
$$\lim_{n \to \infty} H(\mathcal{P} \mid \mathcal{P}_n) = 0$$
holds true for all partitions $\mathcal{P} \subseteq \mathcal{A}$ of $\Omega$. Using the properties of the conditional entropy, this implies
$$h(T) = \lim_{n \to \infty} h(T, \mathcal{P}_n).$$
For $N \in \mathbb{N}$,
$$\mathcal{U}_N = \left\{\left[\frac{i-1}{2^N}, \frac{i}{2^N}\right[ \,\middle|\, i \in \{1, 2, \dots, 2^N - 1\}\right\} \cup \left\{\left[\frac{2^N - 1}{2^N}, 1\right]\right\}$$
will denote the partition of $[0,1]$ into $2^N$ equally sized intervals.
We start the proof of Theorem 1 with two basic lemmata.
Lemma 3.
Let $(\Omega, \mathcal{A}, \mu, T)$ be a measure-preserving dynamical system, $X = (X_1, X_2, \dots, X_d)$ be a random vector and $Y$ be a random variable satisfying (9). Then, there exists some constant $c \in \mathbb{R}$ with
$$h(T^m) \le \lim_{N \to \infty} h\left(T^m, \bigvee_{i=1}^d \bigvee_{t=0}^{m-1} T^{-t}\left(X_i^{-1}\left(f_{X_i}^{-1}(\mathcal{U}_N)\right)\right)\right) + c$$
for all $m \in \mathbb{N}$.
Proof. 
Fix $m \in \mathbb{N}$. Set
$$\mathcal{M} := \{Y^{-1}(y) \mid y \in Y(\Omega)\}.$$
Since $Y$ was assumed to attain only a finite number of different values, $\mathcal{M}$ is a finite partition of $\Omega$. Because the Borel σ-algebra of $[0,1]$ is generated by the partitions $\mathcal{U}_N$ and due to (9), we have
$$\mathcal{A} =_\mu \sigma(f_{X_i} \circ X_i \circ T^t, Y \mid t \in \mathbb{N}_0, i \in \{1, 2, \dots, d\}) = \sigma\left(T^{-t}\left(X_i^{-1}\left(f_{X_i}^{-1}(\mathcal{U}_N)\right)\right) \vee \mathcal{M} \,\middle|\, N \in \mathbb{N}, t \in \mathbb{N}_0, i \in \{1, 2, \dots, d\}\right).$$
Thus, for any $\varepsilon > 0$ and any finite partition $\mathcal{P} \subseteq \mathcal{A}$ of $\Omega$, there exist an $N_\varepsilon \in \mathbb{N}$ and a $t_\varepsilon \in \mathbb{N}$ with
$$h(T^m, \mathcal{P}) \le h\left(T^m, \bigvee_{i=1}^d \bigvee_{t=0}^{t_\varepsilon - 1} T^{-t}\left(X_i^{-1}\left(f_{X_i}^{-1}(\mathcal{U}_N)\right)\right) \vee \mathcal{M}\right) + \varepsilon \le h\left(T^m, \bigvee_{i=1}^d \bigvee_{t=0}^{t_\varepsilon - 1} T^{-t}\left(X_i^{-1}\left(f_{X_i}^{-1}(\mathcal{U}_N)\right)\right)\right) + h(T^m, \mathcal{M}) + \varepsilon \le h\left(T^m, \bigvee_{i=1}^d \bigvee_{t=0}^{t_\varepsilon - 1} T^{-t}\left(X_i^{-1}\left(f_{X_i}^{-1}(\mathcal{U}_N)\right)\right)\right) + H(\mathcal{M}) + \varepsilon \le h\left(T^m, \bigvee_{i=1}^d \bigvee_{t=0}^{m-1} T^{-t}\left(X_i^{-1}\left(f_{X_i}^{-1}(\mathcal{U}_N)\right)\right)\right) + H(\mathcal{M}) + \varepsilon$$
for all $N \ge N_\varepsilon$. Hence,
$$h(T^m, \mathcal{P}) \le \lim_{N \to \infty} h\left(T^m, \bigvee_{i=1}^d \bigvee_{t=0}^{m-1} T^{-t}\left(X_i^{-1}\left(f_{X_i}^{-1}(\mathcal{U}_N)\right)\right)\right) + H(\mathcal{M}) + \varepsilon$$
for any $\varepsilon > 0$, which implies
$$h(T^m) = \sup_{\mathcal{P}} h(T^m, \mathcal{P}) \le \lim_{N \to \infty} h\left(T^m, \bigvee_{i=1}^d \bigvee_{t=0}^{m-1} T^{-t}\left(X_i^{-1}\left(f_{X_i}^{-1}(\mathcal{U}_N)\right)\right)\right) + H(\mathcal{M}).$$
Lemma 4.
Let $(\Omega, \mathcal{A}, \mu)$ be a probability space, $X \colon \Omega \to \mathbb{R}$ be a random variable and $A, B \in \mathcal{B}(\mathbb{R}^2)$. Then,
$$\int \big| f_X^A - f_X^B \big| \, d\mu_X \le \mu_X^2(A \mathbin{\triangle} B)$$
holds true.
Proof. 
For all $x \in \mathbb{R}$,
$$\begin{aligned} \mu(\{\omega \in \Omega \mid (x, X(\omega)) \in A \mathbin{\triangle} B\}) &= \mu(\{\omega \in \Omega \mid (x, X(\omega)) \in A \setminus B\}) + \mu(\{\omega \in \Omega \mid (x, X(\omega)) \in B \setminus A\}) \\ &\ge \mu(\{\omega \in \Omega \mid (x, X(\omega)) \in A \setminus B\}) \\ &\ge \mu(\{\omega \in \Omega \mid (x, X(\omega)) \in A\}) - \mu(\{\omega \in \Omega \mid (x, X(\omega)) \in B\}) \\ &= f_X^A(x) - f_X^B(x) \end{aligned}$$
holds true. Analogously, one can show
$$\mu(\{\omega \in \Omega \mid (x, X(\omega)) \in A \mathbin{\triangle} B\}) \ge f_X^B(x) - f_X^A(x).$$
This implies
$$\mu(\{\omega \in \Omega \mid (x, X(\omega)) \in A \mathbin{\triangle} B\}) \ge \big| f_X^A(x) - f_X^B(x) \big|$$
and, by Fubini’s theorem,
$$\begin{aligned} \int \big| f_X^A - f_X^B \big| \, d\mu_X &\le \int \mu(\{\omega \in \Omega \mid (x, X(\omega)) \in A \mathbin{\triangle} B\}) \, d\mu_X(x) \\ &= \int \int \mathbb{1}_{\{\omega \in \Omega \mid (x, X(\omega)) \in A \triangle B\}}(\omega) \, d\mu(\omega) \, d\mu_X(x) \\ &= \int \int \mathbb{1}_{\{y \in \mathbb{R} \mid (x, y) \in A \triangle B\}}(y) \, d\mu_X(y) \, d\mu_X(x) \\ &= \int_{\mathbb{R}^2} \mathbb{1}_{A \triangle B}(x, y) \, d\mu_X^2(x, y) = \mu_X^2(A \mathbin{\triangle} B). \qquad \square \end{aligned}$$
Therefore, in particular, the above lemma implies that, if $(R_j)_{j \in \mathbb{N}}$ is a sequence of sets in $\mathcal{B}(\mathbb{R}^2)$ with $\lim_{j \to \infty} \mu_X^2(R_j \mathbin{\triangle} R) = 0$, then $f_X^{R_j}$ converges to $f_X^R$ in $L^1$ for $j \to \infty$.
Given $R \in \mathcal{B}(\mathbb{R}^2)$ and a random variable $X \colon \Omega \to \mathbb{R}$, consider the function $f_{X,n}^R \colon \mathbb{R} \times \Omega \to [0, 1]$ with
$$f_{X,n}^R(x, \omega) := \frac{1}{n} \#\big\{ t \in \{1, 2, \dots, n\} \,\big|\, (x, X(T^t(\omega))) \in R \big\}.$$
We want to show that $f_{X,n}^R(x, \omega)$ converges to $f_X^R(x)$ for all $x \in \mathbb{R}$ and $\mu$-almost all $\omega \in \Omega$. If $f_{X,n}^R(x, \omega)$ is monotone in $x$ for all $\omega \in \Omega$ and $n \in \mathbb{N}$, this can be shown relatively easily using the pointwise ergodic theorem and the monotonicity of the considered functions. Monotonicity is guaranteed if $x_1 \le x_2$ implies
$$\{ y \in \mathbb{R} \mid (x_1, y) \in R \} \subseteq \{ y \in \mathbb{R} \mid (x_2, y) \in R \}.$$
For example, if $R = \{ (x, y) \in \mathbb{R}^2 \mid x > y \}$, the above implication holds true. For this special case, a proof of the statement in Lemma 5 can be found in [4].
However, we are interested in general sets $R \in \mathcal{B}(\mathbb{R}^2)$ and therefore cannot use the monotonicity. Hence, we have to prove this statement differently.
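The convergence of the empirical frequencies $f_{X,n}^R(x, \omega)$ can be checked numerically for a concrete ergodic system. The following sketch is our own illustration (not code from the paper): it uses the irrational rotation $T(\omega) = \omega + \alpha \bmod 1$, which is ergodic with respect to Lebesgue measure, with $X$ the identity and the example relation $R = \{(x, y) \mid x > y\}$, for which $f_X^R(x) = x$.

```python
import math

def f_emp(x, omega, n, alpha):
    """Empirical frequency f_{X,n}^R(x, omega) for T(w) = w + alpha mod 1, X = id,
    and R = {(x, y) : x > y}: the fraction of the first n orbit points below x."""
    count = 0
    w = omega
    for _ in range(n):
        w = (w + alpha) % 1.0    # T^t(omega)
        if x > w:                # (x, X(T^t(omega))) in R
            count += 1
    return count / n

alpha = (math.sqrt(5) - 1) / 2   # irrational angle; orbit equidistributes in [0, 1]
# Under Lebesgue measure, f_X^R(x) = mu({y : y < x}) = x.
for x in (0.2, 0.5, 0.9):
    assert abs(f_emp(x, 0.123, 50_000, alpha) - x) < 0.01
```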
Lemma 5.
Let $(\Omega, \mathcal{A}, \mu, T)$ be an ergodic measure-preserving dynamical system, $X = (X_1, X_2, \dots, X_d)$ be a random vector and $R \in \mathcal{B}(\mathbb{R}^2)$ satisfy (8). Then, for all $i \in \{1, 2, \dots, d\}$, there exist sets $\tilde{\Omega} \in \mathcal{A}$ and $B \in \mathcal{B}(\mathbb{R})$ with $\mu(\tilde{\Omega}) = \mu_{X_i}(B) = 1$ satisfying
$$\lim_{n \to \infty} f_{X_i, n}^{R}(x, \omega) = f_{X_i}^{R}(x)$$
for all $\omega \in \tilde{\Omega}$ and $x \in B$.
Proof. 
Fix $i \in \{1, 2, \dots, d\}$. According to (8), there exists a countable set $C = \{ (a_k, b_k) \mid k \in \mathbb{N} \}$ with
$$\mu_{X_i}^2(\partial R \setminus C) = 0.$$
By the pointwise ergodic theorem (see, e.g., [10]), for all $j, k \in \mathbb{N}$, there exists $\Omega_{j,k}^* \in \mathcal{A}$ with $\mu(\Omega_{j,k}^*) = 1$ and
$$\lim_{n \to \infty} f_{X_i, n}^{\{(a_k, b_k)\}}(a_j, \omega) = f_{X_i}^{\{(a_k, b_k)\}}(a_j)$$
for all $\omega \in \Omega_{j,k}^*$. It is easy to see that
$$f_{X_i, n}^{\{(a_k, b_k)\}}(x, \omega) = 0 = f_{X_i}^{\{(a_k, b_k)\}}(x)$$
holds true for all $n \in \mathbb{N}$ and $\omega \in \Omega$ if $x \ne a_k$. Hence,
$$\lim_{n \to \infty} f_{X_i, n}^{\{(a_k, b_k)\}}(x, \omega) = f_{X_i}^{\{(a_k, b_k)\}}(x) \tag{20}$$
for all $x \in \mathbb{R}$ and $\omega \in \bigcap_{j=1}^{\infty} \Omega_{j,k}^*$. Using Fatou’s lemma and the fact that $C$ is countable implies
$$\liminf_{n \to \infty} f_{X_i, n}^{R \cap C}(x, \omega) = \liminf_{n \to \infty} \sum_{k \in \mathbb{N} \colon (a_k, b_k) \in R} f_{X_i, n}^{\{(a_k, b_k)\}}(x, \omega) \ge \sum_{k \in \mathbb{N} \colon (a_k, b_k) \in R} \liminf_{n \to \infty} f_{X_i, n}^{\{(a_k, b_k)\}}(x, \omega) \overset{(20)}{=} \sum_{k \in \mathbb{N} \colon (a_k, b_k) \in R} f_{X_i}^{\{(a_k, b_k)\}}(x) = f_{X_i}^{R \cap C}(x) \tag{21}$$
for all $x \in \mathbb{R}$ and $\omega \in \bigcap_{j=1}^{\infty} \bigcap_{k=1}^{\infty} \Omega_{j,k}^*$. We will use this fact later.
Since $R \setminus \partial R$ is open, there exists a countable collection of pairwise disjoint rectangles $A_j \subseteq \mathbb{R}^2$ with
$$R \setminus \partial R = \bigcup_{j=1}^{\infty} A_j. \tag{22}$$
Take $(x_j, y_j) \in A_j$ for all $j \in \mathbb{N}$. Using the pointwise ergodic theorem, for all $j \in \mathbb{N}$, there exists a set $\Omega_j \in \mathcal{A}$ with $\mu(\Omega_j) = 1$ and
$$\lim_{n \to \infty} f_{X_i, n}^{A_j}(x_j, \omega) = f_{X_i}^{A_j}(x_j) \tag{23}$$
for all $\omega \in \Omega_j$. Because $A_j$ is a rectangle, for all $\omega \in \Omega$,
$$f_{X_i, n}^{A_j}(x, \omega) = f_{X_i, n}^{A_j}(x_j, \omega) \quad \text{and} \quad f_{X_i}^{A_j}(x) = f_{X_i}^{A_j}(x_j)$$
hold true for all $x \in \mathbb{R}$ with $(\{x\} \times \mathbb{R}) \cap A_j \ne \emptyset$, and
$$f_{X_i, n}^{A_j}(x, \omega) = f_{X_i}^{A_j}(x) = 0$$
holds true for all $x \in \mathbb{R}$ with $(\{x\} \times \mathbb{R}) \cap A_j = \emptyset$. Together with (23), this implies
$$\lim_{n \to \infty} f_{X_i, n}^{A_j \setminus C}(x, \omega) = f_{X_i}^{A_j \setminus C}(x) \tag{24}$$
for all $x \in \mathbb{R}$ and $\omega \in \Omega_j$.
Set $R_J := \bigcup_{j=1}^{J} A_j$. Lemma 4 provides
$$\begin{aligned} \lim_{J \to \infty} \int \big| f_{X_i}^{R_J \setminus C} - f_{X_i}^{R \setminus C} \big| \, d\mu_{X_i} &= \lim_{J \to \infty} \mu_{X_i}^2\big( (R_J \setminus C) \mathbin{\triangle} (R \setminus C) \big) \\ &= \lim_{J \to \infty} \mu_{X_i}^2\big( (R \setminus C) \setminus (R_J \setminus C) \big) \\ &= \lim_{J \to \infty} \mu_{X_i}^2\big( R \setminus (R_J \cup C) \big) \\ &= \lim_{J \to \infty} \mu_{X_i}^2\big( \big( (R \cap \partial R) \cup (R \setminus \partial R) \big) \setminus (R_J \cup C) \big) \\ &= \lim_{J \to \infty} \Big( \mu_{X_i}^2\big( (R \cap \partial R) \setminus (R_J \cup C) \big) + \mu_{X_i}^2\big( (R \setminus \partial R) \setminus (R_J \cup C) \big) \Big) \\ &\le \lim_{J \to \infty} \Big( \mu_{X_i}^2( \partial R \setminus C ) + \mu_{X_i}^2\big( (R \setminus \partial R) \setminus R_J \big) \Big) \\ &\overset{(8)}{=} \lim_{J \to \infty} \mu_{X_i}^2\big( (R \setminus \partial R) \setminus R_J \big) \overset{(22)}{=} \mu_{X_i}^2\big( (R \setminus \partial R) \setminus (R \setminus \partial R) \big) = 0. \end{aligned}$$
Therefore, there exists a set $B_1$ with $\mu_{X_i}(B_1) = 1$ and a sequence $(J_k)_{k \in \mathbb{N}}$ with
$$\lim_{k \to \infty} f_{X_i}^{R_{J_k} \setminus C}(x) = f_{X_i}^{R \setminus C}(x) \tag{25}$$
for all $x \in B_1$. Thus,
$$\begin{aligned} \liminf_{n \to \infty} f_{X_i, n}^{R}(x, \omega) &\ge \liminf_{n \to \infty} f_{X_i, n}^{R \cap C}(x, \omega) + \liminf_{n \to \infty} f_{X_i, n}^{R \setminus C}(x, \omega) \\ &\overset{(21)}{\ge} f_{X_i}^{R \cap C}(x) + \liminf_{n \to \infty} f_{X_i, n}^{R_{J_k} \setminus C}(x, \omega) \\ &\ge f_{X_i}^{R \cap C}(x) + \lim_{k \to \infty} \liminf_{n \to \infty} f_{X_i, n}^{R_{J_k} \setminus C}(x, \omega) \\ &= f_{X_i}^{R \cap C}(x) + \lim_{k \to \infty} \liminf_{n \to \infty} \sum_{j=1}^{J_k} f_{X_i, n}^{A_j \setminus C}(x, \omega) \\ &\ge f_{X_i}^{R \cap C}(x) + \lim_{k \to \infty} \sum_{j=1}^{J_k} \liminf_{n \to \infty} f_{X_i, n}^{A_j \setminus C}(x, \omega) \\ &\overset{(24)}{=} f_{X_i}^{R \cap C}(x) + \lim_{k \to \infty} \sum_{j=1}^{J_k} f_{X_i}^{A_j \setminus C}(x) \\ &= f_{X_i}^{R \cap C}(x) + \lim_{k \to \infty} f_{X_i}^{R_{J_k} \setminus C}(x) \overset{(25)}{=} f_{X_i}^{R \cap C}(x) + f_{X_i}^{R \setminus C}(x) = f_{X_i}^{R}(x) \end{aligned}$$
for all $x \in B_1$ and $\omega \in \tilde{\Omega}_1 := \bigcap_{j=1}^{\infty} \bigcap_{k=1}^{\infty} \big( \Omega_j \cap \Omega_{j,k}^* \big)$. Because $R^c \setminus \partial R$ is open as well, one can analogously show that there exist sets $\tilde{\Omega}_2 \in \mathcal{A}$ and $B_2 \in \mathcal{B}(\mathbb{R})$ with $\mu(\tilde{\Omega}_2) = \mu_{X_i}(B_2) = 1$ and
$$\liminf_{n \to \infty} f_{X_i, n}^{R^c}(x, \omega) \ge f_{X_i}^{R^c}(x)$$
for all $x \in B_2$ and $\omega \in \tilde{\Omega}_2$. This implies
$$\limsup_{n \to \infty} f_{X_i, n}^{R}(x, \omega) = 1 - \liminf_{n \to \infty} f_{X_i, n}^{R^c}(x, \omega) \le 1 - f_{X_i}^{R^c}(x) = f_{X_i}^{R}(x).$$
Hence,
$$f_{X_i}^{R}(x) \le \liminf_{n \to \infty} f_{X_i, n}^{R}(x, \omega) \le \limsup_{n \to \infty} f_{X_i, n}^{R}(x, \omega) \le f_{X_i}^{R}(x)$$
for all $x \in B := B_1 \cap B_2$ and $\omega \in \tilde{\Omega} := \tilde{\Omega}_1 \cap \tilde{\Omega}_2$. □
Given a random vector $(X_1, X_2, \dots, X_d)$ and $R \in \mathcal{B}(\mathbb{R}^2)$, we define the partition
$$R_i := (X_i, X_i)^{-1}(\{R, R^c\})$$
for all $i \in \{1, 2, \dots, d\}$, which is equal to
$$R_i = \Big\{ \{ (\omega_1, \omega_2) \in \Omega^2 \mid (X_i(\omega_1), X_i(\omega_2)) \in R \},\ \{ (\omega_1, \omega_2) \in \Omega^2 \mid (X_i(\omega_1), X_i(\omega_2)) \notin R \} \Big\}.$$
Lemma 1.
Let $(\Omega, \mathcal{A}, \mu, T)$ be an ergodic measure-preserving dynamical system, $X = (X_1, X_2, \dots, X_d) \colon \Omega \to \mathbb{R}^d$ be a random vector, $(E_n)_{n \in \mathbb{N}}$ be a timing and $R \in \mathcal{B}(\mathbb{R}^2)$ satisfy (8). Then, there exists a sequence $(n_k)_{k \in \mathbb{N}} \in \mathbb{N}_0^{\mathbb{N}}$ with
$$\lim_{k \to \infty} H\Big( T^{-n_k}\big( (f_{X_i} \circ X_i)^{-1}(\mathcal{U}_N) \big) \,\Big|\, \bigvee_{v=0}^{n_k} T^{-v}\Big( \bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(R_i) \Big) \Big) = 0$$
for all $N \in \mathbb{N}$ and $i \in \{1, 2, \dots, d\}$.
Proof. 
Because $(E_n)_{n \in \mathbb{N}}$ is a timing, there exists a sequence $(a_n)_{n \in \mathbb{N}} \in \mathbb{N}_0^{\mathbb{N}}$ with
$$\limsup_{N \to \infty} \frac{\#\big\{ n \in \{1, 2, \dots, N\} \,\big|\, (a_n, a_n + n) \in \bigcup_{k=1}^{\infty} E_k \big\}}{N} = 1.$$
So one can find a strictly increasing sequence $(N_n)_{n \in \mathbb{N}} \in \mathbb{N}^{\mathbb{N}}$ with
$$(a_{N_n}, a_{N_n} + N_n) \in \bigcup_{k=1}^{\infty} E_k$$
for all $n \in \mathbb{N}$ and
$$\limsup_{n \to \infty} \frac{n}{N_n} = 1.$$
Now, fix $i \in \{1, 2, \dots, d\}$. According to Lemma 5, there exist sets $\tilde{\Omega} \in \mathcal{A}$ and $B \in \mathcal{B}(\mathbb{R})$ with $\mu(\tilde{\Omega}) = \mu_{X_i}(B) = 1$ satisfying
$$\lim_{n \to \infty} f_{X_i, n}^{R}(x, \omega) = f_{X_i}^{R}(x) \tag{27}$$
for all $\omega \in \tilde{\Omega}$ and $x \in B$. Set
$$\Omega_0 := \tilde{\Omega} \cap X_i^{-1}(B).$$
Consider the function $\varphi_n \colon \Omega \to [0, 1]$ with
$$\varphi_n(\omega) := \frac{1}{N_n} \#\big\{ t \in \{N_1, N_2, \dots, N_n\} \,\big|\, (X_i(\omega), X_i(T^t(\omega))) \in R \big\}.$$
Then,
$$\begin{aligned} f_{X_i}^{R}(X_i(\omega)) &= \lim_{n \to \infty} f_{X_i, n}^{R}(X_i(\omega), \omega) = \limsup_{n \to \infty} f_{X_i, N_n}^{R}(X_i(\omega), \omega) \\ &\ge \limsup_{n \to \infty} \varphi_n(\omega) \\ &= \limsup_{n \to \infty} \Big( f_{X_i, N_n}^{R}(X_i(\omega), \omega) - \frac{1}{N_n} \#\big\{ t \in \{1, 2, \dots, N_n\} \setminus \{N_1, N_2, \dots, N_n\} \,\big|\, (X_i(\omega), X_i(T^t(\omega))) \in R \big\} \Big) \\ &\ge \limsup_{n \to \infty} \Big( f_{X_i, N_n}^{R}(X_i(\omega), \omega) - \frac{1}{N_n} \#\big( \{1, 2, \dots, N_n\} \setminus \{N_1, N_2, \dots, N_n\} \big) \Big) \\ &= \limsup_{n \to \infty} \Big( f_{X_i, N_n}^{R}(X_i(\omega), \omega) - \frac{N_n - n}{N_n} \Big) \\ &= \lim_{n \to \infty} f_{X_i, N_n}^{R}(X_i(\omega), \omega) - 1 + \limsup_{n \to \infty} \frac{n}{N_n} \overset{(27)}{=} f_{X_i}^{R}(X_i(\omega)) \tag{28} \end{aligned}$$
for all $\omega \in \Omega_0$.
It is easy to see that
$$\sigma(\varphi_n) \subseteq \sigma\Big( \bigvee_{t=1}^{k} (\mathrm{id}, T^{N_t})^{-1}(R_i) \,\Big|\, k \in \mathbb{N} \Big)$$
holds true for all $n \in \mathbb{N}$. This implies (see, for instance, [11], Theorem 13.4 (i)):
$$\sigma\big( f_{X_i}^{R} \circ X_i \big) \overset{\mu}{=} \sigma\big( \limsup_{n \to \infty} \varphi_n \big) \subseteq \sigma\Big( \bigvee_{t=1}^{k} (\mathrm{id}, T^{N_t})^{-1}(R_i) \,\Big|\, k \in \mathbb{N} \Big),$$
where the $\mu$-equality follows from (28). Therefore, $\big( \bigvee_{t=1}^{k} (\mathrm{id}, T^{N_t})^{-1}(R_i) \big)_{k \in \mathbb{N}}$ is a sequence of partitions generating $\sigma\big( f_{X_i}^{R} \circ X_i \big)$ modulo $\mu$. By (19), this implies
$$\lim_{k \to \infty} H\Big( (f_{X_i} \circ X_i)^{-1}(\mathcal{U}_N) \,\Big|\, \bigvee_{t=1}^{k} (\mathrm{id}, T^{N_t})^{-1}(R_i) \Big) = 0 \tag{29}$$
for all $N \in \mathbb{N}$. Set
$$n_k := \max_{1 \le n \le k} a_{N_n}.$$
Notice that
$$\sigma\Big( T^{-n_k}\Big( \bigvee_{t=1}^{k} (\mathrm{id}, T^{N_t})^{-1}(R_i) \Big) \Big) \subseteq \sigma\Big( \bigvee_{v=0}^{n_k} T^{-v}\Big( \bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(R_i) \Big) \Big)$$
holds true for all $k \in \mathbb{N}$. Consequently,
$$\begin{aligned} \lim_{k \to \infty} H\Big( T^{-n_k}\big( (f_{X_i} \circ X_i)^{-1}(\mathcal{U}_N) \big) \,\Big|\, \bigvee_{v=0}^{n_k} T^{-v}\Big( \bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(R_i) \Big) \Big) &\le \lim_{k \to \infty} H\Big( T^{-n_k}\big( (f_{X_i} \circ X_i)^{-1}(\mathcal{U}_N) \big) \,\Big|\, T^{-n_k}\Big( \bigvee_{t=1}^{k} (\mathrm{id}, T^{N_t})^{-1}(R_i) \Big) \Big) \\ &= \lim_{k \to \infty} H\Big( (f_{X_i} \circ X_i)^{-1}(\mathcal{U}_N) \,\Big|\, \bigvee_{t=1}^{k} (\mathrm{id}, T^{N_t})^{-1}(R_i) \Big) \overset{(29)}{=} 0 \end{aligned}$$
for all $N \in \mathbb{N}$. □
We can now finalize the proof of Theorem 1.
Proof of Theorem 1.
Let $N \in \mathbb{N}$ and $m \in \mathbb{N}$. Set
$$\mathcal{P}_N^i := (f_{X_i} \circ X_i)^{-1}(\mathcal{U}_N)$$
and
$$\mathcal{Q}_k^i := \bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(R_i)$$
for all $i \in \{1, 2, \dots, d\}$ and $k \in \mathbb{N}$. According to Lemma 1, there exists a sequence $(n_k)_{k \in \mathbb{N}} \in \mathbb{N}_0^{\mathbb{N}}$ with
$$\lim_{k \to \infty} H\Big( T^{-n_k} \mathcal{P}_N^i \,\Big|\, \bigvee_{v=0}^{n_k} T^{-v} \mathcal{Q}_k^i \Big) = 0 \tag{30}$$
for all $i \in \{1, 2, \dots, d\}$. We have
$$\begin{aligned} \lim_{n \to \infty} \frac{1}{n} H\Big( \bigvee_{i=1}^{d} \bigvee_{u=0}^{nm-1} T^{-u} \mathcal{P}_N^i \,\Big|\, \bigvee_{i=1}^{d} \bigvee_{u=0}^{nm-1} T^{-u} \mathcal{Q}_k^i \Big) &\le \sum_{i=1}^{d} \lim_{n \to \infty} \frac{1}{n} H\Big( \bigvee_{u=0}^{nm-1} T^{-u} \mathcal{P}_N^i \,\Big|\, \bigvee_{u=0}^{nm-1} T^{-u} \mathcal{Q}_k^i \Big) \\ &= \sum_{i=1}^{d} \lim_{n \to \infty} \frac{1}{n} H\Big( \bigvee_{u=0}^{nm-1-n_k} T^{-u} T^{-n_k} \mathcal{P}_N^i \,\Big|\, \bigvee_{u=0}^{nm-1-n_k} T^{-u} \bigvee_{v=0}^{n_k} T^{-v} \mathcal{Q}_k^i \Big) \\ &\le \sum_{i=1}^{d} \lim_{n \to \infty} \frac{1}{n} \sum_{u=0}^{nm-1-n_k} H\Big( T^{-u} T^{-n_k} \mathcal{P}_N^i \,\Big|\, T^{-u} \bigvee_{v=0}^{n_k} T^{-v} \mathcal{Q}_k^i \Big) \\ &= \sum_{i=1}^{d} \lim_{n \to \infty} \frac{nm - n_k}{n} H\Big( T^{-n_k} \mathcal{P}_N^i \,\Big|\, \bigvee_{v=0}^{n_k} T^{-v} \mathcal{Q}_k^i \Big) \\ &= \sum_{i=1}^{d} m \cdot H\Big( T^{-n_k} \mathcal{P}_N^i \,\Big|\, \bigvee_{v=0}^{n_k} T^{-v} \mathcal{Q}_k^i \Big) \end{aligned}$$
for all $k, m, N \in \mathbb{N}$. This implies
$$\begin{aligned} h\Big( T^m, \bigvee_{i=1}^{d} \bigvee_{u=0}^{m-1} T^{-u} \mathcal{P}_N^i \Big) - \lim_{k \to \infty} h\Big( T^m, \bigvee_{i=1}^{d} \bigvee_{u=0}^{m-1} T^{-u} \mathcal{Q}_k^i \Big) &\le \lim_{k \to \infty} \lim_{n \to \infty} \frac{1}{n} H\Big( \bigvee_{i=1}^{d} \bigvee_{u=0}^{nm-1} T^{-u} \mathcal{P}_N^i \,\Big|\, \bigvee_{i=1}^{d} \bigvee_{u=0}^{nm-1} T^{-u} \mathcal{Q}_k^i \Big) \\ &\le \lim_{k \to \infty} \sum_{i=1}^{d} m \cdot H\Big( T^{-n_k} \mathcal{P}_N^i \,\Big|\, \bigvee_{v=0}^{n_k} T^{-v} \mathcal{Q}_k^i \Big) \overset{(30)}{=} 0. \end{aligned}$$
Using Lemma 3, we can conclude that there exists a constant $c \in \mathbb{R}$ with
$$h(T^m) \le \lim_{k \to \infty} h\Big( T^m, \bigvee_{i=1}^{d} \bigvee_{u=0}^{m-1} T^{-u}\Big( \bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(R_i) \Big) \Big) + c = \lim_{k \to \infty} m \cdot h\Big( T, \bigvee_{i=1}^{d} \bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(R_i) \Big) + c$$
for all $m \in \mathbb{N}$. Thus,
$$h(T) - \lim_{k \to \infty} h\Big( T, \bigvee_{i=1}^{d} \bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(R_i) \Big) = \lim_{m \to \infty} \frac{1}{m} \cdot h(T^m) - \lim_{k \to \infty} h\Big( T, \bigvee_{i=1}^{d} \bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(R_i) \Big) \le \lim_{m \to \infty} \frac{1}{m} \cdot c = 0,$$
which is equivalent to
$$h(T) \le \lim_{k \to \infty} h\Big( T, \bigvee_{i=1}^{d} \bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(R_i) \Big). \tag{31}$$
On the other hand,
$$h(T) = \sup_{\mathcal{P}} h(T, \mathcal{P}) \ge \lim_{k \to \infty} h\Big( T, \bigvee_{i=1}^{d} \bigvee_{(s,t) \in E_k} (T^s, T^t)^{-1}(R_i) \Big),$$
which, together with (31), finishes the proof. □

6. Conclusions

We discussed a special “two-dimensional” approach to symbolic dynamics, introduced in [6], that differs from many usual approaches. From the practical viewpoint, the difference can be illustrated as follows: given time-dependent measurements of a real-valued quantity, a symbolization is conducted not for the measurements themselves, as in usual approaches, but for pairs of measurements at two different times. This means that a symbol from a finite symbol set is assigned to each pair of possible measured values. Here, we only considered two symbols, which leads to a partitioning of the two-dimensional real space $\mathbb{R}^2$ into a set $R$ and its complement $\mathbb{R}^2 \setminus R$. In usual approaches, partitions of $\mathbb{R}$ are considered. (Advantages of the “two-dimensional” approach are described in [6].)
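A minimal sketch of this pair-based symbolization (our own illustration, not code from the paper; the function name `symbolize` and the example relation $x > y$ are hypothetical choices) could look as follows:

```python
def symbolize(series, pairs, R=lambda x, y: x > y):
    """Binary symbolization of a time series based on a discriminating relation R:
    each admissible pair of times (s, t) is mapped to 1 if (x_s, x_t) is in R,
    and to 0 otherwise."""
    return {(s, t): int(R(series[s], series[t])) for (s, t) in pairs}

x = [0.3, 0.9, 0.1, 0.4]
pairs = [(0, 1), (0, 2), (1, 3)]   # a toy "timing": pairs of observation times
assert symbolize(x, pairs) == {(0, 1): 0, (0, 2): 1, (1, 3): 1}
```

Classical ordinal patterns correspond to the particular choice $R = \{(x, y) \mid x > y\}$; other discriminating relations only change the predicate passed as `R`.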
The set $R$, called a discriminating relation, was considered as a basic building block for constructing partitions of the state space of a given dynamical system, having time-dependent measurements of finitely many quantities in mind. In addition to the discriminating relation, the second central concept was that of a timing, which roughly describes which pairs of times are included in the symbolization process and guarantees that there are not too few such pairs. The central question of the paper was under which conditions on a discriminating relation $R$ the partitions constructed from $R$ determine the KS entropy of a measure-preserving dynamical system. With Theorem 1, we gave a relatively general statement partially answering this question. Some specifications of the theorem in Section 4 illustrate the nature of “successful” discriminating relations.
Although the statement of Theorem 1 appears relatively natural when looking at the proofs a little closer, we do not expect that all cases where the KS entropy can be obtained from a discriminating relation are covered by the statement; however, we have no counterexample. The main tool used in the proofs of the results is the pointwise ergodic theorem. It allows us to establish a connection between the generalized ordinal patterns and the shape of the discriminating relation.
The results of this paper, being on a rather abstract level, give some insight into why the idea of ordinal patterns works well, as reported in several applied papers, and why extracting such features remains advantageous in settings more general than the original ordinal approach. Since there are many choices for a discriminating relation, for practical purposes such as, for example, in a classification context, one needs methods and criteria for finding good discriminating relations adapted to the given data and problems. This is an important challenge for further research related to the given approach to symbolic dynamics. A further aspect is to discuss the approach for partitioning $\mathbb{R}^2$ into more than two pieces.

Author Contributions

T.G. and K.K. designed and wrote the paper and T.G. provided the proofs. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bandt, C.; Pompe, B. Permutation Entropy: A Natural Complexity Measure for Time Series. Phys. Rev. Lett. 2002, 88, 174102.
  2. Zanin, M.; Zunino, L.; Rosso, O.A.; Papo, D. Permutation Entropy and Its Main Biomedical and Econophysics Applications: A Review. Entropy 2012, 14, 1553–1577.
  3. Bandt, C.; Keller, G.; Pompe, B. Entropy of interval maps via permutations. Nonlinearity 2002, 15, 1595–1602.
  4. Antoniouk, A.; Keller, K.; Maksymenko, S. Kolmogorov-Sinai entropy via separation properties of order-generated σ-algebras. Discret. Contin. Dyn. Syst. A 2014, 34, 1793–1809.
  5. Smith, G.; Goulding, J.; Barrack, D. Towards Optimal Symbolization for Time Series Comparisons. In Proceedings of the 2013 IEEE 13th International Conference on Data Mining Workshops, Dallas, TX, USA, 7–10 December 2013; pp. 646–653.
  6. Stolz, I.; Keller, K. A General Symbolic Approach to Kolmogorov-Sinai Entropy. Entropy 2017, 19, 675.
  7. Walters, P. An Introduction to Ergodic Theory. In Graduate Texts in Mathematics; Springer: New York, NY, USA, 1982; Volume 79.
  8. Keller, K. Permutations and the Kolmogorov-Sinai entropy. Discret. Contin. Dyn. Syst. 2012, 32, 891–900.
  9. Amigó, J.M.; Kennel, M.B.; Kocarev, L. The permutation entropy rate equals the metric entropy rate for ergodic information sources and ergodic dynamical systems. Phys. D 2005, 210, 77–95.
  10. Cornfeld, I.P.; Fomin, S.V.; Sinai, Y.G. Ergodic Theory, 1st ed.; Springer: New York, NY, USA, 1982.
  11. Billingsley, P. Probability and Measure, 2nd ed.; John Wiley and Sons: Hoboken, NJ, USA, 1986.
Figure 1. This figure illustrates some special discriminating relations R (striped areas) considered in Section 4. Only the part of R contained in [0, 1[2 is shown (compare the corresponding remarks in Section 4).
Figure 2. “Non-diagonal” discriminating relations.
Figure 3. $R_k := \{ (x, y) \in \mathbb{R}^2 \mid \lfloor k \cdot x \rfloor \le \lfloor k \cdot y \rfloor \}$ for $k = 8$ (left side) and $k = 16$ (right side), only shown in $[0, 1]^2$.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Gutjahr, T.; Keller, K. Generalized Ordinal Patterns and the KS-Entropy. Entropy 2021, 23, 1097. https://0-doi-org.brum.beds.ac.uk/10.3390/e23081097
