## 1. Introduction

$\mathcal{P}({\Delta}^{\mathbb{N}})$, called prediction space. In our formulation, statistical complexity is the marginal entropy of the prediction process. This is equivalent to the classical definition as the entropy of a certain partition of the past; we only replace equivalence classes with the corresponding induced probabilities on the future. In this section, we also show that the discrete (and thus technically vastly simplified) version of the prediction process has a continuous Markov transition kernel (Proposition 5).

## 2. Prediction Dynamic and Statistical Complexity

${X}_{k}$, as the projections on ${\Delta}^{\mathbb{Z}}$. If not stated otherwise, product spaces are equipped with the product topology and spaces of probability measures are equipped with the weak-$\ast $ topology. We use the arrow $\stackrel{*}{\rightharpoonup}$ to denote weak-$\ast $ convergence.

#### 2.1. Discrete Version of Knight’s Prediction Process

**Definition 1.**

**prediction process** of ${X}_{\mathbb{Z}}$. $\mathcal{P}({\Delta}^{\mathbb{N}})$ is called **prediction space**.

**Proposition 2.**

**Definition 3.**

**prediction dynamic**.

**Lemma 4.**

**Proof.** ${\Omega}_{z}$ is clopen with $z({\Omega}_{z})=1$. Uniform convergence on compacta is equivalent to ${\varphi}_{{z}_{n}}({\omega}_{n})\stackrel{*}{\rightharpoonup}{\varphi}_{z}(\omega )$ whenever ${\omega}_{n}\to \omega $ in ${\Omega}_{z}$. For sufficiently large $n$, ${\zeta}_{1}({\omega}_{n})={\zeta}_{1}(\omega )$ and, because ${\varsigma}^{-1}$ maps cylinder sets to cylinder sets, ${\varphi}_{{z}_{n}}({\omega}_{n})=\frac{{z}_{n}({A}_{\omega}\cap {\varsigma}^{-1}(\,\cdot\,))}{{z}_{n}({A}_{\omega})}\stackrel{*}{\rightharpoonup}{\varphi}_{z}(\omega )$. □

**Proposition 5.**

#### 2.2. Statistical Complexity

**resolvent** or **barycentre map** (see [15]) and $\mathrm{id}$ is the identity map. Here, measure-valued integrals are Gel’fand integrals. That is, $\mu =\int K\,\mathrm{d}\nu $ for some kernel $K$ means $\int f\,\mathrm{d}\mu =\int \int f\,\mathrm{d}K(\,\cdot\,)\,\mathrm{d}\nu $ for all continuous, real-valued $f$ or, equivalently, $\mu (B)=\int K(\,\cdot\,;\,B)\,\mathrm{d}\nu $ for all measurable sets $B$. $z=r(\nu )$ means that $z$ is a mixture (convex combination) of other processes, with the mixture described by $\nu $. A trivial representation of $z$ is given by ${\delta}_{z}$, the Dirac measure in $z$. The measure $\nu $ is called **S-invariant** if $\nu S=\nu $, where $\nu S:=\int S\,\mathrm{d}\nu $. In other words, $\nu $ is S-invariant if iterating the prediction dynamic $S$ does not change it. The following lemma shows that, in general, iterating $S$ shifts the represented measure, i.e., $\nu S$ represents $z\circ {\varsigma}^{-1}$.
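For $\nu $ with finite support, the Gel’fand integral in the representation reduces to a plain convex combination; a minimal worked instance (generic weights ${\lambda}_{i}$ and component processes ${z}_{i}$, not specific objects from the text):

```latex
\nu=\sum_{i=1}^{k}\lambda_i\,\delta_{z_i}
\quad\Longrightarrow\quad
r(\nu)=\int \mathrm{id}\,\mathrm{d}\nu=\sum_{i=1}^{k}\lambda_i\,z_i,
\qquad\text{i.e.}\qquad
r(\nu)(B)=\sum_{i=1}^{k}\lambda_i\,z_i(B)\ \text{for all measurable } B.
```

In particular, $r({\delta}_{z})=z$, recovering the trivial representation mentioned above.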

**Lemma 6.**

**Definition 7.**

**causal state distribution** ${\mu}_{\mathfrak{C}}(P)$ is the marginal distribution of the prediction process, i.e., ${\mu}_{\mathfrak{C}}(P):=P\circ {Z}_{0}^{-1}\in \mathcal{P}(\mathcal{P}({\Delta}^{\mathbb{N}}))$.

**Lemma 8.**

**Remark.**

**Example 9.**

**Definition 10.**

**statistical complexity** of $P$.

**Lemma 11.**

## 3. Partially Deterministic HMMs

#### 3.1. HMMs

${W}_{0}$ is $\mu $-distributed and the joint process is Markovian with

**invariant** and extend the generated processes to stationary processes ${X}_{\mathbb{Z}}$ and ${W}_{\mathbb{Z}}$. We need some further notation.

**Definition 12.**

- a) The **output kernel** $K:M\to \mathcal{P}(\Delta )$ is defined by $K(m):={K}_{m}:=T(m;\,\cdot\,\times M)\in \mathcal{P}(\Delta )$. We also use the notations $\overline{{K}_{d}}(m):={K}_{m}(d):={K}_{m}(\{d\})$ and ${K}_{\nu}:=\int K\,\mathrm{d}\nu $.
- b) The **internal operators** ${L}_{d}:\mathcal{P}(M)\to \mathcal{P}(M)\cup \{0\}$ are defined as follows. ${L}_{d}(\nu )=0$ if ${K}_{\nu}(d)=0$, and $${L}_{d}(\nu )(B):=\frac{\int T(\,\cdot\,;\{d\}\times B)\,\mathrm{d}\nu}{{K}_{\nu}(d)}\quad \text{otherwise}.$$

**Remark.**

- a) ${K}_{m}$ is the distribution of the next output symbol when the internal state is $m$, i.e., ${K}_{m}=P({X}_{1}\mid {W}_{0}=m)$ a.s. Further, ${K}_{\mu}$ is the law of ${X}_{1}$.
- b) The internal operator ${L}_{d}$ describes the update of knowledge of the internal state when the symbol $d\in \Delta $ is observed. For Dirac measures, we obtain $${L}_{d}({\delta}_{m})=P({W}_{1}\mid {W}_{0}=m,\ {X}_{1}=d)\quad \text{a.s.}$$ Note, however, that ${L}_{d}$ is not induced by a kernel in the following sense: there is no kernel ${l}_{d}:M\to \mathcal{P}(M)$ such that ${L}_{d}(\nu )=\int {l}_{d}\,\mathrm{d}\nu $. To see this, note that ${L}_{d}(\nu )\ne \int {L}_{d}\circ \iota \,\mathrm{d}\nu $ for $\iota (m)={\delta}_{m}$, because ${L}_{d}(\nu )$ is normalised outside the integral, as opposed to an individual normalisation of each ${L}_{d}({\delta}_{m})$ inside the integral on the right-hand side.
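On a finite internal space, the output kernel and the internal operators of Definition 12 can be sketched directly, including the remark's point that ${L}_{d}$ is not induced by a kernel. The toy kernel `T` below (and all its numerical values) is an illustrative assumption, not taken from the text:

```python
# Finite-state sketch of Definition 12: output kernel K and internal operators L_d.
# T[m][(d, m2)] = probability of emitting d and moving to internal state m2 from m.
T = {
    0: {('a', 0): 0.4, ('a', 1): 0.2, ('b', 0): 0.1, ('b', 1): 0.3},
    1: {('a', 0): 0.1, ('a', 1): 0.1, ('b', 0): 0.5, ('b', 1): 0.3},
}
M, Delta = [0, 1], ['a', 'b']

def K(m):
    """Output kernel K_m = T(m; . x M): law of the next symbol from state m."""
    return {d: sum(T[m][(d, m2)] for m2 in M) for d in Delta}

def L(d, nu):
    """Internal operator L_d: Bayes update of the belief nu after observing d
    (returns 0 when d has probability zero under nu)."""
    z = sum(nu[m] * K(m)[d] for m in M)          # K_nu(d)
    if z == 0:
        return 0
    return {m2: sum(nu[m] * T[m][(d, m2)] for m in M) / z for m2 in M}

# The remark's point: L_d is normalised once, outside the mixture, not
# individually for each Dirac component, so no kernel l_d can induce it.
nu = {0: 0.5, 1: 0.5}
delta = lambda m: {m2: 1.0 if m2 == m else 0.0 for m2 in M}
jointly = L('a', nu)                                      # L_d(nu)
pointwise = {m2: sum(nu[m] * L('a', delta(m))[m2] for m in M) for m2 in M}
# jointly[0] = 0.625, while pointwise[0] = 7/12: the two updates differ.
```

Here `jointly` divides by ${K}_{\nu}(d)$ once, whereas `pointwise` normalises each ${L}_{d}({\delta}_{m})$ by ${K}_{m}(d)$ separately, which is exactly why the two disagree.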

**Lemma 13.**

**Definition 14.**

**Remark.**

**Lemma 15.**

- a) ${Y}_{1}(\omega )={L}_{{X}_{1}(\omega )}({Y}_{0}(\omega ))$ a.s.
- b) $P(\{{X}_{1}=d\}\mid {Y}_{0})(\omega )=P(\{{X}_{1}=d\}\mid {X}_{-{\mathbb{N}}_{0}})(\omega )={K}_{{Y}_{0}(\omega )}(d)$ a.s.

- a) Let $d={X}_{1}(\omega )$ and for $B\in \mathfrak{B}(M)$ set ${F}_{B}:=\{{X}_{1}=d,\ {W}_{1}\in B\}$. We obtain a.s. $${L}_{d}({Y}_{0})(B)\stackrel{(5)}{=}\frac{P({F}_{B}\mid {X}_{-{\mathbb{N}}_{0}})}{P({F}_{M}\mid {X}_{-{\mathbb{N}}_{0}})}\stackrel{(d={X}_{1}(\omega ))}{=}P(\{{W}_{1}\in B\}\mid {X}_{-{\mathbb{N}}_{0}},\ {X}_{1})={Y}_{1}(\,\cdot\,)(B)$$
- b) The second equality follows directly from (5). The first follows because, due to the second equality, $P(\{{X}_{1}=d\}\mid {X}_{-{\mathbb{N}}_{0}})$ is $\sigma ({Y}_{0})$-measurable modulo $P$. □
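On a finite toy HMM, parts a) and b) of Lemma 15 can be checked exactly by enumeration: the law of the next symbol obtained by summing the joint distribution of $({W}_{0},{X}_{1},{W}_{1},{X}_{2})$ must coincide with the prediction through the filter, ${K}_{{L}_{{d}_{1}}(\mu )}$. The kernel `T` and all numbers below are illustrative assumptions:

```python
# Exact finite-state check of Lemma 15: brute-force next-symbol law vs. the
# prediction computed through the belief update Y_1 = L_{d1}(mu).
T = {
    0: {('a', 0): 0.4, ('a', 1): 0.2, ('b', 0): 0.1, ('b', 1): 0.3},
    1: {('a', 0): 0.1, ('a', 1): 0.1, ('b', 0): 0.5, ('b', 1): 0.3},
}
M, Delta = [0, 1], ['a', 'b']
mu = {0: 6 / 11, 1: 5 / 11}          # invariant law of the internal chain of T

def K(m):                             # output kernel K_m = T(m; . x M)
    return {d: sum(T[m][(d, m2)] for m2 in M) for d in Delta}

def L(d, nu):                         # internal operator (Definition 12 b))
    z = sum(nu[m] * K(m)[d] for m in M)
    return {m2: sum(nu[m] * T[m][(d, m2)] for m in M) / z for m2 in M}

def direct(d1, d2):
    """P(X_2 = d2 | X_1 = d1) by summing the joint law over internal states."""
    num = sum(mu[m0] * T[m0][(d1, m1)] * K(m1)[d2] for m0 in M for m1 in M)
    den = sum(mu[m0] * T[m0][(d1, m1)] for m0 in M for m1 in M)
    return num / den

def filtered(d1, d2):
    """The same quantity through the filter: K_{L_{d1}(mu)}(d2)."""
    nu1 = L(d1, mu)
    return sum(nu1[m1] * K(m1)[d2] for m1 in M)
```

For every pair $({d}_{1},{d}_{2})$ the two computations agree, which is the content of the lemma in this finite setting.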

**Lemma 16.**

#### 3.2. Partial Determinism

**Definition 17.**

**partially deterministic** if there is a measurable function $f:M\times \Delta \to M$, called **transition function**, such that $T(m)={K}_{m}\otimes {\delta}_{{f}_{m}(\,\cdot\,)}$ for all $m\in M$, i.e., $T(m;\{d\}\times B)={K}_{m}(d)\,{\delta}_{f(m,d)}(B)$, where ${f}_{m}(d):=f(m,d)$.

**Remark.**

**Proposition 18.**

${W}_{k}$ and ${X}_{k+1}$ are independent given ${Y}_{k}=\nu $, i.e., $K{\upharpoonright}_{\operatorname{supp}(\nu )}$ is constant. □

**Example 19.** (shift HMM)

**Corollary 20.**

${\Delta}^{n}$, the internal space is $M$, whereas the output and internal processes ${\widehat{X}}_{\mathbb{Z}}$ and ${\widehat{W}}_{\mathbb{Z}}$ are given by ${\widehat{X}}_{k}={X}_{[(k-1)n+1,kn]}$ and ${\widehat{W}}_{k}={W}_{nk}$. This is achieved by the HMM $(\widehat{T},\mu )$ with $\widehat{T}:M\to \mathcal{P}({\Delta}^{n}\times M)$, $\widehat{T}(m)=P({X}_{[1,n]},{W}_{n}\mid {W}_{0}=m)$. This HMM is invariant and obviously partially deterministic with transition function ${f}_{{d}_{n}}\circ \cdots \circ {f}_{{d}_{1}}$. Thus, Proposition 18 implies that $P({X}_{[1,n]}\mid {W}_{0}=\,\cdot\,)=P({\widehat{X}}_{1}\mid {\widehat{W}}_{0}=\,\cdot\,)$ is constant on $\operatorname{supp}({\widehat{Y}}_{0})$. Because we can couple the processes such that ${\widehat{Y}}_{0}={Y}_{0}$, the claim follows. □
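The $n$-block HMM $\widehat{T}$ from the proof above can be sketched for a finite partially deterministic HMM: each word $({d}_{1},\dots ,{d}_{n})$ receives probability ${K}_{m}({d}_{1})\,{K}_{f(m,{d}_{1})}({d}_{2})\cdots $ together with the deterministic endpoint ${f}_{{d}_{n}}\circ \cdots \circ {f}_{{d}_{1}}(m)$. The toy `K` and `f` are illustrative assumptions:

```python
# Sketch of the n-block kernel That(m) = P(X_[1,n], W_n | W_0 = m) for a
# finite partially deterministic HMM.
import itertools

M, Delta = [0, 1], ['a', 'b']
K = {0: {'a': 0.7, 'b': 0.3}, 1: {'a': 0.2, 'b': 0.8}}
f = {(0, 'a'): 1, (0, 'b'): 0, (1, 'a'): 0, (1, 'b'): 1}

def block_kernel(m, n):
    """Distribution over (word, end state) for an n-step block started at m."""
    out = {}
    for word in itertools.product(Delta, repeat=n):
        p, state = 1.0, m
        for d in word:
            p *= K[state][d]          # emit d from the current internal state
            state = f[(state, d)]     # deterministic transition-function step
        out[(word, state)] = p
    return out
```

For example, from $m=0$ the word $(a,b)$ has probability ${K}_{0}(a)\,{K}_{1}(b)=0.56$ and endpoint $f(f(0,a),b)=1$, exhibiting the composed transition function of the block HMM.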

#### 3.3. Representations on Prediction Space

**Definition 21.**

**Lemma 22.**

**Remark.** ($\epsilon $-machine).

**Proposition 23.**

**Example 24.**

**Example 25.** (lifted shift).

## 4. Properties of the Statistical Complexity Functional

**Proposition 26.**

**Corollary 27.**

**Corollary 28.**

**Corollary 29.**

**Example 30.**

**Lemma 31.**

**Theorem 32.**

**Corollary 33.**

**Example 34.**

**Example 35.**

## Appendix

## Acknowledgements

## References

- Olbrich, E.; Bertschinger, N.; Ay, N.; Jost, J. How Should Complexity Scale with System Size? Eur. Phys. J. B
**2008**, 63, 407–415. [Google Scholar] - Jänicke, H.; Wiebel, A.; Scheuermann, G.; Kollmann, W. Multifield Visualization Using Local Statistical Complexity. IEEE Trans. Vis. Comput. Graph.
**2007**, 13, 1384–1391. [Google Scholar] - Crutchfield, J.P.; Young, K. Inferring Statistical Complexity. Phys. Rev. Lett.
**1989**, 63, 105–108. [Google Scholar] - Shalizi, C.R.; Crutchfield, J.P. Computational Mechanics: Pattern and Prediction, Structure and Simplicity. J. Statist. Phys.
**2001**, 104, 817–879. [Google Scholar] - Ay, N.; Crutchfield, J.P. Reductions of Hidden Information Sources. J. Statist. Phys.
**2005**, 120, 659–684. [Google Scholar] - Clarke, R.W.; Freeman, M.P.; Watkins, N.W. Application of Computational Mechanics to the Analysis of Natural Data: An Example in Geomagnetism. Phys. Rev. E
**2003**, 67, 016203.1–016203.15. [Google Scholar] - Hopcroft, J.; Ullman, J. Introduction to Automata Theory, Languages, and Computation; Addison-Wesley: Reading, MA, USA, 1979. [Google Scholar]
- Keller, G. Equilibrium States in Ergodic Theory; London Mathematical Society: New York, USA, 1998. [Google Scholar]
- Dębowski, Ł. Ergodic Decomposition of Excess Entropy and Conditional Mutual Information. IPI PAN Reports, nr 993. 2006. [Google Scholar]
- Dębowski, Ł. A General Definition of Conditional Information and Its Application to Ergodic Decomposition. Stat. Probab. Lett.
**2009**, 79, 1260–1268. [Google Scholar] - Knight, F. A Predictive View of Continuous Time Processes. Ann. Probab.
**1975**, 573–596. [Google Scholar] - Knight, F. Foundations of the Prediction Process; Oxford Science Publications: New York, USA, 1992. [Google Scholar]
- Meyer, P. La théorie de la prédiction de F. Knight. Seminaire de Probabilités
**1976**, X, 86–103. [Google Scholar] - Knight, F. Essays on the Prediction Process; Vol. 1, Lecture Notes Series; Institute of Mathematical Statistics: Hayward, CA, USA, 1981. [Google Scholar]
- Choquet, G. Lectures on Analysis, Volume II (Representation Theory); W. A. Benjamin, Inc.: New York, USA, 1969. [Google Scholar]
- Parthasarathy, K.R. On the Category of Ergodic Measures. Illinois J. Math.
**1961**, 5, 648–656. [Google Scholar] - Löhr, W.; Ay, N. On the Generative Nature of Prediction. Adv. Complex. Syst.
**2009**, 12, 169–194. [Google Scholar] - Billingsley, P. Convergence of Probability Measures, 2nd ed.; Wiley: New York, USA, 1968. [Google Scholar]

© 2009 by the author; licensee Molecular Diversity Preservation International, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license http://creativecommons.org/licenses/by/3.0/.