Article

Multiscale Information Decomposition: Exact Computation for Multivariate Gaussian Processes

by Luca Faes 1,2,*, Daniele Marinazzo 3 and Sebastiano Stramaglia 4,5

1 Bruno Kessler Foundation, 38123 Trento, Italy
2 BIOtech, Department of Industrial Engineering, University of Trento, 38123 Trento, Italy
3 Data Analysis Department, Ghent University, 9000 Ghent, Belgium
4 Dipartimento di Fisica, Università degli Studi di Bari Aldo Moro, 70126 Bari, Italy
5 Istituto Nazionale di Fisica Nucleare, Sezione di Bari, 70126 Bari, Italy
* Author to whom correspondence should be addressed.
Submission received: 21 June 2017 / Revised: 3 August 2017 / Accepted: 7 August 2017 / Published: 8 August 2017

Abstract:
Exploiting the theory of state space models, we derive the exact expressions of the information transfer, as well as of the redundant and synergistic transfer, for coupled Gaussian processes observed at multiple temporal scales. All of the terms constituting the frameworks known as interaction information decomposition and partial information decomposition can thus be obtained analytically, for different time scales, from the parameters of the vector autoregressive (VAR) model that fits the processes. We first apply the proposed methodology to benchmark Gaussian systems, showing that this class of systems may generate patterns of information decomposition characterized by prevalently redundant or synergistic information transfer persisting across multiple time scales, or even by an alternating prevalence of redundant and synergistic source interaction depending on the time scale. Then, we apply our method to an important topic in neuroscience, i.e., the detection of causal interactions in human epilepsy networks, for which we show the relevance of partial information decomposition to the detection of multiscale information transfer spreading from the seizure onset zone.

1. Introduction

The information-theoretic treatment of groups of correlated degrees of freedom can reveal their functional roles as memory structures or information processing units. A large body of recent work has shown how the general concept of “information processing” in a network of multiple interacting dynamical systems described by multivariate stochastic processes can be dissected into basic elements of computation defined within the so-called framework of information dynamics [1]. These elements essentially reflect the new information produced at each moment in time about a target system in the network [2], the information stored in the target system [3,4], the information transferred to it from the other connected systems [5,6] and the modification of the information flowing from multiple source systems to the target [7,8]. The measures of information dynamics have gained increasing importance in both theoretical and applied studies in several fields of science [9,10,11,12,13,14,15,16,17,18]. While the information-theoretic approaches to the definition and quantification of new information, information storage and information transfer are well understood and widely accepted, the problem of defining, interpreting and using measures of information modification has not been fully addressed in the literature.
Information modification in a network is tightly related to the concepts of redundancy and synergy between source systems sharing information about a target system, which refer to the existence of common information about the target that can be retrieved when the sources are used separately (redundancy) or when they are used jointly (synergy) [19]. Classical multivariate entropy-based approaches refer to the interaction information decomposition (IID), which reflects information modification through the balance between redundant and synergistic interaction among the different source systems influencing the target [20,21,22]. The IID framework has the drawback that it implicitly considers redundancy and synergy as mutually exclusive concepts, because it quantifies information modification with a single measure of interaction information [23] (also called co-information [24]) that takes positive or negative values depending on whether the net interaction between the sources is synergistic or redundant. This limitation has been overcome by the elegant mathematical framework introduced by Williams and Beer [25], who proposed the so-called partial information decomposition (PID) as a nonnegative decomposition of the information shared between a target and a set of sources into terms quantifying separately the unique, redundant and synergistic contributions. However, the PID framework has the drawback that its constituent terms cannot be obtained unequivocally from classic measures of information theory (i.e., entropy and mutual information): a new definition of either redundant, synergistic or unique information needs to be provided to implement the decomposition. Accordingly, much effort has focused on finding the most proper measures to define the components of the PID, with alternative proposals defining new measures of redundancy [25,26], synergy [27,28] or unique information [29]. The proliferation of different definitions is mainly due to the fact that there is no full consensus on which axioms should be stated to impose desirable properties for the PID measures. An additional problem, which has so far seriously limited the practical implementation of these concepts, is the difficulty of providing reliable estimates of the information measures appearing in the IID and PID decompositions. The naive estimation of probabilities by histogram-based methods followed by the use of plug-in estimators leads to serious bias problems [30,31]. While the use of binless density estimators [32] and the adoption of schemes for dimensionality reduction [33,34] have been shown to improve the reliability of estimates of information storage and transfer [35], the effectiveness of these approaches for the computation of measures of information modification has not been demonstrated yet. Interestingly, both the problem of defining appropriate PID measures and that of reliably estimating them from data are much alleviated if one assumes that the observed variables have a joint Gaussian distribution. Indeed, in such a case, recent studies have proven the equivalence of most of the redundancy measures proposed for the PID [36] and have provided closed-form solutions to the issue of computing any measure of information dynamics from the parameters of the vector autoregressive (VAR) model that characterizes an observed multivariate Gaussian process [17,37,38].
The second fundamental question addressed in this study concerns the computation of information dynamics for stochastic processes displaying multiscale dynamical structures. It is well known that many complex physical and biological systems exhibit peculiar oscillatory activities that are deployed across multiple temporal scales [39,40,41]. The most common way to investigate such activities is to resample the originally measured realization of an observed process at different scales, typically through low pass filtering and downsampling [42,43], so as to yield a set of rescaled time series that are then analyzed employing different dynamical measures. This approach is well established and widely used for the multiscale entropy analysis of individual time series measured from scalar stochastic processes. However, its extension to the investigation of the multiscale structure of the information transfer among coupled processes is complicated by theoretical and practical issues [44,45]. Theoretically, the procedure of rescaling alters the causal interactions between lagged components of the processes in a way that is not fully understood and, if not properly performed, may alter the temporal relations between processes and thus induce the spurious detection of information transfer. In practical analysis, filtering and downsampling are known to severely degrade the estimation of information dynamics and to substantially impact its detectability, accuracy and data demand [46,47].
In recent works, we have started tackling the above problems within the framework of linear VAR modeling of multivariate Gaussian processes, focusing on the multiscale computation of information storage and information transfer [48,49]. In this study, we aim at extending these recent theoretical advances to the multiscale analysis of information modification in multivariate Gaussian systems performed through the IID and PID decomposition frameworks. To this end, we exploit the theory of state space (SS) models [50] and build on recent theoretical results [44,45] to show that exact values of the interaction transfer, as well as of the redundant and synergistic transfer, can be obtained for coupled Gaussian processes observed at different time scales starting from the parameters of the VAR model that fits the processes and from the scale factor. The theoretical derivations are first used in examples of benchmark Gaussian systems, showing that these systems may generate patterns of information decomposition characterized by a prevalently redundant or synergistic information transfer persisting across multiple time scales, or even by an alternating prevalence of redundant and synergistic source interaction depending on the time scale. The high computational reliability of the SS approach is then exploited in the analysis of real data through the application to a topic of great interest in neuroscience, i.e., the detection of information transfer in epilepsy networks.
The proposed framework is implemented in the msID MATLAB® toolbox, which is uploaded as Supplementary Material to this article and is freely available for download from www.lucafaes.net/msID.html and https://github.com/danielemarinazzo/multiscalePID.

2. Information Transfer Decomposition in Multivariate Processes

Let us consider a discrete-time, stationary vector stochastic process composed of M real-valued zero-mean scalar processes, $Y_n = [Y_{1,n} \cdots Y_{M,n}]^T$, $-\infty < n < \infty$. In an information-theoretic framework, the information transfer between scalar sub-processes is quantified by the well-known transfer entropy (TE), which is a popular measure of the “information transfer” directed towards an assigned target process from one or more source processes. Specifically, the TE quantifies the amount of information that the past of the source provides about the present of the target over and above the information already provided by the past of the target itself [5]. Taking $Y_j$ as target and $Y_i$ as source, the TE is defined as:
$$T_{i \to j} = I(Y_{j,n}; Y_{i,n}^- \mid Y_{j,n}^-) \tag{1}$$
where $Y_{i,n}^- = [Y_{i,n-1}, Y_{i,n-2}, \ldots]$ and $Y_{j,n}^- = [Y_{j,n-1}, Y_{j,n-2}, \ldots]$ represent the past of the source and target processes and $I(\cdot\,; \cdot \mid \cdot)$ denotes conditional mutual information (MI). In the presence of two sources $Y_i$ and $Y_k$ and a target $Y_j$, the information transferred toward $Y_j$ from the sources $Y_i$ and $Y_k$ taken together is quantified by the joint TE:
$$T_{ik \to j} = I(Y_{j,n}; Y_{i,n}^-, Y_{k,n}^- \mid Y_{j,n}^-). \tag{2}$$
Under the premise that the information jointly transferred to the target by the two sources is different from the sum of the amounts of information transferred individually, in the following we present two possible strategies to decompose the joint TE (2) into amounts eliciting the individual TEs, as well as redundant and/or synergistic TE terms.

2.1. Interaction Information Decomposition

The first strategy, which we denote as interaction information decomposition (IID), decomposes the joint TE (2) as:
$$T_{ik \to j} = T_{i \to j} + T_{k \to j} + I_{ik \to j}, \tag{3}$$
where $I_{ik \to j}$ is denoted as the interaction transfer entropy (ITE), because it is equivalent to the interaction information [23] computed between the present of the target and the past of the two sources, conditioned on the past of the target:
$$I_{ik \to j} = I(Y_{j,n}; Y_{i,n}^-; Y_{k,n}^- \mid Y_{j,n}^-). \tag{4}$$
The interaction TE quantifies the modification of the information transferred from the source processes $Y_i$ and $Y_k$ to the target $Y_j$, being positive when $Y_i$ and $Y_k$ cooperate in a synergistic way and negative when they act redundantly. This interpretation is evident from the diagrams of Figure 1: in the case of synergy (Figure 1a), the two sources $Y_i$ and $Y_k$ taken together contribute to the target $Y_j$ with more information than the sum of their individual contributions ($T_{ik \to j} > T_{i \to j} + T_{k \to j}$), and the ITE is positive; in the case of redundancy (Figure 1b), the sum of the information amounts transferred individually from each source to the target is higher than the joint information transfer ($T_{i \to j} + T_{k \to j} > T_{ik \to j}$), so that the ITE is negative.

2.2. Partial Information Decomposition

An alternative expansion of the joint TE is provided by the so-called partial information decomposition (PID) [25]. The PID identifies four distinct quantities: the unique information transferred from each individual source to the target, measured by the unique TEs $U_{i \to j}$ and $U_{k \to j}$, and the redundant and synergistic information transferred from the two sources to the target, measured by the redundant TE $R_{ik \to j}$ and the synergistic TE $S_{ik \to j}$. These four measures are related to each other and to the joint and individual TEs by the following equations (see also Figure 1c):
$$T_{ik \to j} = U_{i \to j} + U_{k \to j} + R_{ik \to j} + S_{ik \to j},$$
$$T_{i \to j} = U_{i \to j} + R_{ik \to j}, \tag{5}$$
$$T_{k \to j} = U_{k \to j} + R_{ik \to j}.$$
In the PID defined above, the terms $U_{i \to j}$ and $U_{k \to j}$ quantify the parts of the information transferred to the target process $Y_j$ that are unique to the source processes $Y_i$ and $Y_k$, respectively, thus reflecting contributions to the predictability of the target that can be obtained from one of the sources alone, but not from the other source alone. Each of these unique contributions sums up with the redundant transfer $R_{ik \to j}$ to yield the information transfer from one source to the target as known from classic Shannon information theory. The term $S_{ik \to j}$ refers to the synergy between the two sources while they transfer information to the target, intended as the information that is obtained only by taking the two sources $Y_i$ and $Y_k$ together, and not by considering them alone. Compared with the IID defined in (3), the PID (5) has the advantage of providing distinct non-negative measures of redundancy and synergy, thereby accounting for the possibility that redundancy and synergy coexist as separate elements of information modification. Interestingly, the IID and PID defined in Equations (3) and (5) are related to each other in a way such that:
$$I_{ik \to j} = S_{ik \to j} - R_{ik \to j}, \tag{6}$$
thus showing that the interaction TE is actually a measure of the ‘net’ synergy manifested in the transfer of information from the two sources to the target.
An issue with the PID (5) is that its constituent measures cannot be obtained through classic information theory simply by subtracting conditional MI terms as done for the IID; an additional ingredient to the theory is needed to provide a fourth defining equation, to be added to (5), yielding an unambiguous definition of $U_{i \to j}$, $U_{k \to j}$, $R_{ik \to j}$ and $S_{ik \to j}$. While several PID definitions have been proposed, arising from different conceptual definitions of redundancy and synergy [26,27,29], here we make reference to the so-called minimum MI (MMI) PID [36]. According to the MMI PID, redundancy is defined as the minimum of the information provided by each individual source to the target. In terms of information transfer measured by the TE, this leads to the following definition of the redundant TE:
$$R_{ik \to j} = \min\{T_{i \to j}, T_{k \to j}\}. \tag{7}$$
This choice satisfies the desirable property that the redundant TE is independent of the correlation between the source processes. Moreover, it has been shown that, if the observed processes have a joint Gaussian distribution, all previously-proposed PID formulations reduce to the MMI PID [36].
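To make the MMI PID concrete, all of its terms follow from the three TE values alone. The following Python sketch (our illustration; the authors' msID toolbox referenced below is written in MATLAB) derives the unique, redundant and synergistic TEs from Equations (5)–(7):

```python
def mmi_pid(T_i, T_k, T_ik):
    """Minimum-MI PID of the joint TE into unique, redundant and
    synergistic components, per Eqs. (5)-(7)."""
    R = min(T_i, T_k)           # redundant TE, Eq. (7)
    U_i = T_i - R               # unique TE from source i, Eq. (5)
    U_k = T_k - R               # unique TE from source k, Eq. (5)
    S = T_ik - U_i - U_k - R    # synergistic TE, Eq. (5)
    return U_i, U_k, R, S

# Example with hypothetical values (nats): T_i = 0.3, T_k = 0.2, T_ik = 0.6
# gives R = 0.2, U_i = 0.1, U_k = 0.0, S = 0.3; the interaction TE is then
# I = S - R = 0.1 > 0, i.e., net synergy, consistent with Eq. (6).
```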

3. Multiscale Information Transfer Decomposition

3.1. Multiscale Representation of Multivariate Gaussian Processes

In the linear signal processing framework, the M-dimensional vector stochastic process $Y_n = [Y_{1,n} \cdots Y_{M,n}]^T$ is classically described using a vector autoregressive (VAR) model of order p:
$$Y_n = \sum_{k=1}^{p} A_k Y_{n-k} + U_n \tag{8}$$
where $A_k$ are $M \times M$ matrices of coefficients and $U_n = [U_{1,n} \cdots U_{M,n}]^T$ is a vector of M zero-mean Gaussian processes with covariance matrix $\Sigma \equiv E[U_n U_n^T]$ ($E$ is the expectation operator). To study the observed process $Y$ at the temporal scale identified by the scale factor $\tau$, we apply the following transformation to each constituent process $Y_m$, $m = 1, \ldots, M$:
$$\bar{Y}_{m,n} = \sum_{l=0}^{q} b_l Y_{m, n\tau - l}. \tag{9}$$
This rescaling operation corresponds to transforming the original process $Y$ through a two-step procedure consisting of the following filtering and downsampling steps, which yield respectively the processes $\tilde{Y}$ and $\bar{Y}$:
$$\tilde{Y}_n = \sum_{l=0}^{q} b_l Y_{n-l}, \tag{10a}$$
$$\bar{Y}_n = \tilde{Y}_{n\tau}, \quad n = 1, \ldots, N/\tau. \tag{10b}$$
The change of scale in (9) generalizes the averaging procedure originally proposed in [42], which sets $q = \tau - 1$ and $b_l = 1/\tau$ and thus realizes the filtering step through the simple averaging of $\tau$ subsequent samples. To improve the elimination of the fast temporal scales, in this study we follow the idea of [43], in which a low pass filter more appropriate than averaging is employed. Here, we identify the $b_l$ as the coefficients of a linear finite impulse response (FIR) low pass filter of order q; the FIR filter is designed using the classic window method with the Hamming window [51], setting the cutoff frequency at $f_\tau = 1/(2\tau)$ in order to avoid aliasing in the subsequent downsampling step. Substituting (8) in (10a), the filtering step leads to the process representation:
$$\tilde{Y}_n = \sum_{k=1}^{p} A_k \tilde{Y}_{n-k} + \sum_{l=0}^{q} B_l U_{n-l} \tag{11}$$
where $B_l = b_l I_M$ ($I_M$ being the $M \times M$ identity matrix). Hence, the change of scale introduces a moving average (MA) component of order q into the original VAR(p) process, transforming it into a VARMA(p, q) process. As we will show in the next section, the downsampling step (10b) keeps the process within the VARMA class while altering the model parameters.
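For illustration, the filtering and downsampling steps (10a) and (10b) can be applied to a data realization with standard tools. The following Python sketch is our assumption (the paper's framework handles rescaling analytically at the level of model parameters, and its toolbox is in MATLAB); it only clarifies what the transformation does:

```python
import numpy as np
from scipy.signal import firwin, lfilter

def rescale(Y, tau, q=12):
    """Filter and downsample a realization Y (N samples x M processes) at
    scale tau: Hamming-window FIR low pass of order q with cutoff
    f_tau = 1/(2*tau), Eq. (10a), followed by decimation by tau, Eq. (10b)."""
    if tau == 1:
        return Y
    # firwin expresses frequencies relative to Nyquist (= 0.5 cycles/sample),
    # so the normalized cutoff 1/tau corresponds to f_tau = 1/(2*tau)
    b = firwin(q + 1, 1.0 / tau, window='hamming')
    Y_filt = lfilter(b, 1.0, Y, axis=0)  # Eq. (10a)
    return Y_filt[::tau]                 # Eq. (10b), one sample every tau
```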

3.2. State Space Processes

3.2.1. Formulation of State Space Models

State space models are models that make use of state variables to describe a system by a set of first-order difference equations, rather than by one or more high-order difference equations [52,53]. The general linear state space (SS) model describing an observed vector process Y has the form:
$$X_{n+1} = A X_n + W_n \tag{12a}$$
$$Y_n = C X_n + V_n \tag{12b}$$
where the state Equation (12a) describes the update of the L-dimensional (unobserved) state process through the $L \times L$ matrix $A$, and the observation Equation (12b) describes the instantaneous mapping from the state to the observed process through the $M \times L$ matrix $C$. $W_n$ and $V_n$ are zero-mean white noise processes with covariances $Q \equiv E[W_n W_n^T]$ and $R \equiv E[V_n V_n^T]$ and cross-covariance $S \equiv E[W_n V_n^T]$. Thus, the parameters of the SS model (12) are $(A, C, Q, R, S)$.
Another possible SS representation is the one evidencing the innovations $E_n = Y_n - E[Y_n \mid Y_n^-]$, i.e., the residuals of the linear regression of $Y_n$ on its infinite past $Y_n^- = [Y_{n-1}^T\, Y_{n-2}^T \cdots]^T$ [53]. This new SS representation, usually referred to as the “innovations form” SS model (ISS), is characterized by the state process $Z_n = E[X_n \mid Y_n^-]$ and by the $L \times M$ Kalman gain matrix $K$:
$$Z_{n+1} = A Z_n + K E_n \tag{13a}$$
$$Y_n = C Z_n + E_n \tag{13b}$$
The parameters of the ISS model (13) are $(A, C, K, V)$, where $V$ is the covariance of the innovations, $V \equiv E[E_n E_n^T]$. Note that the ISS model (13) is a special case of (12) in which $W_n = K E_n$ and $V_n = E_n$, so that $Q = K V K^T$, $R = V$ and $S = K V$.
Given an SS model in the form (12), the corresponding ISS model (13) can be identified by solving a so-called discrete algebraic Riccati equation (DARE), formulated in terms of the state error variance matrix $P$ [45]:
$$P = A P A^T + Q - (A P C^T + S)(C P C^T + R)^{-1}(C P A^T + S^T) \tag{14}$$
Under some assumptions [45], the DARE (14) has a unique stabilizing solution, from which the Kalman gain and innovation covariance can be computed as:
$$V = C P C^T + R, \qquad K = (A P C^T + S) V^{-1}, \tag{15}$$
thus completing the transformation from the SS form to the ISS form.
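Numerically, the filtering-form DARE (14) can be solved with standard control-theoretic routines. The sketch below is our Python rendering (it assumes SciPy's solver and exploits the duality between the filtering and control forms of the Riccati equation, obtained by transposing A and C):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def ss_to_iss(A, C, Q, R, S):
    """Convert SS parameters (A, C, Q, R, S) to the ISS gain and innovation
    covariance (K, V) by solving the DARE (14) and applying Eq. (15)."""
    # solve_discrete_are solves the control-form DARE; passing A.T and C.T
    # maps it to the filtering form of Eq. (14)
    P = solve_discrete_are(A.T, C.T, Q, R, s=S)
    V = C @ P @ C.T + R                       # innovation covariance, Eq. (15)
    K = (A @ P @ C.T + S) @ np.linalg.inv(V)  # Kalman gain, Eq. (15)
    return K, V
```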

3.2.2. State Space Models of Filtered and Downsampled Linear Processes

Exploiting the close relation between VARMA models and SS models, we first show how to convert the VARMA model (11) into an ISS model in the form of (13) describing the filtered process $\tilde{Y}_n$. To do this, we exploit Aoki's method [50], defining the state process $\tilde{Z}_n = [\tilde{Y}_{n-1}^T \cdots \tilde{Y}_{n-p}^T\ U_{n-1}^T \cdots U_{n-q}^T]^T$ that, together with $\tilde{Y}_n$, obeys the state equations (13) with parameters $(\tilde{A}, \tilde{C}, \tilde{K}, \tilde{V})$, where:
$$\tilde{A} = \begin{bmatrix} A_1 & \cdots & A_{p-1} & A_p & B_1 & \cdots & B_{q-1} & B_q \\ I_M & \cdots & 0_M & 0_M & 0_M & \cdots & 0_M & 0_M \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0_M & \cdots & I_M & 0_M & 0_M & \cdots & 0_M & 0_M \\ 0_M & \cdots & 0_M & 0_M & 0_M & \cdots & 0_M & 0_M \\ 0_M & \cdots & 0_M & 0_M & I_M & \cdots & 0_M & 0_M \\ \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0_M & \cdots & 0_M & 0_M & 0_M & \cdots & I_M & 0_M \end{bmatrix}$$
$$\tilde{C} = \begin{bmatrix} A_1 & \cdots & A_p & B_1 & \cdots & B_q \end{bmatrix}$$
$$\tilde{K} = \begin{bmatrix} I_M & 0_{M \times M(p-1)} & B_0^{-T} & 0_{M \times M(q-1)} \end{bmatrix}^T$$
and $\tilde{V} = B_0 \Sigma B_0^T$, where $\tilde{V}$ is the covariance of the innovations $\tilde{E}_n = B_0 U_n$.
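A direct translation of this construction reads as follows (a sketch with our own naming; it assumes p ≥ 1, q ≥ 1 and the definitions above):

```python
import numpy as np

def filtered_iss(A_list, b, Sigma):
    """ISS parameters (At, Ct, Kt, Vt) of the filtered VARMA process (11),
    via Aoki's construction. A_list: [A_1, ..., A_p]; b: FIR coefficients
    b_0, ..., b_q; Sigma: innovation covariance of the original VAR."""
    p, M = len(A_list), A_list[0].shape[0]
    q = len(b) - 1
    L = M * (p + q)                              # state dimension
    Ct = np.hstack(list(A_list) + [b[l] * np.eye(M) for l in range(1, q + 1)])
    At = np.zeros((L, L))
    At[:M, :] = Ct                               # first block row: A_1..A_p B_1..B_q
    At[M:M * p, :M * (p - 1)] = np.eye(M * (p - 1))      # shift of past Y blocks
    At[M * (p + 1):, M * p:L - M] = np.eye(M * (q - 1))  # shift of past U blocks
    B0 = b[0] * np.eye(M)
    Kt = np.zeros((L, M))
    Kt[:M, :] = np.eye(M)                        # injects E_n into the Y_n block
    Kt[M * p:M * (p + 1), :] = np.linalg.inv(B0) # U_n = B_0^{-1} E_n block
    Vt = B0 @ Sigma @ B0.T                       # innovation covariance
    return At, Ct, Kt, Vt
```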
Now, we turn to show how the downsampled process $\bar{Y}_n$ can be represented through an ISS model derived directly from the ISS formulation of the filtered process $\tilde{Y}_n$. To this end, we exploit recent theoretical findings providing the state space form of downsampled signals (Theorem III in [45]). Accordingly, the SS representation of the process downsampled at scale $\tau$, $\bar{Y}_n = \tilde{Y}_{n\tau}$, has parameters $(\bar{A}, \bar{C}, \bar{Q}, \bar{R}, \bar{S})$, where $\bar{A} = \tilde{A}^\tau$, $\bar{C} = \tilde{C}$, $\bar{Q} = Q_\tau$, $\bar{R} = \tilde{V}$ and $\bar{S} = S_\tau$, with $Q_\tau$ and $S_\tau$ given by:
$$S_\tau = \tilde{A}^{\tau-1} \tilde{K} \tilde{V}, \qquad Q_\tau = \begin{cases} \tilde{A}\, Q_{\tau-1} \tilde{A}^T + \tilde{K} \tilde{V} \tilde{K}^T, & \tau \geq 2 \\ \tilde{K} \tilde{V} \tilde{K}^T, & \tau = 1. \end{cases} \tag{16}$$
Therefore, the downsampled process has an ISS representation with state process $\bar{Z}_n = \tilde{Z}_{n\tau}$, innovation process $\bar{E}_n = \tilde{E}_{n\tau}$ and parameters $(\bar{A}, \bar{C}, \bar{K}, \bar{V})$, where $\bar{K}$ and $\bar{V}$ are obtained by solving the DARE (14) and (15) for the SS model with parameters $(\bar{A}, \bar{C}, \bar{Q}, \bar{R}, \bar{S})$.
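Combining the recursion (16) with the `ss_to_iss` helper sketched in Section 3.2.1 gives the complete scale-$\tau$ conversion (again our naming):

```python
import numpy as np

def downsample_iss(At, Ct, Kt, Vt, tau):
    """ISS parameters of the process downsampled at scale tau, following
    Theorem III in [45]: build (A_bar, C_bar, Q_bar, R_bar, S_bar) from the
    filtered-process ISS parameters, then re-solve the DARE."""
    KVK = Kt @ Vt @ Kt.T
    Q = KVK.copy()                       # Q_1, Eq. (16)
    for _ in range(2, tau + 1):          # recursion Q_t = At Q_{t-1} At^T + KVK
        Q = At @ Q @ At.T + KVK
    S = np.linalg.matrix_power(At, tau - 1) @ Kt @ Vt   # S_tau, Eq. (16)
    A = np.linalg.matrix_power(At, tau)                 # A_bar = At^tau
    K, V = ss_to_iss(A, Ct, Q, Vt, S)    # R_bar = Vt; DARE (14) and Eq. (15)
    return A, Ct, K, V
```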
To sum up, the relations and parametric representations of the original process $Y$, the filtered process $\tilde{Y}$ and the downsampled process $\bar{Y}$ are depicted in Figure 2a. The step of low pass filtering (FLT) applied to a VAR(p) process yields a VARMA(p, q) process (where q is the filter order and the cutoff frequency is $f_\tau = 1/(2\tau)$); this process is equivalent to an ISS process [50]. The subsequent downsampling (DWS) yields a different SS process, which in turn can be converted to the ISS form by solving the DARE. Thus, both the filtered process $\tilde{Y}_n$ and the downsampled process $\bar{Y}_n$ can be represented as ISS processes with parameters $(\tilde{A}, \tilde{C}, \tilde{K}, \tilde{V})$ and $(\bar{A}, \bar{C}, \bar{K}, \bar{V})$, which can be derived analytically from the knowledge of the parameters of the original process $(A_1, \ldots, A_p, \Sigma)$ and of the filter $(q, f_\tau)$. In the next section, we show how to compute analytically any measure appearing in the information decomposition of a jointly Gaussian multivariate stochastic process starting from its associated ISS model parameters, thus opening the way to the analytical computation of these measures for multiscale (filtered and downsampled) processes.

3.3. Multiscale IID and PID

After introducing the general theory of information decomposition and deriving the multiscale representation of the parameters of a linear VAR model, in this section we provide expressions for the terms of the IID and PID decompositions of the information transfer valid for multivariate jointly Gaussian processes. The derivations are based on the knowledge that the linear parametric representation of Gaussian processes given in (8) captures all of the entropy differences that define the various information measures [37], and that these entropy differences are related to the partial variances of the present of the target given its past and the past of one or more sources, intended as the variances of the prediction errors resulting from linear regression [15,17]. Specifically, let us denote as $E_{j|j,n} = Y_{j,n} - E[Y_{j,n} \mid Y_{j,n}^-]$ and $E_{j|ij,n} = Y_{j,n} - E[Y_{j,n} \mid Y_{i,n}^-, Y_{j,n}^-]$ the prediction errors of a linear regression of $Y_{j,n}$ performed respectively on $Y_{j,n}^-$ and on $(Y_{j,n}^-, Y_{i,n}^-)$, and as $\lambda_{j|j} = E[E_{j|j,n}^2]$ and $\lambda_{j|ij} = E[E_{j|ij,n}^2]$ the corresponding prediction error variances. Then, the TE from $Y_i$ to $Y_j$ can be expressed as:
$$T_{i \to j} = \frac{1}{2} \ln \frac{\lambda_{j|j}}{\lambda_{j|ij}}. \tag{17}$$
In a similar way, the joint TE from $(Y_i, Y_k)$ to $Y_j$ can be expressed as:
$$T_{ik \to j} = \frac{1}{2} \ln \frac{\lambda_{j|j}}{\lambda_{j|ijk}}, \tag{18}$$
where $\lambda_{j|ijk} = E[E_{j|ijk,n}^2]$ is the variance of the prediction error of a linear regression of $Y_{j,n}$ on $(Y_{j,n}^-, Y_{i,n}^-, Y_{k,n}^-)$, $E_{j|ijk,n} = Y_{j,n} - E[Y_{j,n} \mid Y_{i,n}^-, Y_{j,n}^-, Y_{k,n}^-]$. Based on these derivations, one can easily complete the IID decomposition by computing $T_{k \to j}$ as in (17) and deriving the interaction TE from (3), as well as the PID decomposition by deriving the redundant TE from (7), the synergistic TE from (6) and the unique TEs from (5).
Next, we show how to compute any partial variance from the parameters of an ISS model in the form of (13) [44,45]. The partial variance $\lambda_{j|a}$, where the subscript $a$ denotes any combination of indexes in $\{1, \ldots, M\}$, can be derived from the ISS representation of the innovations of a submodel obtained by removing from the observation equation the variables not indexed by $a$. Specifically, we need to consider the submodel composed of the state Equation (13a) and the observation equation:
$$Y_n^{(a)} = C^{(a)} Z_n + E_n^{(a)}, \tag{19}$$
where the superscript $(a)$ denotes the selection of the rows with indices $a$ of a vector or a matrix. It is important to note that the submodel formed by (13a) and (19) is not in innovations form, but is rather an SS model with parameters $(A, C^{(a)}, K V K^T, V^{(a,a)}, K V^{(:,a)})$. This SS model can be converted to an ISS model with innovation covariance $V^{(a)}$ by solving the DARE (14) and (15), so that the partial variance $\lambda_{j|a}$ is derived as the diagonal element of $V^{(a)}$ corresponding to the position of the target $Y_j$. Thus, with this procedure, it is possible to compute the partial variances needed for the computation of the information measures starting from a set of ISS model parameters; since any VAR process can be represented at scale $\tau$ as an ISS process, the procedure allows computing the IID and PID information decompositions for the rescaled multivariate process (see Figure 2).
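The following sketch (again our Python rendering, reusing the `ss_to_iss` helper above) computes a partial variance and shows how a TE at the current scale follows from two partial variances via Equation (17):

```python
import numpy as np

def partial_variance(A, C, K, V, a, j):
    """Partial variance lambda_{j|a} from ISS parameters (A, C, K, V).
    a: sorted list of observed indices kept in the submodel (must include j).
    The submodel (13a) + (19) is an SS model; re-solving the DARE yields its
    innovation covariance V^(a), whose diagonal entry for Y_j is lambda_{j|a}."""
    Ca = C[a, :]
    K_, V_ = ss_to_iss(A, Ca, K @ V @ K.T, V[np.ix_(a, a)], K @ V[:, a])
    return V_[a.index(j), a.index(j)]

# Example (0-based indices): TE from Y_i to Y_j at the current scale, Eq. (17)
# T_ij = 0.5 * np.log(partial_variance(A, C, K, V, [j], j)
#                     / partial_variance(A, C, K, V, sorted([i, j]), j))
```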
It is worth remarking that, while the general formulation of IID and PID decompositions introduced in Section 2 holds for arbitrary processes, the multiscale extension detailed in Section 3 is exact only if the processes have a joint Gaussian distribution. In such a case, the linear VAR representation captures exhaustively the joint variability of the processes, and any nonlinear extension has no additional utility (a formal proof of the fact that a stationary Gaussian VAR process must be linear can be found in [37]). If, on the contrary, non-Gaussian processes are under scrutiny, the linear representation provided in Section 3.1 can still be adopted, but may miss important properties in the dynamics and thus provide only a partial description. Moreover, since the close correspondence between conditional entropies and partial variances reported in this subsection does not hold anymore for non-Gaussian processes, all of the obtained measures should be regarded as indexes of (linear) predictability rather than as information measures.
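Putting the pieces together, the whole pipeline of Figure 2 reduces to a few calls chaining the helpers sketched above (valid under the Gaussian assumption just discussed; function names are ours):

```python
import numpy as np

def multiscale_pid(A_list, Sigma, b, tau, i, k, j):
    """End-to-end sketch: from VAR parameters to the scale-tau MMI PID of
    the information transferred from sources (i, k) to target j (0-based)."""
    At, Ct, Kt, Vt = filtered_iss(A_list, b, Sigma)   # filtering step
    A, C, K, V = downsample_iss(At, Ct, Kt, Vt, tau)  # downsampling step
    lam = lambda a: partial_variance(A, C, K, V, sorted(a), j)
    T_i = 0.5 * np.log(lam([j]) / lam([i, j]))        # Eq. (17)
    T_k = 0.5 * np.log(lam([j]) / lam([k, j]))
    T_ik = 0.5 * np.log(lam([j]) / lam([i, j, k]))    # Eq. (18)
    return mmi_pid(T_i, T_k, T_ik)                    # Eqs. (5)-(7)
```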

4. Simulation Experiment

To study the multiscale patterns of information transfer in a controlled setting with known dynamical interactions between time series, we consider a simulation scheme similar to those already used for the assessment of theoretical values of information dynamics [15,17]. Specifically, we analyze the following four-variate ($M = 4$) VAR process:
$$Y_{1,n} = 2\rho_1 \cos(2\pi f_1)\, Y_{1,n-1} - \rho_1^2\, Y_{1,n-2} + U_{1,n}, \tag{20a}$$
$$Y_{2,n} = 2\rho_2 \cos(2\pi f_2)\, Y_{2,n-1} - \rho_2^2\, Y_{2,n-2} + c\, Y_{1,n-1} + U_{2,n}, \tag{20b}$$
$$Y_{3,n} = 2\rho_3 \cos(2\pi f_3)\, Y_{3,n-1} - \rho_3^2\, Y_{3,n-2} + c\, Y_{1,n-1} + U_{3,n}, \tag{20c}$$
$$Y_{4,n} = b\, Y_{2,n-1} + (1 - b)\, Y_{3,n-1} + U_{4,n}, \tag{20d}$$
where $U_n = [U_{1,n} \cdots U_{4,n}]^T$ is a vector of zero-mean white Gaussian noises with unit variance and uncorrelated with each other ($\Sigma = I$). The parameter design in Equation (20) is chosen to allow autonomous oscillations in the processes $Y_i$, $i = 1, \ldots, 3$, obtained by placing complex-conjugate poles with modulus $\rho_i$ and frequency $f_i$ in the complex plane representation of the transfer function of the vector process, as well as causal interactions between the processes at a fixed time lag of one sample and with strength modulated by the parameters b and c (see Figure 3). In this study, we set the coefficients related to self-dependencies to values generating well-defined oscillations in all processes ($\rho_1 = \rho_2 = \rho_3 = 0.95$), letting $Y_1$ fluctuate at faster time scales than $Y_2$ and $Y_3$ ($f_1 = 0.1$, $f_2 = f_3 = 0.025$). We consider four configurations of the parameters, chosen to reproduce paradigmatic conditions of interaction between the processes (a sketch for generating realizations of this system is given after the list):
(a) isolation of $Y_1$ and $Y_2$ and unidirectional coupling $Y_3 \to Y_4$, obtained setting $b = c = 0$;
(b) common driver effects $Y_2 \leftarrow Y_1 \to Y_3$ and unidirectional coupling $Y_3 \to Y_4$, obtained setting $b = 0$ and $c = 1$;
(c) isolation of $Y_1$ and unidirectional couplings $Y_2 \to Y_4$ and $Y_3 \to Y_4$, obtained setting $b = 0.5$ and $c = 0$;
(d) common driver effects $Y_2 \leftarrow Y_1 \to Y_3$ and unidirectional couplings $Y_2 \to Y_4$ and $Y_3 \to Y_4$, obtained setting $b = 0.5$ and $c = 1$.
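Although all values reported below are computed exactly from the VAR parameters rather than estimated from data, a realization of system (20) is easily generated for visual inspection. A minimal sketch, with our own defaults:

```python
import numpy as np

def simulate_var(N=4096, rho=0.95, f1=0.1, f23=0.025, b=0.5, c=1.0, seed=0):
    """Generate one realization of the four-variate VAR of Eq. (20)."""
    rng = np.random.default_rng(seed)
    Y = np.zeros((N, 4))
    U = rng.standard_normal((N, 4))          # unit-variance white noises, Sigma = I
    a1 = 2 * rho * np.cos(2 * np.pi * f1)    # self-oscillation coefficients
    a23 = 2 * rho * np.cos(2 * np.pi * f23)
    for n in range(2, N):
        Y[n, 0] = a1 * Y[n-1, 0] - rho**2 * Y[n-2, 0] + U[n, 0]
        Y[n, 1] = a23 * Y[n-1, 1] - rho**2 * Y[n-2, 1] + c * Y[n-1, 0] + U[n, 1]
        Y[n, 2] = a23 * Y[n-1, 2] - rho**2 * Y[n-2, 2] + c * Y[n-1, 0] + U[n, 2]
        Y[n, 3] = b * Y[n-1, 1] + (1 - b) * Y[n-1, 2] + U[n, 3]
    return Y
```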
With this simulation setting, we compute all measures appearing in the IID and PID decompositions of the information transfer, considering $Y_4$ as the target process and $Y_2$ and $Y_3$ as the source processes. The theoretical values of these measures, computed as a function of the time scale using the IID and the PID, are reported in Figure 4. In the simple case of unidirectional coupling $Y_3 \to Y_4$ ($b = c = 0$, Figure 4a), the joint information transferred from $(Y_2, Y_3)$ to $Y_4$ is exclusively due to the source $Y_3$, without contributions from $Y_2$ and without interaction effects between the sources ($T_{23 \to 4} = T_{3 \to 4} = U_{3 \to 4}$; $T_{2 \to 4} = U_{2 \to 4} = 0$; $I_{23 \to 4} = S_{23 \to 4} = R_{23 \to 4} = 0$).
When the causal interactions towards $Y_4$ are still due exclusively to $Y_3$, but the two sources $Y_2$ and $Y_3$ share information arriving from $Y_1$ ($b = 0$, $c = 1$; Figure 4b), the IID evidences that the joint information transfer coincides again with the transfer from $Y_3$ ($T_{23 \to 4} = T_{3 \to 4}$), but a non-trivial amount of information transferred from $Y_2$ to $Y_4$ emerges, which is fully redundant ($T_{2 \to 4} = -I_{23 \to 4}$). The PID highlights that the information from $Y_3$ to $Y_4$ is not all unique, but is in part transferred redundantly with $Y_2$, while the unique transfer from $Y_2$ and the synergistic transfer are negligible.
In the case of two isolated sources equally contributing to the target ($b = 0.5$, $c = 0$, Figure 4c), the IID evidences the presence of net synergy and of identical amounts of information transferred to $Y_4$ from $Y_2$ or $Y_3$ ($I_{23 \to 4} > 0$, $T_{2 \to 4} = T_{3 \to 4}$). The PID documents that there are no unique contributions, so that the two amounts of information transfer from each source to the target coincide with the redundant transfer, and the remaining part of the joint transfer is synergistic ($U_{2 \to 4} = U_{3 \to 4} = 0$, $T_{2 \to 4} = T_{3 \to 4} = R_{23 \to 4}$, $S_{23 \to 4} = T_{23 \to 4} - R_{23 \to 4}$).
Finally, when the two sources share common information and contribute equally to the target ($b = 0.5$, $c = 1$; Figure 4d), we find that they send the same amount of information as before, but in this case no unique information is sent by any of the sources ($T_{2 \to 4} = T_{3 \to 4}$, $U_{2 \to 4} = U_{3 \to 4} = 0$). Moreover, the nature of the interaction between the sources is non-trivial and scale dependent: at low time scales, where the dynamics are likely dominated by the fast oscillations of $Y_1$, the IID reveals net redundancy, and the PID shows that the redundant transfer prevails over the synergistic one ($I_{23 \to 4} < 0$, $R_{23 \to 4} > S_{23 \to 4}$); at higher time scales, where the fast dynamics are filtered out and the slow dynamics of $Y_2$ and $Y_3$ prevail, the IID reveals net synergy, and the PID shows that the synergistic transfer prevails over the redundant one ($I_{23 \to 4} > 0$, $S_{23 \to 4} > R_{23 \to 4}$).

5. Application

As a real data application, we analyze intracranial EEG recordings from a patient with drug-resistant epilepsy, measured by an implanted $8 \times 8$ array of cortical electrodes and two left hippocampal depth electrodes with six contacts each. The data are available in [54], and further details on the dataset are given in [55]. Data were sampled at 400 Hz and correspond to 10-s segments recorded in the pre-ictal period, just before the seizure onset, and to 10-s segments recorded during the ictal stage of the seizure, for a total of eight seizures. Defining and locating the seizure onset zone, i.e., the specific location in the brain where the synchronous activity of neighboring groups of cells becomes so strong as to be able to spread its own activity to other distant regions, is an important issue in the study of epilepsy in humans. Here, we focus on the information flow from the sub-cortical regions, probed by the depth electrodes, to the brain cortex. In [21], it has been suggested that Contacts 11 and 12, in the second depth electrode, mostly influence the cortical activity; accordingly, in this work, we consider Channels 11 and 12 as a pair of source variables for all of the cortical electrodes and decompose the information flowing from them using the multiscale IID and PID proposed here, both in the pre-ictal stage and in the ictal stage. An FIR filter with $q = 12$ coefficients is used, and the order p of the VAR model is fixed according to the Bayesian information criterion. In the analyzed dataset, the model order assessed in the pre-ictal phase was $p = 14.61 \pm 1.07$ (mean $\pm$ std. dev. across 64 electrodes and eight seizures) and decreased significantly during the ictal phase to $p = 11.09 \pm 3.95$.
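As a sketch of the order-selection step (our simplified Python rendering; the paper does not specify the estimator beyond the use of the Bayesian information criterion), the order minimizing the BIC under ordinary least squares fitting can be found as follows:

```python
import numpy as np

def var_order_bic(Y, p_max=30):
    """Select the VAR order by the Bayesian information criterion.
    Y: (N, M) data matrix; returns the order p minimizing the BIC."""
    N, M = Y.shape
    best_p, best_bic = 1, np.inf
    for p in range(1, p_max + 1):
        Z = np.hstack([Y[p - l:N - l] for l in range(1, p + 1)])  # lagged regressors
        X = Y[p:]
        B, *_ = np.linalg.lstsq(Z, X, rcond=None)                 # OLS fit
        E = X - Z @ B                                             # residuals
        Sigma = (E.T @ E) / (N - p)
        bic = np.log(np.linalg.det(Sigma)) + np.log(N - p) * p * M * M / (N - p)
        if bic < best_bic:
            best_p, best_bic = p, bic
    return best_p
```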
In Figure 5, we depict the terms of the IID applied from the two sources (Channels $\{11, 12\}$) to each of the cortical electrodes as a function of the scale $\tau$, averaged over the eight seizures. We observe a relevant enhancement of the joint TE during the seizure with respect to the pre-ictal period. This enhancement is determined by a marked increase of both the individual TEs from Channels 11 and 12 to all of the cortical electrodes; the patterns of the two TEs are similar to each other in both stages. The pattern of interaction information transfer displays prevalent redundant transfer for low values of $\tau$ and prevalent synergistic transfer for high $\tau$, but the values of the interaction TE have relatively low magnitude and are only slightly different in pre-ictal and ictal conditions. It is worth stressing that at scale $\tau$, the algorithm analyzes oscillations in the time series slower than $f_s/(2\tau)$ Hz, where $f_s = 400$ Hz is the sampling frequency.
In Figure 6, we depict, on the other hand, the terms of the PID computed for the same data. This decomposition shows that the increased joint TE across the seizure transition seen in Figure 5a is in large part the result of an increase of both the synergistic and the redundant TE, which are markedly higher during the ictal stage compared with the pre-ictal stage. This explains why the interaction TE of Figure 5d, which is the difference between two quantities that both increase, is nearly constant moving from the pre-ictal to the ictal stage. The quantity that, instead, clearly differentiates between Channels 11 and 12 is the unique information transfer: indeed, only the unique TE from Channel 12 increases in the ictal stage, while the unique TE from Channel 11 remains at low levels.
In order to investigate the variability across trials of the estimates of the various information measures, in Figure 7, we depict the terms of both IID and PID expressed for each ictal episode as average values over all 64 cortical electrodes. The analysis shows that the higher average values observed in Figure 5 and Figure 6 at Scales 1–4 during the ictal state for the joint TE, the two individual TEs, the redundant and synergistic TEs and the unique TE from depth Channel 12 are the result of an increase of the measures for almost all of the observed seizure episodes.
These findings are largely in agreement with the increasing awareness that epilepsy is a network phenomenon that involves aberrant functional connections across vast parts of the brain on virtually all spatial scales [56,57]. Indeed, our results document that the occurrence of seizures is associated with a relevant increase of the information flowing from the subcortical regions (associated with the depth electrode) to the cortex and that the character of this information flow is mostly redundant both in the pre-ictal and in the ictal state. Here, the need for a multiscale approach is attested by the fact that several quantities in the ictal state (e.g., the joint TE, the synergistic information transfer and the unique information transfer from Channel 12) attain their maximum at scales $\tau > 1$.
Moreover, the approaches that we propose for information decomposition appear useful to improve the localization of epileptogenic areas in patients with drug-resistant epilepsy. Indeed, our analysis suggests that Contact 12 is the closest to the seizure onset zone and that it drives the cortical oscillations during the ictal stage, as it sends unique information to the cortex. On the other hand, disentangling this effect required including Channel 11 in the analysis and performing the PID of the total information flowing from the pair of depth channels to the cortex; indeed, the redundancy between Channels 11 and 12 confounds the informational pattern unless the PID is performed.

6. Conclusions

Understanding how multiple inputs combine to create the output of a given target is a fundamental challenge in many fields, in particular in neuroscience. Shannon's information theory is the most suitable framework to cope with this problem and thus to assess the informational character of multiplets of variables describing complex systems; the IID indeed measures the balance between redundant and synergistic interaction within the classical multivariate entropy-based approach. Recently, Shannon's information theory has been extended, in the PID, so as to provide specific measures of the information that several variables convey individually (unique information), redundantly (shared information) or only jointly (synergistic information) about the output.
The contribution of the present work is the proposal of an analytical framework where both the IID and the PID can be evaluated exactly, in a multiscale fashion, for multivariate Gaussian processes, on the basis of simple vector autoregressive identification. In doing so, our work opens the way for both the theoretical analysis and the practical implementation of information modification in processes that exhibit multiscale dynamical structures. The effectiveness of the proposed approach has been demonstrated both on simulated examples and on real, publicly-available intracranial EEG data. Our results provide firm ground for the multiscale evaluation of the PID, to be applied in all contexts where causal influences coexist at multiple temporal scales.
Future developments of this work include the refinement of the SS model structure to accommodate the description of long-range linear correlations [58], or its expansion to the description of nonstationary processes [59], and the formalization of the exact cross-scale computation of information decomposition within and between multivariate processes [60]. A major challenge in the field remains the generalization of this type of analysis to non-Gaussian processes, for which exact analytical solutions or computationally-reliable estimation approaches are still lacking. This constitutes a main direction for further research, because real-world processes very often display non-Gaussian distributions, which would make an extension to nonlinear models or model-free approaches beneficial. The questions that are still open in this respect include the evaluation of proper theoretical definitions of synergy and redundancy for nonlinear processes [25,26,27,28,29], the development of reliable entropy estimators for multivariate variables with different dimensions [6,35,61] and the assessment of the extent to which nonlinear model-free methods really outperform the linear model-based approach adopted here and in previous investigations [62].

Supplementary Materials

The Supplementary Material to this article is freely available for download from www.lucafaes.net/msID.html and https://github.com/danielemarinazzo/multiscalePID.

Acknowledgments

The study was supported in part by the IRCS-Healthcare Research Implementation Program, Autonomous Province of Trento.

Author Contributions

Luca Faes, Daniele Marinazzo and Sebastiano Stramaglia conceived of the study, participated in the critical discussion of all aspects and contributed to writing the paper. Luca Faes designed the theoretical framework, realized the MATLAB codes and performed the simulation study. Sebastiano Stramaglia analyzed the experimental data. All authors have approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Lizier, J.T.; Prokopenko, M.; Zomaya, A.Y. A framework for the local information dynamics of distributed computation in complex systems. In Guided Self-Organization: Inception; Springer: Berlin/Heidelberg, Germany, 2014; pp. 115–158.
2. Pincus, S. Approximate entropy (ApEn) as a complexity measure. Chaos Interdiscip. J. Nonlinear Sci. 1995, 5, 110–117.
3. Lizier, J.T.; Prokopenko, M.; Zomaya, A.Y. Local measures of information storage in complex distributed computation. Inf. Sci. 2012, 208, 39–54.
4. Wibral, M.; Lizier, J.T.; Vögler, S.; Priesemann, V.; Galuske, R. Local active information storage as a tool to understand distributed neural information processing. Front. Neuroinform. 2014, 8.
5. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 2000, 85, 461.
6. Wibral, M.; Vicente, R.; Lizier, J.T. Directed Information Measures in Neuroscience; Springer: Berlin/Heidelberg, Germany, 2014.
7. Lizier, J.T.; Prokopenko, M.; Zomaya, A.Y. Information modification and particle collisions in distributed computation. Chaos Interdiscip. J. Nonlinear Sci. 2010, 20, 037109.
8. Wibral, M.; Lizier, J.T.; Priesemann, V. Bits from brains for biologically inspired computing. Front. Robot. Artif. Intell. 2015, 2.
9. Lizier, J.T.; Pritam, S.; Prokopenko, M. Information dynamics in small-world Boolean networks. Artif. Life 2011, 17, 293–314.
10. Wibral, M.; Rahm, B.; Rieder, M.; Lindner, M.; Vicente, R.; Kaiser, J. Transfer entropy in magnetoencephalographic data: Quantifying information flow in cortical and cerebellar networks. Prog. Biophys. Mol. Biol. 2011, 105, 80–97.
11. Hlinka, J.; Hartman, D.; Vejmelka, M.; Runge, J.; Marwan, N.; Kurths, J.; Paluš, M. Reliability of inference of directed climate networks using conditional mutual information. Entropy 2013, 15, 2023–2045.
12. Barnett, L.; Lizier, J.T.; Harré, M.; Seth, A.K.; Bossomaier, T. Information flow in a kinetic Ising model peaks in the disordered phase. Phys. Rev. Lett. 2013, 111, 177203.
13. Marinazzo, D.; Pellicoro, M.; Wu, G.; Angelini, L.; Cortés, J.M.; Stramaglia, S. Information transfer and criticality in the Ising model on the human connectome. PLoS ONE 2014, 9, e93616.
14. Faes, L.; Nollo, G.; Jurysta, F.; Marinazzo, D. Information dynamics of brain–heart physiological networks during sleep. New J. Phys. 2014, 16, 105005.
15. Faes, L.; Porta, A.; Nollo, G. Information decomposition in bivariate systems: Theory and application to cardiorespiratory dynamics. Entropy 2015, 17, 277–303.
16. Porta, A.; Faes, L.; Nollo, G.; Bari, V.; Marchi, A.; de Maria, B.; Takahashi, A.C.; Catai, A.M. Conditional self-entropy and conditional joint transfer entropy in heart period variability during graded postural challenge. PLoS ONE 2015, 10, e0132851.
17. Faes, L.; Porta, A.; Nollo, G.; Javorka, M. Information decomposition in multivariate systems: Definitions, implementation and application to cardiovascular networks. Entropy 2017, 19, 5.
18. Wollstadt, P.; Sellers, K.K.; Rudelt, L.; Priesemann, V.; Hutt, A.; Fröhlich, F.; Wibral, M. Breakdown of local information processing may underlie isoflurane anesthesia effects. PLoS Comput. Biol. 2017, 13, e1005511.
19. Schneidman, E.; Bialek, W.; Berry, M.J. Synergy, redundancy, and independence in population codes. J. Neurosci. 2003, 23, 11539–11553.
20. Stramaglia, S.; Wu, G.R.; Pellicoro, M.; Marinazzo, D. Expanding the transfer entropy to identify information circuits in complex systems. Phys. Rev. E 2012, 86, 066211.
21. Stramaglia, S.; Cortes, J.M.; Marinazzo, D. Synergy and redundancy in the Granger causal analysis of dynamical networks. New J. Phys. 2014, 16, 105003.
22. Stramaglia, S.; Angelini, L.; Wu, G.; Cortes, J.; Faes, L.; Marinazzo, D. Synergetic and redundant information flow detected by unnormalized Granger causality: Application to resting state fMRI. IEEE Trans. Biomed. Eng. 2016, 63, 2518–2524.
23. McGill, W. Multivariate information transmission. Trans. IRE Prof. Group Inf. Theory 1954, 4, 93–111.
24. Bell, A.J. The co-information lattice. In Proceedings of the Fourth International Symposium on Independent Component Analysis and Blind Signal Separation (ICA), Nara, Japan, 1–4 April 2003.
25. Williams, P.L.; Beer, R.D. Nonnegative decomposition of multivariate information. arXiv 2010, arXiv:1004.2515.
26. Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E 2013, 87, 012130.
27. Griffith, V.; Chong, E.K.; James, R.G.; Ellison, C.J.; Crutchfield, J.P. Intersection information based on common randomness. Entropy 2014, 16, 1985–2000.
28. Quax, R.; Har-Shemesh, O.; Sloot, P.M.A. Quantifying synergistic information using intermediate stochastic variables. Entropy 2017, 19, 85.
29. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. Entropy 2014, 16, 2161–2183.
30. Panzeri, S.; Senatore, R.; Montemurro, M.A.; Petersen, R.S. Correcting for the sampling bias problem in spike train information measures. J. Neurophysiol. 2007, 98, 1064–1072.
31. Faes, L.; Porta, A. Conditional entropy-based evaluation of information dynamics in physiological systems. In Directed Information Measures in Neuroscience; Springer: Berlin/Heidelberg, Germany, 2014; pp. 61–86.
32. Kozachenko, L.; Leonenko, N.N. Sample estimate of the entropy of a random vector. Probl. Peredachi Inf. 1987, 23, 9–16.
33. Vlachos, I.; Kugiumtzis, D. Nonuniform state-space reconstruction and coupling detection. Phys. Rev. E 2010, 82, 016207.
34. Marinazzo, D.; Pellicoro, M.; Stramaglia, S. Causal information approach to partial conditioning in multivariate data sets. Comput. Math. Methods Med. 2012, 2012, 303601.
35. Faes, L.; Kugiumtzis, D.; Nollo, G.; Jurysta, F.; Marinazzo, D. Estimating the decomposition of predictive information in multivariate systems. Phys. Rev. E 2015, 91, 032904.
36. Barrett, A.B. Exploration of synergistic and redundant information sharing in static and dynamical Gaussian systems. Phys. Rev. E 2015, 91, 052802.
37. Barrett, A.B.; Barnett, L.; Seth, A.K. Multivariate Granger causality and generalized variance. Phys. Rev. E 2010, 81, 041907.
38. Porta, A.; Bari, V.; de Maria, B.; Takahashi, A.C.; Guzzetti, S.; Colombo, R.; Catai, A.M.; Raimondi, F.; Faes, L. Quantifying net synergy/redundancy of spontaneous variability regulation via predictability and transfer entropy decomposition frameworks. IEEE Trans. Biomed. Eng. 2017.
39. Ivanov, P.; Nunes Amaral, L.; Goldberger, A.; Havlin, S.; Rosenblum, M.; Struzik, Z.; Stanley, H. Multifractality in human heartbeat dynamics. Nature 1999, 399, 461–465.
40. Chou, C.M. Wavelet-based multi-scale entropy analysis of complex rainfall time series. Entropy 2011, 13, 241–253.
41. Wang, J.; Shang, P.; Zhao, X.; Xia, J. Multiscale entropy analysis of traffic time series. Int. J. Mod. Phys. C 2013, 24, 1350006.
42. Costa, M.; Goldberger, A.L.; Peng, C.K. Multiscale entropy analysis of complex physiologic time series. Phys. Rev. Lett. 2002, 89, 068102.
43. Valencia, J.; Porta, A.; Vallverdú, M.; Clariá, F.; Baranowski, R.; Orłowska-Baranowska, E.; Caminal, P. Refined multiscale entropy: Application to 24-h Holter recordings of heart period variability in healthy and aortic stenosis subjects. IEEE Trans. Biomed. Eng. 2009, 56, 2202–2213.
44. Barnett, L.; Seth, A.K. Granger causality for state-space models. Phys. Rev. E 2015, 91, 040101.
45. Solo, V. State-space analysis of Granger-Geweke causality measures with application to fMRI. Neural Comput. 2016, 28, 914–949.
46. Florin, E.; Gross, J.; Pfeifer, J.; Fink, G.R.; Timmermann, L. The effect of filtering on Granger causality based multivariate causality measures. Neuroimage 2010, 50, 577–588.
47. Barnett, L.; Seth, A.K. Detectability of Granger causality for subsampled continuous-time neurophysiological processes. J. Neurosci. Methods 2017, 275, 93–121.
48. Faes, L.; Montalto, A.; Stramaglia, S.; Nollo, G.; Marinazzo, D. Multiscale analysis of information dynamics for linear multivariate processes. arXiv 2016, arXiv:1602.06155.
49. Faes, L.; Nollo, G.; Stramaglia, S.; Marinazzo, D. Multiscale Granger causality. arXiv 2017, arXiv:1703.08487.
50. Aoki, M.; Havenner, A. State space modeling of multiple time series. Econom. Rev. 1991, 10, 1–59.
51. Oppenheim, A.V.; Schafer, R.W. Digital Signal Processing; Prentice-Hall: Englewood Cliffs, NJ, USA, 1975.
52. Hannan, E.J.; Deistler, M. The Statistical Theory of Linear Systems; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 2012; Volume 70.
53. Aoki, M. State Space Modeling of Time Series; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013.
54. Earth System Research Laboratory. Available online: http://math.bu.edu/people/kolaczyk/datasets.html (accessed on 5 May 2017).
55. Kramer, M.A.; Kolaczyk, E.D.; Kirsch, H.E. Emergent network topology at seizure onset in humans. Epilepsy Res. 2008, 79, 173–186.
56. Richardson, M.P. Large scale brain models of epilepsy: Dynamics meets connectomics. J. Neurol. Neurosurg. Psychiatry 2012, 83, 1238–1248.
57. Dickten, H.; Porz, S.; Elger, C.E.; Lehnertz, K. Weighted and directed interactions in evolving large-scale epileptic brain networks. Sci. Rep. 2016, 6, 34824.
58. Sela, R.J.; Hurvich, C.M. Computationally efficient methods for two multivariate fractionally integrated models. J. Time Ser. Anal. 2009, 30, 631–651.
59. Kitagawa, G. Non-Gaussian state space modeling of nonstationary time series. J. Am. Stat. Assoc. 1987, 82, 1032–1041.
60. Paluš, M. Cross-scale interactions and information transfer. Entropy 2014, 16, 5263–5289.
61. Papana, A.; Kugiumtzis, D.; Larsson, P. Reducing the bias of causality measures. Phys. Rev. E 2011, 83, 036207.
62. Porta, A.; de Maria, B.; Bari, V.; Marchi, A.; Faes, L. Are nonlinear model-free conditional entropy approaches for the assessment of cardiac control complexity superior to the linear model-based one? IEEE Trans. Biomed. Eng. 2017, 64, 1287–1296.
Figure 1. Venn diagram representations of the interaction information decomposition (IID) (a,b) and the partial information decomposition (PID) (c). The IID is depicted in a way such that all areas in the diagrams are positive: the interaction information transfer $I_{ik \to j}$ is positive in (a), denoting net synergy, and negative in (b), denoting net redundancy.
Figure 2. Schematic representation of a linear VAR process and of its multiscale representation obtained through filtering (FLT) and downsampling (DWS) steps. The downsampled process has an innovations form state space model (ISS) representation from which submodels can be formed to compute the partial variances needed for the computation of the information measures appearing in the IID and PID decompositions. This makes it possible to perform multiscale information decomposition analytically from the original VAR parameters and from the scale factor.
Figure 3. Graphical representation of the four-variate VAR process of Equation (20) that we use to explore the multiscale decomposition of the information transferred to $Y_4$, selected as the target process, from $Y_2$ and $Y_3$, selected as the source processes, in the presence of $Y_1$, acting as the exogenous process. To favor such exploration, we set oscillations at different time scales for $Y_1$ ($f_1 = 0.1$) and for $Y_2$ and $Y_3$ ($f_2 = f_3 = 0.025$), induce common driver effects from the exogenous process to the sources modulated by the parameter c and allow for varying strengths of the causal interactions from the sources to the target as modulated by the parameter b. The four configurations explored in this study are depicted in (a–d).
Entropy 19 00408 g003
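Equation (20) itself is not reproduced in this excerpt, so the following Python sketch should be read as a hypothetical stand-in with the causal structure of Figure 3 rather than the actual simulated system: self-oscillations are imposed by placing complex-conjugate poles at the stated frequencies, the parameter c sets the common drive from $Y_1$ to $Y_2$ and $Y_3$, and the parameter b sets the causal effects from the sources to $Y_4$. All coefficient values are our assumptions.

```python
import numpy as np

def ar2_coeffs(rho, f):
    """AR(2) coefficients placing complex-conjugate poles at rho*exp(+/- i 2 pi f),
    which yields a stochastic oscillation at normalized frequency f."""
    return 2 * rho * np.cos(2 * np.pi * f), -rho ** 2

def simulate_fig3(b, c, N=10000, rho=0.95, seed=1):
    """Hypothetical 4-variate VAR(2) with the causal structure of Figure 3:
    Y1 (f1 = 0.1) drives Y2 and Y3 (f2 = f3 = 0.025) with strength c;
    Y2 and Y3 drive the target Y4 with strength b."""
    a1 = ar2_coeffs(rho, 0.1)
    a2 = ar2_coeffs(rho, 0.025)
    rng = np.random.default_rng(seed)
    Y = np.zeros((N, 4))
    for t in range(2, N):
        w = rng.standard_normal(4)
        Y[t, 0] = a1[0] * Y[t-1, 0] + a1[1] * Y[t-2, 0] + w[0]
        Y[t, 1] = a2[0] * Y[t-1, 1] + a2[1] * Y[t-2, 1] + c * Y[t-1, 0] + w[1]
        Y[t, 2] = a2[0] * Y[t-1, 2] + a2[1] * Y[t-2, 2] + c * Y[t-1, 0] + w[2]
        Y[t, 3] = b * (Y[t-1, 1] + Y[t-1, 2]) + w[3]
    return Y
```

Under these assumptions, setting b > 0 with c = 0 produces purely direct source-target effects, while c > 0 introduces the common-driver influence that makes redundant transfer possible, which is the kind of contrast the four configurations in (a–d) are designed to probe.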
Figure 4. Multiscale information decomposition for the simulated VAR process of Equation (20). Plots depict the exact values of the entropy measures forming the interaction information decomposition (IID, upper row) and the partial information decomposition (PID, lower row) of the information transferred from the source processes $Y_2$ and $Y_3$ to the target process $Y_4$ generated according to the scheme of Figure 3 with four different configurations of the parameters. We find that linear processes may generate trivial information patterns lacking synergistic or redundant behaviors (a); patterns with a prevalence of redundant information transfer (b) or synergistic information transfer (c) that persists across multiple time scales; or even complex patterns with an alternating prevalence of redundant and synergistic transfer at different time scales (d).
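The curves in Figure 4 combine joint and individual transfer entropies into the IID and PID terms. A minimal sketch of that bookkeeping follows, assuming the past() and resid_var() helpers from the earlier snippet and the minimum mutual information (MMI) redundancy often adopted for Gaussian systems; readers should check the full text for the paper's exact PID definition.

```python
import numpy as np

def iid_pid(y, xi, xk, p=2):
    """IID and (MMI) PID of the information transferred from sources (xi, xk)
    to target y, computed from OLS prediction-error variances. Reuses past()
    and resid_var() defined in the earlier sketch."""
    yt = y[p:]
    Py, Pi, Pk = past(y, p), past(xi, p), past(xk, p)
    v0 = resid_var(yt, Py)                              # target past only
    vi = resid_var(yt, np.column_stack([Py, Pi]))       # + source i
    vk = resid_var(yt, np.column_stack([Py, Pk]))       # + source k
    vik = resid_var(yt, np.column_stack([Py, Pi, Pk]))  # + both sources
    Ti, Tk = 0.5 * np.log(v0 / vi), 0.5 * np.log(v0 / vk)  # individual TEs
    Tik = 0.5 * np.log(v0 / vik)                            # joint TE
    Iik = Tik - Ti - Tk      # interaction TE: > 0 net synergy, < 0 net redundancy
    R = min(Ti, Tk)          # MMI redundancy
    Ui, Uk = Ti - R, Tk - R  # unique transfers
    S = Tik - Ui - Uk - R    # synergistic transfer
    return {"Tik": Tik, "Ti": Ti, "Tk": Tk, "Iik": Iik,
            "R": R, "Ui": Ui, "Uk": Uk, "S": S}
```

Applying iid_pid to rescale()-processed copies of the target and source series at increasing τ traces multiscale curves that can be compared qualitatively with the exact profiles in the figure.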
Figure 5. Interaction information decomposition (IID) of the intracranial EEG information flow from subcortical to cortical regions in an epileptic patient. The joint transfer entropy from depth Channels 11 and 12 to cortical electrodes (a); the transfer entropy from depth Channel 11 to cortical electrodes (b); the transfer entropy from depth Channel 12 to cortical electrodes (c); and the interaction transfer entropy from depth Channels 11 and 12 to cortical electrodes (d) are depicted as a function of the scale τ, after averaging over the eight pre-ictal segments (left column) and over the eight ictal segments (right column). Compared with pre-ictal periods, during the seizure the IID reveals marked increases in the joint and individual information transfer from depth to cortical electrodes, while the interaction transfer remains low and almost unchanged.
Figure 6. Partial information decomposition (PID) of the intracranial EEG information flow from subcortical to cortical regions in an epileptic patient. The synergistic transfer entropy from depth Channels 11 and 12 to cortical electrodes (a); the redundant transfer entropy from depth Channels 11 and 12 to cortical electrodes (b); the unique transfer entropy from depth Channel 11 to cortical electrodes (c); and the unique transfer entropy from depth Channel 12 to cortical electrodes (d) are depicted as a function of the scale τ, after averaging over the eight pre-ictal segments (left column) and over the eight ictal segments (right column). Compared with pre-ictal periods, during the seizure the PID reveals marked increases in the information transferred synergistically and redundantly from depth to cortical electrodes, as well as in the information transferred uniquely from one of the two depth electrodes, but not from the other.
Figure 7. Multiscale representation of the measures of interaction information decomposition (IID, top) and partial information decomposition (PID, bottom) computed as a function of the time scale for each of the eight seizures, during the pre-ictal period (black) and the ictal period (red). Values of the joint transfer entropy (TE), individual TEs, interaction TE, redundant TE, synergistic TE and unique TEs are obtained taking depth Channels 11 and 12 as sources and averaging over all 64 target cortical electrodes. Increases during the seizure in the joint TE, in the individual TEs from both depth electrodes, in the redundant and synergistic TE, and in the unique TE from depth electrode 12 are evident at low time scales for almost all of the episodes considered.
