Communication

Geometric Insights into the Multivariate Gaussian Distribution and Its Entropy and Mutual Information

1 Department of Communications, Navigation and Control Engineering, National Taiwan Ocean University, 2 Peining Rd., Keelung 202301, Taiwan
2 Department of Business Administration, Asia University, 500 Liufeng Road, Wufeng, Taichung 41354, Taiwan
* Author to whom correspondence should be addressed.
Submission received: 6 July 2023 / Revised: 31 July 2023 / Accepted: 4 August 2023 / Published: 7 August 2023
(This article belongs to the Special Issue Entropy and Organization in Natural and Social Systems)

Abstract:
In this paper, we provide geometric insights, with visualization, into the multivariate Gaussian distribution and its entropy and mutual information. Several significant methodologies for developing the multivariate Gaussian distribution together with its entropy and mutual information are presented through the discussion, supported by illustrations, both technically and statistically. The paper examines broad measures of structure for Gaussian distributions, showing that they can be described in information-theoretic terms relating the given covariance matrix to the correlated random variables (in terms of relative entropy). The material allows readers to better perceive the concepts, comprehend the techniques, and properly execute software programs for future study of the topic's science and implementations, and it helps readers grasp the fundamental concepts needed to study the application of multivariate sets of data under a Gaussian distribution. The simulation results also convey the behavior of different elliptical interpretations based on the multivariate Gaussian distribution with entropy, for real-world applications in our daily lives, including information coding, nonlinear signal detection, and others. By involving the relative entropy and mutual information as well as the correlated covariance analysis, a wide range of topics is addressed, from basic application concerns to clinical diagnostics for detecting multi-disease effects.

1. Introduction

Understanding how knowledge concerning an external variable, or the reciprocal (mutual) information among a system's parts, is distributed can assist in characterizing and inferring the underlying mechanics and function of the system. This goal has driven the development of several techniques for dissecting the elements of a set of variables' combined entropy, or for dissecting the contributions of a set of variables to the mutual information about the variable of interest. In actuality, this association and its modifications exist for any input signal and for the widest range of Gaussian channels, comprising discrete-time and continuous-time channels in scalar or vector form.
In a more general sense, mutual information and mean-square error (MSE) are fundamental concepts of information theory and estimation theory, respectively. In contrast to the minimum MSE (MMSE), which determines how precisely each input sample can be restored from the channel's outputs, the input-output mutual information measures how reliably information can be delivered over a channel given a specific input signal. The substantial relevance of mutual information to estimation and filtering provides an operational characterization of mutual information. Therefore, the significance of this identity is not only obvious, but the link is also fascinating and merits an in-depth explanation [1,2,3]. Relations between the MMSE of estimation and the local behavior of the mutual information at diminishing signal-to-noise ratio (SNR) are presented in [4]. The authors of [5] give an idea of the probabilistic ratios of geometric characteristics of signal detection in Gaussian noise. Furthermore, whether in a continuous-time [5,6,7] or discrete-time [8] context, the likelihood ratio is central to the relationship between observation (detection) and estimation.
Considering the specific instance of parametric computation (or Gaussian inputs), correlations relating causal and non-causal estimation errors have been investigated in [9,10], involving the limit on the loss owing to the causality restriction. Knowing how data pertaining to an external parameter, or inversely related data within its parts, is distributed across the parts of a multivariate system can assist in categorizing and determining the fundamental mechanics and functionality of the structure. This served as the impetus for the development of various techniques for decomposing the elements of a set of parameters' joint entropy [11,12,13,14,15,16,17,18,19] or for deconvolving the contributions of a set of elements to the mutual information about a target variable [13]. Mutual information techniques can be used to examine a variety of intricate systems, including those in the physical domain, such as gene networks [17] or brain coding [20], as well as those in the social domain, such as selection agents [21] and community behavior [22]. They can also be used to analyze artificial agents [23]. Furthermore, some newer proposals have deviated more significantly from the original framework by incorporating novel principles, such as accounting for the presence of harmful elements associated with errors and using joint entropy subdivisions in place of mutual information [24,25].
In the multivariate scenario, the challenges of breaking down mutual information into redundant and complementary components are nevertheless significantly greater. The maximum entropy framework allows for a more straightforward generalization of the efficiency measurements to the multivariate case [26,27]. The novel redundancy measures that were initially developed are only defined for the bivariate situation or allow negative components [28], whereas measurements of coordination are more readily extended to the multivariate case, especially when using the maximum entropy architecture [29,30]. By either utilizing the associations between lattices formed by various numbers of parameters or utilizing the multiple interactions between redundancy lattices and information loss lattices, for which collaborative terms are more naturally defined, the studies in [31,32] established two analogous techniques for constructing multivariate redundancy metrics. Information-theoretic quantities have a benefit compared with better-known measures of test performance in that they may be employed when numerous ailments are being considered, as well as when a diagnostic test can produce several or continuous findings [33].
Although there are some valuable references detailing entropy-related topics, both discrete and continuous, they may not be easily accessible to some readers from the existing publications. Therefore, in the present study, we propose an extension of the bivariate Gaussian distribution technique to calculate multivariate redundancy metrics within the maximum entropy context. The importance of the maximum entropy approach in the multivariate scenario, where it offers constraints for the actual redundancy, unique information, and efficiency terms under logical presumptions shared by additional criteria, acts as the motivation for this particular focus [26,34]. The maximum entropy measurements, specifically, offer a lower limit for the actual cooperation and redundancy terms and a higher limit for the actual specific information, if it is presumed that a bivariate non-negative decomposition exists and that redundancy can be calculated from the bivariate distributions of the desired outcome with every source. Furthermore, if these bivariate distributions are consistent with possibly having little interaction under the previous hypotheses, then the maximum entropy decomposition returns not only boundaries but also the precise actual terms. Here, in the proposed framework, we also demonstrate that, under similar presumptions, the maximum entropy reduction plays this dominant role in the multivariate situation [35]. This paper intends to convey these important issues and inspire new applications of information theory in a number of areas, such as information coding, nonlinear signal detection, and clinical diagnostic testing.
The remainder of this paper is organized as follows. The geometry of the Gaussian distribution is briefly reviewed in Section 2. The three subsequent sections deal with various important topics in information entropy through illustrative examples, with an emphasis on visualization of the information and discussion. In Section 3, continuous entropy (differential entropy) is presented. In Section 4, the relative entropy (Kullback–Leibler divergence) is presented. Mutual information is presented in Section 5. Conclusions are given in Section 6.

2. Geometry of the Gaussian Distribution

In this section, the background relations of the Gaussian distribution from different parametric points of view are discussed. The exploratory objective of the fundamental analysis is to identify the structure in multivariate datasets. Ordinary least-squares regression and principal component analysis (PCA), respectively, measure dependency (the predicted connection between particular components) and rigidity (the degree of concentration of the probability density function (pdf) around a low-dimensional axis) for bivariate Gaussian distributions. Mutual information, an established measure of dependency, is not an accurate indicator of rigidity since it is not invariant under rotation of the variables. For bivariate Gaussian distributions, a suitable rotation-invariant compactness measure can be constructed and shown to reduce to the corresponding PCA measure.

2.1. Standard Parametric Representation of an Ellipse

For uncorrelated data, which have zero covariance, the ellipse is not rotated and its axes are aligned with the coordinate axes, with the radii in the two directions determined by the variances. Geometrically, a non-rotated ellipse centered at (0, 0) with radii a and b in the $x_1$- and $x_2$-directions is described by:
$$ \left(\frac{x_1}{a}\right)^2 + \left(\frac{x_2}{b}\right)^2 = 1. $$
The general probability density function for the multivariate Gaussian is given by the following:
$$ f_X(\mathbf{x}\mid\boldsymbol{\mu},\Sigma) = \frac{1}{(2\pi)^{n/2}\,|\Sigma|^{1/2}}\,e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})}, $$
where $\boldsymbol{\mu} = E[\mathbf{X}]$ and $\Sigma = \mathrm{Cov}(\mathbf{X}) = E[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^T]$ is a symmetric, positive semi-definite matrix. If $\Sigma$ is the identity matrix, then the Mahalanobis distance reduces to the standard Euclidean distance between $\mathbf{X}$ and $\boldsymbol{\mu}$.
For bivariate Gaussian distributions, the mean and covariance matrix are given by the following:
$$ \boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix};\quad \Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}, $$
where the linear correlation coefficient satisfies $|\rho| \le 1$.
Variance measures the variation of a single random variable, whereas covariance measures how two random variables vary together. With the covariances, we can build the entries of the covariance matrix, which is a square matrix. In addition, the covariance matrix is symmetric. The diagonal entries of the covariance matrix are the variances, while the off-diagonal entries are the covariances. For this reason, the covariance matrix is often called the variance-covariance matrix.
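As a small numerical illustration of these definitions (a minimal MATLAB sketch of our own; the means, standard deviations, correlation, and sample size are arbitrary example values, not taken from the paper), the sample variance-covariance matrix of correlated bivariate data can be computed and inspected: the diagonal entries approximate the variances, the off-diagonal entry approximates $\rho\sigma_1\sigma_2$, and the matrix is symmetric.

```matlab
% Minimal sketch: sample variance-covariance matrix of bivariate Gaussian data.
% All numbers below (mu, sigma, rho, N) are illustrative only.
rng(1);                                       % for reproducibility
N     = 1e5;
mu    = [1 2];
sigma = [2 3];
rho   = 0.5;
z  = randn(N, 2);                             % uncorrelated standard Gaussian samples
x1 = mu(1) + sigma(1)*z(:,1);
x2 = mu(2) + sigma(2)*(rho*z(:,1) + sqrt(1 - rho^2)*z(:,2));
S  = cov([x1 x2]);                            % sample covariance matrix
disp(S)                                       % diagonal ~ sigma.^2, off-diagonal ~ rho*sigma(1)*sigma(2)
disp(norm(S - S.'))                           % symmetry check (numerically zero)
```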

2.2. The Confidence Ellipse

A typical way to visualize two-dimensional Gaussian-distributed data is by plotting a confidence ellipse. The squared Mahalanobis distance $d_M^2 = (\mathbf{X}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{X}-\boldsymbol{\mu})$ is a random variable that follows a chi-squared distribution, denoted as $\chi_k^2$:
$$ P\left[(\mathbf{X}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{X}-\boldsymbol{\mu}) \le \chi_k^2(\alpha)\right] = 1-\alpha, $$
where k is the number of degrees of freedom and $\alpha$ is the significance level associated with the confidence ellipse. For example, $\alpha = 0.05$ defines the 95% confidence ellipse. Extending from Equation (1), the radius in each direction is the standard deviation ($\sigma_1$ or $\sigma_2$) parametrized by a scale factor s, known as the Mahalanobis radius of the ellipsoid:
$$ \left(\frac{x_1}{\sigma_1}\right)^2 + \left(\frac{x_2}{\sigma_2}\right)^2 = s. $$
The goal is to determine the scale s such that a confidence level p is met. Since the data are multivariate Gaussian-distributed, the left-hand side of the equation is a sum of squares of standardized Gaussian samples, which follows a $\chi^2$ distribution. A $\chi^2$ distribution is defined by its number of degrees of freedom, and since we have two dimensions, the number of degrees of freedom is two. The probability that this sum does not exceed s is therefore given by the $\chi^2$ distribution, which determines the value of s for a desired confidence level.
This ellipse with a probability contour defines the region of minimum area (or volume in the multivariate case) containing a given probability under the Gaussian assumption. The equation can be solved using a $\chi^2$ table or, for two degrees of freedom, simply using the relationship $s = -2\ln(1-p)$. The confidence level can be evaluated through the following:
$$ p = 1 - e^{-0.5s}. $$
For $s = 1$ we have $p = 1 - e^{-0.5} \approx 0.3935$. Furthermore, typical values include $s = 2.279$, $s = 4.605$, $s = 5.991$, and $s = 9.210$ for $p = 0.68$, $p = 0.9$, $p = 0.95$, and $p = 0.99$, respectively. The ellipse can then be drawn with radii $\sigma_1\sqrt{s}$ and $\sigma_2\sqrt{s}$. Figure 1 shows the relationship between the confidence level and the scale factor s.
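The tabulated (p, s) pairs can be reproduced numerically. The sketch below is our own check, not part of the original paper; chi2inv requires the Statistics and Machine Learning Toolbox, while the closed form $s = -2\ln(1-p)$ holds for two degrees of freedom.

```matlab
% Scale factor s for a given confidence level p, with 2 degrees of freedom.
p        = [0.68 0.90 0.95 0.99];
s_closed = -2*log(1 - p);                 % closed form, valid for k = 2
s_chi2   = chi2inv(p, 2);                 % inverse chi-squared cdf (Statistics Toolbox)
disp([p; s_closed; s_chi2])               % rows: p, -2ln(1-p), chi2inv -> 2.279, 4.605, 5.991, 9.210
% Inverse direction: confidence achieved by a given scale factor
s = 1;
disp(1 - exp(-0.5*s))                     % ~0.3935, matching the text
```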
The Mahalanobis distance accounts for the variance of each variable and the covariance between variables.
$$ (\mathbf{X}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{X}-\boldsymbol{\mu}) = \begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 \end{bmatrix}\begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}^{-1}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)}\begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 \end{bmatrix}\begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix} = \frac{1}{1-\rho^2}\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}\right]. $$
Geometrically, it does this by transforming the data into standardized, uncorrelated data and computing the ordinary Euclidean distance for the transformed data. In this way, the Mahalanobis distance is like a univariate z-score: it provides a way to measure distances that takes into account the scale of the data.
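This standardize-then-measure interpretation can be demonstrated with a few lines of MATLAB (a sketch with arbitrary example numbers of our own): whitening the data with the Cholesky factor of $\Sigma$ and taking the ordinary Euclidean norm reproduces the Mahalanobis distance computed directly from $\Sigma^{-1}$.

```matlab
% Mahalanobis distance as a Euclidean distance on whitened data (illustrative values).
mu    = [1; 2];
Sigma = [4 2; 2 3];
x     = [3; 1];
dM2_direct = (x - mu).' / Sigma * (x - mu);      % (x-mu)' * inv(Sigma) * (x-mu)
L = chol(Sigma, 'lower');                        % Sigma = L*L'
z = L \ (x - mu);                                % whitening (standardizing, decorrelating) transform
dM2_whiten = z.' * z;                            % ordinary squared Euclidean norm
fprintf('%.6f  %.6f\n', dM2_direct, dM2_whiten)  % identical values
```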
In the general case, the covariances $\sigma_{12}$ and $\sigma_{21}$ are not zero, and therefore the ellipse-coordinate system is not axis-aligned. In such a case, instead of using the variances as spread indicators, we use the eigenvalues of the covariance matrix. The eigenvalues represent the spread in the directions of the eigenvectors, which are the variances under a rotated coordinate system. By definition, a covariance matrix is positive semi-definite (and positive definite in the non-degenerate case considered here), so all eigenvalues are positive and the matrix can be seen as a linear transformation of the data. The actual radii of the ellipse are $\sqrt{\lambda_1}$ and $\sqrt{\lambda_2}$ for the two eigenvalues $\lambda_1$ and $\lambda_2$ of the scaled covariance matrix $s\Sigma$.
Based on Equations (2) and (7), the bivariate Gaussian distributions can be represented as follows:
$$ f(x_1,x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\,\exp\!\left\{-\frac{1}{2}\,\frac{1}{1-\rho^2}\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}\right]\right\}, $$
and the level surfaces of $f(x_1,x_2)$ are concentric ellipses:
$$ \frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2} = c, $$
where c is a constant related to the squared Mahalanobis distance, which possesses the following properties:
It accounts for the fact that the variances in each direction are different;
It accounts for the covariance between variables;
It reduces to the familiar Euclidean distance for uncorrelated variables with unit variance.
The lengths of the ellipse axes are a function of the given probability of the chi-squared distribution with 2 degrees of freedom, $\chi_2^2(\alpha)$, the eigenvalues $\boldsymbol{\lambda} = [\lambda_1\ \lambda_2]^T$, and the linear correlation coefficient $\rho$. The 95% confidence ellipse ($\alpha = 0.05$) is defined by:
$$ \begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 \end{bmatrix}\Sigma^{-1}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix} \le \chi_2^2(0.05), $$
where
$$ \Sigma^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)}\begin{bmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{bmatrix}. $$
Since $\Sigma$ is a symmetric matrix, its eigenvectors are linearly independent (and can be chosen orthogonal).

2.3. Similarity Transform

The simplest similarity transformation method for eigenvalue computation is the Jacobi method, which deals with the standard eigenproblems. In the multivariate Gaussian distribution, the covariance matrix can be expressed in terms of eigenvectors:
$$ \Sigma = U\Lambda U^{-1} = U\Lambda U^{T} = \begin{bmatrix} \mathbf{u}_1 & \mathbf{u}_2 \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} \mathbf{u}_1^T \\ \mathbf{u}_2^T \end{bmatrix}, $$
where $U = [\mathbf{u}_1\ \mathbf{u}_2]$ contains the eigenvectors of $\Sigma$ and $\Lambda$ is the diagonal matrix of the eigenvalues $\boldsymbol{\lambda} = [\lambda_1\ \lambda_2]^T$:
$$ \Lambda = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}. $$
Replacing $\Sigma^{-1}$ by $U\Lambda^{-1}U^{-1}$, the square of the difference can be written as:
$$ \begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 \end{bmatrix} U\Lambda^{-1}U^{-1} \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix} \le \chi_2^2(0.05), $$
since $U^T = U^{-1}$. Denoting
$$ \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = U^{-1}\begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}, $$
the square of the difference can then be expressed as:
$$ \begin{bmatrix} y_1 & y_2 \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}^{-1}\begin{bmatrix} y_1 \\ y_2 \end{bmatrix} \le \chi_2^2(0.05). $$
If the above equation is further evaluated, the result is the equation of an ellipse aligned with the axes $y_1$ and $y_2$ of the new coordinate system:
$$ \frac{y_1^2}{\chi_2^2(0.05)\,\lambda_1} + \frac{y_2^2}{\chi_2^2(0.05)\,\lambda_2} \le 1; $$
the axes of the ellipse lie along $y_1$ with a length of $2\sqrt{\lambda_1\chi_2^2(0.05)}$ and along $y_2$ with a length of $2\sqrt{\lambda_2\chi_2^2(0.05)}$.
When $\rho = 0$, the eigenvalues are equal to $\lambda_1 = \sigma_1^2$ and $\lambda_2 = \sigma_2^2$. Additionally, the matrix $U$, whose columns are the eigenvectors of $\Sigma$, becomes an identity matrix. The final equation of the ellipse is then defined by:
$$ \frac{(x_1-\mu_1)^2}{\chi_2^2(0.05)\,\lambda_1} + \frac{(x_2-\mu_2)^2}{\chi_2^2(0.05)\,\lambda_2} \le 1. $$
It is clear from the equation given above that the axes of the ellipse are parallel to the coordinate axes. The lengths of the axes of the ellipse are then $2\sqrt{\sigma_1^2\,\chi_2^2(0.05)}$ and $2\sqrt{\sigma_2^2\,\chi_2^2(0.05)}$.
The covariance matrix can be represented by its eigenvectors and eigenvalues, $\Sigma U = U\Lambda$, where $U$ is the matrix whose columns are the eigenvectors of $\Sigma$ and $\Lambda$ is the diagonal matrix with diagonal elements given by the eigenvalues of $\Sigma$. The transformation is performed in three steps involving scaling, rotation, and translation:
1. Scaling
The covariance matrix can be written as $\Sigma = U\Lambda U^{-1} = USSU^{-1}$, where $S$ is a diagonal scaling matrix, $S = \Lambda^{1/2} = S^T$;
2. Rotation
$U$ is assembled from the normalized eigenvectors of the covariance matrix $\Sigma$:
$$ U = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}; $$
it can be noted that $U$ is an orthogonal matrix, $U^{-1} = U^T$ and $|U| = 1$. Here, the combined rotation and scaling matrix is $T = US$, with $T^T = (US)^T = S^TU^T = SU^{-1}$. Thus, the covariance matrix can be written as $\Sigma = TT^T$, and $U^T\Sigma U = \Lambda$ with diagonal eigenvalues $\lambda_i$. Since $T = US$, we have $\mathbf{Y} = T\mathbf{X} = US\mathbf{X} = U\Lambda^{1/2}\mathbf{X}$, and
$$ \begin{bmatrix} x_1(t) \\ x_2(t) \end{bmatrix} = \begin{bmatrix} u_{1x} & u_{2x} \\ u_{1y} & u_{2y} \end{bmatrix}\begin{bmatrix} \sqrt{\lambda_1}\cos t \\ \sqrt{\lambda_2}\sin t \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \sqrt{\lambda_1}\cos t \\ \sqrt{\lambda_2}\sin t \end{bmatrix}; $$
The similarity transform is applied to obtain the relationship $\mathbf{X}^T\Sigma^{-1}\mathbf{X} = \mathbf{Y}^T U^T\Sigma^{-1}U\mathbf{Y} = \mathbf{Y}^T\Lambda^{-1}\mathbf{Y}$, and the pdf of the vector $\mathbf{Y}$ can be found by considering the expression below:
$$ f_Y(\mathbf{y}) = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\lambda_i}}\,e^{-\frac{1}{2}\frac{y_i^2}{\lambda_i}}; $$
the ellipse in the transformed frame can be represented as:
$$ \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} = c, $$
where the eigenvalues $\lambda_1$ and $\lambda_2$ play the role of the variances in the transformed frame (and reduce to $\lambda_1 = \sigma_1^2$ and $\lambda_2 = \sigma_2^2$ when $\rho = 0$);
3. Translation
$$ x_1(t) = \sqrt{\lambda_1}\cos\theta\cos t - \sqrt{\lambda_2}\sin\theta\sin t + \mu_1, $$
$$ x_2(t) = \sqrt{\lambda_1}\sin\theta\cos t + \sqrt{\lambda_2}\cos\theta\sin t + \mu_2, $$
where the eigenvalues $\boldsymbol{\lambda} = [\lambda_1\ \lambda_2]^T$ can be calculated from:
$$ \lambda_1 = \frac{1}{2}\left[\sigma_1^2+\sigma_2^2+\sqrt{(\sigma_1^2-\sigma_2^2)^2+4\rho^2\sigma_1^2\sigma_2^2}\right];\quad \lambda_2 = \frac{1}{2}\left[\sigma_1^2+\sigma_2^2-\sqrt{(\sigma_1^2-\sigma_2^2)^2+4\rho^2\sigma_1^2\sigma_2^2}\right], $$
and thus
$$ |\Sigma| = \lambda_1\lambda_2 = \sigma_1^2\sigma_2^2(1-\rho^2). $$
From another point of view, the covariance matrix can be calculated as:
$$ \Sigma = U\Lambda U^T = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \lambda_1\cos\theta & -\lambda_2\sin\theta \\ \lambda_1\sin\theta & \lambda_2\cos\theta \end{bmatrix}\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} \lambda_1\cos^2\theta+\lambda_2\sin^2\theta & (\lambda_1-\lambda_2)\sin\theta\cos\theta \\ \mathrm{sym.} & \lambda_1\sin^2\theta+\lambda_2\cos^2\theta \end{bmatrix}. $$
Calculating the determinant of the above covariance matrix gives the same result, and the inverse is:
$$ \Sigma^{-1} = \frac{1}{\lambda_1\lambda_2}\begin{bmatrix} \lambda_1\sin^2\theta+\lambda_2\cos^2\theta & (\lambda_2-\lambda_1)\sin\theta\cos\theta \\ \mathrm{sym.} & \lambda_1\cos^2\theta+\lambda_2\sin^2\theta \end{bmatrix} = \begin{bmatrix} \frac{\sin^2\theta}{\lambda_2}+\frac{\cos^2\theta}{\lambda_1} & \sin\theta\cos\theta\left(\frac{1}{\lambda_1}-\frac{1}{\lambda_2}\right) \\ \mathrm{sym.} & \frac{\sin^2\theta}{\lambda_1}+\frac{\cos^2\theta}{\lambda_2} \end{bmatrix}. $$
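The three steps above (scaling by the square roots of the eigenvalues, rotation by U, and translation by μ) can be collected into a short MATLAB sketch that draws a confidence ellipse. This is an illustrative script of our own: the mean, covariance matrix, and confidence level are arbitrary choices, and chi2inv requires the Statistics and Machine Learning Toolbox.

```matlab
% 95% confidence ellipse drawn by scaling, rotation and translation (illustrative parameters).
mu    = [1; 2];
Sigma = [4 2; 2 3];
[U, Lambda] = eig(Sigma);                        % eigenvectors U, eigenvalues on diag(Lambda)
[lam, idx]  = sort(diag(Lambda), 'descend');     % put the major axis first
U = U(:, idx);
s = chi2inv(0.95, 2);                            % Mahalanobis radius for 95% confidence, 2 dof (~5.991)
t = linspace(0, 2*pi, 200);
unit_circle = [cos(t); sin(t)];
ellipse = U * diag(sqrt(s*lam)) * unit_circle + mu;   % scale, rotate, then translate
plot(ellipse(1,:), ellipse(2,:), 'LineWidth', 1.5); axis equal; grid on
xlabel('x_1'); ylabel('x_2'); title('95% confidence ellipse')
```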

2.4. Simulation with a Given Variance-Covariance Matrix

With given data $\mathbf{X}\sim N(\boldsymbol{\mu},\Sigma)$, an ellipse representing the confidence p can be plotted by calculating its radii, its center, and its rotation. Here, $\theta$ (from which $U$ can be obtained) and $S$ are used to generate the covariance matrix $\Sigma$, from which $\rho$ can be derived. The inclination angle is calculated by:
$$ \theta = \begin{cases} 0 & \text{if } \sigma_{12} = 0 \text{ and } \sigma_1^2 \ge \sigma_2^2 \\ \pi/2 & \text{if } \sigma_{12} = 0 \text{ and } \sigma_1^2 < \sigma_2^2 \\ \operatorname{atan2}\!\left(\lambda_1-\sigma_1^2,\ \sigma_{12}\right) & \text{otherwise}, \end{cases} $$
which can be used in calculations with the values of $U$:
$$ U = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, $$
and the covariance can be evaluated by $\Sigma = U\Lambda U^T = USSU^T$ if $S$ is specified. On the other hand, with the correlation coefficient $\rho$ and the variances specified for generating the covariance matrix $\Sigma$, $\theta$ can be obtained.
To generate sampling points that meet a specified correlation, the following procedure can be followed. Given two random variables $X_1$ and $X_2$, their linear combination is $Y = \alpha X_1 + \beta X_2$. For the generation of correlated random variables, if we have two uncorrelated Gaussian random variables $X_1$ and $X_2$, then we can create two correlated random variables using the formula:
$$ Y = \rho X_1 + \sqrt{1-\rho^2}\,X_2, $$
and then $Y$ will have a correlation $\rho$ with $X_1$:
$$ \rho = \sigma_{12}/(\sigma_1\sigma_2). $$
Based on the relationship $\mathbf{X} = A\mathbf{Z} + \boldsymbol{\mu}$, $\mathbf{Z}\sim N(\mathbf{0}, I)$, the following expression can be employed to generate the sampling points for the scatter plots using the MATLAB software:
X = A*randn(2,K) + mu*ones(1,K),
where $A$ is the lower triangular matrix from the Cholesky decomposition of $\Sigma$, $\Sigma = AA^T$, and $\boldsymbol{\mu}$ is the vector of mean values.
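Expanding that one-line expression into a self-contained script (a sketch with illustrative parameter values chosen by us), the Cholesky factor of $\Sigma$ turns uncorrelated standard Gaussian samples into samples with the requested covariance and correlation:

```matlab
% Generate K correlated Gaussian samples via the Cholesky factor of Sigma.
rng(0);
K      = 5000;
mu     = [1; 2];
sigma1 = 4;  sigma2 = 2;  rho = 0.5;                 % illustrative values
Sigma  = [sigma1^2           rho*sigma1*sigma2;
          rho*sigma1*sigma2  sigma2^2];
A = chol(Sigma, 'lower');                            % Sigma = A*A'
X = A*randn(2, K) + mu*ones(1, K);                   % X = A*Z + mu, Z ~ N(0, I)
plot(X(1,:), X(2,:), '.', 'MarkerSize', 4); axis equal; grid on
R = corrcoef(X.');                                   % sample correlation matrix
fprintf('sample rho = %.3f (target %.3f)\n', R(1,2), rho)
```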
When $\rho = 0$, the axes of the ellipse are parallel to the original coordinate system, and when $\rho \ne 0$, the axes of the ellipse are aligned with the rotated axes of the transformed coordinate system. Figure 2 and Figure 3 show the ellipses for various levels of confidence. The plots illustrate confidence (error) ellipses with different confidence levels (i.e., 68%, $s = 2.279$; 90%, $s = 4.605$; 95%, $s = 5.991$; and 99%, $s = 9.210$), from inner to outer ellipses, respectively, considering the cases where the random variables are: (1) positively correlated ($\rho > 0$); (2) negatively correlated ($\rho < 0$); and (3) independent ($\rho = 0$). More specifically, in Figure 2, the position of the ellipse is shown for various correlation coefficients, where the angle of inclination $\theta$ is specified to obtain $\rho = \sigma_{12}/(\sigma_1\sigma_2)$: (a) $\theta = 30°$, $\rho \approx 0.55$; (b) $\theta = 0°$, $\rho = 0$; and (c) $\theta = 150°$, $\rho \approx -0.55$, respectively. On the other hand, in Figure 3, the position of the ellipse is shown for various values of the correlation coefficient, where $\rho$ is specified to obtain the angle of inclination $\theta$: (a) $\rho = 0.95$, $\theta = 45°$; (b) $\rho = 0$, $\theta = 0°$; and (c) $\rho = -0.95$, $\theta = 135°$, respectively. The rotation angle is measured over $0° \le \theta \le 180°$ with respect to the positive $x_1$-axis. When $\rho > 0$, the angle is in the first quadrant, and when $\rho < 0$, the angle is in the second quadrant.
In the following, two case studies provide further illustrations:
(1) Equal variances for two random variables with nonzero ρ :
Case 1: fixed correlation coefficient. As an example, $\rho = 0.5$ and the variances $\sigma_1 = \sigma_2 = \sigma$ range from 2 to 5, as shown in Figure 4. As can be seen, the contours and the scatter plots are ellipses instead of circles.
$$ \Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} 4^2 & 0.5(4)(2) \\ 0.5(4)(2) & 2^2 \end{bmatrix} = \begin{bmatrix} 16 & 4 \\ 4 & 4 \end{bmatrix} $$
Subplot (a) in Figure 5 shows the ellipses for ρ = 0.5 with varying variances. Here and in subsequent illustrations, 95% confidence levels are shown;
Case 2: increasing the correlation coefficient $\rho$ from zero. With fixed variance $\sigma_1 = \sigma_2 = \sigma$, the contour is initially a circle when $\rho = 0$ and becomes an ellipse as $\rho$ increases. Subplot (b) in Figure 5 provides the contours with scatter plots for $\rho = 0, 0.5, 0.9, 0.99$, respectively, when $\sigma_1 = \sigma_2 = 2$. The eccentricity of the ellipses increases with increasing $\rho$.
(2) Unequal variances for two random variables, $\sigma_1 \ne \sigma_2$, with fixed correlation coefficient $\rho = 0.5$.
Case 1: $\sigma_1 > \sigma_2$. The variations of the three-dimensional surfaces and ellipses are presented in Figure 6 and Figure 7a as $\sigma_1/\sigma_2$ increases, where $\sigma_1$ ranges from 2 to 5 and $\sigma_2 = 2$.
Case 2: $\sigma_2 > \sigma_1$. The variation of the ellipses is presented in Figure 7b as $\sigma_2/\sigma_1$ increases, where $\sigma_2$ ranges from 2 to 5 and $\sigma_1 = 2$. Figure 8 shows the variation of the inclination angle $\theta$ as a function of $\sigma_1$ and $\sigma_2$, for $\rho = 0$ and $\rho = 0.5$, providing further insight into how $\theta$ varies with $\sigma_1$ and $\sigma_2$.
(3) Variation of the ellipses for various positive and negative correlations. For given variances, when $\rho$ is specified, the eigenvalues and the inclination angle are obtained accordingly. Figure 9 presents results for the cases of $\sigma_1 > \sigma_2$ ($\sigma_1 = 4$, $\sigma_2 = 2$ in this example) and $\sigma_2 > \sigma_1$ ($\sigma_1 = 2$, $\sigma_2 = 4$ in this example) with various correlation coefficients (namely, positive, zero, and negative), including $\rho = 0, 0.5, 0.9, 0.99$ and $\rho = 0, -0.5, -0.9, -0.99$. In the figure, $\sigma_1 = 4$, $\sigma_2 = 2$ are applied for the top plots, while $\sigma_1 = 2$, $\sigma_2 = 4$ are applied for the bottom plots. On the other hand, $\rho = 0, 0.5, 0.9, 0.99$ are applied for the left plots, while $\rho = 0, -0.5, -0.9, -0.99$ are applied for the right plots. Furthermore, Figure 10 provides a comparison of the ellipses for various $\sigma_1$ and $\sigma_2$ for the following cases: (i) $\sigma_1 = 2$, $\sigma_2 = 4$; (ii) $\sigma_1 = 4$, $\sigma_2 = 2$; (iii) $\sigma_1 = \sigma_2 = 2$; and (iv) $\sigma_1 = \sigma_2 = 4$, while $\rho = 0.5$.

3. Continuous Entropy/Differential Entropy

Differential entropy (also referred to as continuous entropy) is a concept in information theory that began as an attempt by Claude Shannon to extend the idea of (Shannon) entropy, a measure of the average surprise of a random variable, to continuous probability distributions. Unfortunately, Shannon did not derive this formula; rather, he simply assumed it was the correct continuous analogue of discrete entropy, but it is not [1]. The actual continuous version of discrete entropy is the limiting density of discrete points (LDDP). Differential entropy (described here) is commonly encountered in the literature, but it is a limiting case of the LDDP, one that loses its fundamental association with discrete entropy.
In the following discussion, differential entropy and relative entropy are measured in bits, since $\log_2$ is used in the definitions. If $\ln$ is used instead, they are measured in nats, and the only difference in the expressions is the factor $\log_2 e$.

3.1. Entropy of a Univariate Gaussian Distribution

If we have a continuous random variable $X$ with a probability density function (pdf) $f_X(x)$, the differential entropy of $X$ in bits is expressed as:
$$ h(X) = -E[\log_2 f_X(x)] = -\int_{-\infty}^{\infty} f_X(x)\log_2 f_X(x)\,dx. $$
Let $X$ be a Gaussian random variable, $X\sim N(\mu,\sigma^2)$:
$$ f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}. $$
The differential entropy for this univariate Gaussian distribution can be evaluated as follows (see Appendix A):
$$ h(X) = -E[\log_2 f_X(x)] = -\int_{-\infty}^{\infty} f_X(x)\log_2\left[\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right]dx = \frac{1}{2}\log_2(2\pi e\sigma^2). $$
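The closed-form result can be cross-checked by numerical integration of $-\int f\log_2 f$. The sketch below is our own verification with an arbitrary $\sigma$; it uses only base MATLAB, with finite integration limits wide enough to cover the density while avoiding 0·log(0) issues far in the tails.

```matlab
% Differential entropy of N(mu, sigma^2) in bits: numerical integration vs closed form.
mu = 0;  sigma = 1.5;                                    % illustrative values
f  = @(x) exp(-0.5*((x - mu)/sigma).^2) / (sqrt(2*pi)*sigma);
lo = mu - 30*sigma;  hi = mu + 30*sigma;                 % effectively covers the whole density
h_num    = integral(@(x) -f(x).*log2(f(x)), lo, hi);     % -E[log2 f(X)]
h_closed = 0.5*log2(2*pi*exp(1)*sigma^2);
fprintf('numerical %.6f   closed form %.6f bits\n', h_num, h_closed)
```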
Figure 11 shows the differential entropy as a function of $\sigma^2$ for the univariate Gaussian variable, which is concave downward and grows very fast at first and then much more slowly at high values of $\sigma^2$.

3.2. Entropy of a Multivariate Gaussian Distribution

Let $\mathbf{X}$ follow a multivariate Gaussian distribution, $\mathbf{X}\sim N(\boldsymbol{\mu},\Sigma)$, as given by Equation (2). The differential entropy of $\mathbf{X}$ in bits is:
$$ h(\mathbf{X}) = -E[\log_2 f_X(\mathbf{x})] = -\int f_X(\mathbf{x})\log_2 f_X(\mathbf{x})\,d\mathbf{x}, $$
and the differential entropy evaluates to (see Appendix B):
$$ h(\mathbf{X}) = \frac{1}{2}\log_2\left[(2\pi e)^n|\Sigma|\right]. $$
The above calculation involves the evaluation of the expectation of the Mahalanobis distance (Appendix C):
$$ E\left[(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] = n. $$
For a fixed variance, the normal distribution is the pdf that maximizes entropy. Let $\mathbf{X} = [X_1\ X_2]^T$ be a 2D Gaussian vector; the entropy of $\mathbf{X}$ can be calculated to be:
$$ h(\mathbf{X}) = h(X_1,X_2) = \frac{1}{2}\log_2\left[(2\pi e)^2|\Sigma|\right] = \log_2\left(2\pi e\,\sigma_1\sigma_2\sqrt{1-\rho^2}\right), $$
with a covariance matrix:
$$ \Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} = \begin{bmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{bmatrix}. $$
If $\sigma_1 = \sigma_2 = \sigma$, this becomes:
$$ h(X_1,X_2) = \log_2\left(2\pi e\,\sigma^2\sqrt{1-\rho^2}\right), $$
which, as a function of $\rho^2$, is concave downward and decreases slowly at first and then very rapidly as $\rho^2$ approaches 1, as shown in Figure 12.
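The equivalence of the determinant form and the $\rho$ form of the bivariate entropy is easy to verify numerically (a sketch with arbitrary example values of our own):

```matlab
% Bivariate Gaussian differential entropy in bits: determinant form vs rho form.
sigma1 = 2;  sigma2 = 3;  rho = 0.7;                           % illustrative values
Sigma  = [sigma1^2 rho*sigma1*sigma2; rho*sigma1*sigma2 sigma2^2];
h_det  = 0.5*log2((2*pi*exp(1))^2 * det(Sigma));               % (1/2)*log2((2*pi*e)^2 * |Sigma|)
h_rho  = log2(2*pi*exp(1)*sigma1*sigma2*sqrt(1 - rho^2));
fprintf('%.6f  %.6f\n', h_det, h_rho)                          % identical values
```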

3.3. The Differential Entropy in the Transformed Frame

The differential entropy is invariant to a translation (change in the mean of the pdf):
h ( X + a ) = h ( X ) ,
and
h ( b X ) = h ( X ) + log 2 | b | .
For a random vector, the differential entropy in a frame obtained by an orthogonal (rotation) transformation remains the same as in the original frame. It can be shown in general that:
$$ h(\mathbf{Y}) = h(U\mathbf{X}) = h(\mathbf{X}) + \log_2|\det U| = h(\mathbf{X}), $$
since $|\det U| = 1$ for an orthogonal matrix $U$.
For the case of a multivariate Gaussian distribution, we have:
$$ h(\mathbf{X}) = \frac{1}{2}\log_2\left[(2\pi e)^n|\Sigma|\right] = \frac{n}{2}\log_2(2\pi e) + \frac{1}{2}\log_2|\Sigma| = \frac{n}{2}\log_2(2\pi e) + \sum_{i=1}^{n}\frac{1}{2}\log_2\lambda_i. $$
It is known that the determinant of the covariance matrix is equal to the product of its eigenvalues:
$$ |\Sigma| = \prod_{i=1}^{n}\lambda_i. $$
For the case of a bivariate Gaussian distribution, n = 2 , we have:
$$ f_Y(\mathbf{y}) = \prod_{i=1}^{2}\frac{1}{\sqrt{2\pi\lambda_i}}\,e^{-\frac{1}{2}\frac{y_i^2}{\lambda_i}} = \frac{1}{\sqrt{2\pi\lambda_1}}\,e^{-\frac{1}{2}\frac{y_1^2}{\lambda_1}}\cdot\frac{1}{\sqrt{2\pi\lambda_2}}\,e^{-\frac{1}{2}\frac{y_2^2}{\lambda_2}} = \frac{1}{2\pi\sqrt{\lambda_1\lambda_2}}\,e^{-\frac{1}{2}\left(\frac{y_1^2}{\lambda_1}+\frac{y_2^2}{\lambda_2}\right)}. $$
It can be shown that the entropy in the transformed frame is given by:
$$ h(\mathbf{Y}) = \frac{2}{2}\log_2(2\pi e) + \frac{1}{2}\sum_{i=1}^{2}\log_2\lambda_i = \log_2(2\pi e) + \frac{1}{2}\log_2(\lambda_1\lambda_2). $$
Detailed derivations are provided in Appendix D. As discussed, the determinant of the covariance matrix is equal to the product of its eigenvalues:
$$ |\Sigma| = \lambda_1\lambda_2 = \frac{1}{2}\left[\sigma_1^2+\sigma_2^2+\sqrt{(\sigma_1^2-\sigma_2^2)^2+4\sigma_1^2\sigma_2^2\rho^2}\right]\cdot\frac{1}{2}\left[\sigma_1^2+\sigma_2^2-\sqrt{(\sigma_1^2-\sigma_2^2)^2+4\sigma_1^2\sigma_2^2\rho^2}\right] = \sigma_1^2\sigma_2^2(1-\rho^2), $$
and thus, the entropy can be presented as:
$$ h(Y_1,Y_2) = \frac{1}{2}\log_2\left[(2\pi e)^2|\Sigma|\right] = \frac{1}{2}\log_2\left[(2\pi e)^2\sigma_1^2\sigma_2^2(1-\rho^2)\right] = \log_2\left(2\pi e\,\sigma_1\sigma_2\sqrt{1-\rho^2}\right). $$
The result confirms the statement that the differential entropy remains unchanged in the transformed frame.
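The invariance can also be checked directly: rotating into the eigenvector frame replaces $\Sigma$ by $\Lambda$ but leaves the determinant, and hence the differential entropy, unchanged (an illustrative sketch of our own):

```matlab
% Differential entropy (bits) before and after the similarity transform.
sigma1 = 2;  sigma2 = 3;  rho = 0.7;                     % illustrative values
Sigma  = [sigma1^2 rho*sigma1*sigma2; rho*sigma1*sigma2 sigma2^2];
[U, Lambda] = eig(Sigma);                                % Sigma = U*Lambda*U'
h_orig  = 0.5*log2((2*pi*exp(1))^2 * det(Sigma));
h_trans = 0.5*log2((2*pi*exp(1))^2 * prod(diag(Lambda)));
fprintf('%.6f  %.6f\n', h_orig, h_trans)                 % equal: |Sigma| = lambda1*lambda2
```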

4. Relative Entropy (Kullback–Leibler Divergence)

In this section, various important issues regarding relative entropy (Kullback–Leibler divergence) are discussed. Despite the aforementioned flaws, much of information theory carries over to the continuous case. A key result is that the definitions of relative entropy and mutual information follow naturally from the discrete case and retain their usefulness.
The relative entropy is a type of statistical distance: it provides a measure of how a probability distribution $f_X$ differs from a second, reference probability distribution $g_X$, and is defined as:
$$ D_{KL}(f\,\|\,g) = \int_{-\infty}^{\infty} f_X(x)\log_2\frac{f_X(x)}{g_X(x)}\,dx. $$
A detailed derivation is provided in Appendix E. The relative entropy between two Gaussian distributions with different means and variances is given by:
$$ D_{KL}(f\,\|\,g) = \frac{1}{2}\left[\ln\frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{\sigma_2^2} + \left(\frac{\mu_1-\mu_2}{\sigma_2}\right)^2 - 1\right]\log_2 e. $$
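This closed form can be validated against direct numerical integration of the defining integral (a sketch with arbitrary means and variances of our own; base MATLAB only, using finite limits wide enough to cover both densities):

```matlab
% KL divergence between two univariate Gaussians: numerical integral vs closed form (bits).
mu1 = 1;  s1 = 1.5;  mu2 = -0.5;  s2 = 2;                       % illustrative values
f = @(x) exp(-0.5*((x - mu1)/s1).^2) / (sqrt(2*pi)*s1);
g = @(x) exp(-0.5*((x - mu2)/s2).^2) / (sqrt(2*pi)*s2);
lo = min(mu1 - 20*s1, mu2 - 20*s2);                             % wide finite integration range
hi = max(mu1 + 20*s1, mu2 + 20*s2);
D_num    = integral(@(x) f(x).*log2(f(x)./g(x)), lo, hi);
D_closed = 0.5*(log(s2^2/s1^2) + s1^2/s2^2 + ((mu1 - mu2)/s2)^2 - 1) * log2(exp(1));
fprintf('numerical %.6f   closed form %.6f bits\n', D_num, D_closed)
```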
It is worth noting that the relative entropy is measured in bits when $\log_2$ is used in the definition; if $\ln$ is used, it is measured in nats, and the only difference in the expression is the factor $\log_2 e$. Several conditions are discussed to illustrate the characteristics of the relative entropy:
(1) If $\sigma_1 = \sigma_2 = \sigma$, $D_{KL}(f\,\|\,g) = \frac{1}{2}\left(\frac{\mu_1-\mu_2}{\sigma}\right)^2\log_2 e$, which is 0 when $\mu_1 = \mu_2$. Figure 13 shows the relative entropy as a function of $\sigma$ and $\mu_1-\mu_2$ when $\sigma_1 = \sigma_2 = \sigma$, where a three-dimensional surface and a contour with an entropy gradient are provided.
(2) If $\sigma_1 = \sigma_2 = 1$, $D_{KL}(f\,\|\,g) = \frac{1}{2}(\mu_1-\mu_2)^2\log_2 e$, which is an even function of $\mu_1-\mu_2$ with a minimum value of 0 when $\mu_1 = \mu_2$. Figure 14 illustrates the variations of relative entropy as a function of $\mu_1$ and $\mu_2$ and as a function of $\mu_1-\mu_2$.
If $\mu_2 = 0$, $D_{KL}(f\,\|\,g) = \frac{1}{2}\mu_1^2\log_2 e$, which is a concave-upward function of $\mu_1$.
If $\mu_1 = 0$, $D_{KL}(f\,\|\,g) = \frac{1}{2}\mu_2^2\log_2 e$, which is a concave-upward function of $\mu_2$.
(3) If $\mu_1 = \mu_2$, $D_{KL}(f\,\|\,g) = \frac{1}{2}\left[\ln\frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{\sigma_2^2} - 1\right]\log_2 e$. Figure 15 demonstrates the relative entropy as a function of $\sigma_1$ and $\sigma_2$ when $\mu_1 = \mu_2$, where a three-dimensional surface and the contour with an entropy gradient are plotted.
When $\sigma_2 = 1$, $D_{KL}(f\,\|\,g) = \frac{1}{2}\left[\ln\frac{1}{\sigma_1^2} + \sigma_1^2 - 1\right]\log_2 e$.
When $\sigma_1 = 1$, $D_{KL}(f\,\|\,g) = \frac{1}{2}\left[\ln\sigma_2^2 + \frac{1}{\sigma_2^2} - 1\right]\log_2 e$.
Figure 16 illustrates the variations of relative entropy as a function of one variance when the other variance is unity, under the condition $\mu_1 = \mu_2$.
A sensitivity analysis of the relative entropy with respect to changes in the variances and means is carried out. The gradient of $D_{KL}(f\,\|\,g)$, given by:
$$ \nabla D_{KL}(\sigma_1,\sigma_2,\mu_1,\mu_2) = \left[\frac{\partial D_{KL}}{\partial\sigma_1}\ \ \frac{\partial D_{KL}}{\partial\sigma_2}\ \ \frac{\partial D_{KL}}{\partial\mu_1}\ \ \frac{\partial D_{KL}}{\partial\mu_2}\right]^T, $$
can be calculated from the partial derivatives, in which the chain rule is involved. Based on the relationship $\frac{d}{dx}\ln x = \frac{1}{x}$, we have:
$$ \frac{\partial}{\partial\sigma_1}\ln\frac{\sigma_2^2}{\sigma_1^2} = \frac{\sigma_1^2}{\sigma_2^2}\cdot\frac{(-2)\sigma_2^2}{\sigma_1^3} = -\frac{2}{\sigma_1}, $$
and the following expressions are obtained:
(1) $\dfrac{\partial D_{KL}}{\partial\sigma_1} = \dfrac{\partial}{\partial\sigma_1}\left\{\dfrac{1}{2}\left[\ln\dfrac{\sigma_2^2}{\sigma_1^2} + \dfrac{\sigma_1^2}{\sigma_2^2} - 1\right]\right\}\log_2 e = \left(\dfrac{\sigma_1}{\sigma_2^2} - \dfrac{1}{\sigma_1}\right)\log_2 e$,
(2) $\dfrac{\partial D_{KL}}{\partial\sigma_2} = \dfrac{\partial}{\partial\sigma_2}\left\{\dfrac{1}{2}\left[\ln\dfrac{\sigma_2^2}{\sigma_1^2} + \dfrac{\sigma_1^2}{\sigma_2^2} + \dfrac{(\mu_1-\mu_2)^2}{\sigma_2^2} - 1\right]\right\}\log_2 e = \left(\dfrac{1}{\sigma_2} - \dfrac{\sigma_1^2}{\sigma_2^3} - \dfrac{(\mu_1-\mu_2)^2}{\sigma_2^3}\right)\log_2 e$,
(3) $\dfrac{\partial D_{KL}}{\partial\mu_1} = \dfrac{\partial}{\partial\mu_1}\left\{\dfrac{1}{2}\left(\dfrac{\mu_1-\mu_2}{\sigma_2}\right)^2\right\}\log_2 e = \dfrac{\mu_1-\mu_2}{\sigma_2^2}\log_2 e$,
(4) $\dfrac{\partial D_{KL}}{\partial\mu_2} = \dfrac{\partial}{\partial\mu_2}\left\{\dfrac{1}{2}\left(\dfrac{\mu_1-\mu_2}{\sigma_2}\right)^2\right\}\log_2 e = \dfrac{\mu_2-\mu_1}{\sigma_2^2}\log_2 e$.
The stationary condition for each of the above derivatives is:
$\dfrac{\partial D_{KL}}{\partial\sigma_1} = \left(\dfrac{\sigma_1}{\sigma_2^2} - \dfrac{1}{\sigma_1}\right)\log_2 e = 0$ when $\sigma_1^2 = \sigma_2^2$,
$\dfrac{\partial D_{KL}}{\partial\sigma_2} = \left(\dfrac{1}{\sigma_2} - \dfrac{\sigma_1^2}{\sigma_2^3} - \dfrac{(\mu_1-\mu_2)^2}{\sigma_2^3}\right)\log_2 e = 0$ when $\sigma_2^2 = \sigma_1^2 + (\mu_1-\mu_2)^2$,
$\dfrac{\partial D_{KL}}{\partial\mu_1} = \dfrac{\mu_1-\mu_2}{\sigma_2^2}\log_2 e = 0$ when $\mu_1 = \mu_2$,
$\dfrac{\partial D_{KL}}{\partial\mu_2} = \dfrac{\mu_2-\mu_1}{\sigma_2^2}\log_2 e = 0$ when $\mu_1 = \mu_2$.
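These analytic partial derivatives can be sanity-checked against central finite differences of the closed-form expression (a sketch of our own; the evaluation point is arbitrary):

```matlab
% Gradient of D_KL(f||g) w.r.t. (sigma1, sigma2, mu1, mu2): analytic vs finite differences.
DKL = @(p) 0.5*(log(p(2)^2/p(1)^2) + p(1)^2/p(2)^2 + ((p(3)-p(4))/p(2))^2 - 1) * log2(exp(1));
p0  = [1.5; 2.0; 1.0; -0.5];                            % [sigma1 sigma2 mu1 mu2], illustrative
g_analytic = log2(exp(1)) * [ p0(1)/p0(2)^2 - 1/p0(1);
                              1/p0(2) - p0(1)^2/p0(2)^3 - (p0(3)-p0(4))^2/p0(2)^3;
                              (p0(3)-p0(4))/p0(2)^2;
                              (p0(4)-p0(3))/p0(2)^2 ];
h = 1e-6;  g_fd = zeros(4,1);
for k = 1:4
    e = zeros(4,1);  e(k) = h;
    g_fd(k) = (DKL(p0 + e) - DKL(p0 - e)) / (2*h);      % central difference
end
disp([g_analytic g_fd])                                  % the two columns agree closely
```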

5. Mutual Information

Mutual information is one of many quantities that measure how much one random variable tells us about another. It is a dimensionless quantity, generally expressed in bits, and can be thought of as the reduction in uncertainty about one random variable given knowledge of another. The mutual information $I(X;Y)$ between two variables with joint pdf $f_{XY}(x,y)$ is given by:
$$ I(X;Y) = E\left[\log\frac{f_{XY}(x,y)}{f_X(x)f_Y(y)}\right] = \iint f_{XY}(x,y)\log\frac{f_{XY}(x,y)}{f_X(x)f_Y(y)}\,dx\,dy. $$
The mutual information between the random variables X and Y has the following relationships:
I ( X ; Y ) = I ( Y ; X ) ,
where
$$ I(X;Y) = h(X) - h(X|Y) \ge 0, $$
and
$$ I(Y;X) = h(Y) - h(Y|X) \ge 0, $$
implying that $h(X) \ge h(X|Y)$ and $h(Y) \ge h(Y|X)$. The mutual information of a random variable with itself is its self-information, which equals its entropy. High mutual information indicates a large reduction in uncertainty; low mutual information indicates a small reduction; and zero mutual information, $I(X;Y) = 0$, means that the two random variables are independent. In such a case, $h(X) = h(X|Y)$ and $h(Y) = h(Y|X)$.
Let’s consider the mutual information between the correlated Gaussian variables X and Y given by:
$$ I(X;Y) = h(X) + h(Y) - h(X,Y) = \frac{1}{2}\log_2\left[(2\pi e)\sigma_x^2\right] + \frac{1}{2}\log_2\left[(2\pi e)\sigma_y^2\right] - \frac{1}{2}\log_2\left[(2\pi e)^2\sigma_x^2\sigma_y^2(1-\rho^2)\right] = -\frac{1}{2}\log_2(1-\rho^2). $$
Figure 17 presents the mutual information versus $\rho^2$, where it grows slowly at first and then very fast for high values of $\rho^2$. If $\rho = \pm 1$, the random variables X and Y are perfectly correlated and the mutual information is infinite. It can be seen that $I(X;Y) = 0$ for $\rho = 0$ and that $I(X;Y) \to \infty$ as $\rho \to \pm 1$.
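The closed form $-\frac{1}{2}\log_2(1-\rho^2)$ agrees with computing $I(X;Y)$ as $h(X) + h(Y) - h(X,Y)$ from the individual entropy formulas; the sketch below (our own, with arbitrary variances and $\rho$) confirms this numerically.

```matlab
% Mutual information of correlated Gaussians (bits): entropy difference vs -0.5*log2(1-rho^2).
sx = 2;  sy = 3;  rho = 0.8;                             % illustrative values
hX  = 0.5*log2(2*pi*exp(1)*sx^2);
hY  = 0.5*log2(2*pi*exp(1)*sy^2);
hXY = 0.5*log2((2*pi*exp(1))^2 * sx^2*sy^2*(1 - rho^2));
I_sum = hX + hY - hXY;
I_rho = -0.5*log2(1 - rho^2);
fprintf('%.6f  %.6f bits\n', I_sum, I_rho)               % identical values
```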
On the other hand, considering the additive white Gaussian noise (AWGN) channel shown in Figure 18, for which the output is the sum of the input and an independent Gaussian noise, the mutual information is given by:
$$ I(X;Y) = h(Y) - h(Y|X) = \frac{1}{2}\log_2\frac{2\pi e(\sigma_x^2+\sigma_n^2)}{2\pi e\,\sigma_n^2} = \frac{1}{2}\log_2\left(1 + \frac{\sigma_x^2}{\sigma_n^2}\right), $$
where $h(Y|X) = h(N) = h(X,Y) - h(X)$, and
$$ h(Y) = \frac{1}{2}\log_2\left[2\pi e(\sigma_x^2+\sigma_n^2)\right];\quad h(Y|X) = h(N) = \frac{1}{2}\log_2\left[(2\pi e)\sigma_n^2\right]. $$
Mutual information for the additive white Gaussian noise (AWGN) channel is shown in Figure 19, including the three-dimensional surface as a function of σ x 2 and σ n 2 , and also in terms of the signal-to-noise ratio SNR = σ x 2 / σ n 2 . It can be seen that the mutual information grows first very fast and then much more slowly for high values of the signal-to-noise ratio.
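A short sketch (our own, over an arbitrary SNR grid) evaluates $I(X;Y) = \frac{1}{2}\log_2(1 + \sigma_x^2/\sigma_n^2)$ and reproduces the qualitative behavior described above: rapid growth at low SNR and a slow, logarithmic increase (roughly 0.166 bit per dB) at high SNR.

```matlab
% AWGN channel mutual information I(X;Y) = 0.5*log2(1 + SNR), SNR = sigma_x^2/sigma_n^2.
snr_dB = 0:5:30;                          % illustrative SNR grid in dB
snr    = 10.^(snr_dB/10);
I      = 0.5*log2(1 + snr);
disp([snr_dB.' I.'])                      % bits per channel use at each SNR
plot(snr_dB, I, '-o'); grid on
xlabel('SNR (dB)'); ylabel('I(X;Y) (bits)')
```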

6. Conclusions

This paper is intended to serve readers as a supplementary note on the multivariate Gaussian distribution and its entropy, relative entropy, and mutual information. Illustrative examples are discussed to provide further insight into the geometric interpretation and visualization, enabling readers to correctly interpret the theory for future designs. The fundamental objective is to study the application of multivariate sets of data under a Gaussian distribution. This paper examines broad measures of structure for Gaussian distributions, showing that they can be described in information-theoretic terms relating the given covariance matrix to the correlated random variables (in terms of relative entropy). To develop the multivariate Gaussian distribution with entropy and mutual information, several significant methodologies are presented through the discussion, supported by illustrations, both technically and statistically. The content allows readers to better perceive the concepts, comprehend the techniques, and properly execute software programs for future study of the topic's science and implementations, and it helps readers grasp the fundamental concepts. By involving the relative entropy and mutual information as well as the correlated covariance analysis based on differential entropy, a wide range of information is addressed, from basic to application concerns. Moreover, the presented treatment of the multivariate Gaussian distribution and mutual information is intended to inspire new applications of information theory in a number of areas, including information coding, nonlinear signal detection, and clinical diagnostic testing, particularly when data from improved testing equipment become accessible.

Author Contributions

Conceptualization, D.-J.J.; methodology, D.-J.J.; software, D.-J.J.; validation, D.-J.J. and T.-S.C.; writing—original draft preparation, D.-J.J. and T.-S.C.; writing—review and editing, D.-J.J., T.-S.C. and A.B.; supervision, D.-J.J. All authors have read and agreed to the published version of the manuscript.

Funding

The author gratefully acknowledges the support of the National Science and Technology Council, Taiwan, under grant number NSTC 111-2221-E-019-047.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Derivation of the Differential Entropy for the Univariate Gaussian Distribution

$$
\begin{aligned}
h(X) &= -E[\log_2 f_X(x)] = -\int_{-\infty}^{\infty} f_X(x)\log_2 f_X(x)\,dx = -\int f_X(x)\log_2\!\left[\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\right]dx \\
&= -\int f_X(x)\left[\log_2(2\pi\sigma^2)^{-1/2} - \frac{(x-\mu)^2}{2\sigma^2}\log_2 e\right]dx \\
&= \frac{1}{2}\log_2(2\pi\sigma^2)\int f_X(x)\,dx + \frac{\log_2 e}{2\sigma^2}\int (x-\mu)^2 f_X(x)\,dx \\
&= \frac{1}{2}\log_2(2\pi\sigma^2) + \frac{\sigma^2}{2\sigma^2}\log_2 e = \frac{1}{2}\log_2(2\pi e\sigma^2)
\end{aligned}
$$

Appendix B. Derivation of the Differential Entropy for the Multivariate Gaussian Distribution

$$
\begin{aligned}
h(\mathbf{X}) &= -E[\log_2 f_X(\mathbf{x})] = -E\left[\log_2\!\left(\frac{1}{\sqrt{(2\pi)^n|\Sigma|}}\,e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})}\right)\right] \\
&= -E\left[-\frac{n}{2}\log_2(2\pi) - \frac{1}{2}\log_2|\Sigma| - \frac{\log_2 e}{2}(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] \\
&= \frac{n}{2}\log_2(2\pi) + \frac{1}{2}\log_2|\Sigma| + \frac{\log_2 e}{2}\,E\left[(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] \\
&= \frac{n}{2}\log_2(2\pi) + \frac{1}{2}\log_2|\Sigma| + \frac{n}{2}\log_2 e = \frac{1}{2}\log_2\left[(2\pi e)^n|\Sigma|\right]
\end{aligned}
$$
The calculation involves the evaluation of expectations of the Mahalanobis distance.

Appendix C. Evaluation of the Expectation of the Mahalanobis Distance, $E\left[(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] = n$

$$
\begin{aligned}
E\left[(\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right] &= E\left[\operatorname{tr}\!\left((\mathbf{x}-\boldsymbol{\mu})^T\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)\right] = E\left[\operatorname{tr}\!\left(\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\right)\right] \\
&= \operatorname{tr}\!\left(\Sigma^{-1}E\left[(\mathbf{x}-\boldsymbol{\mu})(\mathbf{x}-\boldsymbol{\mu})^T\right]\right) = \operatorname{tr}(\Sigma^{-1}\Sigma) = \operatorname{tr}(I_n) = n
\end{aligned}
$$
A special case for n = 1:
$$ E\left[(x-\mu)^T\Sigma^{-1}(x-\mu)\right] = E\left[\frac{(x-\mu)^2}{\sigma^2}\right] = \int f_X(x)\frac{(x-\mu)^2}{\sigma^2}\,dx = \frac{1}{\sigma^2}\int (x-\mu)^2 f_X(x)\,dx = 1 $$

Appendix D. Derivation of the Differential Entropy in the Transformed Frame

$$
\begin{aligned}
h(\mathbf{Y}) &= -E\left[\log_2\prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\lambda_i}}\,e^{-\frac{1}{2}\frac{y_i^2}{\lambda_i}}\right] = -\sum_{i=1}^{n}E\left[\log_2\!\left(\frac{1}{\sqrt{2\pi\lambda_i}}\,e^{-\frac{1}{2}\frac{y_i^2}{\lambda_i}}\right)\right] \\
&= -\sum_{i=1}^{n}\int f_Y(y_i)\left[-\frac{1}{2}\log_2(2\pi\lambda_i) - \frac{1}{2}\frac{y_i^2}{\lambda_i}\log_2 e\right]dy_i \\
&= \sum_{i=1}^{n}\left[\frac{1}{2}\log_2(2\pi\lambda_i) + \frac{1}{2}\log_2 e\right] = \sum_{i=1}^{n}\left[\frac{1}{2}\log_2(2\pi) + \frac{1}{2}\log_2\lambda_i + \frac{1}{2}\log_2 e\right] \\
&= \frac{n}{2}\log_2(2\pi e) + \frac{1}{2}\sum_{i=1}^{n}\log_2\lambda_i
\end{aligned}
$$
The eigenvalues $\lambda_i$ are the diagonal elements of the covariance matrix, namely the variances, in the transformed frame. When $\rho = 0$, the eigenvalues are equal to $\lambda_i = \sigma_i^2$.

Appendix E. Derivation of the Kullback–Leibler Divergence between Two Normal Distributions

$$
\begin{aligned}
D_{KL}(f\,\|\,g) &= \int f_X(x)\log_2\frac{f_X(x)}{g_X(x)}\,dx = \int f_X(x)\log_2\frac{\frac{1}{\sqrt{2\pi}\,\sigma_1}e^{-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2}}{\frac{1}{\sqrt{2\pi}\,\sigma_2}e^{-\frac{1}{2}\left(\frac{x-\mu_2}{\sigma_2}\right)^2}}\,dx \\
&= \int f_X(x)\log_2\frac{\sigma_2}{\sigma_1}\,dx + \int f_X(x)\log_2\exp\!\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2 + \frac{1}{2}\left(\frac{x-\mu_2}{\sigma_2}\right)^2\right]dx \\
&= \log_2\frac{\sigma_2}{\sigma_1} - \frac{\log_2 e}{2\sigma_1^2}\int f_X(x)(x-\mu_1)^2\,dx + \frac{\log_2 e}{2\sigma_2^2}\int f_X(x)(x-\mu_2)^2\,dx \\
&= \log_2\frac{\sigma_2}{\sigma_1} - \frac{\log_2 e}{2} + \frac{\log_2 e}{2\sigma_2^2}\int f_X(x)\left[(x-\mu_1)+(\mu_1-\mu_2)\right]^2 dx \\
&= \log_2\frac{\sigma_2}{\sigma_1} - \frac{\log_2 e}{2} + \frac{\log_2 e}{2\sigma_2^2}\int f_X(x)\left[(x-\mu_1)^2+(\mu_1-\mu_2)^2+2(x-\mu_1)(\mu_1-\mu_2)\right]dx \\
&= \frac{1}{2}\log_2\frac{\sigma_2^2}{\sigma_1^2} - \frac{\log_2 e}{2} + \frac{\log_2 e}{2\sigma_2^2}\left[\sigma_1^2 + (\mu_1-\mu_2)^2\right] \\
&= \frac{1}{2}\left[\ln\frac{\sigma_2^2}{\sigma_1^2} + \frac{\sigma_1^2}{\sigma_2^2} + \left(\frac{\mu_1-\mu_2}{\sigma_2}\right)^2 - 1\right]\log_2 e
\end{aligned}
$$
where the identity $\log_2(\cdot) = \log_2 e\,\ln(\cdot)$ was used.

References

  1. Verdú, S. On channel capacity per unit cost. IEEE Trans. Inf. Theory 1990, 36, 1019–1030.
  2. Lapidoth, A.; Shamai, S. Fading channels: How perfect need perfect side information be? IEEE Trans. Inf. Theory 2002, 48, 1118–1134.
  3. Verdú, S. Spectral efficiency in the wideband regime. IEEE Trans. Inf. Theory 2002, 48, 1319–1343.
  4. Prelov, V.; Verdú, S. Second-order asymptotics of mutual information. IEEE Trans. Inf. Theory 2004, 50, 1567–1580.
  5. Kailath, T. A general likelihood-ratio formula for random signals in Gaussian noise. IEEE Trans. Inf. Theory 1969, IT-15, 350–361.
  6. Kailath, T. A note on least squares estimates from likelihood ratios. Inf. Control 1968, 13, 534–540.
  7. Kailath, T. A further note on a general likelihood formula for random signals in Gaussian noise. IEEE Trans. Inf. Theory 1970, IT-16, 393–396.
  8. Jaffer, A.G.; Gupta, S.C. On relations between detection and estimation of discrete time processes. Inf. Control 1972, 20, 46–54.
  9. Duncan, T.E. On the calculation of mutual information. SIAM J. Appl. Math. 1970, 19, 215–220.
  10. Kadota, T.T.; Zakai, M.; Ziv, J. Mutual information of the white Gaussian channel with and without feedback. IEEE Trans. Inf. Theory 1971, 17, 368–371.
  11. Amari, S.I. Information Geometry and Its Applications; Springer: Berlin/Heidelberg, Germany, 2016; Volume 194.
  12. Schneidman, E.; Still, S.; Berry, M.J.; Bialek, W. Network information and connected correlations. Phys. Rev. Lett. 2003, 91, 238701.
  13. Timme, N.; Alford, W.; Flecker, B.; Beggs, J.M. Synergy, redundancy, and multivariate information measures: An experimentalist's perspective. J. Comput. Neurosci. 2014, 36, 119–140.
  14. Ahmed, N.A.; Gokhale, D.V. Entropy expressions and their estimators for multivariate distributions. IEEE Trans. Inf. Theory 1989, 35, 688–692.
  15. Misra, N.; Singh, H.; Demchuk, E. Estimation of the entropy of a multivariate normal distribution. J. Multivar. Anal. 2005, 92, 324–342.
  16. Arellano-Valle, R.B.; Contreras-Reyes, J.E.; Genton, M.G. Shannon entropy and mutual information for multivariate skew-elliptical distributions. Scand. J. Stat. 2013, 40, 42–62.
  17. Liang, K.C.; Wang, X. Gene regulatory network reconstruction using conditional mutual information. EURASIP J. Bioinform. Syst. Biol. 2008, 2008, 253894.
  18. Novais, R.G.; Wanke, P.; Antunes, J.; Tan, Y. Portfolio optimization with a mean-entropy-mutual information model. Entropy 2022, 24, 369.
  19. Verdú, S. Error exponents and α-mutual information. Entropy 2021, 23, 199.
  20. Panzeri, S.; Magri, C.; Logothetis, N.K. On the use of information theory for the analysis of the relationship between neural and imaging signals. Magn. Reson. Imaging 2008, 26, 1015–1025.
  21. Katz, Y.; Tunstrøm, K.; Ioannou, C.C.; Huepe, C.; Couzin, I.D. Inferring the structure and dynamics of interactions in schooling fish. Proc. Natl. Acad. Sci. USA 2011, 108, 18720–18725.
  22. Cutsuridis, V.; Hussain, A.; Taylor, J.G. (Eds.) Perception-Action Cycle: Models, Architectures, and Hardware; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011.
  23. Ay, N.; Bernigau, H.; Der, R.; Prokopenko, M. Information-driven self-organization: The dynamical system approach to autonomous robot behavior. Theory Biosci. 2012, 131, 161–179.
  24. Rosas, F.; Ntranos, V.; Ellison, C.J.; Pollin, S.; Verhelst, M. Understanding interdependency through complex information sharing. Entropy 2016, 18, 38.
  25. Ince, R.A. The Partial Entropy Decomposition: Decomposing multivariate entropy and mutual information via pointwise common surprisal. arXiv 2017, arXiv:1702.01591.
  26. Harder, M.; Salge, C.; Polani, D. Bivariate measure of redundant information. Phys. Rev. E 2013, 87, 012130.
  27. Rauh, J.; Banerjee, P.K.; Olbrich, E.; Jost, J.; Bertschinger, N. On extractable shared information. Entropy 2017, 19, 328.
  28. Ince, R.A. Measuring multivariate redundant information with pointwise common change in surprisal. Entropy 2017, 19, 318.
  29. Perrone, P.; Ay, N. Hierarchical quantification of synergy in channels. Front. Robot. AI 2016, 2, 35.
  30. Bertschinger, N.; Rauh, J.; Olbrich, E.; Jost, J.; Ay, N. Quantifying unique information. Entropy 2014, 16, 2161–2183.
  31. Chicharro, D.; Panzeri, S. Synergy and redundancy in dual decompositions of mutual information gain and information loss. Entropy 2017, 19, 71.
  32. Michalowicz, J.V.; Nichols, J.M.; Bucholtz, F. Calculation of differential entropy for a mixed Gaussian distribution. Entropy 2008, 10, 200.
  33. Benish, W.A. A review of the application of information theory to clinical diagnostic testing. Entropy 2020, 22, 97.
  34. Cadirci, M.S.; Evans, D.; Leonenko, N.; Makogin, V. Entropy-based test for generalised Gaussian distributions. Comput. Stat. Data Anal. 2022, 173, 107502.
  35. Goethe, M.; Fita, I.; Rubi, J.M. Testing the mutual information expansion of entropy with multivariate Gaussian distributions. J. Chem. Phys. 2017, 147, 224102.
Figure 1. Relationship between the confidence interval and the scale factor s.
Figure 2. The position of the ellipse for various correlation coefficients, where the angle of inclination θ is specified to obtain ρ = σ12/(σ1σ2): (a) θ = 30°, ρ ≈ 0.55; (b) θ = 0°, ρ = 0; and (c) θ = 150°, ρ ≈ −0.55, respectively.
Figure 3. The position of the ellipse for various values of the correlation coefficient, where ρ is specified to obtain the angle of inclination θ: (a) ρ = 0.95, θ = 45°; (b) ρ = 0, θ = 0°; and (c) ρ = −0.95, θ = 135°, respectively.
Figure 4. The contours and the scatter plots of ellipses for equal variances σ1 = σ2 = σ with a fixed ρ = 0.5: (a) σ = 2; (b) σ = 3; (c) σ = 4; (d) σ = 5.
Figure 5. Ellipses for (a) ρ = 0.5 with varying variances σ1 = σ2 = σ = 2 to 5; and (b) equal variances σ1 = σ2 = 2 with varying ρ = 0, 0.5, 0.9, 0.99.
Figure 6. The variations of the three-dimensional surface plots with increasing σ1/σ2 for a fixed ρ = 0.5, where σ2 = 2: (a) σ1 = 2; (b) σ1 = 3; (c) σ1 = 4; and (d) σ1 = 5.
Figure 7. Ellipses for a fixed correlation coefficient ρ = 0.5 when σ1 ≠ σ2: (a) σ1 > σ2, with σ1/σ2 increasing, where σ1 = 2 to 5 and σ2 = 2; and (b) σ2 > σ1, with σ2/σ1 increasing, where σ2 = 2 to 5 and σ1 = 2.
Figure 8. The variation of the inclination angle as a function of σ1 and σ2, for (a) ρ = 0.5; and (b) ρ = 0.
Figure 9. Ellipses for σ1 > σ2 (σ1 = 4, σ2 = 2) with (a) ρ = 0, 0.5, 0.9, 0.99; (b) ρ = 0, −0.5, −0.9, −0.99, as compared to σ2 > σ1 (σ1 = 2, σ2 = 4) with (c) ρ = 0, 0.5, 0.9, 0.99; and (d) ρ = 0, −0.5, −0.9, −0.99.
Figure 10. Comparison of the ellipses for (i) σ1 = 2, σ2 = 4; (ii) σ1 = 4, σ2 = 2; (iii) σ1 = σ2 = 2; and (iv) σ1 = σ2 = 4, while ρ = 0.5.
Figure 11. The differential entropy as a function of σ² for a univariate Gaussian variable.
Figure 12. The variation in differential entropy for the bivariate Gaussian distribution (a) as a function of ρ² and σ², and (b) as a function of ρ² when σ1 = σ2 = 1.
Figure 13. The variation in relative entropy as a function of σ and μ1 − μ2 when σ1 = σ2 = σ: (a) a three-dimensional surface; and (b) a contour with an entropy gradient.
Figure 14. The variations in relative entropy with σ1 = σ2 = 1: (a) a three-dimensional surface as a function of μ1 and μ2; and (b) as a function of μ1 − μ2.
Figure 15. The variation in relative entropy as a function of σ1 and σ2 with μ1 = μ2: (a) a three-dimensional surface; and (b) a contour with an entropy gradient.
Figure 16. Variations of relative entropy as a function of (a) σ1 with σ2 = 1 fixed, and (b) σ2 with σ1 = 1 fixed, respectively (μ1 = μ2).
Figure 17. Mutual information versus ρ² between the correlated Gaussian variables.
Figure 18. Schematic illustration of the additive white Gaussian noise (AWGN) channel.
Figure 19. The mutual information for the additive white Gaussian noise (AWGN) channel: (a) a three-dimensional surface as a function of σx² and σn²; and (b) in terms of the signal-to-noise ratio.

