Next Article in Journal
A Novel Technique for Achieving the Approximated ISI at the Receiver for a 16QAM Signal Sent via a FIR Channel Based Only on the Received Information and Statistical Techniques
Next Article in Special Issue
Achievable Information Rates for Probabilistic Amplitude Shaping: An Alternative Approach via Random Sign-Coding Arguments
Previous Article in Journal
Optimization of the Casualties’ Treatment Process: Blended Military Experiment
Previous Article in Special Issue
Probabilistic Shaping for Finite Blocklengths: Distribution Matching and Sphere Shaping
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Some Useful Integral Representations for Information-Theoretic Analyses

The Andrew and Erna Viterbi Faculty of Electrical Engineering, Israel Institute of Technology Technion City, Haifa 3200003, Israel
*
Author to whom correspondence should be addressed.
Submission received: 13 May 2020 / Revised: 9 June 2020 / Accepted: 24 June 2020 / Published: 26 June 2020
(This article belongs to the Special Issue Information Theory for Communication Systems)

Abstract

:
This work is an extension of our earlier article, where a well-known integral representation of the logarithmic function was explored and was accompanied with demonstrations of its usefulness in obtaining compact, easily-calculable, exact formulas for quantities that involve expectations of the logarithm of a positive random variable. Here, in the same spirit, we derive an exact integral representation (in one or two dimensions) of the moment of a nonnegative random variable, or the sum of such independent random variables, where the moment order is a general positive non-integer real (also known as fractional moments). The proposed formula is applied to a variety of examples with an information-theoretic motivation, and it is shown how it facilitates their numerical evaluations. In particular, when applied to the calculation of a moment of the sum of a large number, n, of nonnegative random variables, it is clear that integration over one or two dimensions, as suggested by our proposed integral representation, is significantly easier than the alternative of integrating over n dimensions, as needed in the direct calculation of the desired moment.

1. Introduction

In mathematical analyses associated with many problems in information theory and related fields, one is often faced with the need to compute expectations of logarithmic functions of composite random variables (see, e.g., [1,2,3,4,5,6,7,8]), or moments of such random variables, whose order may be a general positive real, not even necessarily an integer (see, e.g., [9,10,11,12,13,14,15,16,17,18,19,20,21,22]).
In the case of the logarithmic function, the common practice is either to resort to approximate evaluations, provided by upper and lower bounds on the desired expression (for example, by using Jensen’s inequality) or to approximate the calculations by using the Taylor series expansion of the function ln x . More recently, it has become popular to use the replica trick (see, e.g., Chapter 8 in [23]), which is a non-rigorous, but useful technique, borrowed from statistical physics.
In our earlier work [6], we demonstrated how the following well-known integral representation of the logarithmic function,
ln x = 0 e u e u x d u u , x > 0 ,
can be useful in a variety of application areas in the field of information theory, including both source and channel coding, as well as other aspects of this field. To calculate the expectation, E { ln X } , where X is a positive random variable, the idea is simply to invoke the integral representation (1) and to commute the expectation and integration operators, i.e.,
E { ln X } = 0 ( e u E e u X ) d u u ,
thereby replacing the calculation of E { ln X } by the calculation of the moment-generating function (MGF), M X ( u ) : = E { e u X } for all u 0 , which is often much easier to express in closed form. Moreover, in frequently encountered situations where X is given by the sum of n independently identically distributed (i.i.d.) random variables, the MGF of X is given by the n th power of the MGF of a single random variable in the sum that forms X. This reduces the dimension of the integration from n (in the original expression) to a single dimension of the integration over u. Interestingly, this integral representation has also been used in the statistical physics literature (see, e.g., [23] (p. 140) [24,25]), but not as much as the replica trick.
In this paper, we proceed in the same spirit as in [6], and we extend the scope to propose an integral representation of a general moment of a nonnegative random variable, X, namely the expectation, E { X ρ } for a given real ρ > 0 . Obviously, when ρ is an integer, this moment is simply given by the ρ th order derivative of the MGF of X, calculated at the origin, as is very well known. However, the integral representation we propose, in this work, applies to any non-integer, positive ρ , and here too, it replaces the direct calculation of E { X ρ } by integration of an expression that involves the MGF of X. We refer to this representation as an extension of (2), as the latter can be obtained as a special case of the formula for E { X ρ } , by invoking one of the equivalent identities
E { ln X } = lim ρ 0 E { X ρ } 1 ρ , E { ln X } = lim ρ 0 ln [ E { X ρ } ] ρ .
While the proposed integral representation of E { X ρ } can be readily obtained from [26] (p. 363, Identity (3.434.1)) in the range ρ ( 0 , 1 ) , the nontrivial extension we propose for a non-integer and real ρ > 1 is new to the best of our knowledge.
Fractional moments have been considered in the mathematical literature (see, e.g., [27,28,29,30]). A relationship between fractional and integer-order moments was considered in [27] by expressing a fractional moment as an infinite series, which depends on all the positive integer-order moments, followed by an algorithm for numerical calculations of fractional moments.
As in [6], the proposed integral representation is applied to a variety of examples with an information-theoretic motivation, and it is shown how it facilitates the numerical evaluations. In particular, similar to the case of the logarithmic function, when applied to the calculation of a moment of the sum of a large number, n, of nonnegative random variables, it is clear that integration over one or two dimensions, as suggested by our proposed integral representation, is significantly easier than the alternative of integrating over n dimensions, as needed in the direct calculation of the desired moment. Furthermore, single- or double-dimensional integrals can be instantly and accurately calculated using built-in numerical integration procedures.
Fractional moments have been considered in the mathematical literature (see, e.g., [27,28,29,30]). A relationship between fractional and integer-order moments was considered in [27] by expressing a fractional moment as an infinite series that depends on all the positive integer-order moments, which was followed by an algorithm for numerical calculations of fractional moments.
The outline of the remainder part of this paper is as follows. In Section 2, we provide the mathematical background associated with the integral representation in general. In Section 3, we demonstrate this integral representation in applications, including: moments of guesswork, moments of estimation errors, differential Rényi entropies of generalized multivariate Cauchy distributions, and mutual information calculations of a certain model of a jammed channel. Each one of these examples occupies one subsection of Section 3. The integral representations in this paper are not limited to the examples in Section 3, and such representations can be proven useful in other information-theoretic problems (see, e.g., [6] and the references therein for the integral representation of the logarithmic expectation and some of its information-theoretic applications).

2. Statistical Moments of Arbitrary Positive Orders

It is well known that integer-order moments of a random variable X are calculable from its MGF
M X ( u ) : = E e u X , u R ,
by using its derivatives, calculated at u = 0 , i.e.,
E { X ρ } = M X ( ρ ) ( 0 ) , ρ N .
Quite often, however, there is a theoretical and practical interest to calculate fractional moments of nonnegative random variables. We next obtain a closed-form integral expression of the ρ th moment of a nonnegative random variable X, as a functional of its MGF, for any positive real ρ . Before we proceed, it should be noted that for ρ ( 0 , 1 ) , such an expression is available in handbooks of standard tables of integrals, for example in [26] (p. 363, Identity (3.434.1)). The first innovation here, however, is in a nontrivial extension of this formula for all ρ > 0 as an expression that involves a one-dimensional integral. It should be noted that although the definition of a fractional moment of a random variable (RV) is also given by a one-dimensional integral (or a sum, depending on whether the RV is discrete or continuous), the utility of our formula is, e.g., in expressing the ρ th moment of a sum of nonnegative and independent random variables as a one-dimensional integral, instead of an n-dimensional integral, which is obtained by the direct definition. This new formula serves as the basic building block in all of our information-theoretic applications throughout this paper.
We first define the Beta and Gamma functions (see, e.g., Section 8.3 in [26] and Chapter 5 in [31]):
B ( u , v ) : = 0 1 t u 1 ( 1 t ) v 1 d t , u , v > 0 ,
Γ ( u ) : = 0 t u 1 e t d t , u > 0 ,
where these functions are related by the equality
B ( u , v ) = Γ ( u ) Γ ( v ) Γ ( u + v ) , u , v > 0 .
Theorem 1.
Let X be a nonnegative random variable, and let ρ > 0 be a non-integer real. Then,
E { X ρ } = 1 1 + ρ = 0 ρ α B ( + 1 , ρ + 1 ) + ρ sin ( π ρ ) Γ ( ρ ) π 0 1 u ρ + 1 j = 0 ρ ( 1 ) j α j j ! u j e u M X ( u ) d u ,
where, for all j { 0 , 1 , , } ,
α j : = E { ( X 1 ) j }
= 1 j + 1 = 0 j ( 1 ) j M X ( ) ( 0 ) B ( + 1 , j + 1 ) .
Proof. 
See Appendix A. □
Remark 1.
The proof of (9) in Appendix A does not apply to ρ N (see (A7) and (A8) etc., where the denominators vanish for ρ N ). In the latter case, by referring to the second term on the right-hand side of (9), we get sin ( π ρ ) = 0 , and also, the integral diverges (specifically, for ρ N , the integrand scales like 1 u for u that is sufficiently close to zero), yielding an expression of the type 0 · . However, taking a limit in (9) where we let ρ tend to an integer and applying L’Hôpital’s rule can reproduce the well-known result in (5).
Corollary 1.
For any ρ ( 0 , 1 ) ,
E { X ρ } = 1 + ρ Γ ( 1 ρ ) 0 e u M X ( u ) u 1 + ρ d u .
Proof. 
Equation (12) is due to Theorem 1 and by using (A20) and (A22) (see Appendix A), and α 0 : = 1 , which give
Γ ( ρ ) Γ ( 1 ρ ) = π sin ( π ρ ) ,
1 1 + ρ α 0 B ( 1 , ρ + 1 ) = 1 1 + ρ Γ ( ρ + 2 ) Γ ( ρ + 1 ) = 1 .
 □
Remark 2.
Corollary 1 also follows from [26] (p. 363, Identity (3.434.1)) (see Section 4 in [6]).
Corollary 2.
[6] Let X be a positive random variable. Then,
E { ln X } = 0 e u M X ( u ) u d u .
A proof of (15) was presented in Section 2 in [6], based on the integral representation of the logarithmic function in (1), and by interchanging the integration and the expectation (due to Fubini’s theorem). It can be alternatively proven by using Corollary 1, the identity
ln x = lim ρ 0 x ρ 1 ρ , x > 0 ,
and swapping the order of the expectation and limit by the dominated convergence theorem.
Identity (15) has many useful information-theoretic applications in its own right, as demonstrated in [6], and here, we add even some more. The current work is an extension and further development of [6], whose main theme is exploiting Theorem 1 and studying its information-theoretic applications, as well as some more applications of the logarithmic expectation.

3. Applications

In this section, we exemplify the usefulness of the integral representation of the ρ th moment in Theorem 1 and the logarithmic expectation in several problem areas in information theory and statistics. These include analyses of randomized guessing, estimation errors, Rényi entropy of n-dimensional generalized Cauchy distributions, and finally, calculations of the mutual information for channels with a certain jammer model. To demonstrate the direct computability of the relevant quantities, we also present graphs of their numerical calculations.

3.1. Moments of Guesswork

Consider the problem of guessing the realization of a random variable, which takes on values in a finite alphabet, using a sequence of yes/no questions of the form “Is X = x 1 ?”, “Is X = x 2 ?”, etc., until a positive response is provided by a party that observes the actual realization of X. Given a distribution of X, a commonly used performance metric for this problem is the expected number of guesses or, more generally, the ρ th moment of the number of guesses until X is guessed successfully. When it comes to guessing random vectors, say, of length n, minimizing the moments of the number of guesses by different (deterministic or randomized) guessing strategies has several applications and motivations in information theory, such as sequential decoding, guessing passwords, etc., and it is also strongly related to lossless source coding (see, e.g., [9,10,11,12,13,19,20,21,22,32,33,34]). In this vector case, the moments of the number of guesses behave as exponential functions of the vector dimension, n, at least asymptotically, as n grows without bound. For random vectors with i.i.d. components, the best achievable asymptotic exponent of the ρ th guessing moment is expressed in [9] by using the Rényi entropy of X of order ρ ˜ : = 1 1 + ρ . Arikan assumed in [9] that the distribution of X is known and analyzed the optimal deterministic guessing strategy, which orders the guesses according to nonincreasing probabilities. Refinements of the exponential bounds in [9] with tight upper and lower bounds on the guessing moments for optimal deterministic guessing were recently derived in [19]. In the sequel, we refer to randomized guessing strategies, rather than deterministic strategies, and we aim to derive exact, calculable expressions for their associated guessing moments (as is later explained in this subsection).
Let the random variable X take on values in a finite alphabet X . Consider a random guessing strategy where the guesser sequentially submits a sequence of independently drawn random guesses according to a certain probability distribution, P ˜ ( · ) , defined on X . Randomized guessing strategies have the advantage that they can be used by multiple asynchronous agents, which submit their guesses concurrently (see [33,34]).
In this subsection, we consider the setting of randomized guessing and obtain an exact representation of the guessing moment in the form of a one-dimensional integral. Let x X be any realization of X, and let the guessing distribution, P ˜ , be given. The random number, G, of independent guesses until success has the geometric distribution
Pr { G = k | x } = [ 1 P ˜ ( x ) ] k 1 P ˜ ( x ) , k N ,
and so, the corresponding MGF is equal to
M G ( u | x ) = k = 1 e k u Pr { G = k | x } = P ˜ ( x ) e u ( 1 P ˜ ( x ) ) , u < ln 1 1 P ˜ ( x ) .
In view of (9)–(11) and (18), for x X and non-integer ρ > 0 ,
E { G ρ | x } = 1 1 + ρ = 0 ρ α B ( + 1 , ρ + 1 ) + ρ sin ( π ρ ) Γ ( ρ ) π 0 1 u ρ + 1 j = 0 ρ ( 1 ) j α j j ! u j e u P ˜ ( x ) e u ( 1 P ˜ ( x ) ) d u ,
with α 0 : = 1 , and for all j N
α j : = E G 1 j | X = x = k = 1 ( k 1 ) j 1 P ˜ ( x ) k 1 P ˜ ( x ) = P ˜ ( x ) Li j 1 P ˜ ( x ) .
In (20), Li j ( · ) is a polylogarithm (see, e.g., Section 25.12 in [31]), which is given by
Li j ( x ) = x d d x j x 1 x , j N { 0 } ,
with x d d x j denoting differentiation with respect to x and multiplication of the derivative by x, repeatedly j times. In particular, we have
Li 0 ( x ) = x 1 x , Li 1 ( x ) = x ( 1 x ) 2 , Li 2 ( x ) = x ( 1 + x ) ( 1 x ) 3 ,
and so on. The function Li j ( x ) is a built-in function in the MATLAB and Mathematica software, which is expressed as polylog ( j , x ) . By Corollary 1, if ρ ( 0 , 1 ) , then (19) is simplified to
E { G ρ | x } = 1 + ρ Γ ( 1 ρ ) 0 e u e 2 u u ρ + 1 [ ( 1 P ˜ ( x ) ) 1 e u ] d u .
Let P denote the distribution of X. Averaging over X to get the unconditional ρ th moment using (23), one obtains for all ρ ( 0 , 1 ) ,
E { G ρ } = 1 + ρ Γ ( 1 ρ ) 0 1 1 z ( ln z ) ρ + 1 x X P ( x ) ( 1 P ˜ ( x ) ) 1 z ( 1 P ˜ ( x ) ) d z ,
where (24) is obtained by using the substitution z : = e u . A suitable expression of such an integral is similarly obtained, for all ρ > 0 , by averaging (19) over X. In comparison, a direct calculation of the ρ th moment gives
E { G ρ } = x X P ( x ) E { G ρ | x } = k = 1 x X k ρ ( 1 P ˜ ( x ) ) k 1 P ˜ ( x ) P ( x ) .
The double sum in (25) involves a numerical computation of an infinite series, where the number of terms required to obtain a good approximation increases with ρ and needs to be determined. The right-hand side of (24), on the other hand, involves integration over [ 0 , 1 ] . For every practical purpose, however, definite integrals in one or two dimensions can be calculated instantly using built-in numerical integration procedures in MATLAB, Maple, Mathematica, or any other mathematical software tools, and the computational complexity of the integral in (24) is not affected by ρ .
As a complement to (19) (which applies to a non-integral and positive ρ ), we obtain that the ρ th moment of the number of randomized guesses, with ρ N , is equal to
E { G ρ | x } = j = 0 ρ ρ j E { G 1 j | x } = j = 0 ρ ρ j α j = 1 + P ˜ ( x ) j = 1 ρ ρ j Li j ( 1 P ˜ ( x ) ) ,
where (26) follows from (20) and since α 0 = 1 . By averaging over X,
E { G ρ } = 1 + x c X P ( x ) P ˜ ( x ) j = 1 ρ ρ j Li j ( 1 P ˜ ( x ) ) .
To conclude, (19) and its simplification in (23) for ρ ( 0 , 1 ) give calculable one-dimensional integral expressions for the ρ th guessing moment with any ρ > 0 . This refers to a randomized guessing strategy whose practical advantages were further explained in [33,34]. This avoids the need for numerical calculations of infinite sums. A further simplification for ρ N is provided in (26) and (27), expressed in closed form as a function of polylogarithms.

3.2. Moments of Estimation Errors

Let X 1 , , X n be i.i.d. random variables with an unknown expectation θ to be estimated, and consider the simple estimator,
θ ^ n = 1 n i = 1 n X i .
For given ρ > 0 , we next derive an easily-calculable expression of the ρ th moment of the estimation error.
Let D n : = ( θ ^ n θ ) 2 and ρ : = ρ 2 . By Theorem 1, if ρ > 0 is a non-integral multiple of two, then
E θ ^ n θ ρ = E D n ρ = 2 2 + ρ = 0 ρ / 2 α B + 1 , ρ / 2 + 1
+ ρ 2 π sin π ρ 2 Γ ρ 2 0 1 u ρ / 2 + 1 j = 0 ρ / 2 ( 1 ) j α j j ! u j e u M D n ( u ) d u ,
where
M D n ( u ) = E { exp ( u ( θ ^ n θ ) 2 ) } , u 0 ,
α 0 : = 1 , and for all j N (see (11))
α j = 1 j + 1 = 0 j ( 1 ) j M D n ( ) ( 0 ) B ( + 1 , j + 1 ) .
By Corollary 1 and (29), if in particular ρ ( 0 , 2 ) , then the right-hand side of (30) is simplified to
E { | θ ^ n θ | ρ } = 1 + ρ 2 Γ ( 1 1 2 ρ ) 0 u ( 1 + 1 2 ρ ) e u M D n ( u ) d u ,
and, for all k N ,
E { | θ ^ n θ | 2 k } = M D n ( k ) ( 0 ) .
In view of (29)–(34), obtaining a closed-form expression for the ρ th moment of the estimation error, for an arbitrary ρ > 0 , hinges on the calculation of the right side of (31) for all u 0 . To this end, we invoke the identity
e u z 2 = 1 2 π u e j ω z ω 2 / ( 4 u ) d ω , u > 0 , z R ,
which is the MGF of a zero-mean Gaussian random variable with variance 1 2 u . Together with (31), it gives (see Appendix B.1)
M D n ( u ) = 1 2 π u e j ω θ ϕ X n ω n e ω 2 / ( 4 u ) d ω , u > 0 ,
where X is a generic random variable with the same distribution as X i for all i.
The combination of (30)–(34) enables calculating exactly the ρ th moment E { | θ ^ n θ | ρ } , for any given ρ > 0 , in terms of a two-dimensional integral. Combining (33) and (36) yields, for all ρ ( 0 , 2 ) ,
E { | θ ^ n θ | ρ } = 1 + ρ 2 Γ ( 1 1 2 ρ ) 0 u ( ρ / 2 + 1 ) 1 2 e u | ω | 1 2 π u ϕ X n ω n e j ω θ ω 2 / ( 4 u ) d ω d u ,
where we have used the identity 1 2 e | ω | d ω = 1 in the derivation of the first term of the integral on the right-hand side of (37).
As an example, consider the case where { X i } i = 1 n are i.i.d. Bernoulli random variables with
P { X 1 = 1 } = θ , P { X 1 = 0 } = 1 θ
where the characteristic function is given by
ϕ X ( u ) : = E e j u X = 1 + θ e j u 1 , u R .
Thanks to the availability of the exact expression, we can next compare the exact ρ th moment of the estimation error | θ ^ n θ | , with the following closed-form upper bound (see Appendix B.2) and thereby assess its tightness:
E { | θ ^ n θ | ρ } K ( ρ , θ ) · n ρ / 2 ,
which holds for all n N , ρ > 0 , and θ [ 0 , 1 ] , with
K ( ρ , θ ) : = ρ Γ ρ 2 2 θ ( 1 θ ) ρ / 2 .
Figure 1 and Figure 2 display plots of E | θ ^ n θ | as a function of θ and n, in comparison to the upper bound (40). The difference in the plot of Figure 1 is significant except for the boundaries of the interval [ 0 , 1 ] , where both the exact value and the bound vanish. Figure 2 indicates that the exact value of E | θ ^ n θ | , for large n, scales like n ; this is reflected by the apparent parallelism of the curves in both graphs and by the upper bound (40).
To conclude, this subsection provides an exact, double-integral expression for the ρ th moment of the estimation error of the expectation of n i.i.d. random variables. In other words, the dimension of the integral does not increase with n, and it is a calculable expression. We further compare our expression with an upper bound that stems from concentration inequalities. Although the scaling of the bound as a polynomial of n is correct, the difference between the exact expression and the bound is significant (see Figure 1 and Figure 2).

3.3. Rényi Entropy of Extended Multivariate Cauchy Distributions

Generalized Cauchy distributions, their mathematical properties, and applications are of interest (see, e.g., [6,35,36,37]). The Shannon differential entropy of a family of generalized Cauchy distributions was derived in Proposition 1 in [36], and also, a lower bound on the differential entropy of a family of extended multivariate Cauchy distributions (cf. Equation (42) in [37]) was derived in Theorem 6 in [37]. Furthermore, an exact single-letter expression for the differential entropy of the different family of extended multivariate Cauchy distributions was recently derived in Section 3.1 in [6]. Motivated by these studies, as well as the various information-theoretic applications of Rényi information measures, we apply Theorem 1 to obtain the Rényi (differential) entropy of an arbitrary positive order α for the extended multivariate Cauchy distributions in Section 3.1 in [6]. As we shall see in this subsection, the integral representation for the Rényi entropy of the latter family of extended multivariate Cauchy distributions is two-dimensional, irrespective of the dimension n of the random vector.
Let X n = ( X 1 , , X n ) be a random vector whose probability density function is of the form
f ( x n ) = C n 1 + i = 1 n g ( x i ) q , x n = ( x 1 , , x n ) R n ,
for a certain function g : R [ 0 , ) and a positive constant q such that
R n 1 1 + i = 1 n g ( x i ) q d x n < .
We refer to this kind of density (see also Section 3.1 in [6]) as a generalized multivariate Cauchy density because the multivariate Cauchy density function is the special case pertaining to the choices g ( x ) = x 2 and q = 1 2 ( n + 1 ) . The differential Shannon entropy of the generalized multivariate Cauchy density was derived in Section 3.1 in [6] using the integral representation of the logarithm (1), where it was presented as a two-dimensional integral.
We next extend the analysis of [6] to differential Rényi entropies of an arbitrary positive order α (recall that the differential Rényi entropy is specialized to the differential Shannon entropy at α = 1 [38]). We show that, for the generalized multivariate Cauchy density, the differential Rényi entropy can be presented as a two-dimensional integral, rather than an n-dimensional integral. Defining
Z ( t ) : = e t g ( x ) d x , t > 0 ,
we get from (42) (see Section 3.1 in [6]) that
C n = Γ ( q ) 0 t q 1 e t Z n ( t ) d t .
For g ( x ) = | x | θ , with a fixed θ > 0 , (44) implies that
Z ( t ) = 2 Γ ( 1 / θ ) θ t 1 / θ .
In particular, for θ = 2 and q = 1 2 ( n + 1 ) , we get the multivariate Cauchy density from (42). In this case, it follows from (46) that Z ( t ) = π t for t > 0 , and from (45)
C n = Γ n + 1 2 π ( n + 1 ) / 2 .
For α ( 0 , 1 ) ( 1 , ) , the (differential) Rényi entropy of order α is given by
h α ( X n ) : = 1 1 α log R n f α ( x n ) d x n = 1 1 α log E f α 1 ( X n ) .
Using the Laplace transform relation,
1 s q = 1 Γ ( q ) 0 t q 1 e s t d t , q > 0 , Re ( s ) > 0 ,
we obtain that, for α > 1 (see Appendix C),
h α ( X n ) = α α 1 log 0 t q 1 e t Z n ( t ) d t + log Γ q ( α 1 ) α 1 log Γ ( q ) 1 α 1 log 0 0 t q ( α 1 ) 1 u q 1 e ( t + u ) Z n ( t + u ) d u d t .
If α ( 0 , 1 ) , we distinguish between the following two cases:
(a)
If α = 1 m q for some m { 1 , , q 1 } , then
h α ( X n ) = α log C n 1 α log Γ ( q ) 1 α + 1 1 α log = 0 m ( 1 ) m 0 t q 1 e t φ n ( ) ( t ) d t ,
with
φ n ( t ) : = Z n ( t ) , t 0 .
(b)
Otherwise (i.e., if ρ : = q ( 1 α ) N ), then
h α ( X n ) = log C n + 1 1 α log ( 1 1 + ρ = 0 ρ β ( n ) B ( + 1 , ρ + 1 ) + ρ sin ( π ρ ) Γ ( ρ ) π · 0 e u u ρ + 1 j = 0 ρ ( 1 ) j β j ( n ) j ! u j C n Γ ( q ) 0 t q 1 e t Z n ( t + u ) d t d u ) ,
where β 0 : = 1 , and for all j N ,
β j ( n ) : = C n Γ ( q ) = 0 j ( 1 ) j B ( + 1 , j + 1 ) k = 0 ( 1 ) k k 0 t q 1 e t φ n ( k ) ( t ) d t .
The proof of the integral expressions of the Rényi entropy of order α ( 0 , 1 ) , as given in (50)–(54), is provided in Appendix C.
Once again, the advantage of these expressions, which do not seem to be very simple (at least on the face of it), is that they only involve one- or two-dimensional integrals, rather than an expression of an n-dimensional integral (as it could have been in the case of an n-dimensional density).

3.4. Mutual Information Calculations for Communication Channels with Jamming

Consider a channel that is fed by an input vector X n = ( X 1 , , X n ) X n and generates an output vector Y n = ( Y 1 , , Y n ) Y n , where X and Y are either finite, countably infinite, or continuous alphabets, and X n and Y n are their n th order Cartesian powers. Let the conditional probability distribution of the channel be given by
p Y n | X n ( y n | x n ) = 1 n i = 1 n j i q Y | X ( y j | x j ) r Y | X ( y i | x i ) ,
where r Y | X ( · | · ) and q Y | X ( · | · ) are given conditional probability distributions of Y given X, x n = ( x 1 , , x n ) X n and y n = ( y 1 , , y n ) Y n . This channel model refers to a discrete memoryless channel (DMC), which is nominally given by
q Y n | X n ( y n | x n ) = i = 1 n q Y | X ( y i | x i ) ,
where one of the transmitted symbols is jammed at a uniformly distributed random time, i, and the transition distribution of the jammed symbol is given by r Y | X ( y i | x i ) instead of q Y | X ( y i | x i ) . The restriction to a single jammed symbol is made merely for the sake of simplicity, but it can easily be extended.
We wish to evaluate how the jamming affects the mutual information I ( X n ; Y n ) . Clearly, when one talks about jamming, the mutual information is decreased, but this is not part of the mathematical model, where the relation between r and q has not been specified. Let the input distribution be given by the product form
p X n ( x n ) = i = 1 n p X ( x i ) , x n X n .
The mutual information (in nats) is given by
I ( X n ; Y n ) = h ( Y n ) h ( Y n | X n )
= X n × Y n p X n , Y n ( x n , y n ) ln p Y n | X n ( y n | x n ) d x n d y n Y n p Y n ( y n ) ln p Y n ( y n ) d y n .
For the simplicity of notation, we henceforth omit the domains of integration whenever they are clear from the context. We have,
p X n , Y n ( x n , y n ) ln p Y n | X n ( y n | x n ) d x n d y n = p X n , Y n ( x n , y n ) ln p Y n | X n ( y n | x n ) q Y n | X n ( y n | x n ) d x n d y n + p X n , Y n ( x n , y n ) ln q Y n | X n ( y n | x n ) d x n d y n .
By using the logarithmic expectation in (15) and the following equality (see (55) and (56)):
p Y n | X n ( y n | x n ) q Y n | X n ( y n | x n ) = 1 n i = 1 n r Y | X ( y i | x i ) q Y | X ( y i | x i ) ,
we obtain (see Appendix D.1)
p X n , Y n ( x n , y n ) ln p Y n | X n ( y n | x n ) q Y n | X n ( y n | x n ) d x n d y n = 0 1 u e u f n 1 u n g u n d u ,
where, for u 0 ,
f ( u ) : = p X ( x ) q Y | X ( y | x ) exp u r Y | X ( y | x ) q Y | X ( y | x ) d x d y ,
g ( u ) : = p X ( x ) r Y | X ( y | x ) exp u r Y | X ( y | x ) q Y | X ( y | x ) d x d y .
Moreover, owing to the product form of q n , it is shown in Appendix D.2 that
p X n , Y n ( x n , y n ) ln q Y n | X n ( y n | x n ) d x n d y n = p X ( x ) r Y | X ( y | x ) ln q Y | X ( y | x ) d x d y + ( n 1 ) p X ( x ) q Y | X ( y | x ) ln q Y | X ( y | x ) d x d y .
Combining (60), (62), and (65), we express h ( Y n | X n ) as a double integral over X × Y , independently of n (rather than an integration over X n × Y n ):
h ( Y n | X n ) = 0 1 u f n 1 u n g u n e u d u p X ( x ) r Y | X ( y | x ) ln q Y | X ( y | x ) d x d y ( n 1 ) p X ( x ) q Y | X ( y | x ) ln q Y | X ( y | x ) d x d y .
We next calculate the differential channel output entropy, h ( Y n ) , induced by p Y n | X n ( · | · ) . From Appendix D.3,
p Y n ( y n ) = j = 1 n v ( y j ) · 1 n i = 1 n w ( y i ) v ( y i ) ,
where, for all y Y ,
v ( y ) : = q Y | X ( y | x ) p X ( x ) d x ,
w ( y ) : = r Y | X ( y | x ) p X ( x ) d x .
By (1), the following identity holds for every positive random variable Z (see Appendix D.3):
E { Z ln Z } = 0 1 u M Z ( 0 ) e u M Z ( u ) d u
where M Z ( u ) : = E { e u Z } . By setting Z : = 1 n i = 1 n w ( V i ) v ( V i ) where { V i } i = 1 n are i.i.d. random variables with the density function v, some algebraic manipulations give (see Appendix D.3)
h ( Y n ) = 0 1 u t n 1 u n s u n e u d u w ( y ) ln v ( y ) d y ( n 1 ) v ( y ) ln v ( y ) d y ,
where
s ( u ) : = w ( y ) exp u w ( y ) v ( y ) d y , u 0 ,
t ( u ) : = v ( y ) exp u w ( y ) v ( y ) d y , u 0 .
Combining (58), (66), and (71), we obtain the mutual information for the channel with jamming, which is given by
I p ( X n ; Y n ) = 0 1 u t n 1 u n s u n f n 1 u n g u n d u + p X ( x ) r Y | X ( y | x ) ln q Y | X ( y | x ) d x d y w ( y ) ln v ( y ) d y + ( n 1 ) p X ( x ) q Y | X ( y | x ) ln q Y | X ( y | x ) d x d y v ( y ) ln v ( y ) d y .
We next exemplify our results in the case where q is a binary symmetric channel (BSC) with crossover probability δ ( 0 , 1 2 ) and p is a BSC with a larger crossover probability, ε ( δ , 1 2 ] . We assume that the input bits are i.i.d. and equiprobable. The specialization of our analysis to this setup is provided in Appendix D.4, showing that the mutual information of the channel p X n , Y n , fed by the binary symmetric source, is given by
I p ( X n ; Y n ) = n ln 2 d ( ε δ ) H b ( ε ) ( n 1 ) H b ( δ ) + 0 { e u ( 1 δ ) exp ( 1 ε ) u ( 1 δ ) n + δ exp ε u δ n n 1 · ( 1 ε ) exp ( 1 ε ) u ( 1 δ ) n + ε exp ε u δ n } d u u ,
where H b : [ 0 , 1 ] [ 0 , ln 2 ] is the binary entropy function
H b ( x ) : = x ln ( x ) ( 1 x ) ln ( 1 x ) , x [ 0 , 1 ] ,
with the convention that 0 ln 0 = 0 , and
d ( ε δ ) : = ε ln ε δ + ( 1 ε ) ln 1 ε 1 δ , ( δ , ε ) [ 0 , 1 ] 2
denotes the binary relative entropy. By the data processing inequality, the mutual information in (75) is smaller than that of the BSC with crossover probability δ :
I q ( X n ; Y n ) = n ln 2 H b ( δ ) .
Figure 3 refers to the case where δ = 10 3 and n = 128 . Here, I q ( X n ; Y n ) = 87.71 nats , and I p ( X n ; Y n ) is decreased by 2.88 nats due to the jammer (see Figure 3).
Figure 4 refers to the case where δ = 10 3 and ε = 1 2 (referring to complete jamming of a single symbol, which is chosen uniformly at random), and it shows the difference in the mutual information I ( X n ; Y n ) , as a function of the length n, between the jamming-free BSC with crossover probability δ and the channel with jamming.
To conclude, this subsection studies the change in the mutual information I ( X n ; Y n ) due to jamming, relative to the mutual information associated with the nominal channel without jamming. Due to the integral representations provided in our analysis, the calculation of the mutual information finally depends on one-dimensional integrals, as opposed to the original n-dimensional integrals, pertaining to the expressions that define the associated differential entropies.

Author Contributions

Both authors contributed equally to this research work, as well as to the write-up of this article. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1

Let ρ > 0 be a non-integer real, and define the function F ρ : ( 0 , ) R as follows:
F ρ ( μ ) : = 0 1 u ρ + 1 e μ u j = 0 ρ ( 1 ) j j ! ( μ 1 ) j u j e u d u , μ > 0 ,
with the convention that 0 0 : = lim x 0 + x x = 1 . By the Taylor series expansion of e μ u as a function of μ around μ = 1 , we find that for small positive u, the integrand of (A1) scales as u ( ρ ρ ) with ρ ρ ( 0 , 1 ) . Furthermore, for large u, the same integrand scales as u ( ρ + 1 ) e min { μ , 1 } u . This guarantees the convergence of the integral, and so, F ρ ( · ) is well defined and finite in the interval ( 0 , ) .
From (A1), F ρ ( 1 ) = 0 (for μ = 1 , the integrand of (A1) is identically zero on ( 0 , ) ). Differentiation times with respect to μ , under the integration sign with 0 , , ρ , gives according to Leibniz integral rule
F ρ ( ) ( μ ) = 0 1 u ρ + 1 ( 1 ) u e μ u j = ρ ( 1 ) j ( j ) ! · ( μ 1 ) j u j e u d u ,
which implies that
F ρ ( ) ( 1 ) = 0 , = 0 , , ρ .
We next calculate F ρ ( k ) ( μ ) for k : = ρ + 1 and μ > 0 . By invoking the Leibniz integral rule,
F ρ ( k ) ( μ ) = 0 1 u ρ + 1 k μ k e μ u j = 0 ρ ( 1 ) j j ! · ( μ 1 ) j u j e u d u = 0 ( u ) k e μ u u ρ + 1 d u = ( 1 ) k 0 u k ρ 1 e μ u d u = ( 1 ) k 0 t μ k ρ 1 e t μ 1 d t = ( 1 ) k μ ρ k Γ ( k ρ ) .
Hence, from (A3) and (A4),
F ρ ( 1 ) = = F ρ ( ρ ) ( 1 ) = 0 ,
F ρ ( k ) ( μ ) = ( 1 ) k μ ρ k Γ ( k ρ ) , k : = ρ + 1 , μ > 0 .
By integrating both sides of (A6) with respect to μ , successively k times, the equalities in (A5) imply that
F ρ ( μ ) = ( 1 ) k Γ ( k ρ ) μ ρ i = 0 k 1 ( ρ i ) + i = 0 k 1 c i ( ρ ) ( μ 1 ) i , k : = ρ + 1 , μ > 0 ,
with some suitable constants c i ( ρ ) i = 0 k 1 . Since F ρ ( 1 ) = 0 (see (A5)), (A7) implies that
c 0 ( ρ ) = ( 1 ) k + 1 Γ ( k ρ ) i = 0 k 1 ( ρ i ) ,
and since (by assumption) ρ is a non-integer, the denominator on the right-hand side of (A8) is nonzero. Moreover, since F ρ ( ) ( 1 ) = 0 for all { 1 , k 1 } (see (A5)), differentiation of both sides of (A7) times at μ = 1 yields
c ( ρ ) : = ( 1 ) k + 1 Γ ( k ρ ) i = 0 1 ( ρ i ) ! i = 0 k 1 ( ρ i ) , = 1 , , k 1 .
Substituting (A8) and (A9) into (A7) gives
F ρ ( μ ) = ( 1 ) k Γ ( k ρ ) i = 0 k 1 ( ρ i ) μ ρ 1 = 1 k 1 1 ! i = 0 1 ( ρ i ) ( μ 1 ) , μ > 0 .
Combining (A1) with (A10) and rearranging the terms, we obtain
μ ρ = 1 + = 1 k 1 1 ! i = 0 1 ( ρ i ) ( μ 1 ) + ( 1 ) k 1 i = 0 k 1 ( ρ i ) Γ ( k ρ ) 0 1 u ρ + 1 j = 0 ρ ( 1 ) j j ! ( μ 1 ) j u j e u e μ u d u .
Setting μ : = X 0 and taking expectations of both sides of (A11) imply, by swapping the order of integration and expectation (which is validated due to Fubini’s theorem), that the following equality holds (see (4) and (10)):
E X ρ = 1 + = 1 k 1 1 ! i = 0 1 ( ρ i ) α + ( 1 ) k 1 i = 0 k 1 ( ρ i ) Γ ( k ρ ) 0 1 u ρ + 1 j = 0 ρ ( 1 ) j α j j ! u j e u M X ( u ) d u .
We next rewrite and simplify both terms in the right side of (A12) as follows:
1 + = 1 k 1 1 ! i = 0 1 ( ρ i ) α = 1 + = 1 k 1 1 Γ ( + 1 ) Γ ( ρ + 1 ) Γ ( ρ + 1 ) · α
= 1 + 1 1 + ρ = 1 k 1 1 Γ ( + 1 ) Γ ( ρ + 2 ) Γ ( ρ + 1 ) · α
= 1 + 1 1 + ρ = 1 k 1 α B ( + 1 , ρ + 1 )
= 1 1 + ρ = 0 k 1 α B ( + 1 , ρ + 1 ) ,
and
( 1 ) k 1 i = 0 k 1 ( ρ i ) Γ ( k ρ ) = ( 1 ) k 1 Γ ( ρ + 1 ) Γ ( k ρ ) Γ ( ρ k + 1 )
= ( 1 ) k 1 Γ ( ρ + 1 ) · sin π ( k ρ ) π
= ρ sin ( π ρ ) Γ ( ρ ) π .
Equations (A13), (A14), (A17), and (A19) are based on the recursion (see, e.g., [26] (Identity (8.331)))
Γ ( x + 1 ) = x Γ ( x ) , x > 0 ,
(A15) relies on the relation between the Beta and Gamma functions in (6); (A16) is based on the following equality (see (6), (A20), and recall that Γ ( 1 ) = 1 ):
B ( 1 , ρ + 1 ) = Γ ( 1 ) Γ ( ρ + 1 ) Γ ( ρ + 2 ) = 1 ρ + 1 ,
and finally, (A19) holds by using the identity (see, e.g., [26] (page 905, Identity (8.334)))
Γ ( x ) Γ ( 1 x ) = π sin ( π x ) , x ( 0 , 1 ) ,
with x : = k ρ = ρ + 1 ρ ( 0 , 1 ) (since, by assumption, ρ is a non-integer). Combining (A12)–(A19) gives (9) (recall that α 0 : = 1 , and k 1 : = ρ holds by (A6)).
We finally prove (11). By (10), for all j N ,
α j = E ( X 1 ) j = = 0 j ( 1 ) j j E X = = 0 j ( 1 ) j Γ ( j + 1 ) M X ( ) ( 0 ) Γ ( + 1 ) Γ ( j + 1 ) = 1 j + 1 = 0 j ( 1 ) j Γ ( j + 2 ) M X ( ) ( 0 ) Γ ( + 1 ) Γ ( j + 1 ) = 1 j + 1 = 0 j ( 1 ) j M X ( ) ( 0 ) B ( + 1 , j + 1 ) .

Appendix B. Complementary Details of the Analysis in Section 3.2

Appendix B.1. Proof of Equation (36)

For all u > 0 ,
M D n ( u ) = E exp u θ ^ n θ 2
= E 1 2 π u e j ω ( θ ^ n θ ) e ω 2 / ( 4 u ) d ω
= 1 2 π u e j ω θ E { e j ω θ ^ n } e ω 2 / ( 4 u ) d ω
= 1 2 π u e j ω θ E exp j ω n i = 1 n X i e ω 2 / ( 4 u ) d ω
= 1 2 π u e j ω θ ϕ X n ω n e ω 2 / ( 4 u ) d ω ,
where (A24) is (31); (A25) relies on (35); (A26) holds by interchanging expectation and integration (due to Fubini’s theorem); (A27) is due to (28), and (A28) holds by the assumption that X 1 , , X n are i.i.d.

Appendix B.2. Derivation of the Upper Bound in (40)

For all ρ > 0 ,
E { | θ ^ n θ | ρ } = 0 P ( θ ^ n θ ρ t ) d t = 0 P ( θ ^ n θ ρ ε ρ ) ρ ε ρ 1 d ε = 0 P ( θ ^ n θ ε ) ρ ε ρ 1 d ε .
We next use the Chernoff bound for upper bounding P ( θ ^ n θ ε ) for all ε > 0 ,
P ( θ ^ n θ ε ) = P i = 1 n ( X i θ ) n ε inf s 0 e s n ε E exp s i = 1 n ( X i θ ) = inf s 0 e s n ε i = 1 n E e s ( X i θ ) = inf s 0 e s n ε θ e s ( 1 θ ) + ( 1 θ ) e s θ n = inf s 0 e n s ε + n H θ ( s )
with θ [ 0 , 1 ] , and
H θ ( s ) : = ln θ e s ( 1 θ ) + ( 1 θ ) e s θ , s 0 .
We now use an upper bound on H θ ( s ) for every s 0 . By Theorem 3.2 and Lemma 3.3 in [39] (see also Lemma 2.4.6 in [40]), we have
H θ ( s ) C ( θ ) s 2
with
C ( θ ) : = { 0 , if θ = 0 , 1 2 θ 4 ln 1 θ θ , if θ 0 , 1 2 , 1 2 θ ( 1 θ ) , if θ 1 2 , 1 .
Combining (A30) and (A32) yields
P ( θ ^ n θ ε ) inf s 0 e n ε s + n C ( θ ) s 2 = exp n ε 2 4 C ( θ ) .
Similarly, it is easy to show that the same Chernoff bound applies also to P ( θ ^ n θ ε ) , which overall gives
P ( θ ^ n θ ε ) 2 exp n ε 2 4 C ( θ ) .
Inequality (A35) is a refined version of Hoeffding’s inequality (see Section 2.4.4 in [40]), which is derived for the Bernoulli distribution (see (A30)) and by invoking the Chernoff bound; moreover, (A35) coincides with Hoeffding’s inequality in the special case θ = 1 2 (which, from (A33), yields C ( θ ) = 1 8 ). In view of the fact that (A35) forms a specialization of Theorem 2.4.7 in [40], it follows that the Bernoulli case is the worst one (in the sense of leading to the looser upper bound) among all probability distributions whose support is the interval [ 0 , 1 ] and whose expected value is θ [ 0 , 1 ] . However, in the Bernoulli case, a simple symmetry argument applies for improving the bound (A35) as follows. Since { X i } are i.i.d., Bernoulli with mean θ , then obviously, { 1 X i } are Bernoulli, i.i.d. with mean 1 θ and (from (28))
θ ^ n ( 1 X 1 , , 1 X n ) = 1 θ ^ n ( X 1 , , X n ) ,
which implies that the error estimation is identical in both cases. Hence, P ( | θ ^ n θ | ε ) is symmetric around θ = 1 2 . It can be verified that
min C ( θ ) , C ( 1 θ ) = 1 2 θ ( 1 θ ) , θ [ 0 , 1 ] ,
which follows from (A33) and since C ( θ ) > C ( 1 θ ) for all θ ( 0 , 1 2 ) (see Figure 2.1 in [40]). In view of (A37) and the above symmetry consideration, the upper bound in (A35) is improved for values of θ ( 0 , 1 2 ) , which therefore gives
P ) | θ ^ n θ | ε ) 2 exp n ε 2 2 θ ( 1 θ ) , θ [ 0 , 1 ] , ε > 0 .
From (28), the probability in (A38) vanishes if θ = 0 or θ = 1 . Consequently, for ρ > 0 ,
E θ ^ n θ ρ = 0 P θ ^ n θ ε ρ ε ρ 1 d ε
0 2 exp n ε 2 2 θ ( 1 θ ) ρ ε ρ 1 d ε
= ρ 2 θ ( 1 θ ) ρ / 2 0 u ρ / 2 1 e u d u · n ρ / 2
= ρ Γ ρ 2 2 θ ( 1 θ ) ρ / 2 · n ρ / 2
= K ( ρ , θ ) · n ρ / 2 ,
where (A39)–(A43) hold, respectively, due to (A29), (A38), the substitution u : = n ε 2 2 θ ( 1 θ ) , (7), and (41).

Appendix C. Complementary Details of the Analysis in Section 3.3

We start by proving (50). In view of (48), for α ( 0 , 1 ) ( 1 , ) ,
h α ( X n ) = 1 1 α log E f α 1 ( X n ) ,
where X n : = ( X 1 , , X n ) . For α > 1 , we get
E f α 1 ( X n ) = C n α 1 E 1 + i = 1 n g ( X i ) q ( 1 α )
= C n α 1 R n f ( x n ) · 1 Γ q ( α 1 ) 0 t q ( α 1 ) 1 exp 1 + i = 1 n g ( x i ) t d t
= C n α 1 Γ q ( α 1 ) 0 t q ( α 1 ) 1 e t E exp t i = 1 n g ( X i ) d t .
where (A45) holds due to (42); (A46) follows from (49), and (A47) holds by swapping the order of integrations, which is validated by Fubini’s theorem. Furthermore, from (42) and (49),
f ( x n ) = C n ( 1 + i = 1 n g ( x i ) ) q = C n Γ ( q ) 0 u q 1 e u exp u i = 1 n g ( x i ) d u , x n R n ,
and it follows from (A48) and by swapping the order of integrations (due to Fubini’s theorem),
E exp t i = 1 n g ( X i ) = C n Γ ( q ) 0 u q 1 e u R n exp ( t + u ) i = 1 n g ( x i ) d x n d u = C n Γ ( q ) 0 u q 1 e u i = 1 n exp ( t + u ) g ( x i ) d x i d u = C n Γ ( q ) 0 u q 1 e u exp ( t + u ) g ( x ) d x n d u = C n Γ ( q ) 0 u q 1 e u Z n ( t + u ) d u ,
where (A49) holds by the definition of Z ( · ) in (44). Finally, combining (45), (A44), (A47), and (A49) gives (50).
The proof of (51)–(54) is a straightforward calculation, which follows by combining (A44), (A45), (A49) and Theorem 1 (we replace { α j } in Theorem 1 with { β j ( n ) } in order not to confuse the order α of the Rényi entropy of X n ).

Appendix D. Calculations of the n-Dimensional Integrals in Section 3.4

Appendix D.1. Proof of Equations (62)–(64)

p X n , Y n ( x n , y n ) ln p Y n | X n ( y n | x n ) q Y n | X n ( y n | x n ) d x n d y n = p X n , Y n ( x n , y n ) ln 1 n i = 1 n r Y | X ( y i | x i ) q Y | X ( y i | x i ) d x n d y n
= 0 1 u e u p X n , Y n ( x n , y n ) exp u n i = 1 n r Y | X ( y i | x i ) q Y | X ( y i | x i ) d x n d y n d u
= 0 1 u [ e u 1 n i = 1 n j i q Y | X ( y j | x j ) p X ( x j ) · r Y | X ( y i | x i ) p X ( x i )
· exp u n i = 1 n r Y | X ( y i | x i ) q Y | X ( y i | x i ) d x n d y n ] d u
= 0 1 u [ e u 1 n i = 1 n { j i q Y | X ( y j | x j ) p X ( x j ) exp u n r Y | X ( y j | x j ) q Y | X ( y j | x j )
· r Y | X ( y i | x i ) p X ( x i ) exp u n r Y | X ( y i | x i ) q Y | X ( y i | x i ) } d x n d y n ] d u
= 0 1 u [ e u 1 n i = 1 n { j i q Y | X ( y j | x j ) p X ( x j ) exp u n r Y | X ( y j | x j ) q Y | X ( y j | x j ) d x j d y j
· r Y | X ( y i | x i ) p X ( x i ) exp u n r Y | X ( y i | x i ) q Y | X ( y i | x i ) d x i d y i } ] d u
= 0 1 u [ e u 1 n i = 1 n { q Y | X ( y | x ) p X ( x ) exp u n r Y | X ( y | x ) q Y | X ( y | x ) d x d y n 1
· r Y | X ( y | x ) p X ( x ) exp u n r Y | X ( y | x ) q Y | X ( y | x ) d x d y } ] d u
= 0 1 u [ e u q Y | X ( y | x ) p X ( x ) exp u n r Y | X ( y | x ) q Y | X ( y | x ) d x d y n 1
· r Y | X ( y | x ) p X ( x ) exp u n r Y | X ( y | x ) q Y | X ( y | x ) d x d y ] d u
= 0 1 u e u f n 1 u n g u n d u ,
where f ( · ) and g ( · ) are defined in (63) and (64), respectively. Consequently, f ( 0 ) = g ( 0 ) = 1 , and 0 f ( u ) , g ( u ) 1 for all u > 0 .

Appendix D.2. Proof of Equation (65)

p X n , Y n ( x n , y n ) ln q Y n | X n ( y n | x n ) d x n d y n
= p X n , Y n ( x n , y n ) j = 1 n ln q Y | X ( y j | x j ) d x n d y n
= = 1 n p X ( x ) · 1 n i = 1 n i q Y | X ( y | x ) r Y | X ( y i | x i ) j = 1 n ln q Y | X ( y j | x j ) d x n d y n
= = 1 n p X ( x ) · 1 n i = 1 n j = 1 n i q Y | X ( y | x ) · r Y | X ( y i | x i ) ln q Y | X ( y j | x j ) d x n d y n
= 1 n X n = 1 n p X ( x ) i = 1 n j = 1 n Y n i q Y | X ( y | x ) · r Y | X ( y i | x i ) ln q Y | X ( y j | x j ) d y n d x n .
We next calculate the inner integral on the right-hand side of (A60). For i = j ,
Y n i q Y | X ( y | x ) · r Y | X ( y i | x i ) ln q Y | X ( y j | x j ) d y n = i Y q Y | X ( y | x ) d y · Y r Y | X ( y i | x i ) ln q Y | X ( y i | x i ) d y i = Y r Y | X ( y | x i ) ln q Y | X ( y | x i ) d y ,
else, if i j ,
Y n i q Y | X ( y | x ) · r Y | X ( y i | x i ) ln q Y | X ( y j | x j ) d y n = { i , j } Y q Y | X ( y | x ) d y · Y q Y | X ( y j | x j ) ln q Y | X ( y j | x j ) d y j · Y r Y | X ( y i | x i ) d y i = Y q Y | X ( y | x j ) ln q Y | X ( y | x j ) d y .
Hence, from (A60)–(A62),
p X n , Y n ( x n , y n ) ln q Y n | X n ( y n | x n ) d x n d y n = 1 n X n = 1 n p X ( x ) i = 1 n Y r Y | X ( y | x i ) ln q Y | X ( y | x i ) d y + i = 1 n j i Y q Y | X ( y | x j ) ln q Y | X ( y | x j ) d y d x n = 1 n [ i = 1 n i X p X ( x ) d x · X × Y r Y | X ( y | x i ) ln q Y | X ( y | x i ) p X ( x i ) d x i d y + i = 1 n j i j X p X ( x ) d x · X × Y p X ( x j ) q Y | X ( y | x j ) ln q Y | X ( y | x j ) d x j d y ] = 1 n i = 1 n X × Y r Y | X ( y | x ) ln q Y | X ( y | x ) p X ( x ) d x d y + i = 1 n j i X × Y p X ( x ) q Y | X ( y | x ) ln q Y | X ( y | x ) d x d y = X × Y p X ( x ) r Y | X ( y | x ) ln q Y | X ( y | x ) d x d y + ( n 1 ) X × Y p X ( x ) q Y | X ( y | x ) ln q Y | X ( y | x ) d x d y .

Appendix D.3. Proof of Equations (67)–(73)

p Y n ( y n ) = p Y n | X n ( y n | x n ) p X n ( x n ) d x n = 1 n i = 1 n j i q Y | X ( y j | x j ) p X ( x j ) d x j · r Y | X ( y i | x i ) P X ( x i ) d x i = 1 n i = 1 n j i v ( y j ) · w ( y i ) = j = 1 n v ( y j ) · 1 n i = 1 n w ( y i ) v ( y i ) , y n Y n ,
where v ( · ) and w ( · ) are probability densities on Y , as defined in (68) and (69), respectively. This proves (67).
We next prove (70), which is used to calculate the entropy of Y n with the density p Y n ( · ) in (A64). In view of the integral representation of the logarithmic function in (1) and by interchanging the order of the integrations (due to Fubini’s theorem), we get that for a positive random variable Z
E Z ln Z = 0 1 u · E Z e u e u Z d u = 0 E Z e u E Z e u Z u d u = 0 M Z ( 0 ) e u M Z ( u ) u d u ,
which proves (70). Finally, we prove (71). In view of (A64),
h ( Y n ) = p Y n ( y n ) ln p Y n ( y n ) d y n = j = 1 n v ( y j ) · 1 n i = 1 n w ( y i ) v ( y i ) · [ ln j = 1 n v ( y j ) + ln 1 n i = 1 n w ( y i ) v ( y i ) ] d y n = j = 1 n v ( y j ) · 1 n i = 1 n w ( y i ) v ( y i ) · [ j = 1 n ln v ( y j ) + ln 1 n i = 1 n w ( y i ) v ( y i ) ] d y n = = 1 n v ( y ) · 1 n i = 1 n j = 1 n w ( y i ) ln v ( y j ) v ( y i ) d y n j = 1 n v ( y j ) · 1 n i = 1 n w ( y i ) v ( y i ) · ln 1 n i = 1 n w ( y i ) v ( y i ) d y n .
A calculation of the first integral on the right-hand side of (A66) gives
= 1 n v ( y ) · 1 n i = 1 n j = 1 n w ( y i ) ln v ( y j ) v ( y i ) d y n = 1 n i = 1 n j = 1 n = 1 n v ( y ) · w ( y i ) ln v ( y j ) v ( y i ) d y n = 1 n i = 1 n j = 1 n i v ( y ) · w ( y i ) ln v ( y j ) d y n .
For i = j , the inner integral on the right-hand side of (A67) satisfies
i v ( y ) · w ( y i ) ln v ( y j ) d y n = i v ( y ) d y · w ( y i ) ln v ( y i ) d y i = w ( y ) ln v ( y ) d y ,
and for i j ,
i v ( y ) · w ( y i ) ln v ( y j ) d y n = i , j v ( y ) d y · w ( y i ) d y i · v ( y j ) ln v ( y j ) d y j = v ( y ) ln v ( y ) d y .
Therefore, combining (A67)–(A69) gives
= 1 n v ( y ) · 1 n i = 1 n j = 1 n w ( y i ) ln v ( y j ) v ( y i ) d y n = w ( y ) ln v ( y ) d y + ( n 1 ) v ( y ) ln v ( y ) d y .
Finally, we calculate the second integral on the right-hand side of (A66). Let μ n be the probability density function defined as
μ n ( y n ) : = j = 1 n v ( y j ) , y n Y n ,
and let
Z : = 1 n i = 1 n w ( V i ) v ( V i )
where { V i } i = 1 n are i.i.d. Y -valued random variables with a probability density function v. Then, in view of (70), the second integral on the right-hand side of (A66) satisfies
j = 1 n v ( y j ) · 1 n i = 1 n w ( y i ) v ( y i ) · ln 1 n i = 1 n w ( y i ) v ( y i ) d y n = E Z ln Z = 0 M Z ( 0 ) e u M Z ( u ) u d u .
The MGF of Z is equal to
M Z ( u ) = Y n i = 1 n v ( r i ) exp u n i = 1 n w ( r i ) v ( r i ) d r ̲ = i = 1 n Y v ( r i ) exp u n w ( r i ) v ( r i ) d r i = K n u n ,
where
K ( u ) : = Y v ( y ) exp u w ( y ) v ( y ) d y , u R ,
and consequently, (A75) yields
M Z ( u ) = K n 1 u n K u n = v ( y ) exp u w ( y ) v ( y ) d y n 1 w ( y ) exp u w ( y ) v ( y ) d y ,
and
M Z ( 0 ) = 1 .
Therefore, combining (A73)–(A77) gives the following single-letter expression for the second multi-dimensional integral on the right-hand side of (A66):
j = 1 n v ( y j ) · 1 n i = 1 n w ( y i ) v ( y i ) · ln 1 n i = 1 n w ( y i ) v ( y i ) d y n = 0 1 u e u t n 1 u n s u n d u
where the functions s ( · ) and t ( · ) are defined in (72) and (73), respectively. Combining (A66), (A70), and (A78) gives (71).

Appendix D.4. Specialization to a BSC with Jamming

In the BSC example considered, we have X = Y = { 0 , 1 } , and
r Y | X ( y | x ) = ε 1 { x y } + ( 1 ε ) 1 { x = y } ,
q Y | X ( y | x ) = δ 1 { x y } + ( 1 δ ) 1 { x = y } ,
where 1 { relation } is the indicator function that is equal to one if the relation holds, and to zero otherwise. Recall that we assume 0 < δ < ε 1 2 . Let
p X ( 0 ) = p X ( 1 ) = 1 2 ,
be the binary symmetric source (BSS). From (63) and (64), for u 0 ,
f ( u ) = x , y p X ( x ) q Y | X ( y | x ) exp u r Y | X ( y | x ) q Y | X ( y | x )
= ( 1 δ ) exp ( 1 ε ) u 1 δ + δ exp ε u δ ,
g ( u ) = x , y p X ( x ) r Y | X ( y | x ) exp u r Y | X ( y | x ) q Y | X ( y | x )
= ( 1 ε ) exp ( 1 ε ) u 1 δ + ε exp ε u δ .
Furthermore, we get from (A79), (A80), and (A81) that
x , y p X ( x ) r Y | X ( y | x ) ln q Y | X ( y | x ) = ε ln δ ( 1 ε ) ln ( 1 δ ) = d ( ε δ ) + H b ( ε ) ,
and
x , y p X ( x ) q Y | X ( y | x ) ln q Y | X ( y | x ) = H b ( δ ) .
Substituting (A82)–(A85) into (66) (where the integrals in (66) are replaced by sums) gives
H ( Y n | X n ) = d ( ε δ ) + H b ( ε ) + ( n 1 ) H b ( δ ) + 0 f n 1 u n g u n e u d u u .
Since the input is a BSS, due to the symmetry of the channel (55), the output is also a BSS. This implies that (in units of nats)
H ( Y n ) = n ln 2 .
As a sanity check, we verify it by using (71). From (68) and (69), for y { 0 , 1 } ,
v ( y ) = p X ( 0 ) q Y | X ( y | 0 ) + p X ( 1 ) q Y | X ( y | 1 ) = 1 2 ,
w ( y ) = p X ( 0 ) r Y | X ( y | 0 ) + p X ( 1 ) r Y | X ( y | 1 ) = 1 2 ,
and from (72) and (73), it consequently follows that
s ( u ) = w ( 0 ) exp u w ( 0 ) v ( 0 ) + w ( 1 ) exp u w ( 1 ) v ( 1 ) = e u , u 0 ,
and also
t ( u ) = e u , u 0 .
It can be verified that substituting (A88)–(A91) into (71) reproduces (A87). Finally, subtracting (A86) from (A87) gives (75).

References

  1. Dong, A.; Zhang, H.; Wu, D.; Yuan, D. Logarithmic expectation of the sum of exponential random variables for wireless communication performance evaluation. In 2015 IEEE 82nd Vehicular Technology Conference (VTC2015-Fall); IEEE: Piscataway, NJ, USA, 2015. [Google Scholar] [CrossRef]
  2. Appledorn, C.R. The entropy of a Poisson distribution. SIAM Rev. 1988, 30, 314–317. [Google Scholar] [CrossRef]
  3. Knessl, C. Integral representations and asymptotic expansions for Shannon and Rényi entropies. Appl. Math. Lett. 1998, 11, 69–74. [Google Scholar] [CrossRef] [Green Version]
  4. Lapidoth, A.; Moser, S. Capacity bounds via duality with applications to multiple-antenna systems on flat fading channels. IEEE Trans. Inf. Theory 2003, 49, 2426–2467. [Google Scholar] [CrossRef]
  5. Martinez, A. Spectral efficiency of optical direct detection. J. Opt. Soc. Am. B 2007, 24, 739–749. [Google Scholar] [CrossRef]
  6. Merhav, N.; Sason, I. An integral representation of the logarithmic function with applications in information theory. Entropy 2020, 22, 51. [Google Scholar] [CrossRef] [Green Version]
  7. Rajan, A.; Tepedelenlioǧlu, C. Stochastic ordering of fading channels through the Shannon transform. IEEE Trans. Inf. Theory 2015, 61, 1619–1628. [Google Scholar] [CrossRef] [Green Version]
  8. Zadeh, P.H.; Hosseini, R. Expected logarithm of central quadratic form and its use in KL-divergence of some distributions. Entropy 2016, 18, 288. [Google Scholar] [CrossRef] [Green Version]
  9. Arikan, E. An inequality on guessing and its application to sequential decoding. IEEE Trans. Inf. Theory 1996, 42, 99–105. [Google Scholar] [CrossRef] [Green Version]
  10. Arikan, E.; Merhav, N. Guessing subject to distortion. IEEE Trans. Inf. Theory 1998, 44, 1041–1056. [Google Scholar] [CrossRef]
  11. Arikan, E.; Merhav, N. Joint source-channel coding and guessing with application to sequential decoding. IEEE Trans. Inf. Theory 1998, 44, 1756–1769. [Google Scholar] [CrossRef] [Green Version]
  12. Boztaş, S. Comments on “An inequality on guessing and its application to sequential decoding”. IEEE Trans. Inf. Theory 1997, 43, 2062–2063. [Google Scholar] [CrossRef]
  13. Bracher, A.; Hof, E.; Lapidoth, A. Guessing attacks on distributed-storage systems. IEEE Trans. Inf. Theory 2019, 65, 6975–6998. [Google Scholar] [CrossRef] [Green Version]
  14. Bunte, C.; Lapidoth, A. Encoding tasks and Rényi entropy. IEEE Trans. Inf. Theory 2014, 60, 5065–5076. [Google Scholar] [CrossRef] [Green Version]
  15. Campbell, L.L. A coding theorem and Rényi’s entropy. Inf. Control 1965, 8, 423–429. [Google Scholar] [CrossRef] [Green Version]
  16. Courtade, T.; Verdú, S. Cumulant generating function of codeword lengths in optimal lossless compression. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 2494–2498. [Google Scholar] [CrossRef]
  17. Courtade, T.; Verdú, S. Variable-length lossy compression and channel coding: Non-asymptotic converses via cumulant generating functions. In Proceedings of the 2014 IEEE International Symposium on Information Theory, Honolulu, HI, USA, 29 June–4 July 2014; pp. 2499–2503. [Google Scholar] [CrossRef]
  18. Merhav, N. Lower bounds on exponential moments of the quadratic error in parameter estimation. IEEE Trans. Inf. Theory 2018, 64, 7636–7648. [Google Scholar] [CrossRef] [Green Version]
  19. Sason, I.; Verdú, S. Improved bounds on lossless source coding and guessing moments via Rényi measures. IEEE Trans. Inf. Theory 2018, 64, 4323–4346. [Google Scholar] [CrossRef] [Green Version]
  20. Sason, I. Tight bounds on the Rényi entropy via majorization with applications to guessing and compression. Entropy 2018, 20, 896. [Google Scholar] [CrossRef] [Green Version]
  21. Sundaresan, R. Guessing under source uncertainty. IEEE Trans. Inf. Theory 2007, 53, 269–287. [Google Scholar] [CrossRef] [Green Version]
  22. Sundaresan, R. Guessing based on length functions. In Proceedings of the 2007 IEEE International Symposium on Information Theory, Nice, France, 24–29 June 2007; pp. 716–719. [Google Scholar] [CrossRef] [Green Version]
  23. Mézard, M.; Montanari, A. Information, Physics, and Computation; Oxford University Press: New York, NY, USA, 2009. [Google Scholar]
  24. Esipov, S.E.; Newman, T.J. Interface growth and Burgers turbulence: The problem of random initial conditions. Phys. Rev. E 1993, 48, 1046–1050. [Google Scholar] [CrossRef]
  25. Song, J.; Still, S.; Rojas, R.D.H.; Castillo, I.P.; Marsili, M. Optimal work extraction and mutual information in a generalized Szilárd engine. arXiv 2019, arXiv:1910.04191. [Google Scholar]
  26. Gradshteyn, I.S.; Ryzhik, I.M. Tables of Integrals, Series, and Products, 8th ed.; Academic Press: New York, NY, USA, 2014. [Google Scholar]
  27. Gzyl, H.; Tagliani, A. Stieltjes moment problem and fractional moments. Appl. Math. Comput. 2010, 216, 3307–3318. [Google Scholar] [CrossRef]
  28. Gzyl, H.; Tagliani, A. Determination of the distribution of total loss from the fractional moments of its exponential. Appl. Math. Comput. 2012, 219, 2124–2133. [Google Scholar] [CrossRef]
  29. Tagliani, A. On the proximity of distributions in terms of coinciding fractional moments. Appl. Math. Comput. 2003, 145, 501–509. [Google Scholar] [CrossRef]
  30. Tagliani, A. Hausdorff moment problem and fractional moments: A simplified procedure. Appl. Math. Comput. 2011, 218, 4423–4432. [Google Scholar] [CrossRef]
  31. Olver, F.W.J.; Lozier, D.W.; Boisvert, R.F.; Clark, C.W. NIST Handbook of Mathematical Functions; NIST and Cambridge University Press: New York, NY, USA, 2010. [Google Scholar]
  32. Hanawal, M.K.; Sundaresan, R. Guessing revisited: A large deviations approach. IEEE Trans. Inf. Theory 2011, 57, 70–78. [Google Scholar] [CrossRef] [Green Version]
  33. Merhav, N.; Cohen, A. Universal randomized guessing with application to asynchronous decentralized brute-force attacks. IEEE Trans. Inf. Theory 2020, 66, 114–129. [Google Scholar] [CrossRef]
  34. Salamatian, S.; Huleihel, W.; Beirami, A.; Cohen, A.; Médard, M. Why botnets work: Distributed brute-force attacks need no synchronization. IEEE Trans. Inf. Foren. Sec. 2019, 14, 2288–2299. [Google Scholar] [CrossRef]
  35. Carrillo, R.E.; Aysal, T.C.; Barner, K.E. A generalized Cauchy distribution framework for problems requiring robust behavior. EURASIP J. Adv. Signal Process. 2010, 2010, 312989. [Google Scholar] [CrossRef] [Green Version]
  36. Alzaatreh, A.; Lee, C.; Famoye, F.; Ghosh, I. The generalized Cauchy family of distributions with applications. J. Stat. Distrib. App. 2016, 3, 12. [Google Scholar] [CrossRef] [Green Version]
  37. Marsiglietti, A.; Kostina, V. A lower bound on the differential entropy of log-concave random vectors with applications. Entropy 2018, 20, 185. [Google Scholar] [CrossRef] [Green Version]
  38. Rényi, A. On measures of entropy and information. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 8–9 August 1961; pp. 547–561. [Google Scholar]
  39. Berend, D.; Kontorovich, A. On the concentration of the missing mass. Electron. Commun. Probab. 2013, 18, 3. [Google Scholar] [CrossRef]
  40. Raginsky, M.; Sason, I. Concentration of Measure Inequalities in Information Theory, Communications and Coding. In Foundations and Trends in Communications and Information Theory, 3rd ed.; NOW Publishers: Hanover, MA, USA; Delft, The Netherlands, 2019. [Google Scholar] [CrossRef]
Figure 1. E θ ^ n θ (see (37) and (39)) versus its upper bound in (40) as functions of θ [ 0 , 1 ] with n = 1000 .
Figure 1. E θ ^ n θ (see (37) and (39)) versus its upper bound in (40) as functions of θ [ 0 , 1 ] with n = 1000 .
Entropy 22 00707 g001
Figure 2. E θ ^ n θ (see (37) and (39)) versus its upper bound in (40) as functions of n with θ = 1 4 .
Figure 2. E θ ^ n θ (see (37) and (39)) versus its upper bound in (40) as functions of n with θ = 1 4 .
Entropy 22 00707 g002
Figure 3. The degradation in mutual information for n = 128 . The jammer-free channel q is a binary symmetric channel (BSC) with crossover probability δ = 10 3 , and r is a BSC with crossover probability ε δ , 1 2 . The input bits are i.i.d. and equiprobable. The degradation in I ( X n ; Y n ) (nats) is displayed as a function of ε .
Figure 3. The degradation in mutual information for n = 128 . The jammer-free channel q is a binary symmetric channel (BSC) with crossover probability δ = 10 3 , and r is a BSC with crossover probability ε δ , 1 2 . The input bits are i.i.d. and equiprobable. The degradation in I ( X n ; Y n ) (nats) is displayed as a function of ε .
Entropy 22 00707 g003
Figure 4. The degradation in mutual information as a function of n. The jammer-free channel q Y | X is a BSC with crossover probability δ = 10 3 , and r Y | X for the jammed symbol is a BSC with crossover probability ε = 1 2 . The input bits are i.i.d. and equiprobable.
Figure 4. The degradation in mutual information as a function of n. The jammer-free channel q Y | X is a BSC with crossover probability δ = 10 3 , and r Y | X for the jammed symbol is a BSC with crossover probability ε = 1 2 . The input bits are i.i.d. and equiprobable.
Entropy 22 00707 g004

Share and Cite

MDPI and ACS Style

Merhav, N.; Sason, I. Some Useful Integral Representations for Information-Theoretic Analyses. Entropy 2020, 22, 707. https://0-doi-org.brum.beds.ac.uk/10.3390/e22060707

AMA Style

Merhav N, Sason I. Some Useful Integral Representations for Information-Theoretic Analyses. Entropy. 2020; 22(6):707. https://0-doi-org.brum.beds.ac.uk/10.3390/e22060707

Chicago/Turabian Style

Merhav, Neri, and Igal Sason. 2020. "Some Useful Integral Representations for Information-Theoretic Analyses" Entropy 22, no. 6: 707. https://0-doi-org.brum.beds.ac.uk/10.3390/e22060707

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop