Article

Amplitude Constrained MIMO Channels: Properties of Optimal Input Distributions and Bounds on the Capacity †

1 Department of Electrical Engineering, Princeton University, Princeton, NJ 08544, USA
2 Department of Electrical Engineering, Technion-Israel Institute of Technology, Technion City, Haifa 32000, Israel
* Authors to whom correspondence should be addressed.
Parts of the material in this paper were presented at the 2017 IEEE Global Communications Conference (Singapore, 4–8 December 2017).
Submission received: 21 January 2019 / Revised: 6 February 2019 / Accepted: 13 February 2019 / Published: 19 February 2019
(This article belongs to the Special Issue Information Theory for Data Communications and Processing)

Abstract:
In this work, the capacity of multiple-input multiple-output channels that are subject to constraints on the support of the input is studied. The paper consists of two parts. The first part focuses on the general structure of capacity-achieving input distributions. Known results are surveyed and several new results are provided. With regard to the latter, it is shown that the support of a capacity-achieving input distribution is a small set in both a topological and a measure theoretical sense. Moreover, explicit conditions on the channel input space and the channel matrix are found such that the support of a capacity-achieving input distribution is concentrated on the boundary of the input space only. The second part of this paper surveys known bounds on the capacity and provides several novel upper and lower bounds for channels with arbitrary constraints on the support of the channel input symbols. As an immediate practical application, the special case of multiple-input multiple-output channels with amplitude constraints is considered. The bounds are shown to be within a constant gap to the capacity if the channel matrix is invertible and are tight in the high amplitude regime for arbitrary channel matrices. Moreover, in the regime of high amplitudes, it is shown that the capacity scales linearly with the minimum between the number of transmit and receive antennas, similar to the case of average power-constrained inputs.

1. Introduction

While the capacity of a multiple-input multiple-output (MIMO) channel with an average power constraint is well understood [1], surprisingly little is known about the capacity of the more practically relevant case in which the channel inputs are subject to amplitude constraints. Shannon was the first to consider a channel with an amplitude-constrained input [2]. In that paper, he derived corresponding upper and lower bounds and showed that, in the low-amplitude regime, the capacity behaves like that of a channel with an average power constraint. The next major contribution to this problem was a seminal paper of Smith [3] published in 1971. Smith showed that, for the single-input single-output (SISO) Gaussian noise channel with an amplitude-constrained input, the capacity-achieving inputs are discrete with finite support. In [4], this result is extended to peak-power-constrained quadrature Gaussian channels. Using the approach of Shamai [4], it is shown in [5] that the input distribution that achieves the capacity of a MIMO channel with an identity channel matrix and a Euclidean norm constraint on the input vector is discrete. Even though the optimal input distribution is known to be discrete, very little is known about the number or the optimal positions of the corresponding constellation points. To the best of our knowledge, the only case for which the input distribution is precisely known is considered in [6], where it is shown for the Gaussian SISO channel with an amplitude constraint that two point masses are optimal for amplitude values smaller than 1.665 and three point masses for amplitude values of up to 2.786. Finally, it has been shown very recently that the number of mass points in the support of the capacity-achieving input distribution of a SISO channel is of the order $O(A^2)$, with $A$ the amplitude constraint.
Based on a dual capacity expression, McKellips [7] derived an upper bound on the capacity of a SISO channel that is subject to an amplitude constraint. The bound is asymptotically tight, that is, as the amplitude tends to infinity. By cleverly choosing an auxiliary channel output distribution in the dual capacity expression, the authors of [8] sharpened McKellips' upper bound and extended it to parallel MIMO channels with a Euclidean norm constraint on the input. The SISO version of the upper bound in [8] has been further sharpened in [9] by yet another choice of auxiliary output distribution. In [10], asymptotic lower and upper bounds for a $2 \times 2$ MIMO channel are presented and the gap between the bounds is specified.
In this work, we make progress on this open problem by deriving several new upper and lower bounds that hold for channels with arbitrary constraints on the support of the channel input distribution and then apply them to the practically relevant special case of MIMO channels that are subject to amplitude constraints.

1.1. Contributions and Paper Organization

The remainder of the paper is organized as follows. The problem is stated in Section 2. In Section 3, we study properties of input distributions that achieve the capacity of input-constrained MIMO channels. The section reviews known results on the structure of optimal input distributions and presents several new results. In particular, Theorem 3 shows that the support of a capacity-achieving input distribution must necessarily be a small set both topologically and measure theoretically. Moreover, Theorem 8 characterizes conditions on the channel input space as well as on the channel matrix such that the support of the optimal input distribution is concentrated on the boundary of the channel input space.
In Section 4, we derive novel upper and lower bounds on the capacity of a MIMO channel that is subject to an arbitrary constraint on the support of the input. In particular, three families of upper bounds are proposed, which are based on: (i) the maximum entropy principle (see the bound in Theorem 9); (ii) the dual capacity characterization (see the bound in Theorem 10); and (iii) a relationship between mutual information and the minimum mean square error that is known as the I-MMSE relationship (see the bound in Theorem 11). On the other hand, Section 4 provides three different lower bounds. The first one is given in Theorem 12 and is based on the entropy power inequality. The second one (see Theorem 13) is based on a generalization of the celebrated Ozarow–Wyner bound [11] to the MIMO case. The third lower bound (see Theorem 14) is based on Jensen's inequality and depends on the characteristic function of the channel input distribution.
In Section 5, we evaluate the performance of our bounds by studying MIMO channels with invertible channel matrices. In particular, Theorem 17 states that our upper and lower bounds are within n log 2 ( ρ ) bits, where ρ is the packing efficiency and n the number of transmit and receive antennas. For diagonal channel matrices, it is then shown (see Theorem 18) that the Cartesian product of simple pulse-amplitude modulation (PAM) constellations achieves the capacity to within 1.64 n bits.
Section 6 is devoted to MIMO channels with arbitrary channel matrices. It is shown that, in the regime of high amplitudes, similar to the case of average power-constrained channel inputs, the capacity scales linearly with the minimum of the number of transmit and receive antennas.
In Section 7, our upper and lower bounds are applied to the SISO case, which are then compared with bounds known from the literature. Finally, Section 8 concludes the paper. Note that parts of the results in this paper were also published in [12].

1.2. Notation

Vectors are denoted as bold lowercase letters, random vectors as bold uppercase letters, and matrices as bold uppercase sans serif letters (e.g., $\mathbf{x}$, $\mathbf{X}$, $\mathsf{\mathbf{X}}$). For any deterministic vector $\mathbf{x} \in \mathbb{R}^n$, $n \in \mathbb{N}$, we denote the Euclidean norm of $\mathbf{x}$ by $\|\mathbf{x}\|$. For some random $\mathbf{X} \in \mathrm{supp}(\mathbf{X}) \subseteq \mathbb{R}^n$ and any $p > 0$, we define
$\|\mathbf{X}\|_p := \left(\frac{1}{n}\,\mathbb{E}\big[\|\mathbf{X}\|^p\big]\right)^{\frac{1}{p}},$ (1)
where $\mathrm{supp}(\mathbf{X})$ denotes the support of $\mathbf{X}$. Note that for $p \geq 1$, the quantity in Equation (1) defines a norm, and for $n = 1$ we simply have $\|X\|_p^p = \mathbb{E}[|X|^p]$.
The norm of a matrix $\mathbf{H} \in \mathbb{R}^{n \times n}$ is defined as
$\|\mathbf{H}\| := \sup_{\mathbf{x}:\, \mathbf{x} \neq \mathbf{0}} \frac{\|\mathbf{H}\mathbf{x}\|}{\|\mathbf{x}\|},$
whereas $\mathrm{Tr}(\mathbf{H})$ denotes its trace. The $n \times n$ identity matrix is represented as $\mathbf{I}_n$.
Let $\mathcal{S}$ be a subset of $\mathbb{R}^n$. Then,
$\mathrm{Vol}(\mathcal{S}) := \int_{\mathcal{S}} \mathrm{d}\mathbf{x}$
denotes its volume. Moreover, the boundary of $\mathcal{S}$ is denoted as $\partial\mathcal{S}$.
Let $\mathbb{R}_+ := \{x \in \mathbb{R}: x \geq 0\}$. We define an $n$-dimensional ball of radius $r \in \mathbb{R}_+$ centered at $\mathbf{x} \in \mathbb{R}^n$ as the set
$\mathcal{B}_{\mathbf{x}}(r) := \{\mathbf{y} \in \mathbb{R}^n: \|\mathbf{x} - \mathbf{y}\| \leq r\}.$
Recall that, for any $\mathbf{x} \in \mathbb{R}^n$ and $r \in \mathbb{R}_+$,
$\mathrm{Vol}\big(\mathcal{B}_{\mathbf{x}}(r)\big) = \frac{\pi^{\frac{n}{2}}}{\Gamma\left(\frac{n}{2}+1\right)}\, r^n,$
where $\Gamma(z)$ denotes the gamma function.
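As a quick numerical sanity check of the volume formula above, the following minimal Python sketch (standard library only) compares it against the familiar disk and 3-ball special cases.
```python
import math

def ball_volume(n: int, r: float) -> float:
    """Volume of an n-dimensional ball of radius r: pi^(n/2) / Gamma(n/2 + 1) * r^n."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1) * r ** n

# Sanity checks: area of the unit disk and volume of a 3-ball of radius 2.
assert abs(ball_volume(2, 1.0) - math.pi) < 1e-12
assert abs(ball_volume(3, 2.0) - 4 / 3 * math.pi * 2.0 ** 3) < 1e-12
```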
For any matrix $\mathbf{H} \in \mathbb{R}^{k \times n}$ and some $\mathcal{S} \subseteq \mathbb{R}^n$, we define
$\mathbf{H}\mathcal{S} := \{\mathbf{y} \in \mathbb{R}^k: \mathbf{y} = \mathbf{H}\mathbf{x},\ \mathbf{x} \in \mathcal{S}\}.$
Note that for an invertible $\mathbf{H} \in \mathbb{R}^{n \times n}$, we have
$\mathrm{Vol}(\mathbf{H}\mathcal{S}) = |\det(\mathbf{H})|\, \mathrm{Vol}(\mathcal{S})$
with $\det(\mathbf{H})$ the determinant of $\mathbf{H}$. We define the maximum and minimum radius of a set $\mathcal{S} \subseteq \mathbb{R}^n$ that contains the origin as
$r_{\max}(\mathcal{S}) := \min\{r \in \mathbb{R}_+: \mathcal{S} \subseteq \mathcal{B}_{\mathbf{0}}(r)\}, \qquad r_{\min}(\mathcal{S}) := \max\{r \in \mathbb{R}_+: \mathcal{B}_{\mathbf{0}}(r) \subseteq \mathcal{S}\}.$
For a given vector $\mathbf{a} = (A_1, \ldots, A_n) \in \mathbb{R}_+^n$, we define
$\mathrm{Box}(\mathbf{a}) := \{\mathbf{x} \in \mathbb{R}^n: |x_i| \leq A_i,\ i = 1, \ldots, n\}$
and the smallest box containing a given set $\mathcal{S} \subseteq \mathbb{R}^n$ as
$\mathrm{Box}(\mathcal{S}) := \inf\{\mathrm{Box}(\mathbf{a}): \mathcal{S} \subseteq \mathrm{Box}(\mathbf{a})\},$
respectively.
The entropy of any discrete random object $X$ is denoted as $H(X)$, whereas $h(X)$ (i.e., the differential entropy) is used whenever $X$ is continuous. The mutual information between two random objects $X$ and $Y$ is denoted as $I(X;Y)$, and $\mathcal{N}(\mathbf{m}, \mathbf{C})$ denotes the multivariate normal distribution with mean vector $\mathbf{m}$ and covariance matrix $\mathbf{C}$. Finally, $\log_a^+(x) := \max\{\log_a(x), 0\}$ for any base $a > 0$, $Q(x)$, $x \in \mathbb{R}$, denotes the Q-function, and $\delta_x(y)$ the Kronecker delta, which is one for $x = y$ and zero otherwise.

2. Problem Statement

Consider a MIMO system with $n_t \in \mathbb{N}$ transmit and $n_r \in \mathbb{N}$ receive antennas. The corresponding $n_r$-dimensional channel output for a single channel use is of the form
$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{Z},$
for some fixed channel matrix $\mathbf{H} \in \mathbb{R}^{n_r \times n_t}$ (considering a real-valued channel model is without loss of generality). Here and hereafter, we assume $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_{n_r})$ is independent of the channel input $\mathbf{X} \in \mathbb{R}^{n_t}$ and that $\mathbf{H}$ is known to both the transmitter and the receiver.
Now, let $\mathcal{X} \subseteq \mathbb{R}^{n_t}$ be a convex and compact channel input space that contains the origin (i.e., the length-$n_t$ zero vector) and let $F_X$ denote the cumulative distribution function of $\mathbf{X}$. As of the writing of this paper, the capacity
$C(\mathcal{X}, \mathbf{H}) := \max_{F_X:\, \mathbf{X} \in \mathcal{X}} I(\mathbf{X}; \mathbf{Y}) = \max_{F_X:\, \mathbf{X} \in \mathcal{X}} I(\mathbf{X}; \mathbf{H}\mathbf{X} + \mathbf{Z}),$ (2)
of this channel is unknown and we are interested in finding novel lower and upper bounds. Even though most of the results in this paper hold for arbitrary convex and compact $\mathcal{X}$, we are mainly interested in the two important special cases (a minimal membership-check sketch in code follows this list):
(i) per-antenna amplitude constraints, i.e., $\mathcal{X} = \mathrm{Box}(\mathbf{a})$ for some given $\mathbf{a} = (A_1, \ldots, A_{n_t}) \in \mathbb{R}_+^{n_t}$; and
(ii) $n_t$-dimensional amplitude constraint, i.e., $\mathcal{X} = \mathcal{B}_{\mathbf{0}}(A)$ for some given $A \in \mathbb{R}_+$.
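The two input spaces are easy to visualize as membership tests. The following sketch (NumPy assumed, for illustration only) checks whether a candidate input vector satisfies the per-antenna constraints or the $n_t$-dimensional amplitude constraint.
```python
import numpy as np

def in_box(x: np.ndarray, a: np.ndarray) -> bool:
    """Per-antenna amplitude constraints: |x_i| <= A_i for every antenna i."""
    return bool(np.all(np.abs(x) <= a))

def in_ball(x: np.ndarray, A: float) -> bool:
    """n_t-dimensional amplitude constraint: ||x|| <= A."""
    return bool(np.linalg.norm(x) <= A)

# A vector that satisfies the per-antenna constraints but violates the ball constraint of the same radius.
x = np.array([1.0, 1.0, 1.0])
print(in_box(x, np.array([1.0, 1.0, 1.0])), in_ball(x, 1.0))  # True False
```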
Remark 1.
Note that determining the capacity of a MIMO channel with average per-antenna power constraints is also still an open problem and has been solved for some special cases only [13,14,15,16,17].

3. Properties of an Optimal Input Distribution

Unlike the special cases of real and complex-valued SISO channels (i.e., $n_t = n_r = 1$), the structure of the capacity-achieving input distribution, denoted as $F_{X^*}$, is in general not known. To motivate why in this paper we seek novel upper and lower bounds on the capacity (Equation (2)) instead of trying to solve the optimization problem directly, in this section we first summarize properties that optimal input distributions must possess, which demonstrate how complicated the optimization problem actually is. Note that, whereas an optimal input distribution always exists, it does not necessarily need to be unique.

3.1. Necessary and Sufficient Conditions for Optimality

To study properties of an optimal input distribution, we need the notion of a point of increase of a probability distribution.
Definition 1.
(Points of Increase of a Distribution) A point $\mathbf{x} \in \mathbb{R}^n$, $n \in \mathbb{N}$, is said to be a point of increase of a given probability distribution $F_X$ if $F_X(\mathcal{A}) > 0$ for any open set $\mathcal{A} \subseteq \mathbb{R}^n$ containing $\mathbf{x}$.
The following result provides necessary and sufficient conditions for the optimality of a channel input distribution.
Theorem 1.
Let $F_X$ be some given channel input distribution and let $\mathcal{E}(F_X) \subseteq \mathcal{X}$ denote the set of points of increase of $F_X$. Then, the following holds:
  • $F_X$ is capacity-achieving if and only if the Karush–Kuhn–Tucker (KKT) conditions
    $h(\mathbf{x}; F_X) \leq h(\mathbf{H}\mathbf{X} + \mathbf{Z}), \quad \mathbf{x} \in \mathcal{X},$ (3a)
    $h(\mathbf{x}; F_X) = h(\mathbf{H}\mathbf{X} + \mathbf{Z}), \quad \mathbf{x} \in \mathcal{E}(F_X) \subseteq \mathcal{X},$ (3b)
    are satisfied [3,18], where
    $h(\mathbf{x}; F_X) := -\int_{\mathbb{R}^{n_r}} \frac{1}{(2\pi)^{\frac{n_r}{2}}}\, e^{-\frac{\|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2}{2}} \log_2 f_Y(\mathbf{y})\, \mathrm{d}\mathbf{y}$
    with $f_Y$ being the probability density of the channel output induced by the channel input $\mathbf{X} \sim F_X$.
  • $F_X$ is unique and symmetric if $\mathbf{H}$ is left invertible [18].
  • $F_Y$ (i.e., the channel output distribution) is unique [18,19].

3.2. General Structure of Capacity-Achieving Input Distributions

Theorem 1 can be used to find general properties of the support of a capacity-achieving input distribution, which we will do in this subsection.
Remark 2.
Fully characterizing an input distribution that achieves the capacity of a general MIMO channel with per-antenna or an $n_t$-dimensional amplitude constraint is still an open problem. To the best of our knowledge, the only general method available for showing that discrete channel inputs are optimal was developed by Smith in [3] for the amplitude and variance-constrained Gaussian SISO channel. Since then, it has also been used to characterize the optimal input distribution of several other SISO channels (see, for instance, [4,20,21,22,23,24]). The method relies on the following series of steps:
  • Towards a contradiction, it is assumed that the set of points of increase E ( F X ) is infinite.
  • The assumption in Step 1 is then used to establish a certain property of the function $h(\mathbf{x}; F_{X^*})$ on the input space $\mathcal{X}$. For example, Smith showed that $h(x; F_{X^*})$ has an analytic continuation to $\mathbb{C}$; by means of the Identity Theorem of complex analysis and the Bolzano–Weierstrass Theorem [25], he was then able to show that $h(x; F_{X^*})$ must be constant.
  • By using either the Fourier or the Laplace transform of $h(\mathbf{x}; F_{X^*})$ together with the property of $h(\mathbf{x}; F_{X^*})$ established in Step 2, a new property of the channel output distribution $F_Y$ is established. For example, Smith was able to show that $F_Y$ must be constant.
  • The conclusion of Step 3 is used to reach a contradiction. The contradiction implies that $\mathcal{E}(F_{X^*})$ must be finite. For example, to reach a contradiction, Smith used the fact that the channel output distribution $F_Y$ results from a convolution with a Gaussian probability density and can therefore not be constant.
Remark 3.
Under the restriction that the output space, Y , of a Gaussian SISO channel is finite and the channel input space, X , is subject to an amplitude constraint, Witsenhausen has shown in [26] that the capacity-achieving input distribution is discrete with the number of mass points bounded as | X | | Y | . The approach of Witsenhausen, however, does not use the variational technique of Smith and relies on arguments from convex analysis instead.
According to Remark 2, assuming in the MIMO case that E ( F X ) is of infinite cardinality does not help (or at least it is not clear how this assumption should be used) in showing that the capacity-achieving input distribution is discrete and finite. However, by using the weaker assumption that E ( F X ) contains a non-empty open subset in conjunction with the following version of the Identity Theorem, we can show that the support of the optimal input distribution is a small set in a certain topological sense.
Theorem 2.
(Identity Theorem for Real-Analytic Functions [27]) For some $n \in \mathbb{N}$, let $\mathcal{U}$ be a subset of $\mathbb{R}^n$ and let $f, g: \mathcal{U} \to \mathbb{R}$ be two real-analytic functions that agree on a set $\mathcal{A} \subseteq \mathcal{U}$. Then, $f$ and $g$ agree on $\mathcal{U}$ if one of the following two conditions is satisfied:
(i) $\mathcal{A}$ is an open set.
(ii) $\mathcal{A}$ is a set of positive Lebesgue measure.
Furthermore, for $n = 1$, it suffices for $\mathcal{A}$ to be an arbitrary set with an accumulation point.
We also need the definitions of a dense and a nowhere dense set.
Definition 2.
(Dense and Nowhere Dense Sets) A subset $\mathcal{A} \subseteq \mathcal{X}$ is said to be dense in the set $\mathcal{X}$ if every element $\mathbf{x} \in \mathcal{X}$ either belongs to $\mathcal{A}$ or is an accumulation point of $\mathcal{A}$. A subset $\mathcal{A} \subseteq \mathcal{X}$ is said to be nowhere dense if for every nonempty open subset $\mathcal{U} \subseteq \mathcal{X}$, the intersection $\mathcal{A} \cap \mathcal{U}$ is not dense in $\mathcal{U}$.
With Theorem 2 at our disposal, we are now able to prove the following result on the structure of the support of the optimal input distribution.
Theorem 3.
The set of points of increase $\mathcal{E}(F_{X^*})$ of an optimal input distribution $F_{X^*}$ is a nowhere dense subset of $\mathcal{X}$ that is of Lebesgue measure zero.
Proof. 
It is not difficult to show that h ( x ; F X ) is a real-analytic function on R n t ([18] Proposition 5). Now, in order to prove the result, we follow a series of steps similar to those outlined in Remark 2. Towards a contradiction, assume that the set of points of increase E ( F X ) of F X is not a nowhere dense subset of X . Then, according to Definition 2, there exists an open set U X such that E ( F X ) U is dense in U .
By using the KKT condition in Equation (3b), we have that h ( x ; F X ) is constant on the intersection E ( F X ) U . Thus, as E ( F X ) U is dense in U , it follows by the properties of continuous functions (real-analytic functions are continuous) that h ( x ; F X ) is also constant on U . Moreover, as U is an open set, Theorem 2 implies that h ( x ; F X ) must also be constant on R n t . This, however, leads to a contradiction as h ( x ; F X ) cannot be constant on all of R n t , which can be shown by taking the Fourier transform of h ( x ; F X ) and solving for the probability density f Y ( y ) of the channel output (the reader is referred to [3] for details). Therefore, we conclude that E ( F X ) is a nowhere dense subset of X .
Showing that $\mathcal{E}(F_{X^*})$ has Lebesgue measure zero follows along similar lines by assuming that $\mathcal{E}(F_{X^*})$ is a set of positive measure. Then, Property (ii) of Theorem 2 can be used to conclude that $h(\mathbf{x}; F_{X^*})$ must be constant on all of $\mathbb{R}^{n_t}$. This again leads to a contradiction, which implies that $\mathcal{E}(F_{X^*})$ must be of Lebesgue measure zero. □
Remark 4.
Note that if $\mathcal{X} = \mathcal{B}_{\mathbf{0}}(A)$ for some $A \in \mathbb{R}_+$ and $h(\mathbf{x}; F_{X^*})$ is orthogonally equivariant (i.e., it only depends on $\|\mathbf{x}\|$), then $\mathcal{E}(F_{X^*})$ can be written as a union of concentric spheres. That is,
$\mathcal{E}(F_{X^*}) = \bigcup_j \mathcal{C}(A_j)$ (4)
with $\mathcal{C}(A_j) := \{\mathbf{x} \in \mathbb{R}^{n_t}: \|\mathbf{x}\| = A_j\}$ for some $A_j \in \mathbb{R}_+$. To see this, let
$g(\mathbf{x}) := h(\mathbf{x}; F_{X^*}) - h(\mathbf{H}\mathbf{X}^* + \mathbf{Z})$
and observe that if $\mathbf{x} \in \mathcal{E}(F_{X^*})$, then $g(\mathbf{x}) = 0$. Combining this with the symmetry of the function $\mathbf{x} \mapsto g(\mathbf{x})$, we have that (we know that it is an abuse of notation to use the same letter for the functions $\mathbf{x} \mapsto g(\mathbf{x})$ and $\|\mathbf{x}\| \mapsto g(\|\mathbf{x}\|)$ even if they are different; it is an attempt to say in a compact way that $g$ is orthogonally equivariant)
$\|\mathbf{x}\| = \|\mathbf{y}\|:\ \mathbf{x} \in \mathcal{E}(F_{X^*}) \Leftrightarrow \mathbf{y} \in \mathcal{E}(F_{X^*}).$
Moreover, this implies that
$\mathcal{E}(F_{X^*}) = \bigcup_{j \in \mathcal{I}} \mathcal{C}(A_j),$
where $\mathcal{I}$ is possibly of infinite cardinality. In fact, $\mathcal{I}$ has finite cardinality. To see this, note that, if $g(\mathbf{x})$ is real-analytic, then so is $g(\|\mathbf{x}\|)$. However, as $\|\mathbf{x}\| \mapsto g(\|\mathbf{x}\|)$ is a non-zero real-analytic function on $\mathbb{R}$, it can have at most finitely many zeros on an interval.
As an example, consider the special case $n_r = n_t = n$ with $\mathbf{H} = \mathbf{I}_n$. Then, the union in Equation (4) implies that the cardinality of $\mathcal{E}(F_{X^*})$ is uncountable and that discrete inputs are in general not optimal. Therefore, Theorem 3 can generally not be improved in the sense that for $n > 1$, statements about the cardinality of $\mathcal{E}(F_{X^*})$ cannot be made. Note, however, that the magnitude $\|\mathbf{X}^*\|$ is discrete. An example of the corresponding optimal input distribution for the case of $n = 2$ is given in Figure 1.
Even though Theorem 3 does not allow us to conclude that the optimal input distribution of an arbitrary MIMO channel is discrete and finite, for the special case of a SISO channel we have the following partial result.
Theorem 4.
(Optimal Input Distribution of a SISO Channel [3,6,28]) For some fixed $h \in \mathbb{R}$ and $A \in \mathbb{R}_+$, consider the SISO channel $Y = hX + Z$ with input space $\mathcal{X} = [-A, A]$. Let $F_{X^*}$ be an input distribution that achieves the capacity, $C(\mathcal{X}, h)$, of that channel. Then, $F_{X^*}$ satisfies the following properties:
  • $F_{X^*}$ is unique.
  • $F_{X^*}$ is symmetric.
  • $F_{X^*}$ is discrete with the number of mass points being of the order $O(A^2)$.
  • $F_{X^*}$ contains probability mass points at $\{-A, A\}$.
Moreover, binary communication with mass points at $\{-A, A\}$ is optimal if and only if $A \leq \bar{A}$, where $\bar{A} \approx 1.665$.
Theorem 4 can now be used to also address the special cases of multiple-input single output (MISO) and single-input multiple output (SIMO) channels.
Theorem 5.
Let $Y = \mathbf{h}^T\mathbf{X} + Z$ be a MISO channel with channel matrix $\mathbf{h}^T \in \mathbb{R}^{1 \times n_t}$ and some optimal input $\mathbf{X}^* \in \mathcal{X} \subseteq \mathbb{R}^{n_t}$. Then, the distribution of $\mathbf{h}^T\mathbf{X}^*$ is discrete with finitely many mass points. On the other hand, let $\mathbf{Y} = \mathbf{h}X + \mathbf{Z}$ be a SIMO channel with channel matrix $\mathbf{h} \in \mathbb{R}^{n_r}$. Then, the optimal input $X^* \in \mathcal{X} \subseteq \mathbb{R}$ has a discrete distribution with finitely many mass points.
Proof. 
For the MISO case, the capacity can be expressed as
$C(\mathcal{X}, \mathbf{h}^T) = \max_{F_X:\, \mathbf{X} \in \mathcal{X}} I(\mathbf{X}; \mathbf{h}^T\mathbf{X} + Z) = \max_{F_X:\, \mathbf{X} \in \mathcal{X}} I(\mathbf{h}^T\mathbf{X}; \mathbf{h}^T\mathbf{X} + Z) = \max_{F_S:\, S \in \mathbf{h}^T\mathcal{X}} I(S; S + Z) = \max_{F_S:\, |S| \leq r_{\max}(\mathbf{h}^T\mathcal{X})} I(S; S + Z).$ (5)
Using Theorem 4, we have that the maximizing distribution $F_{S^*}$ in Equation (5), where $S := \mathbf{h}^T\mathbf{X}$, is discrete with finitely many mass points.
For the SIMO case, the channel input distribution is discrete as a SIMO channel can be transformed into a SISO channel. Thus, let $A \in \mathbb{R}_+$ be finite. Then, the capacity of the SIMO channel can be expressed as
$C(\mathcal{X}, \mathbf{h}) = \max_{F_X:\, |X| \leq A} I(X; \mathbf{h}X + \mathbf{Z}) = \max_{F_X:\, |X| \leq A} I(X; \|\mathbf{h}\|X + Z).$ (6)
Again, it follows from Theorem 4 that the mutual information in Equation (6) is maximized by a channel input distribution, $F_{X^*}$, that is discrete with finitely many mass points. This concludes the proof. □
Remark 5.
Note that in the MISO case, we do not claim $F_{X^*}$ to be discrete with finitely many points. To illustrate the difficulty, let $\mathbf{h}^T = [1, -1]$ so that
$\mathbf{h}^T\mathbf{X} = \mathbf{h}^T \begin{pmatrix} X_1 \\ X_2 \end{pmatrix} = X_1 - X_2.$
As $X_1$ and $X_2$ can be arbitrarily correlated, we cannot rule out cases in which $X_1 = X + D$ and $X_2 = X + 2D$, with $D$ a discrete random variable and $X$ of arbitrary distribution. Clearly, the distribution of $\mathbf{X}$ is not discrete.
Note that in general it can be shown that the capacity-achieving input distribution is discrete if the optimization problem in Equation (2) can be reformulated as an optimization over one dimensional distributions. This, for example, has been done in [5] for parallel channels with a total amplitude constraint.

3.3. Properties of Capacity-Achieving Input Distributions in the Small (But Not Vanishing) Amplitude Regime

In this subsection, we study properties of capacity-achieving input distribution in the regime of small amplitudes. To that end, we will need the notion of a subharmonic function.
Definition 3.
(Subharmonic Function) Let $f$ be a real-valued function that is twice continuously differentiable on an open set $\mathcal{G} \subseteq \mathbb{R}^n$. Then, $f$ is subharmonic if $\nabla^2 f \geq 0$ on $\mathcal{G}$, where $\nabla^2$ denotes the Laplacian (if $f$ is twice differentiable, its Laplacian is given by $\nabla^2 f(x_1, \ldots, x_n) = \sum_{i=1}^n \frac{\partial^2 f}{\partial x_i^2}$).
We use the following Theorem, which states that a subharmonic function always attains its maximum on the boundary of its domain.
Theorem 6.
(Maximum Principle of Subharmonic Functions [29]) Let G R n be a connected open set. If f : G R is subharmonic and attains a global maximum in the interior of G , then f is constant on G .
In addition to Theorem 6, we need the following result that has been proven in [30].
Lemma 1.
Let the likelihood function of the output of a MIMO channel be defined as
$\ell: \mathbb{R}^{n_r} \to \mathbb{R}, \qquad \ell(\mathbf{y}) = \frac{f_Y(\mathbf{y})}{\frac{1}{(2\pi)^{\frac{n_r}{2}}}\, e^{-\frac{\|\mathbf{y}\|^2}{2}}},$
and let $\mathbf{A}_{\mathbf{y}}$ denote the Hessian matrix of $\log_e \ell(\mathbf{y})$. Then, the Laplacian (or the trace of $\mathbf{A}_{\mathbf{y}}$) is given by
$\nabla^2 \log_e \ell(\mathbf{y}) = \mathrm{Tr}(\mathbf{A}_{\mathbf{y}}) = \mathbb{E}\big[\|\mathbf{H}\mathbf{X}\|^2 \mid \mathbf{Y} = \mathbf{y}\big] - \big\|\mathbb{E}[\mathbf{H}\mathbf{X} \mid \mathbf{Y} = \mathbf{y}]\big\|^2.$ (7)
Theorem 7.
Suppose that $r_{\max}^2(\mathbf{H}\mathcal{X}) \leq \log_2(e)$. Then, $\mathbf{x} \mapsto h(\mathbf{x}; F_X)$ is a subharmonic function for every $F_X$.
Proof. 
Let F X be arbitrary and observe that
h ( x ; F X ) = E log 2 f Y ( H x + Z ) = E ( H x + Z ) E log 2 1 ( 2 π ) n r 2 e H x + Z 2 2 = E ( H x + Z ) + E H x + Z 2 2 log 2 ( e ) + log 2 ( 2 π ) n r 2 = E ( H x + Z ) + H x 2 + n r 2 log 2 ( e ) + log 2 ( 2 π ) n r 2 .
With this expression in hand, the Laplacian of h ( x ; F X ) with respect to x can be bounded from below as follows:
2 h ( x ; F X ) = 2 E ( H x + Z ) + H x 2 + n r 2 log 2 ( e ) + log 2 ( 2 π ) n r 2 = E 2 ( H x + Z ) + 2 H x 2 2 log 2 ( e ) = ( a ) E Tr H H T A H x + Z + 2 H x 2 2 log 2 ( e ) = E Tr H H T A H x + Z + Tr H H T log 2 ( e ) ( b ) E Tr H H T Tr A H x + Z + Tr H H T log 2 ( e ) = Tr H H T E Tr A H x + Z + log ( e ) ( c ) Tr H H T r max 2 ( H X ) + log ( e ) ,
where ( a ) follows from Equation (7) and the chain rule for the Hessian; ( b ) from using the well-known inequality
Tr ( C D ) 2 Tr ( C ) 2 Tr ( D ) 2
that holds for C and D positive semi-definite; and ( c ) from using the inequality
Tr A y = E H X 2 | Y = y E [ H X | Y = y ] 2 E H X 2 | Y = y r max 2 ( H X ) .
Thus, according to the assumption that r max 2 ( H X ) log 2 ( e ) , the right-hand side of Equation (8) is nonnegative, which proves the result. □
Now, knowing that h ( x ; F X ) is a subharmonic function allows us to characterize the support of an optimal input distribution of a MIMO channel provided that the radius of the channel input space, X , is sufficiently small.
Theorem 8.
Let $F_{X^*}$ be a capacity-achieving input distribution and suppose $r_{\max}^2(\mathbf{H}\mathcal{X}) \leq \log_2(e)$. Then, $\mathcal{E}(F_{X^*}) \subseteq \partial\mathcal{X}$.
Proof. 
From the KKT conditions in Equation (3), we know that, if $\mathbf{x} \in \mathcal{E}(F_{X^*})$, then $\mathbf{x}$ is a maximizer of $h(\mathbf{x}; F_{X^*})$. According to Theorem 7, we also know that $h(\mathbf{x}; F_{X^*})$ is subharmonic. Hence, from the Maximum Principle of Subharmonic Functions (i.e., Theorem 6), it follows that $\mathcal{E}(F_{X^*}) \subseteq \partial\mathcal{X}$. □
Combining Theorem 8 with the observations made in Remark 4 leads to the following corollary.
Corollary 1.
Let $\mathcal{X} = \mathcal{B}_{\mathbf{0}}(A)$ and $A \leq \frac{\sqrt{\log_2(e)}}{\|\mathbf{H}\|}$. Then,
$\mathcal{E}(F_{X^*}) \subseteq \mathcal{C}(A),$
where $\mathcal{C}(A) := \{\mathbf{x} \in \mathbb{R}^n: \|\mathbf{x}\| = A\}$ denotes a sphere of radius $A$.
We conclude this section by noting that for the special case n t = n r = n with H = I n , the exact value of A such that E ( F X ) = C ( A ) has been characterized in terms of an integral equation in [31], which is approximately equal to 1.5 n .

4. Upper and Lower Bounds on the Capacity

The considerations in the previous section have shown that characterizing the structure of an optimal channel input distribution is a challenging question in itself that we could only partially answer. A full characterization, however, is a necessary prerequisite to narrow down the search space in Equation (2) to one that is tractable. Except for some special cases (i.e., special choices of $\mathcal{X}$), optimizing over the most general space of input distributions, which consists of all continuous $n_t$-dimensional probability distributions $F_X$ with $\mathbf{X} \in \mathcal{X}$, is prohibitive (note that Dytso et al. [32] summarized methods of how to optimize functionals over the space of probability distributions that are constrained in their support). Thus, up to the writing of this paper, there is little hope of being able to solve the problem in Equation (2) in full generality, so in this section we propose novel lower and upper bounds on the capacity $C(\mathcal{X}, \mathbf{H})$. Nevertheless, these bounds will allow us to better understand how the capacity of such MIMO channels behaves.
Towards this end, in Section 4.1, we provide four upper bounds. The first is based on an upper bound on the differential entropy of a random vector that is constrained in its $p$th moment, the second and third bounds are based on duality arguments, and the fourth on the relationship between mutual information and the minimum mean square error (MMSE), the I-MMSE relationship for short, known from [33]. The three lower bounds proposed in Section 4.2, on the other hand, are based on the celebrated entropy power inequality, a generalization of the Ozarow–Wyner capacity bound taken from [11], and on Jensen's inequality.

4.1. Upper Bounds

To establish our first upper bound on Equation (2), we need the following result ([11] Th. 1).
Lemma 2.
(Maximum Entropy Under pth Moment Constraint) Let $n \in \mathbb{N}$ and $p \in (0, \infty)$ be arbitrary. Then, for any $\mathbf{U} \in \mathbb{R}^n$ such that $h(\mathbf{U}) < \infty$ and $\|\mathbf{U}\|_p < \infty$, we have
$h(\mathbf{U}) \leq n \log_2\left(k_{n,p}\, n^{\frac{1}{p}}\, \|\mathbf{U}\|_p\right),$
where
$k_{n,p} := \sqrt{\pi}\, e^{\frac{1}{p}} \left(\frac{p}{n}\right)^{\frac{1}{p}} \frac{\Gamma\left(\frac{n}{p}+1\right)^{\frac{1}{n}}}{\Gamma\left(\frac{n}{2}+1\right)^{\frac{1}{n}}}.$
Theorem 9.
(Moment Upper Bound) For any channel input space $\mathcal{X}$ and any fixed channel matrix $\mathbf{H}$, we have
$C(\mathcal{X}, \mathbf{H}) \leq \bar{C}_{\mathrm{M}}(\mathcal{X}, \mathbf{H}) := \inf_{p > 0}\, n_r \log_2\left(\frac{k_{n_r,p}}{(2\pi e)^{\frac{1}{2}}}\, n_r^{\frac{1}{p}}\, \|\tilde{\mathbf{x}} + \mathbf{Z}\|_p\right),$
where $\tilde{\mathbf{x}} \in \mathbf{H}\mathcal{X}$ is chosen such that $\|\tilde{\mathbf{x}}\| = r_{\max}(\mathbf{H}\mathcal{X})$.
Proof. 
Expressing Equation (2) in terms of differential entropies results in
C ( X , H ) = max F X : X X h ( H X + Z ) h ( Z ) ( a ) max F X : X X n r log 2 k n r , p ( 2 π e ) 1 2 n r 1 p H X + Z p = ( b ) n r log 2 k n r , p ( 2 π e ) 1 2 n r 1 p max F X : X X H X + Z p ,
where ( a ) follows from Lemma 2 with the fact that h ( Z ) = n r 2 log 2 ( 2 π e ) ; and ( b ) from the monotonicity of the logarithm.
Now, notice that $\|\mathbf{H}\mathbf{X} + \mathbf{Z}\|_p$ is linear in $F_X$ and therefore attains its maximum at an extreme point of the set $\mathcal{F}_X := \{F_X: \mathbf{X} \in \mathcal{X}\}$ (i.e., the set of all cumulative distribution functions of $\mathbf{X}$). As a matter of fact [26], the extreme points of $\mathcal{F}_X$ are given by the set of degenerate distributions on $\mathcal{X}$; that is, $\{F_X(\mathbf{y}) = \delta_{\mathbf{x}}(\mathbf{y}),\ \mathbf{y} \in \mathcal{X}\}_{\mathbf{x} \in \mathcal{X}}$. This allows us to conclude
$\max_{F_X:\, \mathbf{X} \in \mathcal{X}} \|\mathbf{H}\mathbf{X} + \mathbf{Z}\|_p = \max_{\mathbf{x} \in \mathcal{X}} \|\mathbf{H}\mathbf{x} + \mathbf{Z}\|_p.$ (9)
Observe that the Euclidean norm is a convex function, which is therefore maximized at the boundary of the set $\mathbf{H}\mathcal{X}$. Combining this with Equation (9) and taking the infimum over $p > 0$ completes the proof. □
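For concreteness, the moment bound of Theorem 9 can be evaluated numerically by Monte Carlo integration over the noise. The sketch below (NumPy assumed; the grid over $p$ and the sample size are illustrative choices, not part of the theorem) estimates $\|\tilde{\mathbf{x}} + \mathbf{Z}\|_p$ for a point on the sphere of radius $r_{\max}(\mathbf{H}\mathcal{X})$ and minimizes the resulting bound over $p$.
```python
import numpy as np
from math import lgamma, pi, e, exp, sqrt

def k_np(n: int, p: float) -> float:
    """Constant k_{n,p} from Lemma 2 (maximum entropy under a pth-moment constraint)."""
    return sqrt(pi) * exp(1.0 / p) * (p / n) ** (1.0 / p) * exp((lgamma(n / p + 1) - lgamma(n / 2 + 1)) / n)

def moment_upper_bound(r_max: float, n_r: int, samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo evaluation of the moment upper bound of Theorem 9 (bits per channel use)."""
    rng = np.random.default_rng(seed)
    x_tilde = np.zeros(n_r)
    x_tilde[0] = r_max                                    # any point with ||x_tilde|| = r_max(H X) works
    norms = np.linalg.norm(x_tilde + rng.standard_normal((samples, n_r)), axis=1)
    best = np.inf
    for p in np.linspace(0.2, 6.0, 59):
        norm_p = (np.mean(norms ** p) / n_r) ** (1.0 / p)  # ||x_tilde + Z||_p as defined in Equation (1)
        bound = n_r * np.log2(k_np(n_r, p) / sqrt(2 * pi * e) * n_r ** (1.0 / p) * norm_p)
        best = min(best, bound)
    return best

print(moment_upper_bound(r_max=3.0, n_r=2))
```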
The following Theorem provides two alternative upper bounds that are based on duality arguments.
Theorem 10.
(Duality Upper Bounds) For any channel input space $\mathcal{X}$ and any fixed channel matrix $\mathbf{H}$,
$C(\mathcal{X}, \mathbf{H}) \leq \bar{C}_{\mathrm{Dual},1}(\mathcal{X}, \mathbf{H}) := \log_2\left(c_{n_r}(d) + \frac{\mathrm{Vol}\big(\mathcal{B}_{\mathbf{0}}(d)\big)}{(2\pi e)^{\frac{n_r}{2}}}\right),$ (10)
where
d : = r max ( H X ) , c n r ( d ) : = i = 1 n r 1 n r 1 i Γ n r 1 2 2 n r 2 Γ n r 2 d i ,
and
$C(\mathcal{X}, \mathbf{H}) \leq \bar{C}_{\mathrm{Dual},2}(\mathcal{X}, \mathbf{H}) := \sum_{i=1}^{n_r} \log_2\left(1 + \frac{2A_i}{\sqrt{2\pi e}}\right),$ (11)
where $\mathbf{a} = (A_1, \ldots, A_{n_r})$ is such that $\mathrm{Box}(\mathbf{a}) = \mathrm{Box}(\mathbf{H}\mathcal{X})$.
Proof. 
Using duality bounds, it has been shown in [8] that for any centered n-dimensional ball of radius r R +
max F X : X B 0 ( r ) I ( X ; X + Z ) log 2 c n ( r ) + Vol B 0 ( r ) ( 2 π e ) n 2 ,
where c n ( r ) : = i = 1 n 1 n 1 i Γ n 1 2 2 n 2 Γ n 2 r i .
Now, observe that
C ( X , H ) = max F X : X X h ( H X + Z ) h ( H X + Z | H X ) = max F X : X X I ( H X ; H X + Z ) = max F X ˜ : X ˜ H X I ( X ˜ ; X ˜ + Z ) ( a ) max F X ˜ : X ˜ B 0 ( d ) , d : = r max ( H X ) I ( X ˜ ; X ˜ + Z ) ( b ) log 2 c n r ( d ) + Vol B 0 ( d ) ( 2 π e ) n r 2 .
where ( a ) follows from enlarging the optimization domain; and ( b ) from using the upper bound in Equation (12). This proves Equation (10).
To show the upper bound in Equation (11), we proceed with an alternative upper bound to Equation (13):
C ( X , H ) = max F X ˜ : X ˜ H X I ( X ˜ ; X ˜ + Z ) ( a ) max F X ˜ : X ˜ Box ( H X ) I ( X ˜ ; X ˜ + Z ) ( b ) max F X ˜ : X ˜ Box ( H X ) i = 1 n r I ( X ˜ i ; X ˜ i + Z i ) = ( c ) i = 1 n r max F X ˜ i : | X ˜ i | A i I ( X ˜ i ; X ˜ i + Z i ) ( d ) i = 1 n r log 2 1 + 2 A i 2 π e ,
where the (in)equalities follow from: ( a ) enlarging the optimization domain; ( b ) single-letterizing the mutual information; ( c ) choosing individual amplitude constraints ( A 1 , , A n r ) a R + n r such that Box ( a ) = Box ( H X ) ; and ( d ) using the upper bound in Equation (12) for n = 1 . This concludes the proof. □
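For per-antenna amplitude constraints, the bound in Equation (11) is straightforward to evaluate: for $\mathcal{X} = \mathrm{Box}(\mathbf{a})$, the smallest box containing $\mathbf{H}\mathcal{X}$ has half-widths $\sum_j |H_{ij}| A_j$. A minimal sketch (NumPy assumed):
```python
import numpy as np

def dual_upper_bound_2(H: np.ndarray, a: np.ndarray) -> float:
    """Evaluates the duality bound of Equation (11) for the per-antenna constraint X = Box(a)."""
    half_widths = np.abs(H) @ a           # half-widths of Box(H Box(a)), component-wise
    return float(np.sum(np.log2(1.0 + 2.0 * half_widths / np.sqrt(2 * np.pi * np.e))))

H = np.array([[1.0, 0.5],
              [0.3, 1.0]])
print(dual_upper_bound_2(H, np.array([2.0, 2.0])))   # bits per channel use
```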
As mentioned at the beginning of the section, another simple technique for deriving upper bounds on the capacity is to use the I-MMSE relationship [33]
$I(\mathbf{X}; \mathbf{X} + \mathbf{Z}) = \frac{\log_2(e)}{2} \int_0^1 \mathbb{E}\Big[\big\|\mathbf{X} - \mathbb{E}[\mathbf{X} \mid \sqrt{\gamma}\,\mathbf{X} + \mathbf{Z}]\big\|^2\Big]\, \mathrm{d}\gamma.$ (14)
For any $\gamma \geq 0$, the quantity $\mathbb{E}\big[\|\mathbf{X} - \mathbb{E}[\mathbf{X} \mid \sqrt{\gamma}\,\mathbf{X} + \mathbf{Z}]\|^2\big]$ is known as the MMSE of estimating $\mathbf{X}$ from the noisy observation $\sqrt{\gamma}\,\mathbf{X} + \mathbf{Z}$. An important fact that will be useful is that the conditional expected value $\mathbb{E}[\mathbf{X} \mid \mathbf{Y}]$ is the best estimator in the sense that it minimizes the mean square error over all measurable functions $f: \mathbb{R}^{n_r} \to \mathbb{R}^{n_t}$; that is, for any $\mathbf{Y} \in \mathbb{R}^{n_r}$ and $\mathbf{X} \in \mathbb{R}^{n_t}$,
$\mathbb{E}\big[\|\mathbf{X} - \mathbb{E}[\mathbf{X} \mid \mathbf{Y}]\|^2\big] = \inf_{f\ \mathrm{measurable}} \mathbb{E}\big[\|\mathbf{X} - f(\mathbf{Y})\|^2\big].$ (15)
Theorem 11.
(I-MMSE Upper Bound) For any channel input space $\mathcal{X}$ and any fixed channel matrix $\mathbf{H}$,
$C(\mathcal{X}, \mathbf{H}) \leq \bar{C}_{\mathrm{I\text{-}MMSE}}(\mathcal{X}, \mathbf{H}) = \begin{cases} \log_2(e)\,\frac{n_r}{2} + \frac{n_r}{2}\log_2\frac{r_{\max}^2(\mathbf{H}\mathcal{X})}{n_r}, & r_{\max}^2(\mathbf{H}\mathcal{X}) \geq n_r, \\ \log_2(e)\,\frac{r_{\max}^2(\mathbf{H}\mathcal{X})}{2}, & r_{\max}^2(\mathbf{H}\mathcal{X}) \leq n_r. \end{cases}$
Proof. 
Fix some ϵ [ 0 , 1 ] and observe that
2 log 2 ( e ) I ( X ; H X + Z ) = ( a ) 2 log 2 ( e ) I ( H X ; H X + Z ) = ( b ) 0 1 E H X E [ H X | γ H X + Z ] 2 d γ = 0 ϵ E H X E [ H X | γ H X + Z ] 2 d γ + ϵ 1 E H X E [ H X | γ H X + Z ] 2 d γ ( c ) 0 ϵ E H X 0 2 d γ + ϵ 1 E H X 1 γ ( γ H X + Z ) 2 d γ = ϵ E H X 2 + ϵ 1 1 γ E Z 2 d γ = ϵ E H X 2 + E Z 2 log e 1 ϵ = ϵ E H X 2 + n r log e 1 ϵ ,
where the (in)equalities follow from: ( a ) using that I ( H X ; H X + Z ) = I ( X ; H X + Z ) for any fixed H ; ( b ) using the I-MMSE relationship in Equation (14); and ( c ) using the property that conditional expectation minimizes mean square error (i.e., (15)).
Now, notice that
max F X : X X I ( X ; H X + Z ) max F X : X X log 2 ( e ) 2 ϵ E H X 2 + n r log e 1 ϵ = ( a ) 1 2 ϵ max x ˜ X H x ˜ 2 + n r log e 1 ϵ = ( b ) 1 2 ϵ r max 2 ( H X ) + n r log e 1 ϵ ,
where ( a ) follows from max F X : X X E H X 2 = max x ˜ X H x ˜ 2 (the same argument was used in the proof of Theorem 9); and ( b ) from the definition of r max 2 ( H X ) .
Since $\epsilon$ is arbitrary, we can choose it to minimize the upper bound in Equation (16). Towards this end, we need the following optimization result,
$\min_{\epsilon \in [0,1]} \left(\epsilon a + b \log\frac{1}{\epsilon}\right) = \begin{cases} b + b\log\frac{a}{b}, & a \geq b, \\ a, & a \leq b, \end{cases}$ (17)
which is easy to show. Combining Equation (16) with Equation (17), we obtain the following upper bound on the capacity:
$\max_{F_X:\, \mathbf{X} \in \mathcal{X}} I(\mathbf{X}; \mathbf{H}\mathbf{X} + \mathbf{Z}) \leq \log_2(e) \cdot \begin{cases} \frac{n_r}{2} + \frac{n_r}{2}\log_e\frac{r_{\max}^2(\mathbf{H}\mathcal{X})}{n_r}, & r_{\max}^2(\mathbf{H}\mathcal{X}) \geq n_r, \\ \frac{r_{\max}^2(\mathbf{H}\mathcal{X})}{2}, & r_{\max}^2(\mathbf{H}\mathcal{X}) \leq n_r. \end{cases}$
This concludes the proof. □
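The resulting bound is a simple closed-form expression; the following sketch (NumPy used only for the logarithms, values assumed for illustration) evaluates the piecewise formula of Theorem 11.
```python
import numpy as np

def immse_upper_bound(r_max_sq: float, n_r: int) -> float:
    """Piecewise I-MMSE upper bound of Theorem 11 in bits, with r_max_sq = r_max^2(H X)."""
    if r_max_sq >= n_r:
        return np.log2(np.e) * n_r / 2 + n_r / 2 * np.log2(r_max_sq / n_r)
    return np.log2(np.e) * r_max_sq / 2

# Example: n_r = 2 receive antennas and r_max(H X) = 3.
print(immse_upper_bound(r_max_sq=9.0, n_r=2))
```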
Corollary 2.
For any channel input space $\mathcal{X}$ and any fixed channel matrix $\mathbf{H}$,
$\bar{C}_{\mathrm{I\text{-}MMSE}}(\mathcal{X}, \mathbf{H}) \leq \begin{cases} \log_2(e)\,\frac{n_r}{2} + \frac{n_r}{2}\log_2\frac{\|\mathbf{H}\|^2 r_{\max}^2(\mathcal{X})}{n_r}, & \|\mathbf{H}\|^2 r_{\max}^2(\mathcal{X}) \geq n_r, \\ \log_2(e)\,\frac{\|\mathbf{H}\|^2 r_{\max}^2(\mathcal{X})}{2}, & \|\mathbf{H}\|^2 r_{\max}^2(\mathcal{X}) \leq n_r. \end{cases}$
Proof. 
The corollary follows by upper bounding Equation (16) using the fact that $r_{\max}^2(\mathbf{H}\mathcal{X}) \leq \|\mathbf{H}\|^2 r_{\max}^2(\mathcal{X})$. □
Remark 6.
In the proof of Theorem 11, instead of using the sub-optimal estimators $f(\mathbf{Y}) = \mathbf{0}$ and $f(\mathbf{Y}) = \frac{1}{\sqrt{\gamma}}\mathbf{Y}$, we could have used an optimal linear estimator of the form $f(\mathbf{Y}) = \mathbf{K}_{XY}\mathbf{K}_Y^{-1}\mathbf{Y}$, where $\mathbf{K}_{XY}$ denotes the cross-covariance matrix between $\mathbf{X}$ and $\mathbf{Y}$ and $\mathbf{K}_Y$ the covariance matrix of $\mathbf{Y}$. This choice would result in the capacity upper bound
$C(\mathcal{X}, \mathbf{H}) \leq \max_{\mathbf{K}_X:\, \mathbf{X} \in \mathcal{X}} \frac{1}{2}\log_2\det\left(\mathbf{I}_{n_r} + \mathbf{H}\mathbf{K}_X\mathbf{H}^T\right)$ (18)
with $\mathbf{K}_X$ the covariance matrix of the channel input. While Equation (18) is a valid upper bound, as of the writing of this paper, it is not clear how to perform an optimization over covariance matrices of random variables with bounded support. One possibility to avoid this is to use the inequality between arithmetic and geometric mean and bound the determinant by the trace:
$\det\left(\mathbf{I}_{n_r} + \mathbf{H}\mathbf{K}_X\mathbf{H}^T\right) \leq \left(\frac{\mathrm{Tr}\left(\mathbf{I}_{n_r} + \mathbf{H}\mathbf{K}_X\mathbf{H}^T\right)}{n_r}\right)^{n_r} = \|\mathbf{Z} + \mathbf{H}\mathbf{X}\|_2^{2n_r}.$ (19)
However, combining Equation (19) with Equation (18) is merely a special case of the moment upper bound of Theorem 9 for $p = 2$. Therefore, the estimators in Theorem 11 are chosen to obtain a non-trivial upper bound avoiding the optimization over covariance matrices.
In Section 5, we present a comparison of the upper bounds of Theorems 9–11 by means of a simple example.

4.2. Lower Bounds

A classical approach to bound a mutual information from below is to use the entropy power inequality (EPI).
Theorem 12.
(EPI Lower Bounds) For any fixed channel matrix $\mathbf{H}$ and any channel input space $\mathcal{X}$ with $\mathbf{X}$ absolutely continuous, we have
$C(\mathcal{X}, \mathbf{H}) \geq \underline{C}_{\mathrm{EPI}}(\mathcal{X}, \mathbf{H}) := \max_{F_X:\, \mathbf{X} \in \mathcal{X}} \frac{n_r}{2}\log_2\left(1 + \frac{2^{\frac{2}{n_r}h(\mathbf{H}\mathbf{X})}}{2\pi e}\right).$ (20)
Moreover, if $n_t = n_r = n$, $\mathbf{H} \in \mathbb{R}^{n \times n}$ is invertible, and $\mathbf{X}$ is uniformly distributed over $\mathcal{X}$, then
$C(\mathcal{X}, \mathbf{H}) \geq \underline{C}_{\mathrm{EPI}}(\mathcal{X}, \mathbf{H}) := \frac{n}{2}\log_2\left(1 + \frac{|\det(\mathbf{H})|^{\frac{2}{n}}\, \mathrm{Vol}(\mathcal{X})^{\frac{2}{n}}}{2\pi e}\right).$ (21)
Proof. 
By means of the EPI,
$2^{\frac{2}{n_r}h(\mathbf{H}\mathbf{X} + \mathbf{Z})} \geq 2^{\frac{2}{n_r}h(\mathbf{H}\mathbf{X})} + 2^{\frac{2}{n_r}h(\mathbf{Z})},$
we conclude
$2^{\frac{2}{n_r}C(\mathcal{X}, \mathbf{H})} \geq 1 + (2\pi e)^{-1}\, 2^{\frac{2}{n_r}\max_{F_X:\, \mathbf{X} \in \mathcal{X}} h(\mathbf{H}\mathbf{X})},$
which finalizes the proof of the lower bound in Equation (20).
To show the lower bound in Equation (21), all we need is to recall that
$h(\mathbf{H}\mathbf{X}) = h(\mathbf{X}) + \log_2|\det(\mathbf{H})|,$
which is maximized for $\mathbf{X}$ uniformly distributed over $\mathcal{X}$. However, if $\mathbf{X}$ is uniformly drawn from $\mathcal{X}$, we have
$2^{\frac{2}{n}h(\mathbf{H}\mathbf{X})} = \mathrm{Vol}(\mathbf{H}\mathcal{X})^{\frac{2}{n}} = |\det(\mathbf{H})|^{\frac{2}{n}}\, \mathrm{Vol}(\mathcal{X})^{\frac{2}{n}},$
which completes the proof. □
which completes the proof. □
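The bound in Equation (21) only requires the determinant of the channel matrix and the volume of the input space. A minimal sketch (NumPy assumed; channel and amplitudes chosen for illustration):
```python
import numpy as np

def epi_lower_bound(H: np.ndarray, vol_X: float) -> float:
    """EPI lower bound of Equation (21) for invertible H and X uniform on an input space of volume vol_X."""
    n = H.shape[0]
    det_term = np.abs(np.linalg.det(H)) ** (2.0 / n)
    return n / 2 * np.log2(1.0 + det_term * vol_X ** (2.0 / n) / (2 * np.pi * np.e))

# Example: 2x2 channel with per-antenna amplitudes A_1 = A_2 = 2, so Vol(Box(a)) = (2*A_1)*(2*A_2).
H = np.array([[1.0, 0.5],
              [0.3, 1.0]])
print(epi_lower_bound(H, vol_X=(2 * 2.0) * (2 * 2.0)))
```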
The considerations in Section 3 suggest that a channel input distribution that maximizes Equation (2) might be discrete. Therefore, there is a need for lower bounds that unlike the bounds in Theorem 12 rely on discrete inputs.
Theorem 13.
(Ozarow–Wyner Type Lower Bound) Let $\mathbf{X}_D \in \mathrm{supp}(\mathbf{X}_D) \subseteq \mathcal{X} \subseteq \mathbb{R}^{n_t}$ be a discrete random vector of finite entropy, $g: \mathbb{R}^{n_r} \to \mathbb{R}^{n_t}$ a measurable function, and $p > 0$. Furthermore, let $\mathbb{K}_p$ be a set of continuous random vectors, independent of $\mathbf{X}_D$, such that for every $\mathbf{U} \in \mathbb{K}_p$ we have $h(\mathbf{U}) < \infty$, $\|\mathbf{U}\|_p < \infty$, and
$\mathrm{supp}(\mathbf{U} + \mathbf{x}_i) \cap \mathrm{supp}(\mathbf{U} + \mathbf{x}_j) = \emptyset$ (22)
for all $\mathbf{x}_i, \mathbf{x}_j \in \mathrm{supp}(\mathbf{X}_D)$, $i \neq j$. Then,
$C(\mathcal{X}, \mathbf{H}) \geq \underline{C}_{\mathrm{OW}}(\mathcal{X}, \mathbf{H}) := \big[H(\mathbf{X}_D) - \mathrm{gap}\big]^+,$
where
$\mathrm{gap} := \inf_{\substack{\mathbf{U} \in \mathbb{K}_p,\ g\ \mathrm{measurable},\ p > 0}} \Big(G_{1,p}(\mathbf{U}, \mathbf{X}_D, g) + G_{2,p}(\mathbf{U})\Big)$
with
$G_{1,p}(\mathbf{U}, \mathbf{X}_D, g) := n_t \log_2\left(\frac{\|\mathbf{U} + \mathbf{X}_D - g(\mathbf{Y})\|_p}{\|\mathbf{U}\|_p}\right),$ (23)
$G_{2,p}(\mathbf{U}) := n_t \log_2\left(\frac{k_{n_t,p}\, n_t^{\frac{1}{p}}\, \|\mathbf{U}\|_p}{2^{\frac{1}{n_t}h(\mathbf{U})}}\right),$ (24)
and $k_{n_t,p}$ as defined in Lemma 2, respectively.
Proof. 
The proof is identical to ([11] Theorem 2). To make the manuscript more self-contained, we repeat it here.
Let U and X D be statistically independent. Then, the mutual information I ( X D ; Y ) can be lower bounded as
I ( X D ; Y ) ( a ) I ( X D + U ; Y ) = h ( X D + U ) h ( X D + U | Y ) = ( b ) H ( X D ) + h ( U ) h ( X D + U | Y ) .
Here, ( a ) follows from the data processing inequality as X D + U X D Y forms a Markov chain in that order; and ( b ) from the assumption in Equation (22). By using Lemma 2, we have that the last term in Equation (25) can be bounded from above as
h ( X D + U | Y ) n t log 2 k n t , p n t 1 p X D + U g ( Y ) p .
Combining this expression with Equation (25) results in
I ( X D ; Y ) H ( X D ) G 1 , p ( U , X D , g ) + G 2 , p ( U ) ,
with G 1 , p and G 2 , p as defined in Equations (23) and (24), respectively. Maximizing the right-hand side over all U K p , measurable functions g : R n r R n t , and p > 0 provides the bound. □
Interestingly, the bound of Theorem 13 holds for arbitrary channels and is therefore not restricted to MIMO channels. The interested reader is referred to [11] for details.
We conclude the section by providing a lower bound that is based on Jensen’s inequality and holds for arbitrary inputs.
Theorem 14.
(Jensen’s Inequality Lower Bound) For any channel input space X and any fixed channel matrix H , we have
C ( X , H ) C ̲ Jensen ( X , H ) : = max F X : X X log 2 + 2 e n r 2 ψ 1 ( X , H )
with
ψ ( X , H ) : = E [ exp H ( X X ) 2 4 ] = E ϕ X H T Z 2 2 ,
where X is an independent copy of X and ϕ X denotes the characteristic function of X .
Proof. 
To show the lower bound, we follow an approach of Dytso et al. [34]. Note that by Jensen’s inequality
h ( Y ) = E [ log 2 f Y ( Y ) ] log 2 E [ f Y ( Y ) ] = log 2 R n r f Y ( y ) f Y ( y ) d y .
Now, evaluating the integral in Equation (27) results in
R n r f Y ( y ) f Y ( y ) d y = 1 ( 2 π ) n r R n r E e y H X 2 2 E e y H X 2 2 d y = ( a ) 1 ( 2 π ) n r E R n r e y H X 2 + y H X 2 2 d y = ( b ) 1 ( 2 π ) n r E e H X H X 2 4 R n r e y H ( X X ) 2 2 d y = ( c ) 1 2 n r π n r 2 E e H ( X X ) 2 4 ,
where ( a ) follows from the independence of X and X and Tonelli’s Theorem ([35] Chapter 5.9); ( b ) from completing a square; and ( c ) from the fact that R n r e y H ( X X ) 2 2 d y = R n r e y 2 d y = π n r 2 . Combining Equation (27) with Equation (28) and subtracting h ( Z ) = n r 2 log 2 ( 2 π e ) completes the proof of the first version of the bound.
To show the second version, observe that
E e H ( X X ) 2 4 = ( d ) E ϕ H ( X X ) 2 ( Z ) = ( e ) E ϕ H X 2 ( Z ) ϕ H X 2 ( Z ) = ( f ) E ϕ X H T Z 2 ϕ X H T Z 2 = ( g ) E ϕ X H T Z 2 ϕ X H T Z 2 = ( h ) E ϕ X H T Z 2 ϕ X * H T Z 2 = E ϕ X H T Z 2 2 ,
where ( d ) follows from Parseval’s identity ([35] Chapter 9.5) by noting that exp ( · 2 / 2 ) is a characteristic function of Z and ϕ H ( X X ) 2 ( · ) is a characteristic function of H ( X X ) 2 ; ( e ) from using the property that the characteristic function of a sum of random vectors is equal to the product of its characteristic functions; ( f ) from using the fact that a characteristic function is a linear transformation; ( g ) from using that X and X have the same characteristic function; and ( h ) from the fact that the characteristic function is Hermitian. This completes the proof. □
Remark 7.
As is evident from our examples in the following sections, in many cases the Jensen's inequality lower bound of Theorem 14 performs remarkably well. The bound, however, is also useful for MIMO channels that are subject to an average power constraint. For example, evaluating Equation (26) with $\mathbf{X} \sim \mathcal{N}(\mathbf{0}, \mathbf{K}_X)$ results in
$I(\mathbf{X}; \mathbf{H}\mathbf{X} + \mathbf{Z}) \geq \frac{1}{2}\log_2^+\left(\left(\frac{2}{e}\right)^{\min(n_r, n_t)} \det\left(\mathbf{I}_{n_r} + \mathbf{H}\mathbf{K}_X\mathbf{H}^T\right)\right).$
Note that this bound is within $\frac{\min(n_r, n_t)}{2}\log_2\frac{e}{2}$ bits of the capacity of the power-constrained channel.
In Section 3, we discussed that the distributions that maximize the mutual information in $n_t$ dimensions are typically singular, which means that they are concentrated on a set of Lebesgue measure zero. Singular distributions generally do not have a probability density, whereas the characteristic function always exists. This is why the version of the Jensen's inequality lower bound in Theorem 14 that is based on the characteristic function of the channel input is especially useful for amplitude-constrained MIMO channels.

5. Invertible Channel Matrices

Consider the symmetric case of n t = n r = n antennas with H R n × n being invertible. In this section, we evaluate some of the lower and upper bounds proposed in the previous section for the special case of H being also diagonal and then characterize the gap to the capacity for arbitrary invertible channel matrices.

5.1. Diagonal Channel Matrices

Suppose the channel inputs are subject to per-antenna or an n-dimensional amplitude constraint. Then, the duality upper bound C ¯ Dual , 2 ( X , H ) of Theorem 10 takes on the following form.
Theorem 15.
(Upper Bounds) Let $\mathbf{H} = \mathrm{diag}(h_{11}, \ldots, h_{nn}) \in \mathbb{R}^{n \times n}$ be fixed. If $\mathcal{X} = \mathrm{Box}(\mathbf{a})$ for some $\mathbf{a} = (A_1, \ldots, A_n) \in \mathbb{R}_+^n$, then
$\bar{C}_{\mathrm{Dual},2}(\mathrm{Box}(\mathbf{a}), \mathbf{H}) = \sum_{i=1}^n \log_2\left(1 + \frac{2|h_{ii}|A_i}{\sqrt{2\pi e}}\right).$ (29)
Moreover, if X = B 0 ( A ) for some A R + , then
C ¯ Dual , 2 ( B 0 ( A ) , H ) = i = 1 n log 2 1 + 2 | h i i | A n 2 π e .
Proof. 
The bound in Equation (29) immediately follows from Theorem 10 by observing that Box ( H Box ( a ) ) = Box ( H a ) . The bound in Equation (30) follows from Theorem 10 by the fact that
Box H B 0 ( A ) Box H Box B 0 ( A ) = Box ( h ) ,
where h : = A n ( | h 11 | , , | h n n | ) . This concludes the proof. □
For an arbitrary channel input space X , the EPI lower bound of Theorem 12 and Jensen’s inequality lower bound of Theorem 14 take on the following form.
Theorem 16.
(Lower Bounds) Let $\mathbf{H} = \mathrm{diag}(h_{11}, \ldots, h_{nn}) \in \mathbb{R}^{n \times n}$ be fixed and $\mathcal{X}$ arbitrary. Then,
$\underline{C}_{\mathrm{Jensen}}(\mathcal{X}, \mathbf{H}) = \log_2^+\left(\left(\frac{2}{e}\right)^{\frac{n}{2}} \frac{1}{\psi(\mathbf{H}, \mathbf{b}^*)}\right)$ (31)
with
$\psi(\mathbf{H}, \mathbf{b}^*) := \min_{\mathbf{b} \in \mathcal{X}} \prod_{i=1}^n \varphi(|h_{ii}|B_i),$
where $\mathbf{b} := (B_1, \ldots, B_n)$ and $\varphi: \mathbb{R}_+ \to \mathbb{R}_+$,
$\varphi(x) := \frac{1}{x^2}\left(e^{-x^2} - 1 + \sqrt{\pi}\, x\left(1 - 2Q(\sqrt{2}\,x)\right)\right).$ (32)
Moreover,
$\underline{C}_{\mathrm{EPI}}(\mathcal{X}, \mathbf{H}) = \frac{n}{2}\log_2\left(1 + \frac{\mathrm{Vol}(\mathcal{X})^{\frac{2}{n}}\prod_{i=1}^n |h_{ii}|^{\frac{2}{n}}}{2\pi e}\right).$ (33)
Proof. 
For some given values B i R + , i = 1 , , n , let the ith component of X = ( X 1 , , X n ) be independent and uniformly distributed over the interval [ B i , B i ] . Thus, the expected value appearing in the bound of Theorem 14 can be written as
E e H ( X X ) 2 4 = E e i = 1 n h i i 2 ( X i X i ) 2 4 = E i = 1 n e h i i 2 ( X i X i ) 2 4 = i = 1 n E e h i i 2 ( X i X i ) 2 4 .
Now, if X is an independent copy of X , it can be shown that the expected value at the right-hand side of Equation (34) is of the explicit form
E e h i i 2 ( x i x i ) 2 4 = φ ( | h i i | B i )
with φ as defined in Equation (32). Finally, optimizing over all b = ( B 1 , , B n ) X results in the bound (31). The bound in Equation (33) follows by inserting | det ( H ) | = i = 1 n h i i into Equation (21), which concludes the proof. □
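Both lower bounds of Theorem 16 are explicit. The following sketch (NumPy and SciPy assumed; channel values for illustration) implements $\varphi$ from Equation (32) and evaluates the Jensen's inequality lower bound for $\mathcal{X} = \mathrm{Box}(\mathbf{a})$, where the minimizing $\mathbf{b}$ in Equation (31) is simply $\mathbf{b} = \mathbf{a}$ because $\varphi$ is decreasing.
```python
import numpy as np
from scipy.stats import norm

def phi(x: np.ndarray) -> np.ndarray:
    """phi(x) of Equation (32), i.e., E[exp(-(x(U - U'))^2 / 4)] for U, U' i.i.d. uniform on [-1, 1]."""
    x = np.asarray(x, dtype=float)
    out = np.ones_like(x)                                  # phi(0) = 1
    nz = x > 0
    q = norm.sf(np.sqrt(2.0) * x[nz])                      # Q(sqrt(2) x)
    out[nz] = (np.exp(-x[nz] ** 2) - 1.0 + np.sqrt(np.pi) * x[nz] * (1.0 - 2.0 * q)) / x[nz] ** 2
    return out

def jensen_lower_bound_diag(h_diag: np.ndarray, a: np.ndarray) -> float:
    """Jensen's inequality lower bound of Theorem 16 for H = diag(h_diag) and X = Box(a), in bits."""
    n = len(h_diag)
    psi = float(np.prod(phi(np.abs(h_diag) * a)))
    return max(0.0, n / 2 * np.log2(2.0 / np.e) - np.log2(psi))

print(jensen_lower_bound_diag(np.array([1.0, 0.7]), np.array([4.0, 4.0])))
```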
In Figure 2, the upper bounds of Theorems 9 and 15 and the lower bounds of Theorem 16 are depicted for a diagonal 2 × 2 MIMO channel with per-antenna amplitude constraints. It turns out that the moment upper bound and the EPI lower bound perform well in the small amplitude regime while the duality upper bound and Jensen’s inequality lower bound perform well in the high amplitude regime. Interestingly, for this specific example, the duality upper bound and Jensen’s lower bound are asymptotically tight.

5.2. Gap to the Capacity

Our first result provides an upper bound on the gap between the capacity in Equation (2) and the lower bound in Equation (21).
Theorem 17.
Let $\mathbf{H} \in \mathbb{R}^{n \times n}$ be of full rank and
$\rho(\mathcal{X}, \mathbf{H}) := \frac{\mathrm{Vol}\big(\mathcal{B}_{\mathbf{0}}(r_{\max}(\mathbf{H}\mathcal{X}))\big)}{\mathrm{Vol}(\mathbf{H}\mathcal{X})}.$
Then,
$C(\mathcal{X}, \mathbf{H}) - \underline{C}_{\mathrm{EPI}}(\mathcal{X}, \mathbf{H}) \leq \frac{n}{2}\log_2\left((\pi n)^{\frac{1}{n}}\, \rho(\mathcal{X}, \mathbf{H})^{\frac{2}{n}}\right).$
Proof. 
For notational convenience, let the volume of an n-dimensional ball of radius r > 0 be denoted as
V n ( r ) : = Vol B 0 ( r ) = V n ( 1 ) r n = π n 2 r n Γ n 2 + 1 .
Now, observe that, by choosing p = 2 , the upper bound of Theorem 9 can further be upper bounded as
C ¯ M ( X , H ) n log 2 k n , 2 ( 2 π e ) 1 2 n 1 2 x ˜ + Z 2 = ( a ) n 2 log 2 1 n E x ˜ + Z 2 = ( b ) n 2 log 2 1 + 1 n x ˜ 2 ,
where ( a ) follows since k n , 2 = 2 π e n ; and ( b ) since E [ Z 2 ] = n . Therefore, the gap between Equation (21) and the moment upper bound of Theorem 9 can be upper bounded as follows:
C ¯ M ( X , H ) C ̲ EPI ( X , H ) = n 2 log 2 1 + 1 n x ˜ 2 1 + Vol ( H X ) 2 n 2 π e = a ) n 2 log 2 1 + 1 n V n ( x ˜ ) V n ( 1 ) 2 n 1 + Vol ( H X ) 2 n 2 π e = n 2 log 2 1 + 1 n ρ ( X , H ) Vol ( H X ) V n ( 1 ) 2 n 1 + Vol ( H X ) 2 n 2 π e b ) n 2 log 2 1 n 2 π e ρ ( X , H ) V n ( 1 ) 2 n c ) n 2 log 2 ( π n ) 1 n ρ ( X , H ) 2 n .
where ( a ) is due to the fact that x ˜ is the radius of an n-dimensional ball; ( b ) follows from the inequality 1 + c x 1 + x c for c 1 and x R + ; and ( c ) follows from using Stirling’s approximation to obtain 1 V n ( 1 ) 2 n 1 2 e π 1 1 n n 1 + 1 n . □
The term ρ ( X , H ) is referred to as the packing efficiency of the set H X . In the following proposition, we present the packing efficiencies for important special cases.
Proposition 1.
(Packing Efficiencies) Let $\mathbf{H} \in \mathbb{R}^{n \times n}$ be of full rank, $A \in \mathbb{R}_+$, and $\mathbf{a} := (A_1, \ldots, A_n) \in \mathbb{R}_+^n$. Then,
$\rho\big(\mathcal{B}_{\mathbf{0}}(A), \mathbf{I}_n\big) = 1,$ (35)
$\rho\big(\mathcal{B}_{\mathbf{0}}(A), \mathbf{H}\big) = \frac{\|\mathbf{H}\|^n}{|\det(\mathbf{H})|},$ (36)
ρ Box ( a ) , I n = π n 2 Γ n 2 + 1 a n i = 1 n A i ,
ρ Box ( a ) , H π n 2 Γ n 2 + 1 H n a n | det ( H ) | i = 1 n A i .
Proof. 
The packing efficiency in Equation (35) follows immediately. Note that
r max H B 0 ( A ) = max x B 0 ( A ) H x = H A .
Thus, as H is assumed to be invertible, we have Vol ( H B 0 ( A ) ) = | det ( H ) | Vol ( B 0 ( A ) ) , which results in Equation (36). To show Equation (37), observe that
Vol B 0 r max I n Box ( a ) = Vol B 0 ( a ) = π n 2 Γ n 2 + 1 a n .
The proof of Equation (37) is concluded by observing that Vol ( I n Box ( a ) ) = i = 1 n A i . Finally, observe that Box ( a ) B 0 ( a ) implies r max ( H Box ( a ) ) r max ( H B 0 ( a ) ) so that
ρ H , Box ( a ) Vol B 0 ( H a ) Vol H Box ( a ) = π n 2 Γ n 2 + 1 H n a n | det ( H ) | i = 1 n A i ,
which is the bound in Equation (38). □
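For the ball-constrained case, the packing efficiency and the resulting gap bound of Theorem 17 are easy to compute. A minimal sketch (NumPy assumed; the example channel is arbitrary):
```python
import numpy as np
from math import pi, log2

def packing_efficiency_ball(H: np.ndarray) -> float:
    """rho(B_0(A), H) = ||H||^n / |det(H)| from Equation (36); note that it does not depend on A."""
    n = H.shape[0]
    op_norm = np.linalg.norm(H, 2)                    # spectral norm, i.e., the largest singular value
    return op_norm ** n / abs(np.linalg.det(H))

def epi_gap_bound(n: int, rho: float) -> float:
    """Gap bound of Theorem 17 in bits: (n/2) * log2((pi n)^(1/n) * rho^(2/n))."""
    return n / 2 * log2((pi * n) ** (1.0 / n) * rho ** (2.0 / n))

H = np.array([[1.0, 0.2],
              [0.0, 0.8]])
rho = packing_efficiency_ball(H)
print(rho, epi_gap_bound(H.shape[0], rho))            # packing efficiency and gap in bits
```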
We conclude this section by characterizing the gap to the capacity when $\mathbf{H}$ is diagonal and the channel input is drawn from the Cartesian product of $n$ PAM constellations. In this context, $\mathrm{PAM}(N, A)$ refers to the set of $N \in \mathbb{N}$ equidistant PAM constellation points with amplitude constraint $A \in \mathbb{R}_+$ (see Figure 3 for an illustration), whereas $X \sim \mathrm{PAM}(N, A)$ means that $X$ is uniformly distributed over $\mathrm{PAM}(N, A)$ [11].
Theorem 18.
Let $\mathbf{H} = \mathrm{diag}(h_{11}, \ldots, h_{nn}) \in \mathbb{R}^{n \times n}$ be fixed and $\mathbf{X} = (X_1, \ldots, X_n)$. Then, if $X_i \sim \mathrm{PAM}(N_i, A_i)$, $i = 1, \ldots, n$, for some given $\mathbf{a} = (A_1, \ldots, A_n) \in \mathbb{R}_+^n$, it holds that
$\bar{C}_{\mathrm{Dual},2}(\mathrm{Box}(\mathbf{a}), \mathbf{H}) - \underline{C}_{\mathrm{OW}}(\mathrm{Box}(\mathbf{a}), \mathbf{H}) \leq c \cdot n\ \mathrm{bits},$ (39)
where $N_i := 1 + \left\lfloor \frac{2A_i|h_{ii}|}{\sqrt{2\pi e}} \right\rfloor$ and
$c := 1 + \frac{1}{2}\log_2\left(\frac{\pi e}{6}\right) + \frac{1}{2}\log_2\left(1 + \frac{6}{\pi e}\right) \approx 1.64.$
Moreover, if X i PAM ( N i , A ) , i = 1 , , n , for some given A R + , it holds that
C ¯ Dual , 2 ( B 0 ( A ) , H ) C ̲ OW ( B 0 ( A ) , H ) c · n b i t s ,
where N i : = 1 + 2 A | h i i | n 2 π e .
Proof. 
Since the channel matrix is diagonal, letting the channel input X be such that its elements X i , i = 1 , , n , are independent, we have that
I ( X ; H X + Z ) = i = 1 n I ( X i ; h i i X i + Z i ) .
Let X i PAM ( N i , A i ) with N i : = 1 + 2 A i | h i i | 2 π e and observe that half the Euclidean distance between any pair of adjacent points in PAM ( N i , A i ) is equal to Δ i : = A i / ( N i 1 ) (see Figure 3), i = 1 , , n . To lower bound the mutual information I ( X i ; h i i X i + Z i ) , we use the bound of Theorem 13 for p = 2 and n t = 1 . Thus, for some continuous random variable U that is uniformly distributed over the interval [ Δ i , Δ i ) and independent of X i , we have that
I ( X i ; h i i X i + Z i ) H ( X i ) 1 2 log 2 π e 6 1 2 log 2 E ( U + X i g ( Y i ) ) 2 E [ U 2 ] .
Now, note that the entropy term in Equation (41) can be lower bounded as
H ( X i ) = log 2 1 + 2 A i | h i i | 2 π e log 2 1 + 2 A i | h i i | 2 π e + log 2 ( 2 ) ,
where we have used that x x 2 for every x 1 . On the other hand, the last term in Equation (41) can be upper bounded by upper bounding its argument as follows:
E ( U + X i g ( Y i ) ) 2 E [ U 2 ] = a ) 1 + 3 E ( X i g ( Y i ) ) 2 Δ i 2 b ) 1 + 3 E [ Z i 2 ] ( N i 1 ) 2 A i 2 | h i i | 2 = 1 + 3 ( N i 1 ) 2 A i 2 | h i i | 2 c ) 1 + 3 2 A i | h i i | 2 π e 2 A i 2 | h i i | 2 = 1 + 6 π e .
where ( a ) follows from using that X i and U are independent and E [ U 2 ] = Δ i 2 3 ; ( b ) from using the estimator g ( Y i ) = 1 h i i Y i ; and ( c ) from N i = 1 + 2 A i | h i i | 2 π e 1 + 2 A i | h i i | 2 π e . Combining Equations (41), (42), and (43) results in the gap in (39).
The proof of the capacity gap in Equation (40) follows along similar lines, which concludes the proof. □
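The constellation design behind Equation (39) is simple to reproduce. The sketch below (NumPy assumed; values illustrative) builds the per-antenna PAM sizes $N_i$ and prints the duality upper bound of Equation (29) together with the entropy of the product-PAM input; by Theorem 18, the corresponding Ozarow–Wyner lower bound is within roughly $1.64$ bits per antenna of the upper bound.
```python
import numpy as np

SQRT_2PIE = np.sqrt(2 * np.pi * np.e)

def pam_constellation(N: int, A: float) -> np.ndarray:
    """N equidistant PAM points with amplitude constraint A (all points lie in [-A, A])."""
    return np.linspace(-A, A, N) if N > 1 else np.zeros(1)

def pam_sizes(h_diag: np.ndarray, a: np.ndarray) -> np.ndarray:
    """Per-antenna constellation sizes N_i = 1 + floor(2 A_i |h_ii| / sqrt(2 pi e)) from Theorem 18."""
    return 1 + np.floor(2 * a * np.abs(h_diag) / SQRT_2PIE).astype(int)

def dual_upper_bound_diag(h_diag: np.ndarray, a: np.ndarray) -> float:
    """Duality upper bound of Equation (29) for H = diag(h_diag) and X = Box(a), in bits."""
    return float(np.sum(np.log2(1 + 2 * np.abs(h_diag) * a / SQRT_2PIE)))

h = np.array([1.0, 0.6])
a = np.array([10.0, 10.0])
N = pam_sizes(h, a)
print(dual_upper_bound_diag(h, a), float(np.sum(np.log2(N))))   # upper bound vs. H(X_D) of the PAM input
```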
We are also able to determine the gap to the capacity for a general invertible channel matrix.
Theorem 19.
For any X and any invertible H
C ( X , H ) C ̲ OW ( X , H ) log 2 ( π n ) + n 2 log 2 1 + 4 n ( 4 + 4 n ) H 1 2 r max 2 ( H X ) r min 2 ( X ) + n H 1 2 r min 2 ( X ) .
Proof. 
Let $\mathbf{X}$ be uniformly distributed over a set constructed from an $n$-dimensional cubic lattice with the number of points equal to $N = \lceil \|\tilde{\mathbf{x}} + \mathbf{Z}\|_2 \rceil^n$, where $\tilde{\mathbf{x}} \in \mathbf{H}\mathcal{X}$ is chosen such that $\|\tilde{\mathbf{x}}\| = r_{\max}(\mathbf{H}\mathcal{X})$, and scaled such that it is contained in the input space $\mathcal{X}$. Note that the minimum distance between points in $\mathrm{supp}(\mathbf{X})$ is given by
$d_{\min}\big(\mathrm{supp}(\mathbf{X})\big) := \frac{r_{\min}(\mathcal{X})}{\big\lfloor N^{\frac{1}{n}} \big\rfloor}.$
Now, we compute the difference between the moment upper bound of Theorem 9 and the Ozarow–Wyner lower bound of Theorem 13:
C ¯ M ( X , H ) C ̲ OW ( X , H ) ( a ) log 2 x ˜ + Z 2 H ( X ) + gap = n log 2 x ˜ + Z 2 log 2 x ˜ + Z 2 n + gap ( b ) log 2 ( 2 ) + gap ,
where ( a ) follows from Theorem 9 by choosing p = 2 ; and ( b ) by using the bound x x 2 for x > 1 . The next step in the proof consists in bounding the gap term, which requires to upper bound the terms in Equations (23) and (24) individually. Towards this end, choose p = 2 and let U be a random vector that is uniformly distributed over a ball of radius d min ( X ) . Thus, for (23) it follows
G 1 , 2 ( U , X , g ) = n log 2 U + X g ( Y ) 2 U 2 = ( a ) n log 2 U H 1 Z 2 U 2 = n 2 log 2 1 + H 1 Z 2 2 U 2 2 = ( b ) n 2 log 2 1 + 4 ( 4 + 4 n ) H 1 Z 2 2 d min 2 supp ( X ) ( c ) n 2 log 2 1 + ( 4 + 4 n ) H 1 Z 2 2 x ˜ + Z 2 2 r min 2 ( X ) = ( d ) n 2 log 2 1 + ( 4 + 4 n ) H 1 Z 2 2 r max 2 ( H X ) + n r min 2 ( X ) ( e ) n 2 log 2 1 + 4 ( 4 + 4 n ) H 1 2 Z 2 2 r max 2 ( H X ) + n r min 2 ( X ) = n 2 log 2 1 + 4 ( 4 + 4 n ) H 1 2 n r max 2 ( H X ) + n r min 2 ( X ) .
where ( a ) follows by choosing g ( Y ) = H 1 Y : ( b ) by using U 2 2 = r 2 4 + 2 n , where r = d min supp ( X ) 2 is the radius of an n-dimensional ball; ( c ) from dropping the floor function in the expression for the minimum distance, i.e.,
d min 1 supp ( X ) = N 1 n r min ( X ) = x ˜ + Z 2 n 1 n r min ( X ) x ˜ + Z 2 r min ( X ) ;
( d ) follows by expanding x ˜ + Z 2 2 using that x ˜ = r max ( H X ) ; and ( e ) from using the bound H 1 Z 2 H 1 Z 2 .
On the other hand, the term G 2 , p ( U ) can be bounded from above as follows ([36] Appendix L):
G 2 , p ( U ) = n log 2 k n , p n 1 p U p 2 1 n h ( U ) n log 2 ( π n ) 1 n .
Combining these two bounds with the one in (44) provides the result. □

6. Arbitrary Channel Matrices

For an arbitrary MIMO channel with an average power constraint, it is well known that the capacity is achieved by a singular value decomposition (SVD) of the channel matrix (i.e., $\mathbf{H} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^T$) along with considering the equivalent channel model
$\tilde{\mathbf{Y}} = \boldsymbol{\Lambda}\tilde{\mathbf{X}} + \tilde{\mathbf{Z}},$
where $\tilde{\mathbf{Y}} := \mathbf{U}^T\mathbf{Y}$, $\tilde{\mathbf{X}} := \mathbf{V}^T\mathbf{X}$, and $\tilde{\mathbf{Z}} := \mathbf{U}^T\mathbf{Z}$, respectively.
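The precoding step is a plain change of coordinates. The following sketch (NumPy assumed; dimensions and amplitudes chosen for illustration) draws $\tilde{\mathbf{X}}$ uniformly on a box, precodes with $\mathbf{V}$, and verifies that rotating the received signal by $\mathbf{U}^T$ indeed yields the diagonal model above with white noise.
```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed channel and its SVD H = U diag(s) V^T.
H = rng.standard_normal((3, 2))
U, s, Vt = np.linalg.svd(H, full_matrices=False)

# Draw X_tilde uniformly on Box(a) and precode: X = V X_tilde (cf. Lemma 3 below).
a = np.array([1.0, 1.0])
x_tilde = rng.uniform(-a, a)
x = Vt.T @ x_tilde

# Receiver rotates by U^T, turning the channel into a diagonal one with white Gaussian noise.
z = rng.standard_normal(3)
y = H @ x + z
y_tilde = U.T @ y
print(np.allclose(y_tilde, np.diag(s) @ x_tilde + U.T @ z))   # True
```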
To provide lower bounds for channels with amplitude constraints and SVD precoding, we need the following lemma.
Lemma 3.
For any given orthogonal matrix $\mathbf{V} \in \mathbb{R}^{n_t \times n_t}$ and constraint vector $\mathbf{a} = (A_1, \ldots, A_{n_t}) \in \mathbb{R}_+^{n_t}$, there exists a distribution $F_X$ of $\mathbf{X}$ such that $\tilde{\mathbf{X}} = \mathbf{V}^T\mathbf{X}$ is uniformly distributed over $\mathrm{Box}(\mathbf{a})$. Moreover, the components $\tilde{X}_1, \ldots, \tilde{X}_{n_t}$ of $\tilde{\mathbf{X}}$ are mutually independent with $\tilde{X}_i$ uniformly distributed over $[-A_i, A_i]$, $i = 1, \ldots, n_t$.
Proof. 
Suppose that X ˜ is uniformly distributed over Box ( a ) ; that is, the density of X ˜ is of the form
f X ˜ ( x ˜ ) = 1 Vol Box ( a ) , x ˜ Box ( a ) .
Since V is orthogonal, we have V X ˜ = X and by the change of variable Theorem for x V Box ( a )
f X ( x ) = 1 | det ( V ) | f X ˜ ( V T x ) = 1 | det ( V ) | Vol Box ( a ) = 1 Vol Box ( a ) .
Therefore, such a distribution of X exists. □
Theorem 20.
(Lower Bounds with SVD Precoding) Let $\mathbf{H} \in \mathbb{R}^{n_r \times n_t}$ be fixed, $n_{\min} := \min(n_r, n_t)$, and $\mathcal{X} = \mathrm{Box}(\mathbf{a})$ for some $\mathbf{a} = (A_1, \ldots, A_{n_t}) \in \mathbb{R}_+^{n_t}$. Furthermore, let $\sigma_i$, $i = 1, \ldots, n_{\min}$, be the $i$th singular value of $\mathbf{H}$. Then,
$\underline{C}_{\mathrm{Jensen}}(\mathrm{Box}(\mathbf{a}), \mathbf{H}) = \log_2^+\left(\left(\frac{2}{e}\right)^{\frac{n_{\min}}{2}} \frac{1}{\psi(\mathbf{H}, \mathbf{b}^*)}\right)$ (45)
and
C ̲ EPI ( Box ( a ) , H ) = n min 2 log 2 1 + i = 1 n min A i σ i 2 n min 2 π e ,
where
$\psi(\mathbf{H}, \mathbf{b}^*) := \min_{\mathbf{b} \in \mathrm{Box}(\mathbf{a})} \prod_{i=1}^{n_{\min}} \varphi(\sigma_i B_i)$
with $\mathbf{b} := (B_1, \ldots, B_{n_t})$ and $\varphi$ as defined in Equation (32).
Proof. 
Performing the SVD, the expected value in Theorem 14 can be written as
E e H ( X X ) 2 4 = E e U Λ V T ( X X ) 2 4 = E e Λ V T ( X X ) 2 4 = E e Λ ( X ˜ X ˜ ) 2 4 .
By Lemma 3, there exists a distribution F X such that the components of X ˜ are independent and uniformly distributed. Since Λ is a diagonal matrix, we can use Theorem 16 to arrive at Equation (45).
Note that by Lemma 3 there exists a distribution on X such that X ˜ is uniform over Box ( a ) R n t and Λ X ˜ is uniform over Λ Box ( a ) R n min , respectively. Therefore, by the EPI lower bound given in Equation (20), we obtain
C ̲ EPI ( Box ( a ) , H ) = n min 2 log 2 1 + 2 2 n min h ( Λ X ˜ ) 2 π e = n min 2 log 2 1 + Vol Λ Box ( a ) 2 n min 2 π e = n min 2 log 2 1 + i = 1 n min A i 2 n min i = 1 n min σ i 2 n min 2 π e ,
which is exactly the expression in Equation (46). This concludes the proof. □
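As a sanity check of the factorization used in the proof, the following sketch compares a Monte Carlo estimate of the expectation with the product of per-dimension terms; the singular values and amplitudes are placeholders, and the per-dimension closed form is the same integral evaluated in the SISO section below, with the amplitude scaled by the singular value (an assumption of this illustration).

import numpy as np
from scipy.special import erfc

def Q(x):                       # Gaussian tail function
    return 0.5 * erfc(x / np.sqrt(2.0))

def per_dim(s, A):
    # E[exp(-s^2 (X - X')^2 / 4)] for X, X' ~ Unif[-A, A], via the SISO expression with A -> s*A
    sA = s * A
    return (np.exp(-sA**2) - 1.0 + np.sqrt(np.pi) * sA * (1.0 - 2.0 * Q(np.sqrt(2.0) * sA))) / sA**2

rng = np.random.default_rng(2)
sigma = np.array([0.9, 0.4])           # placeholder singular values
A = np.array([1.5, 2.0])               # placeholder per-component amplitudes

xt  = rng.uniform(-A, A, size=(200000, 2))
xtp = rng.uniform(-A, A, size=(200000, 2))
mc = np.mean(np.exp(-np.sum((sigma * (xt - xtp))**2, axis=1) / 4.0))
cf = np.prod([per_dim(s, a) for s, a in zip(sigma, A)])
print(mc, cf)                          # the two values agree to Monte Carlo accuracy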
Remark 8.
Notice that choosing the optimal $b$ for the lower bound in Equation (45) is an amplitude allocation problem, which is reminiscent of waterfilling in the average power constraint case. It would be interesting to study whether the bound in Equation (45) is connected to what is called mercury/waterfilling in [37,38].
In Figure 4, the lower bounds of Theorem 20 are compared to the moment upper bound of Theorem 2 for the special case of a 3 × 1 MIMO channel. Similar to the example presented in Figure 2, the EPI lower bound performs well in the low amplitude regime, while Jensen’s inequality lower bound performs well in the high amplitude regime.
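For reference, a short sketch evaluating the EPI lower bound in Equation (46), in the form written above, for the $3 \times 1$ channel of Figure 4; the amplitude values used here are placeholders.

import numpy as np

h = np.array([[0.6557, 0.0357, 0.8491]])     # H in R^{1x3}, so n_min = 1
sigma = np.linalg.svd(h, compute_uv=False)   # nonzero singular values of H

def epi_lower_bound(A, sigma):
    n_min = len(sigma)
    prod = np.prod(2.0 * A * sigma) ** (2.0 / n_min)
    return 0.5 * n_min * np.log2(1.0 + prod / (2.0 * np.pi * np.e))

for A in [0.5, 1.0, 2.0, 5.0, 10.0]:         # equal per-antenna amplitudes a = (A, A, A)
    print(A, epi_lower_bound(A, sigma))      # lower bound in bits per channel use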
We conclude this section by showing that, for an arbitrary channel input space $\mathcal{X}$, in the large amplitude regime the capacity pre-log is given by $\min(n_r, n_t)$.
Theorem 21.
Let $\mathcal{X}$ be arbitrary and $H \in \mathbb{R}^{n_r \times n_t}$ fixed. Then,
\lim_{r_{\min}(\mathcal{X}) \to \infty} \frac{C(\mathcal{X}, H)}{\log_2\left(1 + \frac{2\, r_{\min}(\mathcal{X})}{\sqrt{2\pi e}}\right)} = \min(n_r, n_t).
Proof. 
Notice that there always exist $a \in \mathbb{R}_+^{n_t}$ and $c \in \mathbb{R}_+$ such that $\mathrm{Box}(a) \subseteq \mathcal{X} \subseteq c\,\mathrm{Box}(a)$. Thus, without loss of generality, we can consider $\mathcal{X} = \mathrm{Box}(a)$, $a = (A, \ldots, A)$, for sufficiently large $A \in \mathbb{R}_+$. To prove the result, we therefore start by enlarging the constraint set of the bound in Equation (11):
\mathrm{Box}\big(H\,\mathrm{Box}(a)\big) \subseteq \mathcal{B}_0\big(r_{\max}(H\,\mathrm{Box}(a))\big) \subseteq \mathcal{B}_0\big(r_{\max}(H\,\mathcal{B}_0(\sqrt{n_t}\,A))\big) = \mathcal{B}_0\big(r_{\max}(U \Lambda V^T \mathcal{B}_0(\sqrt{n_t}\,A))\big) = \mathcal{B}_0\big(r_{\max}(U \Lambda\, \mathcal{B}_0(\sqrt{n_t}\,A))\big) = \mathcal{B}_0\big(r_{\max}(\Lambda\, \mathcal{B}_0(\sqrt{n_t}\,A))\big) \subseteq \mathcal{B}_0(r) \subseteq \mathrm{Box}(a'),
where $r := \sqrt{n_t}\, A \sqrt{\sum_{i=1}^{n_{\min}} \sigma_i^2}$ and $a' := \left(\frac{r}{\sqrt{n_{\min}}}, \ldots, \frac{r}{\sqrt{n_{\min}}}\right) \in \mathbb{R}_+^{n_{\min}}$. Therefore, by using the upper bound in Equation (11), it follows that
C(\mathrm{Box}(a), H) \le \sum_{i=1}^{n_r} \log_2\left(1 + \frac{2 A_i'}{\sqrt{2\pi e}}\right) \le n_{\min} \log_2\left(1 + \frac{2}{\sqrt{2\pi e}}\, \sqrt{n_t}\, A \sqrt{\frac{\sum_{i=1}^{n_{\min}} \sigma_i^2}{n_{\min}}}\right).
Moreover,
\lim_{A \to \infty} \frac{C(\mathrm{Box}(a), H)}{\log_2\left(1 + \frac{2A}{\sqrt{2\pi e}}\right)} \le n_{\min} \lim_{A \to \infty} \frac{\log_2\left(1 + \frac{2}{\sqrt{2\pi e}}\, \sqrt{n_t}\, A \sqrt{\frac{\sum_{i=1}^{n_{\min}} \sigma_i^2}{n_{\min}}}\right)}{\log_2\left(1 + \frac{2A}{\sqrt{2\pi e}}\right)} = n_{\min}.
Next, using the EPI lower bound in Equation (46), we have that
\lim_{A \to \infty} \frac{\underline{C}_{\mathrm{EPI}}(\mathrm{Box}(a), \Lambda)}{\log_2\left(1 + \frac{2A}{\sqrt{2\pi e}}\right)} = n_{\min} \lim_{A \to \infty} \frac{\frac{1}{2}\log_2\left(1 + \frac{4A^2 \left(\prod_{i=1}^{n_{\min}} \sigma_i\right)^{\frac{2}{n_{\min}}}}{2\pi e}\right)}{\log_2\left(1 + \frac{2A}{\sqrt{2\pi e}}\right)} = n_{\min}.
This concludes the proof. □
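The pre-log behavior of Theorem 21 can also be illustrated numerically with the EPI lower bound; the random $2 \times 3$ channel below is only a placeholder.

import numpy as np

rng = np.random.default_rng(3)
H = rng.standard_normal((2, 3))
sigma = np.linalg.svd(H, compute_uv=False)   # two nonzero singular values, n_min = 2
n_min = len(sigma)

for A in [1e1, 1e2, 1e3, 1e4]:
    epi = 0.5 * n_min * np.log2(1.0 + np.prod(2.0 * A * sigma) ** (2.0 / n_min) / (2 * np.pi * np.e))
    denom = np.log2(1.0 + 2.0 * A / np.sqrt(2 * np.pi * np.e))
    print(A, epi / denom)                    # the ratio tends to n_min = 2 as A grows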

7. The SISO Case

In this section, we apply the upper and lower bounds presented in the previous sections to the special case of a SISO channel that is subject to an amplitude constraint (i.e., $\mathcal{X} = [-A, A]$ for some $A \in \mathbb{R}_+$) and compare them with the state of the art. More precisely, we are interested in upper and lower bounds on the capacity
C([-A, A], h) := \max_{F_X :\, X \in [-A, A]} I(X; hX + Z).
Without loss of generality, we assume h = 1 in all that follows.

7.1. Upper and Lower Bounds

As a starting point for our comparisons, the following theorem summarizes bounds on the capacity in Equation (47) that are known from the literature. The bounds are all based on the duality approach that we generalize in Section 4 to the MIMO case.
Theorem 22.
(Known Duality Upper Bounds) Let A > 0 be arbitrary. Then, the following are valid upper bounds to the capacity of the amplitude-constrained SISO channel defined in Equation (47).
  • McKellips upper bound [7]:
    C([-A, A], 1) \le \overline{C}_{\mathrm{McK}}([-A, A], 1) := \log_2\left(1 + \frac{2A}{\sqrt{2\pi e}}\right).
  • Thangaraj–Kramer–Böcherer upper bound ([8] Theorem 1):
    C([-A, A], 1) \le \overline{C}_{\mathrm{TKB}}([-A, A], 1) := \begin{cases} \beta(A) \log_e\left(\sqrt{2\pi e}\, A\right) + H_b(\beta(A)), & A^2 \le 6.304\ \mathrm{dB}, \\ \overline{C}_{\mathrm{McK}}([-A, A], 1), & \text{else}, \end{cases}
    where $\beta(A) := 1 - 2Q(\sqrt{2}A)$ and $H_b$ denotes the binary entropy function.
  • Rassouli–Clerckx upper bound [9]:
    C([-A, A], 1) \le \overline{C}_{\mathrm{RC}}([-A, A], 1) := \overline{C}_{\mathrm{TKB}}([-A, A], 1) + W(A),
    where
    W(A) := \frac{1}{2}\log_e \sigma^2(A) + \left(\frac{1}{\sigma^2(A)} - 1\right)\left(\frac{1}{2} + Q(\sqrt{2}A)\right) + \frac{g(\sqrt{2}A)}{2\sigma^2(A)},
    \sigma^2(A) := \frac{1 + 2g(\sqrt{2}A)}{1 + 2Q(\sqrt{2}A)},
    and
    g(x) := x^2 Q(x) - \frac{x}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}.
Now, we apply the moment upper bound of Theorem 9 to the SISO case.
Theorem 23.
(Moment Upper Bound) Let A > 0 be arbitrary. Then,
C([-A, A], 1) \le \overline{C}_{\mathrm{M}}([-A, A], 1) = \inf_{p > 0} \log_2\left( \frac{k_{1,p}}{(2\pi e)^{\frac{1}{2}}} \left( \mathbb{E}\left[|A + Z|^p\right] \right)^{\frac{1}{p}} \right),
where the expected value is of the explicit form
\mathbb{E}\left[|A + Z|^p\right] = \frac{2^{\frac{p}{2}} \Gamma\left(\frac{p+1}{2}\right)}{\sqrt{\pi}}\, {}_1F_1\left(-\frac{p}{2}; \frac{1}{2}; -\frac{A^2}{2}\right)
with ${}_1F_1(a; b; z)$ being the confluent hypergeometric function of the first kind ([39] Chapter 13).
Proof. 
First, note that $r_{\max}([-A, A]) = A$. Then, by using the expression for the raw absolute moment of a Gaussian distribution given in [40], we have that
\max_{a \in [0, A]} \mathbb{E}\left[|a + Z|^p\right] = \max_{a \in [0, A]} \frac{2^{\frac{p}{2}} \Gamma\left(\frac{p+1}{2}\right)}{\sqrt{\pi}}\, {}_1F_1\left(-\frac{p}{2}; \frac{1}{2}; -\frac{a^2}{2}\right).
The proof is concluded by observing that $f(a) := {}_1F_1\left(-\frac{p}{2}; \frac{1}{2}; -\frac{a^2}{2}\right)$ is an increasing function of $a$. □
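A numerical sketch of the moment upper bound in Equation (51). The constant $k_{1,p}$ is defined earlier in the paper and is not restated in this section; the sketch assumes the maximum-entropy form $k_{1,p} = 2\Gamma(1 + 1/p)(ep)^{1/p}$, which reproduces $\frac{1}{2}\log_2(1 + A^2)$ at $p = 2$, so this form should be treated as an assumption of the illustration only.

import numpy as np
from scipy.special import gamma, hyp1f1
from scipy.optimize import minimize_scalar

def moment_bound(A):
    # E[|A + Z|^p] via the confluent hypergeometric expression above
    def absolute_moment(p):
        return 2.0**(p / 2.0) * gamma((p + 1.0) / 2.0) / np.sqrt(np.pi) \
               * hyp1f1(-p / 2.0, 0.5, -A**2 / 2.0)
    # assumed maximum-entropy form of k_{1,p} (not restated in the paper's SISO section)
    def k1p(p):
        return 2.0 * gamma(1.0 + 1.0 / p) * (np.e * p)**(1.0 / p)
    def bound(p):
        return np.log2(k1p(p) / np.sqrt(2 * np.pi * np.e) * absolute_moment(p)**(1.0 / p))
    res = minimize_scalar(bound, bounds=(0.1, 20.0), method="bounded")
    return res.fun

for A in [0.5, 1.0, 2.0, 5.0]:
    print(A, moment_bound(A), 0.5 * np.log2(1 + A**2))   # infimum over p vs. its p = 2 value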
The following theorem establishes the EPI and the Jensen lower bound of Section 4.2 assuming the channel input symbols are uniformly distributed.
Theorem 24.
(Lower Bounds with Uniform Inputs) Let $A > 0$ be arbitrary and the channel input $X$ be uniformly distributed over $[-A, A]$. Then,
C([-A, A], 1) \ge \underline{C}_{\mathrm{EPI}}([-A, A], 1) = \frac{1}{2}\log_2\left(1 + \frac{2A^2}{\pi e}\right)
and
C([-A, A], 1) \ge \underline{C}_{\mathrm{Jensen}}([-A, A], 1) = \log_2\left( \frac{\sqrt{2}\, A^2\, e^{-\frac{1}{2}}}{e^{-A^2} - 1 + \sqrt{\pi}\, A \left(1 - 2Q(\sqrt{2}A)\right)} \right).
Proof. 
The lower bound in Equation (52) follows from Theorem 12 by observing that $\mathrm{Vol}(\mathcal{X}) = 2A$. To show the lower bound in Equation (53), consider Theorem 14 and let $X$ and $X'$ be independent and uniformly distributed over $[-A, A]$. Then, we have
\mathbb{E}\left[ e^{-\frac{|X - X'|^2}{4}} \right] = \frac{1}{4A^2} \int_{-A}^{A} \int_{-A}^{A} e^{-\frac{(x - x')^2}{4}}\, dx\, dx' = \frac{1}{A^2}\left( e^{-A^2} - 1 + \sqrt{\pi}\, A \left(1 - 2Q(\sqrt{2}A)\right) \right),
which concludes the proof. □
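The two bounds of Theorem 24 are straightforward to evaluate numerically; the sketch below also checks the closed form of the double integral in the proof (an illustration with an arbitrary value of $A$).

import numpy as np
from scipy.special import erfc
from scipy.integrate import dblquad

def Q(x):
    return 0.5 * erfc(x / np.sqrt(2.0))

def epi_siso(A):                       # Equation (52)
    return 0.5 * np.log2(1.0 + 2.0 * A**2 / (np.pi * np.e))

def jensen_siso(A):                    # Equation (53), via the closed form above
    E = (np.exp(-A**2) - 1.0 + np.sqrt(np.pi) * A * (1.0 - 2.0 * Q(np.sqrt(2.0) * A))) / A**2
    return np.log2(np.sqrt(2.0 / np.e) / E)

A = 3.0
# numerical check of the double integral behind Equation (53)
E_num, _ = dblquad(lambda x, xp: np.exp(-(x - xp)**2 / 4.0) / (4.0 * A**2), -A, A, -A, A)
E_cf = (np.exp(-A**2) - 1.0 + np.sqrt(np.pi) * A * (1.0 - 2.0 * Q(np.sqrt(2.0) * A))) / A**2
print(E_num, E_cf)                     # the two values agree
print(epi_siso(A), jensen_siso(A))     # the two lower bounds in bits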
Restricting the channel inputs to be discrete allows for another set of lower bounds on Equation (47).
Theorem 25.
(Lower Bounds with Discrete Inputs) Let $A > 1$ be arbitrary, $X_B \in \{-A, A\}$ equally likely, and $X_D$ uniformly distributed over $\mathrm{PAM}(N)$ with $N = \left\lfloor 1 + A\sqrt{\frac{2}{\pi e}} \right\rfloor$. Then,
C([-A, A], 1) \ge \underline{C}_{\mathrm{Binary}}([-A, A], 1) := I(X_B; X_B + Z)
= \frac{1}{\log_e(2)}\left( A^2 - \int_{-\infty}^{\infty} \frac{e^{-\frac{y^2}{2}}}{\sqrt{2\pi}}\, \log_e \cosh\left(A^2 - Ay\right) dy \right),
C([-A, A], 1) \ge \underline{C}_{\mathrm{Jensen}}([-A, A], 1) = -\log_2\left( \sqrt{\frac{e}{2}}\, \frac{1}{N^2} \sum_{(x_{D,i},\, x_{D,j}) \in \mathrm{PAM}(N)^2} e^{-\frac{(x_{D,i} - x_{D,j})^2}{4}} \right),
and
C([-A, A], 1) \ge \underline{C}_{\mathrm{OW}}([-A, A], 1) = \overline{C}_{\mathrm{McK}}([-A, A], 1) - 2.
Proof. 
The expression of the mutual information in Equation (54) for a uniform binary input $X_B \in \{-A, A\}$ is found in [41] by using the I-MMSE relationship. The bound in Equation (56) follows from using Theorem 14 and the bound in Equation (57) from Theorem 13, respectively. This concludes the proof. □
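A sketch evaluating the discrete-input bounds of Theorem 25; it assumes that $\mathrm{PAM}(N)$ consists of $N$ equally spaced points on $[-A, A]$, as depicted in Figure 3, and guards against $N < 2$ for small $A$.

import numpy as np
from scipy.integrate import quad

def log_cosh(x):
    # numerically safe log(cosh(x)) for large arguments
    x = np.abs(x)
    return x + np.log1p(np.exp(-2.0 * x)) - np.log(2.0)

def binary_mi(A):                      # Equation (55), in bits
    integrand = lambda y: np.exp(-y**2 / 2.0) / np.sqrt(2 * np.pi) * log_cosh(A**2 - A * y)
    integral, _ = quad(integrand, -np.inf, np.inf)
    return (A**2 - integral) / np.log(2.0)

def pam_jensen(A):                     # Equation (56), with PAM(N) as in Figure 3
    N = int(np.floor(1.0 + A * np.sqrt(2.0 / (np.pi * np.e))))
    N = max(N, 2)                      # guard so that at least two points are used
    points = np.linspace(-A, A, N)     # N equally spaced points on [-A, A]
    diff = points[:, None] - points[None, :]
    return -np.log2(np.sqrt(np.e / 2.0) * np.mean(np.exp(-diff**2 / 4.0)))

for A in [2.0, 5.0, 10.0]:
    print(A, binary_mi(A), pam_jensen(A))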
Figure 5 compares the upper and lower bounds presented in this section as a function of the amplitude constraint A. Observe that, for values of A smaller than ≈1.665 (i.e., to the left of the gray vertical line), the lower bound in Equation (55) is in fact equal to the capacity. For constraints up to A ≈ 1, the moment upper bound in Equation (51) is the tightest, after which the bound in Equation (50) becomes the tightest. The best lower bound for constraint values smaller than A ≈ 10 is the bound in Equation (56), after which the lower bound in Equation (53) becomes the tightest. Note that all lower and upper bounds are asymptotically tight (i.e., for A → ∞).

7.2. High and Low Amplitude Asymptotics

In this subsection, we study how the capacity in Equation (47) behaves in the high and low amplitude regimes. To this end, we need the following expression
\overline{C}_{\mathrm{AWGN}}([-A, A], 1) := \frac{1}{2}\log_2\left(1 + A^2\right),
which is both the capacity of a SISO channel with an average power constraint of $A^2$ and the moment bound in Equation (51) evaluated at $p = 2$.
Theorem 26.
(SISO High and Low Amplitude Asymptotics) It holds that
\lim_{A \to 0} \frac{C([-A, A], 1)}{\overline{C}_{\mathrm{AWGN}}([-A, A], 1)} = 1,
\lim_{A \to \infty} \frac{C([-A, A], 1)}{\overline{C}_{\mathrm{McK}}([-A, A], 1)} = 1,
and
\lim_{A \to \infty} \left( C([-A, A], 1) - \overline{C}_{\mathrm{AWGN}}([-A, A], 1) \right) = -\frac{1}{2}\log_2\frac{\pi e}{2} \approx -1.044.
Proof. 
The capacity of an amplitude-constrained SISO channel in the regime of low amplitudes (i.e., for amplitudes smaller than $A \approx 1.665$) was given by Guo et al. [41] as
C([-A, A], 1) = \frac{1}{\log_e(2)}\left( A^2 - \int_{-\infty}^{\infty} \frac{e^{-\frac{y^2}{2}}}{\sqrt{2\pi}}\, \log_e \cosh\left(A^2 - Ay\right) dy \right).
Now, observe that
\lim_{A \to 0} \frac{1}{\frac{1}{2}\log_2(1 + A^2)} \int_{-\infty}^{\infty} \frac{e^{-\frac{y^2}{2}}}{\sqrt{2\pi}}\, \log_e \cosh\left(A^2 - Ay\right) dy
= \frac{2}{\log_2(e)} \int_{-\infty}^{\infty} \frac{e^{-\frac{y^2}{2}}}{\sqrt{2\pi}} \lim_{A \to 0} \frac{\log_e \cosh\left(A^2 - Ay\right)}{\log_e(1 + A^2)}\, dy
= \frac{2}{\log_2(e)} \int_{-\infty}^{\infty} \frac{e^{-\frac{y^2}{2}}}{\sqrt{2\pi}}\, \frac{y^2}{2}\, dy
= \frac{1}{\log_2(e)}.
Therefore, the limit in Equation (58) is given by
\lim_{A \to 0} \frac{C([-A, A], 1)}{\overline{C}_{\mathrm{AWGN}}([-A, A], 1)} = \lim_{A \to 0} \frac{\frac{A^2}{\log_e(2)}}{\frac{1}{2}\log_2(1 + A^2)} - \frac{1}{\log_e(2)\log_2(e)} = \frac{2\log_e(2)}{\log_e(2)} - \frac{1}{\log_e(2)\log_2(e)} = 2 - 1 = 1.
The limit in Equation (59) follows from comparing the EPI lower bound $\underline{C}_{\mathrm{EPI}}([-A, A], 1) = \frac{1}{2}\log_2\left(1 + \frac{2A^2}{\pi e}\right)$ in Equation (52) with the McKellips upper bound $\overline{C}_{\mathrm{McK}}([-A, A], 1) = \log_2\left(1 + \frac{2A}{\sqrt{2\pi e}}\right)$ given in Equation (48).
Finally, to show Equation (60), observe that
\lim_{A \to \infty} \left( C([-A, A], 1) - \overline{C}_{\mathrm{AWGN}}([-A, A], 1) \right) = \lim_{A \to \infty} \left( \overline{C}_{\mathrm{McK}}([-A, A], 1) - \overline{C}_{\mathrm{AWGN}}([-A, A], 1) \right) = \lim_{A \to \infty} \left( \log_2\left(1 + \frac{2A}{\sqrt{2\pi e}}\right) - \frac{1}{2}\log_2(1 + A^2) \right) = -\frac{1}{2}\log_2\frac{\pi e}{2}.
This concludes the proof. □
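The constant offset in Equation (60) can be checked numerically from the McKellips and AWGN expressions alone; the following is an illustrative sketch.

import numpy as np

def c_awgn(A):
    return 0.5 * np.log2(1.0 + A**2)

def c_mck(A):
    return np.log2(1.0 + 2.0 * A / np.sqrt(2 * np.pi * np.e))

for A in [10.0, 1e2, 1e3, 1e4]:
    print(A, c_awgn(A) - c_mck(A))           # converges to the limiting value below
print(0.5 * np.log2(np.pi * np.e / 2.0))     # (1/2) log2(pi*e/2)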

8. Conclusions

In this work, we studied the capacity of MIMO channels with bounded input spaces. Several new properties of input distributions that achieve the capacity of such channels have been provided. In particular, it is shown that the support of a capacity-achieving channel input distribution is a set that is small in a topological and measure theoretical sense. In addition, it is shown that, if the radius of the underlying channel input space $\mathcal{X}$ is small enough, then the support of a corresponding capacity-achieving input distribution must necessarily be a subset of the boundary of $\mathcal{X}$. As the considerations on the input distribution demonstrate that determining the capacity exactly is a very challenging problem, we have proposed several new upper and lower bounds that are shown to be tight in the high amplitude regime. An interesting future direction would be to study generalizations of our techniques to wireless optical MIMO channels [42] and other channels such as the wiretap channel [43].

Author Contributions

All authors contributed equally to this work.

Funding

This work was supported in part by the U. S. National Science Foundation under Grants CCF-093970 and CCF-1513915, by the German Research Foundation (DFG) under Grant GO 2669/1-1, and by the European Union’s Horizon 2020 Research And Innovation Programme, grant agreement No. 694630.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Telatar, I.E. Capacity of multi-antenna Gaussian channels. Eur. Trans. Telecommun. 1999, 10, 585–595.
  2. Shannon, C. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27.
  3. Smith, J.G. The information capacity of amplitude- and variance-constrained scalar Gaussian channels. Inf. Control 1971, 18, 203–219.
  4. Shamai (Shitz), S.; Bar-David, I. The capacity of average and peak-power-limited quadrature Gaussian channels. IEEE Trans. Inf. Theory 1995, 41, 1060–1071.
  5. Rassouli, B.; Clerckx, B. On the capacity of vector Gaussian channels with bounded inputs. IEEE Trans. Inf. Theory 2016, 62, 6884–6903.
  6. Sharma, N.; Shamai (Shitz), S. Transition points in the capacity-achieving distribution for the peak-power limited AWGN and free-space optical intensity channels. Probl. Inf. Transm. 2010, 46, 283–299.
  7. McKellips, A.L. Simple tight bounds on capacity for the peak-limited discrete-time channel. In Proceedings of the 2004 IEEE International Symposium on Information Theory (ISIT), Chicago, IL, USA, 27 June–2 July 2004; p. 348.
  8. Thangaraj, A.; Kramer, G.; Böcherer, G. Capacity Bounds for Discrete-Time, Amplitude-Constrained, Additive White Gaussian Noise Channels. Available online: https://arxiv.org/abs/1511.08742 (accessed on 15 February 2019).
  9. Rassouli, B.; Clerckx, B. An Upper Bound for the Capacity of Amplitude-Constrained Scalar AWGN Channel. IEEE Commun. Lett. 2016, 20, 1924–1926.
  10. ElMoslimany, A.; Duman, T.M. On the Capacity of Multiple-Antenna Systems and Parallel Gaussian Channels With Amplitude-Limited Inputs. IEEE Trans. Commun. 2016, 64, 2888–2899.
  11. Dytso, A.; Goldenbaum, M.; Poor, H.V.; Shamai (Shitz), S. A Generalized Ozarow-Wyner Capacity Bound with Applications. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 1058–1062.
  12. Dytso, A.; Goldenbaum, M.; Shamai (Shitz), S.; Poor, H.V. Upper and Lower Bounds on the Capacity of Amplitude-Constrained MIMO Channels. In Proceedings of the 2017 IEEE Global Communications Conference (GLOBECOM), Singapore, 4–8 December 2017; pp. 1–6.
  13. Cao, P.L.; Oechtering, T.J. Optimal transmit strategy for MIMO channels with joint sum and per-antenna power constraints. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017.
  14. Loyka, S. The Capacity of Gaussian MIMO Channels Under Total and Per-Antenna Power Constraints. IEEE Trans. Commun. 2017, 65, 1035–1043.
  15. Loyka, S. On the Capacity of Gaussian MIMO Channels under the Joint Power Constraints. Available online: https://arxiv.org/abs/1809.00056 (accessed on 15 February 2019).
  16. Tuninetti, D. On the capacity of the AWGN MIMO channel under per-antenna power constraints. In Proceedings of the 2014 IEEE International Conference on Communications (ICC), Sydney, NSW, Australia, 10–14 June 2014; pp. 2153–2157.
  17. Vu, M. MISO capacity with per-antenna power constraint. IEEE Trans. Commun. 2011, 59, 1268–1274.
  18. Chan, T.H.; Hranilovic, S.; Kschischang, F.R. Capacity-achieving probability measure for conditionally Gaussian channels with bounded inputs. IEEE Trans. Inf. Theory 2005, 51, 2073–2088.
  19. Csiszar, I.; Körner, J. Information Theory: Coding Theorems for Discrete Memoryless Systems; Cambridge University Press: Cambridge, UK, 2011.
  20. Abou-Faycal, I.C.; Trott, M.D.; Shamai, S. The capacity of discrete-time memoryless Rayleigh-fading channels. IEEE Trans. Inf. Theory 2001, 47, 1290–1301.
  21. Fahs, J.; Abou-Faycal, I. On properties of the support of capacity-achieving distributions for additive noise channel models with input cost constraints. IEEE Trans. Inf. Theory 2018, 64, 1178–1198.
  22. Gursoy, M.C.; Poor, H.V.; Verdú, S. The noncoherent Rician fading channel-part I: Structure of the capacity-achieving input. IEEE Trans. Wireless Commun. 2005, 4, 2193–2206.
  23. Katz, M.; Shamai, S. On the capacity-achieving distribution of the discrete-time noncoherent and partially coherent AWGN channels. IEEE Trans. Inf. Theory 2004, 50, 2257–2270.
  24. Ozel, O.; Ekrem, E.; Ulukus, S. Gaussian wiretap channel with amplitude and variance constraints. IEEE Trans. Inf. Theory 2015, 61, 5553–5563.
  25. Bak, J.; Newman, D.J. Complex Analysis; Springer Science & Business Media: Berlin, Germany, 2010.
  26. Witsenhausen, H.S. Some Aspects of Convexity Useful in Information Theory. IEEE Trans. Inf. Theory 1980, 26, 265–271.
  27. Krantz, S.G.; Parks, H.R. A Primer of Real Analytic Functions; Springer Science & Business Media: New York, NY, USA, 2002.
  28. Dytso, A.; Yagli, S.; Poor, H.V.; Shamai (Shitz), S. Capacity Achieving Distribution for the Amplitude Constrained Additive Gaussian Channel: An Upper Bound on the Number of Mass Points. Available online: https://arxiv.org/abs/1901.03264 (accessed on 15 February 2019).
  29. Ransford, T. Potential Theory in the Complex Plane; London Mathematical Society Student Texts; Cambridge University Press: Cambridge, UK, 1995; Volume 28.
  30. Hatsell, C.; Nolte, L. Some geometric properties of the likelihood ratio (Corresp.). IEEE Trans. Inf. Theory 1971, 17, 616–618.
  31. Dytso, A.; Poor, H.V.; Shamai (Shitz), S. On the Capacity of the Peak Power Constrained Vector Gaussian Channel: An Estimation Theoretic Perspective. Available online: https://arxiv.org/abs/1804.08524 (accessed on 15 February 2019).
  32. Dytso, A.; Goldenbaum, M.; Poor, H.V.; Shamai (Shitz), S. When are Discrete Channel Inputs Optimal?—Optimization Techniques and Some New Results. In Proceedings of the 2018 Annual Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, 21–23 March 2018; pp. 1–6.
  33. Guo, D.; Shamai, S.; Verdú, S. Mutual information and minimum mean-square error in Gaussian channels. IEEE Trans. Inf. Theory 2005, 51, 1261–1282.
  34. Dytso, A.; Tuninetti, D.; Devroye, N. Interference as Noise: Friend or Foe? IEEE Trans. Inf. Theory 2016, 62, 3561–3596.
  35. Resnick, S.I. A Probability Path; Springer Science & Business Media: New York, NY, USA, 2013.
  36. Dytso, A.; Bustin, R.; Tuninetti, D.; Devroye, N.; Shamai (Shitz), S.; Poor, H.V. On the Minimum Mean p-th Error in Gaussian Noise Channels and its Applications. IEEE Trans. Inf. Theory 2018, 64, 2012–2037.
  37. Lozano, A.; Tulino, A.M.; Verdú, S. Optimum power allocation for parallel Gaussian channels with arbitrary input distributions. IEEE Trans. Inf. Theory 2006, 52, 3033–3051.
  38. Pérez-Cruz, F.; Rodrigues, M.R.; Verdú, S. MIMO Gaussian channels with arbitrary inputs: Optimal precoding and power allocation. IEEE Trans. Inf. Theory 2010, 56, 1070–1084.
  39. Abramowitz, M.; Stegun, I.A. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, 9th Dover printing ed.; Dover Publications: New York, NY, USA, 1964.
  40. Winkelbauer, A. Moments and Absolute Moments of the Normal Distribution. Available online: https://arxiv.org/abs/1209.4340 (accessed on 15 February 2019).
  41. Guo, D.; Wu, Y.; Shamai, S.; Verdú, S. Estimation in Gaussian Noise: Properties of the Minimum Mean-Square Error. IEEE Trans. Inf. Theory 2011, 57, 2371–2385.
  42. Moser, S.M.; Mylonakis, M.; Wang, L.; Wigger, M. Asymptotic Capacity Results for MIMO Wireless Optical Communication. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 536–540.
  43. Dytso, A.; Egan, M.; Perlaza, S.; Poor, H.V.; Shamai (Shitz), S. Optimal Inputs for Some Classes of Degraded Wiretap Channels. In Proceedings of the 2018 IEEE Information Theory Workshop (ITW), Guangzhou, China, 25–29 November 2018; pp. 1–5.
Figure 1. An example of a support of an optimal input distribution for the special case $n_t = n_r = n = 2$.
Figure 2. Comparison of the upper and lower bounds of Theorems 9, 11, 15, and 16 evaluated for a 2 × 2 MIMO system with per-antenna amplitude constraints $A_1 = A_2 = A$ (i.e., $a = (A, A)$) and channel matrix $H = \mathrm{diag}(0.3, 0.1)$. The nested figure represents a zoom into the region $0\,\mathrm{dB} \le A \le 5\,\mathrm{dB}$ to visualize the differences between the bounds at small amplitude constraints.
Figure 3. Example of a pulse-amplitude modulation constellation with $N = 4$ points and amplitude constraint $A$ (i.e., $\mathrm{PAM}(4, A)$), where $\Delta := A/(N-1)$ denotes half the Euclidean distance between two adjacent constellation points. In the case $N$ is odd, 0 is a constellation point.
Figure 4. Comparison of the upper bound in Theorem 2 with the lower bounds of Theorem 20 for a 3 × 1 MIMO system with amplitude constraints $A_1 = A_2 = A_3 = A$ (i.e., $a = (A, A, A)$) and channel matrix $h = (0.6557, 0.0357, 0.8491)$.
Figure 5. Comparison of upper and lower bounds on the capacity of a SISO channel with amplitude constraint $A$. The capacity of this channel is known only for amplitudes smaller than $A \approx 1.665$ (i.e., $10\log_{10}(1.665) = 2.214\,\mathrm{dB}$, to the left of the gray vertical line) and unknown elsewhere. The nested figure represents a zoom into the region $-1.9\,\mathrm{dB} \le A \le -1.88\,\mathrm{dB}$ to highlight the differences between the moment upper bound in Equation (51), the Rassouli–Clerckx upper bound in Equation (50), and the lower bound with binary inputs in Equation (56).
