
Computing Classical-Quantum Channel Capacity Using Blahut–Arimoto Type Algorithm: A Theoretical and Numerical Analysis †

1 School of Information Science and Technology, ShanghaiTech University, Shanghai 201210, China
2 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 201210, China
3 University of Chinese Academy of Sciences, Beijing 100049, China
* Authors to whom correspondence should be addressed.
† This paper was presented in part at the 2019 IEEE International Symposium on Information Theory.
Submission received: 15 January 2020 / Revised: 5 February 2020 / Accepted: 13 February 2020 / Published: 16 February 2020
(This article belongs to the Collection Quantum Information)

Abstract

Building on Arimoto's work of 1972, we propose an iterative algorithm for computing the capacity of a discrete memoryless classical-quantum channel with a finite input alphabet and a finite-dimensional output, which we call the Blahut–Arimoto algorithm for the classical-quantum channel; an input cost constraint is also considered. We show that, to reach $\varepsilon$ accuracy, the iteration complexity of the algorithm is upper bounded by $\frac{\log n \log(1/\varepsilon)}{\varepsilon}$, where $n$ is the size of the input alphabet. In particular, when the output states $\{\rho_x\}_{x \in \mathcal{X}}$ are linearly independent in complex matrix space, the algorithm converges geometrically. We also show that the algorithm reaches an $\varepsilon$-accurate solution with a complexity of $O\big(\frac{m^3 \log n \log(1/\varepsilon)}{\varepsilon}\big)$ in general, and $O\big(m^3 \frac{\log(\varepsilon / D(p^*\|p^{N_0}))}{\log(1-\delta)}\big)$ in the special case, where $m$ is the output dimension, $D(p^*\|p^{N_0})$ is the relative entropy of two distributions, and $\delta$ is a positive number. Numerical experiments were performed, and an approximate solution for the binary, two-dimensional case was analysed.

1. Introduction

The computation of channel capacity has always been a core problem in information theory. The well-known Blahut–Arimoto algorithm [1,2] was proposed in 1972 to compute the capacity of a discrete memoryless classical channel. Inspired by this algorithm, we propose an algorithm of Blahut–Arimoto type to compute the capacity of a discrete memoryless classical-quantum channel. The classical-quantum channel [3] can be considered as a mapping $x \to \rho_x$ of an input alphabet $\mathcal{X} = \{1, 2, \ldots, |\mathcal{X}|\}$ to a set of quantum states in a finite-dimensional Hilbert space $\mathcal{H}$. The state of a quantum system is given by a density operator $\rho$, which is a positive semi-definite operator with trace equal to one. Let $\mathcal{D}_m$ denote the set of all density operators acting on a Hilbert space $\mathcal{H}$ of dimension $m$. If the source emits a letter $x$ with probability $p_x$, the output is $\rho_x$; thus, the outputs form an ensemble $\{p_x : \rho_x\}_{x \in \mathcal{X}}$.
In 1998, Holevo showed [4] that the classical capacity of the classical-quantum channel is the maximization of a quantity called the Holevo information over all input distributions. The Holevo information $\chi$ of an ensemble $\{p_x : \rho_x\}_{x \in \mathcal{X}}$ is defined as
$$\chi(\{p_x : \rho_x\}_{x \in \mathcal{X}}) = H\Big(\sum_x p_x \rho_x\Big) - \sum_x p_x H(\rho_x), \tag{1}$$
where $H(\cdot)$ is the von Neumann entropy, defined on positive semidefinite matrices as
$$H(\rho) = -\operatorname{Tr}(\rho \log \rho). \tag{2}$$
Due to the concavity of the von Neumann entropy [5], the Holevo information is always non-negative. The Holevo quantity is concave in the input distribution [5], so maximizing Equation (1) over $p$ is a convex optimization problem, though not a straightforward one. In 2014, Sutter et al. [6] proposed an algorithm based on duality of convex programming and smoothing techniques [7] with a complexity of $O\big(\frac{(n \vee m)\, m^3 (\log n)^{1/2}}{\varepsilon}\big)$, where $n \vee m = \max\{n, m\}$.
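As a concrete reference point, the Holevo information of Equation (1) can be evaluated numerically from eigenvalues. The sketch below is our own minimal implementation (the function names are ours, not the paper's), with all logarithms base 2 so that $\chi$ comes out in bits:

```python
import numpy as np

def von_neumann_entropy(rho):
    """H(rho) = -Tr(rho log rho), in bits, computed from the eigenvalues."""
    ev = np.linalg.eigvalsh(rho)
    ev = ev[ev > 1e-12]  # drop zero eigenvalues: 0 log 0 = 0
    return float(-np.sum(ev * np.log2(ev)))

def holevo_information(p, rhos):
    """chi({p_x : rho_x}) = H(sum_x p_x rho_x) - sum_x p_x H(rho_x), Eq. (1)."""
    rho_avg = sum(px * rx for px, rx in zip(p, rhos))
    return von_neumann_entropy(rho_avg) - sum(
        px * von_neumann_entropy(rx) for px, rx in zip(p, rhos))

# Two orthogonal pure qubit states used with equal probability: chi = 1 bit.
ket0 = np.diag([1.0, 0.0])
ket1 = np.diag([0.0, 1.0])
print(holevo_information([0.5, 0.5], [ket0, ket1]))  # -> 1.0
```

For pure states the second term vanishes, so $\chi$ reduces to the entropy of the average state, as the example shows.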
For discrete memoryless classical channels, the capacity can be computed efficiently using the Blahut–Arimoto (BA) algorithm [1,2,8]. In 1998, H. Nagaoka [9] proposed a quantum version of the BA algorithm. In his work, he considered the quantum-quantum channel, a problem that was proved to be NP-complete in 2008 [10]. Despite the NP-completeness, an example is given in [12] of a qubit quantum channel that requires four inputs to maximize the Holevo capacity. Further research on Nagaoka's algorithm was presented in [11], where the algorithm was implemented to check the additivity of quantum channels. In [9], Nagaoka also mentioned an algorithm for the classical-quantum channel; however, its speed of convergence was not studied there, and the details of the proof were not presented either. In this paper, we show that, with proper manipulations, the BA algorithm can be applied to computing the capacity of a classical-quantum channel with an input constraint efficiently. The remainder of this article is structured as follows. In Section 2, we present the algorithm and show how it works. In Section 3, we provide a convergence analysis of the algorithm. In Section 4, we report numerical experiments assessing how well the BA algorithm performs. In Section 5, we propose an approximate solution for a special case: binary input with two-dimensional output.
Notations: The logarithm with base 2 is denoted by $\log(\cdot)$. The space of all Hermitian operators of dimension $m$ is denoted by $\mathcal{H}_m$. The set of all density matrices of dimension $m$ is denoted by $\mathcal{D}_m := \{\rho \in \mathcal{H}_m : \rho \succeq 0, \operatorname{Tr}\rho = 1\}$. Each letter $x \in \mathcal{X}$ is mapped to a density matrix $\rho_x$; thus, the classical-quantum channel can be represented as a set of density matrices $\{\rho_x\}_{x \in \mathcal{X}}$. The set of all probability distributions of length $n$ is denoted by $\Delta_n := \{p \in \mathbb{R}^n : p_x \ge 0, \sum_{x=1}^n p_x = 1\}$. The von Neumann entropy of a density matrix $\rho$ is denoted by $H(\rho) = -\operatorname{Tr}[\rho \log \rho]$. The relative entropy between $p, q \in \Delta_n$ is $D(p\|q) = \sum_x p_x(\log p_x - \log q_x)$ if $\operatorname{supp}(p) \subseteq \operatorname{supp}(q)$, and $+\infty$ otherwise. The relative entropy between $\rho, \sigma \in \mathcal{D}_m$ is $D(\rho\|\sigma) = \operatorname{Tr}[\rho(\log\rho - \log\sigma)]$ if $\operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma)$, and $+\infty$ otherwise.
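Both relative entropies are used repeatedly below, so a small numerical sketch may help fix the conventions (base-2 logs, $+\infty$ on a support violation). The helper names are ours, and the quantum version assumes full-support $\sigma$ for simplicity:

```python
import numpy as np

def kl_divergence(p, q):
    """Classical D(p||q) in bits; +inf when supp(p) is not inside supp(q)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    if np.any(q[mask] == 0):
        return np.inf
    return float(np.sum(p[mask] * (np.log2(p[mask]) - np.log2(q[mask]))))

def quantum_relative_entropy(rho, sigma):
    """Quantum D(rho||sigma) = Tr[rho (log rho - log sigma)] in bits.
    Assumes sigma has full support."""
    def logm(A):  # base-2 matrix log of a Hermitian PSD matrix
        ev, U = np.linalg.eigh(A)
        return U @ np.diag(np.log2(np.maximum(ev, 1e-300))) @ U.conj().T
    return float(np.real(np.trace(rho @ (logm(rho) - logm(sigma)))))

print(kl_divergence([0.5, 0.5], [0.25, 0.75]))                       # > 0
print(quantum_relative_entropy(np.diag([1.0, 0.0]), np.eye(2) / 2))  # -> 1.0
```

The second call illustrates $D(|0\rangle\langle 0| \,\|\, I/2) = 1$ bit; for commuting (diagonal) states the quantum quantity reduces to the classical one.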

2. Blahut–Arimoto Algorithm for Classical-Quantum Channel

First, we write down the primal optimization problem:
$$\text{Primal:} \quad \max_p \; H\Big(\sum_x p_x \rho_x\Big) - \sum_x p_x H(\rho_x), \quad \text{subject to } s^T p \le S, \; p \in \Delta_n, \tag{3}$$
where $\rho_x \in \mathcal{D}_m$, $s \in \mathbb{R}^n$ is a positive real vector, and $S > 0$. We denote the maximal value of Equation (3) by $C(S)$. In this optimization problem, we maximize the Holevo quantity over the input distribution $\{p_x\}_{x\in\mathcal{X}}$. In practice, preparing different signal states $x$ incurs different costs, represented by $s = (s_1, s_2, \ldots, s_n)$; the inequality constraint in Equation (3) bounds the expected cost.
Lemma 1.
([6]). Let $G := \arg\max_{p\in\Delta_n} \chi(\{p_x : \rho_x\}_{x\in\mathcal{X}})$ and $S_{\max} := \min_{p\in G} s^T p$. Then, if $S \ge S_{\max}$, the inequality constraint in the primal problem is inactive; and, if $S < S_{\max}$, the inequality constraint in the primal problem is equivalent to $s^T p = S$.
Now, we assume that $\min\{s_x\}_{x\in\mathcal{X}} \le S \le S_{\max}$. The Lagrange dual problem of Equation (3) is
$$\text{Dual:} \quad \min_{\lambda \ge 0} \; \max_{p \in \Delta_n} \; H\Big(\sum_x p_x \rho_x\Big) - \sum_x p_x H(\rho_x) - \lambda(s^T p - S). \tag{4}$$
Lemma 2.
Strong duality holds between Equations (3) and (4).
Proof. 
The lemma follows from standard strong duality result of convex optimization theory ([13], Chapter 5.2.3). □
Define, for $p, p' \in \Delta_n$,
$$f_\lambda(p, p') = \sum_x \operatorname{Tr}\big\{p_x \rho_x [\log(p'_x \rho_x) - \log(p_x \rho')]\big\} - \lambda s^T p, \tag{5}$$
$$F(\lambda) = \max_p \max_{p'} f_\lambda(p, p'), \tag{6}$$
where $\rho' = \sum_x p'_x \rho_x$.
Lemma 3.
For fixed $p$, $\arg\max_{p'} f_\lambda(p, p') = p$.
Proof. 
Actually, we can prove a stronger result (the following lemma was stated in [9], but no proof was given, perhaps due to space limitations). We now restate the lemma from [9] and give the proof.
Lemma 4.
For fixed $\{p_x : \rho_x\}_{x\in\mathcal{X}}$, we have
$$\max_{\{q_x : \sigma_x\}_{x\in\mathcal{X}}} \Big[ -D(p\|q) + \sum_x p_x \operatorname{Tr}\{\rho_x[\log\sigma_x - \log\sigma]\} \Big] = \sum_x p_x \operatorname{Tr}\{\rho_x[\log\rho_x - \log\rho]\},$$
i.e.,
$$\arg\max_{\{q_x : \sigma_x\}_{x\in\mathcal{X}}} \Big[ -D(p\|q) + \sum_x p_x \operatorname{Tr}\{\rho_x[\log\sigma_x - \log\sigma]\} \Big] = \{p_x : \rho_x\}_{x\in\mathcal{X}}, \tag{7}$$
where $p, q \in \Delta_n$, $\sigma_x \in \mathcal{D}_m$, $\rho = \sum_x p_x \rho_x$, and $\sigma = \sum_x q_x \sigma_x$.
Proof. 
Considering Equation (7), we have
$$\text{RHS} - \text{LHS} = D(p\|q) + \sum_x p_x D(\rho_x\|\sigma_x) - D(\rho\|\sigma) = D(\rho^{XB}\|\sigma^{XB}) - D(\rho\|\sigma),$$
where $\rho^{XB} = \sum_x p_x |x\rangle\langle x|^X \otimes \rho_x$ and $\sigma^{XB} = \sum_x q_x |x\rangle\langle x|^X \otimes \sigma_x$ are classical-quantum states [5]. Let the quantum channel $\mathcal{N}$ be the partial trace over the $X$ system; then, by the monotonicity of quantum relative entropy ([5], Theorem 11.8.1), we have
$$D(\rho^{XB}\|\sigma^{XB}) \ge D(\mathcal{N}(\rho^{XB})\|\mathcal{N}(\sigma^{XB})) = D(\rho\|\sigma).$$
Thus, $\text{RHS} - \text{LHS} \ge 0$, with equality at $\{q_x : \sigma_x\}_{x\in\mathcal{X}} = \{p_x : \rho_x\}_{x\in\mathcal{X}}$. □
Notice that, if we let $\sigma_x = \rho_x$ in Equation (7), a short calculation reduces Equation (7) to Lemma 3. Thus, Lemma 3 is a straightforward corollary of Lemma 4. □
Theorem 1.
The dual problem in Equation (4) is equivalent to
$$\min_{\lambda \ge 0} \; F(\lambda) + \lambda S. \tag{8}$$
Proof. 
It follows from Equation (5) and Lemma 3 that
$$\max_{p'} f_\lambda(p, p') = f_\lambda(p, p) = H(\rho) - \sum_x p_x H(\rho_x) - \lambda s^T p.$$
Hence,
$$\min_{\lambda\ge 0} \max_p \Big[ H(\rho) - \sum_x p_x H(\rho_x) - \lambda(s^T p - S) \Big] = \min_{\lambda\ge 0} \Big[ \max_p \max_{p'} f_\lambda(p, p') + \lambda S \Big] = \min_{\lambda\ge 0} \; F(\lambda) + \lambda S. \;\Box$$
The BA algorithm is an alternating optimization algorithm: to optimize $f_\lambda(p, p')$, each iteration fixes one variable and optimizes the function over the other. Now, we use the BA algorithm to find $F(\lambda)$. The iteration procedure is
$$p^0_x > 0, \qquad p'^t_x = p^t_x, \qquad p^{t+1} = \arg\max_p \; \sum_x \operatorname{Tr}\big\{p_x\rho_x[\log(p^t_x \rho_x) - \log(p_x \rho^t)]\big\} - \lambda s^T p,$$
where $\rho^t = \sum_x p^t_x \rho_x$.
To get $p^{t+1}$, we can use the Lagrangian
$$L = \sum_x \operatorname{Tr}\big\{p_x\rho_x[\log(p^t_x\rho_x) - \log(p_x\rho^t)]\big\} - \lambda s^T p - \nu\Big(\sum_x p_x - 1\Big),$$
and set the gradient with respect to $p_x$ to zero. Combining with the normalization condition, we obtain (taking the natural logarithm for convenience)
$$p^{t+1}_x = \frac{r^t_x}{\sum_{x'} r^t_{x'}}, \tag{9}$$
where
$$r^t_x = \exp\big(\operatorname{Tr}\{\rho_x[\log(p^t_x\rho_x) - \log\rho^t]\} - \lambda s_x\big), \qquad \rho^t = \sum_x p^t_x \rho_x. \tag{10}$$
Thus, we arrive at Algorithm 1 below.
Algorithm 1 Blahut–Arimoto algorithm for a discrete memoryless classical-quantum channel.
  • set $p^0_x = \frac{1}{|\mathcal{X}|}$ for all $x \in \mathcal{X}$;
  • repeat
  •     $p'^t_x = p^t_x$;
  •     $p^{t+1}_x = \frac{r^t_x}{\sum_{x'} r^t_{x'}}$, where $r^t_x = \exp\big(\operatorname{Tr}\{\rho_x[\log(p^t_x \rho_x) - \log \rho^t]\} - \lambda s_x\big)$;
  • until convergence.
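Algorithm 1 is short enough to state directly in code. The sketch below is our own minimal implementation (function names and the termination rule are ours); exponentials and logarithms are taken base 2 throughout so that $F(\lambda)$ comes out in bits, and the exponent is shifted by its maximum before exponentiating, which leaves $p^{t+1}$ unchanged but avoids overflow:

```python
import numpy as np

def _logm(A, floor=1e-300):
    """Base-2 matrix logarithm of a Hermitian PSD matrix via eigendecomposition."""
    ev, U = np.linalg.eigh(A)
    return U @ np.diag(np.log2(np.maximum(ev, floor))) @ U.conj().T

def ba_capacity(rhos, s=None, lam=0.0, tol=1e-10, max_iter=100000):
    """Algorithm 1: returns (F(lam), p), maximizing chi(p) - lam * s^T p.
    With lam = 0 this is the unconstrained Holevo capacity."""
    n = len(rhos)
    s = np.zeros(n) if s is None else np.asarray(s, float)
    p = np.full(n, 1.0 / n)                      # p^0: uniform
    log_rhos = [_logm(r) for r in rhos]
    for _ in range(max_iter):
        rho_t = sum(px * rx for px, rx in zip(p, rhos))
        log_rho_t = _logm(rho_t)
        # r_x = 2^( Tr{rho_x [log(p_x rho_x) - log rho_t]} - lam * s_x )
        expo = np.array([np.log2(max(p[x], 1e-300))
                         + np.real(np.trace(rhos[x] @ (log_rhos[x] - log_rho_t)))
                         - lam * s[x] for x in range(n)])
        p_new = np.exp2(expo - expo.max())       # shift for numerical stability
        p_new /= p_new.sum()
        done = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if done:
            break
    rho_t = sum(px * rx for px, rx in zip(p, rhos))
    H_avg = -np.real(np.trace(rho_t @ _logm(rho_t)))
    H_each = [-np.real(np.trace(rx @ lr)) for rx, lr in zip(rhos, log_rhos)]
    F = H_avg - float(np.dot(p, H_each)) - lam * float(np.dot(s, p))
    return F, p

# two orthogonal pure states: capacity 1 bit at the uniform input
F, p = ba_capacity([np.diag([1.0, 0.0]), np.diag([0.0, 1.0])])
print(round(F, 6), np.round(p, 3))  # -> 1.0 [0.5 0.5]
```

For commuting (diagonal) states the iteration reduces to the classical BA algorithm, which gives an easy correctness check against known classical capacities.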
Lemma 5.
Let $p^*(\lambda) = \arg\max_p f_\lambda(p, p)$ for a given $\lambda$; then, $s^T p^*(\lambda)$ is a decreasing function of $\lambda$.
Proof. 
For convenience, we write $\chi(\{p_x : \rho_x\}_{x\in\mathcal{X}})$ as $\chi(p)$. Notice that $f_\lambda(p, p) = \chi(p) - \lambda s^T p$ by the definition of $f_\lambda$.
For $\lambda_1 < \lambda_2$, suppose $s^T p^*(\lambda_1) < s^T p^*(\lambda_2)$. Then, by the definition of $p^*(\lambda)$, we have
$$\chi(p^*(\lambda_1)) - \lambda_1 s^T p^*(\lambda_1) \ge \chi(p^*(\lambda_2)) - \lambda_1 s^T p^*(\lambda_2)$$
$$\Rightarrow\; \chi(p^*(\lambda_2)) - \chi(p^*(\lambda_1)) \le \lambda_1 s^T(p^*(\lambda_2) - p^*(\lambda_1)) < \lambda_2 s^T(p^*(\lambda_2) - p^*(\lambda_1))$$
$$\Rightarrow\; \chi(p^*(\lambda_1)) - \lambda_2 s^T p^*(\lambda_1) > \chi(p^*(\lambda_2)) - \lambda_2 s^T p^*(\lambda_2),$$
which contradicts the fact that $p^*(\lambda_2)$ is an optimizer of $\chi(p) - \lambda_2 s^T p$. Thus, we must have $s^T p^*(\lambda_1) \ge s^T p^*(\lambda_2)$ whenever $\lambda_1 < \lambda_2$. □
We do not need to solve the optimization problem in Equation (8) directly. By Lemma 1, the statement "$p^*$ is an optimal solution" is equivalent to "$s^T p^* = S$ and $p^*$ maximizes $f_\lambda(p, p) + \lambda S = \chi(\{p_x : \rho_x\}_{x\in\mathcal{X}}) - \lambda(s^T p - S)$", which is in turn equivalent to "$s^T p^* = S$ and $p^*$ maximizes $f_\lambda(p, p)$". Thus, if for some $\lambda \ge 0$ a distribution $p$ maximizes $f_\lambda(p, p)$ and $s^T p = S$, then the capacity is $C(S) = F(\lambda) + \lambda S$. Such a $\lambda$ is easy to find, since $s^T p$ is a decreasing function of $\lambda$; to reach $\varepsilon$ accuracy, we need
$$O(\log(1/\varepsilon)) \tag{11}$$
steps of the bisection method.
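The outer loop exploits only the monotonicity from Lemma 5, so it can be written generically. In the sketch below (our own naming), `expected_cost` stands for the map $\lambda \mapsto s^T p^*(\lambda)$; a real run would evaluate it by calling the BA inner loop, but for a self-contained demonstration we substitute a toy non-increasing function:

```python
def bisect_lambda(expected_cost, S, lam_hi=1.0, tol=1e-9, max_iter=200):
    """Find lam >= 0 with expected_cost(lam) ~= S, where expected_cost
    (standing in for s^T p*(lam)) is non-increasing in lam (Lemma 5).
    Needs O(log(1/tol)) evaluations."""
    lam_lo = 0.0
    for _ in range(60):                   # grow the bracket until cost drops to S
        if expected_cost(lam_hi) <= S:
            break
        lam_lo, lam_hi = lam_hi, 2.0 * lam_hi
    for _ in range(max_iter):
        mid = 0.5 * (lam_lo + lam_hi)
        if expected_cost(mid) > S:
            lam_lo = mid
        else:
            lam_hi = mid
        if lam_hi - lam_lo < tol:
            break
    return 0.5 * (lam_lo + lam_hi)

# stand-in cost 1/(1+lam): equals S = 0.5 exactly at lam = 1
print(round(bisect_lambda(lambda lam: 1.0 / (1.0 + lam), 0.5), 6))  # -> 1.0
```

Each probe of `expected_cost` costs one full run of Algorithm 1, which is where the extra $\log(1/\varepsilon)$ factor in the overall complexity comes from.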

3. Convergence Analysis

Next, we show that the algorithm indeed converges to F ( λ ) and then provide an analysis of the speed of the convergence.

3.1. The Convergence Is Guaranteed

Corollary 1.
$$f_\lambda(p^{t+1}, p^t) = \log\Big(\sum_x r^t_x\Big).$$
Proof. 
$$f_\lambda(p^{t+1}, p^t) = -\sum_x p^{t+1}_x \log p^{t+1}_x + \sum_x p^{t+1}_x \big[\operatorname{Tr}\{\rho_x[\log(p^t_x\rho_x) - \log\rho^t]\} - \lambda s_x\big] = \sum_x p^{t+1}_x \log\frac{r^t_x}{p^{t+1}_x} = \log\Big(\sum_x r^t_x\Big).$$
The first equality comes from a manipulation of Equation (5). The second equality follows from Equation (10). The last equality follows from Equation (9). □
Corollary 2.
For an arbitrary distribution $\{p_x\}_{x\in\mathcal{X}}$, we have
$$\chi(\{p_x : \rho_x\}_{x\in\mathcal{X}}) - \lambda s^T p - f_\lambda(p^{t+1}, p^t) \le \sum_x p_x \log\frac{p^{t+1}_x}{p^t_x}. \tag{12}$$
Proof. 
Define $\rho = \sum_x p_x \rho_x$. Then, we have
$$\sum_x p_x \log\frac{p^{t+1}_x}{p^t_x} = \sum_x p_x \log\Big(\frac{1}{p^t_x}\cdot\frac{r^t_x}{\sum_{x'} r^t_{x'}}\Big) = -f_\lambda(p^{t+1}, p^t) + \sum_x p_x \log\frac{r^t_x}{p^t_x}$$
$$= -f_\lambda(p^{t+1}, p^t) + \sum_x p_x \big[\operatorname{Tr}\{\rho_x[\log(p^t_x\rho_x) - \log\rho^t]\} - \lambda s_x - \log p^t_x\big]$$
$$= -f_\lambda(p^{t+1}, p^t) + \sum_x p_x \operatorname{Tr}\{\rho_x[\log\rho_x - \log\rho^t]\} - \lambda s^T p$$
$$= -f_\lambda(p^{t+1}, p^t) + \sum_x p_x \operatorname{Tr}\{\rho_x[\log\rho_x - \log\rho + \log\rho - \log\rho^t]\} - \lambda s^T p$$
$$= -f_\lambda(p^{t+1}, p^t) + \chi(\{p_x : \rho_x\}_{x\in\mathcal{X}}) - \lambda s^T p + D(\rho\|\rho^t).$$
The first equality follows from Equation (9). The second equality follows from Corollary 1. The third equality follows from Equation (10). The last equality follows from Equation (1). Since the relative entropy $D(\rho\|\rho^t)$ is always non-negative [5], we have
$$\chi(\{p_x : \rho_x\}_{x\in\mathcal{X}}) - \lambda s^T p - f_\lambda(p^{t+1}, p^t) \le \sum_x p_x \log\frac{p^{t+1}_x}{p^t_x}. \;\Box$$
Theorem 2.
$f_\lambda(p^{t+1}, p^t)$ converges to $F(\lambda)$ as $t \to \infty$.
Proof. 
Let $p^*$ be an optimal solution achieving $F(\lambda)$; then, we have
$$\sum_{t=0}^N [F(\lambda) - f_\lambda(p^{t+1}, p^t)] = \sum_{t=0}^N [\chi(\{p^*_x : \rho_x\}_{x\in\mathcal{X}}) - \lambda s^T p^* - f_\lambda(p^{t+1}, p^t)] \le \sum_{t=0}^N \sum_x p^*_x \log\frac{p^{t+1}_x}{p^t_x} = \sum_x p^*_x \log\frac{p^{N+1}_x}{p^0_x} = \sum_x p^*_x \log\frac{p^*_x}{p^0_x} + \sum_x p^*_x \log\frac{p^{N+1}_x}{p^*_x} = D(p^*\|p^0) - D(p^*\|p^{N+1}) \tag{13}$$
$$\le D(p^*\|p^0). \tag{14}$$
The first equality follows from Equations (5), (6), and (1). The inequality follows from Corollary 2. The last inequality follows from the non-negativity of relative entropy.
Thus, letting $N \to \infty$ and using $F(\lambda) - f_\lambda(p^{t+1}, p^t) \ge 0$, we have
$$0 \le \sum_{t=0}^\infty [F(\lambda) - f_\lambda(p^{t+1}, p^t)] \le D(p^*\|p^0). \tag{15}$$
Since we take the initial $p^0$ to be the uniform distribution, the right-hand side of Equation (15) is finite. Combined with the fact that $f_\lambda(p^{t+1}, p^t)$ is a non-decreasing sequence, this means $f_\lambda(p^{t+1}, p^t)$ converges to $F(\lambda)$. □
Theorem 3.
The sequence of probability distributions $\{p^t\}_{t=0}^\infty$ also converges.
Proof. 
Removing the summation over $t$ in Equations (13) and (14), we have
$$0 \le F(\lambda) - f_\lambda(p^{t+1}, p^t) \le \sum_x p^*_x \log\frac{p^{t+1}_x}{p^t_x} = D(p^*\|p^t) - D(p^*\|p^{t+1}). \tag{16}$$
Since the sequence $\{p^t\}_{t=0}^\infty$ is bounded, there exists a convergent subsequence $\{p^{t_k}\}_{k=0}^\infty$, say with limit $\bar p$. Then, clearly, $f_\lambda(\bar p, \bar p) = F(\lambda)$ (otherwise $f_\lambda(p^{t+1}, p^t)$ would not converge to $F(\lambda)$). Substituting $p^* = \bar p$ in Equation (16), we have
$$0 \le D(\bar p\|p^t) - D(\bar p\|p^{t+1}).$$
Thus, the sequence $\{D(\bar p\|p^t)\}_{t=0}^\infty$ is decreasing and has a subsequence $\{D(\bar p\|p^{t_k})\}_{k=0}^\infty$ converging to zero. Therefore, $\{D(\bar p\|p^t)\}_{t=0}^\infty$ converges to zero, which means $\{p^t\}_{t=0}^\infty$ converges to $\bar p$. □

3.2. The Speed of Convergence

Theorem 4.
To reach $\varepsilon$ accuracy of $F(\lambda)$, the algorithm needs at most $\frac{\log n}{\varepsilon}$ iterations.
Proof. 
From the proof of Theorem 2, we know
$$\sum_{t=0}^N [F(\lambda) - f_\lambda(p^{t+1}, p^t)] \le D(p^*\|p^0) = \sum_x p^*_x \log\frac{p^*_x}{p^0_x} = \log n - H(p^*) \le \log n,$$
and $F(\lambda) - f_\lambda(p^{t+1}, p^t)$ is non-increasing in $t$; thus,
$$F(\lambda) - f_\lambda(p^{t+1}, p^t) \le \frac{\log n}{t+1}. \;\Box$$
Next, we show that in some special cases the algorithm has a better convergence performance, which is a geometric speed of convergence.
Assumption 1.
The channel matrices $\{\rho_x\}_{x\in\mathcal{X}}$ are linearly independent, i.e., there is no nonzero vector $c \in \mathbb{R}^n$ such that
$$\sum_x c_x \rho_x = 0.$$
Remark 1.
Assumption 1 is equivalent to the following: the input distribution $p$ is uniquely determined by the output state $\rho = \sum_x p_x \rho_x$.
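Assumption 1 can be checked numerically: it holds exactly when the Gram matrix $M_{ij} = \operatorname{Tr}(\rho_i\rho_j)$, which reappears in the proof of Theorem 7, is positive definite. A short sketch (the helper names and the numerical tolerance are ours):

```python
import numpy as np

def gram_matrix(rhos):
    """M_ij = Tr(rho_i rho_j); positive definite iff {rho_x} are linearly independent."""
    n = len(rhos)
    M = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            M[i, j] = np.real(np.trace(rhos[i] @ rhos[j]))
    return M

def satisfies_assumption_1(rhos, tol=1e-12):
    return bool(np.linalg.eigvalsh(gram_matrix(rhos)).min() > tol)

# |0><0| and |1><1| are independent; adding their average breaks independence
ket0, ket1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
print(satisfies_assumption_1([ket0, ket1]))                      # -> True
print(satisfies_assumption_1([ket0, ket1, (ket0 + ket1) / 2]))   # -> False
```

This works because $\sum_x c_x \rho_x = 0$ implies $\langle c|M|c\rangle = \operatorname{Tr}[(\sum_x c_x\rho_x)^2] = 0$, and conversely.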
Theorem 5.
Under Assumption 1, the optimal solution p * is unique.
Proof. 
Notice that the von Neumann entropy in Equation (2) is strictly concave [14]; by Assumption 1, for distributions $p \ne p'$ we have $\rho = \sum_x p_x \rho_x \ne \sum_x p'_x \rho_x = \rho'$. It follows that $H(\sum_x p_x \rho_x)$ is strictly concave in $p$, hence the Holevo quantity in Equation (1) is strictly concave in $p$, which means the optimal solution $p^*$ is unique. □
We also need the following theorem:
Theorem 6.
([15]). The relative entropy satisfies
$$D(\rho\|\sigma) \ge \frac{1}{2}\operatorname{Tr}\big[(\rho - \sigma)^2\big].$$
Now, we state the theorem of convergence:
Theorem 7.
Suppose we start from some initial point $p^0$. Then, under Assumption 1, the algorithm converges to the optimal point $p^*$, and $p^t$ converges to $p^*$ at a geometric speed, i.e., there exist $N_0$ and $\delta > 0$, independent of $t$, such that, for any $t > N_0$,
$$D(p^*\|p^t) \le (1-\delta)^{t-N_0} D(p^*\|p^{N_0}).$$
Proof. 
Define $d_x = p^*_x - p^t_x$ and the real vector $|d\rangle = (d_1, d_2, \ldots, d_n)^T$. Using a Taylor expansion, we have
$$D(p^*\|p^t) = \sum_x p^*_x \log\frac{p^*_x}{p^t_x} = -\sum_x p^*_x \log\Big(1 - \frac{d_x}{p^*_x}\Big) = \frac{1}{2}\langle d|P^{-1}|d\rangle + \sum_x O(d_x^3),$$
where $P = \operatorname{diag}(p^*_1, p^*_2, \ldots, p^*_n)$ (the first-order term vanishes because $\sum_x d_x = 0$). Since $p^t$ converges to $p^*$, i.e., $|d\rangle$ converges to zero, there exists an $N_0$ such that, for any $t > N_0$, we have
$$D(p^*\|p^t) \le \frac{2}{3}\langle d|P^{-1}|d\rangle. \tag{17}$$
From Theorem 6, we have
$$D(\rho^*\|\rho^t) \ge \frac{1}{2}\operatorname{Tr}\Big\{\Big[\sum_x d_x \rho_x\Big]^2\Big\} = \frac{1}{2}\langle d|M|d\rangle,$$
where $M \in \mathbb{R}^{n\times n}$ is given by
$$M_{ij} = \operatorname{Tr}(\rho_i \rho_j). \tag{18}$$
From Equation (18), we know that, under Assumption 1, $M$ is positive definite. Thus, there exists a $\delta > 0$ such that
$$\frac{1}{2}M \succeq \delta\cdot\frac{2}{3}P^{-1} \;\Rightarrow\; \frac{1}{2}\langle d|M|d\rangle \ge \delta\cdot\frac{2}{3}\langle d|P^{-1}|d\rangle.$$
Thus, for any $t > N_0$, it follows from Equations (17) and (18) that
$$D(\rho^*\|\rho^t) \ge \delta D(p^*\|p^t). \tag{19}$$
From Equation (12) (applied with $p = p^*$), we know
$$\sum_x p^*_x \log\frac{p^{t+1}_x}{p^t_x} \ge D(\rho^*\|\rho^t) \;\Rightarrow\; D(p^*\|p^{t+1}) \le D(p^*\|p^t) - D(\rho^*\|\rho^t);$$
combined with Equation (19), we have
$$D(p^*\|p^{t+1}) \le D(p^*\|p^t) - \delta D(p^*\|p^t) = (1-\delta)D(p^*\|p^t) \;\Rightarrow\; D(p^*\|p^t) \le (1-\delta)^{t-N_0} D(p^*\|p^{N_0}) \tag{20}$$
for any $t > N_0$. □
Remark 2.
(Complexity). Denote by $n, m$ the input alphabet size and output dimension, respectively. A closer look at Algorithm 1 reveals that each iteration requires one matrix logarithm $\log\rho^t$; the rest is matrix and scalar multiplication. The matrix logarithm can be computed with complexity $O(m^3)$ [16]; thus, by Theorem 4 and Equation (11), the complexity of reaching an $\varepsilon$-close estimate of the true capacity with Algorithm 1 is $O\big(\frac{m^3\log n\log(1/\varepsilon)}{\varepsilon}\big)$. Under the extra condition on the channel $\{\rho_x\}_{x\in\mathcal{X}}$ given by Assumption 1, the complexity of reaching an $\varepsilon$-close solution (i.e., $D(p^*\|p^t) < \varepsilon$) with Algorithm 1 is $O\big(m^3\frac{\log(\varepsilon/D(p^*\|p^{N_0}))}{\log(1-\delta)}\big)$. Usually, $\varepsilon$ need not be too small (no smaller than $10^{-6}$); thus, in either case, the complexity is better than the $O\big(\frac{(n\vee m)\,m^3(\log n)^{1/2}}{\varepsilon}\big)$ of [6] when $n\vee m = \max\{n, m\}$ is big.

4. Numerical Experiments on BA Algorithm

We only performed experiments on the BA algorithm with no input constraint (the constrained version consists of repeated runs of the unconstrained algorithm inside a bisection over $\lambda$). We studied how the number of iterations needed to reach a given accuracy depends on $n$ and $m$ (the input size and output dimension). Since the true capacity of a given channel is unknown, we used the following theorem to bound the error of the algorithm.
Theorem 8.
With the iteration procedure of Algorithm 1, $\max_x\{D(\rho_x\|\rho^t) - \lambda s_x\}$ converges to $F(\lambda)$ from above.
Proof. 
Following from Algorithm 1, Corollary 1, and Theorem 3, we have
$$\lim_{t\to\infty}\frac{p^{t+1}_x}{p^t_x} = \exp\big[D(\rho_x\|\rho^*) - \lambda s_x - F(\lambda)\big],$$
where $\rho^* = \sum_x p^*_x\rho_x$ and $p^*$ is an optimal distribution. The limit above equals 1 if $p^*_x > 0$ and does not exceed 1 if $p^*_x = 0$. Thus,
$$D(\rho_x\|\rho^*) - \lambda s_x \le F(\lambda)$$
for every $x \in \mathcal{X}$, with equality if $p^*_x > 0$. This proves
$$\max_x\{D(\rho_x\|\rho^t) - \lambda s_x\} \to F(\lambda).$$
For any $p^t$ and any optimal distribution $p^*$, we have
$$\max_x\big[D(\rho_x\|\rho^t) - \lambda s_x\big] \ge \sum_x p^*_x\big[D(\rho_x\|\rho^t) - \lambda s_x\big] = \sum_x p^*_x D(\rho_x\|\rho^*) + D(\rho^*\|\rho^t) - \lambda s^T p^* = F(\lambda) + D(\rho^*\|\rho^t) \ge F(\lambda).$$
The first equality requires some calculation, and the second equality follows since $p^*$ is an optimal distribution. This means $\max_x\{D(\rho_x\|\rho^t) - \lambda s_x\}$ converges to $F(\lambda)$ from above. □
Thus, our accuracy criterion was the following: for a given classical-quantum channel, we ran the BA algorithm (with no input constraint) until $\max_x\{D(\rho_x\|\rho^t)\} - f(p^t, p^t)$ was less than $10^{-k}$, and recorded the number of iterations. At that point, the error was at most of order $10^{-k}$, since $\max_x\{D(\rho_x\|\rho^t)\}$ and $f(p^t, p^t)$ converge to the true capacity from above and below, respectively.
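This stopping rule is easy to implement because the lower bound has a convenient form: $\chi(p) = \sum_x p_x D(\rho_x\|\bar\rho)$, so both bounds come from the same relative-entropy evaluations. The sketch below (our own naming, unconstrained case $\lambda = 0$, base-2 logs) returns the upper-minus-lower gap:

```python
import numpy as np

def _logm(A):
    ev, U = np.linalg.eigh(A)
    return U @ np.diag(np.log2(np.maximum(ev, 1e-300))) @ U.conj().T

def capacity_gap(p, rhos):
    """max_x D(rho_x || rho_bar) minus chi(p) (lambda = 0). Both quantities
    converge to the capacity from above and below, so the gap bounds the error."""
    rho_bar = sum(px * rx for px, rx in zip(p, rhos))
    log_bar = _logm(rho_bar)
    rel = [np.real(np.trace(rx @ (_logm(rx) - log_bar))) for rx in rhos]
    chi = sum(px * d for px, d in zip(p, rel))   # chi(p) = sum_x p_x D(rho_x||rho_bar)
    return max(rel) - chi

ket0, ket1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
print(round(capacity_gap([0.5, 0.5], [ket0, ket1]), 6))  # -> 0.0 (already optimal)
```

A gap below $10^{-k}$ certifies that the current $\chi(p^t)$ is within $10^{-k}$ of the true capacity, without ever knowing the capacity itself.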
We performed the following numerical experiments: for given values of the input size $n$, output dimension $m$, and accuracy, we generated 200 classical-quantum channels randomly, recorded the numbers of iterations, and then calculated the average number of iterations over these 200 experiments. The results are shown in Figure 1. Note that accuracy $10^{-k}$ in Figure 1 means we ran the BA algorithm until $\max_x\{D(\rho_x\|\rho^t)\} - f(p^t, p^t)$ was less than $10^{-k}$, so the error between the true capacity and the computed value was at most of order $10^{-k}$.
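The paper does not specify how the random channels were drawn; a standard choice (and an assumption on our part) is to take each $\rho_x$ as a normalized Wishart matrix built from a complex Ginibre matrix, which yields full-rank density matrices almost surely:

```python
import numpy as np

def random_density_matrix(m, rng):
    """rho = G G^dag / Tr(G G^dag) with G a complex Ginibre matrix.
    This is one standard sampling scheme, not necessarily the paper's."""
    G = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    W = G @ G.conj().T
    return W / np.trace(W).real

def random_cq_channel(n, m, seed=0):
    """A random classical-quantum channel: n density matrices of dimension m."""
    rng = np.random.default_rng(seed)
    return [random_density_matrix(m, rng) for _ in range(n)]

channel = random_cq_channel(n=6, m=2)
rho = channel[0]
print(np.allclose(rho, rho.conj().T), round(np.trace(rho).real, 6))  # -> True 1.0
```

Any sampling scheme that produces valid density matrices (Hermitian, positive semi-definite, unit trace) can be substituted here.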
Figure 1 shows that the iteration complexity scales well as the accuracy requirement and the input size increase. For a given input size $n$ and accuracy, the output dimension has very little influence on the iteration count, so the iteration complexity also scales well as the output dimension $m$ increases. Compared with the theoretical bound of Theorem 4 (to reach $\varepsilon$ accuracy, at most $\frac{\log n}{\varepsilon}$ iterations are needed), the experiments show that far fewer than $\frac{\log n}{\varepsilon}$ iterations suffice, whether or not the output quantum states are linearly independent (the cases $(n, m) = (6, 2), (10, 2)$). The reason is that the inequalities in the proof of Theorem 4 are quite loose, so Theorem 4 only provides a very loose upper bound on the iteration complexity. We conjecture that the relation in Equation (20) may hold generally, although we cannot yet prove it.
Next, we examined the running time of the BA algorithm. We compared three methods for computing the classical-quantum channel capacity: the BA algorithm, the duality-and-smoothing technique [6], and the method of Fawzi et al. [17]. In [17], a Matlab package called CvxQuad is provided, which accurately approximates the relative entropy function via semidefinite programming so that many quantum capacity quantities can be computed with the convex optimization tool CVX. For different input sizes $n$ and output dimensions $m$, we generated a classical-quantum channel randomly, computed its capacity with each of the three methods, and recorded the running times. The results are shown in Figure 2.
Figure 2 shows that the BA algorithm was the fastest method. The duality-and-smoothing method was rather slow; we did not record its running time for $n = 30$ because it took too long. The running time of the CvxQuad method was extremely sensitive to the output dimension, which is not a surprise because CVX is a second-order method. Thus, the BA algorithm was significantly faster than the other two methods as $n$ and $m$ became large.

5. An Approximate Solution of p in Binary Two Dimensional Case

In this section, we provide an approximate optimal input distribution for the case in which the input size and output dimension are both 2:
$$\{p_1 : \rho_1; \; p_2 : \rho_2\}, \qquad p_1 + p_2 = 1, \qquad \rho_1, \rho_2 \in \mathcal{D}_2.$$

5.1. Use Bloch Sphere to Get an Approximate Solution

Any two-dimensional density matrix can be represented as a point in the Bloch sphere [5], i.e., as a vector starting from the origin. Suppose $\rho_1, \rho_2$ are represented by $\mathbf{r}_1, \mathbf{r}_2$, respectively, as shown in Figure 3; then, their eigenvalue pairs are $\frac{1}{2} \pm \frac{r_1}{2}$ and $\frac{1}{2} \pm \frac{r_2}{2}$, respectively, where $r_i = \|\mathbf{r}_i\|$. Extending $\mathbf{r}_1$, we get two intersections with the surface of the Bloch sphere; these two intersections represent the two eigenvectors of $\rho_1$ (points on the surface of the sphere represent pure states, and interior points represent mixed states). A probabilistic combination of $\rho_1, \rho_2$ is represented by $p_1\rho_1 + p_2\rho_2 \mapsto p_1\mathbf{r}_1 + p_2\mathbf{r}_2$ ([5], Exercise 4.4.13). Any point on the surface of the Bloch sphere can be written as
$$\cos\frac{\alpha}{2}|0\rangle + \sin\frac{\alpha}{2}e^{i\phi}|1\rangle,$$
where $\alpha$ is the angle to the $Z$-axis and $\phi$ is the angle from the $X$-axis to the projection of the point onto the $XY$ plane.
By symmetry, the Holevo quantity depends only on $r_1, r_2, \theta, p_1$, where $\theta$ is the angle between $\mathbf{r}_1$ and $\mathbf{r}_2$. One interesting observation is that the angle $\theta$ has very little influence on $p^*$, the optimal distribution maximizing the Holevo quantity. If we know $\lambda_1, \lambda_2$ (the larger eigenvalues of $\rho_1, \rho_2$, respectively), $\theta$, and $p_1$, then the Holevo quantity can be written as
$$\chi(\lambda_1, \lambda_2, \theta, p_1) = S\Big(\frac{1}{2} + \frac{1}{2}\big\|p_1\mathbf{r}_1 + (1-p_1)\mathbf{r}_2\big\|\Big) - \Big[p_1 S\Big(\frac{1}{2} + \frac{1}{2}r_1\Big) + (1-p_1) S\Big(\frac{1}{2} + \frac{1}{2}r_2\Big)\Big], \tag{21}$$
where $S(\cdot)$ is the binary entropy, $S(x) = -(x\log x + (1-x)\log(1-x))$, and $r_i = 2\lambda_i - 1$.
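Equation (21) is a closed scalar formula once the length of the mixed Bloch vector is expanded with the law of cosines, so it can be evaluated directly. A minimal sketch (our own function names, base-2 logs):

```python
import numpy as np

def binary_entropy(x):
    """S(x) = -(x log x + (1-x) log(1-x)), in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -(x * np.log2(x) + (1 - x) * np.log2(1 - x))

def chi_binary(lam1, lam2, theta, p1):
    """Holevo quantity of Eq. (21): Bloch vectors of lengths r_i = 2*lam_i - 1
    separated by angle theta, mixed with weights (p1, 1-p1)."""
    r1, r2 = 2 * lam1 - 1, 2 * lam2 - 1
    # law of cosines for || p1*r1 + (1-p1)*r2 ||
    r_avg = np.sqrt((p1 * r1) ** 2 + ((1 - p1) * r2) ** 2
                    + 2 * p1 * (1 - p1) * r1 * r2 * np.cos(theta))
    return (binary_entropy(0.5 + r_avg / 2)
            - p1 * binary_entropy(lam1) - (1 - p1) * binary_entropy(lam2))

# antipodal pure states (lam = 1, theta = pi) at the uniform input: chi = 1 bit
print(round(chi_binary(1.0, 1.0, np.pi, 0.5), 6))  # -> 1.0
```

Note that $S(\frac12 + \frac12 r_i) = S(\lambda_i)$, which is why the penalty terms can be written with $\lambda_i$ directly.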
Using the law of cosines to evaluate $\|p_1\mathbf{r}_1 + (1-p_1)\mathbf{r}_2\|$, the gradient of $\chi$ with respect to $p_1$ can be calculated directly; denote it by
$$\nabla_{p_1}\chi(\lambda_1, \lambda_2, \theta, p_1).$$
If we can find a $p_1$ such that $\nabla_{p_1}\chi(\lambda_1,\lambda_2,\theta,p_1) = 0$, then this $p_1$ is the optimal solution (because $\chi(\lambda_1,\lambda_2,\theta,p_1)$ is concave in $p_1$). However, we cannot solve the equation $\nabla_{p_1}\chi(\lambda_1,\lambda_2,\theta,p_1) = 0$ for $p_1$ in closed form when $\theta \ne 0$. Since $\theta$ has little influence on $p^*$, we set $\theta = 0$ (which is actually the classical case) and solve
$$\nabla_{p_1}\chi(\lambda_1, \lambda_2, \theta = 0, p_1) = 0.$$
This equation is easy to solve, and we get a solution $\hat p_1$:
$$\hat p_1 = \frac{\frac{1-c}{1+c} - r_2}{r_1 - r_2}, \qquad \text{where } c = 2^{\frac{S(\lambda_1)-S(\lambda_2)}{(r_1-r_2)/2}}, \tag{22}$$
where we assume $r_1 \ne r_2$. (It is easy to see from the Bloch sphere that, if $r_1 = r_2$, the optimal distribution is $\{\frac{1}{2}, \frac{1}{2}\}$.)
This $\hat p_1$ can be used as an approximate optimal solution for all $\theta \in [0, \pi]$. Next, we use numerical experiments to see how accurate $\hat p_1$ is.
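Equation (22) can be sanity-checked against a classical channel with a known optimum. For $\lambda_1 = 1, \lambda_2 = 0.5$ and $\theta = 0$, the channel is a classical Z-type channel whose capacity-achieving input weight on the first letter is $0.6$, which the formula reproduces (a worked check of ours, not from the paper):

```python
import numpy as np

def binary_entropy(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else -(x * np.log2(x) + (1 - x) * np.log2(1 - x))

def p1_hat(lam1, lam2):
    """Approximate optimizer of Eq. (22), derived at theta = 0; assumes r1 != r2."""
    r1, r2 = 2 * lam1 - 1, 2 * lam2 - 1
    c = 2.0 ** ((binary_entropy(lam1) - binary_entropy(lam2)) / ((r1 - r2) / 2))
    return ((1 - c) / (1 + c) - r2) / (r1 - r2)

print(round(p1_hat(1.0, 0.5), 6))  # -> 0.6
```

Here $c = 2^{(0 - 1)/0.5} = 1/4$, so $\frac{1-c}{1+c} = 0.6$ and, with $r_2 = 0$, $\hat p_1 = 0.6$, matching the classical optimum.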

5.2. Numerical Experiments on the Approximated Solution p ^ 1

It is obvious that the maximum of the Holevo quantity depends only on $r_1, r_2$, and $\theta$; thus, without loss of generality, we let $\rho_1$ lie on the $Z$-axis and $\rho_2$ in the $XZ$ plane:
$$\rho_1 = \lambda_1|0\rangle\langle 0| + (1-\lambda_1)|1\rangle\langle 1|, \qquad \rho_2 = \lambda_2|\psi_0\rangle\langle\psi_0| + (1-\lambda_2)|\psi_1\rangle\langle\psi_1|,$$
where
$$|\psi_0\rangle = \cos\frac{\theta}{2}|0\rangle + \sin\frac{\theta}{2}|1\rangle, \qquad |\psi_1\rangle = -\sin\frac{\theta}{2}|0\rangle + \cos\frac{\theta}{2}|1\rangle,$$
which means the angle between $\mathbf{r}_1$ and $\mathbf{r}_2$ is $\theta$ [5].
In the numerical experiments, we let $\lambda_1, \lambda_2$ range over $[0.5, 1]$ and $\theta$ over $[0, \pi]$. For each value of $(\lambda_1, \lambda_2, \theta)$, we substituted $(\lambda_1, \lambda_2)$ into Equation (22) to compute $\hat p_1$, and then substituted $(\lambda_1, \lambda_2, \theta, \hat p_1)$ into Equation (21) to get the approximate maximum of the Holevo quantity over $p_1$: $\chi(\lambda_1, \lambda_2, \theta, \hat p_1)$. To assess the accuracy of this approximate maximum, we used the BA algorithm to provide an accurate reference value, iterating until $\max_x\{D(\rho_x\|\rho^t)\} - f(p^t, p^t)$ was less than $10^{-6}$; the BA algorithm then outputs a value of the Holevo quantity $\chi_{BA}(\lambda_1, \lambda_2, \theta)$. We computed the error of $\chi(\lambda_1, \lambda_2, \theta, \hat p_1)$ and took the maximum over $\theta \in [0, \pi]$:
$$\mathrm{Error}(\lambda_1, \lambda_2) = \max_{\theta\in[0,\pi]} \big|\chi(\lambda_1, \lambda_2, \theta, \hat p_1) - \chi_{BA}(\lambda_1, \lambda_2, \theta)\big|.$$
Figure 4 is the numerical result, which is a plot of ( λ 1 , λ 2 , Error ( λ 1 , λ 2 ) ) .
In Figure 4, we can see that, if $\lambda_1, \lambda_2$ are not too big, the error is upper bounded by $10^{-3}$. To see this more directly, we take the maximum of $\mathrm{Error}(\lambda_1, \lambda_2)$ over different ranges of $\lambda_1, \lambda_2$:
$$\max_{\lambda_1, \lambda_2 \in [0.5, R]} \mathrm{Error}(\lambda_1, \lambda_2).$$
Figure 5 is a plot of $\big(R, \max_{\lambda_1,\lambda_2\in[0.5,R]}\mathrm{Error}(\lambda_1,\lambda_2)\big)$.
In Figure 5, we can see that, if $\lambda_1, \lambda_2 < 0.95$, the error of the approximate maximum of the Holevo quantity is upper bounded by $3\times 10^{-4}$. Thus, we can conclude that, when the larger eigenvalues of $\rho_1, \rho_2$ are not too big (no bigger than $0.95$), Equation (22) keeps the error of the maximum of the Holevo quantity below $3\times 10^{-4}$.
The accuracy of this approximate solution is an interesting phenomenon; why the angle $\theta$ has so little influence on the maximum of the Holevo quantity remains unclear.

6. Discussion

In this paper, we provided an algorithm that computes the capacity of a classical-quantum channel. We analyzed its speed of convergence theoretically, and numerical experiments showed that our algorithm outperforms the existing methods [6,17]. We also provided an approximate method for computing the capacity of a binary, two-dimensional classical-quantum channel, which shows high accuracy. As mentioned in the Introduction, for a general quantum-quantum channel, maximizing the Holevo quantity with respect to both the input distribution and the output quantum states is NP-complete, and it is not a convex optimization problem, because the Holevo quantity is concave with respect to the input distribution but convex with respect to the output quantum states. Thus, it remains open whether there exists an efficient algorithm for this problem. However, for a classical-quantum channel, the Holevo quantity has an upper bound; thus, one direction for future work would be to maximize the Holevo quantity with respect to the output states for a given input distribution. It also remains open whether alternating optimization algorithms, in particular of Blahut–Arimoto type, can be given for other optimization problems in terms of quantum entropy.

Author Contributions

Formal analysis, H.L., N.C.; Investigation, H.L.; Project administration, H.L.; Supervision, N.C.; Writing—original draft, H.L.; Writing—review & editing, N.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Arimoto, S. An algorithm for computing the capacity of arbitrary discrete memoryless channels. IEEE Trans. Inf. Theory 1972, 18, 14–20. [Google Scholar] [CrossRef] [Green Version]
  2. Blahut, R. Computation of channel capacity and rate-distortion functions. IEEE Trans. Inf. Theory 1972, 18, 460–473. [Google Scholar] [CrossRef] [Green Version]
  3. Holevo, A. Problems in the mathematical theory of quantum communication channels. Rep. Math. Phys. 1977, 12, 273–278. [Google Scholar] [CrossRef]
  4. Holevo, A.S. The capacity of the quantum channel with general signal states. IEEE Trans. Inf. Theory 1998, 44, 269–273. [Google Scholar] [CrossRef] [Green Version]
  5. Wilde, M. From Classical to Quantum Shannon Theory; Cambridge University Press: Cambridge, UK, 2016. [Google Scholar]
  6. Sutter, D.; Sutter, T.; Esfahani, P.M.; Renner, R. Efficient Approximation of Quantum Channel Capacities. IEEE Trans. Inf. Theory 2016, 62, 578–598. [Google Scholar] [CrossRef] [Green Version]
  7. Nesterov, Y. Smooth minimization of non-smooth functions. Math. Program. 2005, 103, 127–152. [Google Scholar] [CrossRef]
  8. Yeung, R.W. Information Theory and Network Coding; Springer: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
  9. Nagaoka, H. Algorithms of Arimoto-Blahut type for computing quantum channel capacity. In Proceedings of the IEEE International Symposium on Information Theory (ISIT), Cambridge, MA, USA, 16–21 August 1998; p. 354. [Google Scholar]
  10. Beigi, S.; Shor, P.W. On the Complexity of Computing Zero-Error and Holevo Capacity of Quantum Channels. arXiv 2007, arXiv:0709.2090. [Google Scholar]
  11. Osawa, S.; Nagaoka, H. Numerical Experiments on the Capacity of Quantum Channel with Entangled Input States. IEICE TRANSACTIONS Fundam. Electron. Commun. Comput. 2001, E84-A, 2583–2590. [Google Scholar]
  12. Hayashi, M.; Imai, H.; Matsumoto, K.; Ruskai, M.B.; Shimono, T. Qubit Channels Which Require Four Inputs to Achieve Capacity: Implications for Additivity Conjectures. Quantum Inf. Comput. 2005, 5, 13–31. [Google Scholar]
  13. Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  14. Bengtsson, I.; Zyczkowski, K. GEOMETRY OF QUANTUM STATES: An Introduction to Quantum Entanglement; Cambridge University Press: Cambridge, UK, 2006. [Google Scholar]
  15. Hiai, F.; Petz, D. Introduction to Matrix Analysis and Applications; Springer: Berlin/Heidelberg, Germany, 2014. [Google Scholar]
  16. Moler, C.; Van Loan, C. Nineteen Dubious Ways to Compute the Exponential of a Matrix, Twenty-Five Years Later. SIAM Rev. 2003, 45, 3–49. [Google Scholar] [CrossRef]
  17. Fawzi, H.; Fawzi, O. Efficient optimization of the quantum relative entropy. J. Phys. A Math. Theor. 2018, 51, 409601. [Google Scholar] [CrossRef] [Green Version]
Figure 1. The number of iterations needed to reach certain accuracy.
Figure 2. The comparison of the running time of three methods.
Figure 3. Bloch sphere.
Figure 4. Error of the approximated method.
Figure 5. Error of the approximate method.

Li, H.; Cai, N. Computing Classical-Quantum Channel Capacity Using Blahut–Arimoto Type Algorithm: A Theoretical and Numerical Analysis. Entropy 2020, 22, 222. https://0-doi-org.brum.beds.ac.uk/10.3390/e22020222
