Article

The Optimal Fix-Free Code for Anti-Uniform Sources

Ali Zaghian, Adel Aghajan and T. Aaron Gulliver
1 Department of Mathematics and Cryptography, Malek Ashtar University of Technology, Isfahan 83145/115, Iran
2 Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan 84156-83111, Iran
3 Department of Electrical and Computer Engineering, University of Victoria, PO Box 1700, STN CSC, Victoria, BC V8W 2Y2, Canada
* Author to whom correspondence should be addressed.
Entropy 2015, 17(3), 1379-1386; https://0-doi-org.brum.beds.ac.uk/10.3390/e17031379
Submission received: 9 February 2015 / Accepted: 13 March 2015 / Published: 19 March 2015
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

An n-symbol source which has a Huffman code with codelength vector $L_n = (1, 2, 3, \ldots, n-2, n-1, n-1)$ is called an anti-uniform source. In this paper, it is shown that for this class of sources, the optimal fix-free code and the optimal symmetric fix-free code are both $C_n^* = (0, 11, 101, 1001, \ldots, 1\underbrace{0 \cdots 0}_{n-2}1)$.

1. Introduction

One of the basic problems in the context of source coding is to assign a code $C_n = (c_1, c_2, \ldots, c_n)$ with codelength vector $L_n = (\ell_1, \ell_2, \ldots, \ell_n)$ to a memoryless source with probability vector $P_n = (p_1, p_2, \ldots, p_n)$. Decoding requirements often constrain us to choose the code $C_n$ from a specific class of codes, such as prefix-free codes, fix-free codes or symmetric fix-free codes. With a prefix-free code, no codeword is the prefix of another codeword. This property ensures that decoding in the forward direction can be done without any delay (instantaneously). Alternatively, with a fix-free code no codeword is the prefix or suffix of any other codeword [1,2]. Therefore, a fix-free code can be decoded in both the forward and backward directions without any delay. This bidirectional decodability makes fix-free codes more robust to transmission errors and allows faster decoding than prefix-free codes. As a result, fix-free codes are used in video standards such as H.263+ and MPEG-4 [3]. A symmetric fix-free code is a fix-free code whose codewords are symmetric (palindromes). In general, decoder implementation for a fix-free code requires more memory compared to that for a prefix-free code. Although the decoder for a symmetric fix-free code is the same as for a fix-free code [4], symmetric codes have greater redundancy in general.
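The following Python sketch (ours, not from the paper; the function names are illustrative) expresses these three code classes as predicates on lists of binary codeword strings:

```python
# A minimal sketch (ours, not from the paper) of the three code classes as
# predicates on lists of binary codeword strings.

def is_prefix_free(code):
    """No codeword is a proper prefix of another codeword."""
    return all(not a.startswith(b) for a in code for b in code if a != b)

def is_fix_free(code):
    """No codeword is a prefix or a suffix of another codeword."""
    return all(not (a.startswith(b) or a.endswith(b))
               for a in code for b in code if a != b)

def is_symmetric_fix_free(code):
    """A fix-free code all of whose codewords are palindromes."""
    return is_fix_free(code) and all(c == c[::-1] for c in code)

# Example: the first four codewords of C_n^* form a symmetric fix-free code.
assert is_symmetric_fix_free(["0", "11", "101", "1001"])
```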
Let $S(L_n) = \sum_{i=1}^{n} 2^{-\ell_i}$ denote the Kraft sum of the codelength vector $L_n$. A well-known necessary and sufficient condition for the existence of a prefix-free code with codelength vector $L_n$ is the Kraft inequality, i.e., $S(L_n) \le 1$ [5]. However, this inequality is only a necessary condition for the existence of a fix-free code. Some sufficient conditions for the existence of a fix-free code were introduced in [6–11].
The optimal code for a specific class of codes is defined as the code with the minimum average codelength, i.e., $\sum_{i=1}^{n} p_i \ell_i$, among all codes in that class. The optimal prefix-free code can easily be obtained using the Huffman algorithm [12]. Recently, two methods for finding the optimal fix-free code have been developed. One is based on the A* algorithm [13], while the other is based on the concept of dominant sequences [14]. Compared to the Huffman algorithm, these methods are very complex.
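For concreteness, the Kraft sum, the average codelength and a standard heap-based computation of Huffman codelengths can be sketched as follows (again an illustration of ours, not the authors' implementation):

```python
import heapq

def kraft_sum(lengths):
    """Kraft sum S(L_n) = sum_i 2^(-l_i) of a codelength vector."""
    return sum(2.0 ** -l for l in lengths)

def average_codelength(probs, lengths):
    """Average codelength sum_i p_i * l_i."""
    return sum(p * l for p, l in zip(probs, lengths))

def huffman_codelengths(probs):
    """Codelengths of a binary Huffman code for the given probabilities."""
    heap = [(p, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)   # two least probable groups
        p2, s2 = heapq.heappop(heap)
        for i in s1 + s2:              # every symbol in the merged group
            lengths[i] += 1            # moves one level deeper in the tree
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

# A dyadic source whose Huffman codelength vector is (1, 2, 3, 4, 4).
assert huffman_codelengths([0.5, 0.25, 0.125, 0.0625, 0.0625]) == [1, 2, 3, 4, 4]
assert kraft_sum([1, 2, 3, 4, 4]) == 1.0
```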
A source with n symbols having a Huffman code with codelength vector $L_n = (1, 2, 3, \ldots, n-2, n-1, n-1)$ is called an anti-uniform source [15,16]. Such sources have been shown to correspond to particular probability distributions. For example, it was shown in [17] and [18], respectively, that the normalized tail of the Poisson distribution and the geometric distribution with success probability greater than some critical value are anti-uniform sources. It was demonstrated in [15,16] that a source with probability vector $P_n = (p_1, p_2, \ldots, p_n)$, where $p_1 \ge p_2 \ge \cdots \ge p_n$, is anti-uniform if and only if
$\sum_{j=i+2}^{n} p_j \le p_i \quad \text{for } 1 \le i \le n-3. \qquad (1)$
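Condition (1) is easy to check numerically; the sketch below (ours) assumes the probabilities are given in non-increasing order:

```python
def is_anti_uniform(probs):
    """Condition (1): sum_{j=i+2}^{n} p_j <= p_i for 1 <= i <= n - 3.
    Assumes probs is sorted in non-increasing order (p_1 >= ... >= p_n)."""
    n = len(probs)
    return all(sum(probs[i + 1:]) <= probs[i - 1] for i in range(1, n - 2))

# The dyadic source above is anti-uniform; a uniform source with n >= 4 is not.
assert is_anti_uniform([0.5, 0.25, 0.125, 0.0625, 0.0625])
assert not is_anti_uniform([0.25, 0.25, 0.25, 0.25])
```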
As mentioned above, finding an optimal fix-free or symmetric fix-free code is complex. Thus, in this paper optimal fix-free and symmetric fix-free codes are determined for anti-uniform sources. In particular, it is proven that
$C_n^* = (0, 11, 101, 1001, \ldots, 1\underbrace{0 \cdots 0}_{n-2}1),$
is an optimal fix-free code for this class of sources. Since $C_n^*$ is symmetric, this code is also an optimal symmetric fix-free code. Although for an anti-uniform source the difference between the average codelength of the optimal prefix-free code and that of $C_n^*$ is small (it is exactly equal to $p_n$), it is not straightforward to prove that $C_n^*$ is the optimal fix-free code. In [19], the optimality of $C_n^*$ among symmetric fix-free codes was discussed for a family of exponential probability distributions, which are anti-uniform sources.
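As a small constructive check (not part of the paper), $C_n^*$ can be generated and verified to be a symmetric fix-free code with codelength vector $(1, 2, \ldots, n)$:

```python
def c_star(n):
    """The code C_n^* = (0, 11, 101, 1001, ..., 1 0^(n-2) 1)."""
    return ["0"] + ["1" + "0" * (k - 2) + "1" for k in range(2, n + 1)]

code = c_star(8)
# fix-free: no codeword is a prefix or a suffix of another codeword
assert all(not (a.startswith(b) or a.endswith(b))
           for a in code for b in code if a != b)
# symmetric: every codeword reads the same forwards and backwards
assert all(w == w[::-1] for w in code)
# the codelength vector is (1, 2, ..., n)
assert [len(w) for w in code] == list(range(1, 9))
```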
In [20], another class of fix-free codes, called weakly symmetric fix-free codes, was examined. A fix-free code is weakly symmetric if the reverse of each codeword is also a codeword. In fact, every symmetric fix-free code is a weakly symmetric fix-free code, and every weakly symmetric fix-free code is a fix-free code. Thus, since the optimal code among fix-free codes and symmetric fix-free codes for anti-uniform sources is $C_n^*$, this code is also optimal among weakly symmetric fix-free codes.
The remainder of this paper is organized as follows. In Section 2, a sketch of the proofs of the main theorems, i.e., Theorems 1 and 3, is provided, followed by the main results of the paper. Then detailed proofs of these results are given in Section 3.

2. A Sketch of the Proofs

Since a fix-free code is also a prefix-free code, the Kraft sum of an optimal fix-free code is not greater than 1. Therefore, the Kraft sum of this code is either equal to 1 or smaller than 1.
Proposition 1. If
$(\ell_1^*, \ldots, \ell_{n-1}^*, \ell_n^*) = \arg\min_{L_n : S(L_n) \le 1} \sum_{i=1}^{n} p_i \ell_i,$
then we have
$(\ell_1^*, \ldots, \ell_{n-1}^*, \ell_n^* + 1) = \arg\min_{L_n : S(L_n) < 1} \sum_{i=1}^{n} p_i \ell_i.$
It can be inferred from Proposition 1 that if the Kraft sum of an optimal fix-free code is smaller than 1, then the average codelength of this code is not better than that of the codelength vector $(\ell_1^*, \ldots, \ell_{n-1}^*, \ell_n^* + 1)$. The optimal prefix-free code for an anti-uniform source has codelength vector $(1, 2, \ldots, n-1, n-1)$. Therefore, the optimal codelength vector with Kraft sum smaller than 1 for an anti-uniform source is $(1, 2, \ldots, n-1, n)$. Further, the codelength vector of $C_n^*$ is $(1, 2, \ldots, n-1, n)$. Thus, if the Kraft sum of the optimal fix-free code for an anti-uniform source is smaller than 1, then the code $C_n^*$ is optimal.
Proposition 2. There is no symmetric fix-free code with Kraft sum 1 for n > 2.
According to Proposition 2, the Kraft sum for an optimal symmetric fix-free code is smaller than 1. Thus, Propositions 1 and 2 prove the following theorem.
Theorem 1. The optimal symmetric fix-free code for an anti-uniform source $P_n$ is the code $C_n^*$.
There exist fix-free codes with Kraft sum 1, for example (00, 01, 10, 11) and (01, 000, 100, 110, 111, 0010, 0011, 1010, 1011) [21]. Therefore, proving that the code $C_n^*$ is the optimal fix-free code for anti-uniform sources requires showing that its average codelength is no worse than that of every possible codelength vector of a fix-free code with Kraft sum 1. To achieve this, we use the following theorem, which was proven in [21].
Theorem 2. [21] Let $L_n = (\ell_1, \ldots, \ell_n)$, $M_i(L_n) = |\{j \mid \ell_j = i\}|$ for $1 \le i \le \max_{1 \le j \le n} \ell_j$, and $H_i = 2^i - \frac{1}{i}\sum_{j=1}^{i} 2^{(i,j)}$, where $(i,j)$ denotes the greatest common divisor of i and j. If $S(L_n) = 1$, $M_i(L_n) > H_i$ for some i, and $|\{\ell_1, \ldots, \ell_n\}| > 1$, then no fix-free code exists with codelength vector $L_n$.
According to the definition of Hi, we have that H1 = 0 and H2 = 1. Therefore from Theorem 2, for Ln with Kraft sum 1 and
$M_1(L_n) > 0$ and $L_n \ne (1, 1),$
or
$M_2(L_n) > 1$ and $L_n \ne (2, 2, 2, 2),$
there is no fix-free code.
Definition 1. For a given n, let
$\mathcal{L}_n = \{ L_n \mid S(L_n) = 1,\ M_1(L_n) = 0\ \text{and}\ M_2(L_n) \le 1 \}.$
From Theorem 2, if the Kraft sum of the optimal fix-free code is equal to 1, then the average codelength of this code is not smaller than that of the optimal codelength vector among those in $\mathcal{L}_n$, for $n > 4$. It can easily be verified that $|\mathcal{L}_n| = 0$ for $n < 7$. For anti-uniform sources, the following proposition characterizes the optimal codelength vector in $\mathcal{L}_n$ for $n \ge 7$.
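For small n, the class $\mathcal{L}_n$ can be enumerated by brute force (a sketch of ours; it lists only the non-decreasing representative of each vector), confirming that $|\mathcal{L}_n| = 0$ for $n < 7$ and that $\mathcal{L}_7$ reduces to permutations of $(2, 3, 3, 3, 3, 3, 3)$:

```python
from itertools import combinations_with_replacement
from fractions import Fraction

def script_L(n, max_len=None):
    """Enumerate L_n up to permutation: non-decreasing codelength vectors with
    Kraft sum exactly 1, no length-1 codeword (so M_1 = 0) and at most one
    length-2 codeword (so M_2 <= 1)."""
    max_len = max_len or n
    found = []
    for L in combinations_with_replacement(range(2, max_len + 1), n):
        if L.count(2) <= 1 and sum(Fraction(1, 2 ** l) for l in L) == 1:
            found.append(L)
    return found

# |L_n| = 0 for n < 7, and L_7 reduces to permutations of (2, 3, 3, 3, 3, 3, 3).
assert all(len(script_L(n)) == 0 for n in range(2, 7))
assert script_L(7) == [(2, 3, 3, 3, 3, 3, 3)]
```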
Proposition 3. Let $P_n = (p_1, \ldots, p_{n-1}, p_n)$ be the probability vector of an anti-uniform source with $p_1 \ge \cdots \ge p_{n-1} \ge p_n$. Then we have
$\arg\min_{L_n \in \mathcal{L}_n} \sum_{i=1}^{n} p_i \ell_i = \begin{cases} (2, 3, 3, 3, 3, 3, 3), & \text{if } n = 7 \\ (2, 3, 3, 3, 3, 3, 4, 5, 6, \ldots, n-6, n-5, n-4, n-4), & \text{if } n > 7. \end{cases}$
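As a quick consistency check (ours, not part of the paper), the candidate vector of Proposition 3 for $n > 7$ can be generated and verified to lie in $\mathcal{L}_n$:

```python
from fractions import Fraction

def prop3_vector(n):
    """The candidate vector (2, 3, 3, 3, 3, 3, 4, 5, 6, ..., n-4, n-4) for n > 7."""
    return [2, 3, 3, 3, 3, 3] + list(range(4, n - 3)) + [n - 4]

# Each candidate lies in L_n: Kraft sum exactly 1, no length-1 codeword,
# at most one length-2 codeword.
for n in range(8, 16):
    L = prop3_vector(n)
    assert len(L) == n
    assert sum(Fraction(1, 2 ** l) for l in L) == 1
    assert 1 not in L and L.count(2) <= 1
```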
The last step requires showing that the average codelength of $C_n^*$ is better than that of the codelength vector given in Proposition 3. This is done in the proof of the following theorem.
Theorem 3. The optimal fix-free code for an anti-uniform source $P_n$ is $C_n^*$ for $n > 4$.
Note that Theorem 3 is not true for $n = 4$. For example, for $P_4 = \left(\frac{1}{3}, \frac{1}{3}, \frac{1}{6}, \frac{1}{6}\right)$, which is the probability vector of an anti-uniform source, the average codelength of the fix-free code (00, 01, 10, 11) is better than that of $C_4^*$.
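To make this comparison explicit: the fix-free code (00, 01, 10, 11) has average codelength 2, whereas $C_4^*$ has codelength vector $(1, 2, 3, 4)$ and average codelength $\frac{1}{3}(1) + \frac{1}{3}(2) + \frac{1}{6}(3) + \frac{1}{6}(4) = \frac{13}{6} \approx 2.17$.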

3. Proofs of the Results in Section 2

Proof of Proposition 1: Let $L_n = (\ell_1, \ldots, \ell_{n-1}, \ell_n)$ with $\ell_1 \le \cdots \le \ell_{n-1} \le \ell_n$ and $S(L_n) < 1$. Thus, we have $2^{\ell_n} > 2^{\ell_n} S(L_n) = \sum_{i=1}^{n} 2^{\ell_n - \ell_i}$, and consequently (since both sides are integers)
$2^{\ell_n} \ge 1 + \sum_{i=1}^{n} 2^{\ell_n - \ell_i}.$
Therefore, we can write
$1 \ge \sum_{i=1}^{n} 2^{-\ell_i} + 2^{-\ell_n} = \sum_{i=1}^{n-1} 2^{-\ell_i} + 2^{-(\ell_n - 1)}. \qquad (2)$
Let $L_n' = (\ell_1', \ldots, \ell_{n-1}', \ell_n')$ be such that $\ell_i' = \ell_i$ for $1 \le i \le n-1$, and $\ell_n' = \ell_n - 1$. From (2), we have that $S(L_n') \le 1$. According to the definition of $(\ell_1^*, \ldots, \ell_{n-1}^*, \ell_n^*)$, we can write
$\sum_{i=1}^{n} p_i \ell_i^* \le \sum_{i=1}^{n} p_i \ell_i',$
and consequently
$\sum_{i=1}^{n-1} p_i \ell_i^* + (\ell_n^* + 1) p_n = \sum_{i=1}^{n} p_i \ell_i^* + p_n \le \sum_{i=1}^{n} p_i \ell_i' + p_n = \sum_{i=1}^{n} p_i \ell_i.$
This shows that the average codelength of $(\ell_1^*, \ldots, \ell_{n-1}^*, \ell_n^* + 1)$ is not worse than that of any codelength vector $L_n$ with Kraft sum smaller than 1. □
Proof of Proposition 2: Suppose that the Kraft sum of $L_n$, the codelength vector of the code $C_n$, is equal to 1. Let the codeword $c = x_1 x_2 \cdots x_{\ell-1} 0$ (resp. $c = x_1 x_2 \cdots x_{\ell-1} 1$), of length $\ell$ ($\ell > 1$), be a longest codeword of $C_n$. Since the Kraft sum of $L_n$ is equal to 1, the codeword $c' = x_1 x_2 \cdots x_{\ell-1} 1$ (resp. $c' = x_1 x_2 \cdots x_{\ell-1} 0$) must also belong to $C_n$ (otherwise, since c is a longest codeword, the Kraft sum would be smaller than 1). However, c and c′ cannot both be symmetric, because this would require both $x_1 = 0$ and $x_1 = 1$. Thus, $C_n$ is not a symmetric fix-free code. □
The following lemma will be used in the proof of Proposition 3.
Lemma 1. For $n \ge 7$, let $P_n = (p_1, \ldots, p_{n-1}, p_n)$ with $p_1 \ge \cdots \ge p_{n-1} \ge p_n$ and
$P_{n-1}' = (p_1', \ldots, p_{n-2}', p_{n-1}') = (p_1, \ldots, p_{n-2}, p_{n-1} + p_n). \qquad (3)$
Further, suppose that
$L_n^* = (\ell_1^*, \ldots, \ell_{n-1}^*, \ell_n^*) = \arg\min_{L_n \in \mathcal{L}_n} \sum_{i=1}^{n} p_i \ell_i,$
and for n > 7
$L_{n-1}' = (\ell_1', \ldots, \ell_{n-2}', \ell_{n-1}') = \arg\min_{L_{n-1} \in \mathcal{L}_{n-1}} \sum_{i=1}^{n-1} p_i' \ell_i. \qquad (4)$
Then we have
$(\ell_1^*, \ldots, \ell_{n-2}^*, \ell_{n-1}^*, \ell_n^*) = \begin{cases} (2, 3, 3, 3, 3, 3, 3), & \text{if } n = 7 \\ (\ell_1', \ldots, \ell_{n-2}', \ell_{n-1}' + 1, \ell_{n-1}' + 1), & \text{if } n > 7. \end{cases} \qquad (5)$
Proof. It can easily be verified that $\mathcal{L}_7$ consists of all permutations of $(2, 3, 3, 3, 3, 3, 3)$. Thus, $p_1 \ge \cdots \ge p_6 \ge p_7$ completes the proof for $n = 7$. To prove the lemma for $n > 7$, we consider two cases: (1) $\ell_n^* = 3$, and (2) $\ell_n^* > 3$. First, note that $\ell_n^* = \max_{1 \le i \le n} \ell_i^*$, because $p_1 \ge \cdots \ge p_{n-1} \ge p_n$.
  • $\ell_n^* = 3$: It can easily be shown that $(3, 3, 3, 3, 3, 3, 3, 3)$ is the only codelength vector in $\bigcup_{n > 7} \mathcal{L}_n$ whose maximum codelength is not greater than 3. Therefore, to prove the lemma in this case it is enough to show that $(\ell_1', \ldots, \ell_7') = (3, 3, 3, 3, 3, 3, 2)$. According to the first argument in this proof, the codelength vector $(\ell_1', \ldots, \ell_7')$ is a permutation of $(2, 3, 3, 3, 3, 3, 3)$. Thus, proving that $p_7 + p_8$ is maximal over all probabilities in $P_7'$, i.e., $p_7 + p_8 \ge p_1$, completes the proof for this case. If $p_7 + p_8 < p_1$, then the average codelength of $(\hat{\ell}_1, \ldots, \hat{\ell}_8) = (2, 3, 3, 3, 3, 3, 4, 4)$ is smaller than that of $(\ell_1^*, \ldots, \ell_8^*) = (3, 3, 3, 3, 3, 3, 3, 3)$, because
    $\sum_{i=1}^{8} p_i \ell_i^* = 3 p_1 + 3 (1 - p_1) \overset{(a)}{>} 2 p_1 + 3 (1 - p_1) + p_7 + p_8 = \sum_{i=1}^{8} p_i \hat{\ell}_i,$
    where (a) follows from $p_7 + p_8 < p_1$. Thus, the codelength vector $(3, 3, 3, 3, 3, 3, 3, 3)$ is not optimal, which is a contradiction. Therefore, $p_7 + p_8 \ge p_1$ and the proof for this case is complete.
  • $\ell_n^* > 3$: The proof for this case is similar to the proof of the Huffman algorithm. Let $L_n = (\ell_1, \ldots, \ell_n) \in \mathcal{L}_n$, i.e., $S(L_n) = 1$, $M_1(L_n) = 0$ and $M_2(L_n) \le 1$, with $\ell_1 \le \cdots \le \ell_n$ and $3 < \ell_n$, and let $\hat{L}_{n-1} = (\hat{\ell}_1, \ldots, \hat{\ell}_{n-1}) = (\ell_1, \ldots, \ell_{n-2}, \ell_{n-1} - 1)$. It can easily be shown that $S(L_n) = 1$ implies that $\ell_n = \ell_{n-1}$. Since $\ell_n = \ell_{n-1}$, we have $S(\hat{L}_{n-1}) = S(L_n)$, and consequently $S(\hat{L}_{n-1}) = 1$. Further, since $\ell_n = \max_{1 \le i \le n} \ell_i$ and $\ell_n > 3$, $M_1(L_n) = 0$ and $M_2(L_n) \le 1$ imply that $M_1(\hat{L}_{n-1}) = 0$ and $M_2(\hat{L}_{n-1}) \le 1$, which gives $\hat{L}_{n-1} \in \mathcal{L}_{n-1}$, and we can write
    $\begin{aligned} \sum_{i=1}^{n} p_i \ell_i^* &= \min_{L_n \in \mathcal{L}_n} \sum_{i=1}^{n} p_i \ell_i \\ &\overset{(a)}{=} \Big[ \min_{L_n \in \mathcal{L}_n} \sum_{i=1}^{n-2} p_i \ell_i + (p_{n-1} + p_n)(\ell_{n-1} - 1) \Big] + p_{n-1} + p_n \\ &\overset{(b)}{\ge} \Big[ \min_{L_{n-1} \in \mathcal{L}_{n-1}} \sum_{i=1}^{n-1} p_i' \ell_i \Big] + p_{n-1} + p_n \\ &\overset{(c)}{=} \sum_{i=1}^{n-1} p_i' \ell_i' + p_{n-1} + p_n, \end{aligned}$
    where (a) follows from $\ell_n^* = \ell_{n-1}^*$, (b) follows from the definition of $\hat{L}_{n-1}$ and (3), and (c) follows from (4). Therefore, for $n > 7$, since the average codelength of the codelength vector given in (5) is equal to $\sum_{i=1}^{n-1} p_i' \ell_i' + p_{n-1} + p_n$, this codelength vector is optimal and the proof is complete. □
Proof of Proposition 3. The proposition is proved by induction on n. According to Lemma 1, the base of the induction, i.e., $n = 7$, is true. Assume that the proposition is true for all anti-uniform sources with $n - 1$ symbols. Let $P_n = (p_1, \ldots, p_n)$ be the probability vector of an anti-uniform source. Also, suppose that $P_{n-1}' = (p_1', \ldots, p_{n-2}', p_{n-1}') = (p_1, \ldots, p_{n-2}, p_{n-1} + p_n)$. From (1), it is obvious that $P_{n-1}'$ is the probability vector of an anti-uniform source and $p_1' \ge p_2' \ge \cdots \ge p_{n-3}' \ge p_{n-2}', p_{n-1}'$. Since we have $\ell_{n-2}' = \ell_{n-1}'$, where $(\ell_1', \ldots, \ell_{n-2}', \ell_{n-1}') = \arg\min_{L_{n-1} \in \mathcal{L}_{n-1}} \sum_{i=1}^{n-1} p_i' \ell_i$, from the induction assumption we can write
$(\ell_1', \ldots, \ell_{n-2}', \ell_{n-1}') = \begin{cases} (2, 3, 3, 3, 3, 3, 3), & \text{if } n - 1 = 7 \\ (2, 3, 3, 3, 3, 3, 4, 5, \ldots, n-5, n-5), & \text{if } n - 1 > 7. \end{cases}$
Therefore, we have $\ell_{n-1}' = n - 5$, and Lemma 1 completes the proof. □
Proof of Theorem 3. We have that $|\mathcal{L}_5| = |\mathcal{L}_6| = 0$. Therefore, for $n = 5, 6$ the proof is complete. The proof for $n = 7$ is the same as the proof for $n > 7$, and so is omitted. Now suppose that $n > 7$. In the following, it is proven that the average codelength of the codelength vector $(\ell_1, \ldots, \ell_n) = (2, 3, 3, 3, 3, 3, 4, 5, 6, \ldots, n-4, n-4)$ is greater than or equal to that of $C_n^*$, i.e., $\sum_{i=1}^{n} i\, p_i$, which completes the proof.
$\begin{aligned} \sum_{i=1}^{n} p_i \ell_i &= 2 p_1 + 3 (p_2 + p_3 + p_4 + p_5 + p_6) + \sum_{i=7}^{n-1} (i-3) p_i + (n-4) p_n \\ &= \sum_{i=1}^{n} i\, p_i + p_1 + p_2 - p_4 - 2 p_5 - 3 p_6 - 3 \sum_{i=7}^{n} p_i - p_n \\ &= \sum_{i=1}^{n} i\, p_i + (p_1 - p_3 - p_4 - p_n) + \Big( p_2 - \sum_{i=4}^{n} p_i \Big) + \Big( p_3 - \sum_{i=5}^{n} p_i \Big) + \Big( p_4 - \sum_{i=6}^{n} p_i \Big) \\ &\overset{(a)}{\ge} \sum_{i=1}^{n} i\, p_i, \end{aligned}$
where (a) follows from the fact that $P_n$ is the probability vector of an anti-uniform source, i.e., $p_i \ge \sum_{j=i+2}^{n} p_j$ for $n - 3 \ge i \ge 1$. □
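As a numerical sanity check of this inequality (ours, using an illustrative dyadic anti-uniform source), one can compare the two average codelengths directly:

```python
def avg(probs, lengths):
    """Average codelength sum_i p_i * l_i."""
    return sum(p * l for p, l in zip(probs, lengths))

n = 10
# Dyadic anti-uniform source: p_i = 2^(-i) for i < n and p_n = 2^(-(n-1)).
probs = [2.0 ** -i for i in range(1, n)] + [2.0 ** -(n - 1)]
cand = [2, 3, 3, 3, 3, 3] + list(range(4, n - 3)) + [n - 4]  # vector from Proposition 3
cstar = list(range(1, n + 1))                                # codelength vector of C_n^*
assert abs(sum(probs) - 1.0) < 1e-12
assert avg(probs, cand) >= avg(probs, cstar)
```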

Author Contributions

All authors were involved in developing the problem, interpreting the results, and writing the paper. Ali Zaghian and Adel Aghajan conducted the analysis. All authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Schützenberger, M.P. On an application of semigroup methods to some problems in coding. IRE Trans. Inf. Theory 1956, 2, 47–60.
  2. Gilbert, E.N.; Moore, E.F. Variable-length binary encodings. Bell Syst. Tech. J. 1959, 38, 933–968.
  3. Webb, J.L.H. Efficient table access for reversible variable-length decoding. IEEE Trans. Circuits Syst. Video Technol. 2001, 11, 981–985.
  4. Takishima, Y.; Wada, M.; Murakami, H. Reversible variable length codes. IEEE Trans. Commun. 1995, 43, 158–162.
  5. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley: New York, NY, USA, 2006.
  6. Yekhanin, S. Improved upper bound for the redundancy of fix-free codes. IEEE Trans. Inf. Theory 2004, 50, 2815–2818.
  7. Ahlswede, R.; Balkenhol, B.; Khachatrian, L. Some properties of fix-free codes. In Proceedings of the International Seminar on Coding Theory and Combinatorics, Thahkadzor, Armenia, 6–11 October 1996; pp. 20–33.
  8. Harada, K.; Kobayashi, K. A note on the fix-free code property. IEICE Trans. Fund. 1999, E82-A, 2121–2128.
  9. Yekhanin, S. Sufficient conditions of existence of fix-free codes. In Proceedings of the IEEE International Symposium on Information Theory, Washington, DC, USA, 24–29 June 2001; p. 284.
  10. Ye, C.; Yeung, R.W. Some basic properties of fix-free codes. IEEE Trans. Inf. Theory 2001, 47, 72–87.
  11. Kukorelly, Z.; Zeger, K. Sufficient conditions for existence of binary fix-free codes. IEEE Trans. Inf. Theory 2005, 51, 3433–3444.
  12. Huffman, D.A. A method for the construction of minimum-redundancy codes. Proc. IRE 1952, 40, 1098–1101.
  13. Huang, Y.M.; Wu, T.Y.; Han, Y.S. An A*-based algorithm for constructing reversible variable length codes with minimum average codeword length. IEEE Trans. Commun. 2010, 58, 3175–3185.
  14. Savari, S.A.; Hossein Tabatabaei Yazdi, S.M.; Abedini, N.; Khatri, S.P. On optimal and achievable fix-free codes. IEEE Trans. Inf. Theory 2012, 58, 5112–5129.
  15. Esmaeili, M.; Kakhbod, A. On antiuniform and partially antiuniform sources. In Proceedings of the IEEE International Conference on Communications, Istanbul, Turkey, 11–15 June 2006; pp. 1611–1615.
  16. Esmaeili, M.; Kakhbod, A. On information theory parameters of infinite anti-uniform sources. IET Commun. 2007, 1, 1039–1041.
  17. Humblet, P. Optimal source coding for a class of integer alphabets. IEEE Trans. Inf. Theory 1978, 24, 110–112.
  18. Kato, A.; Han, T.; Nagaoka, H. Huffman coding with an infinite alphabet. IEEE Trans. Inf. Theory 1996, 42, 977–984.
  19. Tsai, C.-W.; Wu, J.-L. Modified symmetrical reversible variable-length code and its theoretical bounds. IEEE Trans. Inf. Theory 2001, 47, 2543–2548.
  20. Aghajan, A.; Khosravifard, M. Weakly symmetric fix-free codes. IEEE Trans. Inf. Theory 2014, 60, 5500–5515.
  21. Fraenkel, A.S.; Klein, S.T. Bidirectional Huffman coding. Comput. J. 1990, 33, 296–307.
