Article

On Two-Stage Guessing

1 Signal and Information Processing Laboratory, ETH Zurich, 8092 Zurich, Switzerland
2 Andrew and Erna Viterbi Faculty of Electrical and Computer Engineering, Technion–Israel Institute of Technology, Haifa 3200003, Israel
* Author to whom correspondence should be addressed.
Submission received: 26 February 2021 / Revised: 2 April 2021 / Accepted: 6 April 2021 / Published: 9 April 2021
(This article belongs to the Special Issue Statistical Communication and Information Theory)

Abstract

Stationary memoryless sources produce two correlated random sequences $X^n$ and $Y^n$. A guesser seeks to recover $X^n$ in two stages, by first guessing $Y^n$ and then $X^n$. The contributions of this work are twofold: (1) We characterize the least achievable exponential growth rate (in $n$) of any positive $\rho$-th moment of the total number of guesses when $Y^n$ is obtained by applying a deterministic function $f$ component-wise to $X^n$. We prove that, depending on $f$, the least exponential growth rate in the two-stage setup is lower than when guessing $X^n$ directly. We further propose a simple Huffman code-based construction of a function $f$ that is a viable candidate for the minimization of the least exponential growth rate in the two-stage guessing setup. (2) We characterize the least achievable exponential growth rate of the $\rho$-th moment of the total number of guesses required to recover $X^n$ when Stage 1 need not end with a correct guess of $Y^n$ and without assumptions on the stationary memoryless sources producing $X^n$ and $Y^n$.

1. Introduction

Pioneered by Massey [1], McEliece and Yu [2], and Arikan [3], the guessing problem is concerned with recovering the realization of a finite-valued random variable $X$ using a sequence of yes-no questions of the form “Is $X = x_1$?”, “Is $X = x_2$?”, etc., until correct. A commonly used performance metric for this problem is the $\rho$-th moment of the number of guesses until $X$ is revealed (where $\rho$ is a positive parameter).
When guessing a length-$n$ i.i.d. sequence $X^n$ (a tuple of $n$ components that are drawn independently according to the law of $X$), the $\rho$-th moment of the number of guesses required to recover the realization of $X^n$ grows exponentially with $n$, and the exponential growth rate is referred to as the guessing exponent. The least achievable guessing exponent was derived by Arikan [3], and it equals $\rho$ times the order-$\tfrac{1}{1+\rho}$ Rényi entropy of $X$. Arikan’s result is based on the optimal deterministic guessing strategy, which proceeds in descending order of the probability mass function (PMF) of $X^n$.
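To make this exponent concrete, the following short Python sketch (our own illustration, not code from the paper; the PMF is an arbitrary example) computes the exact $\rho$-th moment of the number of guesses under the optimal descending-order strategy for small $n$ and compares its normalized logarithm with the predicted rate $\rho H_{\frac{1}{1+\rho}}(X)$.

```python
import itertools
import numpy as np

def renyi_entropy(p, alpha):
    """Order-alpha Renyi entropy (natural log)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log(p))
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def guessing_moment(p, n, rho):
    """Exact rho-th moment of the number of guesses for X^n,
    guessing sequences in descending order of probability."""
    probs = [np.prod([p[x] for x in xs])
             for xs in itertools.product(range(len(p)), repeat=n)]
    probs = np.sort(probs)[::-1]             # most probable sequence first
    ranks = np.arange(1, len(probs) + 1)     # guess number of each sequence
    return np.sum(probs * ranks ** rho)

p_X = [0.5, 0.25, 0.125, 0.125]   # example PMF (assumption, not from the paper)
rho = 2.0
for n in (2, 4, 6):
    moment = guessing_moment(p_X, n, rho)
    exponent = rho * renyi_entropy(p_X, 1.0 / (1.0 + rho))
    print(n, np.log(moment) / n, exponent)   # the two numbers approach each other
```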
In this paper, we propose and analyze a two-stage guessing strategy to recover the realization of an i.i.d. sequence $X^n$. In Stage 1, the guesser is allowed to produce guesses of an ancillary sequence $Y^n$ that is jointly i.i.d. with $X^n$. In Stage 2, the guesser must recover $X^n$. We show the following:
1.
When $Y^n$ is generated by component-wise application of a mapping $f\colon \mathcal{X} \to \mathcal{Y}$ to $X^n$ and the guesser is required to recover $Y^n$ in Stage 1 before proceeding to Stage 2, then the least achievable guessing exponent (i.e., the exponential growth rate of the $\rho$-th moment of the total number of guesses in the two stages) equals
$$\rho \max\Bigl\{ H_{\frac{1}{1+\rho}}\bigl(f(X)\bigr),\; H_{\frac{1}{1+\rho}}\bigl(X \mid f(X)\bigr) \Bigr\}, \tag{1}$$
where the maximum is between the order-$\tfrac{1}{1+\rho}$ Rényi entropy of $f(X)$ and the order-$\tfrac{1}{1+\rho}$ conditional Arimoto–Rényi entropy of $X$ given $f(X)$. We derive (1) in Section 3 and summarize our analysis in Theorem 1. We also propose a Huffman code-based construction of a function $f$ that is a viable candidate for the minimization of (1) among all maps from $\mathcal{X}$ to $\mathcal{Y}$ (see Algorithm 2 and Theorem 2).
2.
When $X^n$ and $Y^n$ are jointly i.i.d. according to the PMF $P_{XY}$ and Stage 1 need not end with a correct guess of $Y^n$ (i.e., the guesser may proceed to Stage 2 even if $Y^n$ remains unknown), then the least achievable guessing exponent equals
$$\sup_{Q_{XY}} \Bigl\{ \rho \min\Bigl\{ H(Q_X),\; \max\bigl\{ H(Q_Y),\, H(Q_{X|Y}) \bigr\} \Bigr\} - D(Q_{XY} \,\|\, P_{XY}) \Bigr\}, \tag{2}$$
where the supremum is over all PMFs $Q_{XY}$ defined on the same set as $P_{XY}$, and $H(\cdot)$ and $D(\cdot\|\cdot)$ denote, respectively, the (conditional) Shannon entropy and the Kullback–Leibler divergence. We derive (2) in Section 4 and summarize our analysis in Theorem 3. Parts of Section 4 were presented in the conference paper [4].
Our interest in the two-stage guessing problem is due to its relation to information measures: Analogous to how the Rényi entropy can be defined operationally via guesswork, as opposed to its axiomatic definition, we view (1) and (2) as quantities that capture at what cost and to what extent knowledge of Y helps in recovering X. For example, minimizing (1) over descriptions f ( X ) of X can be seen as isolating the most beneficial information of X in the sense that describing it in any more detail is too costly (the first term of the maximization in (1) exceeds the second), whereas a coarser description leaves too much uncertainty (the second term exceeds the first). Similarly, but with the joint law of ( X , Y ) fixed, (2) quantifies the least (partial) information of Y that benefits recovering X (because an optimal guessing strategy will proceed to Stage 2 when guessing Y no longer benefits guessing X). Note that while (1) and (2) are derived in this paper, studying their information-like properties is a subject of future research (see Section 5).
Besides its theoretic implications, the guessing problem is also applied practically in communications and cryptography. This includes sequential decoding [5,6], and measuring password strength [7], confidentiality of communication channels [8], and resilience against brute-force attacks [9]. It is also strongly related to task encoding and (lossless and lossy) compression (see, e.g., [10,11,12,13,14,15]).
Variations of the guessing problem include guessing under source uncertainty [16], distributed guessing [17,18], and guessing on the Gray–Wyner and the Slepian–Wolf network [19].

2. Preliminaries

We begin with some notation and preliminary material that are essential for the presentation in Section 3 ahead. The analysis in Section 4 relies on the method of types (see, e.g., Chapter 11 in [20]).
Throughout the paper, we use the following notation:
  • For $m, n \in \mathbb{N}$ with $m < n$, let $[m:n] := \{m, \ldots, n\}$;
  • Let $P$ be a PMF that is defined on a finite set $\mathcal{X}$. For $k \in [1:|\mathcal{X}|]$, let $G_P(k)$ denote the sum of its $k$ largest point masses, and let $p_{\max} := G_P(1)$. For $n \in \mathbb{N}$, denote by $\mathcal{P}_n$ the set of all PMFs defined on $[1:n]$.
The next definitions and properties are related to majorization and Rényi measures.
Definition 1
(Majorization). Consider PMFs $P$ and $Q$, defined on the same (finite or countably infinite) set $\mathcal{X}$. We say that $Q$ majorizes $P$, denoted $P \prec Q$, if $G_P(k) \le G_Q(k)$ for all $k \in [1:|\mathcal{X}|]$. If $P$ and $Q$ are defined on finite sets of different cardinalities, then the PMF defined on the smaller set is zero-padded to match the cardinality of the larger set.
By Definition 1, a unit mass majorizes any other distribution; on the other hand, the uniform distribution (on a finite set) is majorized by any other distribution of equal support.
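As a quick illustration of Definition 1 (our own sketch, not from the paper), the following snippet checks majorization by comparing the partial sums $G_P(k)$ and $G_Q(k)$ of the sorted point masses, zero-padding the shorter PMF as in the definition.

```python
import numpy as np

def majorizes(q, p):
    """Return True if Q majorizes P (P < Q in the majorization order)."""
    n = max(len(p), len(q))
    p = np.sort(np.pad(np.asarray(p, float), (0, n - len(p))))[::-1]
    q = np.sort(np.pad(np.asarray(q, float), (0, n - len(q))))[::-1]
    # G_P(k) <= G_Q(k) for every k
    return bool(np.all(np.cumsum(p) <= np.cumsum(q) + 1e-12))

uniform = [0.25, 0.25, 0.25, 0.25]
skewed  = [0.7, 0.2, 0.1]
print(majorizes(skewed, uniform))   # True: the skewed PMF majorizes the uniform one
print(majorizes(uniform, skewed))   # False
```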
Definition 2
(Schur-convexity/concavity). A function $f\colon \mathcal{P}_n \to \mathbb{R}$ is Schur-convex if, for every $P, Q \in \mathcal{P}_n$ with $P \prec Q$, we have $f(P) \le f(Q)$. Likewise, $f$ is Schur-concave if $-f$ is Schur-convex, i.e., if $P \prec Q$ implies that $f(P) \ge f(Q)$.
Definition 3
(Rényi entropy [21]). Let $X$ be a random variable taking values on a finite or countably infinite set $\mathcal{X}$ according to the PMF $P_X$. The order-$\alpha$ Rényi entropy $H_\alpha(X)$ of $X$ is given by
$$H_\alpha(X) := \frac{1}{1-\alpha} \log \sum_{x \in \mathcal{X}} P_X^\alpha(x), \qquad \alpha \in (0,1) \cup (1,\infty),$$
where, unless explicitly given, the base of $\log(\cdot)$ can be chosen arbitrarily, with $\exp(\cdot)$ denoting its inverse function. Via continuous extension,
$$H_0(X) := \log \bigl|\{x \in \mathcal{X} : P_X(x) > 0\}\bigr|,$$
$$H_1(X) := H(X) = -\sum_{x \in \mathcal{X}} P_X(x) \log P_X(x),$$
$$H_\infty(X) := \log \frac{1}{p_{\max}},$$
where $H(X)$ is the (Shannon) entropy of $X$.
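A small Python helper (our own sketch, not from the paper) that evaluates $H_\alpha(X)$, including the continuous extensions at $\alpha = 0, 1, \infty$:

```python
import numpy as np

def renyi(p, alpha):
    """Order-alpha Renyi entropy of a PMF, with the continuous
    extensions at alpha = 0, 1, and infinity (natural log)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if alpha == 0:
        return np.log(len(p))              # log of the support size
    if alpha == 1:
        return -np.sum(p * np.log(p))      # Shannon entropy
    if alpha == np.inf:
        return -np.log(np.max(p))          # log(1 / p_max)
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

p = [0.5, 0.3, 0.2]                        # example PMF (assumption)
for a in (0, 0.5, 1, 2, np.inf):
    print(a, renyi(p, a))                  # non-increasing in alpha
```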
Proposition 1
(Schur-concavity of the Rényi entropy, Appendix F.3.a of [22]). The Rényi entropy of any order α > 0 is Schur-concave (in particular, the Shannon entropy is Schur-concave).
Definition 4
(Arimoto–Rényi conditional entropy [23]). Let $(X, Y)$ be a pair of random variables taking values on a product set $\mathcal{X} \times \mathcal{Y}$ according to the PMF $P_{XY}$. When $\mathcal{X}$ is finite or countably infinite, the order-$\alpha$ Arimoto–Rényi conditional entropy $H_\alpha(X|Y)$ of $X$ given $Y$ is defined as follows:
  • If $\alpha \in (0,1) \cup (1,\infty)$,
$$H_\alpha(X|Y) := \frac{\alpha}{1-\alpha} \log \mathbb{E}\!\left[ \left( \sum_{x \in \mathcal{X}} P_{X|Y}^\alpha(x|Y) \right)^{\!\frac{1}{\alpha}} \right]. \tag{7}$$
    When $\mathcal{Y}$ is finite, (7) can be simplified as follows:
$$H_\alpha(X|Y) = \frac{\alpha}{1-\alpha} \log \sum_{y \in \mathcal{Y}} \left( \sum_{x \in \mathcal{X}} P_{XY}^\alpha(x,y) \right)^{\!\frac{1}{\alpha}} \tag{8}$$
$$\phantom{H_\alpha(X|Y)} = \frac{\alpha}{1-\alpha} \log \sum_{y \in \mathcal{Y}} P_Y(y) \exp\!\left( \frac{1-\alpha}{\alpha}\, H_\alpha(X \mid Y = y) \right). \tag{9}$$
  • If $\alpha \in \{0, 1, \infty\}$ and $\mathcal{Y}$ is finite, then, via continuous extension,
$$H_0(X|Y) = \log \max_{y \in \mathcal{Y}} \bigl|\mathrm{supp}\, P_{X|Y}(\cdot|y)\bigr| = \max_{y \in \mathcal{Y}} H_0(X \mid Y = y),$$
$$H_1(X|Y) = H(X|Y),$$
$$H_\infty(X|Y) = \log \frac{1}{\displaystyle\sum_{y \in \mathcal{Y}} P_Y(y) \max_{x \in \mathcal{X}} P_{X|Y}(x|y)}.$$
The properties of the Arimoto–Rényi conditional entropy were studied in [24,25].
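A small numerical sketch (ours, not from the paper) of the finite-alphabet formula above: it computes $H_\alpha(X|Y)$ from a joint PMF via the double sum in (8), with the Shannon case $\alpha = 1$ handled separately.

```python
import numpy as np

def arimoto_renyi_cond(p_xy, alpha):
    """Order-alpha Arimoto-Renyi conditional entropy H_alpha(X|Y)
    for a joint PMF given as a 2-D array indexed [x, y] (natural log)."""
    p_xy = np.asarray(p_xy, dtype=float)
    if np.isclose(alpha, 1.0):                      # Shannon conditional entropy
        p_y = p_xy.sum(axis=0)
        h = 0.0
        for y in range(p_xy.shape[1]):
            p_cond = p_xy[:, y] / p_y[y]
            p_cond = p_cond[p_cond > 0]
            h += p_y[y] * (-np.sum(p_cond * np.log(p_cond)))
        return h
    inner = np.sum(p_xy ** alpha, axis=0) ** (1.0 / alpha)   # sum over x, per y
    return (alpha / (1.0 - alpha)) * np.log(np.sum(inner))

# example joint PMF (assumption): rows index x, columns index y
p_xy = np.array([[0.30, 0.10],
                 [0.10, 0.20],
                 [0.05, 0.25]])
for a in (0.5, 1.0, 2.0):
    print(a, arimoto_renyi_cond(p_xy, a))
```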
Finally, for $n, m \in \mathbb{N}$, $\mathcal{F}_{n,m}$ denotes the set of all deterministic functions $f\colon [1:n] \to [1:m]$. If $m < n$, then a function $f \in \mathcal{F}_{n,m}$ is not one-to-one (i.e., it is a non-injective function).

3. Two-Stage Guessing: $Y_i = f(X_i)$

Let $X^n := (X_1, \ldots, X_n)$ be a sequence of i.i.d. random variables taking values on a finite set $\mathcal{X}$. Assume without loss of generality that $\mathcal{X} = [1:|\mathcal{X}|]$. Let $m \in [2:|\mathcal{X}|-1]$, $\mathcal{Y} := [1:m]$, and let $f\colon \mathcal{X} \to \mathcal{Y}$ be a fixed deterministic function (i.e., $f \in \mathcal{F}_{|\mathcal{X}|,m}$). Consider guessing $X^n \in \mathcal{X}^n$ in two stages as follows (Algorithm 1).
Algorithm 1 (two-stage guessing algorithm):
(a)
Stage 1: $Y^n := \bigl(f(X_1), \ldots, f(X_n)\bigr) \in \mathcal{Y}^n$ is guessed by asking questions of the form:
  • “Is $Y^n = \hat{y}_1$?”, “Is $Y^n = \hat{y}_2$?”, …
until correct. Note that, since $|\mathcal{Y}| = m < |\mathcal{X}|$, this stage cannot reveal $X^n$.
(b)
Stage 2: Based on $Y^n$, the sequence $X^n \in \mathcal{X}^n$ is guessed by asking questions of the form:
  • “Is $X^n = \hat{x}_1$?”, “Is $X^n = \hat{x}_2$?”, …
until correct. If $Y^n = y^n$, the guesses $\hat{x}_k := (\hat{x}_{k,1}, \ldots, \hat{x}_{k,n})$ are restricted to $\mathcal{X}$-sequences that satisfy $f(\hat{x}_{k,i}) = y_i$ for all $i \in [1:n]$.
The guesses $\hat{y}_1, \hat{y}_2, \ldots$ in Stage 1 are in descending order of probability as measured by $P_{Y^n}$ (i.e., $\hat{y}_1$ is the most probable sequence under $P_{Y^n}$, $\hat{y}_2$ is the second most probable, and so on; ties are resolved arbitrarily). We denote the index of $y^n \in \mathcal{Y}^n$ in this guessing order by $g_{Y^n}(y^n)$. Note that, because every sequence $y^n$ is guessed exactly once, $g_{Y^n}(\cdot)$ is a bijection from $\mathcal{Y}^n$ to $[1:m^n]$; we refer to such bijections as ranking functions. The guesses $\hat{x}_1, \hat{x}_2, \ldots$ in Stage 2 depend on $Y^n$ and are in descending order of the posterior $P_{X^n|Y^n}(\cdot|Y^n)$. Following our notation from Stage 1, the index of $x^n \in \mathcal{X}^n$ in the guessing order induced by $Y^n = y^n$ is denoted by $g_{X^n|Y^n}(x^n|y^n)$. Note that, for every $y^n \in \mathcal{Y}^n$, the function $g_{X^n|Y^n}(\cdot|y^n)$ is a ranking function on $\mathcal{X}^n$. Using $g_{Y^n}(\cdot)$ and $g_{X^n|Y^n}(\cdot|\cdot)$, the total number of guesses $G_2(X^n)$ in Algorithm 1 can be expressed as
$$G_2(X^n) = g_{Y^n}(Y^n) + g_{X^n|Y^n}(X^n|Y^n), \tag{13}$$
where $g_{Y^n}(Y^n)$ and $g_{X^n|Y^n}(X^n|Y^n)$ are the numbers of guesses in Stages 1 and 2, respectively. Observe that guessing in descending order of probability minimizes the $\rho$-th moment of the number of guesses in both stages of Algorithm 1. By [3], for every $\rho > 0$, the guessing moments $\mathbb{E}\bigl[g_{Y^n}^\rho(Y^n)\bigr]$ and $\mathbb{E}\bigl[g_{X^n|Y^n}^\rho(X^n|Y^n)\bigr]$ can be (upper- and lower-) bounded in terms of $H_{\frac{1}{1+\rho}}(Y)$ and $H_{\frac{1}{1+\rho}}(X \mid Y)$ as follows:
$$\bigl(1 + n \ln m\bigr)^{-\rho} \exp\Bigl(n\rho\, H_{\frac{1}{1+\rho}}(Y)\Bigr) \;\le\; \mathbb{E}\bigl[g_{Y^n}^\rho(Y^n)\bigr] \;\le\; \exp\Bigl(n\rho\, H_{\frac{1}{1+\rho}}(Y)\Bigr), \tag{14a}$$
$$\bigl(1 + n \ln |\mathcal{X}|\bigr)^{-\rho} \exp\Bigl(n\rho\, H_{\frac{1}{1+\rho}}\bigl(X \mid f(X)\bigr)\Bigr) \;\le\; \mathbb{E}\bigl[g_{X^n|Y^n}^\rho(X^n|Y^n)\bigr] \;\le\; \exp\Bigl(n\rho\, H_{\frac{1}{1+\rho}}\bigl(X \mid f(X)\bigr)\Bigr). \tag{14b}$$
Combining (13) and (14), we next establish bounds on $\mathbb{E}\bigl[G_2^\rho(X^n)\bigr]$. In light of (13), we begin with bounds on the $\rho$-th power of a sum.
Lemma 1.
Let $k \in \mathbb{N}$, and let $\{a_i\}_{i=1}^k$ be a non-negative sequence. For every $\rho > 0$,
$$s_1(k, \rho) \sum_{i=1}^k a_i^\rho \;\le\; \left( \sum_{i=1}^k a_i \right)^{\!\rho} \;\le\; s_2(k, \rho) \sum_{i=1}^k a_i^\rho, \tag{15}$$
where
$$s_1(k, \rho) := \begin{cases} 1, & \text{if } \rho \ge 1, \\ k^{\rho-1}, & \text{if } \rho \in (0,1), \end{cases} \tag{16}$$
and
$$s_2(k, \rho) := \begin{cases} k^{\rho-1}, & \text{if } \rho \ge 1, \\ 1, & \text{if } \rho \in (0,1). \end{cases} \tag{17}$$
  • If $\rho \ge 1$, then the left and right inequalities in (15) hold with equality if, respectively, $k-1$ of the $a_i$’s are equal to zero or $a_1 = \cdots = a_k$;
  • If $\rho \in (0,1)$, then the left and right inequalities in (15) hold with equality if, respectively, $a_1 = \cdots = a_k$ or $k-1$ of the $a_i$’s are equal to zero.
Proof. 
See Appendix A. □
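As a sanity check on Lemma 1 (our own snippet, not part of the paper), one can sample non-negative sequences and confirm the two-sided bound numerically:

```python
import numpy as np

def s1(k, rho):
    return 1.0 if rho >= 1 else k ** (rho - 1.0)

def s2(k, rho):
    return k ** (rho - 1.0) if rho >= 1 else 1.0

rng = np.random.default_rng(0)
for _ in range(1000):
    k = int(rng.integers(2, 6))
    rho = float(rng.uniform(0.1, 3.0))
    a = rng.uniform(0.0, 10.0, size=k)
    power_of_sum, sum_of_powers = np.sum(a) ** rho, np.sum(a ** rho)
    assert s1(k, rho) * sum_of_powers <= power_of_sum + 1e-9
    assert power_of_sum <= s2(k, rho) * sum_of_powers + 1e-9
print("Lemma 1 bounds hold on all sampled sequences")
```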
Using the shorthand notation $k_1(\rho) := s_1(2,\rho)$ and $k_2(\rho) := s_2(2,\rho)$, we apply Lemma 1 in conjunction with (13) and (14) (and the fact that $m \le |\mathcal{X}|$) to bound $\mathbb{E}\bigl[G_2^\rho(X^n)\bigr]$ as follows:
$$k_1(\rho)\, \bigl(1 + n \ln|\mathcal{X}|\bigr)^{-\rho} \Bigl[ \exp\Bigl(n\rho H_{\frac{1}{1+\rho}}\bigl(f(X)\bigr)\Bigr) + \exp\Bigl(n\rho H_{\frac{1}{1+\rho}}\bigl(X \mid f(X)\bigr)\Bigr) \Bigr] \;\le\; \mathbb{E}\bigl[G_2^\rho(X^n)\bigr] \;\le\; k_2(\rho) \Bigl[ \exp\Bigl(n\rho H_{\frac{1}{1+\rho}}\bigl(f(X)\bigr)\Bigr) + \exp\Bigl(n\rho H_{\frac{1}{1+\rho}}\bigl(X \mid f(X)\bigr)\Bigr) \Bigr]. \tag{18}$$
The bounds in (18) are asymptotically tight as $n$ tends to infinity. To see this, note that
$$\lim_{n\to\infty} \frac{1}{n} \ln \Bigl[ k_1(\rho)\, \bigl(1 + n\ln|\mathcal{X}|\bigr)^{-\rho} \Bigr] = 0, \qquad \lim_{n\to\infty} \frac{1}{n} \ln k_2(\rho) = 0, \tag{19}$$
and therefore, for all $\rho > 0$,
$$\lim_{n\to\infty} \frac{1}{n} \log \mathbb{E}\bigl[G_2^\rho(X^n)\bigr] = \lim_{n\to\infty} \frac{1}{n} \log \Bigl[ \exp\Bigl(n\rho H_{\frac{1}{1+\rho}}\bigl(f(X)\bigr)\Bigr) + \exp\Bigl(n\rho H_{\frac{1}{1+\rho}}\bigl(X \mid f(X)\bigr)\Bigr) \Bigr]. \tag{20}$$
Since the sum of the two exponentials on the right-hand side (RHS) of (20) is dominated by the larger exponential growth rate, it follows that
$$\lim_{n\to\infty} \frac{1}{n} \log \Bigl[ \exp\Bigl(n\rho H_{\frac{1}{1+\rho}}\bigl(f(X)\bigr)\Bigr) + \exp\Bigl(n\rho H_{\frac{1}{1+\rho}}\bigl(X \mid f(X)\bigr)\Bigr) \Bigr] = \rho \max\Bigl\{ H_{\frac{1}{1+\rho}}\bigl(f(X)\bigr),\; H_{\frac{1}{1+\rho}}\bigl(X \mid f(X)\bigr) \Bigr\}, \tag{21}$$
and thus, by (20) and (21),
$$E_2(X; \rho, m, f) := \lim_{n\to\infty} \frac{1}{n} \log \mathbb{E}\bigl[G_2^\rho(X^n)\bigr] = \rho \max\Bigl\{ H_{\frac{1}{1+\rho}}\bigl(f(X)\bigr),\; H_{\frac{1}{1+\rho}}\bigl(X \mid f(X)\bigr) \Bigr\}. \tag{22}$$
As a sanity check, note that if $m = |\mathcal{X}|$ and $f$ is the identity function $\mathrm{id}$ (i.e., $\mathrm{id}(x) = x$ for all $x \in \mathcal{X}$), then $X^n$ is revealed in Stage 1, and (with Stage 2 obsolete) the $\rho$-th moment of the total number of guesses grows exponentially with rate
$$\rho\, H_{\frac{1}{1+\rho}}\bigl(\mathrm{id}(X)\bigr) = \rho\, H_{\frac{1}{1+\rho}}(X).$$
This is in agreement with the RHS of (22), as
$$\rho \max\Bigl\{ H_{\frac{1}{1+\rho}}\bigl(\mathrm{id}(X)\bigr),\; H_{\frac{1}{1+\rho}}\bigl(X \mid \mathrm{id}(X)\bigr) \Bigr\} = \rho \max\Bigl\{ H_{\frac{1}{1+\rho}}(X),\; H_{\frac{1}{1+\rho}}(X \mid X) \Bigr\} = \rho \max\Bigl\{ H_{\frac{1}{1+\rho}}(X),\, 0 \Bigr\} = \rho\, H_{\frac{1}{1+\rho}}(X).$$
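To illustrate (22) numerically, here is a small Python sketch (ours, not from the paper; the PMF and map are arbitrary examples) that evaluates $E_2(X;\rho,m,f)$ for a deterministic $f$ by computing the two single-letter entropies in the maximum, and compares it with the one-stage exponent.

```python
import numpy as np

def renyi(p, alpha):
    p = np.asarray(p, float); p = p[p > 0]
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def cond_renyi_given_f(p_x, f, alpha):
    """H_alpha(X | f(X)) for a deterministic f given as a list of labels."""
    p_x, labels = np.asarray(p_x, float), np.asarray(f)
    inner = sum(np.sum(p_x[labels == y] ** alpha) ** (1.0 / alpha)
                for y in np.unique(labels))
    return (alpha / (1.0 - alpha)) * np.log(inner)

def two_stage_exponent(p_x, f, rho):
    """E_2(X; rho, m, f) = rho * max{ H_a(f(X)), H_a(X | f(X)) }, a = 1/(1+rho)."""
    a = 1.0 / (1.0 + rho)
    p_y = [np.sum(np.asarray(p_x)[np.asarray(f) == y]) for y in np.unique(f)]
    return rho * max(renyi(p_y, a), cond_renyi_given_f(p_x, f, a))

p_x = [0.4, 0.3, 0.2, 0.1]      # example PMF (assumption)
f   = [1, 1, 2, 2]              # example map onto m = 2 labels (assumption)
rho = 1.0
print(two_stage_exponent(p_x, f, rho))
print(rho * renyi(p_x, 1.0 / (1.0 + rho)))   # one-stage exponent, for comparison
```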
We summarize our results so far in Theorem 1 below.
Theorem 1.
Let $X^n = (X_1, \ldots, X_n)$ be a sequence of i.i.d. random variables, each drawn according to the PMF $P_X$ of support $\mathcal{X} := [1:|\mathcal{X}|]$. Let $m \in [2:|\mathcal{X}|-1]$, $f \in \mathcal{F}_{|\mathcal{X}|,m}$, and define $Y^n := \bigl(f(X_1), \ldots, f(X_n)\bigr)$. When guessing $X^n$ according to Algorithm 1 (i.e., after first guessing $Y^n$ in descending order of probability as measured by $P_{Y^n}(\cdot)$, and then guessing $X^n$ in descending order of probability as measured by $P_{X^n|Y^n}(\cdot|Y^n)$), the $\rho$-th moment of the total number of guesses $G_2(X^n)$ satisfies
(a) 
the lower and upper bounds in (18) for all $n \in \mathbb{N}$ and $\rho > 0$;
(b) 
the asymptotic characterization (22) for $\rho > 0$ as $n \to \infty$.

A Suboptimal and Simple Construction of f in Algorithm 1 and Bounds on $\mathbb{E}\bigl[G_2^\rho(X^n)\bigr]$

Having established in Theorem 1 that
$$\mathbb{E}\bigl[G_2^\rho(X^n)\bigr] = \exp\Bigl( n \bigl[ E_2(X; \rho, m, f) + o(1) \bigr] \Bigr), \qquad \rho > 0, \tag{23}$$
we now seek to minimize the exponent $E_2(X; \rho, m, f)$ in the RHS of (23) (for a given PMF $P_X$, $\rho > 0$, and $m \in [2:|\mathcal{X}|-1]$) over all $f \in \mathcal{F}_{|\mathcal{X}|,m}$.
We proceed by considering a suboptimal and simple construction of $f$, which yields explicit bounds as a function of the PMF $P_X$ and the value of $m$; this construction also does not depend on $\rho$.
For a fixed $m \in [2:|\mathcal{X}|-1]$, a non-injective deterministic function $f_m^*\colon \mathcal{X} \to [1:m]$ is constructed by relying on the Huffman algorithm for lossless compression of $X := X_1$. This construction also (almost) achieves the maximal mutual information $I(X; f(X))$ among all deterministic functions $f\colon \mathcal{X} \to [1:m]$ (this issue is elaborated in the sequel). Apart from its simplicity, the suboptimal construction is motivated by the expectation that it reduces the guesswork in Stage 2 of Algorithm 1, where one wishes to guess $X^n$ on the basis of $Y^n$ with $Y_i = f(X_i)$ for all $i \in [1:n]$. In this setting, it is shown that the upper and lower bounds on $\mathbb{E}\bigl[G_2^\rho(X^n)\bigr]$ are (almost) asymptotically tight in terms of their exponential growth rate in $n$. Furthermore, these exponential bounds demonstrate a reduction in the required number of guesses for $X^n$, as compared to the optimal one-stage guessing of $X^n$.
In the sequel, the following construction of a deterministic function $f_m^*\colon \mathcal{X} \to [1:m]$ is analyzed; this construction was suggested in the proofs of Lemma 5 in [26] and Theorem 2 in [14].
Algorithm 2 (construction of f m * ):
(a)
Let $i := 1$, $P_1 := P_X$.
(b)
If $|\mathrm{supp}(P_i)| = m$, let $R := P_i$, and go to Step (c). If not, let $P_{i+1} := P_i$ with its two least likely symbols merged as in the Huffman code construction. Let $i \leftarrow i + 1$, and go to Step (b).
(c)
Construct $f_m^* \in \mathcal{F}_{|\mathcal{X}|,m}$ by setting $f_m^*(k) = j$ if $P_1(k)$ has been merged into $R(j)$.
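A possible Python rendering of Algorithm 2 (a sketch under our own reading of the merging rule, not the authors' code): repeatedly merge the two least likely masses, Huffman-style, while tracking which original symbols end up in which merged node.

```python
def construct_f_star(p_x, m):
    """Huffman-style construction of f_m*: merge the two least likely
    nodes until only m remain; f maps each original symbol to the index
    of the node it was merged into (labels 1..m)."""
    # each node is (probability, set of original symbols it contains)
    nodes = [(p, {k}) for k, p in enumerate(p_x, start=1)]
    while len(nodes) > m:
        nodes.sort(key=lambda node: node[0])          # least likely first
        (p1, s1), (p2, s2) = nodes[0], nodes[1]
        nodes = nodes[2:] + [(p1 + p2, s1 | s2)]      # merge the two
    f = {}
    for j, (_, symbols) in enumerate(nodes, start=1):
        for k in symbols:
            f[k] = j
    return f     # dict: original symbol -> label in [1:m]

p_x = [0.4, 0.25, 0.2, 0.1, 0.05]     # example PMF (assumption)
print(construct_f_star(p_x, m=2))     # e.g. {1: 1, 2: 2, 3: 2, 4: 2, 5: 2}
```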
We now define $Y^{*n} := (Y_1^*, \ldots, Y_n^*)$ with
$$Y_i^* := f_m^*(X_i), \qquad i \in [1:n].$$
Observe that, due to [26] (Corollary 3 and Lemma 5) and because $f_m^*(\cdot)$ operates component-wise on the i.i.d. vector $X^n$, the following lower bound on $\tfrac{1}{n} I(X^n; Y^{*n})$ applies:
$$\frac{1}{n} I\bigl(X^n; Y^{*n}\bigr) = I\bigl(X; f_m^*(X)\bigr) = H\bigl(f_m^*(X)\bigr) \ge \max_{f \in \mathcal{F}_{|\mathcal{X}|,m}} H\bigl(f(X)\bigr) - \beta^* = \max_{f \in \mathcal{F}_{|\mathcal{X}|,m}} I\bigl(X; f(X)\bigr) - \beta^* = \max_{f \in \mathcal{F}_{|\mathcal{X}|,m}} \frac{1}{n} I\bigl(X^n; Y^n\bigr) - \beta^*,$$
where $Y^n := \bigl(f(X_1), \ldots, f(X_n)\bigr)$ and
$$\beta^* := \log \frac{2}{e \ln 2} \approx 0.08607 \text{ bits}.$$
From the proof of [26] (Theorem 3), we further have the following multiplicative bound:
$$\frac{1}{n} I\bigl(X^n; Y^{*n}\bigr) \ge \frac{10}{11} \max_{f \in \mathcal{F}_{|\mathcal{X}|,m}} \frac{1}{n} I\bigl(X^n; Y^n\bigr).$$
Note that, by [26] (Lemma 1), the maximization problem in the RHS of (29) is strongly NP-hard [27]. This means that, unless $\mathrm{P} = \mathrm{NP}$, there is no polynomial-time algorithm that, given an arbitrarily small $\varepsilon > 0$, produces a deterministic function $f^{(\varepsilon)} \in \mathcal{F}_{|\mathcal{X}|,m}$ satisfying
$$I\bigl(X; f^{(\varepsilon)}(X)\bigr) \ge (1-\varepsilon) \max_{f \in \mathcal{F}_{|\mathcal{X}|,m}} I\bigl(X; f(X)\bigr).$$
We next examine the performance of our candidate function $f_m^*$ when applied in Algorithm 1. To that end, we first bound $\mathbb{E}\bigl[g_{Y^n}^\rho(Y^{*n})\bigr]$ in terms of the Rényi entropy of a suitably defined random variable $\tilde{X}_m \in [1:m]$ constructed in Algorithm 3 below. In the construction, we assume without loss of generality that $P_X(1) \ge \cdots \ge P_X(|\mathcal{X}|)$, and we denote the PMF of $\tilde{X}_m$ by $Q := \mathcal{R}_m(P_X)$.
Algorithm 3 (construction of the PMF $Q := \mathcal{R}_m(P_X)$ of the random variable $\tilde{X}_m$):
  • If $m = 1$, then $Q := \mathcal{R}_1(P_X)$ is defined to be a point mass at one;
  • If $m = |\mathcal{X}|$, then $Q := \mathcal{R}_{|\mathcal{X}|}(P_X)$ is defined to be equal to $P_X$.
Furthermore, for $m \in [2:|\mathcal{X}|-1]$,
(a)
If $P_X(1) < \tfrac{1}{m}$, then $Q$ is defined to be the equiprobable distribution on $[1:m]$;
(b)
Otherwise, the PMF $Q$ is defined as
$$Q(i) := \begin{cases} P_X(i), & \text{if } i \in [1:m^*], \\[1ex] \dfrac{1}{m - m^*} \displaystyle\sum_{j = m^*+1}^{|\mathcal{X}|} P_X(j), & \text{if } i \in \{m^*+1, \ldots, m\}, \end{cases}$$
where $m^*$ is the maximal integer $i \in [1:m-1]$ that satisfies
$$P_X(i) \ge \frac{1}{m - i} \sum_{j = i+1}^{|\mathcal{X}|} P_X(j).$$
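The following Python sketch (our own reading of Algorithm 3, not the authors' code; the example PMF is arbitrary) builds $Q = \mathcal{R}_m(P_X)$ and lets one compare $H_\alpha(\tilde{X}_m)$ with $H_\alpha(X)$, as used later in this section.

```python
import numpy as np

def R_m(p_x, m):
    """PMF Q = R_m(P_X) of Algorithm 3 (p_x is sorted non-increasingly)."""
    p = np.asarray(sorted(p_x, reverse=True), dtype=float)
    if m == 1:
        return np.array([1.0])
    if m == len(p):
        return p
    if p[0] < 1.0 / m:                       # case (a): equiprobable on [1:m]
        return np.full(m, 1.0 / m)
    # case (b): largest m* in [1:m-1] with P_X(m*) >= tail sum / (m - m*)
    m_star = max(i for i in range(1, m)
                 if p[i - 1] >= p[i:].sum() / (m - i))
    tail = p[m_star:].sum() / (m - m_star)
    return np.concatenate([p[:m_star], np.full(m - m_star, tail)])

def renyi(q, alpha):
    q = np.asarray(q, float); q = q[q > 0]
    return np.log(np.sum(q ** alpha)) / (1.0 - alpha)

p_x = [0.5, 0.2, 0.15, 0.1, 0.05]            # example PMF (assumption)
q = R_m(p_x, m=3)
print(q, q.sum())                            # a valid PMF on [1:3]
print(renyi(q, 0.5), renyi(p_x, 0.5))        # H_alpha(X~_m) below H_alpha(X)
```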
Algorithm 3 was introduced in [26,28]. The link between the Rényi entropy of $\tilde{X}_m$ and that of $Y^*$ was given in Equation (34) of [14]:
$$H_\alpha(Y^*) \in \Bigl[ H_\alpha(\tilde{X}_m) - v(\alpha),\; H_\alpha(\tilde{X}_m) \Bigr], \qquad \alpha > 0, \tag{35}$$
where
$$v(\alpha) := \begin{cases} \log\dfrac{\alpha-1}{2^{\alpha}-2} - \dfrac{\alpha}{\alpha-1}\,\log\dfrac{\alpha}{2^{\alpha}-1}, & \alpha \neq 1, \\[2ex] \log\dfrac{2}{e \ln 2} \approx 0.08607 \text{ bits}, & \alpha = 1. \end{cases} \tag{36}$$
The function $v\colon (0,\infty) \to (0, \log 2)$ is depicted in Figure 1. It is monotonically increasing and continuous, and it satisfies
$$\lim_{\alpha \to 0} v(\alpha) = 0, \qquad \lim_{\alpha \to \infty} v(\alpha) = \log 2 \ (= 1 \text{ bit}). \tag{37}$$
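A quick numerical check of $v(\alpha)$ (our own sketch, not from the paper): it evaluates the expression in (36) in bits and lets one verify the monotonicity and the limiting values in (37).

```python
import numpy as np

def v(alpha):
    """v(alpha) in bits, with the continuous extension at alpha = 1."""
    if np.isclose(alpha, 1.0):
        return np.log2(2.0 / (np.e * np.log(2.0)))
    return (np.log2((alpha - 1.0) / (2.0 ** alpha - 2.0))
            - alpha / (alpha - 1.0) * np.log2(alpha / (2.0 ** alpha - 1.0)))

for a in (1e-3, 0.5, 1.0, 2.0, 10.0, 100.0):
    print(a, v(a))      # increases from ~0 towards 1 bit; v(1) ~ 0.08607
```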
Combining (14a) and (35) yields, for all $\rho > 0$,
$$\bigl(1 + n \ln m\bigr)^{-\rho} \exp\Bigl( n\rho \Bigl[ H_{\frac{1}{1+\rho}}(\tilde{X}_m) - v\bigl(\tfrac{1}{1+\rho}\bigr) \Bigr] \Bigr) \;\le\; \mathbb{E}\bigl[ g_{Y^n}^\rho(Y^{*n}) \bigr] \;\le\; \exp\Bigl( n\rho\, H_{\frac{1}{1+\rho}}(\tilde{X}_m) \Bigr), \tag{38}$$
where, due to (30), the difference between the exponential growth rates (in $n$) of the lower and upper bounds in (38) is equal to $\rho\, v\bigl(\tfrac{1}{1+\rho}\bigr)$, and it can be verified to satisfy (see (30) and (36))
$$0 < \rho\, v\bigl(\tfrac{1}{1+\rho}\bigr) < \beta^* \approx 0.08607 \log 2, \qquad \rho > 0, \tag{39}$$
where the leftmost and rightmost inequalities of (39) are asymptotically tight as we let $\rho \to 0^+$ and $\rho \to \infty$, respectively.
By inserting (38) into (18) and applying Inequality (39), it follows that, for all $\rho > 0$,
$$k_1(\rho)\, \bigl(1 + n \ln|\mathcal{X}|\bigr)^{-\rho} \Bigl[ \exp\Bigl( n \bigl[ \rho H_{\frac{1}{1+\rho}}(\tilde{X}_m) - \beta^* \bigr] \Bigr) + \exp\Bigl( n\rho H_{\frac{1}{1+\rho}}\bigl(X \mid f_m^*(X)\bigr) \Bigr) \Bigr] \;\le\; \mathbb{E}\bigl[G_2^\rho(X^n)\bigr] \tag{40a}$$
$$\mathbb{E}\bigl[G_2^\rho(X^n)\bigr] \;\le\; k_2(\rho) \Bigl[ \exp\Bigl( n\rho H_{\frac{1}{1+\rho}}(\tilde{X}_m) \Bigr) + \exp\Bigl( n\rho H_{\frac{1}{1+\rho}}\bigl(X \mid f_m^*(X)\bigr) \Bigr) \Bigr]. \tag{40b}$$
Consequently, by letting $n$ tend to infinity and relying on (22),
$$\max\Bigl\{ \rho H_{\frac{1}{1+\rho}}(\tilde{X}_m) - \beta^*,\; \rho H_{\frac{1}{1+\rho}}\bigl(X \mid f_m^*(X)\bigr) \Bigr\} \;\le\; E_2(X; \rho, m, f_m^*) \;\le\; \max\Bigl\{ \rho H_{\frac{1}{1+\rho}}(\tilde{X}_m),\; \rho H_{\frac{1}{1+\rho}}\bigl(X \mid f_m^*(X)\bigr) \Bigr\}. \tag{41}$$
We next simplify the above bounds by evaluating the maxima in (41) as a function of $m$ and $\rho$. To that end, we use the following lemma.
Lemma 2.
For $\alpha > 0$, let the two sequences $\{a_m(\alpha)\}$ and $\{b_m(\alpha)\}$ be given by
$$a_m(\alpha) := H_\alpha\bigl(\tilde{X}_m\bigr), \tag{42}$$
$$b_m(\alpha) := H_\alpha\bigl(X \mid f_m^*(X)\bigr), \tag{43}$$
with $m \in [1:|\mathcal{X}|]$. Then,
(a) 
The sequence $\{a_m(\alpha)\}$ is monotonically increasing (in $m$), and its first and last terms are zero and $H_\alpha(X)$, respectively.
(b) 
The sequence $\{b_m(\alpha)\}$ is monotonically decreasing (in $m$), and its first and last terms are $H_\alpha(X)$ and zero, respectively.
(c) 
If $\mathrm{supp}(P_X) = \mathcal{X}$, then $\{a_m(\alpha)\}$ is strictly monotonically increasing, and $\{b_m(\alpha)\}$ is strictly monotonically decreasing. In particular, for all $m \in [2:|\mathcal{X}|-1]$, $a_m(\alpha)$ and $b_m(\alpha)$ are positive and strictly smaller than $H_\alpha(X)$.
Proof. 
See Appendix B. □
Since symbols of probability zero (i.e., $x \in \mathcal{X}$ for which $P_X(x) = 0$) do not contribute to the expected number of guesses, assume without loss of generality that $\mathrm{supp}(P_X) = \mathcal{X}$. In view of Lemma 2, we can therefore define
$$m_\rho^* = m_\rho^*(P_X) := \min\Bigl\{ m \in [2:|\mathcal{X}|] : a_m\bigl(\tfrac{1}{1+\rho}\bigr) \ge b_m\bigl(\tfrac{1}{1+\rho}\bigr) \Bigr\}. \tag{44}$$
Using (44), we simplify (41) as follows:
(a)
If $m < m_\rho^*$, then
$$E_2(\rho, m) = \rho\, H_{\frac{1}{1+\rho}}\bigl(X \mid f_m^*(X)\bigr). \tag{45}$$
(b)
Otherwise, if $m \ge m_\rho^*$, then
$$\rho\, H_{\frac{1}{1+\rho}}(\tilde{X}_m) - \beta^* \;\le\; E_2(\rho, m) \;\le\; \rho\, H_{\frac{1}{1+\rho}}(\tilde{X}_m). \tag{46}$$
Note that, when guessing $X^n$ directly, the $\rho$-th moment of the number of guesses grows exponentially with rate $\rho\, H_{\frac{1}{1+\rho}}(X)$ (cf. (14a)). Due to Item (c) in Lemma 2,
$$H_{\frac{1}{1+\rho}}(\tilde{X}_m) = a_m\bigl(\tfrac{1}{1+\rho}\bigr) < a_{|\mathcal{X}|}\bigl(\tfrac{1}{1+\rho}\bigr) = H_{\frac{1}{1+\rho}}(X), \tag{47}$$
and, in addition, conditioning (on a dependent chance variable) strictly reduces the Rényi entropy [24]. Hence, Equations (45) and (46) imply that, for any $m \in [2:|\mathcal{X}|-1]$, guessing in two stages according to Algorithm 1 with $f = f_m^*$ reveals $X^n$ sooner (in expectation) than guessing $X^n$ directly.
We summarize our findings in this section in the second theorem below.
Theorem 2.
For a given PMF $P_X$ of support $\mathcal{X} = [1:|\mathcal{X}|]$ and $m \in [2:|\mathcal{X}|-1]$, let the function $f_m^* \in \mathcal{F}_{|\mathcal{X}|,m}$ be constructed according to Algorithm 2, and let the random variable $\tilde{X}_m$ have the PMF constructed in Algorithm 3. Let $X^n$ be i.i.d. according to $P_X$, and let $Y^n := \bigl(f_m^*(X_1), \ldots, f_m^*(X_n)\bigr)$. Finally, for $\rho > 0$, let
$$E_1(\rho) := \rho\, H_{\frac{1}{1+\rho}}(X)$$
be the optimal exponential growth rate of the $\rho$-th moment of the number of guesses in single-stage guessing of $X^n$, and let
$$E_2(\rho, m) := E_2(X; \rho, m, f_m^*)$$
be given by (23) with $f := f_m^*$. Then, the following holds:
(a) 
Cicalese et al. [26]: The maximization of the (normalized) mutual information $\tfrac{1}{n} I(X^n; Y^n)$ over all deterministic functions $f\colon \mathcal{X} \to [1:m]$ is a strongly NP-hard problem (for all $n$). However, the deterministic function $f := f_m^*$ almost achieves this maximization, up to a small additive term equal to $\beta^* := \log\frac{2}{e \ln 2} \approx 0.08607 \log 2$, and up to a multiplicative term equal to $\tfrac{10}{11}$ (see (29)–(31)).
(b) 
The ρ-th moment of the number of guesses for X n , which is required by the two-stage guessing in Algorithm 1, satisfies the non-asymptotic bounds in (40a) and (40b).
(c) 
The asymptotic exponent $E_2(\rho, m)$ satisfies (45) and (46).
(d) 
For all $\rho > 0$,
$$E_1(\rho) - E_2(\rho, m) \;\ge\; \rho \Bigl[ H_{\frac{1}{1+\rho}}(X) - \max\Bigl\{ H_{\frac{1}{1+\rho}}(\tilde{X}_m),\; H_{\frac{1}{1+\rho}}\bigl(X \mid f_m^*(X)\bigr) \Bigr\} \Bigr] \;>\; 0,$$
so there is a reduction in the exponential growth rate (as a function of $n$) of the required number of guesses for $X^n$ by Algorithm 1 (in comparison to the optimal one-stage guessing).

4. Two-Stage Guessing: Arbitrary $(X, Y)$

We next assume that $X^n$ and $Y^n$ are drawn jointly i.i.d. according to a given PMF $P_{XY}$, and we drop the requirement that Stage 1 reveal $Y^n$ prior to guessing $X^n$. Given $\rho > 0$, our goal in this section is to find the least exponential growth rate (in $n$) of the $\rho$-th moment of the total number of guesses required to recover $X^n$. Since Stage 1 may not reveal $Y^n$, we can no longer express the number of guesses using the ranking functions $g_{Y^n}(\cdot)$ and $g_{X^n|Y^n}(\cdot|\cdot)$ as in Section 3, and we need new notation to capture the event that $Y^n$ was not guessed in Stage 1. To that end, let $\mathcal{G}_n$ be a subset of $\mathcal{Y}^n$, and let the ranking function
$$\tilde{g}_{Y^n}\colon \mathcal{G}_n \to [1:|\mathcal{G}_n|]$$
denote the guessing order in Stage 1, with the understanding that if $Y^n \notin \mathcal{G}_n$, then
$$\tilde{g}_{Y^n}(Y^n) = |\mathcal{G}_n|$$
and the guesser moves on to Stage 2 knowing only that $Y^n \notin \mathcal{G}_n$. We denote the guessing order in Stage 2 by
$$\tilde{g}_{X^n|Y^n}\colon \mathcal{X}^n \times \mathcal{Y}^n \to [1:|\mathcal{X}|^n],$$
where, for every $y^n \in \mathcal{Y}^n$, $\tilde{g}_{X^n|Y^n}(\cdot|y^n)$ is a ranking function on $\mathcal{X}^n$ that satisfies
$$\tilde{g}_{X^n|Y^n}(\cdot|y^n) = \tilde{g}_{X^n|Y^n}(\cdot|\eta^n), \qquad \forall\, y^n, \eta^n \notin \mathcal{G}_n.$$
Note that, while $\tilde{g}_{Y^n}$ and $\tilde{g}_{X^n|Y^n}$ depend on $\mathcal{G}_n$, we do not make this dependence explicit. In the remainder of this section, we prove the following variational characterization of the least exponential growth rate of the $\rho$-th moment of the total number of guesses in both stages, $\tilde{g}_{Y^n}(Y^n) + \tilde{g}_{X^n|Y^n}(X^n|Y^n)$.
Theorem 3.
If $(X^n, Y^n)$ are i.i.d. according to $P_{XY}$, then, for all $\rho > 0$,
$$\lim_{n\to\infty}\; \min_{\tilde{g}_{Y^n},\, \tilde{g}_{X^n|Y^n}} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{Y^n}(Y^n) + \tilde{g}_{X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] = \sup_{Q_{XY}} \Bigl\{ \rho \min\Bigl\{ H(Q_X),\, \max\bigl\{ H(Q_Y),\, H(Q_{X|Y}) \bigr\} \Bigr\} - D(Q_{XY} \,\|\, P_{XY}) \Bigr\}, \tag{55}$$
where the supremum on the RHS of (55) is over all PMFs $Q_{XY}$ on $\mathcal{X} \times \mathcal{Y}$ (and the limit exists).
Note that if $P_{XY}$ is such that $Y = f(X)$, then the RHS of (55) is less than or equal to the RHS of (21). In other words, the guessing exponent of Theorem 3 is less than or equal to the guessing exponent of Theorem 1. This is due to the fact that guessing $Y^n$ in Stage 1 before proceeding to Stage 2 (the strategy examined in Section 3) is just one of the admissible guessing strategies of Section 4 and not necessarily the optimal one.
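For intuition, the variational formula (55) can be evaluated numerically. The sketch below (our own illustration and naming, a coarse grid search rather than the authors' code) approximates the RHS of (55) for a small alphabet by scanning joint PMFs $Q_{XY}$ on a grid.

```python
import itertools
import numpy as np

def shannon(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl(q, p):
    mask = q > 0
    return np.sum(q[mask] * np.log2(q[mask] / p[mask]))

def theorem3_rhs(p_xy, rho, grid=12):
    """Grid-search approximation of
    sup_Q { rho*min(H(Q_X), max(H(Q_Y), H(Q_{X|Y}))) - D(Q||P) }."""
    nx, ny = p_xy.shape
    best = -np.inf
    # nonnegative integer matrices summing to `grid` -> candidate types Q
    for c in itertools.product(range(grid + 1), repeat=nx * ny):
        if sum(c) != grid:
            continue
        q = np.array(c, float).reshape(nx, ny) / grid
        if np.any((q > 0) & (p_xy == 0)):
            continue                               # keep D(Q||P) finite
        q_x, q_y = q.sum(1), q.sum(0)
        h_x_given_y = shannon(q.flatten()) - shannon(q_y)
        val = (rho * min(shannon(q_x), max(shannon(q_y), h_x_given_y))
               - kl(q.flatten(), p_xy.flatten()))
        best = max(best, val)
    return best

p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])                      # example joint PMF (assumption)
print(theorem3_rhs(p_xy, rho=1.0))                 # exponent in bits per symbol
```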
We prove Theorem 3 in two parts. First, we show that the guesser can be assumed cognizant of the empirical joint type of $(X^n, Y^n)$; by invoking the law of total expectation (averaging over denominator-$n$ types $Q_{XY}$ on $\mathcal{X} \times \mathcal{Y}$), we reduce the problem to evaluating the LHS of (55) under the assumption that $(X^n, Y^n)$ is drawn uniformly at random from a type class $\mathcal{T}^{(n)}(Q_{XY})$ (instead of being i.i.d. $P_{XY}$). We conclude the proof by solving this reduced problem, showing in particular that, when $(X^n, Y^n)$ is drawn uniformly at random from a type class, the LHS of (55) can be achieved either by guessing $Y^n$ in Stage 1 or by skipping Stage 1 entirely.
We begin with the first part of the proof and show that the guesser can be assumed cognizant of the empirical joint type of $(X^n, Y^n)$; we formalize and prove this claim in Corollary 1, which we derive from Lemma 3 below.
Lemma 3.
Let $\tilde{g}_{Y^n}^*$ and $\tilde{g}_{X^n|Y^n}^*$ be ranking functions that minimize the expectation in the LHS of (55), and, likewise, let $\tilde{g}_{T;Y^n}^*$ and $\tilde{g}_{T;X^n|Y^n}^*$ be ranking functions, cognizant of the empirical joint type $\Pi_{X^nY^n}$ of $(X^n, Y^n)$, that minimize the expectation in the LHS of (55) over all ranking functions depending on $\Pi_{X^nY^n}$. Then, there exist positive constants $a$ and $k$, independent of $n$, such that
$$\mathbb{E}\Bigl[ \bigl( \tilde{g}_{Y^n}^*(Y^n) + \tilde{g}_{X^n|Y^n}^*(X^n|Y^n) \bigr)^{\rho} \Bigr] \;\le\; \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}^*(Y^n) + \tilde{g}_{T;X^n|Y^n}^*(X^n|Y^n) \bigr)^{\rho} \Bigr] \cdot k\, n^a. \tag{56}$$
Proof. 
See Appendix C. □
Corollary 1.
If the limit
$$\lim_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}^*(Y^n) + \tilde{g}_{T;X^n|Y^n}^*(X^n|Y^n) \bigr)^{\rho} \Bigr], \qquad \rho > 0,$$
exists, then so does the limit in the LHS of (55), and the two are equal.
Proof. 
By (56),
$$\begin{aligned}
&\limsup_{n\to\infty}\; \min_{\tilde{g}_{Y^n},\, \tilde{g}_{X^n|Y^n}} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{Y^n}(Y^n) + \tilde{g}_{X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \\
&\quad= \limsup_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{Y^n}^*(Y^n) + \tilde{g}_{X^n|Y^n}^*(X^n|Y^n) \bigr)^{\rho} \Bigr] \\
&\quad\le \limsup_{n\to\infty} \frac{1}{n} \log \Bigl( \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}^*(Y^n) + \tilde{g}_{T;X^n|Y^n}^*(X^n|Y^n) \bigr)^{\rho} \Bigr] \cdot k\, n^a \Bigr) \\
&\quad= \limsup_{n\to\infty} \frac{1}{n} \Bigl( \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}^*(Y^n) + \tilde{g}_{T;X^n|Y^n}^*(X^n|Y^n) \bigr)^{\rho} \Bigr] + \log k + a \log n \Bigr) \\
&\quad= \limsup_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}^*(Y^n) + \tilde{g}_{T;X^n|Y^n}^*(X^n|Y^n) \bigr)^{\rho} \Bigr] \\
&\quad= \lim_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}^*(Y^n) + \tilde{g}_{T;X^n|Y^n}^*(X^n|Y^n) \bigr)^{\rho} \Bigr].
\end{aligned}$$
The reverse inequality,
$$\liminf_{n\to\infty}\; \min_{\tilde{g}_{Y^n},\, \tilde{g}_{X^n|Y^n}} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{Y^n}(Y^n) + \tilde{g}_{X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \;\ge\; \lim_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}^*(Y^n) + \tilde{g}_{T;X^n|Y^n}^*(X^n|Y^n) \bigr)^{\rho} \Bigr],$$
follows from the fact that an optimal guessing strategy depending on the empirical joint type $\Pi_{X^nY^n}$ of $(X^n, Y^n)$ cannot be outperformed by a guessing strategy ignorant of $\Pi_{X^nY^n}$. □
Corollary 1 states that the minimization in the LHS of (55) can be taken over guessing strategies cognizant of the empirical joint type of $(X^n, Y^n)$. As we show in the next lemma, this implies that evaluating the LHS of (55) can be further simplified by taking the expectation with $(X^n, Y^n)$ drawn uniformly at random from a type class (instead of being i.i.d. $P_{XY}$).
Lemma 4.
Let $\mathbb{E}_{Q_{XY}}$ denote expectation with $(X^n, Y^n)$ drawn uniformly at random from the type class $\mathcal{T}^{(n)}(Q_{XY})$. Then, the following limits exist and
$$\lim_{n\to\infty}\; \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] = \lim_{n\to\infty}\; \max_{Q_{XY}} \Bigl\{ \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \frac{1}{n} \log \mathbb{E}_{Q_{XY}}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] - D(Q_{XY} \,\|\, P_{XY}) \Bigr\}, \tag{64}$$
where the maximum in the RHS of (64) is taken over all denominator-$n$ types on $\mathcal{X} \times \mathcal{Y}$.
Proof. 
Recall that $\Pi_{X^nY^n}$ is the empirical joint type of $(X^n, Y^n)$, and let $\mathcal{T}_n(\mathcal{X} \times \mathcal{Y})$ denote the set of all denominator-$n$ types on $\mathcal{X} \times \mathcal{Y}$. We prove Lemma 4 by applying the law of total expectation to the LHS of (64) (averaging over the events $\{\Pi_{X^nY^n} = Q_{XY}\}$, $Q_{XY} \in \mathcal{T}_n(\mathcal{X} \times \mathcal{Y})$) and by approximating the probability of observing a given type using standard tools from large deviations theory. We first show that the LHS of (64) is upper bounded by its RHS:
$$\begin{aligned}
&\lim_{n\to\infty}\; \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \\
&\quad= \lim_{n\to\infty} \frac{1}{n} \log \biggl( \sum_{Q_{XY}} \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \mathbb{E}_{Q_{XY}}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \, \mathbb{P}\bigl[\Pi_{X^nY^n} = Q_{XY}\bigr] \biggr) \tag{65} \\
&\quad\le \lim_{n\to\infty} \frac{1}{n} \log \biggl( \max_{Q_{XY}} \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \mathbb{E}_{Q_{XY}}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \, \mathbb{P}\bigl[\Pi_{X^nY^n} = Q_{XY}\bigr] \cdot n^{\alpha} \biggr) \tag{66} \\
&\quad= \lim_{n\to\infty} \frac{1}{n} \log \biggl( \max_{Q_{XY}} \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \mathbb{E}_{Q_{XY}}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \, \mathbb{P}\bigl[\Pi_{X^nY^n} = Q_{XY}\bigr] \biggr) \tag{67} \\
&\quad\le \lim_{n\to\infty} \frac{1}{n} \log \biggl( \max_{Q_{XY}} \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \mathbb{E}_{Q_{XY}}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \, 2^{-n D(Q_{XY} \| P_{XY})} \biggr), \tag{68}
\end{aligned}$$
where (66) holds for sufficiently large $\alpha$ because the number of types grows polynomially in $n$ (see Appendix C), and (68) follows from [20] (Theorem 11.1.4). We next show that the LHS of (64) is also lower bounded by its RHS:
$$\begin{aligned}
&\lim_{n\to\infty}\; \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \\
&\quad= \lim_{n\to\infty} \frac{1}{n} \log \biggl( \sum_{Q_{XY}} \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \mathbb{E}_{Q_{XY}}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \, \mathbb{P}\bigl[\Pi_{X^nY^n} = Q_{XY}\bigr] \biggr) \tag{69} \\
&\quad\ge \lim_{n\to\infty} \frac{1}{n} \log \biggl( \max_{Q_{XY}} \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \mathbb{E}_{Q_{XY}}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \, \mathbb{P}\bigl[\Pi_{X^nY^n} = Q_{XY}\bigr] \biggr) \tag{70} \\
&\quad\ge \lim_{n\to\infty} \frac{1}{n} \log \biggl( \max_{Q_{XY}} \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \mathbb{E}_{Q_{XY}}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \, 2^{-n [D(Q_{XY} \| P_{XY}) + \delta_n]} \biggr) \tag{71} \\
&\quad= \lim_{n\to\infty} \frac{1}{n} \log \biggl( \max_{Q_{XY}} \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \mathbb{E}_{Q_{XY}}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \, 2^{-n D(Q_{XY} \| P_{XY})} \biggr), \tag{72}
\end{aligned}$$
where, in (71),
$$\delta_n := \bigl( |\mathcal{X}||\mathcal{Y}| - 1 \bigr) \frac{\log(n+1)}{n} \tag{73}$$
tends to zero as we let $n$ tend to infinity, and the inequality in (71) follows again from [20] (Theorem 11.1.4). Together, (68) and (71) imply the equality in (64). □
We have now established the first part of the proof of Theorem 3: in Corollary 1, we showed that the ranking functions $\tilde{g}_{Y^n}$ and $\tilde{g}_{X^n|Y^n}$ can be assumed cognizant of the empirical joint type of $(X^n, Y^n)$, and in Lemma 4, we showed that, under this assumption, the minimization of the $\rho$-th moment of the total number of guesses can be carried out with $(X^n, Y^n)$ drawn uniformly at random from a type class.
Below, we give the second part of the proof: we show that if the pair $(X^n, Y^n)$ is drawn uniformly at random from $\mathcal{T}^{(n)}(Q_{XY})$, then
$$\lim_{n\to\infty}\; \min_{\tilde{g}_{Y^n},\, \tilde{g}_{X^n|Y^n}} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{Y^n}(Y^n) + \tilde{g}_{X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] = \rho \min\Bigl\{ H(Q_X),\; \max\bigl\{ H(Q_Y),\, H(Q_{X|Y}) \bigr\} \Bigr\}, \qquad \rho > 0. \tag{74}$$
Note that Corollary 1 and Lemma 4, in conjunction with (74), conclude the proof of Theorem 3:
$$\begin{aligned}
&\lim_{n\to\infty}\; \min_{\tilde{g}_{Y^n},\, \tilde{g}_{X^n|Y^n}} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{Y^n}(Y^n) + \tilde{g}_{X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \\
&\quad= \lim_{n\to\infty}\; \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \tag{75} \\
&\quad= \lim_{n\to\infty}\; \max_{Q_{XY}} \Bigl\{ \min_{\tilde{g}_{T;Y^n},\, \tilde{g}_{T;X^n|Y^n}} \frac{1}{n} \log \mathbb{E}_{Q_{XY}}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] - D(Q_{XY} \,\|\, P_{XY}) \Bigr\} \tag{76} \\
&\quad= \sup_{Q_{XY}} \Bigl\{ \rho \min\Bigl\{ H(Q_X),\; \max\bigl\{ H(Q_Y),\, H(Q_{X|Y}) \bigr\} \Bigr\} - D(Q_{XY} \,\|\, P_{XY}) \Bigr\}, \tag{77}
\end{aligned}$$
where the supremum in the RHS of (77) is taken over all PMFs $Q_{XY}$ on $\mathcal{X} \times \mathcal{Y}$, and the step follows from (76) because the set of types is dense in the set of all PMFs (in the same sense that $\mathbb{Q}$ is dense in $\mathbb{R}$).
It thus remains to prove (74). We begin with the direct part. Note that, when $(X^n, Y^n)$ is drawn uniformly at random from $\mathcal{T}^{(n)}(Q_{XY})$, the $\rho$-th moment of the total number of guesses grows exponentially with rate $\rho H(Q_X)$ if we skip Stage 1 and guess $X^n$ directly, and with rate $\rho \max\{H(Q_Y), H(Q_{X|Y})\}$ if we guess $Y^n$ in Stage 1 before moving on to guessing $X^n$. To prove the second claim, we argue by case distinction on $\rho$. Assuming first that $\rho \le 1$,
$$\begin{aligned}
\lim_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr]
&\le \lim_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ \tilde{g}_{T;Y^n}^{\rho}(Y^n) + \tilde{g}_{T;X^n|Y^n}^{\rho}(X^n|Y^n) \Bigr] \tag{78} \\
&= \lim_{n\to\infty} \frac{1}{n} \log \Bigl( \mathbb{E}\bigl[ \tilde{g}_{T;Y^n}^{\rho}(Y^n) \bigr] + \mathbb{E}\bigl[ \tilde{g}_{T;X^n|Y^n}^{\rho}(X^n|Y^n) \bigr] \Bigr) \tag{79} \\
&\le \lim_{n\to\infty} \frac{1}{n} \log \Bigl( 2^{n\rho H(Q_Y)} + 2^{n\rho H(Q_{X|Y})} \Bigr) \tag{80} \\
&= \rho \max\bigl\{ H(Q_Y),\, H(Q_{X|Y}) \bigr\}, \tag{81}
\end{aligned}$$
where (78) holds because $\rho \le 1$ (see Lemma 1); (80) holds since, by the latter assumption, $Y^n$ is revealed at the end of Stage 1, and thus the guesser (cognizant of $Q_{XY}$) will only guess elements of the conditional type class $\mathcal{T}^{(n)}(Q_{X|Y} \mid Y^n)$; and (81) follows from the fact that the exponential growth rate of a sum of two exponentials is dominated by the larger one. We wrap up the argument by showing that the LHS of (78) is also lower bounded by the RHS of (81):
$$\begin{aligned}
\lim_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr]
&= \lim_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ 2^{\rho} \Bigl( \tfrac{1}{2}\, \tilde{g}_{T;Y^n}(Y^n) + \tfrac{1}{2}\, \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \Bigr)^{\!\rho} \Bigr] \tag{82} \\
&\ge \lim_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ 2^{\rho-1} \Bigl( \tilde{g}_{T;Y^n}^{\rho}(Y^n) + \tilde{g}_{T;X^n|Y^n}^{\rho}(X^n|Y^n) \Bigr) \Bigr] \tag{83} \\
&= \lim_{n\to\infty} \frac{1}{n} \log \Bigl( \mathbb{E}\bigl[ \tilde{g}_{T;Y^n}^{\rho}(Y^n) \bigr] + \mathbb{E}\bigl[ \tilde{g}_{T;X^n|Y^n}^{\rho}(X^n|Y^n) \bigr] \Bigr) \tag{84} \\
&\ge \lim_{n\to\infty} \frac{1}{n} \log \Bigl( \tfrac{1}{1+\rho} \Bigl( 2^{n\rho H(Q_Y)} + 2^{n\rho H(Q_{X|Y})} \Bigr) 2^{-n\rho\delta_n} \Bigr) \tag{85} \\
&= \rho \max\bigl\{ H(Q_Y),\, H(Q_{X|Y}) \bigr\}, \tag{86}
\end{aligned}$$
where (83) follows from Jensen’s inequality, and (85) follows from [29] (Proposition 6.6) and the lower bound on the size of a (conditional) type class [20] (Theorem 11.1.3). The case $\rho > 1$ can be proven analogously and is hence omitted (with the term $2^{\rho-1}$ in the RHS of (83) replaced by one). Note that, by applying the better of the two proposed guessing strategies (i.e., depending on $Q_{XY}$, either guessing $Y^n$ in Stage 1 or skipping it), the $\rho$-th moment of the total number of guesses grows exponentially with rate $\rho \min\bigl\{ H(Q_X),\; \max\{ H(Q_Y),\, H(Q_{X|Y}) \} \bigr\}$. This concludes the direct part of the proof of (74). We remind the reader that, while we have constructed a guessing strategy under the assumption that the empirical joint type $\Pi_{X^nY^n}$ of $(X^n, Y^n)$ is known, Lemma 3 implies the existence of a guessing strategy of equal asymptotic performance that does not depend on $\Pi_{X^nY^n}$. Moreover, Lemma 3 is constructive in the sense that the type-independent guessing strategy can be explicitly derived from the type-cognizant one (cf. the proof of Proposition 6.6 in [29]).
We next establish the converse of (74) by showing that, when $(X^n, Y^n)$ is drawn uniformly at random from $\mathcal{T}^{(n)}(Q_{XY})$,
$$\liminf_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \;\ge\; \rho \min\Bigl\{ H(Q_X),\; \max\bigl\{ H(Q_Y),\, H(Q_{X|Y}) \bigr\} \Bigr\} \tag{87}$$
for all two-stage guessing strategies. To see why (87) holds, consider an arbitrary guessing strategy, and let the sequence $n_1, n_2, \ldots$ be such that
$$\lim_{k\to\infty} \frac{1}{n_k} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^{n_k}}(Y^{n_k}) + \tilde{g}_{T;X^{n_k}|Y^{n_k}}(X^{n_k}|Y^{n_k}) \bigr)^{\rho} \Bigr] = \liminf_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr] \tag{88}$$
and such that the limit
$$\alpha := \lim_{k\to\infty} \frac{|\mathcal{G}_{n_k}|}{\bigl|\mathcal{T}^{(n_k)}(Q_Y)\bigr|} \tag{89}$$
exists. Using Lemma 5 below, we show that the LHS of (88) (and thus also the LHS of (87)) is lower bounded by $\rho H(Q_X)$ if $\alpha = 0$, and by $\rho \max\{H(Q_Y), H(Q_{X|Y})\}$ if $\alpha > 0$. This establishes the converse, because the lower of the two bounds must apply in any case.
Lemma 5.
If the pair $(X^n, Y^n)$ is drawn uniformly at random from a type class $\mathcal{T}^{(n)}(Q_{XY})$ and
$$\limsup_{n\to\infty} \frac{|\mathcal{G}_n|}{\bigl|\mathcal{T}^{(n)}(Q_Y)\bigr|} = 0, \tag{90}$$
then
$$\lim_{n\to\infty} \frac{1}{n} \log \mathbb{E}\bigl[ \tilde{g}_{X^n|Y^n}(X^n|Y^n)^{\rho} \bigr] = \rho H(Q_X), \tag{91}$$
where $Q_X$ and $Q_Y$ denote the $X$- and $Y$-marginals of $Q_{XY}$.
Proof. 
Note that, since
$$\lim_{n\to\infty} \frac{1}{n} \log \bigl| \mathcal{T}^{(n)}(Q_X) \bigr| = H(Q_X), \tag{92}$$
the LHS of (91) is trivially upper bounded by $\rho H(Q_X)$. It thus suffices to show that (90) yields the lower bound
$$\liminf_{n\to\infty} \frac{1}{n} \log \mathbb{E}\bigl[ \tilde{g}_{X^n|Y^n}(X^n|Y^n)^{\rho} \bigr] \ge \rho H(Q_X). \tag{93}$$
To show that (90) yields (93), we define the indicator variable
$$E_n := \begin{cases} 0, & \text{if } Y^n \in \mathcal{G}_n, \\ 1, & \text{else}, \end{cases}$$
and observe that, due to (90) and the fact that $Y^n$ is drawn uniformly at random from $\mathcal{T}^{(n)}(Q_Y)$,
$$\lim_{n\to\infty} \mathbb{P}[E_n = 1] = 1. \tag{95}$$
Consequently, $H(E_n)$ tends to zero as $n$ tends to infinity, and because
$$H(X^n) - H(X^n \mid E_n) = I(X^n; E_n) \le H(E_n),$$
we get
$$\lim_{n\to\infty} \Bigl( H(X^n) - H(X^n \mid E_n) \Bigr) = 0.$$
This and (95) imply that
$$\lim_{n\to\infty} \frac{1}{n} H(X^n) = \lim_{n\to\infty} \frac{1}{n} H(X^n \mid E_n = 1). \tag{98}$$
To conclude the proof of Lemma 5, we proceed as follows:
$$\begin{aligned}
\liminf_{n\to\infty} \frac{1}{n} \log \mathbb{E}\bigl[ \tilde{g}_{X^n|Y^n}(X^n|Y^n)^{\rho} \bigr]
&\ge \liminf_{n\to\infty} \frac{1}{n} \log \mathbb{E}\bigl[ \tilde{g}_{X^n|Y^n}(X^n|Y^n)^{\rho} \,\big|\, E_n = 1 \bigr] \tag{99} \\
&\ge \liminf_{n\to\infty} \frac{1}{n}\, \rho\, H_{1/(1+\rho)}(X^n \mid E_n = 1) \tag{100} \\
&\ge \liminf_{n\to\infty} \frac{1}{n}\, \rho\, H(X^n \mid E_n = 1) \tag{101} \\
&= \liminf_{n\to\infty} \frac{1}{n}\, \rho\, H(X^n) \tag{102} \\
&= \rho H(Q_X), \tag{103}
\end{aligned}$$
where (99) holds due to (95) and the law of total expectation; (100) follows from [3] (Theorem 1); (101) holds because the Rényi entropy is monotonically decreasing in its order and because $\rho > 0$; and (102) is due to (98). □
We now conclude the proof of the converse part of (74). Assume first that $\alpha$ (as defined in (89)) equals zero. By (88) and Lemma 5,
$$\begin{aligned}
\liminf_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr]
&= \lim_{k\to\infty} \frac{1}{n_k} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^{n_k}}(Y^{n_k}) + \tilde{g}_{T;X^{n_k}|Y^{n_k}}(X^{n_k}|Y^{n_k}) \bigr)^{\rho} \Bigr] \\
&\ge \liminf_{k\to\infty} \frac{1}{n_k} \log \mathbb{E}\Bigl[ \tilde{g}_{T;X^{n_k}|Y^{n_k}}(X^{n_k}|Y^{n_k})^{\rho} \Bigr] \\
&= \rho H(Q_X),
\end{aligned}$$
establishing the first contribution to the RHS of (87). Next, let $\alpha > 0$. Applying [29] (Proposition 6.6) in conjunction with (89) and the fact that $Y^{n_k}$ is drawn uniformly at random from $\mathcal{T}^{(n_k)}(Q_Y)$,
$$\mathbb{E}\bigl[ \tilde{g}_{T;Y^{n_k}}(Y^{n_k})^{\rho} \bigr] \;\ge\; \frac{\alpha}{2} \cdot \frac{2^{n_k \rho H(Q_Y)}\, 2^{-n_k \delta_{n_k}}}{1+\rho} \tag{107}$$
for all sufficiently large $k$. Using (107) and proceeding analogously as in (82) to (85), we now establish the second contribution to the RHS of (87):
$$\begin{aligned}
\liminf_{n\to\infty} \frac{1}{n} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^n}(Y^n) + \tilde{g}_{T;X^n|Y^n}(X^n|Y^n) \bigr)^{\rho} \Bigr]
&= \lim_{k\to\infty} \frac{1}{n_k} \log \mathbb{E}\Bigl[ \bigl( \tilde{g}_{T;Y^{n_k}}(Y^{n_k}) + \tilde{g}_{T;X^{n_k}|Y^{n_k}}(X^{n_k}|Y^{n_k}) \bigr)^{\rho} \Bigr] \tag{108} \\
&\ge \liminf_{k\to\infty} \frac{1}{n_k} \log \Bigl( \mathbb{E}\bigl[ \tilde{g}_{T;Y^{n_k}}(Y^{n_k})^{\rho} \bigr] + \mathbb{E}\bigl[ \tilde{g}_{T;X^{n_k}|Y^{n_k}}(X^{n_k}|Y^{n_k})^{\rho} \bigr] \Bigr) \tag{109} \\
&\ge \liminf_{k\to\infty} \frac{1}{n_k} \log \biggl( \frac{\alpha}{2} \cdot \frac{2^{n_k \rho H(Q_Y)}\, 2^{-n_k \delta_{n_k}}}{1+\rho} + \mathbb{E}\bigl[ \tilde{g}_{T;X^{n_k}|Y^{n_k}}(X^{n_k}|Y^{n_k})^{\rho} \bigr] \biggr) \tag{110} \\
&\ge \liminf_{k\to\infty} \frac{1}{n_k} \log \biggl( \frac{\alpha}{2} \cdot \frac{2^{n_k \rho H(Q_Y)}\, 2^{-n_k \delta_{n_k}}}{1+\rho} + \frac{2^{n_k \rho H(Q_{X|Y})}\, 2^{-n_k \delta_{n_k}}}{1+\rho} \biggr) \tag{111} \\
&= \rho \max\bigl\{ H(Q_Y),\, H(Q_{X|Y}) \bigr\}, \tag{112}
\end{aligned}$$
where (110) is due to (107), and in (111) we granted the guesser access to $Y^n$ at the beginning of Stage 2.

5. Summary and Outlook

We proposed a new variation on the Massey–Arikan guessing problem where, instead of guessing $X^n$ directly, the guesser is allowed to first produce guesses of a correlated ancillary sequence $Y^n$. We characterized the least achievable exponential growth rate (in $n$) of the $\rho$-th moment of the total number of guesses in the two stages when $X^n$ is i.i.d. according to $P_X$, $Y_i = f(X_i)$ for all $i \in [1:n]$, and the guesser must recover $Y^n$ in Stage 1 before proceeding to Stage 2 (Section 3, Theorems 1 and 2); and when the pair $(X^n, Y^n)$ is jointly i.i.d. according to $P_{XY}$ and Stage 1 need not reveal $Y^n$ (Section 4, Theorem 3). Future directions of this work include:
1.
The generalization of our results to a larger class of sources (e.g., Markov sources);
2.
A study of the information-like properties of the guessing exponents (1) and (2);
3.
Finding the optimal block-wise description Y n = f ( X n ) and its associated two-stage guessing exponent;
4.
The generalization of the cryptographic problems [8,9] to a setting where the adversary may also produce guesses of leaked side information.

Author Contributions

Both authors contributed equally to this work (in its various stages which include the conceptualization, analysis, writing and proofreading). Both authors read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing not applicable.

Acknowledgments

The authors are indebted to Amos Lapidoth for his contribution to Section 4 (see [4]). The constructive comments in the review process, which helped to improve the presentation, are gratefully acknowledged.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemma 1

If $\rho \ge 1$, then
$$\left( \sum_{i=1}^k a_i \right)^{\!\rho} = k^{\rho} \left( \frac{1}{k} \sum_{i=1}^k a_i \right)^{\!\rho} \le k^{\rho} \cdot \frac{1}{k} \sum_{i=1}^k a_i^{\rho} = k^{\rho-1} \sum_{i=1}^k a_i^{\rho}, \tag{A1}$$
where the inequality in (A1) holds by Jensen’s inequality, since the mapping $x \mapsto x^{\rho}$ is convex for $x \ge 0$. If at least one of the non-negative $a_i$’s is positive (the case where all $a_i$’s are zero is trivial), then
$$\sum_{i=1}^k a_i^{\rho} = \left( \sum_{j=1}^k a_j \right)^{\!\rho} \sum_{i=1}^k \left( \frac{a_i}{\sum_{j=1}^k a_j} \right)^{\!\rho} \le \left( \sum_{j=1}^k a_j \right)^{\!\rho} \sum_{i=1}^k \frac{a_i}{\sum_{j=1}^k a_j} = \left( \sum_{i=1}^k a_i \right)^{\!\rho}, \tag{A2}$$
where the inequality in (A2) holds since $0 \le \frac{a_i}{\sum_{j=1}^k a_j} \le 1$ for all $i \in [1:k]$ and $\rho \ge 1$. If $\rho \in (0,1)$, then the inequalities in (A1) and (A2) are reversed. The conditions for equality in (15) are easily verified.

Appendix B. Proof of Lemma 2

From (33), $\mathcal{R}_1(P_X)$ is a unit probability mass at one and $\mathcal{R}_{|\mathcal{X}|}(P_X) \equiv P_X$, so (42) gives that
$$a_1(\alpha) = 0, \qquad a_{|\mathcal{X}|}(\alpha) = H_\alpha(X).$$
In view of Lemma 5 in [14], it follows that, for all $m \in [1:|\mathcal{X}|-1]$,
$$a_{m+1}(\alpha) = \max_{Q \in \mathcal{P}_{m+1}:\; P_X \prec Q} H_\alpha(Q) \tag{B2}$$
$$\ge \max_{Q \in \mathcal{P}_m:\; P_X \prec Q} H_\alpha(Q) \tag{B3}$$
$$= a_m(\alpha), \tag{B4}$$
where (B2) and (B4) are due to [14] ((38) and (42)), and (B3) holds since $\mathcal{P}_m \subseteq \mathcal{P}_{m+1}$.
We next prove Item (b). Consider the sequence of functions $\{f_m^*\}_{m=1}^{|\mathcal{X}|}$, defined over the set $\mathcal{X}$. By construction (see Algorithm 2), $f_{|\mathcal{X}|}^*$ is the identity function, since in this case all of the respective $|\mathcal{X}|$ nodes in the Huffman algorithm stay unchanged. We also have $f_1^*(x) = 1$ for all $x \in \mathcal{X}$ (in the latter case, by Algorithm 2, all nodes are merged by the Huffman algorithm into a single node). Hence, from (43),
$$b_1(\alpha) = H_\alpha(X), \qquad b_{|\mathcal{X}|}(\alpha) = 0.$$
Consider the construction of the function $f_m^*$ by Algorithm 2. Since the transition from $m+1$ to $m$ nodes is obtained by merging two nodes without affecting the other $m-1$ nodes, it follows from the data processing theorem for the Arimoto–Rényi conditional entropy (see [24] (Theorem 2 and Corollary 1)) that, for all $m \in [1:|\mathcal{X}|-1]$,
$$b_{m+1}(\alpha) = H_\alpha\bigl(X \mid f_{m+1}^*(X)\bigr) \le H_\alpha\bigl(X \mid f_m^*(X)\bigr) = b_m(\alpha). \tag{B6}$$
We finally prove Item (c). Suppose that $P_X$ is supported on the set $\mathcal{X}$. Under this assumption, it follows from the strict Schur concavity of the Rényi entropy that the inequality in (B3) is strict, and therefore (B2)–(B4) imply that $a_m(\alpha) < a_{m+1}(\alpha)$ for all $m \in [1:|\mathcal{X}|-1]$. In particular, Item (a) implies that $0 < a_m(\alpha) < H_\alpha(X)$ holds for every $m \in [2:|\mathcal{X}|-1]$. Furthermore, the conditioning on $f_{m+1}^*(X)$ enables distinguishing between the two labels of $\mathcal{X}$ that correspond to the pair of nodes being merged (by the Huffman algorithm) in the transition from $f_{m+1}^*(X)$ to $f_m^*(X)$. Hence, the inequality in (B6) is strict under the assumption that $P_X$ is supported on the set $\mathcal{X}$. In particular, under that assumption, it follows from Item (b) that $0 < b_m(\alpha) < H_\alpha(X)$ holds for every $m \in [2:|\mathcal{X}|-1]$.

Appendix C. Proof of Lemma 3

We prove Lemma 3 as a consequence of Corollary C1 below and the fact that the number of denominator-$n$ types on a finite set grows polynomially in $n$ ([20], Theorem 11.1.1).
Corollary C1
(Moser [29], (6.47) and Corollary 6.10). Let the random triple $(U, V, W)$ take values in the finite set $\mathcal{U} \times \mathcal{V} \times \mathcal{W}$, and let $\tilde{g}_U^*(\cdot)$, $\tilde{g}_{U|V}^*(\cdot|\cdot)$ and $\tilde{g}_{U|W}^*(\cdot|\cdot)$, $\tilde{g}_{U|V,W}^*(\cdot|\cdot,\cdot)$ be ranking functions that, for a given $\rho > 0$, minimize
$$\mathbb{E}\Bigl[ \bigl( \tilde{g}_U(U) + \tilde{g}_{U|V}(U|V) \bigr)^{\rho} \Bigr]$$
over all two-stage guessing strategies (with no access to $W$), and
$$\mathbb{E}\Bigl[ \bigl( \tilde{g}_{U|W}(U|W) + \tilde{g}_{U|V,W}(U|V,W) \bigr)^{\rho} \Bigr]$$
over all two-stage guessing strategies cognizant of $W$, respectively. Then,
$$\mathbb{E}\Bigl[ \bigl( \tilde{g}_U^*(U) + \tilde{g}_{U|V}^*(U|V) \bigr)^{\rho} \Bigr] \;\le\; \mathbb{E}\Bigl[ \bigl( \tilde{g}_{U|W}^*(U|W) + \tilde{g}_{U|V,W}^*(U|V,W) \bigr)^{\rho} \Bigr] \cdot |\mathcal{W}|^{\rho}.$$
Lemma 3 follows from Corollary C1 with $U \leftarrow X^n$, $\mathcal{U} \leftarrow \mathcal{X}^n$; $V \leftarrow Y^n$, $\mathcal{V} \leftarrow \mathcal{Y}^n$; $W \leftarrow \Pi_{X^nY^n}$, $\mathcal{W} \leftarrow \mathcal{T}_n(\mathcal{X} \times \mathcal{Y})$, and by noticing that, for all $n \in \mathbb{N}$,
$$\bigl| \mathcal{T}_n(\mathcal{X} \times \mathcal{Y}) \bigr|^{\rho} \le (n+1)^{\rho(|\mathcal{X} \times \mathcal{Y}| - 1)} \le k\, n^a,$$
where $a := \rho\bigl(|\mathcal{X} \times \mathcal{Y}| - 1\bigr)$ and $k := 2^a$ are positive constants independent of $n$.

References

  1. Massey, J.L. Guessing and entropy. In Proceedings of the 1994 IEEE International Symposium on Information Theory, Trondheim, Norway, 27 June–1 July 1994; p. 204.
  2. McEliece, R.J.; Yu, Z. An inequality on entropy. In Proceedings of the 1995 IEEE International Symposium on Information Theory, Whistler, BC, Canada, 17–22 September 1995; p. 329.
  3. Arikan, E. An inequality on guessing and its application to sequential decoding. IEEE Trans. Inf. Theory 1996, 42, 99–105.
  4. Graczyk, R.; Lapidoth, A. Two-stage guessing. In Proceedings of the 2019 IEEE International Symposium on Information Theory, Paris, France, 7–12 July 2019; pp. 475–479.
  5. Arikan, E.; Merhav, N. Joint source-channel coding and guessing with application to sequential decoding. IEEE Trans. Inf. Theory 1998, 44, 1756–1769.
  6. Boztaş, S. Comments on “An inequality on guessing and its application to sequential decoding”. IEEE Trans. Inf. Theory 1997, 43, 2062–2063.
  7. Cachin, C. Entropy Measures and Unconditional Security in Cryptography. Ph.D. Thesis, ETH Zurich, Zurich, Switzerland, 1997.
  8. Merhav, N.; Arikan, E. The Shannon cipher system with a guessing wiretapper. IEEE Trans. Inf. Theory 1999, 45, 1860–1866.
  9. Bracher, A.; Hof, E.; Lapidoth, A. Guessing attacks on distributed-storage systems. IEEE Trans. Inf. Theory 2019, 65, 6975–6998.
  10. Arikan, E.; Merhav, N. Guessing subject to distortion. IEEE Trans. Inf. Theory 1998, 44, 1041–1056.
  11. Bracher, A.; Lapidoth, A.; Pfister, C. Distributed task encoding. In Proceedings of the 2017 IEEE International Symposium on Information Theory, Aachen, Germany, 25–30 June 2017; pp. 1993–1997.
  12. Bracher, A.; Lapidoth, A.; Pfister, C. Guessing with distributed encoders. Entropy 2019, 21, 298.
  13. Christiansen, M.M.; Duffy, K.R. Guesswork, large deviations, and Shannon entropy. IEEE Trans. Inf. Theory 2013, 59, 796–802.
  14. Sason, I. Tight bounds on the Rényi entropy via majorization with applications to guessing and compression. Entropy 2018, 20, 896.
  15. Sason, I.; Verdú, S. Improved bounds on lossless source coding and guessing moments via Rényi measures. IEEE Trans. Inf. Theory 2018, 64, 4323–4346.
  16. Sundaresan, R. Guessing under source uncertainty. IEEE Trans. Inf. Theory 2007, 53, 269–287.
  17. Merhav, N.; Cohen, A. Universal randomized guessing with application to asynchronous decentralized brute-force attacks. IEEE Trans. Inf. Theory 2020, 66, 114–129.
  18. Salamatian, S.; Beirami, A.; Cohen, A.; Médard, M. Centralized versus decentralized multi-agent guesswork. In Proceedings of the 2017 IEEE International Symposium on Information Theory, Aachen, Germany, 25–30 June 2017; pp. 2263–2267.
  19. Graczyk, R.; Lapidoth, A. Gray–Wyner and Slepian–Wolf guessing. In Proceedings of the 2020 IEEE International Symposium on Information Theory, Los Angeles, CA, USA, 21–26 June 2020; pp. 2207–2211.
  20. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2006.
  21. Rényi, A. On measures of entropy and information. In Proceedings of the 4th Berkeley Symposium on Probability Theory and Mathematical Statistics, Berkeley, CA, USA, 8–9 August 1961; pp. 547–561.
  22. Marshall, A.W.; Olkin, I.; Arnold, B.C. Inequalities: Theory of Majorization and Its Applications, 2nd ed.; Springer: New York, NY, USA, 2011.
  23. Arimoto, S. Information measures and capacity of order α for discrete memoryless channels. In Proceedings of the 2nd Colloquium on Information Theory, Keszthely, Hungary, 25–30 August 1975; Csiszár, I., Elias, P., Eds.; Colloquia Mathematica Societatis János Bolyai: Amsterdam, The Netherlands, 1977; Volume 16, pp. 41–52.
  24. Fehr, S.; Berens, S. On the conditional Rényi entropy. IEEE Trans. Inf. Theory 2014, 60, 6801–6810.
  25. Sason, I.; Verdú, S. Arimoto–Rényi conditional entropy and Bayesian M-ary hypothesis testing. IEEE Trans. Inf. Theory 2018, 64, 4–25.
  26. Cicalese, F.; Gargano, L.; Vaccaro, U. Bounds on the entropy of a function of a random variable and their applications. IEEE Trans. Inf. Theory 2018, 64, 2220–2230.
  27. Garey, M.R.; Johnson, D.S. Computers and Intractability: A Guide to the Theory of NP-Completeness; W. H. Freeman and Company: New York, NY, USA, 1979.
  28. Cicalese, F.; Gargano, L.; Vaccaro, U. An information theoretic approach to probability mass function truncation. In Proceedings of the 2019 IEEE International Symposium on Information Theory, Paris, France, 7–12 July 2019; pp. 702–706.
  29. Moser, S.M. Advanced Topics in Information Theory (Lecture Notes), 4th ed.; Signal and Information Processing Laboratory, ETH Zürich: Zürich, Switzerland; Institute of Communications Engineering, National Chiao Tung University: Hsinchu, Taiwan, 2020.
Figure 1. A plot of $v\colon (0,\infty) \to (0, \log 2)$, which is monotonically increasing, continuous, and satisfies (37).