Article

Optimizing Age Penalty in Time-Varying Networks with Markovian and Error-Prone Channel State

Yuchao Chen, Haoyue Tang, Jintao Wang and Jian Song
1 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
2 Beijing National Research Center for Information Science and Technology (BNRist), Beijing 100084, China
3 Research Institute of Tsinghua University in Shenzhen, Shenzhen 518057, China
* Author to whom correspondence should be addressed.
Submission received: 20 November 2020 / Revised: 31 December 2020 / Accepted: 8 January 2021 / Published: 10 January 2021
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract: In this paper, we consider a scenario where a base station (BS) collects time-sensitive data from multiple sensors through time-varying and error-prone channels. We characterize the data freshness at the terminal end through a class of monotone increasing functions of the Age of Information (AoI). Our goal is to design an optimal policy to minimize the average age penalty of all sensors over an infinite horizon under bandwidth and power constraints. By formulating the scheduling problem as a constrained Markov decision process (CMDP), we reveal the threshold structure of the optimal policy and approximate the optimal decision by solving a truncated linear programming (LP) problem. Finally, a bandwidth-truncated policy is proposed to satisfy both the power and bandwidth constraints. Through theoretical analysis and numerical simulations, we prove that the proposed policy is asymptotically optimal in the large-sensor regime.

1. Introduction

The requirements for data freshness in numerous emerging applications are becoming stricter [1,2]. However, limited resources and bandwidth, together with fading and error-prone channel characteristics, prevent the control terminal from obtaining the newest information. Moreover, traditional optimization goals such as low delay and high throughput cannot fully characterize the requirement for data freshness. Therefore, it is necessary to introduce new metrics that capture data freshness in such systems and to design strategies that optimize system performance in the presence of resource and environment restrictions.
Recently, a popular metric, the Age of Information (AoI), was proposed in [3] to measure data freshness. Since then, optimizing age performance in different systems has been a research hotspot. The simple point-to-point system model has been studied in [3,4,5,6,7,8,9,10,11]. When update packets are generated by external sources and queued in a buffer before transmission, queuing theory can be used to analyze the performance of such systems; see, e.g., [3,4,5,6,7,8]. In [3], it is shown that the optimal packet generation rate of a first-come-first-served (FCFS) system should achieve a trade-off between throughput and delay. In [8], dynamic pricing is used as an incentive to balance the AoI evolution and the monetary payments to the users. Other studies [9,10,11] consider the generate-at-will system without a queue. Energy constraints are studied in [10,11] to find the trade-off between age performance and energy consumption. In [11], both offline and online heuristic policies are proposed to optimize the average AoI, and both outperform the greedy approach.
Apart from point-to-point systems, scheduling strategies in multi-user networks are studied in [12,13,14,15,16,17,18,19]. Different scheduling policies are studied in [12] to minimize the average AoI over unreliable channels, and Maatouk et al. verify the asymptotic optimality of Whittle's index policy by setting an upper bound on the maximum AoI [13]. When each user has a minimum throughput requirement, the authors of [14] propose a Drift-Plus-Penalty policy based on Lyapunov analysis to minimize the average AoI. In [17], a slotted ALOHA algorithm is proposed to minimize the average AoI, whose performance is verified in the large-system regime.
Notice that in many general scenarios, such as remote estimation [20,21], the proper evaluation of data freshness is a function of the AoI rather than the AoI itself. Therefore, the Cost of Update Delay (CoUD) metric in [7] and the age penalty function in [9] have been proposed to measure data freshness in a general setting. However, those works focus on optimizing data freshness in a single-user network, and the multi-user model is rarely considered. To fill this gap, we study a scenario where the base station (BS) collects time-sensitive data from multiple sensors through time-varying channels. We generalize our previous work [22] by considering a more realistic time-varying channel with packet loss and a more general age penalty measure to model different application scenarios. The main contributions of this paper are as follows.
  • We study the scheduling strategy for age penalty minimization in multi-sensor bandwidth-constrained networks with time-varying and error-prone channel links and power-limited sensors. To study a practical network, we model each channel as a finite-state ergodic Markov chain, where the packet loss probability and power consumption depend on the current channel state. Unlike previous work, we model the effect of data staleness in different scenarios via a class of monotone increasing functions of the AoI.
  • By relaxing the hard bandwidth constraint and introducing Lagrange multipliers, we decouple the multi-sensor optimization problem into several single-sensor constrained Markov decision process (CMDP) problems. To deal with the potentially infinite age penalty, we deduce the threshold structure of the optimal policy and then obtain the approximately optimal single-sensor scheduling decision by solving a truncated linear programming (LP) problem. We prove that the solution to the LP is asymptotically optimal when the truncation threshold is sufficiently large.
  • The sub-gradient ascent method is applied to find the optimal Lagrange multiplier that satisfies the relaxed bandwidth constraint. Finally, we propose a truncated stationary policy to meet the hard bandwidth constraint. The average performance of the strategy is verified through theoretical analysis and numerical simulations.
The remainder of this paper is organized as follows. The network model, the AoI metric, and the age penalty function are introduced in Section 2. In Section 3, we formulate the primal scheduling problem and then decouple it into independent single-sensor problems through bandwidth relaxation and Lagrange multipliers. The approximately optimal policy for the single-sensor problem is obtained in Section 4 by solving an LP. In Section 5, the asymptotically optimal truncated policy is proposed. Finally, Section 6 provides simulation results to verify the performance of the proposed truncated policy, and Section 7 draws the conclusion.
Notations: All vectors and matrices are denoted in boldface. The probability of $A$ given $B$ is denoted as $\Pr(A \mid B)$. Let $\mathbb{E}_\pi[X]$ be the expectation of the variable $X$ under policy $\pi$. The cardinality of a set $\Omega$ is written as $|\Omega|$.

2. System Model

2.1. Network Model

In this work, we consider a BS collecting update packets from $N$ time-sensitive sensors through time-varying channels. Time is slotted, and $t \in \{1, 2, \ldots, T\}$ denotes the current slot index. Define $u_n(t)$ to be the scheduling decision for sensor $n$ in slot $t$, where $u_n(t) = 1$ means that sensor $n$ sends its newest packet, while $u_n(t) = 0$ means idling the channel link. We assume that all scheduling actions take place at the beginning of each slot and that the packet transmission delay through every channel link is one slot. Due to the limited bandwidth capacity of the BS, the total number of sensors scheduled in one slot cannot exceed $M$. Here, we assume $M < N$ so that the problem is nontrivial:
$$\sum_{n=1}^{N} u_n(t) \le M, \quad \forall t. \tag{1}$$
To model the time-varying effect, we assume that the channel link connecting the BS and each sensor $n$ is an ergodic $Q$-state Markov chain. Denote by $q_n(t) \in \{1, 2, \ldots, Q\}$ the channel state of link $n$ in slot $t$. Without loss of generality, we assume that the channel becomes noisier as the state index grows. Denote by $\mathbf{P}_n = \{p_{ij}^n\}$ the Markov state transition matrix of link $n$, where the entry $p_{ij}^n$ is the probability of moving to state $j$ in the next slot given the current state $i$, i.e.,
$$p_{ij}^n \triangleq \Pr\{q_n(t+1) = j \mid q_n(t) = i\}. \tag{2}$$
Depending on the channel state, the sensors should use different energy levels, both to save energy and to ensure successful decoding of the packet at the receiver. We denote by $w(q)$ the energy consumed by a scheduled transmission when the channel state is $q$. Notice that the energy consumption grows as the channel becomes noisier in order to combat the channel fading, i.e., $w(1) < w(2) < \cdots < w(Q)$. Besides, due to the power limit of each sensor $n$, the total average power consumption cannot exceed an upper bound, denoted by $E_n$, i.e.,
$$\lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=1}^{T} u_n(t)\, w(q_n(t))\right] \le E_n, \quad \forall n, \tag{3}$$
where $\pi$ is a scheduling policy.
Given channel state $q$, we assume that a packet transmitted over link $n$ is lost with probability $\varepsilon_{n,q}$ due to decoding error or inaccurate estimation.

2.2. Age of Information and Age Penalty

In the network described above, the BS wishes to obtain the freshest information for further processing or accurate prediction. We model the data staleness at the terminal end as a monotone increasing age penalty function $f(\cdot)$ of the Age of Information (AoI). By definition, the AoI is the difference between the current time slot and the time slot in which the freshest packet received by the BS was generated by the sensor [3].
Let $x_n(t)$ be the AoI of sensor $n$ in slot $t$. According to the definition, if the sensor is scheduled in slot $t$ and there is no packet loss, then $x_n(t+1) = 1$; otherwise, $x_n(t+1) = x_n(t) + 1$. In summary, the AoI evolves as follows:
$$\Pr(x_n(t+1) = x' \mid x_n(t) = x, u_n(t) = u) = \begin{cases} 1 - \varepsilon_{n, q_n(t)}, & x' = 1, \ u = 1; \\ \varepsilon_{n, q_n(t)}, & x' = x + 1, \ u = 1; \\ 1, & x' = x + 1, \ u = 0; \\ 0, & \text{otherwise}. \end{cases} \tag{4}$$
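To make these dynamics concrete, the following Python sketch simulates one sensor's channel and AoI evolution under a toy threshold rule; the transition matrix, loss probabilities, and threshold below are illustrative placeholders rather than values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative placeholders (not values from the paper).
P = np.array([[0.7, 0.3],
              [0.4, 0.6]])          # Q = 2 channel-state transition matrix
eps = np.array([0.1, 0.5])          # packet-loss probability per channel state
f = np.log                          # age penalty function f(x)

def simulate(T=10_000, threshold=3):
    q, x = 0, 1                     # channel state (0-indexed) and AoI
    penalty = 0.0
    for _ in range(T):
        u = int(x >= threshold)     # toy threshold policy: schedule when AoI is large
        # AoI update, Equation (4): reset to 1 only on a successful transmission
        if u == 1 and rng.random() > eps[q]:
            x = 1
        else:
            x += 1
        q = rng.choice(len(P), p=P[q])   # Markovian channel, Equation (2)
        penalty += f(x)
    return penalty / T              # time-average age penalty

print(simulate())
```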

3. Problem Formulation and Decomposition

3.1. Problem Formulation

For the given network, we measure data freshness at the terminal side by the average age penalty under a scheduling policy $\pi$, denoted by $J(\pi)$, i.e.,
$$J(\pi) = \lim_{T \to \infty} \frac{1}{NT}\, \mathbb{E}_\pi\!\left[\sum_{t=1}^{T} \sum_{n=1}^{N} f(x_n(t)) \,\middle|\, \mathbf{x}(0)\right], \tag{5}$$
where $\mathbf{x}(0) = [x_1(0), x_2(0), \ldots, x_N(0)]$ is the initial AoI vector of the system. In this work, we assume that the system is synchronized with all sensors at the beginning, i.e., $x_n(0) = 1, \forall n$, and thus omit $\mathbf{x}(0)$ in the subsequent analysis.
We denote by $\Pi_{\text{CP}}$ the set of all causal policies whose decisions are based only on current and past information while satisfying both the bandwidth and power constraints. Our goal is then to optimize Equation (5) by choosing a scheduling policy $\pi \in \Pi_{\text{CP}}$. The primal optimization problem can therefore be written as
Problem 1
(Primal Scheduling Problem).
$$\min_{\pi \in \Pi_{\text{CP}}} \ \lim_{T \to \infty} \frac{1}{NT}\, \mathbb{E}_\pi\!\left[\sum_{n=1}^{N} \sum_{t=1}^{T} f(x_n(t))\right], \tag{6a}$$
$$\text{s.t.} \quad \sum_{n=1}^{N} u_n(t) \le M, \quad \forall t, \tag{6b}$$
$$\lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=1}^{T} u_n(t)\, w(q_n(t))\right] \le E_n, \quad \forall n. \tag{6c}$$

3.2. Problem Decomposition

Notice that Equation (6b) makes Problem 1 an integer program, and the exponential growth of the state and action spaces sets obstacles to solving it. Therefore, we formulate a relaxed version of Problem 1, in which the per-slot bandwidth constraint is replaced by a time-average bandwidth constraint. We then show that the relaxed problem can be solved by sensor-level decoupling, which greatly reduces the cardinality of the state and action spaces.
Problem 2
(Relaxed Primal Scheduling Problem).
$$\mathit{Age}_R^* = \min_{\pi \in \Pi_{\text{CP}}} \ \lim_{T \to \infty} \frac{1}{NT}\, \mathbb{E}_\pi\!\left[\sum_{n=1}^{N} \sum_{t=1}^{T} f(x_n(t))\right], \tag{7a}$$
$$\text{s.t.} \quad \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=1}^{T} \sum_{n=1}^{N} u_n(t)\right] \le M, \tag{7b}$$
$$\lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=1}^{T} u_n(t)\, w(q_n(t))\right] \le E_n, \quad \forall n. \tag{7c}$$
Denote by $\pi_R^*$ the optimal policy of Problem 2. The following theorem ensures that the optimal policy of the relaxed problem is composed of local optimal policies $\pi_n^*$, each of which depends only on its own channel state and AoI evolution, regardless of the others.
Theorem 1.
The optimal policy of Problem 2 can be decoupled into local optimal policies, i.e., $\pi_R^* = \pi_1^* \otimes \pi_2^* \otimes \cdots \otimes \pi_N^*$. The local policies $\pi_n^*$ have the following properties:
$$\mathit{Age}_R^* = \lim_{T \to \infty} \frac{1}{NT} \sum_{t=1}^{T} \sum_{n=1}^{N} \mathbb{E}_{\pi_n^*}[f(x_n(t))], \quad \lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \sum_{n=1}^{N} \mathbb{E}_{\pi_n^*}[u_n(t)] \le M, \quad \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{\pi_n^*}\!\left[\sum_{t=1}^{T} u_n(t)\, w(q_n(t))\right] \le E_n, \ \forall n.$$
The proof of Theorem 1 is provided in Appendix A.
To find the local optimal policies, we introduce a Lagrange multiplier $\lambda \ge 0$ to eliminate the relaxed bandwidth constraint; the Lagrange function is
$$L(\pi, \lambda) = \lim_{T \to \infty} \frac{1}{NT} \sum_{n=1}^{N} \sum_{t=1}^{T} \mathbb{E}_{\pi_n}\big[f(x_n(t)) + \lambda u_n(t)\big] - \frac{\lambda M}{N},$$
where the multiplier $\lambda$ can be seen as a scheduling penalty that increases the function value if more than $M$ sensors are scheduled per slot on average.
For fixed $\lambda$, we can now further decouple the relaxed scheduling problem into $N$ single-sensor cross-layer designs, each with its corresponding power constraint in Equation (7c):
Problem 3
(Single-Sensor Decoupled Problem).
$$\min_{\pi_n \in \Pi_{\text{CP}}} \ \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{\pi_n}\!\left[\sum_{t=1}^{T} f(x_n(t)) + \lambda u_n(t)\right], \tag{8a}$$
$$\text{s.t.} \quad \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_{\pi_n}\!\left[\sum_{t=1}^{T} u_n(t)\, w(q_n(t))\right] \le E_n. \tag{8b}$$
As the resolution of each decoupled problem is independent of $n$, we omit the subscript $n$ in the subsequent analysis.

4. Single-Sensor Problem Resolution

4.1. Constrained Markov Decision Process Formulation

First, we notice that the decoupled problem is a constrained Markov decision process whose elements $(\mathcal{S}, \mathcal{A}, \Pr(\cdot \mid \cdot), c(\cdot))$ and constraint are as follows.
  • State Space: The state of each sensor consists of two parts: the current AoI $x(t)$ and the channel state $q(t)$. Thus, $\mathcal{S} = \{x \times q\}$ is infinite but countable.
  • Action Space: There are two possible actions in the action space $\mathcal{A}$, denoted by $u(t)$. Action $u(t) = 1$ means that the sensor schedules a transmission, while $u(t) = 0$ means idling. Notice that here $u(t)$ need not satisfy the bandwidth constraint.
  • Probability Transition Function: According to Equations (2) and (4), the transition probabilities are
    $$\Pr((x+1, q') \mid (x, q), u = 0) = p_{qq'}, \quad \Pr((x+1, q') \mid (x, q), u = 1) = p_{qq'}\, \varepsilon_{n,q}, \quad \Pr((1, q') \mid (x, q), u = 1) = p_{qq'}\, (1 - \varepsilon_{n,q}).$$
  • One-Step Cost: The one-step cost consists of two parts, the age penalty and the scheduling penalty:
    $$c_x(x(t), q(t), u(t)) = f(x(t)) + \lambda u(t). \tag{9}$$
    The one-step power consumption is
    $$c_E(x(t), q(t), u(t)) = u(t)\, w(q(t)). \tag{10}$$
Now our goal is to minimize the following average one-step cost,
$$\min_{\pi \in \Pi_{\text{CP}}} \ \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=1}^{T} c_x(x(t), q(t), u(t))\right],$$
under the average power constraint
$$\lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=1}^{T} c_E(x(t), q(t), u(t))\right] \le E.$$

4.2. Characterization of the Optimal Policy

To search for the stationary optimal policy, we introduce another Lagrange multiplier $\nu \ge 0$ to eliminate the power constraint, i.e.,
$$\lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=1}^{T} c_x(x(t), q(t), u(t)) + \nu\, c_E(x(t), q(t), u(t))\right] - \nu E.$$
The multiplier $\nu$ can be viewed as a power penalty, which increases the Lagrange function once the average power consumption exceeds the constraint. Minimizing the above Lagrange function for fixed $\nu$ is then an MDP problem without any constraint.
The following lemma ensures that the optimal stationary policy for the MDP problem has a threshold structure.
Lemma 1.
The optimal stationary policy of the unconstrained MDP has a threshold structure: for each channel state $q$, there exists a threshold $\tau_q$ such that if $x \ge \tau_q$, it is optimal to schedule the sensor; otherwise, idling is the optimal action.
Proof sketch: The complete proof is provided in Appendix B and Appendix C and is similar to Theorem 2 in [23]. Despite the complexity of the proof, the intuition is simple: if it is optimal to schedule the sensor in state $(x, q)$, then scheduling is also optimal in any state $(x', q)$ with $x' > x$, because the AoI is even larger.
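As an illustration of Lemma 1, the following sketch runs relative value iteration on the unconstrained MDP with one-step cost $f(x) + \lambda u + \nu w(q) u$ for fixed multipliers; all parameter values are illustrative assumptions, and the AoI cap is used purely to make the computation finite. The printed per-state thresholds exhibit the claimed structure.

```python
import numpy as np

# Illustrative parameters (placeholders, not the paper's values).
P = np.array([[0.7, 0.3], [0.4, 0.6]])   # channel transition matrix
eps = np.array([0.1, 0.5])               # loss probability per state
w = np.array([1.0, 2.0])                 # energy cost per state
lam, nu = 2.0, 1.0                       # fixed Lagrange multipliers
X = 50                                   # AoI cap for computation only
f = lambda x: np.log(x)

Q = len(eps)
V = np.zeros((X, Q))                      # V[x-1, q] is the relative value function
for _ in range(2000):
    nxt = np.minimum(np.arange(1, X + 1), X - 1)   # index of AoI x+1 (capped at X)
    EV_stale = V[nxt] @ P.T               # E[V(x+1, q')] given current q, shape (X, Q)
    EV_fresh = V[0] @ P.T                 # E[V(1, q')] given current q, shape (Q,)
    fx = f(np.arange(1, X + 1))[:, None]
    Q0 = fx + EV_stale                                               # idle
    Q1 = fx + lam + nu * w + (1 - eps) * EV_fresh + eps * EV_stale   # schedule
    V_new = np.minimum(Q0, Q1)
    V = V_new - V_new[0, 0]               # relative value iteration (average-cost MDP)
for q in range(Q):
    tau = 1 + np.argmax(Q1[:, q] <= Q0[:, q])   # first AoI where scheduling is optimal
    print(f"channel state {q}: threshold tau = {tau}")
```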

4.3. Linear Programming Approximation

Now, we focus on finding the optimal stationary policy. Denote by $\xi_{x,q}$ the scheduling probability in state $(x, q)$; our goal is to find the $\{\xi_{x,q}^*\}$ that minimize the objective function. In this part, we approximate $\{\xi_{x,q}^*\}$ by solving a truncated LP.
According to Lemma 1, for the optimal stationary policy we can pick a threshold $X > \max_q \tau_q$ such that $\xi_{x,q}^* = 1, \forall x \ge X$. We therefore restrict the search to policies with this threshold property, as other policies are far from optimal.
Let $\mu_{x,q}$ be the steady-state probability of state $(x, q)$, and define a new variable $y_{x,q} \triangleq \mu_{x,q}\, \xi_{x,q}$. The following theorem converts the CMDP into an infinite LP problem.
Theorem 2.
The single-sensor decoupled problem is equivalent to the following infinite LP problem.
$$\{\mu_{x,q}^*, y_{x,q}^*\} = \arg\min_{\mu_{x,q},\, y_{x,q}} \ \sum_{q=1}^{Q} \sum_{x=1}^{\infty} \big(f(x)\, \mu_{x,q} + \lambda\, y_{x,q}\big), \tag{13a}$$
$$\text{s.t.} \quad \sum_{q=1}^{Q} \sum_{x=1}^{\infty} \mu_{x,q} = 1, \tag{13b}$$
$$\mu_{1,q} = \sum_{q'=1}^{Q} \sum_{x=1}^{\infty} (1 - \varepsilon_{q'})\, p_{q'q}\, y_{x,q'}, \tag{13c}$$
$$\mu_{x,q} = \sum_{q'=1}^{Q} p_{q'q} \big[\mu_{x-1,q'} - (1 - \varepsilon_{q'})\, y_{x-1,q'}\big], \quad x \ge 2, \tag{13d}$$
$$\sum_{q=1}^{Q} \sum_{x=1}^{\infty} y_{x,q}\, w(q) \le E, \tag{13e}$$
$$0 \le \mu_{x,q} \le 1, \quad 0 \le y_{x,q} \le \mu_{x,q}. \tag{13f}$$
Proof. 
Let us rewrite the average cost in Equation (8a) using $\mu_{x,q}$ and $y_{x,q}$. Invoking Equation (9), the one-step cost in state $(x, q)$ is $f(x) + \lambda$ when scheduling and $f(x)$ when idling. Therefore, the time-average cost is
$$\sum_{q=1}^{Q} \sum_{x=1}^{\infty} \mu_{x,q} \big[\xi_{x,q}(f(x) + \lambda) + (1 - \xi_{x,q}) f(x)\big] = \sum_{q=1}^{Q} \sum_{x=1}^{\infty} \big(f(x)\, \mu_{x,q} + \lambda\, y_{x,q}\big),$$
which is equivalent to Equation (13a).
Similarly, according to Equation (10), the time-average power consumption is
$$\sum_{q=1}^{Q} \sum_{x=1}^{\infty} \mu_{x,q}\, \xi_{x,q}\, w(q) = \sum_{q=1}^{Q} \sum_{x=1}^{\infty} y_{x,q}\, w(q),$$
which is exactly the LHS of Equation (13e).
By the defining property of a steady-state probability distribution, Equations (13b) and (13f) are verified.
Finally, notice that the evolution of the state $(x, q)$ forms a Markov chain, depicted in Figure 1 (top) for $Q = 2$ as an example. We use $\alpha_{q,q'}^x = \Pr((x+1, q') \mid (x, q))$ and $\beta_{q,q'}^x = \Pr((1, q') \mid (x, q))$ to denote the transition probabilities between the states, which can be computed as
$$\alpha_{q,q'}^x = \begin{cases} (1 - \xi_{x,q})\, p_{qq'} + \xi_{x,q}\, \varepsilon_q\, p_{qq'}, & x < X; \\ \varepsilon_q\, p_{qq'}, & x \ge X, \end{cases} \tag{14}$$
$$\beta_{q,q'}^x = \begin{cases} \xi_{x,q} (1 - \varepsilon_q)\, p_{qq'}, & x < X; \\ (1 - \varepsilon_q)\, p_{qq'}, & x \ge X. \end{cases}$$
By the steady-state property, $\mu_{x,q}$ equals the sum, over all states that can transfer into $(x, q)$ in one slot, of their steady-state probabilities times the corresponding transition probabilities. As depicted in Figure 1, for example, $\mu_{2,2} = \mu_{1,1}\, \alpha_{1,2}^1 + \mu_{1,2}\, \alpha_{2,2}^1$ (see the dashed lines). Therefore, we can compute $\mu_{x,q}$ as
$$\mu_{x,q} = \begin{cases} \displaystyle\sum_{q'=1}^{Q} \sum_{x'=1}^{\infty} \beta_{q',q}^{x'}\, \mu_{x',q'}, & x = 1; \\[2ex] \displaystyle\sum_{q'=1}^{Q} \alpha_{q',q}^{x-1}\, \mu_{x-1,q'}, & x \ge 2, \end{cases}$$
which is equivalent to Equations (13c) and (13d).    □
As the steady-state distribution has infinite support, it is difficult to solve the problem exactly. Therefore, we approximate the optimization problem in Theorem 2 by a finite LP problem through truncation:
Problem 4
(Linear Programming Approximation).
$$\{\tilde{\mu}_{x,q}^*, \tilde{y}_{x,q}^*\} = \arg\min_{\mu_{x,q},\, y_{x,q}} \ \sum_{q=1}^{Q} \sum_{x=1}^{X} \big(f(x)\, \mu_{x,q} + \lambda\, y_{x,q}\big), \tag{16a}$$
$$\text{s.t.} \quad \sum_{q=1}^{Q} \sum_{x=1}^{X} \mu_{x,q} = 1, \tag{16b}$$
$$\mu_{1,q} = \sum_{q'=1}^{Q} \sum_{x=1}^{X} (1 - \varepsilon_{q'})\, p_{q'q}\, y_{x,q'}, \tag{16c}$$
$$\mu_{x,q} = \sum_{q'=1}^{Q} p_{q'q} \big[\mu_{x-1,q'} - (1 - \varepsilon_{q'})\, y_{x-1,q'}\big], \quad 2 \le x \le X - 1, \tag{16d}$$
$$\mu_{X,q} = \sum_{q'=1}^{Q} p_{q'q} \big[\mu_{X-1,q'} + \mu_{X,q'} - (1 - \varepsilon_{q'}) (y_{X-1,q'} + y_{X,q'})\big], \tag{16e}$$
$$\sum_{q=1}^{Q} \sum_{x=1}^{X} y_{x,q}\, w(q) \le E, \tag{16f}$$
$$0 \le \mu_{x,q} \le 1, \quad 0 \le y_{x,q} \le \mu_{x,q}, \tag{16g}$$
$$y_{X,q} = \mu_{X,q}. \tag{16h}$$
After truncation, the optimal value of Equation (16a) is a lower bound on the objective of the decoupled problem in Equation (8a). The detailed proof is provided in Appendix D. The key idea is to set a threshold $X$ and convert the Markov chain into a finite-state one (see Figure 1).
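For concreteness, here is a minimal sketch of Problem 4 as a finite LP in SciPy, with the decision vector $[\mu, y]$ flattened over $(x, q)$; the channel parameters, $\lambda$, $E$, and $X$ below are illustrative assumptions, not the paper's values.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative parameters (placeholders, not the paper's values).
P = np.array([[0.7, 0.3], [0.4, 0.6]])
eps = np.array([0.1, 0.5])
w = np.array([1.0, 2.0])
lam, E, X = 2.0, 0.5, 30
f = lambda x: np.log(x)
Q = len(eps)

idx = lambda x, q: (x - 1) * Q + q        # flatten (x, q), q zero-indexed
n = X * Q                                  # variable vector is [mu, y], each of size n
c = np.zeros(2 * n)
for x in range(1, X + 1):
    for q in range(Q):
        c[idx(x, q)] = f(x)                # age-penalty term of Eq. (16a)
        c[n + idx(x, q)] = lam             # scheduling-penalty term of Eq. (16a)

A_eq, b_eq = [], []
A_eq.append(np.r_[np.ones(n), np.zeros(n)]); b_eq.append(1.0)       # Eq. (16b)
for q in range(Q):                                                   # Eq. (16c)
    row = np.zeros(2 * n); row[idx(1, q)] = 1.0
    for qp in range(Q):
        for x in range(1, X + 1):
            row[n + idx(x, qp)] -= (1 - eps[qp]) * P[qp, q]
    A_eq.append(row); b_eq.append(0.0)
for q in range(Q):                                                   # Eqs. (16d), (16e)
    for x in range(2, X + 1):
        row = np.zeros(2 * n); row[idx(x, q)] = 1.0
        for qp in range(Q):
            row[idx(x - 1, qp)] -= P[qp, q]
            row[n + idx(x - 1, qp)] += (1 - eps[qp]) * P[qp, q]
            if x == X:                     # the truncated tail state also feeds itself
                row[idx(X, qp)] -= P[qp, q]
                row[n + idx(X, qp)] += (1 - eps[qp]) * P[qp, q]
        A_eq.append(row); b_eq.append(0.0)
for q in range(Q):                                                   # Eq. (16h)
    row = np.zeros(2 * n); row[idx(X, q)] = 1.0; row[n + idx(X, q)] = -1.0
    A_eq.append(row); b_eq.append(0.0)

A_ub, b_ub = [], []
row = np.zeros(2 * n)                                                # Eq. (16f): power
for q in range(Q):
    for x in range(1, X + 1):
        row[n + idx(x, q)] = w[q]
A_ub.append(row); b_ub.append(E)
for i in range(n):                                                   # Eq. (16g): y <= mu
    row = np.zeros(2 * n); row[n + i] = 1.0; row[i] = -1.0
    A_ub.append(row); b_ub.append(0.0)

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, 1))
mu, y = res.x[:n].reshape(X, Q), res.x[n:].reshape(X, Q)
# Recover scheduling probabilities; zero-probability states default to 1.
xi = np.divide(y, mu, out=np.ones_like(y), where=mu > 1e-12)
print("objective:", res.fun)
```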
Moreover, the following theorem guarantees that the lower bound obtained by the above LP is tight when $X$ is sufficiently large; thus, the approximate optimal solution $\{\tilde{\mu}_{x,q}^*, \tilde{y}_{x,q}^*\}$ performs close to the exact optimal solution $\{\mu_{x,q}^*, y_{x,q}^*\}$. Before stating the theorem, denote by $\pi_X^*$ and $\pi^*$ the scheduling policies corresponding to the approximate optimal solution $\{\tilde{\mu}_{x,q}^*, \tilde{y}_{x,q}^*\}$ with threshold $X$ and to the optimal solution $\{\mu_{x,q}^*, y_{x,q}^*\}$, respectively. Define $J(\pi)$ to be the age and scheduling penalty of the primal problem under policy $\pi$, and let $J_X(\pi)$ be the approximate penalty obtained by setting the age penalty to $f(x) = f(X), \forall x \ge X$. Then, according to Equations (13a) and (16a), we have
$$J(\pi^*) = \sum_{q=1}^{Q} \sum_{x=1}^{\infty} \big(f(x)\, \mu_{x,q}^* + \lambda\, y_{x,q}^*\big), \quad J_X(\pi_X^*) = \sum_{q=1}^{Q} \sum_{x=1}^{X} \big(f(x)\, \tilde{\mu}_{x,q}^* + \lambda\, \tilde{y}_{x,q}^*\big).$$
Theorem 3.
Assume that $f(x)$ satisfies the following property: there exist $\epsilon \in [\sqrt{\varepsilon_{\max}}, 1)$ and a constant $k > 0$ such that $f(X) = k \epsilon^{-X}$ and $f(x) \le k \epsilon^{-x}, \forall x \ge X$, where $\varepsilon_{\max} = \max_q \varepsilon_q$. Then, we have the following property,
$$J(\pi^*) - J_X(\pi_X^*) \le \frac{k\, \varepsilon_{\max}^{\frac{X+1}{2} - \tau_{\max}}}{1 - \sqrt{\varepsilon_{\max}}},$$
where $\tau_{\max} = \max_q \tau_q$. From the above inequality, the gap between the optimal solutions of Theorem 2 and Problem 4 converges to 0 as the threshold $X$ becomes sufficiently large.
The entire proof is provided in Appendix E.
After solving the above LP, we can obtain the approximate optimal scheduling probabilities $\{\tilde{\xi}_{x,q}^*\}$ by setting a sufficiently large $X$ and computing $\{\tilde{\mu}_{x,q}^*\}$ and $\{\tilde{y}_{x,q}^*\}$. Moreover, analogously to the threshold structure described in Lemma 1, $\{\tilde{\xi}_{x,q}^*\}$ has the following property.
Lemma 2.
For any state $(x, q)$ of each sensor, the optimal scheduling probability $\tilde{\xi}_{x,q}^*$ is non-decreasing in $x$, i.e.,
$$\tilde{\xi}_{x_1,q}^* \le \tilde{\xi}_{x_2,q}^*, \quad \forall x_1 \le x_2.$$
The proof technique is similar to that of Lemma 1, so it is omitted here.

5. Multi-Sensor Problem Resolution

So far, through relaxation, decoupling, and truncation, we have obtained an approximate solution to the single-sensor decoupled problem for a fixed scheduling penalty $\lambda$. In this section, we return to the multi-sensor problem and propose a truncated policy that meets the hard bandwidth constraint in Equation (6b).

5.1. The Relaxed Problem Resolution

First, we choose the optimal $\lambda$ such that the relaxed bandwidth constraint is fully leveraged. Denote by $g(\lambda) = \min_\pi L(\pi, \lambda)$ the Lagrange dual function, where the minimization is carried out by the approximate optimal policy $\pi^*(\lambda)$ obtained from the LP. The dual function can then be computed as
$$g(\lambda) = \frac{1}{N} \left[\sum_{n=1}^{N} g_n(\lambda) - \lambda M\right],$$
where $g_n(\lambda) = \min_\pi L_n(\pi, \lambda)$ is the decoupled dual function for sensor $n$. According to the LP approximation, $g_n(\lambda)$ can be further written as
$$g_n(\lambda) = X_n(\lambda) + \lambda\, U_n(\lambda),$$
where $X_n(\lambda) = \sum_{x=1}^{X} \sum_{q=1}^{Q} f(x)\, \tilde{\mu}_{x,q}^{n,*}$ is the average age penalty and $U_n(\lambda) = \sum_{x=1}^{X} \sum_{q=1}^{Q} \tilde{y}_{x,q}^{n,*}$ is the average scheduling probability.
According to the work in [24], the optimal Lagrange multiplier $\lambda^*$ satisfies
$$\lambda^* = \sup\Big\{\lambda \,\Big|\, \sum_{n=1}^{N} U_n(\lambda) \ge M\Big\}.$$
If $\sum_{n=1}^{N} U_n(\lambda^*) = M$, then the optimal policy is simply $\pi(\lambda^*)$. Otherwise, the optimal policy is a mixture of two policies, denoted by $\pi_l$ and $\pi_u$, which are given by
$$\pi_l = \lim_{\lambda \to \lambda^{*-}} \pi(\lambda), \quad \pi_u = \lim_{\lambda \to \lambda^{*+}} \pi(\lambda).$$
Then, we apply the sub-gradient ascent method to find the optimal multiplier, where the sub-gradient can be computed as
$$\mathrm{d}_\lambda g(\lambda) = \sum_{n=1}^{N} U_n(\lambda) - M = U(\lambda) - M,$$
where $U(\lambda) = \sum_{n=1}^{N} U_n(\lambda)$ is the total scheduling probability.
Choose $\lambda_0 = 0$ as the starting point and compute the total scheduling probability $U(\lambda_0)$. If $U(\lambda_0) < M$, the bandwidth constraint need not be considered, and the optimal solution has already been found. Moreover, this solution can also be viewed as a lower bound for the primal optimization problem, i.e.,
$$\mathrm{LB} = \frac{1}{N} \sum_{n=1}^{N} \sum_{x=1}^{X} \sum_{q=1}^{Q} f(x)\, \tilde{\mu}_{x,q}^{n,*}(\lambda_0).$$
Otherwise, we need to increase the scheduling penalty through iterations. The update in iteration $k$ is
$$\lambda_{k+1} = \lambda_k + t_{k+1}\, \mathrm{d}_\lambda g(\lambda_k),$$
where $t_{k+1}$ is the step size in iteration $k$.
Moreover, the step size is determined as follows,
$$t_{k+1} = \begin{cases} \gamma\, t_k, & \mathrm{d}_\lambda g(\lambda_k)\, \mathrm{d}_\lambda g(\lambda_{k-1}) < 0; \\ t_k, & \text{otherwise}, \end{cases}$$
where $\gamma \in (0, 1)$ is a constant.
This choice of step size guarantees that the algorithm converges from both sides. After running the whole algorithm, we obtain two scheduling probabilities, $M_l$ and $M_u$:
$$M_l = \max_k U(\lambda_k), \quad \text{s.t. } \mathrm{d}_\lambda g(\lambda_k) \le 0;$$
$$M_u = \min_k U(\lambda_k), \quad \text{s.t. } \mathrm{d}_\lambda g(\lambda_k) \ge 0.$$
Their corresponding optimal policies are denoted by $\{\tilde{\mu}_{x,q}^{n,l}, \tilde{y}_{x,q}^{n,l}\}$ and $\{\tilde{\mu}_{x,q}^{n,u}, \tilde{y}_{x,q}^{n,u}\}$. The optimal stationary policy is then obtained by mixing these two policies:
$$\{\tilde{\mu}_{x,q}^{n,*}, \tilde{y}_{x,q}^{n,*}\} = \theta\, \{\tilde{\mu}_{x,q}^{n,u}, \tilde{y}_{x,q}^{n,u}\} + (1 - \theta)\, \{\tilde{\mu}_{x,q}^{n,l}, \tilde{y}_{x,q}^{n,l}\},$$
where the mixing coefficient is
$$\theta = \frac{M - M_l}{M_u - M_l}.$$
For example, if $M_l = 4.6$, $M_u = 5.8$, and $M = 5$, then $\theta = 0.4 / 1.2 = 1/3$, and the mixed policy schedules $\theta M_u + (1 - \theta) M_l = M$ sensors per slot on average.
We have now obtained the optimal stationary policy of the relaxed scheduling problem; the complete procedure is listed in Algorithm 1. Once $\{\tilde{\mu}_{x,q}^{n,*}, \tilde{y}_{x,q}^{n,*}\}$ is obtained, the optimal scheduling probability $\tilde{\xi}_{x,q}^{n,*}$ can be computed as
$$\tilde{\xi}_{x,q}^{n,*} = \begin{cases} 1, & \text{if } x > X \text{ or } \tilde{\mu}_{x,q}^{n,*} = 0 \text{ or } \tilde{\xi}_{x-1,q}^{n,*} = 1; \\[1ex] \dfrac{\tilde{y}_{x,q}^{n,*}}{\tilde{\mu}_{x,q}^{n,*}}, & \text{otherwise}. \end{cases}$$
Algorithm 1 Construction of the optimal stationary policy
Initialization: $\lambda_{-1} = \lambda_0 = 0$, $\epsilon$, $t_0$, $\gamma$, $M_u$, and $M_l$
for each $n \in [1, N]$ do
  compute $\{\tilde{\mu}_{x,q}^{n,*}(\lambda_0), \tilde{y}_{x,q}^{n,*}(\lambda_0)\}$ and $U_n(\lambda_0)$
end for
if $\mathrm{d}_\lambda g(\lambda_0) \le 0$ then
   $\{\tilde{\mu}_{x,q}^{n,*}, \tilde{y}_{x,q}^{n,*}\} = \{\tilde{\mu}_{x,q}^{n,*}(\lambda_0), \tilde{y}_{x,q}^{n,*}(\lambda_0)\}$
else
   $k = 0$
  while $|\lambda_k - \lambda_{k-1}| > \epsilon$ or $\mathrm{d}_\lambda g(\lambda_k) > 0$ do
   if $\mathrm{d}_\lambda g(\lambda_k)\, \mathrm{d}_\lambda g(\lambda_{k-1}) < 0$ then
      $t_{k+1} = \gamma\, t_k$
   else
      $t_{k+1} = t_k$
   end if
    $\lambda_{k+1} = \lambda_k + t_{k+1}\, \mathrm{d}_\lambda g(\lambda_k)$
   for each $n \in [1, N]$ do
    compute $\{\tilde{\mu}_{x,q}^{n,*}(\lambda_{k+1}), \tilde{y}_{x,q}^{n,*}(\lambda_{k+1})\}$ and $U_n(\lambda_{k+1})$
   end for
   if $\mathrm{d}_\lambda g(\lambda_{k+1}) \ge 0$ and $U(\lambda_{k+1}) \le M_u$ then
      $\{\tilde{\mu}_{x,q}^{n,u}, \tilde{y}_{x,q}^{n,u}\} = \{\tilde{\mu}_{x,q}^{n,*}(\lambda_{k+1}), \tilde{y}_{x,q}^{n,*}(\lambda_{k+1})\}$
      $M_u = U(\lambda_{k+1})$
   end if
   if $\mathrm{d}_\lambda g(\lambda_{k+1}) \le 0$ and $U(\lambda_{k+1}) \ge M_l$ then
      $\{\tilde{\mu}_{x,q}^{n,l}, \tilde{y}_{x,q}^{n,l}\} = \{\tilde{\mu}_{x,q}^{n,*}(\lambda_{k+1}), \tilde{y}_{x,q}^{n,*}(\lambda_{k+1})\}$
      $M_l = U(\lambda_{k+1})$
   end if
    $k = k + 1$
  end while
   $\theta = \frac{M - M_l}{M_u - M_l}$
   $\{\tilde{\mu}_{x,q}^{n,*}, \tilde{y}_{x,q}^{n,*}\} = \theta\, \{\tilde{\mu}_{x,q}^{n,u}, \tilde{y}_{x,q}^{n,u}\} + (1 - \theta)\, \{\tilde{\mu}_{x,q}^{n,l}, \tilde{y}_{x,q}^{n,l}\}$
end if
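A compact Python rendering of Algorithm 1's outer loop might look as follows; `solve_lp(n, lam)` is a hypothetical helper (e.g., a wrapper around the LP sketch in Section 4.3) that returns the pair $(\tilde{\mu}^{n,*}, \tilde{y}^{n,*})$ for sensor $n$ at penalty $\lambda$, and the bookkeeping is deliberately simplified.

```python
import numpy as np

def dual_ascent(N, M, solve_lp, t0=1.0, gamma=0.5, tol=1e-4, max_iter=200):
    """Simplified sketch of Algorithm 1: sub-gradient ascent on the dual,
    tracking the best iterates on both sides of the bandwidth constraint."""
    lam, step, prev_grad = 0.0, t0, None
    M_l, sol_l = -np.inf, None            # largest U(lam) with U <= M
    M_u, sol_u = np.inf, None             # smallest U(lam) with U >= M
    for _ in range(max_iter):
        sols = [solve_lp(n, lam) for n in range(N)]
        U = sum(y.sum() for _, y in sols) # total scheduling probability U(lam)
        grad = U - M                      # sub-gradient of the dual function
        if grad <= 0 and U > M_l:
            M_l, sol_l = U, sols
        if grad >= 0 and U < M_u:
            M_u, sol_u = U, sols
        if grad <= 0 and lam == 0.0:
            break                         # U(0) <= M: bandwidth constraint inactive
        if prev_grad is not None and grad * prev_grad < 0:
            step *= gamma                 # shrink the step when the sign flips
        if grad <= 0 and step * abs(grad) < tol:
            break                         # converged from the feasible side
        lam = max(0.0, lam + step * grad)
        prev_grad = grad
    if sol_u is None:                     # constraint never binds
        return 0.0, sol_l, sol_l
    theta = (M - M_l) / (M_u - M_l)       # mixing coefficient of the two policies
    return theta, sol_l, sol_u
```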

5.2. Truncation for the Hard Bandwidth Constraint

Finally, a bandwidth-truncated policy $\hat{\pi}_X$ is derived from the optimal stationary policy $\pi_X^*$ to satisfy the hard bandwidth constraint in Equation (6b). Before introducing the truncated policy, denote by $S(t)$ the set of sensors selected for scheduling in slot $t$, so that $|S(t)|$ is the number of selected sensors. The construction of $\hat{\pi}_X$ is then carried out as follows (a minimal code sketch is given after the list).
  • In slot $t$, compute the scheduling set $S(t)$ according to the optimal stationary policy $\pi_X^*$.
  • If $|S(t)| \le M$, then $\hat{\pi}_X$ schedules all these sensors, just as $\pi_X^*$ does.
  • If $|S(t)| > M$, the hard bandwidth constraint would be violated. Therefore, $\hat{\pi}_X$ randomly chooses $M$ out of the $|S(t)|$ sensors to schedule in the current slot.
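A minimal sketch of this truncation step, assuming `S_t` holds the sensor indices selected by the stationary policy in the current slot:

```python
import numpy as np

rng = np.random.default_rng()

def truncate_schedule(S_t, M):
    """If the stationary policy selects more than M sensors, keep a
    uniformly random subset of size M; otherwise schedule them all."""
    S_t = np.asarray(S_t)
    return S_t if len(S_t) <= M else rng.choice(S_t, size=M, replace=False)

print(truncate_schedule([2, 5, 7, 8], M=2))   # e.g., two of the four sensors
```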
The following theorem guarantees the asymptotic performance of the truncated policy $\hat{\pi}_X$ compared with $\pi_X^*$ under certain conditions.
Theorem 4.
Suppose the age penalty function is concave, and let $\kappa = \frac{M}{N}$ be a constant. If all the sensors and their channels are identical, i.e., the power constraints and the channel transition matrices are the same, then the truncated policy $\hat{\pi}_X$ and the optimal randomized policy $\pi_X^*$ satisfy
$$\lim_{N \to \infty} \big(J(\hat{\pi}_X) - J(\pi_X^*)\big) = 0.$$
The whole proof is provided in Appendix F.

6. Simulation Results

In this section, we provide simulation results to verify the performance of the proposed policy. First, we study the average age penalty for different types of sensors under different bandwidth constraints and AoI truncation thresholds $X$. Next, we examine the detailed scheduling decisions of each sensor. The average performance is obtained by simulating $10^5$ consecutive slots.

6.1. Average Age Penalty Performance

In this part, we demonstrate the average performance of our proposed policy. We consider a 4-state channel system, i.e., $Q = 4$. The age penalty function is chosen as $f(x) = \ln(x)$ unless otherwise specified. The transition matrix $\mathbf{P}_n$ is the same for every sensor:
$$\mathbf{P}_n = \begin{bmatrix} 0.4 & 0.3 & 0.2 & 0.1 \\ 0.25 & 0.3 & 0.25 & 0.2 \\ 0.2 & 0.25 & 0.3 & 0.25 \\ 0.1 & 0.2 & 0.3 & 0.4 \end{bmatrix}. \tag{21}$$
Denote by $\{\eta_q\}$ the steady-state distribution of the channel. For each channel state $q$, the energy consumption is $w(q) = q$. According to [12], the optimal policy for minimizing the average AoI when all the sensors are identical is a greedy policy, which schedules the $M$ sensors with the largest AoI and consumes average power $E_G = \frac{M}{N} \sum_{q=1}^{Q} \eta_q\, w(q)$ per sensor. Therefore, we define $\rho_n = \frac{E_n}{E_G}$ as the power constraint factor, which describes the effect of the power consumption constraint $E_n$.
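As a sanity check on this setup, the following snippet computes the steady-state distribution $\{\eta_q\}$ of the matrix in Equation (21) and the greedy baseline power $E_G$; the values of $M$ and $N$ are example inputs.

```python
import numpy as np

# Transition matrix from Equation (21) and w(q) = q.
P = np.array([[0.40, 0.30, 0.20, 0.10],
              [0.25, 0.30, 0.25, 0.20],
              [0.20, 0.25, 0.30, 0.25],
              [0.10, 0.20, 0.30, 0.40]])
w = np.arange(1, 5)

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalized.
vals, vecs = np.linalg.eig(P.T)
eta = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
eta /= eta.sum()

M, N = 5, 60                       # example bandwidth and network size
E_G = (M / N) * eta @ w            # per-sensor power of the greedy baseline
print(eta, E_G)
```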
Figure 2 shows the average age penalty of the proposed policy $\hat{\pi}_X$ as a function of the number of sensors $N$, with bandwidth constraints $M \in \{5, 15\}$, compared with the lower bound, the relaxed optimal policy $\pi_X^*$, and the greedy policy. We set the threshold $X = 20 \lceil \frac{N}{M} \rceil$, where $\lceil \cdot \rceil$ is the ceiling function. We assume that the packet loss probabilities are the same for every sensor, denoted by
$$\boldsymbol{\varepsilon} = [0.1, 0.3, 0.2, 0.4].$$
The power constraint factor of sensor $n$ is $\rho_n = 0.2 + \frac{1.4 (n - 1)}{N - 1}$.
As seen in Figure 2, the proposed truncated policy performs close to the relaxed optimal policy and the lower bound, and it outperforms the greedy policy, especially when $N$ is large. According to Figure 2, with $N = 60$ sensors the proposed policy reduces the age penalty of the greedy policy by $18\%$ and $23\%$ for $M = 5$ and $M = 15$, respectively. Moreover, as the threshold $X$ grows, the gap between the average performance under policy $\pi_X^*$ and the lower bound becomes indistinguishable, which verifies the asymptotic performance described in Theorem 3.
Figure 3 compares the average performance of the proposed policy $\hat{\pi}_X$ with an AoI-minimizing policy to show the benefit of considering a general penalty function. The AoI-minimizing policy is similar to the one proposed in our previous work [22], with packet loss taken into account. The bandwidth constraint is $M = 2$, the packet loss probabilities are $\boldsymbol{\varepsilon} = [0.25, 0.35, 0.65, 0.8]$, and the energy consumption is $w(q) = 2^q$. Here, the age penalty function is chosen as $f(x) = x^2$. From Figure 3, we can see that the AoI-minimizing policy cannot guarantee a good age penalty performance; thus, it is necessary to account for the specific data freshness demand to achieve better performance.
Figure 4 and Figure 5 verify the asymptotic performance of the proposed policy for different age penalty functions and different packet loss probabilities, respectively, when $M/N$ is constant in symmetric networks. Both figures show that the gap between the average age penalty under the proposed policy and the lower bound shrinks as $N$ increases, which verifies the asymptotic performance described in Theorem 4.

6.2. Sensor Level Analysis and Threshold Structure

Next, we analyze the scheduling decision of each sensor and the corresponding age penalty to provide some insight into the optimal scheduling policies. We consider a system with $N = 8$ and $M = 2$. The channel transition matrix is the same as Equation (21), and the power consumption is $w(q) = q$. We set the threshold $X = 80$ to compute the proposed policy.
First, we consider the system with $Q = 4$ and age penalty function $f(x) = \ln x$. Figure 6 analyzes how the power constraint influences the age penalty of each sensor. The power constraint factor of sensor $n$ is $\rho_n = 0.2n$. From Figure 6, we can see that the proposed policy outperforms the greedy policy when the power budget is scarce, and it performs similarly or slightly worse when $\rho_n > 1$. This implies that, compared with the greedy policy, our proposed policy chooses a more appropriate power allocation based on the current channel state and AoI, stimulating sensors with scarce power budgets to be scheduled in better channel states.
As packet loss influences the age penalty as well, Figure 7 considers sensors with different packet loss probabilities, given by the matrix $\{\varepsilon_{n,q}\}$:
$$\boldsymbol{\varepsilon} = \begin{bmatrix} 0.05 & 0.1 & 0.15 & 0.2 \\ 0.1 & 0.15 & 0.2 & 0.25 \\ 0.15 & 0.2 & 0.25 & 0.3 \\ 0.2 & 0.25 & 0.3 & 0.35 \\ 0.25 & 0.3 & 0.35 & 0.4 \\ 0.3 & 0.35 & 0.4 & 0.45 \\ 0.35 & 0.4 & 0.45 & 0.5 \\ 0.4 & 0.45 & 0.5 & 0.55 \end{bmatrix}. \tag{22}$$
We fix the power constraint factor at $\rho_n = 0.6$ for all sensors. Figure 7 shows that the average age penalty increases with the packet loss probability. Moreover, the proposed policy combats packet loss better than the greedy policy, since the proposed policy accounts for $\varepsilon_{n,q}$ when solving the LP, whereas the greedy policy does not.
Next, we verify the threshold structure of the optimal scheduling policy. Figure 8 demonstrates the effects of bandwidth and packet loss on the scheduling threshold. The power constraint factor is $\rho_n = 0.6, \forall n$, and the packet loss probabilities are given by Equation (22). Subfigures (a–c) show three of these sensors, whose packet loss probabilities are given in the subfigure titles. For each of the three sensors, subfigures (d–f) consider the corresponding single-sensor system without a bandwidth constraint and display its scheduling probability. Figure 8 also lists some of the thresholds for given channel states $q$; e.g., in subfigure (a), the threshold for channel state $q = 3$ is $x = 7$, and the corresponding optimal scheduling probability is $\xi_{7,3} = 0.9963$. From Figure 8, we first see that all six subfigures verify the non-decreasing property of the scheduling probability in the AoI $x(t)$ described in Lemma 2. Second, subfigures (a–c) show that a sensor with a higher packet loss probability also has a higher scheduling threshold. This implies that sensors with more reliable channels should be given higher scheduling priority than unreliable ones to minimize the average age penalty, since scheduling over a more reliable channel at the same AoI is more likely to reduce the current age penalty. Third, comparing subfigures (a) and (d), (b) and (e), and (c) and (f), the scheduling threshold varies more significantly across channel states when there is no bandwidth constraint: the sensors tend to update more often when the channel state is good and to idle when it is bad. This is because, without a bandwidth constraint, the sensors can update more frequently in good channel states to both save energy and increase the transmission success probability.
Finally, we study the effect of the age penalty function on the threshold structure. Here, we consider a system with $Q = 2$, $\rho_n = 0.2n$, and three different penalty functions, i.e., $f(x) = \ln x$, $f(x) = x$, and $f(x) = x^2$, in Figure 9, where we plot the scheduling decisions of the sixth sensor. As depicted in Figure 9, when the system places a higher premium on data freshness, such as with a quadratic penalty, the differences between the thresholds of different states shrink. In such situations, the channel states play a weaker role, because waiting another slot to schedule tends to incur an unbearable age penalty.

7. Conclusions

In this paper, we considered the multi-sensor scheduling problem over error-prone channels with Markovian states. Through relaxation and decoupling, we proposed a truncated policy that satisfies both the bandwidth and power constraints while minimizing the average age penalty of all sensors over an infinite horizon. We proved the asymptotic optimality of the truncated policy in symmetric networks with a concave age penalty function, provided the threshold $X$ is chosen sufficiently large. Through theoretical analysis and numerical simulations, we found that the age penalty function, packet loss probability, bandwidth constraint, and power constraint jointly influence the optimal scheduling decisions: sensors with more reliable channel states and sufficient power budgets tend to receive higher scheduling priority.

Author Contributions

Conceptualization, Y.C. and H.T.; methodology, Y.C.; validation, Y.C. and H.T.; formal analysis, Y.C.; writing—original draft preparation, Y.C.; writing—review and editing, H.T., J.W., and J.S.; supervision, J.W. and J.S.; funding acquisition, J.W. and J.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key R&D Program of China under Grant 2017YFE0112300, and by the Beijing National Research Center for Information Science and Technology under Grants BNR2019RC01014 and BNR2019TD01001.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 1

First, we notice that Problem 2 is equivalent to the following optimization problem, in which we introduce additional variables $\nu_n$ denoting the local bandwidth constraint of sensor $n$.
Problem A1
(Equivalent Relaxed Primal Scheduling Problem).
$$\mathit{Age}_R^* = \min_{\pi \in \Pi_{\text{CP}},\, \{\nu_1, \ldots, \nu_N\}} \ \lim_{T \to \infty} \frac{1}{NT}\, \mathbb{E}_\pi\!\left[\sum_{n=1}^{N} \sum_{t=1}^{T} f(x_n(t))\right], \tag{A1a}$$
$$\text{s.t.} \quad \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=1}^{T} u_n(t)\right] \le \nu_n, \quad \forall n, \tag{A1b}$$
$$\sum_{n=1}^{N} \nu_n \le M, \tag{A1c}$$
$$\lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=1}^{T} u_n(t)\, w(q_n(t))\right] \le E_n, \quad \forall n, \tag{A1d}$$
$$0 \le \nu_n \le 1, \quad \forall n. \tag{A1e}$$
For each fixed feasible local bandwidth vector $\{\nu_n\}$, we can transform the above problem into the following one by removing constraints (A1c) and (A1e).
Problem A2
(Relaxed Problem for Fixed { ν n } ).
$$\mathit{Age}_R^*(\nu_1, \ldots, \nu_N) = \min_{\pi \in \Pi_{\text{CP}}} \ \lim_{T \to \infty} \frac{1}{NT}\, \mathbb{E}_\pi\!\left[\sum_{n=1}^{N} \sum_{t=1}^{T} f(x_n(t))\right], \tag{A2a}$$
$$\text{s.t.} \quad \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=1}^{T} u_n(t)\right] \le \nu_n, \quad \forall n, \tag{A2b}$$
$$\lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=1}^{T} u_n(t)\, w(q_n(t))\right] \le E_n, \quad \forall n. \tag{A2c}$$
Then, according to the work in [25], the optimal policy of Problem A2 for a given $\{\nu_n\}$ can be decoupled into local policies. This is intuitive, as the constraints and objective function of Problem A2 are decoupled across sensors. Since the optimal scheduling policy can be decoupled for every feasible $\{\nu_n\}$, and recalling that $\mathit{Age}_R^* = \min_{\nu_1, \ldots, \nu_N} \mathit{Age}_R^*(\nu_1, \ldots, \nu_N)$, the property also holds when $\{\nu_n\}$ takes the optimal value $\{\nu_n^*\}$.

Appendix B. Proof of Lemma 1

Before proceeding to the proof, we make two definitions. For any two states $s = (x, q)$ and $s' = (x', q) \in \mathcal{S}$, define a partial order $\geq_s$: we say $s' \geq_s s$ if and only if $x' \ge x$. Moreover, define a partial order $\geq_a$ on the action space $\mathcal{A}$: we say $u' \geq_a u$ if and only if $u' \ge u$.
The monotonicity of the optimal action in the state holds if the following four conditions are satisfied:
  • If $s' \geq_s s$, then $c(s', u) \ge c(s, u)$ for any $u \in \mathcal{A}$;
  • If $s' \geq_s s$, then for any $u \in \mathcal{A}$, $\sum_{s^+} \Pr(s^+ \mid s', u)\, V(s^+) \ge \sum_{s^+} \Pr(s^+ \mid s, u)\, V(s^+)$, where $V(\cdot)$ is any monotone increasing function;
  • If $s' \geq_s s$ and $u' \geq_a u$, then $c(s', u') + c(s, u) \le c(s', u) + c(s, u')$;
  • If $s' \geq_s s$ and $u' \geq_a u$, then
$$\sum_{s^+} \Pr(s^+ \mid s', u')\, V(s^+) + \sum_{s^+} \Pr(s^+ \mid s, u)\, V(s^+) \le \sum_{s^+} \Pr(s^+ \mid s', u)\, V(s^+) + \sum_{s^+} \Pr(s^+ \mid s, u')\, V(s^+);$$
where $s, s' \in \mathcal{S}$, $u, u' \in \mathcal{A}$, $s^+ \in \mathcal{S}$ is the next state, and $c(s, u)$ denotes the one-step cost for state $s$ and action $u$.
Next, we consider a discounted-cost MDP over a finite horizon:
$$\min_{\pi \in \Pi_{\text{CP}}} \ \mathbb{E}_\pi\!\left[\sum_{k=1}^{T} \beta^k\, c(s(k), u(k))\right],$$
where $\beta$ is the discount factor.
The corresponding Bellman equations are
$$V_{T,\beta}(s) = \min_{u \in \mathcal{A}} c(s, u); \tag{A4}$$
$$V_{k,\beta}(s) = \min_{u \in \mathcal{A}} \Big[c(s, u) + \beta \sum_{s^+} \Pr(s^+ \mid s, u)\, V_{k+1,\beta}(s^+)\Big], \quad k = 1, 2, \ldots, T - 1. \tag{A5}$$
If the above four conditions are satisfied and the Bellman value function is monotone increasing, then the one-step cost $c(s, u) = c_x(x, q, u) + \nu\, c_E(x, q, u)$ is monotone and sub-modular in $s$ and $u$, which implies that there exists an optimal monotone policy for the finite-horizon MDP. Using the vanishing discount approach of Theorem 5.5.4 in [26], the monotonicity property carries over to the time-average MDP.
Before verifying the four conditions, we first introduce the following lemma, proved in Appendix C, which ensures that $V_{k,\beta}(x, q)$ is monotone increasing in $x$.
Lemma A1.
For a fixed channel state $q$ and discount factor $\beta$, $V_{k,\beta}(x, q)$ is monotone increasing in $x$.
Therefore, it remains to show that the decoupled unconstrained problem satisfies the four conditions.
Notice that the one-step cost can be computed by
$$c(s, u) = f(x) + \lambda u + \nu\, w(q)\, u. \tag{A6}$$
From the definitions of the partial orders $\geq_s$ and $\geq_a$ and Equation (A6), conditions 1 and 3 are easily verified.
Also, we have
$$\sum_{s^+} \Pr(s^+ \mid s, u)\, V(s^+) = \begin{cases} \displaystyle\sum_{q'} p_{qq'}\, V(x+1, q'), & u = 0; \\[2ex] \displaystyle\sum_{q'} \big[p_{qq'} (1 - \varepsilon_q)\, V(1, q') + p_{qq'}\, \varepsilon_q\, V(x+1, q')\big], & u = 1. \end{cases} \tag{A7}$$
From Equation (A7) and the fact that $V(x, q) < V(x', q), \forall x < x'$, conditions 2 and 4 can also be verified.

Appendix C. Proof of Lemma A1

In this part, we prove by induction that $V_{k,\beta}(x, q)$ in the finite-horizon MDP is an increasing function of $x$.
Notice that $V_{T,\beta}(x, q) = \min_{u \in \mathcal{A}} c(s, u) = \min_{u \in \mathcal{A}} \big[f(x) + \lambda u + \nu w(q) u\big]$ is increasing in $x$. Suppose that $V_{t,\beta}(x, q)$ is increasing in $x$ for all $t \ge k$. For $x_1 < x_2$, we have
$$c_x(x_1, q, u) < c_x(x_2, q, u), \quad c_E(x_1, q, u) = c_E(x_2, q, u).$$
Denote by $J_{k-1,\beta}(x, q, u)$ the expected discounted cost when taking action $u$ in state $(x, q)$. Then,
$$J_{k-1,\beta}(x_1, q, 0) = c_x(x_1, q, 0) + \beta \sum_{q'} p_{qq'}\, V_{k,\beta}(x_1 + 1, q') < c_x(x_2, q, 0) + \beta \sum_{q'} p_{qq'}\, V_{k,\beta}(x_2 + 1, q') = J_{k-1,\beta}(x_2, q, 0).$$
Similarly, we can derive $J_{k-1,\beta}(x_1, q, 1) < J_{k-1,\beta}(x_2, q, 1)$. According to the Bellman equations (A4) and (A5), the value function $V_{k-1,\beta}(x, q)$ is
$$V_{k-1,\beta}(x, q) = \min_{u \in \mathcal{A}} J_{k-1,\beta}(x, q, u).$$
From the above, $J_{k-1,\beta}(x_1, q, u) < J_{k-1,\beta}(x_2, q, u), \forall u \in \mathcal{A}$; thus, $V_{k-1,\beta}(x_1, q) < V_{k-1,\beta}(x_2, q)$. Therefore, by induction, $V_{k,\beta}(x, q)$ is increasing in $x$.

Appendix D. Derivation of Problem 4

First, make the following variable substitution,
$$\mu'_{x,q} = \begin{cases} \mu_{x,q}, & x < X; \\ \displaystyle\sum_{x'=X}^{\infty} \mu_{x',q}, & x = X. \end{cases} \tag{A8}$$
Similarly, let
$$y'_{x,q} = \begin{cases} y_{x,q}, & x < X; \\ \displaystyle\sum_{x'=X}^{\infty} y_{x',q}, & x = X. \end{cases} \tag{A9}$$
Instead of solving for $\mu_{x,q}$ and $y_{x,q}$ in Theorem 2, we solve for the new finite sets of variables $\mu'_{x,q}$ and $y'_{x,q}$. The constraints (13b), (13c), (13e), and (13f) then become (16b), (16c), (16f), and (16g), respectively. Equation (16d) corresponds to Equation (13d) for $x \le X - 1$. For the remaining states, sum Equation (13d) from $X$ to infinity, i.e.,
$$\sum_{x=X}^{\infty} \mu_{x,q} = \sum_{x=X}^{\infty} \sum_{q'=1}^{Q} p_{q'q} \big[\mu_{x-1,q'} - (1 - \varepsilon_{q'})\, y_{x-1,q'}\big] \overset{(a)}{=} \sum_{q'=1}^{Q} p_{q'q} \big[\mu_{X-1,q'} - (1 - \varepsilon_{q'})\, y_{X-1,q'}\big] + \sum_{x=X}^{\infty} \sum_{q'=1}^{Q} p_{q'q} \big[\mu_{x,q'} - (1 - \varepsilon_{q'})\, y_{x,q'}\big] = \sum_{q'=1}^{Q} p_{q'q} \big[\mu_{X-1,q'} - (1 - \varepsilon_{q'})\, y_{X-1,q'}\big] + \sum_{q'=1}^{Q} p_{q'q} \Big[\sum_{x=X}^{\infty} \mu_{x,q'} - (1 - \varepsilon_{q'}) \sum_{x=X}^{\infty} y_{x,q'}\Big],$$
where (a) holds by separating the $x = X$ term and re-indexing the remaining sum.
Invoking Equations (A8) and (A9), the above equality is equivalent to
$$\mu'_{X,q} = \sum_{q'=1}^{Q} p_{q'q} \big[\mu'_{X-1,q'} - (1 - \varepsilon_{q'})\, y'_{X-1,q'}\big] + \sum_{q'=1}^{Q} p_{q'q} \big[\mu'_{X,q'} - (1 - \varepsilon_{q'})\, y'_{X,q'}\big] = \sum_{q'=1}^{Q} p_{q'q} \big[\mu'_{X-1,q'} + \mu'_{X,q'} - (1 - \varepsilon_{q'}) (y'_{X-1,q'} + y'_{X,q'})\big],$$
which is exactly Equation (16e).
Notice that the optimal action is to schedule when $x \ge X$, i.e., $\xi_{x,q}^* = 1$ and thus $y_{x,q}^* = \mu_{x,q}^*, \forall x \ge X$. Therefore, we have
$$\mu'_{X,q} = y'_{X,q},$$
which is Equation (16h).
Thus, through the variable substitution, we have verified that the constraints of the finite LP are equivalent to those of the original optimization problem, so the optimal solution of the finite LP is also feasible for the original problem. For the objective function, since $f$ is increasing, we have
$$\sum_{q=1}^{Q} \sum_{x=1}^{\infty} \big(f(x)\, \mu_{x,q} + \lambda y_{x,q}\big) = \sum_{q=1}^{Q} \sum_{x=1}^{X-1} \big(f(x)\, \mu_{x,q} + \lambda y_{x,q}\big) + \sum_{q=1}^{Q} \sum_{x=X}^{\infty} \big(f(x)\, \mu_{x,q} + \lambda y_{x,q}\big) \ge \sum_{q=1}^{Q} \sum_{x=1}^{X-1} \big(f(x)\, \mu_{x,q} + \lambda y_{x,q}\big) + \sum_{q=1}^{Q} \sum_{x=X}^{\infty} \big(f(X)\, \mu_{x,q} + \lambda y_{x,q}\big) = \sum_{q=1}^{Q} \sum_{x=1}^{X-1} \big(f(x)\, \mu_{x,q} + \lambda y_{x,q}\big) + \sum_{q=1}^{Q} \big(f(X)\, \mu'_{X,q} + \lambda y'_{X,q}\big) = \sum_{q=1}^{Q} \sum_{x=1}^{X} \big(f(x)\, \mu'_{x,q} + \lambda y'_{x,q}\big).$$
Therefore, the decoupled single-sensor problem is approximated by the finite LP, which has equivalent constraints and whose optimal value lower-bounds that of the original problem.

Appendix E. Proof of Theorem 3

Our aim is to bound $J(\pi^*) - J_X(\pi_X^*)$. Recall that $\pi^*$ is the optimal solution to the primal problem; thus, the difference can be upper bounded by $J(\pi_X^*) - J_X(\pi_X^*)$.
To bound this term, first denote by $\rho_{x,q}^X$ and $\xi_{x,q}^X$ the stationary distribution and the scheduling probability of state $(x, q)$ under policy $\pi_X^*$, i.e.,
$$\rho_{x,q}^X = \mathbb{E}_{\pi_X^*}\!\left[\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{I}\{(x(t), q(t)) = (x, q)\}\right], \tag{A11}$$
$$\xi_{x,q}^X = \mathbb{E}_{\pi_X^*}\!\left[\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbb{I}\{u(t) = 1 \mid (x(t), q(t)) = (x, q)\}\right], \tag{A12}$$
where $\mathbb{I}(\cdot)$ is the indicator function.
Let $\gamma_{x,q}^X = \rho_{x,q}^X\, \xi_{x,q}^X$. Recalling the approximate solution to Problem 4, we have
$$\rho_{x,q}^X = \tilde{\mu}_{x,q}^*, \ x < X, \qquad \sum_{x=X}^{\infty} \rho_{x,q}^X = \tilde{\mu}_{X,q}^*; \tag{A13}$$
$$\gamma_{x,q}^X = \tilde{y}_{x,q}^*, \ x < X, \qquad \sum_{x=X}^{\infty} \gamma_{x,q}^X = \tilde{y}_{X,q}^*. \tag{A14}$$
Then, $J(\pi_X^*) - J_X(\pi_X^*)$ can be bounded as
$$\begin{aligned} J(\pi_X^*) - J_X(\pi_X^*) &= \mathbb{E}_{\pi_X^*}\!\left[\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \big(f(x(t)) + \lambda u(t)\big)\right] - J_X(\pi_X^*) \\ &= \mathbb{E}_{\pi_X^*}\!\left[\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \sum_{q=1}^{Q} \sum_{x=1}^{\infty} f(x)\, \mathbb{I}\{(x(t), q(t)) = (x, q)\} + \lambda\, \mathbb{I}\{(x(t), q(t)) = (x, q),\, u(t) = 1\}\right] - J_X(\pi_X^*) \\ &\overset{(a)}{=} \sum_{q=1}^{Q} \sum_{x=1}^{\infty} \big(f(x)\, \rho_{x,q}^X + \lambda\, \gamma_{x,q}^X\big) - J_X(\pi_X^*) \\ &= \sum_{q=1}^{Q} \sum_{x=1}^{X-1} \big(f(x)\, \rho_{x,q}^X + \lambda\, \gamma_{x,q}^X\big) + \sum_{q=1}^{Q} \sum_{x=X}^{\infty} \big(f(x)\, \rho_{x,q}^X + \lambda\, \gamma_{x,q}^X\big) - \sum_{q=1}^{Q} \sum_{x=1}^{X} \big(f(x)\, \tilde{\mu}_{x,q}^* + \lambda\, \tilde{y}_{x,q}^*\big) \\ &\overset{(b)}{=} \sum_{x=X}^{\infty} \big(f(x) - f(X)\big) \sum_{q=1}^{Q} \rho_{x,q}^X, \end{aligned} \tag{A15}$$
where (a) holds because of Equations (A11) and (A12), and (b) holds because of Equations (A13) and (A14).
Next, we upper bound the term $\sum_{q=1}^{Q} \rho_{x,q}^X$. For brevity, denote $\boldsymbol{\rho}_x^X = [\rho_{x,1}^X, \rho_{x,2}^X, \ldots, \rho_{x,Q}^X]^T$. Then we have
$$\boldsymbol{\rho}_{x+1}^X = \boldsymbol{\alpha}\, \boldsymbol{\rho}_x^X, \quad \forall x \ge \tau_{\max}, \tag{A16}$$
where the matrix $\boldsymbol{\alpha}$ is obtained from Equation (14) and can be bounded entrywise as
$$\boldsymbol{\alpha} = \begin{bmatrix} \varepsilon_1 p_{1,1} & \varepsilon_2 p_{2,1} & \cdots & \varepsilon_Q p_{Q,1} \\ \varepsilon_1 p_{1,2} & \varepsilon_2 p_{2,2} & \cdots & \varepsilon_Q p_{Q,2} \\ \vdots & \vdots & \ddots & \vdots \\ \varepsilon_1 p_{1,Q} & \varepsilon_2 p_{2,Q} & \cdots & \varepsilon_Q p_{Q,Q} \end{bmatrix} \preceq \varepsilon_{\max}\, \mathbf{P}^T. \tag{A17}$$
Therefore, $\sum_{q=1}^{Q} \rho_{x,q}^X$ can be bounded as
$$\sum_{q=1}^{Q} \rho_{x,q}^X = \mathbf{1}^T \boldsymbol{\rho}_x^X \overset{(a)}{=} \mathbf{1}^T \boldsymbol{\alpha}^{x - \tau_{\max}} \boldsymbol{\rho}_{\tau_{\max}}^X \overset{(b)}{\le} \varepsilon_{\max}^{x - \tau_{\max}}\, \mathbf{1}^T (\mathbf{P}^T)^{x - \tau_{\max}} \boldsymbol{\rho}_{\tau_{\max}}^X \overset{(c)}{\le} \varepsilon_{\max}^{x - \tau_{\max}}, \tag{A18}$$
where $\mathbf{1}$ is the all-one vector, (a) holds because of Equation (A16), (b) holds due to Equation (A17), and (c) holds because $\mathbf{1}$ is an eigenvector of $\mathbf{P}$.
Plugging Equation (A18) into Equation (A15), we have
$$J(\pi^*) - J_X(\pi_X^*) \le \sum_{x=X}^{\infty} k \big(\epsilon^{-x} - \epsilon^{-X}\big)\, \varepsilon_{\max}^{x - \tau_{\max}} \le \sum_{x=X+1}^{\infty} k\, \varepsilon_{\max}^{\frac{x}{2} - \tau_{\max}} = \frac{k\, \varepsilon_{\max}^{\frac{X+1}{2} - \tau_{\max}}}{1 - \sqrt{\varepsilon_{\max}}},$$
where the second inequality uses $\epsilon \ge \sqrt{\varepsilon_{\max}}$ and the fact that the $x = X$ term vanishes.

Appendix F. Proof of Theorem 4

According to Lemma 2, the optimal policy for every decoupled single-sensor problem has a threshold structure. Let $\tau_q^n$ be the threshold of sensor $n$ in channel state $q$. Denote by $\Gamma_n = \max_q \tau_q^n - \min_q \tau_q^n$ the largest difference between the thresholds of sensor $n$, and let $\Gamma = \max_n \Gamma_n$. As all sensors are identical, $\Gamma$ does not change with $N$. Moreover, let $\bar{S}(t) = \mathbb{E}_{\pi_X^*}[|S(t)|]$.
Suppose that sensor $n$ is not scheduled under $\hat{\pi}_X$ although $u_n(t) = 1$. Consider the probability that it is still not scheduled in the next slot, which can happen for two reasons: the channel jumps into a state with a higher scheduling threshold, or there are again more than $M$ sensors to be scheduled. Let $p$ be the probability that the channel jumps into a state with a higher scheduling threshold. Then the probability of idling in the next slot, denoted by $p_{\text{idle}}$, can be computed as
$$p_{\text{idle}} = p + (1 - p)\, \frac{N - M}{N} = p\, \frac{M}{N} + \frac{N - M}{N}.$$
Notice that $p$ can be upper bounded as
$$p \le \max_{n,q} \sum_{q' = 1,\, q' \ne q}^{Q} p_{qq'}^n = \max_{n,q} (1 - p_{qq}^n).$$
Therefore, $p_{\text{idle}}$ can be upper bounded by $z$, which is
$$p_{\text{idle}} \le z = \frac{M}{N} \max_{n,q} (1 - p_{qq}^n) + \frac{N - M}{N}.$$
It follows that the probability that the sensor remains unscheduled for $k$ consecutive slots is upper bounded by $z^{(k - \Gamma_n)^+}$, where $(\cdot)^+ = \max(\cdot, 0)$.
Now, we bound the performance gap between $\hat{\pi}_X$ and $\pi_X^*$ by introducing an auxiliary policy $\tilde{\pi}_X$. Under $\tilde{\pi}_X$, when $|S(t)| > M$, all these sensors are scheduled as under $\pi_X^*$, but an extra penalty $a_{x_n(t)}$ is added for those sensors, computed as
$$a_{x_n(t)} = \sum_{k=0}^{\infty} z^{(k - \Gamma_n)^+} \big(f(x_n(t+k)) - f(k)\big) \overset{(a)}{\le} \sum_{k=0}^{\infty} z^{(k - \Gamma_n)^+} \big(f(x_n(t)) - f(0)\big) \le \frac{\Gamma_n + 1}{1 - z}\, f(x_n(t)) \le \frac{\Gamma + 1}{1 - z}\, f(x_n(t)),$$
where (a) holds because $f(x)$ is concave.
For simplicity, denote $A = \frac{\Gamma + 1}{1 - z}$, which is a constant once the channel transition matrix and the power constraint are fixed.
As the age penalty function $f(x)$ is concave, the average age penalty under $\tilde{\pi}_X$ is no smaller than under $\hat{\pi}_X$. Then, the difference between $J(\hat{\pi}_X)$ and $J(\pi_X^*)$ can be bounded as
$$J(\hat{\pi}_X) - J(\pi_X^*) \le J(\tilde{\pi}_X) - J(\pi_X^*) = \lim_{T \to \infty} \mathbb{E}_{\pi_X^*}\!\left[\frac{1}{NT} \sum_{n=1}^{N} \sum_{t=1}^{T} a_{x_n(t)}\, \frac{(|S(t)| - M)^+}{M}\right] \le \lim_{T \to \infty} \mathbb{E}_{\pi_X^*}\!\left[\frac{1}{MNT} \sum_{n=1}^{N} \sum_{t=1}^{T} A\, f(x_n(t))\, \big||S(t)| - M\big|\right],$$
Notice that when $x > X$, the optimal action is to schedule; hence, for $x > X$, the steady-state probability of AoI $x$ is upper bounded by $(\max_{n,q} \varepsilon_{n,q})^{x - X}$. For simplicity, let $\rho = \max_{n,q} \varepsilon_{n,q}$. Then, for any $\epsilon > 0$, there exists $k = \lceil \frac{\ln \epsilon}{\ln \rho} \rceil$ such that the steady-state distribution $\mu_{x,q}^n$, for $x > X + k$ and all $n$, can be bounded as
$$\sum_{q=1}^{Q} \mu_{x,q}^n \le \rho^{x - X} = \rho^{k}\, \rho^{x - (X + k)} \le \epsilon\, \rho^{x - (X + k)}.$$
Then, we have
$$\begin{aligned} J(\hat{\pi}_X) - J(\pi_X^*) &\le \lim_{T \to \infty} \frac{1}{MNT}\, \mathbb{E}_{\pi_X^*}\!\left[\sum_{n=1}^{N} \sum_{t=1}^{T} \mathbb{I}\{x_n(t) \le X + k\}\, A f(x_n(t))\, \big||S(t)| - M\big|\right] \\ &\quad + \lim_{T \to \infty} \frac{1}{MNT}\, \mathbb{E}_{\pi_X^*}\!\left[\sum_{n=1}^{N} \sum_{t=1}^{T} \mathbb{I}\{x_n(t) > X + k\}\, A f(x_n(t))\, \big||S(t)| - M\big|\right] \\ &\le \lim_{T \to \infty} \frac{A f(X + k)}{M}\, \mathbb{E}_{\pi_X^*}\!\left[\frac{1}{T} \sum_{t=1}^{T} \big||S(t)| - \bar{S}(t)\big| + \big|\bar{S}(t) - M\big|\right] + \lim_{T \to \infty} \frac{1}{MNT}\, \mathbb{E}_{\pi_X^*}\!\left[\sum_{n=1}^{N} \sum_{t=1}^{T} \mathbb{I}\{x_n(t) > X + k\}\, A N f(x_n(t))\right] \\ &= \lim_{T \to \infty} \frac{A f(X + k)}{\kappa N}\, \mathbb{E}_{\pi_X^*}\!\left[\frac{1}{T} \sum_{t=1}^{T} \big||S(t)| - \bar{S}(t)\big| + \big|\bar{S}(t) - M\big|\right] + \frac{A}{\kappa}\, \mathbb{E}_{\pi_X^*}\!\left[\lim_{T \to \infty} \frac{1}{NT} \sum_{n=1}^{N} \sum_{t=1}^{T} \mathbb{I}\{x_n(t) > X + k\}\, f(x_n(t))\right]. \end{aligned} \tag{A19}$$
As $f(x)$ is concave, it can be upper bounded by a linear function, i.e., $f(x) \le m x, \forall x > X + k$, for some $m > 0$. Therefore, the second term in the above inequality can be further bounded as
$$\lim_{T \to \infty} \mathbb{E}_{\pi_X^*}\!\left[\frac{1}{NT} \sum_{n=1}^{N} \sum_{t=1}^{T} \mathbb{I}\{x_n(t) > X + k\}\, f(x_n(t))\right] = \frac{1}{N} \sum_{n=1}^{N} \sum_{x=X+k+1}^{\infty} \sum_{q=1}^{Q} \mu_{x,q}^n\, f(x) \le \sum_{x=X+k+1}^{\infty} m\, \epsilon\, \rho^{x - (X+k)}\, x = m \epsilon \left[\frac{\rho\, (X + k + 1)}{1 - \rho} + \frac{\rho^2}{(1 - \rho)^2}\right].$$
By choosing $\epsilon = N^{-1}$ and $k = O(\ln N)$, the second term in Equation (A19) converges to 0 as $N$ becomes large.
For the first term in Equation (A19), according to the work in [27], the expectation of $\big||S(t)| - \bar{S}(t)\big|$ has the following property,
$$\lim_{T \to \infty} \mathbb{E}_{\pi_X^*}\!\left[\frac{1}{T} \sum_{t=1}^{T} \big||S(t)| - \bar{S}(t)\big|\right] = O(\sqrt{N}).$$
In addition, as policy $\pi_X^*$ satisfies the relaxed bandwidth constraint, we have
$$\lim_{T \to \infty} \mathbb{E}_{\pi_X^*}\!\left[\frac{1}{T} \sum_{t=1}^{T} \big|\bar{S}(t) - M\big|\right] = 0.$$
Therefore, as $T \to \infty$, $J(\hat{\pi}_X) - J(\pi_X^*)$ can be upper bounded by
$$O\!\left(\frac{X + \ln N}{\sqrt{N}}\right) + O\!\left(\frac{X + 1 + \ln N}{N}\right).$$
As the threshold $X$ does not increase with $N$, $J(\hat{\pi}_X) - J(\pi_X^*)$ converges to 0 as $N \to \infty$. This proves the asymptotic performance of the truncated policy.

References

  1. Kaul, S.; Gruteser, M.; Rai, V.; Kenney, J. Minimizing age of information in vehicular networks. In Proceedings of the 2011 8th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks, Salt Lake City, UT, USA, 27–30 June 2011; pp. 350–358.
  2. Zhou, B.; Saad, W. Optimal Sampling and Updating for Minimizing Age of Information in the Internet of Things. In Proceedings of the 2018 IEEE Global Communications Conference (GLOBECOM), Abu Dhabi, UAE, 9–13 December 2018; pp. 1–6.
  3. Kaul, S.; Yates, R.; Gruteser, M. Real-time status: How often should one update? In Proceedings of the 2012 Proceedings IEEE INFOCOM, Orlando, FL, USA, 25–30 March 2012; pp. 2731–2735.
  4. Chen, K.; Huang, L. Age-of-information in the presence of error. In Proceedings of the 2016 IEEE International Symposium on Information Theory (ISIT), Barcelona, Spain, 10–15 July 2016; pp. 2579–2583.
  5. Costa, M.; Codreanu, M.; Ephremides, A. On the Age of Information in Status Update Systems with Packet Management. IEEE Trans. Inf. Theory 2016, 62, 1897–1910.
  6. Najm, E.; Yates, R.; Soljanin, E. Status updates through M/G/1/1 queues with HARQ. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 131–135.
  7. Kosta, A.; Pappas, N.; Ephremides, A.; Angelakis, V. Age and value of information: Non-linear age case. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 326–330.
  8. Wang, X.; Duan, L. Dynamic Pricing for Controlling Age of Information. In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 962–966.
  9. Sun, Y.; Uysal-Biyikoglu, E.; Yates, R.; Koksal, C.E.; Shroff, N.B. Update or wait: How to keep your data fresh. IEEE Trans. Inf. Theory 2017, 63, 7492–7508.
  10. Ceran, E.T.; Gündüz, D.; György, A. Average Age of Information With Hybrid ARQ Under a Resource Constraint. IEEE Trans. Wirel. Commun. 2019, 18, 1900–1913.
  11. Bacinoglu, B.T.; Ceran, E.T.; Uysal-Biyikoglu, E. Age of information under energy replenishment constraints. In Proceedings of the 2015 Information Theory and Applications Workshop (ITA), San Diego, CA, USA, 1–6 February 2015; pp. 25–31.
  12. Kadota, I.; Uysal-Biyikoglu, E.; Singh, R.; Modiano, E. Minimizing the Age of Information in broadcast wireless networks. In Proceedings of the 2016 54th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 27–30 September 2016; pp. 844–851.
  13. Maatouk, A.; Kriouile, S.; Assaad, M.; Ephremides, A. On the Optimality of the Whittle's Index Policy for Minimizing the Age of Information. arXiv 2020, arXiv:2001.03096.
  14. Kadota, I.; Sinha, A.; Modiano, E. Scheduling Algorithms for Optimizing Age of Information in Wireless Networks with Throughput Constraints. IEEE/ACM Trans. Netw. 2019, 27, 1359–1372.
  15. Kadota, I.; Sinha, A.; Uysal-Biyikoglu, E.; Singh, R.; Modiano, E. Scheduling Policies for Minimizing Age of Information in Broadcast Wireless Networks. IEEE/ACM Trans. Netw. 2018, 26, 2637–2650.
  16. Bedewy, A.M.; Sun, Y.; Shroff, N.B. Age-optimal information updates in multihop networks. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 576–580.
  17. Chen, X.; Gatsis, K.; Hassani, H.; Bidokhti, S.S. Age of Information in Random Access Channels. In Proceedings of the 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 21–26 June 2020.
  18. Hsu, Y.; Modiano, E.; Duan, L. Age of information: Design and analysis of optimal scheduling algorithms. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 561–565.
  19. Talak, R.; Kadota, I.; Karaman, S.; Modiano, E. Scheduling Policies for Age Minimization in Wireless Networks with Unknown Channel State. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; pp. 2564–2568.
  20. Sun, Y.; Polyanskiy, Y.; Uysal, E. Sampling of the Wiener Process for Remote Estimation over a Channel with Random Delay. IEEE Trans. Inf. Theory 2019, 66, 1118–1135.
  21. Wang, J.; Ren, X.; Mo, Y.; Shi, L. Whittle Index Policy for Dynamic Multichannel Allocation in Remote State Estimation. IEEE Trans. Autom. Control 2020, 65, 591–603.
  22. Tang, H.; Wang, J.; Song, L.; Song, J. Minimizing Age of Information With Power Constraints: Multi-User Opportunistic Scheduling in Multi-State Time-Varying Channels. IEEE J. Sel. Areas Commun. 2020, 38, 854–868.
  23. Wu, S.; Han, D.; Cheng, P.; Shi, L. Optimal Scheduling of Multiple Sensors over Lossy and Bandwidth Limited Channels. arXiv 2018, arXiv:1804.05618.
  24. Beutler, F.J.; Ross, K.W. Optimal policies for controlled Markov chains with a constraint. J. Math. Anal. Appl. 1985, 112, 236–252.
  25. Tajan, R.; Ciblat, P. Information-theoretic multi-user power adaptation in retransmission schemes. In Proceedings of the 2016 IEEE 17th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Edinburgh, UK, 3–6 July 2016; pp. 1–5.
  26. Hernández-Lerma, O.; Lasserre, J.B. Discrete-Time Markov Control Processes: Basic Optimality Criteria; Springer Science & Business Media: Berlin, Germany, 2012; Volume 30.
  27. Diaconis, P.; Zabell, S. Closed form summation for classical distributions: Variations on a theme of de Moivre. Stat. Sci. 1991, 6, 284–302.
Figure 1. Illustration of the state transition graph for Q = 2 channel states without (top) and with (bottom) AoI truncation at threshold X = 3. The numbers in circles are the channel state index q, and the numbers in rectangles are the AoI index x.
Figure 2. Average age penalty performance as a function of the number of sensors N, M = {5, 15}.
Figure 3. Average age penalty performance as a function of the number of sensors N, compared with the AoI-minimizing policy.
Figure 4. Asymptotic average age penalty performance with different age penalty functions f(x) = {ln x, x}. κ = 1/8.
Figure 5. Asymptotic average age penalty performance with different packet loss probabilities. ε = {[0.1, 0.3, 0.2, 0.4], [0.6, 0.55, 0.5, 0.45]}. κ = 1/8.
Figure 6. Each sensor's average penalty with different power constraint factors ρ_n = 0.2n.
Figure 7. Each sensor's average penalty with different packet loss probabilities ε_{n,q} = 0.05(n + q − 1).
Figure 8. Subfigures (a–c) demonstrate the scheduling probability of sensors whose packet loss probabilities are given in the titles, for N = 8 and M = 2. Subfigures (d–f) demonstrate their scheduling probabilities in the corresponding single-sensor systems. The power constraint factor is ρ_n = 0.6 for all sensors.
Figure 9. Threshold comparison with different age penalty functions.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
