Distributed Offloading for Multi-UAV Swarms in MEC-Assisted 5G Heterogeneous Networks

Ma, Mingfang; Wang, Zhengming

doi:10.3390/drones7040226


by Mingfang Ma and Zhengming Wang *

College of Science, National University of Defense Technology, Changsha 410073, China

* Author to whom correspondence should be addressed.

Drones 2023, 7(4), 226; https://0-doi-org.brum.beds.ac.uk/10.3390/drones7040226

Submission received: 22 February 2023 / Revised: 18 March 2023 / Accepted: 20 March 2023 / Published: 24 March 2023

(This article belongs to the Special Issue Multi-UAV Networks)


Abstract

Mobile edge computing (MEC) is a novel paradigm that offers numerous possibilities for Internet of Things (IoT) applications. In typical use cases, unmanned aerial vehicles (UAVs), which can be applied to monitoring and logistics, have received wide attention. However, because of their high maneuverability, limited computational capability, and limited battery energy, UAVs need to offload computation-intensive tasks to ensure the quality of service. In this paper, we address this problem for UAV systems in a 5G heterogeneous network environment by proposing an innovative distributed framework that jointly considers transmission assessment and task offloading. Specifically, we devise a fuzzy logic-based offloading assessment mechanism at the UAV side, which can adaptively avoid risky wireless links based on the motion state of a UAV and transmission performance metrics. We introduce a multi-agent advantage actor–critic deep reinforcement learning (DRL) framework to enable the UAVs to optimize the system utility by learning the best policies from the environment. This requires decisions on computing modes as well as the choices of radio access technologies (RATs) and MEC servers in the case of offloading. The results validate the convergence and applicability of our scheme. Compared with the benchmarks, the proposed scheme performs better in several respects, including reduced task completion delay and energy consumption.

Keywords:

unmanned aerial vehicle; heterogeneous networks; computation offloading; fuzzy logic; deep reinforcement learning

1. Introduction

Unmanned aerial vehicles (UAVs) have become popular in recent years, thanks to their mobility, flexibility, and low cost [1,2]. For instance, when UAV swarms are equipped with sensing devices, they become candidates for rapid computing and communication in scenarios such as reconnaissance, property surveillance, transportation, and agriculture 4.0 [3,4]. In the future, UAV swarms will play a more prominent role in enhancing existing services and enabling new ones [5]. However, novel services tend to be more computation-intensive, which is a significant challenge for UAVs with limited on-board computing power and battery capacity.

To alleviate the strain on device resources, multi-access edge computing (MEC) has received significant attention. By deploying powerful computing units at the edge of the network to serve devices, MEC can provide computation resources to UAV swarms in close proximity; as a result, both the transmission delay between them and the energy consumed locally are greatly reduced [6]. In other words, by utilizing MEC, UAV swarms are able to offload task data to edge servers via wireless transmission to assist computing.

The transmission link and network node selections need to be carefully considered in the task-offloading process [7]. Previous studies [8,9] have mostly considered offloading via cellular networks, which can cause base station (BS) overload and network congestion in the face of large-scale UAV swarm connections. Fortunately, thanks to the recently popular 5G heterogeneous network architecture, which integrates different radio access technologies (RATs), UAVs can also offload tasks through access points (APs) of deployed Wi-Fi networks, thus reducing the pressure on a single cellular network and improving the utilization of available network resources [10]. As a result, different UAV tasks face many choices when selecting the target network nodes to request services, and facilities in close proximity are not always the best choice. Notably, although the set of link choices for a UAV in a network-sparse environment is small and relatively fixed [11], a critical concern of this paper is how flying UAVs adaptively evaluate and select transmission links in a distributed manner to achieve flexible and stable offloading in hotspots with overlapping coverage of heterogeneous networks.

For task offloading, previous works have mainly focused on developing strategies under system certainty or have used centralized approaches when faced with environmental dynamics. In most cases, they fall short of solving the multi-UAV offloading problem in unknown environments, particularly when multiple heterogeneous network nodes are deployed and UAVs fly arbitrarily. In addition, the heuristics or dynamic programming methods commonly used to achieve optimal task-offloading solutions may be time-consuming due to the large number of iterations required. As a result, these approaches may not be suitable for real-time offloading decision-making in dynamic environments. Reinforcement learning (RL) has the potential to alleviate these excessive computational demands by letting the agents learn their policies. Previous online schemes based on RL have coped with system uncertainty to a certain extent, but their offloading strategies are made either centrally by the system or independently by each agent.

In this paper, we propose a distributed task-offloading scheme for multiple UAVs in MEC-assisted heterogeneous networks with the objective of maximizing the utilities of all UAVs for processing tasks through multi-UAV collaboration. To prevent UAVs from offloading via easily disconnected communication links and poorly performing service nodes, we propose an offloading assessment mechanism for UAV swarms based on fuzzy logic. In the framework, UAV velocity and transmission quality are jointly considered, and UAVs can make assessments locally and efficiently based on the perceived information. Subsequently, we design an offloading algorithm by applying deep reinforcement learning (DRL), which adopts multi-agent advantage actor–critic (A2C) policy optimization to automatically and effectively work out the optimal solution, so as to reduce the task completion time and energy consumption of a UAV swarm in a MEC environment.

The contributions of this paper are summarized as follows:

We introduce a multi-agent task-offloading model in a heterogeneous network environment (which is different from the existing works that consider single-network scenarios or independent devices). Moreover, the optimization problem is formulated as a Markov decision process (MDP), which is beneficial for solving the sequential offloading decision-making for UAV swarms in dynamic environments.
To facilitate stable offloading of UAVs in any motion state, we devise a fuzzy logic-based offloading assessment mechanism. The mechanism is executed in a decentralized manner on the UAV with low complexity and can adaptively identify available offloading nodes that are prone to disconnection or have undesirable transmission quality.
Based on the multi-agent DRL framework, we propose a distributed offloading scheme named DOMUS. DOMUS enables each UAV to learn the joint optimal policy, that is, determining the computing mode and selecting the RATs and MEC servers in the offloading case.
We perform different numerical simulations to verify the rationality and efficiency of the DOMUS scheme. The evaluation results show that the proposed DOMUS is capable of rapidly converging to a stable reward and achieves the best offloading performance in energy consumption and delay compared with four other benchmarks.

The rest of the paper is structured as follows. Section 2 presents the related works on task offloading. Section 3 illustrates the system model, gives the mathematical formulation of the task computing model, formulates a utility model for the performance metrics of task computing, and defines the optimization problem. An offloading assessment mechanism based on fuzzy logic is devised in Section 4. In Section 5, we propose a distributed task-offloading algorithm by applying the multi-agent DRL framework. Finally, Section 6 demonstrates and compares the efficiency of the proposed scheme and Section 7 summarizes this paper. For ease of reference, the definitions of the key symbols are listed in Table 1.

2. Related Work

Effective offloading of computation-intensive application tasks for smart devices, especially UAVs, is becoming more critical. Accordingly, many studies related to task offloading have been proposed. In this section, we briefly outline the related work.

Some studies consider centralized controllers to realize offloading decisions. Li et al. [12] considered that the tasks performed in maritime environments have strict delay requirements; they designed a genetic-based offloading algorithm for energy-starved UAVs, which optimizes energy consumption under the task delay constraint. Guo et al. [13] studied task offloading in a MEC system and attempted to minimize the system overhead by expressing the offloading as a mixed-integer non-linear programming problem, proposing a heuristic algorithm based on a greedy policy. Zhang et al. [14] integrated latency and energy consumption to obtain the offloading utility, which was combined with simulated annealing to make offloading strategies in MEC, so as to enhance the utility. These efforts [12,13,14] have global coordination but require UAVs to upload private information related to the tasks executed and real-time status to enable centralized offloading decision-making. This significantly increases the burden on the centralized controller when the scale of the UAV swarm increases.

As network environments become larger and more complex, distributed frameworks are becoming more popular in some computation-offloading efforts [15,16]. Dai et al. [1] developed a vehicle-assisted offloading architecture for UAVs in smart cities, where vehicles and UAVs are matched according to preferences and the data offloading process is modeled as a bargaining game to enhance the offloading efficiency and optimize the system utility. Zhou et al. [17] modeled the interaction in offloading as a Stackelberg game and maximized the utility of the system. To select a suitable service provider for the offloaded task of a UAV, Gu et al. [18] devised an evolutionary game-based offloading approach to make a trade-off between latency, energy, and cost. However, these methods require multiple interactions and iterations of all participants to reach a satisfactory solution, and they are not always suitable for making real-time decisions, because the high maneuverability of UAVs can lead to rapid changes in environmental states.

Swarm intelligence is a popular approach used in multi-UAV systems and can enable global behavior to emerge from UAV clusters through operations such as interactions. As a result, swarm intelligence algorithms have received more attention in the implementation of UAV offloading. You et al. [19] introduced a computation-offloading scheme based on particle swarm optimization, which can offload tasks to low-latency MEC servers and balance the load on the servers. Li et al. [20] constructed an offloading model that aims to minimize the total delay of all UAVs under the constraint of consumed energy; they applied the bat algorithm to solve the model. In [21], Asaamoning et al. researched computation offloading in a networked control system consisting of UAVs and discussed the application of swarm intelligence approaches, such as ant colony optimization and bee colony optimization. In addition, these swarm intelligence approaches can help determine the optimal positions of drone base stations, which can provide support for drones to act as base stations in the next generation of the Internet of Things [22,23].

Some studies on computation offloading in MEC tend to leverage reinforcement learning because of its strength in adapting to dynamic environments. Chen et al. [24] constructed a task-offloading architecture based on deep deterministic policy gradients to optimize the offloading performance. Different from the schemes in [24,25,26], our work devises a distributed decision-making mechanism by leveraging the multi-agent DRL framework, which can collaboratively deal with the optimization of offloading policies for multiple UAVs with heterogeneous tasks. Although there are distributed approaches for computation offloading decision-making that apply reinforcement learning, such as Q-TOMEC [27], TORA [28], and a distributed offloading technique based on deep Q-learning [29], these approaches utilize parallel deep neural networks instead of considering collaboration among agents.

This paper considers the popular 5G heterogeneous network architecture rather than a single network, as considered in many papers [8,9,30,31]. Correspondingly, it is important for highly maneuverable UAVs to evaluate and choose the appropriate offloading link among many heterogeneous networks, because an improper offloading selection may lead to frequent service interruptions, network hand-offs, and transmission link failures. However, existing offloading schemes are only concerned with task deadlines, energy consumed, or a balance between the two. In order to make the offloading scheme effective, we propose an offloading assessment mechanism that jointly considers the effects of transmission quality and UAV mobility to ensure efficient data transmission. Furthermore, we design the mechanism to be fully decentralized on the UAV side, so that it scales well. To our knowledge, this is the first attempt to research link evaluation during the task offloading of a UAV swarm.

3. System Model and Problem Definition

In this section, we first depict the system model (Section 3.1), and present the mathematical formulation of task computing models (Section 3.2). Then we formulate a utility function to evaluate the critical attributes that can affect the decision-making of UAVs (Section 3.3). Finally, according to the system model and task computing models, we define the optimization problem to be solved in this paper (Section 3.4).

3.1. System Model

The system in Figure 1 shows a flying UAV swarm that is defined by a group of agents $\mathcal{U} = \{1, \dots, U\}$. In addition, the wireless network is supported by heterogeneous RATs, including Wi-Fi APs and cellular BSs; each is equipped with a MEC server that can provide computational capability for the energy-constrained UAV swarms to process their tasks. We let $\mathcal{M} = \{1, \dots, M\}$ indicate a set of geo-distributed MEC servers.

Furthermore, each UAV $u \in \mathcal{U}$ has a task to process at a certain time; we use a tuple $\kappa_{u} = \{d_{u,\kappa}, c_{u,\kappa}\}$ to express the task of UAV u, where $d_{u,\kappa}$ is the data size of the task $\kappa_{u}$ and $c_{u,\kappa}$ is the total computation resources required to complete $\kappa_{u}$. Moreover, the application tasks place tight requirements on quality of service (QoS) attributes, such as delay, bit error rate (BER), and packet loss rate (PLR).

In view of the above, each UAV in the system can perform its task $\kappa_{u}$ by computing locally or by offloading to a MEC server through a cellular BS or a Wi-Fi AP. Correspondingly, when UAV u performs its task $\kappa_{u}$, two binary variables ($\alpha_{u,\kappa}^{1}$ and $\alpha_{u,\kappa}^{2}$) characterize the decisions made by UAV u:

$$\alpha_{u,\kappa}^{1} = \begin{cases} 0, & \text{local computing} \\ 1, & \kappa_{u} \text{ is offloaded} \end{cases} \tag{1}$$

$$\alpha_{u,\kappa}^{2} = \begin{cases} 0, & \kappa_{u} \text{ is offloaded to a BS server} \\ 1, & \kappa_{u} \text{ is offloaded to a Wi-Fi server} \end{cases} \tag{2}$$

Here, $\alpha_{u,\kappa}^{1}$ indicates whether the task $\kappa_{u}$ is computed locally or offloaded. The second decision, $\alpha_{u,\kappa}^{2} = 0$ or 1, means that task $\kappa_{u}$ is offloaded to a MEC server attached to a cellular BS or a Wi-Fi AP, respectively, and applies only when $\alpha_{u,\kappa}^{1} = 1$. These different task computing modes enable the UAVs to implement tasks efficiently and obtain good service performance.

3.2. Task Computing Models

Considering the limitations on the energy and computational capabilities of UAVs, it is important to study the hybrid task computing modes to support the UAVs in this paper.

(1): Local computing model

After the UAV u perceives the task data, it may process the task locally. The local execution time can be computed as

$$l_{u,\kappa}^{\mathrm{loc}} = \frac{c_{u,\kappa}}{\lambda_{u}} \tag{3}$$

where $\lambda_{u}$ indicates the computational capability of the UAV u and $c_{u,\kappa}$ denotes the computation resources (CPU cycles) needed to complete the task $\kappa$.

Let $e_{u,\kappa}^{\mathrm{loc}}$ denote the energy consumed by local computing, which is given by

$$e_{u,\kappa}^{\mathrm{loc}} = \rho_{u}^{e} c_{u,\kappa} \tag{4}$$

where $\rho_{u}^{e}$ is the local energy consumption coefficient per CPU cycle.
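As a quick illustration of Equations (3) and (4), the following minimal sketch computes the local delay and energy for one task. The function names and the numeric values (a 2e9-cycle task, a 1 GHz on-board processor, 1e-9 J per cycle) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def local_delay(cycles: float, capability: float) -> float:
    """Eq. (3): local execution time = required CPU cycles / UAV capability."""
    if capability <= 0:
        raise ValueError("capability must be positive")
    return cycles / capability


def local_energy(cycles: float, energy_per_cycle: float) -> float:
    """Eq. (4): local energy = energy coefficient per cycle * required cycles."""
    return energy_per_cycle * cycles


# Hypothetical task: 2e9 CPU cycles, 1e9 cycles/s on board, 1e-9 J per cycle.
delay_s = local_delay(2e9, 1e9)
energy_j = local_energy(2e9, 1e-9)
```

With these illustrative numbers, the task takes about 2 s and consumes about 2 J when computed locally.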

(2): MEC offloading model

In this model, we consider that more than one UAV may offload tasks to the same MEC server in the same time period. In this case, if the UAV u performs the task $\kappa$ by MEC, the achieved data transmission rates via the cellular and Wi-Fi networks are denoted by $\xi_{u,\kappa}^{c}$ and $\xi_{u,\kappa}^{w}$, which are given in Equations (5) and (6), respectively [32].

$$\xi_{u,\kappa}^{c} = B_{u}^{c} \cdot \log_{2}\left(1 + \frac{P_{u,\kappa}^{c} G_{u,\kappa}^{c}}{(\sigma_{u,\kappa}^{c})^{2} + \sum_{u' \neq u} P_{u',\kappa'}^{c} G_{u',\kappa'}^{c}}\right) \tag{5}$$

$$\xi_{u,\kappa}^{w} = B_{u}^{w} \cdot \log_{2}\left(1 + \frac{P_{u,\kappa}^{w} G_{u,\kappa}^{w}}{(\sigma_{u,\kappa}^{w})^{2} + \sum_{u' \neq u} P_{u',\kappa'}^{w} G_{u',\kappa'}^{w}}\right) \tag{6}$$

In Equation (5), $P_{u,\kappa}^{c}$ is the transmission power of the UAV u for offloading the task to the MEC server m via cellular connectivity; $G_{u,\kappa}^{c} = d_{u,m}^{-\iota}$ is the channel gain due to the path loss effect and shadowing, where $\iota$ is the path loss coefficient and $d_{u,m} = \sqrt{\mathit{dv}_{u,m}^{2} + \mathit{dh}_{u,m}^{2}}$ is the distance between the UAV u and the server m, with $\mathit{dv}_{u,m}$ and $\mathit{dh}_{u,m}$ indicating the vertical and horizontal distances between them, respectively; $(\sigma_{u,\kappa}^{c})^{2}$ denotes the noise power of the channel; $u'$ denotes any other UAV that accesses the server m to process its task $\kappa'$; and $B_{u}^{c}$ is the bandwidth allocated by the cellular network. The variables in Equation (6) have the same meanings as those in Equation (5), applied to the Wi-Fi network.
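The rate model in Equations (5) and (6) has the same form for both RATs, so a single helper can cover both. The sketch below is only an illustration under stated assumptions: the function names, the tuple layout for interferers, and all numeric values are hypothetical, and powers are taken in linear units (watts), not dB.

```python
import math


def channel_gain(d_vertical: float, d_horizontal: float, path_loss_exp: float) -> float:
    """G = d^(-iota), with d the 3D distance built from vertical and horizontal offsets."""
    distance = math.hypot(d_vertical, d_horizontal)
    return distance ** (-path_loss_exp)


def transmission_rate(bandwidth_hz, tx_power_w, gain, noise_power_w, interferers=()):
    """Eqs. (5)/(6): B * log2(1 + P*G / (sigma^2 + sum of P'*G' over other UAVs)).

    `interferers` is a hypothetical iterable of (power, gain) pairs for the
    other UAVs u' that access the same server.
    """
    interference = sum(p * g for p, g in interferers)
    sinr = tx_power_w * gain / (noise_power_w + interference)
    return bandwidth_hz * math.log2(1.0 + sinr)


# Hypothetical geometry: 30 m above and 40 m away horizontally gives d = 50 m.
g = channel_gain(30.0, 40.0, 2.0)
alone = transmission_rate(1e6, 0.5, g, 1e-9)
shared = transmission_rate(1e6, 0.5, g, 1e-9, interferers=[(0.5, g)])
```

Comparing `alone` with `shared` shows the effect of the interference sum in the denominator: a second UAV on the same server lowers the achievable rate.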

Then the transmission time $l_{u,\kappa,m}^{\mathrm{tr}}$ of the task data for the UAV u can be represented as

$$l_{u,\kappa,m}^{\mathrm{tr}} = \begin{cases} l_{u,\kappa}^{c} = \dfrac{d_{u,\kappa}}{\xi_{u,\kappa}^{c}}, & \alpha_{u,\kappa}^{1} = 1,\ \alpha_{u,\kappa}^{2} = 0 \\ l_{u,\kappa}^{w} = \dfrac{d_{u,\kappa}}{\xi_{u,\kappa}^{w}}, & \alpha_{u,\kappa}^{1} = 1,\ \alpha_{u,\kappa}^{2} = 1 \end{cases} \tag{7}$$

Here, $l_{u,\kappa,m}^{\mathrm{tr}}$ is a general variable; it equals the transmission time $l_{u,\kappa}^{c}$ or $l_{u,\kappa}^{w}$ when the data are transmitted through the cellular or Wi-Fi network, respectively.

Accordingly, if the transmission power of the UAV u is indicated by $P_{u}$, with $P_{u} \in \{P_{u,\kappa}^{c}, P_{u,\kappa}^{w}\}$, the energy $e_{u,\kappa}^{\mathrm{mec}}$ consumed by the UAV u during data transmission can be written as

$$e_{u,\kappa}^{\mathrm{mec}} = P_{u}\, l_{u,\kappa,m}^{\mathrm{tr}} \tag{8}$$

After the task data are transmitted to the server m, similar to the local computing model, the data processing time $l_{u,\kappa,m}^{\mathrm{exe}}$ on the server m can be represented by

$$l_{u,\kappa,m}^{\mathrm{exe}} = \frac{c_{u,\kappa}}{\lambda_{m}} \tag{9}$$

in which $\lambda_{m}$ denotes the computational capability of the server m. Thereby, the total time consumed by the UAV u during offloading is expressed as

$$l_{u,\kappa}^{\mathrm{mec}} = l_{u,\kappa,m}^{\mathrm{tr}} + l_{u,\kappa,m}^{\mathrm{exe}} \tag{10}$$
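Equations (7) to (10) chain together into one offloading cost. The following sketch is a minimal illustration, assuming the rate has already been obtained from Equation (5) or (6); the function name and all numbers are hypothetical.

```python
def offload_cost(data_bits, cycles, rate_bps, tx_power_w, server_capability):
    """Return (total delay, UAV-side energy) for one offloaded task.

    Eq. (7): transmission time = data size / rate
    Eq. (8): transmission energy = transmit power * transmission time
    Eq. (9): execution time on the server = cycles / server capability
    Eq. (10): total delay = transmission time + execution time
    """
    if rate_bps <= 0 or server_capability <= 0:
        raise ValueError("rate and server capability must be positive")
    t_tr = data_bits / rate_bps
    e_tx = tx_power_w * t_tr
    t_exe = cycles / server_capability
    return t_tr + t_exe, e_tx


# Hypothetical task: 8e6 bits, 2e9 cycles, 4 Mbit/s link, 0.5 W, 1e10 cycles/s server.
total_delay_s, energy_j = offload_cost(8e6, 2e9, 4e6, 0.5, 1e10)
```

With these illustrative values the transmission takes 2 s, the server execution 0.2 s, and the UAV spends 1 J on transmission. In the model above, only the transmission energy is charged to the UAV when it offloads.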

3.3. Utility Model in Task Computing

According to utility theory, we design a utility function to evaluate the time and energy consumed when processing tasks. The function is formulated as

$$\phi(z) = \begin{cases} 1, & z = 0 \\ \dfrac{1}{1 + \left(\frac{z}{z^{\mathrm{mid}}}\right)^{\eta_{z}}}, & 0 < z \leq z^{\mathrm{mid}} \\ \dfrac{\left(\frac{z^{\mathrm{max}} - z}{z^{\mathrm{mid}}}\right)^{\eta_{z}}}{1 + \left(\frac{z^{\mathrm{max}} - z}{z^{\mathrm{mid}}}\right)^{\eta_{z}}}, & z^{\mathrm{mid}} < z < z^{\mathrm{max}} \\ 0, & z \geq z^{\mathrm{max}} \end{cases} \tag{11}$$

The value of $\phi(z)$ is mainly determined by the properties of the designed function, such as twice differentiability, monotonicity, and convexity-concavity, as well as by the tolerable upper bound $z^{\mathrm{max}}$ of the attribute z for an application task. Moreover, $z^{\mathrm{mid}} = \frac{z^{\mathrm{max}}}{2}$, and $\eta_{z} \geq 2$ characterizes the sensitivity of the application task to a specific attribute and determines the steepness of the function.
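To make the shape of Equation (11) concrete, the sketch below evaluates it directly. The function name, the default $\eta_z = 2$, and the sample bound of 10 are assumptions for illustration; rejecting negative inputs is also a choice made here, since the source only defines the function for $z \geq 0$.

```python
def utility(z: float, z_max: float, eta: float = 2.0) -> float:
    """Piecewise utility of Eq. (11) with z_mid = z_max / 2."""
    if z_max <= 0 or eta < 2:
        raise ValueError("z_max must be positive and eta must be at least 2")
    if z < 0:
        raise ValueError("z must be non-negative")  # assumption: undefined in the source
    z_mid = z_max / 2.0
    if z == 0:
        return 1.0
    if z >= z_max:
        return 0.0
    if z <= z_mid:
        return 1.0 / (1.0 + (z / z_mid) ** eta)
    x = ((z_max - z) / z_mid) ** eta
    return x / (1.0 + x)


# Hypothetical delay bound of 10 s: utility falls from 1 to 0 as delay grows.
samples = [utility(z, 10.0) for z in (0.0, 2.0, 5.0, 8.0, 10.0)]
```

The two middle branches meet at $z = z^{\mathrm{mid}}$ with value 0.5, and the curve is symmetric around that point, so a cost of 2 and a cost of 8 under a bound of 10 give utilities that sum to 1.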

To this end, whichever task computing mode the UAV u adopts, the utility of the consumed time and energy can be measured by the utility function in Equation (11); we use $\phi(l)$ and $\phi(e)$ to express them, respectively. Then, on the basis of the multi-attribute utility principle, performing a task on the UAV u can be measured by the utility $F_{u}$, which is a function of the time and energy consumed when the UAV u chooses a certain task computing mode. $F_{u}$ is represented as follows:

$$F_{u} = w_{d,\kappa}\, \phi(l) + w_{e,\kappa}\, \phi(e) \tag{12}$$

in which $w_{d,\kappa}$ and $w_{e,\kappa}$ are the balance factors between the consumed time and energy, with $w_{d,\kappa} + w_{e,\kappa} = 1$.

3.4. Optimization Problem Formulation

Based on the system model constructed, our objective is to maximize the utility of UAVs by making optimal task computation decisions. Thus, the optimization problem under related constraints can be defined as follows:

$$\begin{aligned} \mathrm{P1}: \quad & \max_{A_{u}} \sum_{u=1}^{U} F_{u} \\ \text{s.t.} \quad & C_{1}: \alpha_{u,\kappa}^{1} \in \{0, 1\},\ \alpha_{u,\kappa}^{2} \in \{0, 1\},\ \forall u \in \mathcal{U} \\ & C_{2}: C_{m}'(t) \leq \hat{C}_{m},\ \forall m \in \mathcal{M} \\ & C_{3}: e < \hat{e}_{u} \\ & C_{4}: l < \hat{l}_{u,\kappa} \\ & C_{5}: p < \hat{p}_{u,\kappa} \\ & C_{6}: b < \hat{b}_{u,\kappa} \end{aligned} \tag{13}$$

The defined optimization problem is constrained by the binary offloading decision, the computation capacity of the MEC servers, the battery energy of UAVs, and the QoS demands of the tasks performed. In Equation (13), $A_{u}$ denotes the decision set of each UAV. $C_{1}$ is the offloading decision constraint. $C_{m}'(t) = \sum c_{u,\kappa}$ expresses the computation resources of the server m occupied by UAVs; therefore, $C_{2}$ determines whether server $m \in \mathcal{M}$ can be selected to provide service: the computation resources used by UAVs cannot exceed the computation capacity $\hat{C}_{m}$ of the server m at a certain time slot t. $C_{3}$ is the battery energy constraint of a UAV $u \in \mathcal{U}$; $C_{4}$ requires that the time consumed when executing task $\kappa_{u}$ stay within the allowable delay threshold $\hat{l}_{u,\kappa}$. $C_{5}$ and $C_{6}$ require that the PLR and BER occurring in the data transmission stay below the tolerable upper bounds $\hat{p}_{u,\kappa}$ and $\hat{b}_{u,\kappa}$ for a task $\kappa_{u}$ processed by the MEC.

The optimization problem is an integer-programming problem; the number of feasible task computation decisions is $(M + 1)^{U}$, and the problem is generally non-convex and NP-hard. Conventional mathematical optimization approaches can in theory find the optimal solution of the proposed problem but cannot do so in a short time. The DRL approach is well suited to decision-making problems with high-dimensional solution spaces, especially given the increasing number of offloaded tasks in future wireless networks. In view of the above, we develop a multi-agent A2C-based DRL scheme, which can find feasible offloading actions in polynomial time.

4. Offloading Assessment Based on Fuzzy Logic

In this section, by applying fuzzy logic theory, we propose an assessment mechanism that lets the UAV swarm adaptively screen the available offloading nodes, which reduces the complexity and speeds up the training of the proposed DOMUS scheme.

Unlike binary logic, fuzzy logic makes decisions based on multi-valued logic. This characteristic enables a fuzzy logic system to handle input variables that carry uncertain and incomplete data. The fuzzy logic-based approach can respond rapidly to a changing environment and adaptively produce crisp values. The proposed offloading assessment algorithm takes the PLR, BER, and velocity as inputs, processes them through fuzzification, fuzzy inference, and defuzzification, and finally outputs a crisp value, i.e., the offloading probability, which signifies the fitness of a specific MEC server for the UAV task.

In particular, the fuzzy logic-based offloading assessment algorithm is deployed on the UAV in a decentralized manner, and can regard all MEC servers perceived as the candidate-offloading targets to be assessed. Algorithm 1 depicts the procedures of the proposed offloading assessment scheme.

Algorithm 1 Fuzzy logic-based offloading assessment.

Input: Set of candidate-offloading nodes $\mathcal{M}$.

Output: Available offloading node set $\tilde{\mathcal{M}}$ for the UAV u.

1: while obtaining sensor data in a time period do
2:     for $m = 1 : M$ do
3:         velocity, PLR, BER ⟵ $\mathcal{M}[m]$.velocity, $\mathcal{M}[m]$.PLR, $\mathcal{M}[m]$.BER according to Equations (14) and (15);
4:         BER, PLR ⟵ Normal(BER), Normal(PLR);
5:         $\chi_{m}$ ⟵ FuzzyLogic(velocity, PLR, BER);
6:         if $\chi_{m} < \hat{\chi}_{m}$ then
7:             $\tilde{\mathcal{M}}$ ⟵ m;
8:         end if
9:     end for
10: end while

In step 3 of Algorithm 1, each UAV senses nearby offloading nodes, observes its flying velocity, and perceives the PLR and BER with respect to each candidate-offloading target under the assumption that the task of UAV u will be offloaded to that node.

The PLR occurring in data transmission can be commonly evaluated by

$$p_{u,\kappa}^{m} = \varsigma \cdot d_{u,\kappa} \cdot \exp\left(-\vartheta \cdot \frac{P_{rss}}{\sigma^{2} \cdot d_{u,m}^{2}}\right) \tag{14}$$

in which $P_{rss}$ indicates the received signal strength, $\sigma^{2} \in \{(\sigma_{u,\kappa}^{c})^{2}, (\sigma_{u,\kappa}^{w})^{2}\}$ denotes the noise power, and $\varsigma$ and $\vartheta$ are two tunable parameters that satisfy $0 < \varsigma < 1$ and $0 < \vartheta < 1$.

The BER for binary phase-shift keying modulation in an additive white Gaussian noise environment is expressed as follows:

$$b_{u,\kappa}^{m} = \frac{1}{2} \operatorname{erfc}\left(\sqrt{\frac{P_{rss}}{\sigma^{2}}}\right) \tag{15}$$

where $\operatorname{erfc}(\cdot)$ is the Gauss complementary error function, which can be written as

$$\operatorname{erfc}(x) = \frac{2}{\sqrt{\pi}} \int_{x}^{\infty} e^{-\mu^{2}}\, d\mu \tag{16}$$
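Equations (14) to (16) can be evaluated with the Python standard library, since `math.erfc` implements the complementary error function of Equation (16). The sketch below is illustrative only: the function names are hypothetical, the units of the data size and signal quantities are not fixed by the text and are left to the caller, and the sample values are arbitrary.

```python
import math


def packet_loss_rate(data_size, p_rss, noise_power, distance, varsigma, vartheta):
    """Eq. (14): varsigma * d * exp(-vartheta * P_rss / (sigma^2 * d_um^2))."""
    if not (0 < varsigma < 1 and 0 < vartheta < 1):
        raise ValueError("varsigma and vartheta must lie in (0, 1)")
    return varsigma * data_size * math.exp(-vartheta * p_rss / (noise_power * distance ** 2))


def bit_error_rate(p_rss, noise_power):
    """Eq. (15): BPSK over AWGN, 0.5 * erfc(sqrt(P_rss / sigma^2))."""
    return 0.5 * math.erfc(math.sqrt(p_rss / noise_power))


# Arbitrary sample values: a stronger received signal lowers both metrics.
plr_weak = packet_loss_rate(1.0, 1.0, 1.0, 1.0, 0.5, 0.5)
plr_strong = packet_loss_rate(1.0, 4.0, 1.0, 1.0, 0.5, 0.5)
ber_weak = bit_error_rate(1.0, 1.0)
ber_strong = bit_error_rate(4.0, 1.0)
```

Both metrics decrease as the received signal strength grows relative to the noise power, which is the behavior the assessment mechanism relies on when ranking candidate nodes.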

In particular, in step 4 of the algorithm, the obtained PLR and BER are normalized as $\frac{p_{u,\kappa}^{m}}{\hat{p}_{u,\kappa}}$ and $\frac{b_{u,\kappa}^{m}}{\hat{b}_{u,\kappa}}$ so as to eliminate the unit difference before they are input into the fuzzy logic system. In step 5, the fuzzy logic system designed for offloading assessment maps the sensed data, including velocity, PLR, and BER, into fuzzy sets according to the membership functions (MFs) of each input; this process is called fuzzification. Afterward, the fuzzy inference procedure processes the fuzzified inputs and produces a fuzzy output based on multiple IF-AND-THEN rules, which are designed empirically. Furthermore, based on the triggered fuzzy rules, the fuzzy logic system proceeds to the defuzzification stage, which calculates and outputs a scalar value $\chi_{m}$ for the node $m \in \mathcal{M}$ by applying the centroid defuzzifier method [33]. Moreover, $\chi_{m} \in [0, 1]$ characterizes the fitness of the offloading node for the task $\kappa_{u}$ of the UAV u; the higher the $\chi_{m}$, the better the fitness. Finally, in steps 6 and 7, the obtained $\chi_{m}$ is compared with its permitted upper bound $\hat{\chi}_{m}$; if the condition is satisfied, the node $m \in \mathcal{M}$ is selected by the UAV u as an available offloading target and included in $\tilde{\mathcal{M}}$.
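The fuzzification, inference, and centroid defuzzification stages described above can be sketched in a few lines of plain Python. This is a minimal Mamdani-style illustration and not the authors' system: the triangular membership functions, the rule base (the worst input decides the output label), the min operator for AND, and the assumption that velocity has already been scaled to [0, 1] are all hypothetical choices, because the section does not list the actual MFs or rules.

```python
from itertools import product

# Hypothetical triangular fuzzy sets on [0, 1], given as (left foot, peak, right foot).
IN_SETS = {"low": (0.0, 0.0, 0.5), "medium": (0.0, 0.5, 1.0), "high": (0.5, 1.0, 1.0)}
OUT_SETS = {"poor": (0.0, 0.0, 0.5), "fair": (0.0, 0.5, 1.0), "good": (0.5, 1.0, 1.0)}


def tri(x, a, b, c):
    """Triangular membership; a == b or b == c turns that side into a shoulder."""
    if x <= a:
        return 1.0 if a == b else 0.0
    if x >= c:
        return 1.0 if b == c else 0.0
    if x < b:
        return (x - a) / (b - a)
    if x > b:
        return (c - x) / (c - b)
    return 1.0


def consequent(labels):
    """Hypothetical rule base: the worst of the three inputs decides the output."""
    if "high" in labels:
        return "poor"
    if "medium" in labels:
        return "fair"
    return "good"


def offloading_fitness(velocity, plr, ber, steps=200):
    """Return a crisp score in [0, 1] from inputs normalized to [0, 1]."""
    inputs = [min(max(x, 0.0), 1.0) for x in (velocity, plr, ber)]
    strength = {name: 0.0 for name in OUT_SETS}
    # Inference: each IF-AND-THEN rule fires with the minimum input membership.
    for labels in product(IN_SETS, repeat=3):
        fire = min(tri(x, *IN_SETS[lab]) for x, lab in zip(inputs, labels))
        out = consequent(labels)
        strength[out] = max(strength[out], fire)
    # Centroid defuzzification over a discretized output universe.
    num = den = 0.0
    for i in range(steps + 1):
        y = i / steps
        mu = max(min(strength[n], tri(y, *OUT_SETS[n])) for n in OUT_SETS)
        num += y * mu
        den += mu
    return num / den if den > 0 else 0.0
```

In this sketch a slow UAV with low normalized PLR and BER scores high, while any input near 1 pulls the score down. The resulting score would then be compared with the threshold in steps 6 and 7 of Algorithm 1.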

5. Multi-Agent A2C-Based Decentralized Task Offloading

The optimization problem formulated in Equation (13) is a sequential decision-making problem in a dynamic environment. In this section, we model the multi-UAV offloading problem as a time-varying multi-agent MDP and propose a decentralized task-offloading scheme (DOMUS). Because the environmental dynamics of wireless networks are generally unknown, the proposed algorithm applies a model-free DRL framework based on multi-agent A2C to enable each UAV agent to learn the optimal task computing policy via training in polynomial time.

5.1. Multi-Agent MDP Model in the A2C Framework

In the A2C framework, the optimization problem in Equation (13) for the task implementation of multiple UAVs can be defined as a multi-agent MDP $\langle \mathcal{U}, S, \{A_{u}\}_{u \in \mathcal{U}}, P, \{R_{u}\}_{u \in \mathcal{U}} \rangle$, whose elements are explained in detail as follows.

(1) State space $S$. In a time slot t, each UAV agent observes the system state $s^{t} \in S$, which involves the location of the UAV, the relevant task information, and the situation of the MEC environment; thus, the state $s^{t}$ consists of a group of parameters: (1) $loc(x, y, h)$: three-dimensional location of the UAV agent u; (2) $(d_{u,\kappa}, c_{u,\kappa})$: data size and required computation resources for the task $\kappa$; (3) $\hat{l}_{u,\kappa}$: maximum tolerable delay for the task $\kappa_{u}$; (4) $SR = \{I_{u,1}, \dots, I_{u,M}\}$: the signal-to-noise ratio vector between the UAV u and its available offloading nodes in $\tilde{\mathcal{M}}$; (5) $Dist = \{d_{u,1}, \dots, d_{u,M}\}$: the distance vector between the UAV u and its available offloading nodes in $\tilde{\mathcal{M}}$.

(2) Action space $A_{u}$. The action taken in the time slot t by each UAV u is to decide whether the task should be performed locally or offloaded to a MEC server, and if offloaded, which server will be selected. Thus, according to the definitions of the task computing decisions in Equations (1) and (2) in the system model, the action set for each UAV can be represented as $A_{u} = \{a_{u}^{1}, a_{u}^{2}, a_{u}^{3}\}$, in which $a_{u}^{1}$ indicates $\alpha_{u,\kappa}^{1} = 0$, $a_{u}^{2}$ denotes $\alpha_{u,\kappa}^{1} = 1$ and $\alpha_{u,\kappa}^{2} = 0$, and $a_{u}^{3}$ expresses $\alpha_{u,\kappa}^{1} = 1$ and $\alpha_{u,\kappa}^{2} = 1$.
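The mapping between the three actions and the binary decision variables of Equations (1) and (2) can be written as a small lookup table. In this sketch the zero-based indices and the use of `None` for the unused second variable under local computing are assumptions made for illustration.

```python
# Index -> (alpha1, alpha2). alpha2 is only meaningful when alpha1 == 1,
# so None is used as a placeholder for local computing (an assumption here).
ACTIONS = {
    0: (0, None),  # a^1: compute locally
    1: (1, 0),     # a^2: offload to a MEC server via a cellular BS
    2: (1, 1),     # a^3: offload to a MEC server via a Wi-Fi AP
}


def decode_action(index: int):
    """Translate an action index chosen by the agent into (alpha1, alpha2)."""
    if index not in ACTIONS:
        raise ValueError(f"unknown action index: {index}")
    return ACTIONS[index]


def is_offloaded(index: int) -> bool:
    """True when the decoded action sets alpha1 = 1."""
    return decode_action(index)[0] == 1
```

A policy network with three output logits per UAV can then pick an index, and `decode_action` recovers the decision variables used in the delay and energy models.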

(3) Reward function

R_{u}

. At state

s^{t}

, each UAV chooses an action and receives an instant reward

r_{u}^{t}

from the environment. It is known that the purpose of each agent is to maximize its utility through improving the policy of task computing. For this reason, we define the reward

r_{u}^{t}

as the performance improvement between two utility values obtained by the UAV within two consecutive time slots; the

r_{u}^{t}

is written as

r_{u}^{t} = \{\begin{matrix} ε_{1} & F_{u}^{t} - F_{u}^{t - 1} > β \\ ε_{2} & F_{u}^{t} - F_{u}^{t - 1} < - β \\ 0 & otherwise, \end{matrix}

(17)

where

F_{u}^{t}

refers to the utility of the UAV u for processing tasks at time slot t,

ε_{1} > 0

, and

ε_{2} < 0

; both

ε_{1}

and

ε_{2}

denote the obtained instant rewards under two different situations, i.e.,

F_{u}^{t} - F_{u}^{t - 1} > β

and

F_{u}^{t} - F_{u}^{t - 1} < - β

, respectively. Moreover,

β > 0

means the sensitivity to utility changes of the UAV in MDP. Therefore, the reward

r_{u}^{t}

can effectively characterize the change directions of two utilities corresponding to two consecutive time slots. Furthermore, the reward function

R_{u}

can be presented as

R_{u} (s, a) = E [r_{u}^{t + 1} | s^{t} = s, a^{t} = a]

, which is the expected value of the instant reward.
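The thresholded reward in Equation (17) can be written as a short function. The sketch below is illustrative only; the function name, argument names, and the numeric values used in the example are assumptions, since concrete values of the two reward levels and the threshold are not given in this section.

```python
def instant_reward(f_now, f_prev, beta, eps_pos, eps_neg):
    """Thresholded reward of Equation (17).

    f_now, f_prev: utilities F_u^t and F_u^{t-1}
    beta:          sensitivity threshold, must be > 0
    eps_pos:       reward when utility rises by more than beta (> 0)
    eps_neg:       reward when utility drops by more than beta (< 0)
    """
    if beta <= 0 or eps_pos <= 0 or eps_neg >= 0:
        raise ValueError("need beta > 0, eps_pos > 0 and eps_neg < 0")
    change = f_now - f_prev
    if change > beta:
        return eps_pos
    if change < -beta:
        return eps_neg
    return 0.0


# Hypothetical values, chosen only to exercise the three branches.
r_up = instant_reward(1.0, 0.5, beta=0.1, eps_pos=1.0, eps_neg=-1.0)
r_down = instant_reward(0.5, 1.0, beta=0.1, eps_pos=1.0, eps_neg=-1.0)
r_flat = instant_reward(0.55, 0.5, beta=0.1, eps_pos=1.0, eps_neg=-1.0)
```

Changes smaller than the threshold yield a zero reward, so small utility fluctuations do not drive the policy update.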

Therefore, in the multi-agent MDP model, at the current time slot t, if the state is

s^{t} \in S

and the joint actions of agents in the system can be denoted as

a^{t} = {a_{1}, \dots, a_{U}} \in A

, each agent

u \in U

can obtain a reward

r_{u}^{t + 1}

. The state then transitions to a new state

s^{t + 1} \in S

according to the transition probability

P (s^{t + 1} | s^{t}, a^{t})

. Additionally, the policy of agent u is denoted as the probability that the agent selects the action at a given state, which can be expressed as

π_{u} (s, a_{u})

. Then the joint policy of all agents can be formulated as

π (s, a) = Π_{u = 1}^{U} π_{u} (s, a_{u})

, and the

π (s, a)

is written as

π

for simplicity.
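Because the joint policy factorizes over agents, the probability of a joint action is simply the product of the per-agent probabilities. The following minimal sketch checks this for two agents; the policy tables are made-up numbers for a single fixed state, and `joint_policy_prob` is a hypothetical helper name.

```python
import itertools


def joint_policy_prob(policies, joint_action):
    """pi(s, a) = prod_u pi_u(s, a_u) for one fixed state s.

    policies:     list of dicts, one per agent, mapping action -> probability
    joint_action: tuple with one action per agent
    """
    prob = 1.0
    for pi_u, a_u in zip(policies, joint_action):
        prob *= pi_u[a_u]
    return prob


# Made-up per-agent policies over the three actions {1, 2, 3}.
policies = [
    {1: 0.5, 2: 0.3, 3: 0.2},
    {1: 0.1, 2: 0.6, 3: 0.3},
]
p_12 = joint_policy_prob(policies, (1, 2))
total = sum(
    joint_policy_prob(policies, a)
    for a in itertools.product([1, 2, 3], repeat=len(policies))
)
```

Summing over all joint actions gives one, which confirms that the product of valid per-agent policies is itself a valid distribution.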

5.2. Multi-Agent A2C Framework

As deep neural networks (DNNs) can offer accurate regression, A2C applies DNNs to the actor and the critic networks to approximate the policy and value function. The actor is a policy function

π_{u} (a_{u} | s; θ_{u})

, which allows agent u to yield a policy and select an action

a_{u}

under state s, where

θ_{u}

denotes the parameters of the DNN. We collect

θ_{u}

in a set

θ = {θ_{1}, \dots, θ_{u}, \dots, θ_{U}}

. The critic is a state value function

V (s)

used to evaluate the state. Additionally, the advantage term refers to the function

ζ (s, a) = Q (s, a) - V (s)

, which indicates the advantage of the selected action under a given state, where

Q (s, a)

represents the action-value function.

The learning objective of the agent in A2C is to find a policy

π

that can maximize the expected long-term system reward

J (π)

over all possible trajectories. Accordingly, our optimization objective is to learn the optimal joint task computing policy

π^{θ} = π (a | s; θ) = Π_{u = 1}^{U} π_{u} (a_{u} | s; θ_{u})

so as to maximize the globally averaged expected reward

J (π)

for the agents, which is expressed as

π^{θ} = \arg\max_{π} J (π)

(18)

in which

J (π)

is given as follows:

\begin{matrix} J (π) & = \lim_{T \to \infty} \frac{1}{T} E [\sum_{t = 1}^{T} \frac{1}{U} \sum_{u \in U} r_{u}^{t + 1}] \\ & = \sum_{s \in S} η_{π} (s) \sum_{a \in A} π (s, a) \frac{1}{U} \sum_{u \in U} R_{u} (s, a) \end{matrix}

(19)

where

η_{π} (s) = {lim}_{t \to \infty} \Pr (s_{t} = s | π)

denotes the stationary probability distribution in the Markov chain when the policy

π

is given.
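To make the second form of Equation (19) concrete, the sketch below computes the stationary distribution of a small policy-induced Markov chain by power iteration and then evaluates the average reward. The two-state chain and its rewards are invented for illustration, and `r_bar[s]` stands for the policy-averaged team reward in state s, i.e., the inner two sums of Equation (19) already evaluated.

```python
def stationary_distribution(P, iters=1000):
    """Power iteration eta <- eta P for a row-stochastic matrix P."""
    n = len(P)
    eta = [1.0 / n] * n
    for _ in range(iters):
        eta = [sum(eta[i] * P[i][j] for i in range(n)) for j in range(n)]
    return eta


def average_reward(eta, r_bar):
    """J(pi) = sum_s eta(s) * r_bar(s)."""
    return sum(e * r for e, r in zip(eta, r_bar))


# Invented two-state example: transition matrix under a fixed joint policy
# and the policy-averaged team reward of each state.
P_pi = [[0.9, 0.1],
        [0.5, 0.5]]
r_bar = [1.0, -1.0]

eta = stationary_distribution(P_pi)
J = average_reward(eta, r_bar)
```

For this chain the balance condition 0.1 η(0) = 0.5 η(1) gives η = (5/6, 1/6), so J = 5/6 - 1/6 = 2/3, which the iteration reproduces.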

Furthermore, our optimization problem aims to work out

max J (θ) = \sum_{s \in S} η_{θ} (s) \sum_{a \in A} π (a | s; θ) \frac{1}{U} \sum_{u \in U} R_{u} (s, a)

(20)

where

θ

will be learned by the policy gradient method [34]. Moreover, based on the objective function, the gradients for

θ

can be calculated as

\nabla_{θ_{u}} J (θ) = E [\nabla_{θ_{u}} log π_{u}^{θ_{u}} ζ_{u}^{π^{θ}} (s, a)]

(21)

in which

ζ_{u}^{π^{θ}} (s, a)

is the advantage function, represented by

ζ_{u}^{π^{θ}} (s, a) = Q^{π^{θ}} (s, a) - V_{u}^{π^{θ}} (s, a_{- u})

(22)

where we use

a_{- u}

to denote the actions adopted by all agents except agent u;

Q^{π^{θ}}

indicates the action value function under the policy

π^{θ}

for a given state–action pair

(s, a)

, while

V_{u}^{π^{θ}}

is the state value function. They are given as follows:

Q^{π^{θ}} (s, a) = \sum_{t} E [\frac{1}{U} \sum_{u \in U} r_{u}^{t + 1} - J (θ) | s^{0} = s, a^{0} = a, π^{θ}]

(23)

V_{u}^{π^{θ}} (s, a_{- u}) = \sum_{a_{u} \in A_{u}} π_{u} (a_{u} | s; θ_{u}) Q^{π^{θ}} (s, a_{u}, a_{- u})

(24)
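Equations (22) and (24) can be read as a per-agent baseline: with the other agents' actions held fixed, the value is the policy-weighted average of the action values over agent u's own actions, and the advantage is the gap between the chosen action's value and that average. A small sketch follows, with made-up numbers and hypothetical function names.

```python
def counterfactual_value(pi_u, q_row):
    """V_u(s, a_-u) = sum_{a_u} pi_u(a_u | s) * Q(s, a_u, a_-u).

    pi_u:  dict action -> probability for agent u in state s
    q_row: dict action -> Q(s, a_u, a_-u) with a_-u held fixed
    """
    return sum(pi_u[a] * q_row[a] for a in pi_u)


def advantage(pi_u, q_row, a_u):
    """zeta_u(s, a) = Q(s, a) - V_u(s, a_-u), as in Equation (22)."""
    return q_row[a_u] - counterfactual_value(pi_u, q_row)


# Made-up values for one state and one fixed a_-u.
pi_u = {1: 0.5, 2: 0.25, 3: 0.25}
q_row = {1: 1.0, 2: 2.0, 3: 4.0}

v = counterfactual_value(pi_u, q_row)
adv = {a: advantage(pi_u, q_row, a) for a in pi_u}
mean_adv = sum(pi_u[a] * adv[a] for a in pi_u)
```

The policy-weighted mean of the advantages is zero by construction, which is why subtracting this baseline does not bias the gradient in Equation (21).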

A2C takes the temporal difference (TD) error as an unbiased estimation to evaluate the advantage function, which reduces the complexity of the parameter update and improves the stability of the algorithm. In this case, the advantage is approximated as

ζ (s^{t}, a^{t}) \approx \frac{1}{U} r^{t + 1} + γ V (s^{t + 1} | s^{t}, a^{t}) - V (s^{t}) = δ (s^{t})

(25)

in which

γ

indicates the discount factor.
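The TD-error estimate of the advantage in Equation (25) needs only one reward sample and two critic evaluations. The sketch below assumes that the reward term in Equation (25) is the sum of the per-agent rewards, so that dividing by U gives the team average; this reading, the function name, and the sample numbers are assumptions.

```python
def td_error(rewards, v_s, v_next, gamma):
    """delta = (1/U) * sum_u r_u + gamma * V(s') - V(s).

    rewards: per-agent instant rewards observed after the joint action
    v_s:     critic estimate V(s^t)
    v_next:  critic estimate V(s^{t+1})
    gamma:   discount factor
    """
    team_reward = sum(rewards) / len(rewards)
    return team_reward + gamma * v_next - v_s


# Sample numbers for four UAVs (illustrative only).
delta = td_error([1.0, -1.0, 1.0, 1.0], v_s=0.2, v_next=0.4, gamma=0.5)
```

A positive value means the joint action did better than the critic expected, so the actor should make it more likely; a negative value has the opposite effect.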

The critic network estimates

Q (s, a)

with

Q^{π^{θ}} (s, a)

and generates a TD error that indicates how good the action taken by the agent is; it also updates its DNN parameter

θ^{c}

with the gradient descent method. Additionally, each UAV u can share estimations from the critic network with other UAVs nearby to effectively evaluate the actions. Then the output of the critic network is further used to update the parameter

θ^{a}

for the actor network of the UAV agent u, which aims at improving the probabilities of actions that perform relatively well. In particular, the update to

θ^{a}

and

θ^{c}

can be presented as

θ^{a} \leftarrow θ^{a} + \frac{\partial log π (a^{t} | s^{t}; θ^{a})}{\partial θ^{a}} δ^{t} (s^{t}; θ^{c})

(26)

θ^{c} \leftarrow θ^{c} + δ^{t} (s^{t}; θ^{c}) \frac{\partial V (s^{t}; θ^{c})}{\partial θ^{c}}

(27)
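The two update rules can be traced on a toy model. The sketch below replaces the DNNs with a linear critic and a linear-softmax actor so that the gradients have closed forms, and it scales each step by a learning rate (the rates appear among the inputs of Algorithm 2, although Equations (26) and (27) are printed without them). All names and numbers are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def a2c_step(theta_a, theta_c, feats, action, reward, next_feats,
             gamma, lr_a, lr_c):
    """One actor-critic update on a single transition.

    theta_a: one weight list per action (linear-softmax actor)
    theta_c: weight list of the linear critic V(s) = theta_c . feats
    """
    # TD error from the critic, used as the advantage estimate.
    delta = reward + gamma * dot(theta_c, next_feats) - dot(theta_c, feats)
    probs = softmax([dot(row, feats) for row in theta_a])
    # d log pi(a|s) / d theta_a[k] = (1[k == a] - pi(k|s)) * feats
    new_theta_a = [
        [w + lr_a * delta * ((1.0 if k == action else 0.0) - probs[k]) * x
         for w, x in zip(row, feats)]
        for k, row in enumerate(theta_a)
    ]
    # dV/d theta_c = feats for a linear critic.
    new_theta_c = [w + lr_c * delta * x for w, x in zip(theta_c, feats)]
    return new_theta_a, new_theta_c, delta


theta_a0 = [[0.0, 0.0] for _ in range(3)]  # three actions, two features
theta_c0 = [0.0, 0.0]
theta_a1, theta_c1, delta = a2c_step(
    theta_a0, theta_c0, feats=[1.0, 0.0], action=0, reward=1.0,
    next_feats=[0.0, 1.0], gamma=0.99, lr_a=0.001, lr_c=0.004)
```

With a positive TD error, the weights of the chosen action increase and those of the other actions decrease, which raises the probability of the action that performed better than expected.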

5.3. A2C-Based Decentralized Offloading Algorithm

In this section, based on the A2C model, we propose a distributed offloading algorithm for multi-UAV swarms; its implementation is summarized in Algorithm 2.

Algorithm 2 A2C-based decentralized offloading algorithm.

Input: UAV swarm

U

, MEC server

M

, the learning rates

lr^{a}

,

lr^{c}

of the actor and critic networks, the maximum number of episodes

Ep_{max}

, the number of steps in one episode

Ep_{i}

, the update interval

Δ t

, and the discount factor

γ

;

Output:

π^{*} = {π_{u}^{*}, u \in U}

for all UAVs.

1: for UAV

u = 1 : U

do

2: Initialize the parameters

θ_{u}^{a}

and

θ_{u}^{c}

with respect to the actor and critic network;

3: end for

4: for Episode

i = 1 : Ep_{max}

do

5: Reset the state:

loc (x, y, h)

,

(d_{u, κ}, c_{u, κ})

,

{\hat{l}}_{u, κ}

,

SR = {I_{u, 1}, \dots, I_{u, M}}

and

Dist = {d_{u, 1}, \dots, d_{u, M}}

;

6: for UAV

u = 1 : U

do

7: Execute Algorithm 1 to obtain

\tilde{M}

;

8: Obtain the state

s^{0}

;

9: end for

10: for Step

t = 1 : Ep_{i}

do

11: for UAV

u = 1 : U

do

12: Take action

a_{u}^{t}

by actor

π_{u} (a_{u}^{t} | s^{t}; θ_{u}^{a})

;

13: end for

14: Perform computation offloading according to the joint actions

a^{t} = {a_{1}, \dots, a_{U}}

;

15: Obtain the current reward

r^{t} = {r_{1}, \dots, r_{U}}

and calculate the new state

s^{t + 1}

;

16: if

mod (t, Δ t) == 0

then

17: Update

θ^{c}

for the critic networks based on Equation (27);

18: Update

θ^{a}

for the actor networks using Equation (26);

19: end if

20: end for

21: end for

At the initial stage, we give the related parameters including the set of UAVs and MEC servers, i.e.,

U

,

M

, the learning rates

lr^{a}

,

lr^{c}

of the actor and critic networks, the maximum number of training episodes

Ep_{max}

, the number of steps

Ep_{i}

in one episode, the update interval

Δ t

, as well as the discount factor

γ

. For each UAV

u \in U

, we initialize the actor parameter

θ_{u}^{a}

and critic parameter

θ_{u}^{c}

. Afterward, at the start of each training episode, the system state is randomly initialized, including the UAV locations, the task information, and the condition of the MEC environment; each UAV then executes Algorithm 1 to obtain its available offloading node set

\tilde{M}

, then the initial state

s^{0}

is obtained (from steps 5 to 8).

Without loss of generality, one training episode is divided into

Ep_{i}

time slots. At time slot t, each UAV adopts an action according to the policy

π_{u} (a_{u}^{t} | s^{t}; θ_{u}^{a})

in the actor, then performs computation offloading according to the adopted action

a_{u}^{t} \in a^{t}

, and obtains the instant reward

r_{u}^{t} \in r^{t}

; next, the state is updated to

s^{t + 1}

(from steps 11 to 15). Finally, once every

Δ t

, the algorithm updates the parameters of the actor and critic networks using only the sampled transition

(s^{t + 1}, a^{t}, s^{t})

(from steps 16 to 19). In order to enable the average reward to converge to a stable value and learn the optimal policy, the iterative training will last for

Ep_{max}

episodes. After convergence, the algorithm only needs to save the actor network to make offloading decisions for UAVs.
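The control flow of the training procedure can be sketched as a short skeleton. Everything in it is a stand-in: `ToyEnv` replaces the MEC environment with random rewards, `RandomAgent` replaces the actor and critic networks, the fuzzy-logic assessment of Algorithm 1 is omitted, and the update condition is read as "every Δt steps", as the accompanying description states.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for the MEC environment."""

    def __init__(self, n_uav, seed=0):
        self.n_uav = n_uav
        self.rng = random.Random(seed)

    def reset(self):
        return tuple(self.rng.random() for _ in range(self.n_uav))

    def step(self, joint_action):
        rewards = [self.rng.choice([-1.0, 0.0, 1.0]) for _ in joint_action]
        next_state = tuple(self.rng.random() for _ in range(self.n_uav))
        return rewards, next_state


class RandomAgent:
    """Placeholder agent: random policy, counts its parameter updates."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.n_updates = 0

    def act(self, state):
        return self.rng.choice([1, 2, 3])

    def update(self, state, action, reward, next_state):
        # A real agent would update its critic and actor parameters here.
        self.n_updates += 1


def train(env, agents, ep_max, ep_i, delta_t):
    for _ in range(ep_max):                    # episodes
        state = env.reset()
        for t in range(1, ep_i + 1):           # time slots in one episode
            joint_action = [ag.act(state) for ag in agents]
            rewards, next_state = env.step(joint_action)
            if t % delta_t == 0:               # update once every delta_t steps
                for ag, a, r in zip(agents, joint_action, rewards):
                    ag.update(state, a, r, next_state)
            state = next_state


agents = [RandomAgent(seed=u) for u in range(4)]
train(ToyEnv(n_uav=4), agents, ep_max=3, ep_i=10, delta_t=5)
```

With 3 episodes of 10 steps and an interval of 5, each agent is updated twice per episode, i.e., six times in total.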

In Algorithm 1, each UAV explores its available offloading nodes, and the complexity of the designed fuzzy logic module is constant; thus, the complexity of Algorithm 1 is

O (M)

in the worst case. In Algorithm 2, at the training stage, each UAV agent evaluates the Q-value with the critic network by inputting the joint actions of UAVs and the environment state; thus, the input and output sizes of the critic network in UAV are

U | S |

and 1, respectively. Moreover, each UAV makes an action by mapping the current state to the actor network; thereby, the input and output sizes of the actor network in UAV are

| S |

and 1, respectively. After training is finished, the action for each UAV can be obtained from its actor network only with the

| S |

input size and an output size of 1. The computational complexity is proportional to the input and output sizes; thus, the overall complexity of the proposed DOMUS is

O (M + U | S |)

.

6. Performance Evaluation

In this section, we perform a series of numerical simulations and evaluate the proposed task-offloading scheme for the UAV swarm in MEC-assisted heterogeneous networks.

6.1. Parameter Settings

We consider a MEC-assisted heterogeneous network scenario where MEC servers are randomly deployed in the

1000 \times 1000

m area, UAVs are randomly distributed, and each UAV has a task to be processed. Concerning the communication parameters, the maximum communication ranges for cellular and Wi-Fi networks are 400 and 200 m, respectively [35,36]. The bandwidths of cellular and Wi-Fi networks are

B^{c} = 4

MHz,

B^{w} = 5

MHz, the transmission power

P_{u}

of UAV is set at 10 W [37], and the Gaussian noise power

{(σ_{u, κ}^{c})}^{2}

and

{(σ_{u, κ}^{w})}^{2}

are set as the same value, i.e.,

- 100

dBm. The path loss follows a distance-dependent model with a path loss coefficient of

ι \geq 1

[38,39]. Additionally, the computational capabilities of MEC servers and UAVs are characterized by the variables

λ_{m}

and

λ_{u}

, which are uniformly distributed in

[5, 8]

and

[0.7, 1]

Gcycles/s [40], respectively. The computational capacity

{\hat{C}}_{m}

of the server is uniformly distributed in

[9, 11]

Gcycles. The energy consumed per CPU cycle denoted by

ρ_{u}^{e}

is

5 \times 10^{- 10}

J/cycle. For the computation tasks to be completed by UAVs, the data size

d_{u, κ}

and the needed computation resources

c_{u, κ}

are uniformly distributed in

[4, 5]

MB and

[1.6, 2]

Gcycles. The weighting factors

w_{d, κ}

and

w_{e, κ}

for delay and energy are both set to 0.5. Furthermore, for the learning parameters, we set the learning rates to

lr^{a} = 0.001

and

lr^{c} = 0.004

, and the discount factor

γ

to be equal to

0.99

. Finally, we summarize the above key parameters in Table 2.
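For readers who wish to reproduce a comparable setup, the random quantities above can be drawn as follows. This is a sketch under stated assumptions: the dictionary keys and function names are hypothetical, and the local-energy helper assumes that energy equals the number of CPU cycles multiplied by the energy per cycle, since the energy model itself is defined in the system model rather than in this section.

```python
import random

RHO_J_PER_CYCLE = 5e-10  # energy consumed per CPU cycle (J/cycle)


def sample_scenario(n_uav, n_mec, seed=0):
    """Draw one random scenario using the ranges quoted in Section 6.1."""
    rng = random.Random(seed)
    return {
        # MEC server positions inside the 1000 x 1000 area (metres)
        "mec_xy": [(rng.uniform(0, 1000), rng.uniform(0, 1000))
                   for _ in range(n_mec)],
        "mec_gcycles_per_s": [rng.uniform(5, 8) for _ in range(n_mec)],
        "mec_capacity_gcycles": [rng.uniform(9, 11) for _ in range(n_mec)],
        "uav_gcycles_per_s": [rng.uniform(0.7, 1.0) for _ in range(n_uav)],
        "task_mb": [rng.uniform(4, 5) for _ in range(n_uav)],
        "task_gcycles": [rng.uniform(1.6, 2.0) for _ in range(n_uav)],
    }


def local_energy_joules(task_gcycles, rho=RHO_J_PER_CYCLE):
    """Assumed model: energy = cycles * energy per cycle."""
    return task_gcycles * 1e9 * rho


scenario = sample_scenario(n_uav=6, n_mec=4)
energy_max = local_energy_joules(2.0)
```

Under this assumed model, a task at the upper end of the range (2 Gcycles) costs about 1 J when computed locally.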

6.2. Fitness Demonstration of Offloading Targets

As shown in Figure 2, by executing Algorithm 1, we depict the relationship between the fitness of servers and the velocity of UAVs, the PLR, and the BER that occur during task offloading. The fitness is characterized by the offloading probability in Algorithm 1. It can be observed that the offloading probability is negatively correlated with all three indicators, i.e., velocity, PLR, and BER. More specifically, Figure 2a,b show that the offloading probability declines rapidly as the three indicators increase, which validates the effectiveness of the devised fuzzy logic-based offloading assessment mechanism in adaptively evaluating offloading targets at the UAV side.

6.3. Convergence Performance

To demonstrate the convergence performance of the proposed DOMUS scheme, we use the parameter settings listed in Table 2 and plot the variation of the average rewards for four and six UAVs in Figure 3. As shown in Figure 3, the reward curves for different numbers of agents converge rapidly and then fluctuate within a small range. This behavior can be attributed to the multi-agent A2C-based distributed offloading mechanism, in which the critic networks assess and guide the actor networks to output better offloading policies for the UAVs in each learning episode.

6.4. Impact of Weighting Factors

Figure 4a,b, respectively, demonstrate the impact of the energy consumption and delay weighting factors on two performance metrics when UAVs perform tasks using the proposed DOMUS scheme. The number of UAVs is fixed at 6, and the average data size of generated tasks is varied from 4 to 16 MB. In Figure 4a, the energy consumption curves increase with the average data size of tasks. However, a higher energy consumption weighting factor results in lower energy consumption needed to complete the tasks under the same data size.

From Figure 4b, we can see that the delay incurred by performing tasks increases roughly linearly with the average data size. However, a larger delay weighting factor results in less delay to complete the UAV tasks under the same average data size. Therefore, the comparisons depicted in Figure 4a,b are consistent with the theoretical expectation that both energy consumption and delay differ noticeably under different weighting factors when processing UAV tasks.

6.5. Performance Comparison

In the following section, we compare the proposed DOMUS with four task-offloading schemes under different parameter settings. The comparative algorithms considered are: (1) Greedy-based sequential tuning computation-offloading scheme (STCO) [41]; (2) Weight improvement-based particle swarm optimization offloading algorithm (IWPSO) [42]; (3) Distance-dependent offloading scheme (DDO); (4) Smart ant colony optimization task-offloading algorithm (SACO) [31].

(1) Impact of the number of UAVs. First, we set the weighting factors as

w_{d, κ} = w_{e, κ} = 0.5

and evaluated the performance of the proposed DOMUS under different numbers of UAVs. Figure 5a compares the overall energy consumption of all algorithms as the number of UAVs increases. As shown, DOMUS achieves lower energy consumption than the STCO and SACO schemes. By relying on the multi-agent DRL model, DOMUS can learn the distribution of computation tasks and enable the UAVs to almost always select proper offloading targets as the number of UAVs increases. STCO ignores future offloading decisions, resulting in decisions that are suboptimal from a long-term perspective. The SACO algorithm may become trapped in a local optimum due to the pheromone feedback from suboptimal solutions obtained in early iterations. In Figure 5b, we plot the overall delay under different UAV numbers, which shows the same trend as Figure 5a; the delay curves of the other offloading schemes increase significantly compared with DOMUS. Because DOMUS can select actions that do not yield the minimum delay for the UAV's current task, it optimizes long-term rather than immediate performance. To support this, as shown in Figure 5c, we also recorded the average utility of UAVs as the number of UAVs increased. Combined with Figure 5a,b, our proposed DOMUS achieves lower energy consumption and lower delay than the STCO offloading approach, with minimum improvements of 8.29% and 7.75%, respectively. Accordingly, the average utility achieved by DOMUS is the highest among the different algorithms, with an improvement of up to 12.82%.

(2) Impact of data size. We set the number of UAVs to six and then investigated the energy consumption, delay, and average utility required to complete tasks for UAVs with different average task data sizes.

Figure 6 shows the impact of the average data size on energy consumption. Transmitting larger amounts of data requires more communication resources, leading to increased communication delays. Consequently, as the data size increases, UAVs consume more energy for data transmission. Moreover, SACO’s energy consumption performance deteriorates as the data size increases, primarily due to the gradually growing tabu lists in the SACO algorithm that restrict the UAV selection. Our proposed DOMUS learns offloading policies effectively, resulting in reduced energy consumption. In general, DOMUS reduces energy consumption by up to 13.13% compared to other schemes.

Figure 7 compares the impact of average data size on task completion delay for UAVs. As the data size increases, the delay to complete tasks also increases; when the data size of the task is large, computing the task locally becomes necessary, which reduces the transmission delay but correspondingly increases the execution delay. Since the DNN can offer accurate regression, in the proposed DOMUS, a DNN is used in both the actor and the critic to approximate the offloading policy and value function interactively, which enables each UAV agent to select appropriate task-processing strategies. Nonetheless, when adopting the STCO scheme, optimal task-processing strategies cannot always be derived. This is because STCO may not effectively account for the subsequent execution of tasks. In summary, DOMUS reduces the delay by up to 6.77% compared to other schemes.

Figure 8 shows the average utility of UAVs for task processing with varying data sizes. The figure shows that our DOMUS mechanism outperforms the other schemes in terms of utility, especially for large data sizes. This is because our proposed DOMUS maximizes the utility of task processing for each UAV agent by learning the offloading policy over long training episodes. STCO and SACO obtain similar utility, while the IWPSO method only optimizes the utility from a single UAV perspective, resulting in poor performance, and the DDO method presents the worst utility. Finally, the average utility of UAVs is improved by at least 11.39% relative to the benchmark schemes.

(3) Joint impact of computational capability and network bandwidth. We conducted a performance comparison between our proposed DOMUS and other benchmarks by increasing the computational capability of MEC servers and network bandwidth. The computational capability was changed from 2 to 5 Gcycles/s and the bandwidth (indicated by B) increased from 1 to 4 MHz.

Figure 9 shows the joint impact of computational capability and bandwidth on the transmission energy consumption required to complete the UAV tasks. As the computational capability and bandwidth increase, energy consumption decreases. This is because most tasks are offloaded rather than processed locally, so as to reduce the task completion time. The energy consumed by local computing at the UAVs is thereby reduced, and the transmission energy also decreases with the increased bandwidth. Therefore, the abundance of computation and bandwidth resources greatly affects the selection of the computing mode and the optimal offloading node. In addition, under our proposed DOMUS scheme, the energy consumption curve varies moderately, and the energy consumption performance for UAVs is significantly better than that of the DDO method, with an improvement of at least 15.69% over other offloading methods.

Figure 10 shows the task completion delay for UAVs as computational capability and bandwidth vary. It is observed that the delay changes similarly to the energy consumption. When more computational resources and bandwidth are allocated to UAVs, they are more inclined to offload tasks, resulting in reduced delay not only on transmission links but also on servers. However, there still exists a performance gap between our proposed scheme and the other four benchmarks. This is because our proposed DOMUS evaluates the quality of the communication link and offloading nodes for each UAV using the fuzzy logic-based offloading assessment mechanism, and further uses the A2C model to make offloading decisions by continuously updating the parameters of DNNs to enhance the prediction ability of the critic network for actions. This efficiently facilitates the optimization of offloading decisions. In general, the overall delay in task processing was reduced by at least 4.89% compared to other offloading approaches.

Additionally, Figure 11 shows the variation of the average utility of UAVs as computational capability and network bandwidth vary. We can see from the figure that the utilities of the different offloading schemes increase as the computational capability and bandwidth increase. In particular, our proposed approach outperforms the other approaches and improves the utility of UAVs by up to 8.14%. This indicates that the proposed DOMUS can effectively enable multiple UAVs to explore the joint optimal task computing policy under the guidance of the devised A2C-based DRL offloading framework in a dynamic network environment.

(4) Impact of transmission power. To further demonstrate the scalability of the proposed DOMUS scheme, we investigated the impact of varying the transmission power of UAVs on the overall delay and energy consumption to complete tasks, as shown in Figure 12. The number of UAVs was set to 6. Generally, increasing the transmission power results in a higher data transmission rate, which helps to reduce data transmission delay. Accordingly, we observe from Figure 12a that all delay curves decline as the power gradually increases. Nonetheless, as depicted in Figure 12b, a higher transmission power leads to higher energy consumption when UAVs transmit task data due to the linear relationship between them. Moreover, the proposed DOMUS achieves the lowest delay and energy consumption among the five offloading approaches, reducing the two metrics by at least 4.66% and 9.26%, respectively.
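The opposing trends of delay and energy can be illustrated numerically. The sketch below assumes a Shannon-type rate B log2(1 + P g / σ²) and a transmission energy equal to power times transmission time; both are our assumptions for illustration, since the rate and energy expressions are defined in the system model rather than in this section. The channel gain `gain` is a hypothetical value, 1 MB is taken as 8 × 10^6 bits, and the bandwidth and noise power follow the parameter settings (4 MHz and -100 dBm).

```python
import math

NOISE_W = 10 ** ((-100 - 30) / 10)  # -100 dBm converted to watts


def tx_delay_and_energy(power_w, data_mb, bandwidth_hz=4e6,
                        gain=1e-10, noise_w=NOISE_W):
    """Return (delay in s, energy in J) for sending data_mb megabytes.

    Assumed model: rate = B * log2(1 + P * g / noise), energy = P * delay.
    gain is a hypothetical end-to-end channel gain.
    """
    rate_bps = bandwidth_hz * math.log2(1.0 + power_w * gain / noise_w)
    delay_s = data_mb * 8e6 / rate_bps
    return delay_s, power_w * delay_s


delay_lo, energy_lo = tx_delay_and_energy(power_w=1.0, data_mb=4.0)
delay_hi, energy_hi = tx_delay_and_energy(power_w=10.0, data_mb=4.0)
```

Under these assumptions, raising the power shortens the transmission time only logarithmically while the energy grows almost in proportion to the power, which matches the qualitative behavior described for Figure 12.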

7. Conclusions

This paper addresses the task offloading of a UAV swarm in MEC-assisted 5G heterogeneous networks. The objective is to optimize the utility of the multi-UAV system for task processing and prevent UAVs from offloading via easily disconnected wireless links and poorly performing service nodes. We first devise an assessment mechanism to evaluate the candidate offloading nodes by utilizing fuzzy logic theory. Afterward, considering the unknown environmental dynamics in heterogeneous networks, we model the optimization problem as a multi-agent MDP and propose a decentralized task-offloading scheme called DOMUS using the model-free DRL framework based on multi-agent A2C. The simulation results reveal that the proposed DOMUS converges effectively and reduces the delay and energy consumption for completing UAV tasks under various settings.

In future work, we will integrate the swarm intelligence approach into the proposed learning framework to enhance the services of drone base stations with multiple UAVs. By providing drone base station-enabled MEC architecture and realizing reasonable resource utilization with more advanced approaches, the much stricter requirements of next-generation Internet of Things applications on reliable and efficient service performances will be further satisfied.

Author Contributions

Conceptualization, M.M. and Z.W.; methodology and writing—original draft, M.M.; writing—review and editing, Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China No. 2020YFA0713504.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Dai, M.; Su, Z.; Xu, Q.; Zhang, N. Vehicle Assisted Computing Offloading for Unmanned Aerial Vehicles in Smart City. IEEE Trans. Intell. Transp. Syst. 2021, 22, 1932–1944.
2. Liu, Z.; Wang, X.; Shen, L.; Zhao, S.; Cong, Y.; Li, J.; Yin, D.; Jia, S.; Xiang, X. Mission-Oriented Miniature Fixed-Wing UAV Swarms: A Multilayered and Distributed Architecture. IEEE Trans. Syst. Man Cybern. Syst. 2022, 52, 1588–1602.
3. Sigala, A.; Langhals, B. Applications of Unmanned Aerial Systems (UAS): A Delphi Study projecting future UAS missions and relevant challenges. Drones 2020, 4, 8.
4. Yan, S.; Hanly, S.V.; Collings, I.B. Optimal Transmit Power and Flying Location for UAV Covert Wireless Communications. IEEE J. Sel. Areas Commun. 2021, 39, 3321–3333.
5. Hu, P.; Zhang, R.; Yang, J.; Chen, L. Development Status and Key Technologies of Plant Protection UAVs in China: A Review. Drones 2022, 6, 354.
6. Yazid, Y.; Ez-Zazi, I.; Guerrero-González, A.; El Oualkadi, A.; Arioua, M. UAV-enabled mobile edge-computing for IoT based on AI: A comprehensive review. Drones 2021, 5, 148.
7. Ma, M.; Zhu, A.; Guo, S.; Yang, Y. Intelligent Network Selection Algorithm for Multiservice Users in 5G Heterogeneous Network System: Nash Q-Learning Method. IEEE Internet Things J. 2021, 8, 11877–11890.
8. Zhou, H.; Jiang, K.; Liu, X.; Li, X.; Leung, V.C.M. Deep Reinforcement Learning for Energy-Efficient Computation Offloading in Mobile-Edge Computing. IEEE Internet Things J. 2022, 9, 1517–1530.
9. Chinchali, S.; Sharma, A.; Harrison, J.; Elhafsi, A.; Kang, D.; Pergament, E.; Cidon, E.; Katti, S.; Pavone, M. Network offloading policies for cloud robotics: A learning-based approach. Auton. Robot. 2021, 45, 997–1012.
10. Zhu, A.; Ma, M.; Guo, S.; Yu, S.; Yi, L. Adaptive Multi-Access Algorithm for Multi-Service Edge Users in 5G Ultra-Dense Heterogeneous Networks. IEEE Trans. Veh. Technol. 2021, 70, 2807–2821.
11. Zhang, X.; Cao, Y. Mobile Data Offloading Efficiency: A Stochastic Analytical View. In Proceedings of the 2018 IEEE International Conference on Communications Workshops (ICC Workshops), Kansas City, MO, USA, 20–24 May 2018; pp. 1–6.
12. Li, H.; Wu, S.; Jiao, J.; Lin, X.H.; Zhang, N.; Zhang, Q. Energy-Efficient Task Offloading of Edge-Aided Maritime UAV Systems. IEEE Trans. Veh. Technol. 2023, 72, 1116–1126.
13. Guo, M.; Huang, X.; Wang, W.; Liang, B.; Yang, Y.; Zhang, L.; Chen, L. Hagp: A heuristic algorithm based on greedy policy for task offloading with reliability of mds in mec of the industrial internet. Sensors 2021, 21, 3513.
14. Zhang, D.; Li, X.; Zhang, J.; Zhang, T.; Gong, C. New Method of Task Offloading in Mobile Edge Computing for Vehicles Based on Simulated Annealing Mechanism. J. Electron. Inf. Technol. 2022, 44, 3220–3230.
15. Huang, J.; Wang, M.; Wu, Y.; Chen, Y.; Shen, X. Distributed Offloading in Overlapping Areas of Mobile-Edge Computing for Internet of Things. IEEE Internet Things J. 2022, 9, 13837–13847.
16. Xia, S.; Yao, Z.; Li, Y.; Mao, S. Online Distributed Offloading and Computing Resource Management With Energy Harvesting for Heterogeneous MEC-Enabled IoT. IEEE Trans. Wirel. Commun. 2021, 20, 6743–6757.
17. Zhou, H.; Wang, Z.; Cheng, N.; Zeng, D.; Fan, P. Stackelberg-Game-Based Computation Offloading Method in Cloud-Edge Computing Networks. IEEE Internet Things J. 2022, 9, 16510–16520.
18. Gu, Q.; Shen, B. An Evolutionary Game Based Computation Offloading for an UAV Network in MEC. In Wireless Algorithms, Systems, and Applications: Proceedings of the 17th International Conference, WASA 2022, Dalian, China, 24–26 November 2022; Springer: Cham, Switzerland, 2022; pp. 586–597.
19. You, Q.; Tang, B. Efficient task offloading using particle swarm optimization algorithm in edge computing for industrial internet of things. J. Cloud Comput. 2021, 10, 41.
20. Li, F.; He, S.; Liu, M.; Li, N.; Fang, C. Intelligent Computation Offloading Mechanism of UAV in Edge Computing. In Proceedings of the 2022 2nd International Conference on Frontiers of Electronics, Information and Computation Technologies (ICFEICT), Wuhan, China, 19–21 August 2022; pp. 451–456.
21. Asaamoning, G.; Mendes, P.; Rosário, D.; Cerqueira, E. Drone swarms as networked control systems by integration of networking and computing. Sensors 2021, 21, 2642.
22. Pliatsios, D.; Goudos, S.K.; Lagkas, T.; Argyriou, V.; Boulogeorgos, A.A.A.; Sarigiannidis, P. Drone-base-station for next-generation internet-of-things: A comparison of swarm intelligence approaches. IEEE Open J. Antennas Propag. 2021, 3, 32–47.
23. Amponis, G.; Lagkas, T.; Zevgara, M.; Katsikas, G.; Xirofotos, T.; Moscholios, I.; Sarigiannidis, P. Drones in B5G/6G networks as flying base stations. Drones 2022, 6, 39.
24. Chen, M.; Wang, T.; Zhang, S.; Liu, A. Deep reinforcement learning for computation offloading in mobile edge computing environment. Comput. Commun. 2021, 175, 1–12.
25. Zhang, D.; Cao, L.; Zhu, H.; Zhang, T.; Du, J.; Jiang, K. Task offloading method of edge computing in internet of vehicles based on deep reinforcement learning. Clust. Comput. 2022, 25, 1175–1187.
26. Xu, J.; Li, D.; Gu, W.; Chen, Y. Uav-assisted task offloading for iot in smart buildings and environment via deep reinforcement learning. Build. Environ. 2022, 222, 109218.
27. Vhora, F.; Gandhi, J.; Gandhi, A. Q-TOMEC: Q-Learning-Based Task Offloading in Mobile Edge Computing. In Proceedings of the Futuristic Trends in Networks and Computing Technologies: Select Proceedings of Fourth International Conference on FTNCT 2021; Springer: Singapore, 2022; pp. 39–53.
28. Zhu, D.; Li, T.; Tian, H.; Yang, Y.; Liu, Y.; Liu, H.; Geng, L.; Sun, J. Speed-aware and customized task offloading and resource allocation in mobile edge computing. IEEE Commun. Lett. 2021, 25, 2683–2687.
29. Ma, L.; Wang, P.; Du, C.; Li, Y. Energy-Efficient Edge Caching and Task Deployment Algorithm Enabled by Deep Q-Learning for MEC. Electronics 2022, 11, 4121.
30. Naouri, A.; Wu, H.; Nouri, N.A.; Dhelim, S.; Ning, H. A novel framework for mobile-edge computing by optimizing task offloading. IEEE Internet Things J. 2021, 8, 13065–13076.
31. Kishor, A.; Chakarbarty, C. Task offloading in fog computing for using smart ant colony optimization. Wirel. Pers. Commun. 2022, 127, 1683–1704.
32. Guo, H.; Liu, J. Collaborative computation offloading for multiaccess edge computing over fiber–wireless networks. IEEE Trans. Veh. Technol. 2018, 67, 4514–4526.
33. Pekaslan, D.; Wagner, C.; Garibaldi, J.M. ADONiS-Adaptive Online Nonsingleton Fuzzy Logic Systems. IEEE Trans. Fuzzy Syst. 2020, 28, 2302–2312.
34. Zhou, W.; Jiang, X.; Luo, Q.; Guo, B.; Sun, X.; Sun, F.; Meng, L. AQROM: A quality of service aware routing optimization mechanism based on asynchronous advantage actor-critic in software-defined networks. Digit. Commun. Netw. 2022.
35. Athanasiadou, G.E.; Fytampanis, P.; Zarbouti, D.A.; Tsoulos, G.V.; Gkonis, P.K.; Kaklamani, D.I. Radio network planning towards 5G mmWave standalone small-cell architectures. Electronics 2020, 9, 339.
36. Garroppo, R.G.; Volpi, M.; Nencioni, G.; Wadatkar, P.V. Experimental Evaluation of Handover Strategies in 5G-MEC Scenario by using AdvantEDGE. In Proceedings of the 2022 IEEE International Mediterranean Conference on Communications and Networking (MeditCom), Athens, Greece, 5–8 September 2022; pp. 286–291.
37. Liu, Y.; Dai, H.N.; Wang, Q.; Imran, M.; Guizani, N. Wireless powering Internet of Things with UAVs: Challenges and opportunities. IEEE Netw. 2022, 36, 146–152.
38. Feng, W.; Liu, H.; Yao, Y.; Cao, D.; Zhao, M. Latency-aware offloading for mobile edge computing networks. IEEE Commun. Lett. 2021, 25, 2673–2677.
39. Zhou, H.; Wu, T.; Chen, X.; He, S.; Guo, D.; Wu, J. Reverse auction-based computation offloading and resource allocation in mobile cloud-edge computing. IEEE Trans. Mob. Comput. 2022, 1–5.
40. Huang, S.; Zhang, J.; Wu, Y. Altitude Optimization and Task Allocation of UAV-Assisted MEC Communication System. Sensors 2022, 22, 8061.
41. Zhang, K.; Gui, X.; Ren, D.; Li, D. Energy-Latency Tradeoff for Computation Offloading in UAV-Assisted Multiaccess Edge Computing System. IEEE Internet Things J. 2021, 8, 6709–6719.
42. Deng, X.; Sun, Z.; Li, D.; Luo, J.; Wan, S. User-centric computation offloading for edge computing. IEEE Internet Things J. 2021, 8, 12559–12568.

Figure 1. Task offloading for multi-UAV swarms in MEC-assisted 5G heterogeneous networks.

Figure 2. Offloading probability versus velocity, PLR, and BER (a,b).

Figure 3. Convergence of the proposed DOMUS.

Figure 4. Energy consumption and delay comparison in DOMUS under different weighting factors (a,b).

Figure 5. Energy consumption, delay, and utility comparison under different UAV numbers (a–c).

Figure 6. Energy consumption comparison under different average data sizes.

Figure 7. Delay comparison under different average data sizes.

Figure 8. Average utility comparison under different average data sizes.

Figure 9. Energy consumption comparison under different computational capabilities and network bandwidths.

Figure 10. Delay comparison under different computational capabilities and network bandwidths.

Figure 11. Average utility comparison under different computational capabilities and network bandwidths.

Figure 12. Delay and energy comparison under different transmission power (a,b).

Table 1. Key symbol definitions.

Symbol	Definition
$\mathcal{U} = \{1, \ldots, U\}$	Set of UAVs
$\mathcal{M} = \{1, \ldots, M\}$	Set of servers
$κ_{u}$	Task of UAV $u \in \mathcal{U}$
$d_{u, κ}$	Data size of $κ_{u}$
$c_{u, κ}$	Computation resources required by task $κ_{u}$
$α_{u, κ}^{1}$, $α_{u, κ}^{2}$	Offloading decisions
$λ_{u}$	Computational capability of UAV $u \in \mathcal{U}$
$λ_{m}$	Computational capability of server $m \in \mathcal{M}$
$l_{u, κ}^{loc}$	Execution time in local computing
$e_{u, κ}^{loc}$	Energy consumption in local computing
$ρ_{u}^{e}$	Energy consumption coefficient per CPU cycle
$ξ_{u, κ}^{c}$, $ξ_{u, κ}^{w}$	Transmission rate via cellular and Wi-Fi networks, respectively
$B_{u}^{c}$, $B_{u}^{w}$	Bandwidth allocated to UAV $u$ from cellular and Wi-Fi networks, respectively
$P_{u, κ}^{c}$, $P_{u, κ}^{w}$	Transmission power of UAV $u$ via cellular and Wi-Fi connectivity, respectively
$G_{u, κ}^{c}$, $G_{u, κ}^{w}$	Channel gain over cellular and Wi-Fi networks, respectively
$(σ_{u, κ}^{c})^{2}$, $(σ_{u, κ}^{w})^{2}$	Noise power of the channel over cellular and Wi-Fi networks, respectively
$d_{u, m}$	Distance between UAV $u$ and server $m$
$l_{u, κ, m}^{tr}$	Task transmission time in the MEC offloading
$e_{u, κ}^{mec}$	Task execution time on the server
$e_{u, κ}^{mec}$	Transmission energy consumption in the MEC offloading
$l_{u, κ}^{mec}$	Total time in the MEC offloading
$\hat{e}_{u}$	Maximum energy constraint of UAV $u$
$\hat{l}_{u, κ}$, $\hat{b}_{u, κ}$, $\hat{p}_{u, κ}$	Tolerable upper bound values for delay, BER, and PLR, respectively
$w_{d, κ}$, $w_{e, κ}$	Balance factors for delay and energy consumption, respectively
$\hat{C}_{m}$	Computation capacity of server $m$
$F_{u}$	Utility of UAV $u$
$\mathrm{fuzzy}(\cdot)$	Fuzzy logic processor
$p_{u, κ}^{m}$	Packet loss rate generated in the data transmission
$b_{u, κ}^{m}$	Bit error rate generated in the data transmission
$χ_{m}$	Offloading probability

Table 2. Parameter settings.

Symbol	Value	Symbol	Value
$B^{c}$ (MHz)	4	$B^{w}$ (MHz)	5
$σ^{2}$ (dBm)	$-100$	$P_{u}$ (W)	10
$λ_{m}$ (Gcycles/s)	$[0.7, 1]$	$λ_{u}$ (Gcycles/s)	$[5, 8]$
$\hat{C}_{m}$ (Gcycles)	$[9, 11]$	$ι$	$\geq 1$
$d_{u, κ}$ (MB)	$[4, 5]$	$c_{u, κ}$ (Gcycles)	$[1.6, 2]$
$ρ_{u}^{e}$ (J/cycle)	$5 \times 10^{-10}$	$γ$	$0.99$
$w_{d, κ}$	$0.5$	$w_{e, κ}$	$0.5$
$lr^{a}$	$0.001$	$lr^{c}$	$0.004$
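
To give a feel for the magnitudes implied by Table 2, the sketch below plugs the tabulated values into textbook cost expressions. It is an illustration only, not the authors' implementation. It assumes that local delay is $c_{u, κ} / λ_{u}$, that local energy is $ρ_{u}^{e} c_{u, κ}$, and that the transmission rate follows the Shannon capacity $B \log_2(1 + P G / σ^{2})$. The channel gain value, the midpoints chosen from the tabulated ranges, the conversion 1 MB = $8 \times 10^{6}$ bits, and all function names are hypothetical.

```python
import math

# Values from Table 2. Where the table gives a range, a midpoint is used
# purely for illustration (an assumption, not a setting from the article).
B_C_HZ = 4e6         # cellular bandwidth B^c: 4 MHz
NOISE_DBM = -100.0   # noise power sigma^2 in dBm
P_U_W = 10.0         # transmission power P_u in W
RHO_E = 5e-10        # energy coefficient rho_u^e in J/cycle
LAMBDA_U = 6.5e9     # UAV capability lambda_u: midpoint of [5, 8] Gcycles/s
C_TASK = 1.8e9       # required cycles c: midpoint of [1.6, 2] Gcycles
D_TASK_BITS = 4.5e6 * 8  # data size d: midpoint of [4, 5] MB, 1 MB = 8e6 bits

# Hypothetical channel gain; Table 2 does not list one.
ASSUMED_GAIN = 1e-9


def dbm_to_watt(dbm):
    """Convert a power level from dBm to watts."""
    return 10 ** ((dbm - 30) / 10)


def local_cost(cycles, capability, rho=RHO_E):
    """Return (delay in s, energy in J) for local execution."""
    return cycles / capability, rho * cycles


def shannon_rate(bandwidth_hz, power_w, gain, noise_w):
    """Return the achievable rate in bit/s under the Shannon formula."""
    return bandwidth_hz * math.log2(1 + power_w * gain / noise_w)


def transmission_cost(bits, rate, power_w):
    """Return (delay in s, energy in J) for uploading `bits` at `rate`."""
    delay = bits / rate
    return delay, power_w * delay


if __name__ == "__main__":
    noise_w = dbm_to_watt(NOISE_DBM)
    rate = shannon_rate(B_C_HZ, P_U_W, ASSUMED_GAIN, noise_w)
    print("local (delay, energy):", local_cost(C_TASK, LAMBDA_U))
    print("upload (delay, energy):", transmission_cost(D_TASK_BITS, rate, P_U_W))
```

Because the upload energy is the product of transmission power and upload time, the result is highly sensitive to the assumed channel gain. The printed numbers should therefore be read as orders of magnitude rather than as reproductions of the reported results.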

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
