Technical Note

A Fast and Robust Algorithm with Reinforcement Learning for Large UAV Cluster Mission Planning

1
National Lab of Radar Signal Processing, Xidian University, Xi’an 710000, China
2
Beijing Research Institute of Telemetry, Beijing 100076, China
3
Jiuquan Satellite Launch Centre, Jiuquan 735400, China
*
Author to whom correspondence should be addressed.
Submission received: 10 December 2021 / Revised: 19 February 2022 / Accepted: 4 March 2022 / Published: 8 March 2022

Abstract:
Large Unmanned Aerial Vehicle (UAV) clusters, containing hundreds of UAVs, are now widely used, and mission planning is the core of large UAV cluster collaborative systems. In this paper, we propose a mission planning method that introduces the Simple Attention Model (SAM) into Dynamic Information Reinforcement Learning (DIRL), named DIRL-SAM. To reduce the computational complexity of the original attention model, we derive the SAM with a lightweight interactive model that rapidly extracts high-dimensional features of the cluster information. In DIRL, dynamic training conditions are considered to simulate different mission environments, and the data expansion in DIRL guarantees convergence of the model in these dynamic environments, which improves the robustness of the algorithm. Finally, the simulation results show that the proposed method adaptively provides feasible mission planning schemes with second-level solution speed and exhibits excellent generalization performance on large-scale cluster planning problems.

1. Introduction

Unmanned aerial vehicle (UAV) clusters have been widely used to perform various complex missions in military and civil fields, such as plant protection, mobile signal service, load transportation, target detection, and strike [1,2,3,4,5,6]. For example, a cluster may need to allocate many UAV subgroups with various ammunition resources to strike heterogeneous enemy targets in different areas, leading to the problem of UAV cluster cooperative strike mission planning. Performing multiple missions simultaneously requires a large number of UAVs, which increases the difficulty of mission planning. Generally, mission planning problems can be considered complex optimization problems with system constraints [7,8]. Three classes of algorithms have been studied to solve this problem: mathematical optimization algorithms, heuristic algorithms, and reinforcement learning (RL)-based artificial intelligence algorithms. A brief introduction to these three classes is given in Table 1.
Algorithms based on mathematical models of mission planning problems have been studied in recent years [9,10,11]. In [9], Li et al. jointly considered UAV resource and trajectory optimization in a non-orthogonal multiple access scenario with two UAVs; Taylor expansion and slack variables were used to maximize the secrecy energy efficiency of the UAV network. To navigate multiple UAVs around obstacles in the cooperative source-seeking problem, a hybrid gradient descent optimization algorithm was proposed in [11], which scales efficiently even when the agents have no a priori knowledge of obstacles. However, the computational complexity of mathematical optimization algorithms increases dramatically as the number of UAVs grows, so these algorithms are not suitable for large UAV cluster mission planning problems.
Heuristic algorithms, such as A-star algorithms and bio-inspired heuristics, have also been investigated for large UAV cluster planning problems [12,13,14,15]. Unlike mathematical optimization algorithms, heuristic algorithms use simple and efficient iterative processes with greedy rules to optimize mission planning schemes. In [16], a bio-inspired heuristic algorithm based on the schooling and foraging behaviors of a fish swarm was proposed, where the UAVs adaptively aggregate into multiple groups to perform search and rescue missions. Considering the combat threats faced by the UAV cluster, Zhen et al. proposed an ant colony optimization algorithm to solve the cooperative search-attack planning problem, where each UAV was modeled as a Dubins vehicle [17]. An essential procedure of a heuristic algorithm is evaluating the quality of the individuals at each iteration; the total number of evaluations depends mainly on the population size, the specific optimization rules, and the number of iterations. For a large UAV cluster, the population size and number of iterations are usually large, leading to high time consumption [8].
Reinforcement learning techniques provide a fast-solving framework for optimization problems [18,19,20,21,22,23,24] and have been applied to various UAV mission planning tasks [25,26,27,28]. Many mission planning problems can be cast as decision optimization problems, such as deciding which UAVs should cooperatively perform which missions and planning a proper mission order for each UAV. RL formulates the problem as a sequential Markov decision process, where the objective function of the problem is used as the loss of the algorithm [18]. After an unsupervised learning process, the solution can be obtained quickly as a sequence of decision actions [19,20]. Jun et al. used an RL algorithm to rapidly allocate communication channels in the cooperative surveillance problem of a UAV cluster [25]. In [26], an RL algorithm was used to optimize the mission assignment scheme, which helped UAVs avoid dynamic threat areas and reach the goal areas. In [27], a complex task was decomposed into a series of simple subtasks, and an RL algorithm, the Double Deep Q-network, was proposed to train the cooperative mission planning solver of human-UAV teamwork. However, conventional RL solvers can only handle optimization problems with fixed objective functions and constraints; the model must be retrained to adapt to new environments when the problem settings differ from those used in training [28], which means that the model cannot converge in dynamic training environments. In real scenarios, the missions change with requirements, so the objective functions and constraints of the model are dynamic [29,30]. Therefore, it is necessary to design a robust RL method that ensures convergence of the model under dynamic training conditions.
In this paper, we address the large UAV cluster collaborative mission planning problem, where the cluster must adaptively assign reasonable UAV subgroups to complete many different missions in real time. To this end, we propose a fast and robust method, named dynamic information reinforcement learning (DIRL) with the simple attention model (SAM). In this method, the novel DIRL imports mission information into the UAV data during the dynamic training process, i.e., the mission requirement constraints, environment influence factor, location, and the weights between the different objective functions. Compared with traditional RL algorithms, DIRL ensures that the model converges in changing training environments, which overcomes the limitation of fixed objective functions and constraints in common RL techniques. Meanwhile, to rapidly obtain the UAV cluster's high-dimensional information, we propose the SAM by redesigning a fast interactive model within the attention model (AM). The SAM significantly reduces the computational complexity, especially when handling large cluster data. As a result, the DIRL-SAM method can provide mission planning schemes for different missions in real time with a single trained model, demonstrating that it is both fast and robust.
The remainder of this paper is organized as follows. In Section 2, we describe the simulation scenario of large UAV cluster mission planning and formulate the mission planning problem. In Section 3, the SAM with a new interaction model is derived, and the DIRL is used to train the overall network model. Several numerical results are provided in Section 4 to verify the validity of the proposed SAM and DIRL. Finally, conclusions are drawn in Section 5.

2. Mission Planning Problem Formulation

2.1. Mission and UAV Formulation

Figure 1 illustrates an example of mission planning for a large UAV cluster. In a two-dimensional battlefield, several missions containing different enemy targets and detection equipment are randomly distributed. The planner ranks the mission priorities according to the combat purpose, and high-priority missions must be executed first. The blue and red areas in Figure 1 represent the different requirements for ammunition and electronic jamming cover areas of the combat missions, and the size of each area represents the amount of combat resources required. UAVs of different colors carry different types and numbers of combat resources (ammunition and electronic jamming resources). Mission planning must adaptively allocate limited UAV subgroups to complete the missions under the constraints of the missions' requirements and the UAVs' capacities. Meanwhile, the resource consumption (combat resources and fuel) and the combat loss of the allocated UAVs should be kept as low as possible.
The missions have different priorities $\{T_1, T_2, \ldots, T_K\}$, where the subscripts represent the priorities of the missions. The cluster sequentially assigns reasonable UAV subgroups to perform the missions according to these priorities. Each mission must be accomplished while meeting various combat demands. $Att(T_k)$, $Def(T_k)$, and $Ele(T_k)$ represent the requirements on ammunition resources, defense capability, and electronic jamming area of the $k$th combat mission, respectively. The combat efficiency of a UAV is affected by the local environment of the mission $En(T_k)$ [31]. In mission planning problems, the total flight distance of the UAVs is a common factor for calculating the time cost of a planned scheme [9,13,17]; the flight distance affects a UAV's power consumption and arrival time, influencing the effect of mission planning. Therefore, we extend the mission model in [31] with the mission location $Loc(T_k)$. The $k$th mission data vector in our paper can be expressed as $T_k = [Att(T_k), Def(T_k), Ele(T_k), En(T_k), Loc(T_k)]$.
We use different types and numbers of combat resources carried by the UAVs to represent their heterogeneity. $att(U_i)$, $def(U_i)$, and $ele(U_i)$ denote the ammunition resources, defense capability, and electronic jamming resources of UAV $i$, respectively. Similarly, we add the UAV location $loc(U_i)$ to the UAV data vector, which can be formulated as $U_i = [att(U_i), def(U_i), ele(U_i), loc(U_i)]$.
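For concreteness, the mission vector $T_k$ and the UAV vector $U_i$ can be held in simple data containers. The following Python sketch is illustrative only; the field names and the scalar treatment of resources are our assumptions, not the authors' implementation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Mission:
    # T_k = [Att(T_k), Def(T_k), Ele(T_k), En(T_k), Loc(T_k)], plus the priority rank
    att: float          # required ammunition resources Att(T_k)
    defense: float      # defense capability threshold Def(T_k)
    ele: float          # required electronic jamming area Ele(T_k)
    env: float          # environment influence factor En(T_k)
    loc: np.ndarray     # 2-D mission location Loc(T_k)
    priority: int = 1   # rank in {1, ..., K}

@dataclass
class UAV:
    # U_i = [att(U_i), def(U_i), ele(U_i), loc(U_i)]
    att: float          # carried ammunition resources att(U_i)
    defense: float      # defense capability def(U_i)
    ele: float          # carried electronic jamming resources ele(U_i)
    loc: np.ndarray     # 2-D UAV location loc(U_i)
```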

2.2. Multiple Objective Functions of Mission Planning

Under the limitations of the various mission requirements, the planning scheme needs to reduce the total combat resource consumption of the UAV cluster. For the $k$th mission, we set the consumption of all combat resources as the first objective function $f_1^k$:
$$f_1^k = En(T_k) \sum_{i_k=1}^{N_k} \big( att(U_{i_k}) + ele(U_{i_k}) \big), \qquad (1)$$
where $i_k$ indexes the UAVs allocated to the $k$th mission and $N_k$ is their total number. Equation (1) denotes the total combat resource consumption of the UAVs allocated to the $k$th mission.
In actual combat environments, UAVs may suffer undetected threats when performing missions, causing combat losses for the UAV cluster. The planning scheme should therefore account for the combat loss resulting from uncertain threat factors. To this end, we construct the combat loss of UAVs as follows:
$$f_2^k = \sum_{i_k=1}^{N_k} i_k \, \overline{V}_k \, (p_k)^{i_k}, \qquad (2)$$
where $f_2^k$ is the combat loss of the allocated UAVs in completing the $k$th mission, and $\overline{V}_k$ is the mean combat value of the allocated UAVs, $\overline{V}_k = \frac{1}{N_k}\sum_{i_k=1}^{N_k} V_{i_k}$, where $V_{i_k}$ denotes the combat value of UAV $i_k$. Different UAVs carry different numbers and types of combat resources, which means that the UAVs have different combat values for the mission planner. We consider that the carried combat resources and the cost of a UAV determine its combat value. Therefore, $V_{i_k}$ can be calculated by
$$V_{i_k} = \mu \left\| att(U_{i_k}) + ele(U_{i_k}) \right\|_1 + c_{i_k}, \qquad (3)$$
where $\mu$ is used to transform combat resources into combat value, $\|\cdot\|_1$ denotes the 1-norm, and $c_{i_k}$ represents the self-cost of UAV $i_k$. In (2), $p_k$ is the probability of a UAV being destroyed when carrying out the $k$th mission, and it changes with the combat mission. A mission with larger combat resource requirements is usually more dangerous for the UAVs, which means that $p_k$ increases with the mission resource requirements. It can be formulated as follows:
$$p_k = \min\left( \chi \left\| L_k \right\|_1 , \, 1 \right), \qquad (4)$$
where $\chi$ is the probability coefficient used to transform $\|L_k\|_1$ into $p_k$, and $L_k$ denotes the resource requirement vector of the $k$th mission, i.e., $L_k = [Att(T_k), Ele(T_k)]$.
The total flight distance of the UAVs is crucial in the mission planning problem. If the allocated UAVs are far from the mission, both the mission completion time and the total power consumption of the UAV cluster increase. Hence, it is necessary to decrease the total flight distance of the allocated UAVs in the mission planning scheme. We set the total flight distance as the third objective function $f_3^k$:
$$f_3^k = \sum_{i_k=1}^{N_k} d_{i_k,k}\left( loc(U_{i_k}), Loc(T_k) \right), \qquad (5)$$
where $d_{i_k,k}$ is the Euclidean distance between the locations of UAV $i_k$ and the $k$th mission. Combining the above, the total objective function of the $k$th mission can be formulated as:
$$F_k = w_1^k f_1^k + w_2^k f_2^k + w_3^k f_3^k, \qquad (6)$$
where $w_1^k$, $w_2^k$, and $w_3^k$ are the weights of the three objective functions, with $w_1^k + w_2^k + w_3^k = 1$. The weights can be adjusted according to the different mission requirements.
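As an illustrative sketch only (not the authors' code), the three objectives and the weighted sum $F_k$ of Equations (1)-(6) for one candidate UAV subgroup could be evaluated as follows. Resources are treated as scalars, so the 1-norms reduce to ordinary sums, and the helper names, default constants, and use of NumPy are assumptions (the defaults mirror the settings later given in Section 4.1).

```python
import numpy as np

def mission_objective(mission, assigned_uavs, weights, mu=10.0, chi=0.03, uav_cost=10.0):
    """Weighted objective F_k of Eq. (6) for one mission (illustrative sketch only)."""
    if not assigned_uavs:
        return 0.0
    n_k = len(assigned_uavs)
    # f1: total combat resource consumption, Eq. (1)
    f1 = mission.env * sum(u.att + u.ele for u in assigned_uavs)
    # Mean combat value of the allocated UAVs, Eq. (3)
    v_bar = np.mean([mu * (u.att + u.ele) + uav_cost for u in assigned_uavs])
    # Probability of a UAV being destroyed on this mission, Eq. (4)
    p_k = min(chi * (mission.att + mission.ele), 1.0)
    # f2: expected combat loss of the allocated subgroup, Eq. (2)
    f2 = sum(i_k * v_bar * p_k ** i_k for i_k in range(1, n_k + 1))
    # f3: total flight distance to the mission location, Eq. (5)
    f3 = sum(np.linalg.norm(u.loc - mission.loc) for u in assigned_uavs)
    w1, w2, w3 = weights
    return w1 * f1 + w2 * f2 + w3 * f3
```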

2.3. Constraint Conditions of Mission Planning

To ensure completion of a mission, the total amount of each combat resource carried by the allocated UAVs must exceed the corresponding resource requirement of the mission. In addition, each UAV allocated to mission $k$ must meet the defense capability threshold. Considering the limitations of the combat resource requirements and the defense capability threshold of the missions, we formulate the constraints of mission planning as follows:
$$En(T_k) \sum_{i_k=1}^{N_k} att(U_{i_k}) \geq Att(T_k), \qquad (7)$$
$$En(T_k) \sum_{i_k=1}^{N_k} ele(U_{i_k}) \geq Ele(T_k), \qquad (8)$$
$$def(U_{i_k}) \geq Def(T_k). \qquad (9)$$
The constraints in (7) and (8) ensure that each combat resource requirement of a mission can be met. In addition, the defense capability threshold in (9) guarantees the UAVs' safety when carrying out combat missions.
Finally, the mission planning problem is formulated as:
$$\min F = \sum_{k=1}^{K} F_k . \qquad (10)$$
For problem (10), assigning reasonable numbers and types of UAV subgroups to carry out each mission under the various requirements and limitations is a complex combinatorial optimization problem.
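A feasibility check corresponding to constraints (7)-(9) could then look like the sketch below, again assuming scalar resources and the illustrative field names introduced above.

```python
def is_feasible(mission, assigned_uavs):
    """Check constraints (7)-(9) for one candidate UAV subgroup (illustrative sketch)."""
    att_supply = mission.env * sum(u.att for u in assigned_uavs)   # left-hand side of (7)
    ele_supply = mission.env * sum(u.ele for u in assigned_uavs)   # left-hand side of (8)
    resources_ok = att_supply >= mission.att and ele_supply >= mission.ele
    # Constraint (9): every allocated UAV must meet the defense capability threshold
    defense_ok = all(u.defense >= mission.defense for u in assigned_uavs)
    return resources_ok and defense_ok
```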

3. DIRL-SAM

To solve the UAV cluster mission planning problem of Section 2, we propose DIRL-SAM. First, the proposed SAM is used to rapidly extract the high-dimensional features of the UAV cluster during the encoding process. Then, the high-dimensional features are used in the decoder to allocate UAV subgroups to the missions. Finally, the model parameters of the encoder and the decoder are optimized collaboratively with the DIRL dynamic training algorithm. The detailed model structure of DIRL-SAM is provided in Figure 2.

3.1. Encoder with SAM

The encoder, which extracts the high-dimensional features of the UAV cluster data, provides the necessary information for the decoder to solve the mission planning problem. Specifically, the encoder has a linear projection network and $r$ sublayers, and each sublayer contains a SAM and a feed-forward (FF) network. The whole encoding process is shown in Figure 3.
The input of the linear projection network is the UAV cluster data $[U_1, U_2, \ldots, U_N]$, where $N$ is the total number of UAVs in the cluster. For each UAV data vector $U_i$, the linear projection network transforms it into the UAV low-level features $h_i^f$:
$$h_i^f = W U_i + b, \qquad (11)$$
where $W$ and $b$ are the parameters of the linear projection network. Then, $h_i^f$ is fed into the first sublayer of the encoder, where the SAM uses $h_i^f$ to extract the UAV high-level features $h_{i,s}^1$. This process can be expressed as:
$$h_{i,s}^{l} = \mathrm{SAM}^{l}\left( h_i^{l}, h_i^{f}, s^{l} \right), \qquad (12)$$
where $h_{i,s}^l$ is the output of the SAM in sublayer $l \in \{1, 2, \ldots, r\}$, $h_i^l$ is the output of the $(l-1)$th sublayer (with $h_i^1$ equal to $h_i^f$ in the first sublayer), and $s^l$ is the designed auxiliary vector of the SAM, expressed as:
$$s^{l} = \mathrm{Relu}\left( \frac{1}{N} \sum_{i=1}^{N} h_i^{l} \right), \qquad (13)$$
where the $\mathrm{Relu}$ activation function is employed in the SAM to avoid vanishing gradients during training.
To improve the feature extraction performance of the SAM, $h_{i,s}^l$ is expected to contain both the UAV cluster features and the UAV's own features. For the cluster features, $h_i^l$ interacts with $s^l$ in (13), which represents the cluster information. To preserve the UAV's own features in the interaction process of the SAM, $h_i^l$ also interacts with itself and with a vector related to its original features, i.e., $h_i^l$ and $h_i^f$. These two feature vectors require no extra network modules or calculations, keeping the SAM structure lightweight.
After the SAM feature extraction, batch normalization (BN) [32] and skip connections [33] are used in the rest of the sublayer:
$$\hat{h}_i^{l+1} = \mathrm{BN}^{l}\left( h_i^{l} + h_{i,s}^{l} \right), \qquad h_i^{l+1} = \mathrm{BN}^{l}\left( \hat{h}_i^{l+1} + \mathrm{FF}^{l}\left( \hat{h}_i^{l+1} \right) \right). \qquad (14)$$
In the last sublayer of the encoder, we obtain the UAV cluster high-dimensional features $\{h_1^{r+1}, h_2^{r+1}, \ldots, h_N^{r+1}\}$. This set of high-dimensional features is used in the decoder, which outputs the probability vector $P_\theta$ of each UAV being allocated [20,21,22] ($\theta$ is the parameter vector of the encoder and the decoder). Finally, the UAVs allocated to each mission are determined according to $P_\theta$.
Remark 1.
Figure 4 compares the interactive modes of the AM and the SAM. As shown in Figure 4a, in the AM, $h_i^l$ must interact with the hidden vectors of all UAVs to obtain its hidden features, so the computational complexity of this process is $O(N \times N)$ [34,35,36]. In Figure 4b, unlike the AM, we redesign a fast interaction process in the SAM: $h_i^l$ only needs to interact with two vectors and itself, so the computational complexity is reduced from $O(N \times N)$ to $O(N \times 3)$.
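The paper gives the SAM interface in Equations (12)-(14) but does not spell out the internal interaction formula, so the following PyTorch sketch is only one plausible reading: each $h_i^l$ attends over the three vectors $\{h_i^l, h_i^f, s^l\}$, which keeps the per-UAV interaction cost at three terms, consistent with the $O(N \times 3)$ complexity noted in Remark 1. The layer sizes, projections, and module names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAMSublayer(nn.Module):
    """One encoder sublayer: SAM interaction, skip connection, BN, and feed-forward (Eqs. (12)-(14))."""
    def __init__(self, dim=128, ff_dim=512):
        super().__init__()
        self.q = nn.Linear(dim, dim, bias=False)   # query projection of h_i^l
        self.k = nn.Linear(dim, dim, bias=False)   # key projection of the three interaction partners
        self.v = nn.Linear(dim, dim, bias=False)   # value projection
        self.bn1 = nn.BatchNorm1d(dim)
        self.bn2 = nn.BatchNorm1d(dim)
        self.ff = nn.Sequential(nn.Linear(dim, ff_dim), nn.ReLU(), nn.Linear(ff_dim, dim))

    def forward(self, h_l, h_f):
        # h_l, h_f: (N, dim) hidden features and low-level features of the N UAVs
        s = F.relu(h_l.mean(dim=0, keepdim=True))                # auxiliary vector s^l, Eq. (13)
        ctx = torch.stack([h_l, h_f, s.expand_as(h_l)], dim=1)   # (N, 3, dim): {h_i^l, h_i^f, s^l}
        scores = self.q(h_l).unsqueeze(1) @ self.k(ctx).transpose(1, 2)   # (N, 1, 3)
        attn = torch.softmax(scores / h_l.size(-1) ** 0.5, dim=-1)
        h_s = (attn @ self.v(ctx)).squeeze(1)                    # SAM output h_{i,s}^l, Eq. (12)
        h_hat = self.bn1(h_l + h_s)                              # skip connection + BN, Eq. (14)
        return self.bn2(h_hat + self.ff(h_hat))                  # FF sublayer + BN, Eq. (14)
```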

3.2. DIRL

The DIRL is used to update $\theta$ with unsupervised training in dynamic environments. To solve different settings of the mission planning problem with one network model, $\theta$ is optimized under various training conditions, which means that the loss function and constraint conditions of the RL change correspondingly. With traditional RL algorithms, the overall network model cannot converge in such dynamic training environments.
The DIRL makes the encoder and the decoder contain all of the dynamic information, ensuring convergence of the model during the changing training process. Specifically, $U_i$ is expanded with the dynamic data in the DIRL:
$$U_i^k = \left[ U_i, w_1^k, w_2^k, w_3^k, T_k \right], \qquad (15)$$
where $U_i^k$ is the expanded vector of UAV $i$ for the $k$th mission; it includes all of the dynamic information through this simple but effective data expansion operation. Because of the data expansion process of the DIRL, the entire solving process of the model contains the mission data and weight data, guaranteeing convergence of the model. The dynamic training conditions, together with the data expansion operation in DIRL, ensure that the trained model can solve problems with various resource requirements and objective function weights.
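The data expansion of Equation (15) is a plain concatenation of the UAV vector, the objective weights, and the mission vector; a minimal sketch using the illustrative containers from Section 2:

```python
import numpy as np

def expand_uav(uav, mission, weights):
    """Build the expanded input U_i^k of Eq. (15): [U_i, w1^k, w2^k, w3^k, T_k] (illustrative)."""
    u_i = np.array([uav.att, uav.defense, uav.ele, *uav.loc])
    t_k = np.array([mission.att, mission.defense, mission.ele, mission.env, *mission.loc])
    return np.concatenate([u_i, np.asarray(weights, dtype=float), t_k])
```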
The actor-critic framework is employed in the DIRL [37]. The parameter vector of the overall model, $\theta$, is optimized by the policy gradient algorithm [38], which can be calculated as follows:
$$\mathcal{L}\left( \theta \mid U_1^k, U_2^k, \ldots, U_N^k \right) = \mathbb{E}_{P_\theta\left( \pi^k \mid U_1^k, U_2^k, \ldots, U_N^k \right)} \left[ \left( F_k\left( \pi^k \right) - F_k\left( \pi_c^k \right) \right) \log P_\theta\left( \pi^k \mid U_1^k, U_2^k, \ldots, U_N^k \right) \right], \qquad (16)$$
where $\mathcal{L}$ is the DIRL loss, and $\pi^k$ and $\pi_c^k$ are the solutions produced by the actor-network and the critic-network, respectively.
The critic-network serves as the baseline function in DIRL, which reduces the variance during the training process [37]. The actor-network and the critic-network use the sample rollout policy and the greedy rollout policy, respectively, to sample from $P_\theta$ [20]. The performances of the two networks are compared at the end of each epoch; if the actor-network performs better than the critic-network, the parameters of the critic-network are replaced with the actor-network parameters. This means that the critic-network maintains the best model parameters during the DIRL dynamic training process. Algorithm 1 presents the detailed dynamic training process of DIRL.
Algorithm 1. Dynamic Information Reinforcement Learning
1: Input: initialized actor-network and critic-network parameters $\theta$, $\theta_{BL}$
2: Output: the optimal actor-network parameters $\theta$
3: for each epoch do
4:   for each batch do
5:     $\{U_1, U_2, \ldots, U_N\}, T_k, W_k \leftarrow$ RandomInstance
6:     $\pi^k \leftarrow$ SampleRollout($P_\theta$)
7:     $\pi_c^k \leftarrow$ GreedyRollout($P_{\theta_{BL}}$)
8:     $\mathcal{L} \leftarrow$ loss (16) computed from $\pi^k$ and $\pi_c^k$
9:     $\theta \leftarrow$ Adam($\theta$, $\mathcal{L}$)
10:   end for batch
11:   if the paired t-test($P_\theta$, $P_{\theta_{BL}}$) < 5% on the test batches then
12:     $\theta_{BL} \leftarrow \theta$
13:   end if
14: end for epoch
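Assuming a PyTorch-style actor-critic implementation, one epoch of Algorithm 1 might be organized roughly as follows; `sample_instance`, `evaluate`, and the rollout methods are placeholders for components the paper does not detail, and the significance test mirrors lines 11-12 of Algorithm 1.

```python
import copy
import torch
from scipy import stats

def train_epoch(model, baseline, optimizer, n_batches, batch_size):
    """One DIRL epoch: REINFORCE with a greedy-rollout baseline, as in Eq. (16) (illustrative sketch)."""
    for _ in range(n_batches):
        # Random dynamic instance: UAV data plus mission data T_k and weights W_k (Eq. (15))
        inputs, cost_fn = sample_instance(batch_size)          # placeholder data generator
        log_p, pi = model.sample_rollout(inputs)               # actor: stochastic decoding
        with torch.no_grad():
            _, pi_c = baseline.greedy_rollout(inputs)          # critic: greedy decoding
        advantage = cost_fn(pi) - cost_fn(pi_c)                # F_k(pi) - F_k(pi_c), per-instance tensor
        loss = (advantage.detach() * log_p).mean()             # surrogate loss of Eq. (16)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()                                       # Adam update of theta (line 9)
    # Lines 11-12: replace the baseline when the actor is significantly better (one-sided paired t-test)
    actor_costs, baseline_costs = evaluate(model), evaluate(baseline)   # placeholder evaluation
    if stats.ttest_rel(actor_costs, baseline_costs, alternative='less').pvalue < 0.05:
        baseline.load_state_dict(copy.deepcopy(model.state_dict()))
```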

4. Experimental Results

4.1. Experimental Settings

The simulation experimental setup contains the following parts.
• Parameter settings of the comparison optimization algorithms:
We compare our method with two effective heuristic optimization algorithms: the genetic algorithm (GA) and the particle swarm optimization (PSO) algorithm [39,40]. The main parameter settings of GA and PSO are shown in Table 2 and Table 3.
In addition, we use two models, DIRL-AM and RL-SAM, to separately verify the effectiveness of the SAM and the DIRL, respectively. To compare the performance of the SAM and the AM, we only replace the SAM with the AM in the encoder, keeping the other network structures unchanged. To evaluate the DIRL, we fix the network structure of the model and exclude the data expansion process of the DIRL. The DIRL-SAM, DIRL-AM, and RL-SAM are trained under the same dynamic training conditions listed in Table 4.
• Settings of various missions:
We randomly generate a group of missions with different combat requirements and locations. The mission settings are shown in Table 5.
In Table 5, missions 1 and 2 are considered to pose a high threat to the UAVs; hence, their priorities are higher than those of the other missions. Meanwhile, we increase the weight of the path length function to ensure that these missions are completed as soon as possible. Missions 3 and 4 are set as medium-threat missions for the UAVs, and the weights of the three objective functions are nearly equal. For missions 5 and 6, with lower combat threats, the UAV cluster needs to consume as few combat resources as possible to complete the missions, so the weight of the resource consumption function is higher than the others.
In Figure 5, the sizes of the squares and circles represent the mission requirements for ammunition resources and electronic jamming area, respectively. The red nodes represent UAVs with high defense capability, which can perform all missions; the blue nodes denote UAVs with low defense capability, which can only perform missions 3-6. The total number of UAVs is 100. We use PyCharm Community Edition 2018.3.5 with Python 3.6 to randomly generate the same training data for the SAM and the AM, and the SAM and AM run in this environment. The PSO and GA run on MATLAB 2018b. All methods run on an Intel(R) Core(TM) i7-9700K CPU at 3.6 GHz.
• Parameter settings of the objective functions:
For convenience of the simulation tests, we consider that the UAVs have the same self-cost. By setting $\chi = 0.03$ in (4), and $\mu = 10$ and $c_{i_k} = 10$ in (3), we obtain the probability of a UAV being destroyed and the UAV value function:
$$p_k = \min\left( 0.03 \times \| L_k \|_1 , \, 1 \right), \qquad (17)$$
$$V_{i_k} = 10 \, \| U_{i_k} \|_1 + 10. \qquad (18)$$

4.2. Simulation Experimental Results

Figure 6 shows the comparison results of the five optimization methods on all missions. In Figure 6a,d,j,m, it is clear that the UAVs near missions 1 and 2 are assigned to complete them, and the allocated UAVs meet the defense capability limitations. This is expected, since missions 1 and 2 are regarded as high-threat missions that must be completed in the shortest possible time; the combat resource consumption and the combat loss suffered by the UAVs are given little weight for these missions. As depicted in Figure 6b,e,k,n, multiple objective functions are considered simultaneously for missions 3 and 4: even though some UAVs are close to the missions, they are not assigned to carry them out. In Figure 6c,f,l,o, to keep the consumption of combat resources as low as possible, some UAVs far from the low-priority missions 5 and 6 are allocated. The detailed optimization results of the different algorithms are shown in Table 6.
Figure 7 shows the training processes of DIRL-SAM and RL-SAM. From Figure 7b and Table 6, without the data expansion process of DIRL, the model trained with the common RL algorithm cannot converge in the same dynamic training environment, which makes RL-SAM ineffective and unstable in dealing with different missions. Figure 6g-i shows that RL-SAM fails to handle missions with different requirements. In Figure 7a, the model trained with DIRL converges in the same dynamic training environments, and we use only one DIRL-SAM model to solve all mission planning problems with varying combat requirements and locations. That is, the proposed DIRL-SAM is robust for the UAV cluster mission planning problem.
From Table 6, although the SAM incurs a slight precision loss compared with the AM (about 5%), it significantly improves the solving speed (about 35%). With the help of the new fast interactive process, the SAM maintains the feature extraction effect while reducing the computational complexity to speed up the encoding process. Furthermore, compared with the GA and PSO, DIRL-SAM achieves a considerable advantage in solving speed (all methods run on the same CPU). DIRL-SAM avoids the time-consuming iterative optimization process and can solve the mission planning problem within the required time.
Furthermore, we increase the number of UAVs available to solve the same mission; the results for 150 and 200 UAVs are provided in Table 7.
From Table 7, the objective function values of DIRL-SAM and DIRL-AM decrease as more UAVs become available. In the problem with 200 UAVs, more UAVs can be chosen to conduct the mission, and the algorithms can obtain better planning schemes for the same mission. The search space of GA and PSO grows exponentially as the number of variables increases, so their performance is degraded by the larger number of UAVs. In addition, DIRL-SAM retains a large advantage in solution speed.
To test DIRL-SAM on small-scale problems, we use an exhaustive method to obtain the exact results; the compared results are shown in Table 8, and Table 9 shows the mission settings of the small-scale problems.
As shown in Table 8 and Table 9, we set up six small-scale problems with different settings to test the performance of the proposed algorithm, and the exhaustive method is used to obtain the exact solution of each test. All optimization methods yield reasonable results on these small instances, whereas the time consumption of the exhaustive method increases rapidly with the number of UAVs. The GA and PSO obtain the exact solutions in all six tests; the proposed DIRL-SAM obtains the exact solutions in tests 4 and 6 and feasible, near-optimal solutions in the other tests, while offering a faster solution speed than the other methods.
Finally, compared with the competitors, DIRL-SAM shows favorable characteristics in the simulation experiments, such as fast solving speed and excellent robustness across different missions. This demonstrates that DIRL-SAM can adaptively allocate UAV groups to each mission in real time and is a practical algorithm for large UAV cluster mission planning problems.
Summary: Compared with the competitors, DIRL-SAM uses a new framework to solve the UAV mission planning problem, which gives the algorithm a fast solution speed. As the number of UAVs increases, the solution search space becomes huge, which degrades the performance of GA and PSO. DIRL-SAM exhibits excellent generalization ability in solving large UAV mission planning problems, proving the effectiveness of the algorithm.

5. Conclusions

In this paper, we proposed a new algorithm, DIRL-SAM, to rapidly solve large UAV cluster mission planning problems. Owing to the data expansion operation of DIRL, the model converges under dynamic training conditions, ensuring that the trained model can solve problems with different constraints and objective function weights.
We also designed a new lightweight interactive mode in the SAM, which decreases the computational complexity of the original AM and improves the solving speed. Therefore, the DIRL-SAM algorithm can rapidly provide satisfactory mission planning schemes for the UAV cluster, especially for large-scale problems.

Author Contributions

Resources, S.G.; validation, L.Z. and Y.L.; writing—original draft preparation, L.L. and M.L.; writing—review and editing, supervision, S.G. and X.L.; funding acquisition, L.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China under grant (61871307) and the Fundamental Research Funds for the Central Universities (JB210207).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank all reviewers and editors for their comments on this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Fraser, B.T.; Congalton, R.G. Monitoring Fine-Scale Forest Health Using Unmanned Aerial Systems (UAS) Multispectral Models. Remote Sens. 2021, 13, 4873.
2. Kurdi, H.; AlDaood, M.F.; Al-Megren, S.; Aloboud, E.; Aldawood, A.S.; Youcef-Toumi, K. Adaptive Task Allocation for Multi-UAV Systems Based on Bacteria Foraging Behavior. Appl. Soft Comput. 2019, 83, 105643.
3. Wang, Y.; Ru, Z.Y.; Wang, K.Z.; Huang, P.Q. Joint Deployment and Task Scheduling Optimization for Large-Scale Mobile Users in Multi-UAV-Enabled Mobile Edge Computing. IEEE Trans. Cybern. 2019, 9, 3984–3997.
4. Shirani, B.; Najafi, M.; Izadi, I. Cooperative Load Transportation Using Multiple UAVs. Aerosp. Sci. Technol. 2019, 84, 158–169.
5. Sun, L.; Chen, J.; Feng, D.; Xing, M. Parallel Ensemble Deep Learning for Real-Time Remote Sensing Video Multi-Target Detection. Remote Sens. 2021, 13, 4377.
6. Milani, I.; Bongioanni, C.; Colone, F.; Lombardo, P. Fusing Measurements from Wi-Fi Emission-Based and Passive Radar Sensors for Short-Range Surveillance. Remote Sens. 2021, 13, 3556.
7. Wu, H.S.; Li, H.; Xiao, R.B.; Liu, J. Modeling and Simulation of Dynamic Ant Colony's Labor Division for Task Allocation of UAV Swarm. Phys. A Stat. Mech. Appl. 2018, 491, 127–141.
8. Zhang, J.; Xing, J.H. Cooperative Task Assignment of Multi-UAV System. Chin. J. Aeronaut. 2020, 3, 2825–2827.
9. Li, Y.B.; Zhang, H.J.; Long, K.P. Joint Resource, Trajectory, and Artificial Noise Optimization in Secure Driven 3-D UAVs with NOMA and Imperfect CSI. IEEE J. Sel. Area Commun. 2021, 39, 3363–3377.
10. Wang, Y.; Wang, D.B.; Wang, J.H. A Convex Optimization Based Method for Multiple UAV Autonomous Formation Reconfiguration. Sci. China Technol. Sci. 2017, 47, 249–258.
11. Mohr, H.; Schroeder, K.; Black, J. Distributed Source Seeking and Robust Obstacle Avoidance through Hybrid Gradient Descent. In Proceedings of the 2019 IEEE Aerospace Conference, Big Sky, MT, USA, 2–9 March 2019; pp. 1–8.
12. Bagheri, S.M.; Taghaddos, H.; Mousaei, A.; Shahnavaz, F.; Hermann, U. An A-Star Algorithm for Semi-optimization of Crane Location and Configuration in Modular Construction. Automat. Constr. 2021, 121, 103447.
13. Martin, R.A.; Rojas, I.; Franke, K.; Hedengren, J.D. Evolutionary View Planning for Optimized UAV Terrain Modeling in a Simulated Environment. Remote Sens. 2016, 8, 26.
14. Huang, X.; Dong, X.; Ma, J.; Liu, K.; Ahmed, S.; Lin, J.; Qiu, B. The Improved A* Obstacle Avoidance Algorithm for the Plant Protection UAV with Millimeter Wave Radar and Monocular Camera Data Fusion. Remote Sens. 2021, 13, 3364.
15. Banerjee, B.P.; Raval, S. A Particle Swarm Optimization Based Approach to Pre-Tune Programmable Hyperspectral Sensors. Remote Sens. 2021, 13, 3295.
16. Alhaqbani, A.; Kurdi, H.; Youcef-Toumi, K. Fish-Inspired Task Allocation Algorithm for Multiple Unmanned Aerial Vehicles in Search and Rescue Missions. Remote Sens. 2021, 13, 27.
17. Zhen, Z.Y.; Xing, D.; Gao, C. Cooperative Search-Attack Mission Planning for Multi-UAV based on Intelligent Self-Organized Algorithm. Aerosp. Sci. Technol. 2018, 76, 402–411.
18. Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer Networks. In Proceedings of the 2015 Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 7–12 December 2015.
19. Meng, X.; Wu, L.; Yu, S. Research on Resource Allocation Method of Space Information Networks Based on Deep Reinforcement Learning. Remote Sens. 2019, 11, 448.
20. Nazari, M.; Oroojlooy, A.; Snyder, L.V.; Takac, M. Reinforcement Learning for Solving the Vehicle Routing Problem. In Proceedings of the 2018 Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 2–8 December 2018; Volume 31.
21. Bello, I.; Pham, H.V.; Le, Q.V.; Norouzi, M.; Bengio, S. Neural Combinatorial Optimization with Reinforcement Learning. In Proceedings of the 2017 International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017.
22. Kool, W.; Herke, V.F.; Welling, M. Attention, Learn to Solve Routing Problems. In Proceedings of the 2019 International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019.
23. Huang, Y.; Mu, Z.; Wu, S.; Cui, B.; Duan, Y. Revising the Observation Satellite Scheduling Problem Based on Deep Reinforcement Learning. Remote Sens. 2021, 13, 2377.
24. Ying, C.S.; Chow, A.H.F.; Chin, K.S. An Actor-Critic Deep Reinforcement Learning Approach for Metro Train Scheduling with Rolling Stock Circulation under Stochastic Demand. Transp. Res. B Meth. 2020, 140, 210–235.
25. Jun, Y.; You, X.H.; Wu, G.X.; Hassan, M.M.; Almogren, A.; Guna, J. Application of Reinforcement Learning in UAV Cluster Task Scheduling. Future Gener. Comput. Syst. 2019, 95, 140–148.
26. Qie, H.; Shi, D.X.; Shen, T.L.; Xu, X.H.; Li, Y.; Wang, L.J. Joint Optimization of Multi-UAV Target Assignment and Path Planning Based on Multi-Agent Reinforcement Learning. IEEE Access 2019, 7, 146264–146272.
27. Wang, C.; Wu, L.Z.; Yan, C.; Wang, Z.C.; Long, H.; Yu, C. Coactive Design of Explainable Agent-Based Task Planning and Deep Reinforcement Learning for Human-UAVs Teamwork. Chin. J. Aeronaut. 2020, 33, 2930–2945.
28. Li, K.W.; Zhang, T.; Wang, R. Deep Reinforcement Learning for Multi-objective Optimization. IEEE Trans. Cybern. 2021, 51, 3103–3114.
29. Atencia, R.C.; Ser, J.D.; Camacho, D. Weighted Strategies to Guide a Multi-Objective Evolutionary Algorithm for Multi-UAV Mission Planning. Swarm Evol. Comput. 2019, 44, 480–495.
30. Zhen, Z.Y.; Chen, Y.; Wen, L.D.; Han, B. An Intelligent Cooperative Mission Planning Scheme of UAV Swarm in Uncertain Dynamic Environment. Aerosp. Sci. Technol. 2020, 100, 105826.
31. Zhao, X.Y.; Zong, Q.; Tian, B.L.; Zhang, B.Y.; You, M. Fast Task Allocation for Heterogeneous Unmanned Aerial Vehicles Through Reinforcement Learning. Aerosp. Sci. Technol. 2019, 92, 588–594.
32. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 2015 International Conference on Machine Learning (ICML), Lille, France, 6–11 July 2015.
33. He, K.; Zhang, X.Y.; Ren, S.Q.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
34. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the 2017 Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 3–9 December 2017.
35. Hussain, R.; Karbhari, Y.; Ijaz, M.F.; Wozniak, M.; Singh, P.K.; Sarkar, R. Revise-Net: Exploiting Reverse Attention Mechanism for Salient Object Detection. Remote Sens. 2021, 13, 4941.
36. Guo, Q.P.; Qiu, X.P.; Liu, P.F. Star-Transformer. In Proceedings of the 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), Minneapolis, MN, USA, 5–7 June 2019.
37. Rennie, J.S.; Marcheret, E.; Mroueh, Y.; Ross, J.; Goel, V. Self-Critical Sequence Training for Image Captioning. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 22–25 July 2017.
38. Sutton, R.S.; McAllester, D.; Singh, S.; Mansour, Y. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In Proceedings of the 1999 Neural Information Processing Systems (NIPS), Denver, CO, USA, 29 November–4 December 1999.
39. Zhou, N.; Lau, L.; Bai, R.; Moore, T. A Genetic Optimization Resampling Based Particle Filtering Algorithm for Indoor Target Tracking. Remote Sens. 2021, 13, 132.
40. Huang, J.; Xing, Y.; You, H.; Qin, L.; Tian, J.; Ma, J. Particle Swarm Optimization-Based Noise Filtering Algorithm for Photon Cloud Data in Forest Area. Remote Sens. 2019, 11, 980.
Figure 1. Illustration of mission planning in a large UAV cluster.
Figure 2. The model structure of DIRL-SAM.
Figure 3. Process of the encoding.
Figure 4. The construction of AM and SAM: (a) the interaction process of AM; (b) the interaction process of SAM.
Figure 5. The combat situation of UAV mission planning.
Figure 6. The comparison results of different algorithms: (a) GA for missions 1 and 2; (b) GA for missions 3 and 4; (c) GA for missions 5 and 6; (d) PSO for missions 1 and 2; (e) PSO for missions 3 and 4; (f) PSO for missions 5 and 6; (g) RL-SAM for missions 1 and 2; (h) RL-SAM for missions 3 and 4; (i) RL-SAM for missions 5 and 6; (j) DIRL-AM for missions 1 and 2; (k) DIRL-AM for missions 3 and 4; (l) DIRL-AM for missions 5 and 6; (m) DIRL-SAM for missions 1 and 2; (n) DIRL-SAM for missions 3 and 4; (o) DIRL-SAM for missions 5 and 6.
Figure 7. The training process of DIRL-SAM and RL-SAM: (a) training process of DIRL-SAM; (b) training process of RL-SAM.
Table 1. The common optimization algorithms for UAV mission planning.

Mathematical optimization
Concept: transform the problem into a mathematical programming model and, based on the constructed optimization model and objective function, apply specific mathematical methods (e.g., gradient descent, dynamic programming) to solve it.
Advantages: high solving precision; strong interpretability.
Limitations: poor generalization ability for complex optimization problems (e.g., nonconvex models, multiple complex objective functions); time-consuming for problems with a large number of variables.

Heuristic
Concept: design greedy optimization rules according to the problem's characteristics and the researchers' experience and knowledge, and iterate a predefined optimization process to obtain feasible solutions until specific convergence rules are met.
Advantages: not dependent on the initial conditions of the solution; robust to the solution domain of the problem (e.g., non-differentiability, discontinuity); easy to implement.
Limitations: time-consuming and limited performance for problems with a large number of variables.

Artificial intelligence
Concept: use a deep neural network to extract high-dimensional features of the problem data and output solutions with a designed decoding policy; unsupervised training methods are employed to optimize all model parameters.
Advantages: fast solving speed and good generalization ability for large-scale problems.
Limitations: extensive training process; limited robustness for complex problem models (e.g., mixed continuous and discrete variables, complex multiple constraint conditions).
Table 2. Main parameters of GA.
Parameter | Value
Population size | 100
Crossover probability | 0.9
Mutation probability | 0.1
Number of iterations | 3000
Table 3. Main parameters of PSO.
Parameter | Value
Population size | 100
Inertia weight | 0.8
Learning factor 1 | 1.5
Learning factor 2 | 1.5
Number of iterations | 3000
Table 4. Training data settings.
UAV training data | Range
att(U_i) | (0, 1)
ele(U_i) | (0, 1)
def(U_i) | {10, 20}
loc(U_i) | (0, 5)
N | 100
Mission training data | Range
Att(T_k) | (0, 3)
Ele(T_k) | (0, 3)
Def(T_k) | {10, 20}
Loc(T_k) | (0, 5)
En(T_k) | (0.5, 1)
W_k = [w_1^k, w_2^k, w_3^k] | (0, 1)
K | 10
Table 5. Mission settings.
Mission | Priority Rank | Att | Ele | Def | Loc | w1 | w2 | w3 | En
mission 1 | 1 | 1.6 | 2.4 | 20 | (1.0, 1.5) | 0.1 | 0.1 | 0.8 | 0.8
mission 2 | 2 | 2.7 | 2.7 | 20 | (4.0, 4.0) | 0.2 | 0.2 | 0.6 | 0.9
mission 3 | 3 | 1.4 | 1.4 | 10 | (4.0, 1.0) | 0.3 | 0.3 | 0.4 | 0.8
mission 4 | 4 | 2.8 | 2.8 | 10 | (2.5, 2.5) | 0.3 | 0.3 | 0.4 | 0.7
mission 5 | 5 | 1.8 | 2.7 | 10 | (1.0, 4.0) | 0.8 | 0.1 | 0.1 | 0.9
mission 6 | 6 | 2.7 | 1.8 | 10 | (3.0, 2.0) | 0.8 | 0.1 | 0.1 | 0.9
Table 6. Optimization results of different algorithms.
Algorithm | Mission 1 | Mission 2 | Mission 3 | Mission 4 | Mission 5 | Mission 6 | Total Value | Running Time
GA | 2.53 | 3.82 | 2.98 | 11.25 | 5.23 | 5.07 | 30.88 | 33.98 s
PSO | 2.99 | 3.84 | 3.28 | 11.62 | 5.36 | 5.12 | 32.21 | 45.32 s
RL-SAM | 10.54 | 8.46 | 4.95 | 13.52 | 5.78 | 7.37 | 50.62 | 2.10 s
DIRL-AM | 2.38 | 3.82 | 2.75 | 11.07 | 4.99 | 4.83 | 29.84 | 3.06 s
DIRL-SAM | 2.38 | 3.82 | 2.75 | 11.07 | 5.09 | 4.90 | 30.01 | 1.98 s
Table 7. Compared results of different methods with different numbers of UAVs.
Algorithm | 150 UAVs: Value of F | 150 UAVs: Running Time | 200 UAVs: Value of F | 200 UAVs: Running Time
GA | 6.51 | 4.43 s | 8.32 | 6.31 s
PSO | 6.67 | 6.38 s | 8.85 | 9.53 s
RL-SAM | 49.05 | 0.39 s | 67.43 | 0.48 s
DIRL-AM | 5.57 | 0.63 s | 4.93 | 0.94 s
DIRL-SAM | 5.57 | 0.41 s | 5.07 | 0.57 s
Table 8. Compared results of different methods (objective value/running time).
Algorithm | Test 1 | Test 2 | Test 3 | Test 4 | Test 5 | Test 6
Exhaustive method | 8.85/40.1 s | 7.70/84.51 s | 13.55/168.59 s | 4.83/341.24 s | 5.45/682.24 s | 6.34/1676.28 s
GA | 8.85/1.97 s | 7.70/1.62 s | 13.55/1.96 s | 4.83/2.01 s | 5.45/1.82 s | 6.34/1.71 s
PSO | 8.85/2.13 s | 7.70/2.18 s | 13.55/2.23 s | 4.83/2.28 s | 5.45/2.31 s | 6.34/2.53 s
DIRL-SAM | 9.08/0.32 s | 7.76/0.24 s | 13.58/0.35 s | 4.83/0.19 s | 5.46/0.28 s | 6.34/0.33 s
Random | 19.21/0 s | 18.32/0 s | 21.53/0 s | 18.15/0 s | 14.64/0 s | 17.66/0 s
Table 9. Mission settings in small-scale problems.
Test | Number of UAVs | Att | Ele | Loc | w1 | w2 | w3 | En
Test 1 | 20 | 3.4 | 4.1 | (4.55, 0.30) | 0.5 | 0.3 | 0.2 | 1
Test 2 | 21 | 2.5 | 4.8 | (3.80, 2.40) | 0.2 | 0.4 | 0.3 | 1
Test 3 | 22 | 4.0 | 5.3 | (3.85, 4.95) | 0.2 | 0.3 | 0.5 | 1
Test 4 | 23 | 2.4 | 2.2 | (1.40, 1.90) | 0.5 | 0.1 | 0.4 | 1
Test 5 | 24 | 3.1 | 2.5 | (2.75, 1.80) | 0.5 | 0.4 | 0.1 | 1
Test 6 | 25 | 1.6 | 5.4 | (3.50, 3.90) | 0.1 | 0.3 | 0.6 | 1
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
