Article

A Heuristic Construction Neural Network Method for the Time-Dependent Agile Earth Observation Satellite Scheduling Problem

1 School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
2 College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
Submission received: 26 August 2022 / Revised: 16 September 2022 / Accepted: 20 September 2022 / Published: 25 September 2022

Abstract

The agile earth observation satellite scheduling problem (AEOSSP) is a time-dependent and challenging combinatorial optimization problem that has been studied intensively over the past decades. Many studies have proposed non-iterative heuristic construction algorithms and iterative meta-heuristic algorithms to solve it. However, heuristic construction algorithms obtain solutions quickly at the expense of quality, while iterative meta-heuristic algorithms achieve high-quality solutions at a high computational cost. To overcome these shortcomings and to efficiently exploit historical scheduling information and task characteristics, this paper introduces a new neural network model based on deep reinforcement learning and a heuristic algorithm (DRL-HA) for the AEOSSP and proposes an innovative non-iterative heuristic algorithm. The DRL-HA is composed of a heuristic construction neural network (HCNN) model and a task arrangement algorithm (TAA), where the HCNN generates the task planning sequence and the TAA produces the final feasible scheduling order of tasks. The DRL-HA is compared with other heuristic algorithms in a series of experiments. The results demonstrate that the DRL-HA outperforms its competitors and that the HCNN possesses outstanding generalization ability across scenario sizes and task distributions. Furthermore, when used to generate initial solutions for meta-heuristic algorithms, the HCNN improves profits and accelerates their iterations. The DRL-HA is therefore verified to be an effective method for solving the AEOSSP, guaranteeing the high profit and high timeliness of agile satellite scheduling.

1. Introduction

The rapid development of space technology has greatly improved the performance of earth observation satellites, enabling them to carry more payloads and maneuver more capably. As a new generation of imaging satellite, an agile satellite with roll, pitch, and yaw capabilities possesses greater mobility than traditional satellites. Its observable range is a zonal area centered on the sub-satellite track, jointly determined by its roll and pitch angles. In Figure 1, the observable range of the satellite is the area bounded by the pitch boundaries at the front and rear and the roll boundaries at the left and right. Additionally, upgraded instruments mean that the time at which a target is imaged is no longer completely constrained by the satellite's over-the-top time, which extends the visible time window of each target: the starting observation time can be chosen within a longer interval. However, when switching to a new observation, the satellite requires some time, which we call the transition time in this paper, to maneuver from its previous position to the next one [1]. Any change in observation angles and positions between two consecutive tasks affects the starting time of the next task, rendering the problem highly time-dependent. Thus, greater mobility creates more flexible observation angles and time-dependent features, magnifying the difficulty of the AEOSSP.
Many studies have proposed non-iterative heuristic construction algorithms and iterative metaheuristic algorithms to solve the AEOSSP. When designing either type of heuristic algorithm, scholars follow principles such as the earliest start principle [2], the least possible conflict principle [3], and the maximum imaging quality principle [4]. With these principles, Lemaitre et al. [5] studied the Pleiades agile satellite launched by the French Space Agency and gave the first comprehensive description of the AEOSSP. Based on Lemaitre's research, numerous related algorithms have been proposed.
A heuristic construction algorithm has a fixed, systematic structure and does not maintain current status information during the solution process, so acceptable approximate solutions can be obtained within a short time. Many studies have developed heuristic construction algorithms by establishing distinct models [6,7,8] based on different theories and by introducing indicators derived from different task features [9]. However, because these fixed structures do not update historical scheduling information during the solution process, global information features are lost and the AEOSSP is not solved effectively. To address these defects, iterative metaheuristic algorithms, which evaluate local solutions and update historical scheduling information, have been proposed; typical examples are genetic and simulated annealing algorithms, which possess a degree of randomness and generality. Because genetic algorithms are highly adaptable when combined with other models and algorithms, most such methods belong to the class of improved genetic algorithms [10,11,12,13], raising the quality of the solutions. Furthermore, while some algorithms are designed for single-objective AEOSSPs, metaheuristic algorithms can also be utilized for multi-objective optimization [12,14,15]. These iterative metaheuristic algorithms produce outstanding solutions, but their time and space complexity makes them inefficient.
In conclusion, both types of algorithms have inevitable weaknesses: heuristic construction algorithms take relatively little time at the expense of solution quality, while iterative metaheuristic algorithms reach high-quality solutions at a high computational cost. Traditional heuristic algorithms usually calculate the priority of each task according to the task characteristics and then schedule the tasks based on the acquired priorities. The heuristic functions usually adopt the static characteristics of the tasks: once the observation target is determined, constant parameters obtained through simulation, such as the profit of the task, the start time of the visual time window (VTW), and the duration of the task, are treated as static characteristics and remain fixed throughout the scheduling process. The flow of this type of algorithm is shown in Figure 2. Firstly, the scheduling scenario is initialized; then, the priority sequence of the tasks to be observed is obtained through the heuristic function.
Generally, only the static characteristics of the tasks are taken into consideration in a traditional heuristic algorithm. Since its solution process does not produce current state information, no historical scheduling information, which is important for improving the effectiveness of the solution, is available. In addition, most of the global information characteristics are lost.
Building on research into artificial intelligence, new neural network models provide new ideas for solving the AEOSSP. The AEOSSP aims to rationally allocate limited resources to observation tasks so as to satisfy users' diversified demands and maximize schedule profits, which makes it a classical combinatorial optimization problem. Many studies [16,17,18,19,20] have verified the effectiveness of neural network models for solving scheduling problems. Among them, the combination of deep reinforcement learning and a heuristic algorithm (DRL-HA) [21] was gradually developed over a series of works. Vinyals et al. [22] proposed the pointer network (PN), which applies the sequence-to-sequence (Seq2Seq) structure with two recurrent neural networks (RNNs) for encoding and decoding. The PN applies an attention mechanism that reaches a one-step decision by analyzing the outputs of the encoder and decoder. Bello et al. [23] extended the attention mechanism and used the policy gradient method and the actor-critic (AC) algorithm to train the network model. Nazari et al. [24] simplified the encoding process of the pointer network by abandoning the RNN structure and adopted different attention mechanisms for the decoder, greatly improving efficiency over previous research. However, this model has one inevitable weakness: a vehicle's dynamic parameter keeps an identical value across individual tasks, biasing the solution. Kool et al. [21] addressed this defect and proposed an innovative attention model (AM) by introducing the Transformer, a natural language processing model. Unlike previous research that abandoned the RNN entirely, this study took the Transformer encoder's multi-head attention (MHA) as the model's encoder. In this way, the embedding layer gains the individual task characteristic information and feeds it into the MHA layer to obtain global variables of the current task set.
Furthermore, this model lets the outputs of the encoder and decoder interact to obtain the matching degree of the tasks, which is beneficial for selecting a suitable task. According to these studies, a neural model based on the DRL-HA can be utilized to solve the TSP, the VRP, the knapsack problem, and other NP-hard problems. Therefore, based on these studies, this paper introduces the DRL-HA model to the AEOSSP and investigates its validity.
Because the AEOSSP is an NP-hard problem, the search difficulty and solution time of traditional heuristic algorithms increase sharply with the problem scale. Thus, traditional scheduling algorithms cannot satisfy the high-profit and high-timeliness requirements of agile satellite scheduling, such as the ability to adjust rapidly to emergency tasks. Meanwhile, in large-scale scheduling scenarios, the solution time of traditional intelligent optimization algorithms is unacceptably long. Additionally, in a specific scenario, users' observation demands exhibit a certain regularity, which existing algorithms do not take into consideration. With the continuous application of the DRL-HA to combinatorial optimization problems, its advantages of high efficiency and effective utilization of historical data have gradually been discovered. Therefore, research on agile satellite scheduling based on the DRL-HA is consequential for improving the emergency response ability, rapid response ability, and efficiency of satellite-earth observation systems, and potentially possesses both theoretical and practical value. With this innovative algorithm, when agile satellites carry out tasks such as environmental surveillance, disaster relief, and intelligence reconnaissance, the high profit and high timeliness of agile satellite tasks can be guaranteed.
From the analysis above, traditional algorithms cannot satisfy time and quality requirements simultaneously. When they are applied, large-scale task scheduling scenarios usually require an unacceptable amount of time. Furthermore, neither the characteristic information of the task set nor historical scheduling scenarios are comprehensively taken into consideration. When switching to a new scheduling scenario, these algorithms must search and iterate again, which wastes a large amount of time. Therefore, to rectify these disadvantages, the following problems should be solved:
(1) How to comprehensively deploy dynamic task information and scheduling scenario information;
(2) How to establish a DRL-HA model aimed at solving the AEOSSP;
(3) How to handle convoluted constraints while establishing an efficient model.
This study utilizes the DRL-HA model to comprehensively deploy task characteristic information and historical scheduling information, solving the AEOSSP with high-quality solutions in relatively little time.
The remainder of this paper is organized as follows. In Section 2, we put forward a mathematical description of the AEOSSP. In Section 3, a heuristic algorithm based on DRL-HA for AEOSSP is introduced in detail. We set up experiments to verify the effectiveness of the proposed algorithm in Section 4, and the experimental results are displayed in Section 5. The conclusions are drawn in Section 6.

2. Problem Analysis and Modeling

2.1. Problem Description and Assumptions

Agile satellites possess prominent earth observation capabilities, but observation resources remain relatively scarce compared to the growing observation requirements. The agile earth observation satellite scheduling problem (AEOSSP) is a comprehensive planning and scheduling problem whose purpose is to develop an observation plan with the maximum observation profit for a satellite under the current set of tasks to be observed. To study the ability of the DRL-HA model to solve the AEOSSP, the following assumptions are made when establishing the corresponding mathematical model:
(1) Complex tasks, such as stereo imaging tasks and regional target tasks, are decomposed into point target tasks; hence, all tasks considered in this paper are point target tasks, each of which can be completed within a visible time window (VTW). Since the AEOSSP is a time-dependent problem, the VTW of each point target is significant for analysis and modeling.
(2) This paper considers the single-orbit AEOSSP and neglects the charging process in the sunlit area and the data download process. These processes are not the conditions of concern in the present research, and modeling them would bring unnecessary cost.

2.2. Fundamental Objective Function

$x_i$: the scheduled status of $task_i$. When $x_i$ equals 1, the task is selected to be completed; otherwise, it is not scheduled because it violates a constraint. $task_i$: the $i$-th task in the task set to be scheduled.

$$x_i = \begin{cases} 1, & task_i \text{ scheduled} \\ 0, & \text{otherwise} \end{cases}$$

The fundamental objective of the AEOSSP is to maximize the total profit obtained from the task requirements. The profit is measured by the sum of the profits of the individual observation tasks, expressed as:

$$\max P = \max \sum_{i \in N} p_i x_i$$

  • $P$: the total profit of all satellite observation tasks.
  • $p_i$: the profit of $task_i$.

2.3. Constraints

The observation time of each task should be within the VTW of the task:

$$t_i^s - tw_i^s \geq 0$$

$$tw_i^e - t_i^e \geq 0$$

  • $tw_i$: the VTW of $task_i$.
  • $tw_i^s$: the start time of $tw_i$.
  • $tw_i^e$: the end time of $tw_i$.
  • $t_i^s$: the actual start time of $task_i$.
  • $t_i^e$: the actual end time of $task_i$.

The actual observation of each task should meet the required observation duration:

$$t_i^e - (t_i^s + l_i) \geq 0$$

  • $l_i$: the duration required for $task_i$ to be observed.

The time interval between two adjacent observation tasks $task_i$ and $task_j$ in the priority sequence must be at least the transition time $trans_{ij}$:

$$t_j^s - (t_i^e + trans_{ij}) \geq 0$$

  • $trans_{ij}$: the transition time from $task_i$ to $task_j$, where $task_j$ is the task following $task_i$.

The total storage occupied by the observation tasks should be within $M$:

$$M - \sum_{i \in N} m_i \geq 0$$

  • $M$: the total storage of the satellite.
  • $m_i$: the storage required for a successful observation of $task_i$.

The total power consumed by the observation tasks should be within $E$:

$$E - \sum_{i \in N} e_i \geq 0$$

  • $E$: the power of the satellite.
  • $e_i$: the power required for a successful observation of $task_i$.
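As a concrete illustration, the objective and constraints above can be expressed as simple Python predicates. This is a sketch, not the authors' implementation; the `Task` record and its field names are hypothetical.

```python
# Illustrative sketch of the Section 2.2-2.3 model; field names are assumptions.
from dataclasses import dataclass

@dataclass
class Task:
    tw_s: float   # VTW start time
    tw_e: float   # VTW end time
    t_s: float    # actual observation start time
    t_e: float    # actual observation end time
    l: float      # required observation duration
    p: float      # profit
    m: float      # storage required
    e: float      # power required

def within_vtw(t: Task) -> bool:
    # t_i^s - tw_i^s >= 0 and tw_i^e - t_i^e >= 0
    return t.t_s - t.tw_s >= 0 and t.tw_e - t.t_e >= 0

def duration_met(t: Task) -> bool:
    # t_i^e - (t_i^s + l_i) >= 0
    return t.t_e - (t.t_s + t.l) >= 0

def transition_ok(prev: Task, nxt: Task, trans: float) -> bool:
    # t_j^s - (t_i^e + trans_ij) >= 0
    return nxt.t_s - (prev.t_e + trans) >= 0

def resources_ok(tasks, M: float, E: float) -> bool:
    # storage and power budgets
    return M - sum(t.m for t in tasks) >= 0 and E - sum(t.e for t in tasks) >= 0

def total_profit(tasks) -> float:
    # objective of Section 2.2: sum of profits of the scheduled tasks
    return sum(t.p for t in tasks)
```

A scheduling plan is feasible exactly when every scheduled task passes all of these predicates.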

2.4. Attitude Transition Angles and Time

For agile satellites, the transition time is calculated from the attitude angles. In a real scene, the agile satellite attitude transition is a multi-stage process: slow acceleration, constant speed, and finally deceleration. The required stabilization time is counted after the attitude control action is completed, and the transition time is usually calculated by the method of attitude angle synthesis. This paper utilizes a piecewise linear function to approximate the time-dependent attitude transition time: when the total attitude angle change is small, the attitude maneuver time holds a fixed value; once the change exceeds a certain threshold, a linear function of the change is used to calculate the maneuver time.
$$trans_{i,j} = \begin{cases} \lambda_0, & \Delta\theta_{i,j} \leq \theta_1 \\ \Delta\theta_{i,j}/v_1 + \lambda_1, & \theta_1 < \Delta\theta_{i,j} \leq \theta_2 \\ \Delta\theta_{i,j}/v_2 + \lambda_2, & \theta_2 < \Delta\theta_{i,j} \leq \theta_3 \\ \Delta\theta_{i,j}/v_3 + \lambda_3, & \theta_3 < \Delta\theta_{i,j} \leq \theta_4 \\ \Delta\theta_{i,j}/v_4 + \lambda_4, & \theta_4 < \Delta\theta_{i,j} \end{cases}$$

$$tw_i^s, tw_i^e, p_i, l_i, t_i^s, t_i^e, E, e_i, trans_{i,j} \geq 0$$

$$\Delta\theta_{i,j} = |\theta_j^{roll\_s} - \theta_i^{roll\_e}| + |\theta_j^{pitch\_s} - \theta_i^{pitch\_e}|$$

  • $\lambda_0, \lambda_1, \lambda_2, \lambda_3, \lambda_4$: the required stabilization times for the satellite attitude transition at the different stages.
  • $\Delta\theta_{i,j}$: the total attitude angle change from $task_i$ to $task_j$, used as the threshold variable for calculating the attitude transition time.
  • $\theta_1, \theta_2, \theta_3, \theta_4$: the thresholds of the attitude transition angles at the different stages.
  • $v_1, v_2, v_3, v_4$: the attitude transition speeds at the different stages.
  • $\theta_i^{roll\_s}$: the start roll angle.
  • $\theta_i^{roll\_e}$: the end roll angle.
  • $\theta_i^{pitch\_s}$: the start pitch angle.
  • $\theta_i^{pitch\_e}$: the end pitch angle.
According to the research about GRILS [25], Table 1 displays these constants for attitude transition and task parameters.
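The piecewise transition-time function can be sketched in Python as follows. The thresholds, speeds, and stabilization times below are illustrative placeholders, not the values of Table 1, and the angle-change computation assumes the transition runs from the end attitude of task i to the start attitude of task j.

```python
# Sketch of the piecewise-linear attitude transition time of Section 2.4.
# All numeric constants are placeholders, not the Table 1 values.
THETAS  = [10.0, 30.0, 60.0, 90.0]       # theta_1..theta_4 (deg)
SPEEDS  = [1.5, 2.0, 2.5, 3.0]           # v_1..v_4 (deg/s)
LAMBDAS = [5.0, 8.0, 12.0, 16.0, 22.0]   # lambda_0..lambda_4 (s)

def transition_time(dtheta: float) -> float:
    """Transition time trans_{i,j} for total attitude change dtheta."""
    if dtheta <= THETAS[0]:
        return LAMBDAS[0]                # small maneuver: fixed time
    for k in range(1, 4):
        if dtheta <= THETAS[k]:
            # stage k: linear in the angle change, plus stabilization time
            return dtheta / SPEEDS[k - 1] + LAMBDAS[k]
    return dtheta / SPEEDS[3] + LAMBDAS[4]

def delta_theta(roll_e_i, pitch_e_i, roll_s_j, pitch_s_j):
    """Total attitude angle change from the end of task i to the start of task j."""
    return abs(roll_s_j - roll_e_i) + abs(pitch_s_j - pitch_e_i)
```

With these placeholder constants, a 5-degree maneuver falls in the fixed-time stage, while larger maneuvers pick up a speed-dependent linear term.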

3. Heuristic Scheduling Algorithm for Deep Reinforcement Learning

A traditional heuristic algorithm usually consists of two parts: calculating the priority of each task, and scheduling the tasks according to those priorities. Based on the characteristics of the input tasks, this paper proposes an innovative heuristic algorithm based on a deep reinforcement learning model (DRL-HA).

3.1. DRL-HA

The heuristic scheduling algorithm based on the DRL-HA utilizes a neural network to parameterize the task priority. The framework of the algorithm is shown in Figure 3. Starting from the current scheduling scenario, we firstly initialize the decision point, sort the tasks in the task set through the trained heuristic construction neural network (HCNN) model, and then select the task with the highest priority.

3.1.1. HCNN Model

The HCNN is based on the sequence-to-sequence (Seq2Seq) structure, which includes an encoder, decoder, and global function.
(1) Encoder: the encoder converts the static feature parameters of the to-be-scheduled tasks into a high-dimensional eigenvector by one-dimensional convolution, so that the network can exhaustively exploit the parameter information of each task to make a reasonable decision. Because each imaging satellite task is independent, the output of the neural network should be irrelevant to the order of the task data; a simple embedding layer solves this problem and additionally reduces the time and space complexity of the encoder. In this study, the features of a task are composed of the static feature s and the dynamic feature d. The static features are attribute features, including the task observation time window $tw_i$, profit $p_i$, duration $l_i$, attitude angle parameters $\theta_i^{roll\_s}$, $\theta_i^{roll\_e}$, $\theta_i^{pitch\_s}$, $\theta_i^{pitch\_e}$, and so on. The dynamic feature is the to-be-scheduled status, with value 0 or 1: 0 means available, while 1 means unavailable.
(2) Global variables: since the eigenvector of each individual task is computed independently in the encoder, it cannot indicate the interaction features between tasks. Therefore, this study applies an improved multi-head attention for extracting such related features. Figure 4 displays the structure of this neural network. By inputting the eigenvector set of the task set, we obtain the variable G, which contains the historical scheduling information between tasks.
(3) Decoder: unlike the encoder, each decoding step points to an input task. Because selecting the current task depends on previous historical decision information, the HCNN model accomplishes the decoding process through this information, which requires a neural network module with a memory function. The RNN is a natural choice for this, and in this algorithm the gated recurrent unit (GRU) structure [26] is selected to extract the historical decision information.
(4) Attention mechanism: the fundamental goal of the attention mechanism is to select, from a large amount of information, the information most compatible with the current target and with the highest degree of attention. Based on the encoder-decoder structure, this mechanism uses the neural network model to complete the feature interaction between the outputs of the encoder and decoder, yielding the relevance distribution of the next task. At the softmax layer, this distribution is converted into the probability distribution of task selection, which greatly facilitates selecting the next task.
The overall structure of the HCNN model is shown in Figure 5. The blue cells are the structural parts of this model; the dynamic process, where dynamic parameters are updated during the process, is indicated by the orange arrows and cells; there is one black arrow from the probability function to the embedding layer, representing the task which should be concealed; the grey arrows and cells demonstrate that all parameters are constant.
At the encoder layer, the static parameter eigenvector H s and the dynamic parameter eigenvector H d are obtained by embedding. At the decoder layer, the historical decision information h t is obtained by GRU.
In the HCNN model, this study designs two attention layers for selecting tasks. In the first layer, the total feature parameter M is obtained from the connection layer in formula (12). Furthermore, we utilize the Glimpse network structure [23] to complete the feature interaction and attention between $h_t$ and M. By initially filtering the current task set, the weighted context variable $c_t$ of the current scene's task set is obtained. It should be mentioned that the DRL-HA uses a masking mechanism [19] to conceal tasks that have already been ordered, ensuring that no task is selected twice. In the figure, the single black arrow reflects that task s1 has been scheduled and should be concealed.
$$M = w_1 [H_s, H_d]$$

$$c_t = \mathrm{Glimpse}(M, h_t)$$

  • $[H_s, H_d]$, $[g_s^i, t_s^i, c_t]$: indicating that the variables are concatenated.
  • $w_1$, $v_1^i$: the to-be-trained network parameters.

The second attention layer performs feature interaction and attention over the global variable $G_s$, the total feature parameter M, and the context variable $c_t$ of the task set to obtain the matching degree of each task. Lastly, all variables are normalized by the softmax layer, which outputs the selection probability of each to-be-selected task under the current decision situation.

$$\tilde{u}_t^i = v_1^i \tanh(w_2 [g_s^i, t_s^i, c_t])$$

$$P(y_{t+1} \mid Y_t, X_t) = \mathrm{softmax}(\tilde{u}_t^i)$$
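The two attention layers of the decoder step can be sketched numerically as follows. This is a NumPy illustration with random values, not the authors' PyTorch code; the dimensions, weight shapes, and the exact composition of the concatenated vectors are assumptions.

```python
# Minimal numerical sketch (NumPy) of the HCNN decoder's two attention layers.
import numpy as np

rng = np.random.default_rng(0)
n_tasks, d_s, d_d, d_h = 5, 8, 2, 16

H_s = rng.normal(size=(n_tasks, d_s))   # static feature embeddings
H_d = rng.normal(size=(n_tasks, d_d))   # dynamic feature embeddings
h_t = rng.normal(size=d_h)              # GRU historical decision state
G   = rng.normal(size=(n_tasks, d_h))   # global variables from the MHA layer

w1 = rng.normal(size=(d_s + d_d, d_h))  # to-be-trained parameters (shapes assumed)
w2 = rng.normal(size=(3 * d_h, d_h))
v1 = rng.normal(size=d_h)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# First attention layer: glimpse over M = w1[H_s, H_d] conditioned on h_t,
# producing the weighted context c_t of the current task set.
M = np.concatenate([H_s, H_d], axis=1) @ w1            # (n_tasks, d_h)
a = softmax(M @ h_t)                                   # attention weights
c_t = a @ M                                            # context vector

# Second attention layer: matching degrees u~_t^i, then selection probabilities.
joint = np.concatenate([G, M, np.tile(c_t, (n_tasks, 1))], axis=1)
u = np.tanh(joint @ w2) @ v1                           # (n_tasks,)

# Masking mechanism: an already-scheduled task gets probability zero.
mask = np.array([False, True, False, False, False])    # task 1 already scheduled
u = np.where(mask, -np.inf, u)
probs = softmax(u)                                     # P(y_{t+1} | Y_t, X_t)
```

Setting a masked task's score to negative infinity before the softmax is the standard way to guarantee it can never be selected again.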
To guarantee the high-quality solution and the generality of the HCNN model, this paper employs random and greedy strategies for training and testing.

3.1.2. Task Arrangement Algorithm

The task arrangement algorithm (TAA, Algorithm 1) is used to arrange the specific execution time of observation tasks after the task sequence is formed by the HCNN model. In order to decrease the training cost of DRL-HA, this paper adopts the following TAA for model evaluation and training. When the task is selected to be executed, the earliest VTW start time and end time are determined for the task, and this part of the selected time window resources is updated so that they cannot be used by other tasks. TAA mainly includes checking constraint conditions, selecting the task execution location, and updating the time window. The pseudocode of TAA is the following.
Algorithm 1 TAA
Input: Task priority list $L = \{task_i \mid i = 1, 2, ..., l\}$; the available time windows W
Output: Task execution list L′
 1: Initialize task execution list L′
 2: for task in L do
 3:   for time window in W do
 4:     if task satisfies all constraints then
 5:       L′ ← task // select the most-ahead position for the task by the earliest VTW start time
 6:     end if
 7:     Update VTW
 8:   end for
 9: end for
10: return L′
Constraint checking, which is used for determining whether the task can be executed, is the first part of TAA. The tasks will be handled in the observation selection process. Some tasks will be executed, while others will be deleted. In the task scheduling process shown in Figure 6, the task currently being scheduled has a task observable time window, and its length covers the task’s observation duration, so the task is capable of being executed.
In the task arrangement shown in Figure 7, there are two observable time windows for the task currently being arranged. However, neither of these two time windows can cover the whole task duration, indicating that the task is infeasible and should be deleted from the to-be-scheduled task set.
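The TAA can be sketched in Python as follows. This is an illustration, not the authors' code: it assumes each task has a single VTW, that tasks scheduled in priority order are also executed in time order, and it checks only the VTW, duration, and transition-time constraints of Section 2.3.

```python
# Illustrative sketch of Algorithm 1 (TAA) under simplifying assumptions.
def taa(priority_list, trans_time):
    """priority_list: tasks as dicts with 'tw_s', 'tw_e', 'l' keys,
    ordered by HCNN priority. trans_time(a, b) -> transition seconds.
    Returns the execution list with assigned (t_s, t_e) times."""
    schedule = []                       # task execution list L'
    for task in priority_list:
        # earliest feasible start: the VTW start, or the end of the
        # previous task plus the attitude transition time
        earliest = task["tw_s"]
        if schedule:
            prev = schedule[-1]
            earliest = max(earliest, prev["t_e"] + trans_time(prev, task))
        # constraint check: the required duration must fit inside the VTW
        if earliest + task["l"] <= task["tw_e"]:
            task = dict(task, t_s=earliest, t_e=earliest + task["l"])
            schedule.append(task)       # select the most-ahead position
        # else: the task is infeasible and is discarded
    return schedule
```

Each accepted task consumes window resources (via the transition-time chaining), so later tasks see a later earliest feasible start, mirroring the "update VTW" step of the pseudocode.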

3.2. Training Methods

The HCNN model is a decision model, and we apply a policy gradient method and the actor-critic (AC) algorithm (Algorithm 2) [23] to train the parameters of this decision model.
Algorithm 2 Actor-critic algorithm
 1: Initialize actor network parameters $\theta$
 2: Initialize critic network parameters $\theta_c$
 3: Initialize the priority list L
 4: for $iteration = 1, 2, ...$ do
 5:   reset gradients: $d\theta \leftarrow 0$, $d\theta_c \leftarrow 0$
 6:   generate N problem instances from $\phi$
 7:   for $k = 1, 2, ..., N$ do
 8:     step counter $t \leftarrow 0$
 9:     while not terminated do
10:       select the next variable $x_{t+1}^k$ according to the probability distribution $P(y_{t+1}^k \mid Y_t^k, X_t^k)$
11:       update the state $X_t^k$ to $X_{t+1}^k$
12:       add $y_{t+1}$ to L
13:       $t \leftarrow t + 1$
14:     end while
15:     calculate the reward $R_k$ by applying the algorithm TAA to L
16:   end for
17:   $d\theta \leftarrow \frac{1}{N} \sum_{k=1}^{N} (R_k - V(X_0^k; \theta_c)) \nabla_\theta \log P(Y^k \mid X_0^k)$
18:   $d\theta_c \leftarrow \frac{1}{N} \sum_{k=1}^{N} \nabla_{\theta_c} (R_k - V(X_0^k; \theta_c))^2$
19:   update $\theta$ using $d\theta$ and $\theta_c$ using $d\theta_c$
20: end for
The objective of reinforcement learning is to maximize the reward from the environment. However, the output of the HCNN model is a task priority sequence, not an explicit reward. To evaluate the model and obtain a reward appropriately, the profit computed by the TAA applied after the HCNN model is selected as the reinforcement reward function value.
During the training, the parameters θ of the HCNN model are updated in each iteration for approximating the optimal parameters θ * as closely as possible, which could render the reinforcement reward function value to become the optimal R * .
In each training iteration, for each instance, a task scheduling sequence is generated with the current HCNN network parameters. Then, the profit of this scheduling sequence is calculated by the TAA to obtain the reward value that reinforcement learning needs. The calculation in step 17 updates the actor network, where $V(X_0^k; \theta_c)$ is the output of the critic with the current parameters $\theta_c$. After observing the difference between the estimated and the true value, the critic network is updated in step 18.
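The quantities in steps 17 and 18 can be illustrated numerically. In the sketch below (NumPy, with made-up values, not the authors' PyTorch code), the advantage $(R_k - V(X_0^k; \theta_c))$ weights the log-probability of each sampled sequence for the actor, and its square forms the critic's loss; an autograd framework would then differentiate these losses to produce $d\theta$ and $d\theta_c$.

```python
# Numerical sketch of the actor/critic loss terms of Algorithm 2.
# All values below are illustrative, not experimental data.
import numpy as np

R = np.array([12.0, 9.0, 15.0])        # TAA rewards of N=3 sampled sequences
V = np.array([11.0, 10.0, 13.0])       # critic baselines V(X_0^k; theta_c)
log_p = np.array([-4.2, -3.8, -5.1])   # log P(Y^k | X_0^k) of the sequences

advantage = R - V                          # (R_k - V(X_0^k; theta_c))
actor_loss = -np.mean(advantage * log_p)   # differentiating this gives step 17
critic_loss = np.mean(advantage ** 2)      # differentiating this gives step 18
```

Subtracting the critic's baseline from the raw reward reduces the variance of the policy gradient without biasing it, which is the reason for the actor-critic structure.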

4. Experiment Study

4.1. Experiment Setup

4.1.1. Dataset

The AEOSSP is an arduous combinatorial optimization and complex engineering problem. The observation time window of each task, and the acquisition of various parameters such as the pitch value at each moment within the time window, require complex dynamic equation calculations. Considering the lack of a benchmark for the AEOSSP and the requirements of the model training dataset, we designed simulation scenarios and generated a dataset based on Liu's study [1]. Specifically, we applied the regional observation targets used in [1] as our tasks to be scheduled, and all targets were spread according to a uniform random distribution. On the simulation day, we set eight orbits for the satellite and wrote O-i to denote the ith orbit. Scenes in which the targets can be observed on the first orbit (O-1) were used for training; a total of 200,000 such scenes were generated. The validation dataset consisted of 1000 independently generated scenes.
The maximum pitch, roll, and yaw angles of the satellite were 45°, 45°, and 90°, respectively. The scheduling time range was from 00:00:00 to 24:00:00 on a certain day, a total of 24 h. The six orbital parameters of the satellite were the semimajor axis (a), eccentricity (e), inclination (i), argument of perigee (ω), right ascension of the ascending node (Ω), and mean anomaly (m). The satellite orbit parameter settings are shown in Table 2.

4.1.2. Competitors

To verify the effectiveness of the algorithm proposed in this paper, we designed the following heuristic algorithms: Random, the earliest visual time window (VTW) start time first (WF), the profits first (PF), the profits/duration first (PDF), and the profits-window first (PWF) algorithm for comparisons. The details are as follows:
(1) Random: Randomly sort all to-be-scheduled tasks, insert them into the scheduling plan in order, and check whether the constraints are satisfied, until the scheduling plan cannot accommodate any further task.
(2) WF: Sort all to-be-scheduled tasks from the earliest to the latest start time of the task's VTW, insert them into the scheduling plan in order, and check whether the constraints are satisfied, until the scheduling plan cannot accommodate any further task.
(3) PF: Sort all to-be-scheduled tasks from high profit to low profit, insert them into the scheduling plan in order, and check whether the constraints are satisfied, until the scheduling plan cannot accommodate any further task.
(4) PDF: Before sorting, divide the profit of each to-be-scheduled task by its observation duration to obtain its profit per unit observation time. Sort the tasks from high to low unit-time profit, insert them into the scheduling plan in order, and check whether the constraints are satisfied, until the scheduling plan cannot accommodate any further task.
(5) PWF: Sort all to-be-scheduled tasks from high profit to low profit and, when tasks have the same profit, from the earliest to the latest start time of the task's VTW. Insert them into the scheduling plan in order and check whether the constraints are satisfied, until the scheduling plan cannot accommodate any further task.
In addition, to verify the practicability of the heuristic construction neural network (HCNN) model as an initial solution provider for meta-heuristic algorithms, we tested three representative iterative search algorithms: the tabu search (TS) algorithm [27,28,29,30,31,32], the simulated annealing (SA) algorithm [33,34], and the greedy randomized iterated local search (GRILS) [25]. SA and TS are classical algorithms for solving combinatorial optimization problems but were not designed specifically for the AEOSSP. GRILS is a state-of-the-art algorithm for the AEOSSP, since it introduces a neighborhood structure and an INSERT operator tailored to the characteristics of the problem. The TS, SA, and GRILS variants seeded with the trained HCNN model's initial solution are named TS-HCNN, SA-HCNN, and GRILS-HCNN, respectively. Similarly, versions initialized with the other heuristic algorithms mentioned above serve as "competitors": the algorithms compared against the method proposed in this study. Since the other three heuristics are neither prevailing nor classical, WF and PF are selected as the representative initializers; for example, TS with the WF output as the initial solution is denoted TS-W, and SA with the PF output as the initial solution is denoted SA-P. In TS and SA, new solutions are generated by a neighborhood operation that reverses two random tasks in the current solution. Table 3 displays all competitors.
There are 12 algorithms in the comparison experiment, including HCNN, WF, PF, GRILS, TS-HCNN, TS-W, TS-P, SA-HCNN, SA-W, SA-P, G-W, and G-P, and the maximum running time is set to 10 min. In the TS algorithm, the neighborhood size equals four times the scenario scale, and revisiting a local optimum is forbidden for 5 iterations; the initial temperature of SA is 4000, and the temperature decay rate is 0.999. The SA algorithm stops when the temperature falls below 0.001.
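A minimal sketch of this SA setup, using the stated parameters (initial temperature 4000, decay 0.999, stop below 0.001) and the two-task reversal neighborhood. The `profit` evaluator is a hypothetical placeholder for scoring a task sequence (in the paper it would invoke the TAA):

```python
import math
import random

def reverse_two(seq):
    # Neighborhood move shared by TS and SA here: swap two random tasks.
    s = list(seq)
    i, j = random.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return s

def simulated_annealing(initial, profit, t0=4000.0, decay=0.999, t_min=0.001):
    current, best = initial, initial
    temp = t0
    while temp > t_min:
        cand = reverse_two(current)
        delta = profit(cand) - profit(current)
        # Accept improvements always; accept worse moves with Boltzmann probability.
        if delta >= 0 or random.random() < math.exp(delta / temp):
            current = cand
            if profit(current) > profit(best):
                best = current
        temp *= decay  # geometric cooling schedule
    return best
```

With these parameters the loop runs for roughly ln(4000/0.001)/|ln(0.999)| ≈ 15,000 iterations.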

4.2. Algorithm Parameters

HCNN employs the AC algorithm to train the model. The parameter settings of the actor network in the AC algorithm are shown in Table 4, and those of the critic network in Table 5. A ReLU activation is applied between adjacent feed-forward layers.
We used the Adam optimizer [35] to train the actor and critic networks. The initial learning rate η was 0.0005, and the number of training epochs was 10. Owing to the different scales of the training data and to GPU utilization, a corresponding training batch size was used for each scene scale. All experiments in this paper used a single RTX 2080-Ti GPU, an i9-9900K CPU, and 64 GB of memory. The implementation language was Python 3.7 (Python Software Foundation: Wilmington, DE, USA), and the deep learning framework was PyTorch 1.2.0 (Linux Foundation: San Francisco, CA, USA). Both are open-source and free.
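For reference, Adam maintains bias-corrected first- and second-moment estimates of the gradient. A minimal scalar sketch of the update rule with the paper's learning rate η = 0.0005 (the quadratic objective at the end is purely illustrative; the paper uses PyTorch's built-in optimizer):

```python
import math

def adam_step(theta, grad, state, lr=0.0005, b1=0.9, b2=0.999, eps=1e-8):
    # One Adam update for a single scalar parameter.
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad          # first moment
    state["v"] = b2 * state["v"] + (1 - b2) * grad * grad   # second moment
    m_hat = state["m"] / (1 - b1 ** state["t"])             # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)

# Illustration: minimize f(theta) = theta**2 from theta = 1.0.
state = {"t": 0, "m": 0.0, "v": 0.0}
theta = 1.0
for _ in range(2000):
    grad = 2 * theta  # gradient of f
    theta = adam_step(theta, grad, state)
```

Because the effective step size is roughly η per iteration while the gradient sign is stable, 2000 steps at η = 0.0005 bring the parameter close to the optimum.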

5. Experiment Results

In this section, we design three kinds of experiments to verify the performance of the deep reinforcement learning and heuristic algorithm (DRL-HA): (1) utilizing DRL-HA as a heuristic solving algorithm; (2) testing the algorithm's sensitivity to the data distribution; (3) utilizing the HCNN as the initial solution of a meta-heuristic algorithm. The evaluation uses the following indicators:
(1)
The average total task profit (P). When an algorithm obtains better solutions over a large number of scenarios, both the quality of its solutions and its stability are verified.
(2)
The total running time of the algorithm (T). The running time is the time required to obtain the scheduling sequence. It reflects the timeliness of the algorithm on the agile earth observation satellite scheduling problem (AEOSSP). In addition, as the problem scale increases, the time complexities of the algorithms diverge, and the applicable scenarios of each algorithm change accordingly. Note that the time reported for the DRL-HA algorithm refers to the time the trained network takes on the test set. In the following results, T is reported in seconds (s).
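Assuming an algorithm is a callable that returns the scheduled task list for an instance (a simplification of the paper's setup), the two indicators can be computed as in this sketch:

```python
import time

def evaluate(algorithm, instances):
    # Report the two indicators used in the experiments:
    #   P - average total profit over all instances,
    #   T - total running time in seconds over all instances.
    start = time.perf_counter()
    profits = [sum(task.profit for task in algorithm(inst)) for inst in instances]
    elapsed = time.perf_counter() - start
    return sum(profits) / len(profits), elapsed
```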

5.1. Results of DRL-HA

To validate the generalization capabilities of the DRL-HA, we trained HCNN models for DRL-HA at sizes 25, 50, 75, 100, 125, and 150. These model sizes are representative, and tests with them can comprehensively reflect the generalization capabilities. For the test data, we generated 100 instances with 25, 50, 75, 100, 125, 150, 175, and 200 tasks, respectively, according to the O-1 dataset, the same as the training data.
The average total task profits and total running times of DRL-HA and its competitors are shown in Table 6 and Table 7. In these tables, H-25 denotes the HCNN model trained at scale 25, and the HCNN models trained at the other scales are denoted in the same way.
From the results, it can be seen that the DRL-HA algorithm obtains the best value in all eight scenario scales, and the best HCNN model is 9.15% higher than the best competitor. The average results over 100 scenarios also fully reflect the stability and superiority of the DRL-HA in agile satellite scheduling scenarios. Figure 8 displays the average profits and total time of the HCNN at different sizes. The HCNN models trained on different scales perform similarly when tested on scenarios of the same scale, indicating that there is no essential difference in the performance of HCNN models trained on different scales. Additionally, Figure 9a shows the average profits of each algorithm in scenarios of different scales. Clearly, the trained HCNN model can directly solve scheduling scenarios with different task scales with great performance.
In terms of running time, the experimental results of each algorithm in scenarios of different scales are shown in Figure 9b. The data for "HCNN" are the average values over all HCNN models. The running time of the DRL-HA algorithm is of the same order of magnitude as, and follows the same growth trend as, that of the other heuristic algorithms. Therefore, the DRL-HA algorithm does not increase the time complexity.
In summary, DRL-HA not only accomplishes better performance in terms of task profits but also has an advantage in running time, which demonstrates the high efficiency of DRL-HA as a solving algorithm.

5.2. Results of Sensitivity Experiments

To validate the generalization capability of DRL-HA under different task distributions, we trained it on scheduling scenarios with scales of 50 and 100, and 100 scheduling instances were generated for each of the other seven orbits (distinguished by orbit number, where O-1 is the training/test orbit). These sample scales are representative of small to large scales. Since this test was designed mainly for comparison with competitors rather than for performance across scales, these samples are appropriate. The Random algorithm, WF, PF, PDF, and PWF were compared with the HCNN.
To study performance in single-orbit, large-scale task scenarios, this experiment used single-orbit scheduling instances with a scale of 100 and generated 100 sets of scheduling instances for each orbit. The average total task profits and total running times for the different orbits are shown in Table 8 and Table 9, while Figure 10 displays the trend lines. From these tables and figures, for every orbit and scenario size, the HCNN clearly achieves higher average profits than the other algorithms. In addition, the HCNN performs better as the scenario size increases. For example, the HCNN achieves 24.74% and 139.86% higher average profits than Random in G2 for the 25-task and 100-task scenarios, respectively. Therefore, the DRL-HA model achieves better performance than the other algorithms and the best results on large-scale instances across different orbits, indicating that the model has great generalization ability for different scenario sizes and task distributions.

5.3. Results of HCNN as the Initial Solution

To validate the capability of the HCNN as the initial-solution provider for iterative meta-heuristic algorithms, we conducted comparative experiments with the TS and SA algorithms at task scales of 50, 100, 150, and 200. These sample scales are representative and sufficient for comparing the algorithms. Since these algorithms involve randomness, we ran them ten times and averaged the results. For the tabu search (TS), the algorithm stops and outputs the historical optimal solution when that solution has not been updated for 100 iterations. Each iteration of the TS algorithm searches multiple neighborhood solutions, whereas the simulated annealing (SA) algorithm evaluates only a single neighborhood solution per iteration; we therefore set SA to stop after 1000 iterations without improvement. In addition, due to the randomness of these meta-heuristic algorithms, we ran them 30 times and calculated the standard deviation (std.) to compare their stability. The parameters of the greedy randomized iterated local search (GRILS) are almost identical to those in [25]. However, since this paper focuses on the single-orbit AEOSSP, where the maximum number of completed tasks is much smaller than in [25], we set NumOfIterNoImp, the number of iterations without an update of the optimal solution, to 10.
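The three stopping rules share the same no-improvement pattern, only with different thresholds (100 for TS, 1000 for SA, 10 for GRILS). A generic sketch, where the `step` callback standing in for one TS/SA/GRILS iteration is hypothetical:

```python
def run_with_no_improve_stop(step, initial_profit, max_no_improve):
    # Stop once the incumbent profit has not improved for
    # max_no_improve consecutive iterations.
    best = initial_profit
    no_improve = 0
    iterations = 0
    while no_improve < max_no_improve:
        profit = step()  # one iteration of the meta-heuristic
        iterations += 1
        if profit > best:
            best, no_improve = profit, 0
        else:
            no_improve += 1
    return best, iterations
```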
The average total task profits, standard deviations, and total running times of the HCNN and its competitors are shown in Table 10. From this table, it can be seen that the profit of the HCNN is obviously higher than those of the other competitors; for example, G-HCNN achieves higher profits than G-W and G-P. Moreover, the std. of the HCNN is mostly smaller than that of its competitors, indicating that the HCNN provides a stable, high-quality initial solution across task scales and meta-heuristic algorithms. Figure 11 displays the average profits of all algorithms with each initial solution, showing that the GRILS-based algorithms outperform the other competitors and that G-HCNN achieves the highest average profits. For example, at size 25, G-W achieves 17.65% higher average profits than TS-W, and G-P achieves 20.67% higher average profits than SA-P, so the GRILS algorithm reveals its superiority over the other two meta-heuristics. Furthermore, at size 200, TS-HCNN achieves 23.33% and 10.21% better performance than TS-W and TS-P, respectively, suggesting that the HCNN model as an initial solution ameliorates the performance of the iterative meta-heuristic algorithm more than the other two heuristics do.
Although the results of the three GRILS variants are similar, the advantage of G-HCNN can still be observed. To further explore the differences among these three variants, we counted the average number of iterations over ten solution processes, shown in Figure 12. From this figure, it can be observed that G-HCNN maintains a quality advantage throughout the run: the profit quality in its early iterations reaches the level that the other competitors only reach in later iterations, and its total number of iterations is relatively smaller. This finding demonstrates that the HCNN outputs an effective task priority sequence, selects high-quality tasks, and gives them high observation priority, which accelerates the iterations of GRILS. Furthermore, during the initialization of the algorithms, the quality of the initial solutions of all three variants is improved, which benefits from the INSERT operator of the GRILS algorithm. The INSERT operator arranges as many tasks as possible for observation but causes the algorithm to iterate slowly. In the later iterations, the profit gap among the three variants gradually decreases, indicating the eminent optimization ability of GRILS for the AEOSSP.
In summary, the trained HCNN model, as an initial solution for iterative meta-heuristic algorithms, not only accomplishes better profits than its competitors but also requires fewer iterations.

6. Conclusions and Future Directions

In this paper, a heuristic scheduling algorithm based on deep reinforcement learning and a heuristic algorithm (DRL-HA) is proposed, which exploits historical scheduling and task characteristic information to solve the agile earth observation satellite scheduling problem (AEOSSP) with high efficiency. The final scheduling sequence is obtained by combining the trained heuristic construction neural network (HCNN) model with the task arrangement algorithm (TAA). Three kinds of experiments verify that the algorithm has outstanding performance and generalization ability. Among them, experiments on instances of different scales show that (1) the profit of the DRL-HA algorithm is, on average, 9.15% higher than that of the best heuristic algorithm; (2) the algorithm has good generalization ability, demonstrating outstanding performance in experiments with different task scales and task distributions; (3) the algorithm is capable of providing high-quality initial solutions that require fewer iterations than those of its competitors. A model trained under a fixed scale and task distribution can deal with other scenarios with eminent performance while maintaining the superiority of the algorithm.
The DRL-HA model proposed in this paper fully utilizes historical scheduling information and task feature information to obtain a prominent scheduling sequence. Therefore, the DRL-HA algorithm is verified to be an effective method for solving the AEOSSP. In this way, the high profit and high timeliness of agile satellite scheduling can be guaranteed, and the solution of the AEOSSP is further explored and improved. Although this model requires hardware support and considerable training time, its great generalization capability renders it suitable for different scenarios, and its ability to produce a high-quality initial solution allows it to be combined with other algorithms. In future work, we will try to combine deep reinforcement learning with the single-satellite algorithm to accomplish refined task allocation for multi-satellite scheduling problems.

Author Contributions

Conceptualization, J.C., M.C. and X.L.; Data curation, J.C. and J.W.; Formal analysis, J.C. and L.H.; Funding acquisition, L.H.; Investigation, J.C. and M.C.; Methodology, J.C., M.C. and L.H.; Project administration, J.C.; Resources, J.C.; Supervision, X.L.; Validation, J.C.; Visualization, J.C.; Writing—original draft, J.C., M.C. and J.W.; Writing—review & editing, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Hunan Provincial Innovation Foundation for Postgraduate CX20210034 and the National Natural Science Foundation of China (72001212).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liu, X.; Laporte, G.; Chen, Y.; He, R. An adaptive large neighborhood search metaheuristic for agile satellite scheduling with time-dependent transition time. Comput. Oper. Res. 2017, 86, 41–53. [Google Scholar] [CrossRef]
  2. Wang, X.; Han, C.; Zhang, R.; Gu, Y. Scheduling multiple agile earth observation satellites for oversubscribed targets using complex networks theory. IEEE Access 2019, 7, 110605–110615. [Google Scholar] [CrossRef]
  3. Liu, Z.; Feng, Z.; Ren, Z. Route-reduction-based dynamic programming for large-scale satellite range scheduling problem. Eng. Optim. 2019, 51, 1944–1964. [Google Scholar] [CrossRef]
  4. Beaumet, G.; Verfaillie, G.; Charmeau, M. Feasibility of autonomous decision making on board an agile earth-observing satellite. Comput. Intell. 2011, 27, 123–139. [Google Scholar] [CrossRef]
  5. Lemaître, M.; Verfaillie, G.; Jouhaud, F.; Lachiver, J.M.; Bataille, N. Selecting and scheduling observations of agile satellites. Aerosp. Sci. Technol. 2002, 6, 367–381. [Google Scholar] [CrossRef]
  6. De Florio, S. Performances optimization of remote sensing satellite constellations: A heuristic method. In Proceedings of the 5th International Workshop on Planning and Scheduling for Space (IWPSS 2006), Baltimore, MD, USA, 22–25 October 2006. [Google Scholar]
  7. Wang, P.; Reinelt, G.; Gao, P.; Tan, Y. A model, a heuristic and a decision support system to solve the scheduling problem of an earth observing satellite constellation. Comput. Ind. Eng. 2011, 61, 322–335. [Google Scholar] [CrossRef]
  8. Han, C.; Wang, X.-W.; Chen, Z. Scheduling for single agile satellite, redundant targets problem using complex networks theory. Chaos Solitons Fractals 2016, 83, 125–132. [Google Scholar]
  9. Liang, X.; Wang, H.; Xu, R.; Chen, H. Priority-based constructive algorithms for scheduling agile earth observation satellites with total priority maximization. Expert Syst. Appl. 2016, 51, 195–206. [Google Scholar]
  10. Yuan, Z.; Chen, Y.; He, R. Agile earth observing satellites mission planning using genetic algorithm based on high quality initial solutions. In Proceedings of the 2014 IEEE Congress on Evolutionary Computation (CEC), Beijing, China, 6–11 July 2014; pp. 603–609. [Google Scholar]
  11. Li, Y.; Xu, M.; Wang, R. Scheduling observations of agile satellites with combined genetic algorithm. In Proceedings of the Third International Conference on Natural Computation, Haikou, China, 24–27 August 2007. [Google Scholar]
  12. Lopez, P.; Tangpattanakul, P.; Jozefowiez, N. Multi-objective optimization for selecting and scheduling observations by agile earth observing satellites. In Proceedings of the Third International Conference on Natural Computation, Espoo, Finland, 18–22 October 2012. [Google Scholar]
  13. Chen, M.; Wen, J.; Song, Y.J.; Xing, L.N.; Chen, Y.W. A population perturbation and elimination strategy based genetic algorithm for multi-satellite TT&C scheduling problem. Swarm Evol. Comput. 2021, 65, 100912. [Google Scholar]
  14. Li, J.; Jing, N.; Emmerich, M.; Li, L.; Chen, H. Preference-based evolutionary many-objective optimization for agile satellite mission planning. IEEE Access 2018, 6, 40963–40978. [Google Scholar] [CrossRef]
  15. Li, X.; Li, Z.; Chen, H. A multi-objective binary-encoding differential evolution algorithm for proactive scheduling of agile earth observation satellites. Adv. Space Res. 2019, 63, 3258–3269. [Google Scholar] [CrossRef]
  16. He, L.; Weerdt, M.D.; Yorke-Smith, N. Time/sequence-dependent scheduling: The design and evaluation of a general purpose tabu-based adaptive large neighbourhood search algorithm. J. Intell. Manuf. 2020, 31, 1051–1078. [Google Scholar] [CrossRef]
  17. Hopfield, J.J.; Tank, D.W. “Neural” computation of decisions in optimization problems. Biol. Cybern. 1985, 52, 141–152. [Google Scholar] [CrossRef] [PubMed]
  18. Hu, H.; Zhang, X.; Yan, X.; Wang, L.; Xu, Y. Solving a new 3D bin packing problem with deep reinforcement learning method. arXiv 2017, arXiv:1708.05930. [Google Scholar]
  19. Chen, M.; Chen, Y.; Chen, Y.; Qi, W. Deep reinforcement learning for agile satellite scheduling problem. In Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China, 6–9 December 2019. [Google Scholar]
  20. Chen, M.; Chen, Y.; Du, Y.; Wei, L.; Chen, Y. Heuristic algorithms based on deep reinforcement learning for quadratic unconstrained binary optimization. Knowl.-Based Syst. 2020, 207, 106366. [Google Scholar] [CrossRef]
  21. Kool, W.; Van Hoof, H.; Welling, M. Attention, learn to solve routing problems! arXiv 2018, arXiv:1803.08475. [Google Scholar]
  22. Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer networks. In Proceedings of the Advances in Neural Information Processing Systems (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
  23. Bello, I.; Pham, H.; Le, Q.V.; Norouzi, M.; Bengio, S. Neural combinatorial optimization with reinforcement learning. arXiv 2016, arXiv:1611.09940. [Google Scholar]
  24. Nazari, M.; Oroojlooy, A.; Snyder, L.V.; Takáč, M. Reinforcement learning for solving the vehicle routing problem. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS 2018), Montreal, QC, Canada, 3–8 December 2018. [Google Scholar]
  25. Peng, G.; Song, G.; He, Y.; Yu, J.; Vansteenwegen, P. Solving the agile earth observation satellite scheduling problem with time-dependent transition times. IEEE Trans. Syst. Man Cybern. Syst. 2020, 52, 1614–1625. [Google Scholar] [CrossRef]
  26. Cho, K.; Van Merrienboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
  27. Wang, Y.; Lue, Z.; Glover, F.; Hao, J.K. Probabilistic GRASP-tabu search algorithms for the UBQP problem. Comput. Oper. Res. 2013, 40, 3100–3107. [Google Scholar] [CrossRef]
  28. Glover, F.; Kochenberger, G.; Alidaee, B.; Amini, M.M. Tabu search with critical event memory: An enhanced application for binary quadratic programs. In Meta-Heuristics; Springer: Boston, MA, USA, 1999; pp. 93–109. [Google Scholar]
  29. Glover, F.; Ye, T.; Punnen, A.P.; Kochenberger, G. Integrating tabu search and VLSN search to develop enhanced algorithms: A case study using bipartite Boolean quadratic programs. Eur. J. Oper. Res. 2015, 241, 697–707. [Google Scholar] [CrossRef]
  30. Palubeckis, G. Multistart tabu search strategies for the unconstrained binary quadratic optimization problem. Ann. Oper. Res. 2004, 131, 259–282. [Google Scholar] [CrossRef]
  31. Glover, F.; Lü, Z.; Hao, J.K. Diversification-driven tabu search for unconstrained binary quadratic problems. 4OR 2010, 8, 239–253. [Google Scholar] [CrossRef]
  32. Ying, Z. A decomposition-based multi-objective tabu search algorithm for tri-objective unconstrained binary quadratic programming problem. In Proceedings of the IEEE International Conference on Computational Science & Engineering, Guangzhou, China, 21–24 July 2017. [Google Scholar]
  33. Alkhamis, T.M.; Hasan, M.; Ahmed, M.A. Simulated annealing for the unconstrained quadratic pseudo-boolean function. Eur. J. Oper. Res. 1998, 108, 641–652. [Google Scholar] [CrossRef]
  34. Katayama, K.; Narihisa, H. Performance of simulated annealing-based heuristic for the unconstrained binary quadratic programming problem. Eur. J. Oper. Res. 2001, 134, 103–119. [Google Scholar] [CrossRef]
  35. Trottier, L. Off-policy actor-critic. arXiv 2012, arXiv:1205.4839. [Google Scholar]
Figure 1. Diagram of agile satellite performing observation tasks.
Figure 2. Traditional heuristic algorithm flow diagram of static parameter input.
Figure 3. DRL-HA operation process diagram.
Figure 4. Structure of multi-head attention.
Figure 5. The overall structure of the HCNN model.
Figure 6. Task observation arrangement process.
Figure 7. Observation arrangement process for tasks deleted due to constraint violation.
Figure 8. Performance of HCNN at different sizes. (a) Average profits for HCNN at different sizes. (b) Total time for HCNN at different sizes.
Figure 9. Performance of the algorithms for scenarios with different scales. (a) Average profits of the algorithms for scenarios with different scales. (b) Total time of the algorithms for scenarios with different scales.
Figure 10. Performance of HCNN at different sizes. (a) Average profits for the single-orbit scenarios with 50 tasks; (b) Average profits for the single-orbit scenarios with 100 tasks.
Figure 11. The average profit of total tasks of the comparison algorithms under different task scales.
Figure 12. Iterations of GRILS at different sizes. (a) Iterations of GRILS at size 50. (b) Iterations of GRILS at size 100. (c) Iterations of GRILS at size 150. (d) Iterations of GRILS at size 200.
Table 1. Attitude transition and task parameters.
Parameters | Values
Attitude transition speeds | v1 = 1.5, v2 = 2, v3 = 2.5, v4 = 3
Attitude stabilization times | λ0 = 11.66, λ1 = 5, λ2 = 10, λ3 = 16, λ4 = 22
Thresholds of attitude transition angles | θ1 = 10, θ2 = 30, θ3 = 60, θ4 = 90
Duration time required for imaging | l ∼ N(15, 30)
Table 2. Satellite orbit parameters.
Satellite | a | e | i | ω | Ω | m
AS-01 | 7141701.7 km | 0.000627 | 98.5964 deg | 95.5069 deg | 342.307 deg | 125.2658 deg
Table 3. Competitors.
 | Random | WF | PF | PDF | PWF
TS | - | TS-W | TS-P | - | -
SA | - | SA-W | SA-P | - | -
GRILS | - | G-W | G-P | - | -
Table 4. Main parameters of the actor network.
Actor-HCNN
Global: Q, K, V dim = 128, heads = 8, layers = 3, feed-forward dim = 512, dropout = 0.1
Encoder: Conv-1D(D_input_size, filters = 128, kernel size = 1, stride = 1)
Conv-1D(D_input_size = 1, filters = 128, kernel size = 1, stride = 1)
Decoder: GRU(hidden size = 128, layers = 1, dropout = 0.2)
Table 5. Main parameters of the critic network.
Critic-HCNN
Encoder: Conv-1D(D_input_size, filters = 128, kernel size = 1, stride = 1)
Feed forward layer 1 (input features = 256; output features = 20)
Feed forward layer 2 (input features = 20; output features = 20)
Output layer (input features = 20; output features = 1)
Table 6. Experimental results of scenarios for scale from 25 to 100.
Algorithm | 25 (P, T(s)) | 50 (P, T(s)) | 75 (P, T(s)) | 100 (P, T(s))
H-25 | 116.37, 7.22 | 160.97, 13.04 | 186.85, 22.16 | 201.59, 33.51
H-50 | 112.82, 7.67 | 163.73, 12.8 | 188.78, 22.62 | 206.08, 33.74
H-75 | 112.24, 7.69 | 162.81, 12.84 | 187.77, 23.55 | 205.39, 34.19
H-100 | 111.99, 6.5 | 161.14, 12.98 | 187.03, 23.67 | 204.74, 34.65
H-125 | 111.28, 5.25 | 161.04, 12.99 | 187.88, 22.66 | 205.74, 34.25
H-150 | 111.26, 5.29 | 160.72, 12.69 | 186.53, 22.34 | 203.66, 33.96
Random | 85.35, 3.68 | 106.33, 9.48 | 115.64, 19.19 | 121.43, 31.97
WF | 109.77, 2.49 | 127.56, 6.76 | 136.07, 13.67 | 140.35, 22.18
PF | 103.65, 3.69 | 145.35, 9.44 | 169.55, 19.72 | 181.51, 32.11
PDF | 101.61, 3.74 | 143.28, 9.71 | 165.64, 20.26 | 180.11, 33.57
PWF | 104.39, 3.61 | 148.19, 9.12 | 171.73, 18.69 | 186.01, 30.5
Table 7. Experimental results of scenarios with different scales for scale from 125 to 200.
Algorithm | 125 (P, T(s)) | 150 (P, T(s)) | 175 (P, T(s)) | 200 (P, T(s))
H-25 | 212.2, 49.23 | 220.34, 67.84 | 229.41, 85.19 | 236.23, 100.9
H-50 | 215.36, 48.82 | 223.56, 67.6 | 231.88, 78.81 | 238.37, 100.74
H-75 | 214.79, 49.78 | 225.94, 68.05 | 233.07, 78.97 | 241.19, 98.74
H-100 | 215.14, 50.83 | 224.28, 69.75 | 232.87, 80.27 | 238.42, 104.43
H-125 | 216.61, 50.44 | 227.22, 68.45 | 236.85, 78.43 | 243, 97.41
H-150 | 216.81, 49.47 | 227.05, 66.98 | 237.79, 77.4 | 244.36, 102.11
Random | 122.98, 50.49 | 129.87, 73.42 | 131.21, 95.08 | 132.99, 119.95
WF | 145.62, 34.75 | 149.12, 50.72 | 151.67, 64.14 | 155.01, 80.98
PF | 192.89, 50.34 | 200.64, 73.97 | 207.84, 96.27 | 213.48, 120.7
PDF | 191.15, 53.24 | 198.9, 77.75 | 205.73, 102.33 | 212.24, 126.45
PWF | 199.77, 47.33 | 208.01, 67.68 | 218.46, 86.34 | 223.36, 107.52
Table 8. Experimental results for the single-orbit scenarios with 25 tasks.
Algorithm |  | O-1 | O-2 | O-3 | O-4 | O-5 | O-6 | O-7 | O-8
HCNN | P | 112.82 | 120.55 | 108.54 | 110.97 | 97.97 | 108.65 | 112.4 | 97.39
HCNN | T(s) | 5.17 | 5.19 | 5.09 | 5.17 | 5.17 | 5.26 | 5.26 | 5.15
Random | P | 84.05 | 97.7 | 76.75 | 82.68 | 63.12 | 84.78 | 86.25 | 68.78
Random | T(s) | 2.8 | 2.85 | 2.78 | 2.81 | 2.36 | 2.78 | 2.83 | 2.68
WF | P | 105.1 | 115.51 | 100.01 | 103.73 | 72.72 | 102.35 | 105.49 | 86.57
WF | T(s) | 1.93 | 1.34 | 2.02 | 1.95 | 1.08 | 1.44 | 1.94 | 1.95
PF | P | 102.2 | 112.02 | 96.59 | 99.38 | 82.15 | 101.74 | 102.9 | 89.84
PF | T(s) | 2.84 | 2.89 | 2.75 | 2.78 | 2.39 | 2.76 | 2.84 | 2.69
PDF | P | 101.33 | 112 | 94.3 | 99.32 | 83.09 | 100.2 | 101.34 | 87.85
PDF | T(s) | 2.89 | 2.95 | 2.79 | 2.82 | 2.42 | 2.84 | 2.88 | 2.75
PWF | P | 103.36 | 113.06 | 98.59 | 100.8 | 84.02 | 102.53 | 103.74 | 90.96
PWF | T(s) | 2.73 | 2.77 | 2.63 | 2.7 | 2.24 | 2.67 | 2.73 | 2.64
Table 9. Experimental results for the single-orbit scenarios with 100 tasks.
Algorithm |  | O-1 | O-2 | O-3 | O-4 | O-5 | O-6 | O-7 | O-8
HCNN | P | 206.08 | 226.75 | 185.15 | 195.79 | 146.8 | 197.33 | 196.7 | 172.29
HCNN | T(s) | 32.14 | 32.9 | 30.17 | 33.72 | 29.93 | 29.63 | 32.13 | 32.18
Random | P | 119.93 | 137.38 | 101.25 | 112.39 | 87.03 | 121.53 | 115.59 | 97.92
Random | T(s) | 32.36 | 35.08 | 31.35 | 32.58 | 25.72 | 32.47 | 32.34 | 34.33
WF | P | 135.87 | 173.02 | 123.26 | 131.1 | 101.94 | 161.61 | 136.02 | 124.81
WF | T(s) | 21.84 | 16.63 | 20.22 | 21.84 | 8.71 | 14.67 | 21.57 | 23.84
PF | P | 183.76 | 211.46 | 156.44 | 174.77 | 132.43 | 186.93 | 181.06 | 148.41
PF | T(s) | 32.14 | 34.8 | 31.46 | 32.73 | 26.39 | 32.56 | 32.2 | 34.31
PDF | P | 181.26 | 209.77 | 154.25 | 172.28 | 135.04 | 183.3 | 177.97 | 145.37
PDF | T(s) | 33.46 | 36.47 | 32.98 | 34.21 | 27.74 | 34.57 | 33.49 | 36.08
PWF | P | 188.92 | 217.46 | 161.33 | 179.34 | 139.06 | 194.55 | 186.94 | 152.41
PWF | T(s) | 30.05 | 33.12 | 29.65 | 31.05 | 16.46 | 30.22 | 30.51 | 31.98
Table 10. The average total task profit results of the comparison algorithms under different task scales.
Algorithm | 50 (P, std., T(s)) | 100 (P, std., T(s)) | 150 (P, std., T(s)) | 200 (P, std., T(s))
WF | 128, 0.00, 0.2 | 162, 0.00, 0.5 | 178, 0.00, 0.8 | 163, 0.00, 1.2
PF | 153, 0.00, 0.3 | 178, 0.00, 0.7 | 187, 0.00, 1.1 | 208, 0.00, 1.5
HCNN | 166, 0.00, 0.8 | 202, 0.00, 1.7 | 228, 0.00, 2.4 | 247, 0.00, 3.2
SA-W | 159.8, 7.86, 1819.11 | 188.6, 12.18, 3615.16 | 205, 7.82, 5423.85 | 188.3, 12.42, 5977.93
SA-P | 163.6, 5.12, 1644.76 | 197.6, 5.59, 4413.37 | 214.1, 8.23, 5580.30 | 220.8, 7.70, 5530.17
SA-HCNN | 178.4, 2.20, 1215.25 | 205.5, 3.69, 2665.23 | 237.7, 6.48, 3687.63 | 251.7, 3.16, 4646.30
TS-W | 175.9, 4.18, 2385.94 | 221.8, 6.01, 6065.45 | 236.1, 7.41, 6171.87 | 224.8, 7.57, 6264.22
TS-P | 174.3, 5.48, 1940.55 | 215.1, 5.45, 6054.29 | 231.1, 8.17, 6206.29 | 240.9, 6.74, 6134.78
TS-HCNN | 185.5, 2.20, 1711.24 | 225.6, 1.81, 6061.48 | 259.3, 2.97, 6162.82 | 262.4, 3.80, 6341.28
G-W | 195, 3.77, 563.68 | 235.1, 5.03, 2327.79 | 250.6, 6.44, 3109.30 | 256.9, 5.56, 4926.20
G-P | 197.6, 3.88, 528.63 | 234.3, 5.18, 1899.61 | 251.9, 7.03, 2784.32 | 257.5, 5.87, 4346.50
G-HCNN | 199.6, 1.85, 512.67 | 237.5, 4.65, 1583.61 | 253.7, 3.11, 2574.07 | 262.3, 4.49, 3946.41
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Chen, J.; Chen, M.; Wen, J.; He, L.; Liu, X. A Heuristic Construction Neural Network Method for the Time-Dependent Agile Earth Observation Satellite Scheduling Problem. Mathematics 2022, 10, 3498. https://0-doi-org.brum.beds.ac.uk/10.3390/math10193498
