Article

Machine Learning Control Based on Approximation of Optimal Trajectories

1 Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences, 119333 Moscow, Russia
2 Department of Mechanics and Mechatronics, RUDN University, 117198 Moscow, Russia
3 School of Aerospace Engineering, Tsinghua University, Beijing 100084, China
* Author to whom correspondence should be addressed.
Submission received: 30 December 2020 / Revised: 24 January 2021 / Accepted: 26 January 2021 / Published: 29 January 2021
(This article belongs to the Special Issue Control, Optimization, and Mathematical Modeling of Complex Systems)

Abstract: The paper is devoted to an emerging trend in control: machine learning control. Despite the popularity of the idea of machine learning, there are various interpretations of this concept, and there is an urgent need for its strict mathematical formalization. An attempt to formalize the concept of machine learning is presented in this paper. The concepts of an unknown function, a work area and a training set are introduced, and a mathematical formulation of the machine learning problem is presented. On the basis of this formulation, the concept of machine learning control is considered. One of the problems of machine learning control is the general synthesis of control: finding a control function that depends on the state of the object and ensures the achievement of the control goal, with the optimal value of the quality criterion, from any initial state of some admissible region. Supervised and unsupervised approaches to solving the problem on the basis of symbolic regression methods are considered. As a computational example, the problem of general synthesis of optimal control for a spacecraft landing on the surface of the Moon is considered as supervised machine learning control with a training set.

1. Introduction

The complexity of control synthesis problems for autonomous robots, which must perform assigned tasks and achieve set goals, has led to new ideas in control theory. Now, to create a control system for an autonomous robot, the system needs to be trained [1,2] instead of being obtained by solving known optimization problems.
To formulate a realistic mobile robot control problem, a large number of different phase constraints must be described: walls, doors between rooms, windows, columns and other obstacles. For example, a robot has to go around a column, avoid hitting a wall and pass through a door. Today, when control systems for mobile robots are created, programmers imagine the problems the robot will face and decide how it should overcome them. This laborious process was justified when control systems were developed individually for single technical objects, such as spacecraft. However, modern automation and robotization are becoming ubiquitous, and this trend requires new universal, and even automatic, approaches to the development of control systems.
The application of symbolic regression methods makes it possible to obtain mathematical expressions for control functions automatically. Such expressions describe how the robot should optimally reach the goal while avoiding the obstacles.
Only symbolic regression methods can search for both the structure and the parameters of a mathematical expression; other methods, including artificial neural networks, search only for parameters. The search for the structure of the control function in the control synthesis problem is called machine learning control [1]. This is a new technology in the development of control systems, and no rigorous mathematical formulation defining this approach has yet been proposed. In this paper, we propose a mathematical formalization of the machine learning problem (Section 2) and, on the basis of the proposed definitions, single out a special area of machine learning: machine learning control (Section 3).
One of the main problems of machine learning control is the control synthesis problem. The paper first presents the general mathematical formulation of the control synthesis problem and then proposes its numerical formulation, since, according to the methodology of machine learning control, the synthesis problem must be solved numerically by symbolic regression methods.
In Section 4 we present our approach to solving the machine learning control problem based on the approximation of optimal trajectories. According to the learning technique, it is first necessary to create a training set in order to show the learning object what is wanted of it. For this purpose, the optimal control problem is first solved, with the same quality criterion as in the synthesis problem, from several different initial conditions. The obtained optimal trajectories are templates for learning: they show what form the plots of the state variables must take as a result of solving the control synthesis problem, and what values of the functional these solutions must give. The optimal trajectories obtained for different initial conditions are then approximated by a numerical method of symbolic regression. The proposed approach is demonstrated in a computational example of the general synthesis of optimal control for a spacecraft landing on the surface of the Moon (Section 5).

2. Problem Statement of Machine Learning

Definition 1.
A set of computational procedures that transforms a vector x from an input space X into a vector y from an output space Y, and for which there is no algebraic equation y = f(x), is called an unknown function.
For example, the system of ordinary differential equations ẋ = f(x) is an unknown function from a vector of initial conditions x(0) to a vector of solutions as functions of time x(t, x(0)), if no general solution of this differential equation is known.
The unknown function between the input vector x and the output vector y is written as
y = α(x). (1)
Then, for differential equations without known general solutions, the unknown function has the form
x(t, x₀) = α(x₀). (2)
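As a concrete illustration (a hypothetical sketch, not taken from the paper), a numerical integrator realizes exactly such an unknown function: it maps an initial condition x(0) to x(t, x(0)) by a set of computational procedures, with no algebraic equation connecting input and output.

```python
import math

# Hypothetical illustration of an unknown function alpha: a forward-Euler
# integrator maps x(0) to x(t, x(0)) for x' = -x + sin(x^2), a right-hand
# side with no known elementary general solution. The map below is purely
# procedural: there is no formula connecting x0 and the returned value.
def alpha(x0, t=1.0, steps=1000):
    x, dt = x0, t / steps
    for _ in range(steps):
        x += dt * (-x + math.sin(x * x))  # one Euler step of x' = f(x)
    return x
```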
Definition 2.
A work area is a subset of the input vector space where the input vectors are known to exist and that is used for solving the problem.
The unknown function can also be realized by physical equipment or an experiment. Then the unknown function is called a black box, but it is still described by (1).
Let a set of input vectors be determined in the work area:
X̃ = {x₁, …, x_N} ⊆ X. (3)
For every input vector, the output vector is determined by the unknown function (1):
Ỹ = {y₁ = α(x₁), …, y_N = α(x_N)} ⊆ Y. (4)
Definition 3.
A pair of sets,
(X̃, Ỹ), (5)
is called a training set.
It is known that there are supervised and unsupervised machine learning methods.
An unsupervised machine learning problem can be formulated as follows: for some unknown function (1) and a small positive value δ, it is necessary to find a function
y = β(x, q), (6)
where q = [q₁ … q_p]^T is a vector of parameters, such that ∀x ∈ X
‖y − β(x, q)‖ ≤ δ. (7)
A supervised machine learning problem can accordingly be formulated as follows: for some unknown function (1) and a small positive value δ, it is necessary to determine a positive value ε, to build a training set (5) and to find a function (6) such that, if the total error on the training set is less than the given value ε,
Σ_{i=1}^{N} ‖y_i − β(x_i, q)‖ ≤ ε, (8)
then for any x* from the work area that is not included in the training set, x* ∈ X and x* ∉ X̃, the following inequality holds:
‖y* − β(x*, q)‖ ≤ δ, (9)
where y* = α(x*).
Here the function β(x, q) includes a parameter vector q. In many approaches the structure of the function is defined beforehand, on the basis of experience or intuition, and it is only necessary to find the values of some parameters. For example, an artificial neural network [3,4,5], which is often used for solving machine learning problems, has a fixed structure and a large number of unknown parameters. In contrast, symbolic regression methods [6,7,8] allow one to search for both the structure and the parameters of the function.
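The roles of ε and δ in (8) and (9) can be illustrated by a toy sketch (hypothetical, not from the paper): a linear structure β(x, q) = q₀ + q₁x is fitted to a training set of a black-box α, and the error is then checked at a point x* of the work area that is not in the training set.

```python
# Hypothetical sketch of the supervised formulation (8)-(9): fit a chosen
# structure beta(x, q) to a training set of an unknown function alpha, then
# check the error at a point x* outside the training set.
def alpha(x):                      # the "unknown" function, used as a black box
    return 2.0 * x + 1.0

X_train = [0.0, 1.0, 2.0, 3.0]
Y_train = [alpha(x) for x in X_train]

def beta(x, q):                    # structure fixed beforehand, only q is sought
    return q[0] + q[1] * x

# Least-squares estimate of q via the normal equations for the linear structure
n = len(X_train)
sx, sy = sum(X_train), sum(Y_train)
sxx = sum(x * x for x in X_train)
sxy = sum(x * y for x, y in zip(X_train, Y_train))
q1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
q0 = (sy - q1 * sx) / n
q = [q0, q1]

eps = sum(abs(y - beta(x, q)) for x, y in zip(X_train, Y_train))  # total error (8)
x_star = 1.5                       # in the work area, not in the training set
delta_err = abs(alpha(x_star) - beta(x_star, q))                  # error in (9)
```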

3. The Problem of General Control Synthesis as Machine Learning Control

In the field of control there are also problems that require machine learning. One of the main machine learning control problems is the search for a control function in the general control synthesis problem.
The problem of general control synthesis was formulated in the middle of the last century by Boltyanskii [9] after the study of Pontryagin's maximum principle for the optimal control problem.
The problem has the following description.
The mathematical model of the control object is given in the form of a system of ordinary differential equations
ẋ = f(x, u), (10)
where x is a state vector, x ∈ R^n, u is a control vector, u ∈ U ⊆ R^m, U is a compact set, and m ≤ n.
The domain of initial conditions is given:
X₀ ⊆ R^n. (11)
The existence of an initial condition domain is the main feature of the general control synthesis problem. Boltyanskii originally defined the domain of initial conditions as the whole state space, X₀ = R^n, because he aimed to solve the problem analytically. Here we assume that the problem is solved numerically; therefore the domain X₀ is a bounded set in the state space.
The terminal condition is given:
x(t_f) = x_f ∈ R^n, (12)
where t_f is the unassigned time of getting from any initial condition x₀ ∈ X₀ to the terminal state (12).
The finishing time is bounded:
t_f ≤ t⁺, (13)
where t⁺ is a given positive value.
The phase constraints are given:
φ_i(x) ≤ 0, i = 1, …, r. (14)
The quality criterion is given:
J = ∫_{X₀} ∫_{t₀}^{t_f} f₀(x(t, x₀), u(t)) dt dx₀ → min_{u∈U}, (15)
where x(t, x₀) is a partial solution of the differential Equation (10) with control u(t) ∈ U from the initial condition x₀ ∈ X₀.
It is necessary to find a control function in the form
u = h(x) ∈ U, (16)
where h(x): R^n → R^m.
If the control function (16) is inserted into the right-hand side of the differential Equation (10), the system of stationary differential equations
ẋ = f(x, h(x)) (17)
is obtained, which does not contain a free control vector in the right-hand side.
Any partial solution of the differential Equation (17) from the initial conditions (11) must achieve the terminal condition (12), satisfying all the phase constraints (14), with the optimal value of the quality criterion (15).
Note that the control function (16) can have simple discontinuities; therefore, in many cases analytical methods cannot be applied. The majority of analytical methods, such as integrator backstepping [10,11] and the analytical design of aggregated regulators [12,13], provide Lyapunov stability by nonlinear smooth feedback control. The main drawback of all analytical methods of control synthesis is that they are bound to a specific form of the mathematical model of the control object. The control synthesis problem (10)-(17) under consideration is complicated by the arbitrary form of the mathematical model of the control object and of the integrand of the quality criterion, as well as by the phase constraints and by a wide class of control functions, which can have simple discontinuities.
In the general case, this general control synthesis problem can be solved numerically by symbolic regression methods as a machine learning control problem.
For the application of numerical methods it is necessary to reformulate the problem statement. The domain of initial conditions is replaced by a finite set of initial state points:
X̃₀ = {x_{0,1}, …, x_{0,K}}. (18)
The terminal condition (12) and the phase constraints (14) are included in the quality criterion (15), and the integral over the domain of initial conditions is replaced by a sum over all initial state points:
J₁ = Σ_{i=1}^{K} ( a₁ ‖x_f − x(t_{f,i}, x_{0,i})‖ + ∫₀^{t_{f,i}} [ f₀(x(t, x_{0,i}), u(t)) + ϑ(φ(x(t, x_{0,i}))) p(x(t, x_{0,i})) ] dt ) → min_{u∈U}, (19)
where a₁ is a weight coefficient and ϑ(A) is the Heaviside step function
ϑ(A) = { 1, if A > 0; 0, otherwise }, (20)
p(B) is a penalty function, and t_{f,i} is the time of achievement of the terminal state (12) from the initial condition x_{0,i},
t_{f,i} = { t, if t ≤ t⁺ and ‖x_f − x(t, x_{0,i})‖ ≤ ε₀; t⁺, otherwise }, i = 1, …, K, (21)
where ε₀ is a small positive value that determines the accuracy of terminal state achievement.
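For a toy one-dimensional plant ẋ = u, the computation of the criterion (19) with the stopping rule (21) can be sketched as follows. Everything in this sketch is hypothetical: the plant, the candidate feedback h(x) = −x, the constraint φ(x) = x − 10 and the quadratic penalty are illustrative choices, not the paper's.

```python
# Hedged sketch of criterion (19) with stopping rule (21) for a toy plant
# x' = u, f0 = 1, one phase constraint phi(x) = x - 10 <= 0 and a quadratic
# penalty p(x) = (x - 10)^2; all of these choices are illustrative.
def J1(h, X0, xf=0.0, t_plus=20.0, dt=0.01, eps0=0.01, a1=1.0):
    total = 0.0
    for x0 in X0:
        x, t, integral = x0, 0.0, 0.0
        while t < t_plus:
            if abs(xf - x) <= eps0:        # terminal state reached: t_f,i = t
                break
            phi = x - 10.0
            integral += (1.0 + (1.0 if phi > 0 else 0.0) * phi ** 2) * dt
            x += h(x) * dt                 # closed-loop Euler step with u = h(x)
            t += dt
        total += a1 * abs(xf - x) + integral
    return total
```

A stabilizing feedback such as h(x) = −x reaches the terminal set quickly, so its criterion value is smaller than that of the trivial control u = 0, for which t_{f,i} = t⁺.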
Within the framework of the formulation of the machine learning problem, the solution to the synthesis problem based on symbolic regression methods is machine learning control.

3.1. Control Synthesis as Unsupervised Machine Learning Control

The first approach is a direct search for the control function based on the minimization of a quality criterion. In this case we obtain unsupervised machine learning control. The stated general control synthesis problem (10), (12), (18)-(21) can be solved in the framework of unsupervised machine learning control by different symbolic regression methods. Such an approach has been demonstrated with genetic programming [2], the network operator method [14], variational genetic programming [15], variational analytic programming [16], the multi-layer network operator [17], binary variational genetic programming [18] and modified Cartesian genetic programming [19]. All of the mentioned symbolic regression methods search for mathematical expressions of control functions that provide, for the obtained solutions, the achievement of the terminal condition (12) from all the initial conditions (18) with the optimal value of the quality criterion (19), which describes the time and accuracy of hitting the terminal state and includes the phase constraints in the form of penalty functions.
Symbolic regression methods use evolutionary algorithms to search for functions and can achieve a certain accuracy when minimizing the functional, but it remains unknown how far the values of the criterion (19) for these solutions are from the true optimal values. To correct this problem, it is possible to use supervised machine learning with a training set obtained from solutions of the optimal control problem.

3.2. Control Synthesis as Supervised Machine Learning Control

The second approach is learning with a training set. This is supervised machine learning control. In this case it is first necessary to obtain the training set. For this purpose, solutions of the optimal control problem can be used.
The statement of the optimal control problem includes the mathematical model of the control object (10), an initial condition given at a single point,
x₀ ∈ R^n, (22)
the terminal condition (12), (13), the phase constraints (14), and a quality criterion
J₂ = a₁ ‖x_f − x(t_f)‖ + ∫₀^{t_f} [ f₀(x(t), u(t)) + ϑ(φ(x(t))) p(x(t)) ] dt → min_{u∈U}, (23)
where
t_f = { t, if t ≤ t⁺ and ‖x_f − x(t)‖ ≤ ε₀; t⁺, otherwise }. (24)
It is necessary to find a control in the form
ũ = v(t, x₀) ∈ U. (25)
Inserting the function (25) into the right-hand side of the mathematical model of the control object (10) yields the system of non-stationary differential equations
ẋ = f(x, v(t, x_{0,i})). (26)
To create a training set for the control synthesis problem, it is necessary to solve the optimal control problem with the criterion (23) for each initial condition from (18) and to obtain the set of optimal controls
U₀ = {v(t, x_{0,1}), …, v(t, x_{0,K})} (27)
and the set of optimal trajectories
X̃ = {x̃(t, x_{0,1}), …, x̃(t, x_{0,K})}. (28)
Now we define a time interval Δt and calculate the value of the state vector on each optimal trajectory at the interval boundaries. As a result, we get a training set of optimal trajectories
X̃ = {X̃₁, …, X̃_K}, (29)
where
X̃_i = {x̃(0, x_{0,i}) = x_{0,i}, x̃(t₁, x_{0,i}), …, x̃(t_{M_i}, x_{0,i})}, i = 1, …, K, (30)
t_j = t_{j−1} + Δt, j = 1, …, M_i, i = 1, …, K, and Δt is a given time interval.
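A minimal sketch of this resampling step (the trajectory storage format here is assumed, not taken from the paper): each solver trajectory, stored as (t, x) pairs, is sampled at the boundaries of intervals of length Δt.

```python
# Hypothetical sketch of building the training set (29)-(30): each optimal
# trajectory, stored as a list of (t, x) samples, is resampled at the
# boundaries t_j = t_{j-1} + dt by taking the nearest stored point.
def build_training_set(trajectories, dt):
    training = []
    for traj in trajectories:            # one trajectory per initial state
        t_end = traj[-1][0]
        samples, t_j = [], 0.0
        while t_j <= t_end + 1e-9:
            samples.append(min(traj, key=lambda p: abs(p[0] - t_j))[1])
            t_j += dt
        training.append(samples)
    return training
```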
Now, in order to solve the control synthesis problem (10), (12), (18)-(21) and to find the control function in the form (16), it is enough to approximate the training set (29) with respect to the criterion
J₃ = Σ_{i=1}^{K} Σ_{j=0}^{M_i} ‖x(t_j, x_{0,i}) − x̃(t_j, x_{0,i})‖ → min_{h(x)∈U}, (31)
where t₀ = 0, x(t, x_{0,i}) is a partial solution of Equation (17) with the initial condition x_{0,i}, and x̃(t, x_{0,i}) is a partial solution of Equation (26), i ∈ {1, …, K}.
To ensure the fulfillment of the phase constraints, both criteria (31) and (19) are applied. As a result, the following combined criterion is used:
J₄ = J₁ + γJ₃ → min_{h(x)∈U}, (32)
where γ is a weight coefficient.
To solve the approximation problem, symbolic regression is also used. Control synthesis based on the approximation of optimal trajectories makes it possible to find a control function (16) that provides optimal control up to the accuracy of the approximation of the training set. How close the obtained solution is to the optimal one is determined by the accuracy of the solution of the optimal control problem.
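The approximation criterion (31) reduces to a summed deviation between the closed-loop and stored optimal trajectories at the sampling instants; a minimal sketch (scalar states and the storage format are illustrative assumptions):

```python
# Hedged sketch of criterion (31): total deviation between the closed-loop
# trajectories x(t_j, x0_i) and the stored optimal ones, summed over all
# initial states and sampling instants; states are scalars here for brevity.
def J3(closed_loop, optimal):
    return sum(abs(x - x_opt)
               for traj, traj_opt in zip(closed_loop, optimal)
               for x, x_opt in zip(traj, traj_opt))

def J4(j1, j3, gamma):
    return j1 + gamma * j3               # combined criterion (32)
```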

4. Computational Algorithms

In order to solve the control synthesis problem as machine learning control based on the approximation of a set of optimal trajectories, two complex problems have to be solved: the optimal control problem, in order to form a training set, and the approximation of the optimal trajectories by some symbolic regression method. Evolutionary computations are used for both problems.

4.1. Algorithms for the Optimal Control Problem

The optimal control problem with phase constraints is not unimodal [20]; therefore evolutionary algorithms, which can solve global optimization problems, are applied. Recently it has become popular to use hybrid evolutionary algorithms that combine different evolutionary algorithms. Studies of evolutionary algorithms for the numerical solution of the optimal control problem [21] show that the most successful for this problem are the genetic algorithm (GA) [22], the particle swarm optimization (PSO) algorithm [23] and the grey wolf optimizer (GWO) algorithm [24].
All evolutionary algorithms include an evolution of possible solutions: such changes of the possible solutions that some of the new possible solutions achieve a value of the goal functional no worse than that of the old possible solutions before the change. In the evolution of a possible solution, information about the values of the goal functional for other possible solutions is used.
The PSO algorithm uses the best current possible solution, the best of several randomly selected solutions, and information about the historical changes of the given possible solution. The evolution is performed for each component of a possible solution:
q̃_j^i = q_j^i + σ v_j^i, j = 1, …, p, (33)
where q̃_j^i is the new value of component j of possible solution i, q̃^i = [q̃₁^i … q̃_p^i]^T, q_j^i is component j of the old possible solution i, q^i = [q₁^i … q_p^i]^T, p is the dimension of the vector, and v_j^i is component j of the historical vector of possible solution i, v^i = [v₁^i … v_p^i]^T. The historical vector is changed once per generation:
v_j^i ← α v_j^i + ξ β (q̄_j^i − q_j^i) + ξ γ (q_j^{(0)} − q_j^i), j = 1, …, p, (34)
where q̄_j^i is component j of the best solution among k randomly selected ones, q̄^i = [q̄₁^i … q̄_p^i]^T, q_j^{(0)} is component j of the best current possible solution, q^{(0)} = [q₁^{(0)} … q_p^{(0)}]^T, ξ is a random value in the interval from 0 to 1 that gives a new random number at each call, α, β, γ are constant parameters of the algorithm, and the vector v^i has a zero initial value.
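The component-wise rules (33) and (34) can be sketched as follows (a hedged illustration; the parameter names σ, α, β, γ follow the text, everything else is assumed):

```python
import random

# Hedged sketch of the PSO evolution (33)-(34). q_best_local is the best of
# k randomly selected solutions, q_best the best current possible solution.
def pso_step(q_i, v_i, q_best_local, q_best, sigma, alpha, beta, gamma):
    for j in range(len(q_i)):
        xi1, xi2 = random.random(), random.random()    # fresh xi at each call
        v_i[j] = (alpha * v_i[j]
                  + xi1 * beta * (q_best_local[j] - q_i[j])
                  + xi2 * gamma * (q_best[j] - q_i[j]))     # rule (34)
    return [q_i[j] + sigma * v_i[j] for j in range(len(q_i))], v_i  # rule (33)
```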
The GWO algorithm changes a possible solution on the basis of several best current solutions:
q̃_j^i = (1/N) Σ_{k=0}^{N−1} ( q_j^{(k)} − r (2ξ − 1) |2ξ q_j^{(k)} − q_j^i| ), j = 1, …, p, (35)
where q_j^{(0)}, …, q_j^{(N−1)} are the components j of the N best possible solutions,
r = 2 (1 − g/G), (36)
r is calculated once per generation, g is the number of the generation, and G is the total number of generations.
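Rules (35) and (36) can be sketched as follows (an illustrative version, with the N best current solutions stored in a list of leaders):

```python
import random

# Hedged sketch of the GWO update (35)-(36): each component moves toward the
# N best current solutions ("leaders"); r shrinks linearly to 0 by rule (36).
def gwo_step(q_i, leaders, g, G):
    r = 2.0 * (1.0 - g / G)                         # rule (36)
    q_new = []
    for j in range(len(q_i)):
        s = 0.0
        for leader in leaders:
            xi1, xi2 = random.random(), random.random()
            A = r * (2.0 * xi1 - 1.0)
            D = abs(2.0 * xi2 * leader[j] - q_i[j])
            s += leader[j] - A * D                  # one term of rule (35)
        q_new.append(s / len(leaders))
    return q_new
```

In the last generation, g = G, r = 0, and the update reduces to the average of the leaders' components.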
The GA represents vectors in the form of Gray code and performs evolution for two selected possible solutions as crossover and mutation operations. For crossover, two possible solutions are selected,
z^i = [z₁^i … z_{c+d}^i]^T, z^j = [z₁^j … z_{c+d}^j]^T, (37)
where z_k^i, z_k^j ∈ {0, 1}, c is the number of bits for the integer part and d the number of bits for the fractional part of the Gray code.
Then the crossover point s is determined. As a result, two new possible solutions are obtained:
z̃^i = [z₁^i … z_s^i z_{s+1}^j … z_{c+d}^j]^T, z̃^j = [z₁^j … z_s^j z_{s+1}^i … z_{c+d}^i]^T. (38)
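The one-point crossover (37)-(38) on Gray-coded strings is a plain exchange of tails (a sketch; the Gray encoding helper is the standard binary-reflected conversion, added here for completeness, not taken from the paper):

```python
# Hedged sketch of the GA crossover (37)-(38) on bit strings of length c + d.
def crossover(z_i, z_j, s):
    return z_i[:s] + z_j[s:], z_j[:s] + z_i[s:]

# Standard binary-reflected Gray code of an unsigned integer (for reference).
def to_gray(n):
    return n ^ (n >> 1)
```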
In hybrid algorithms, in each cycle of evolution, one of the three ways (33)-(34), (35)-(36) or (37)-(38) of obtaining new possible solutions is selected randomly for each possible solution. The algorithm stops after all cycles of evolution have been performed.

4.2. Numerical Methods of Symbolic Regression for the Control Synthesis Problem

For the solution of the control synthesis problem, numerical methods of symbolic regression are used. More than fourteen such methods are now known. These methods encode a mathematical expression and search for the optimal solution in the space of codes. The methods differ in the form of the code.
For example, consider the following mathematical expression:
y = sin(x₁) + exp(−q₁x₁) cos(q₂x₂ + q₁). (39)
To encode this mathematical expression, the following basic sets are used:
the set of arguments
F₀ = {f₀,₁ = x₁, f₀,₂ = x₂, f₀,₃ = q₁, f₀,₄ = q₂}, (40)
and the set of elementary functions
F = {f₁,₁(z) = z, f₁,₂(z) = −z, f₁,₃(z) = sin(z), f₁,₄(z) = cos(z), f₁,₅(z) = exp(z), f₂,₆(z₁, z₂) = z₁ + z₂, f₂,₇(z₁, z₂) = z₁z₂}, (41)
where the indexes of the elements indicate the number of arguments and the function number; if the first index is zero, the element is an argument of the mathematical expression.
The most popular and earliest symbolic regression method is genetic programming by J. Koza [6]. This method presents a mathematical expression in the form of a computational tree. Figure 1 shows the computational tree for the mathematical expression (39).
In Figure 1, the nodes of the tree contain the numbers of functions, and the leaves contain the arguments of the mathematical expression.
The genetic programming code for the mathematical expression (39) has the following form:
(2,6), (1,3), (0,1), (2,7), (1,5), (2,7), (1,2), (0,3), (0,1), (1,4), (2,6), (2,7), (0,2), (0,4), (0,3). (42)
The genetic programming code consists of the indexes of the elements of the computational tree, listed along all branches from the top to the leaves, and is used to represent the computational tree in computer memory.
The genetic programming code is not very convenient: codes of different mathematical expressions have different lengths, which also change after the crossover operation, and if an argument enters the mathematical expression several times, it has to appear on the leaves of the computational tree the same number of times.
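The prefix code (42) can be decoded by a small recursive interpreter (an illustrative sketch, not the authors' implementation; f₁,₂ is taken to be negation here, an assumption, since its definition is ambiguous in the extracted sets (40) and (41)):

```python
import math

# Hedged sketch: decoding the genetic programming code (42) by a prefix
# (top-to-leaves) traversal over the sets (40)-(41). f_1_2 is assumed to be
# negation; this assumption is not confirmed by the extracted text.
ARGS = {1: 'x1', 2: 'x2', 3: 'q1', 4: 'q2'}                       # set (40)
F1 = {1: lambda z: z, 2: lambda z: -z, 3: math.sin,
      4: math.cos, 5: math.exp}                                   # set (41)
F2 = {6: lambda a, b: a + b, 7: lambda a, b: a * b}

CODE = [(2, 6), (1, 3), (0, 1), (2, 7), (1, 5), (2, 7), (1, 2), (0, 3),
        (0, 1), (1, 4), (2, 6), (2, 7), (0, 2), (0, 4), (0, 3)]   # code (42)

def decode(code, env):
    """Evaluate a prefix code; returns (value, remaining code)."""
    (arity, num), rest = code[0], code[1:]
    if arity == 0:
        return env[ARGS[num]], rest
    if arity == 1:
        v, rest = decode(rest, env)
        return F1[num](v), rest
    a, rest = decode(rest, env)
    b, rest = decode(rest, env)
    return F2[num](a, b), rest

y, rest = decode(CODE, dict(x1=0.7, x2=-0.3, q1=1.1, q2=2.0))
```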
Another method of symbolic regression, the network operator method [14], encodes a mathematical expression in the form of an oriented graph. Figure 2 shows the network operator graph for the mathematical expression (39).
In the network operator graph, the nodes contain the numbers of functions with two arguments, the source nodes contain the arguments of the mathematical expression, and the arcs are marked with the numbers of functions with one argument.
In computer memory, the network operator graph is presented as an integer matrix:
Ψ =
| 0 0 0 0 0 0 2 0 3 |
| 0 0 0 0 1 0 0 0 0 |
| 0 0 0 0 0 1 1 0 0 |
| 0 0 0 0 1 0 0 0 0 |
| 0 0 0 0 7 1 0 0 0 |
| 0 0 0 0 0 6 0 4 0 |
| 0 0 0 0 0 0 7 5 0 |
| 0 0 0 0 0 0 0 7 1 |
| 0 0 0 0 0 0 0 0 6 | . (43)
Each row of the matrix corresponds to a graph node. The numbers of the nodes are written above the nodes (see Figure 2). The nodes are numbered in such a way that the number of the node from which an arc exits is always less than the number of the node that the arc enters; the network operator matrix then has an upper triangular form. The numbers of functions with two arguments are located on the main diagonal; a zero element on the main diagonal indicates that the row corresponds to a source node. The other non-zero off-diagonal elements are the numbers of functions with one argument.
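A sketch of evaluating the matrix (43) by a single pass over the computed nodes (illustrative; f₁,₂ is again assumed to be negation, which is not confirmed by the extracted text):

```python
import math

# Hedged sketch: evaluating the network operator matrix (43). Rows 1-4 are
# the source nodes x1, x2, q1, q2; a diagonal entry selects the two-argument
# function of a node, and off-diagonal entries label arcs with one-argument
# functions. f_1_2 is assumed to be negation (ambiguous in the source text).
F1 = {1: lambda z: z, 2: lambda z: -z, 3: math.sin, 4: math.cos, 5: math.exp}
F2 = {6: (lambda a, b: a + b, 0.0), 7: (lambda a, b: a * b, 1.0)}  # (op, unit)

PSI = [
    [0, 0, 0, 0, 0, 0, 2, 0, 3],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 7, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 6, 0, 4, 0],
    [0, 0, 0, 0, 0, 0, 7, 5, 0],
    [0, 0, 0, 0, 0, 0, 0, 7, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 6],
]

def nop_eval(psi, sources):
    val = list(sources)
    for j in range(len(sources), len(psi)):   # computed nodes, topological order
        op, unit = F2[psi[j][j]]
        acc = unit
        for i in range(j):                    # incoming arcs i -> j
            if psi[i][j]:
                acc = op(acc, F1[psi[i][j]](val[i]))
        val.append(acc)
    return val[-1]

y = nop_eval(PSI, [0.7, -0.3, 1.1, 2.0])
```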
Consider one more symbolic regression method. In Cartesian genetic programming [25], the code of the mathematical expression (39) has the following form:
(7,1,3), (2,5,1), (5,6,2), (3,1,3), (7,2,4), (6,9,3), (4,10,1), (7,11,7), (6,8,12). (44)
Here every integer vector corresponds to one elementary function. The first element of the vector is the function number; the other elements are the numbers of elements from the set of arguments (40). If a function has one argument, the second argument is not used. The result of the calculation according to each vector is added to the argument set, so the number of arguments increases by one every time a function is calculated according to a vector.
Due to redundancy, the Cartesian genetic programming code has the same length for all mathematical expressions. The crossover operation in Cartesian genetic programming is performed by exchanging the vectors after the crossover point, so the lengths of the codes do not change.
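Decoding the code (44) is a single left-to-right pass in which every computed value is appended to the argument list (an illustrative sketch; f₁,₂ is assumed to be negation, which is not confirmed by the extracted text):

```python
import math

# Hedged sketch: decoding the Cartesian genetic programming code (44). Each
# triple is (function number, argument number, argument number); the result
# of every triple is appended to the argument set, extending its numbering.
# f_1_2 is assumed to be negation (ambiguous in the source text).
F = {1: (1, lambda z: z), 2: (1, lambda z: -z), 3: (1, math.sin),
     4: (1, math.cos), 5: (1, math.exp),
     6: (2, lambda a, b: a + b), 7: (2, lambda a, b: a * b)}

CODE = [(7, 1, 3), (2, 5, 1), (5, 6, 2), (3, 1, 3), (7, 2, 4),
        (6, 9, 3), (4, 10, 1), (7, 11, 7), (6, 8, 12)]            # code (44)

def cgp_eval(code, x1, x2, q1, q2):
    args = [x1, x2, q1, q2]                   # set (40), 1-based numbering
    for num, a, b in code:
        arity, f = F[num]
        res = f(args[a - 1]) if arity == 1 else f(args[a - 1], args[b - 1])
        args.append(res)                      # the result becomes a new argument
    return args[-1]

y = cgp_eval(CODE, 0.7, -0.3, 1.1, 2.0)
```

With f₁,₂ taken as negation, this pass reproduces the same value as the expression (39).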
Studies of symbolic regression methods for control synthesis problems show that it is effective to use the principle of small variations of the basic solution [26]. According to this principle, only one basic solution is encoded by the symbolic regression method; the other possible solutions are encoded by sets of variation vectors, each of which makes one small change to the code of the basic solution. After several generations, the basic solution is replaced by the best solution found so far. This approach speeds up the search by narrowing the search space and avoiding additional checks of the correctness of the codes of possible solutions.
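The principle can be sketched as follows (a hypothetical storage scheme: the basic code is an integer matrix, and a variation vector (i, j, v) replaces one of its elements):

```python
# Hedged sketch of the principle of small variations of the basic solution:
# only one basic code is stored; every other possible solution is a list of
# variation vectors, each replacing a single element (i, j) of the basic code.
def apply_variations(basic, variations):
    code = [row[:] for row in basic]          # the basic solution stays intact
    for i, j, v in variations:
        code[i][j] = v
    return code
```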

5. Computational Experiment

In the computational experiment, the problem of general synthesis of optimal control for a spacecraft landing on the surface of the Moon is considered [27]. The differential equations of the spacecraft state are the following:
ẋ₁ = g_E (P_c + u₂) cos(u₁ − x₂)/x₅ − g_{Mh} cos(x₂),
ẋ₂ = (g_E (P_c + u₂) sin(u₁ − x₂)/x₅ + g_{Mh} sin(x₂))/x₁,
ẋ₃ = x₁ cos(x₂)/1000,
ẋ₄ = x₁ sin(x₂)/1000,
ẋ₅ = −(P_c + u₂)/P_s, (45)
where x = [x₁ x₂ x₃ x₄ x₅]^T is the state vector: x₁ is the current speed of the spacecraft (m/s), x₂ is the trajectory inclination angle (rad), x₃ is the current flight altitude relative to the lunar surface (km), x₄ is the flight distance (km), and x₅ is the mass of the spacecraft including fuel (kg); u = [u₁ u₂]^T is the control vector, whose values are constrained:
−π/2 ≤ u₁ ≤ π/2, −80 ≤ u₂ ≤ 80. (46)
The parameters of the model have the following values: the gravitational acceleration at a given altitude above the lunar surface is
g_{Mh} = g_M (r_M/(r_M + x₃))², (47)
the Moon gravitational acceleration is g_M = 1.623 m/s², the Earth gravitational acceleration is g_E = 9.80665 m/s², the Moon radius is r_M = 1737 km, the nominal thrust of the spacecraft engine is P_c = 720 kg, and the specific impulse of the spacecraft engine is P_s = 319 s.
The domain of initial states is
X₀ = {x⁰: x₁⁰ = 1689, −1.65 ≤ x₂⁰ ≤ −1.55, 17 ≤ x₃⁰ ≤ 20, x₄⁰ = 0, x₅⁰ = 1500}. (48)
The terminal state is
x_f = [x₁^f = 10, x₃^f = 0.2]^T. (49)
The phase constraints are determined by the mechanics of spacecraft flight. Obviously, the speed x₁, the altitude x₃ and the fuel level x₅ cannot be negative, and reaching zero altitude x₃ or zero fuel level x₅ at a significant speed x₁ means that the spacecraft has crashed. Consider the following phase constraints:
h_k(x) = −x_j ≤ 0, k = 1, 2, 3, j = 1, 3, 5,
h_k(x) = ϑ(0.001 − x_j)(x₁ − V_max) ≤ 0, k = 4, 5, j = 3, 5, (50)
where V_max is the maximum landing speed, V_max = 1, and ϑ(A) is the Heaviside function.
According to the proposed method, at the first step the training set is to be formed. We determine a finite set of initial states within the domain (48) and solve the optimal control problem for each initial state from this set.
Let us replace the domain of initial states (48) with a set of M = 21 elements uniformly distributed over this domain:
X̃₀ = { x_{0,j+7(i−1)} = [1689, −1.65 + 0.05(i − 1), 17 + 0.5(j − 1), 0, 1500]^T: i = 1, …, 3, j = 1, …, 7 }. (51)
The quality criterion accounts for the proximity to the terminal state and for violations of the phase constraints:
J_j = α₁ √( (x₁(t_f(x_{0,j})) − x₁^f)² + (x₃(t_f(x_{0,j})) − x₃^f)² ) + α₂ ∫₀^{t_f(x_{0,j})} Σ_{k=1}^{K} ϑ(h_k(x(t, x_{0,j}))) h_k(x(t, x_{0,j})) dt → min, (52)
where α_i, i = 1, 2, are given penalty factors, K = 5 is the number of phase constraints, and j = 1, …, M.
To search for a solution of the optimal control problem, the direct approach was used. The original problem was reduced to a nonlinear programming problem by introducing a time interval Δt. The solution of each optimal control problem, in the form of the control vector at discrete moments of time, was searched for independently by a hybrid evolutionary algorithm combining the modern grey wolf optimizer (GWO), which does not require problem-specific tuning of additional parameters, and the well-known particle swarm optimization (PSO). Separately, these algorithms have shown high efficiency in solving optimal control problems; the hybrid realization is intended to increase their effectiveness.
In the computational experiment, the size of the set of possible solutions was 100 and the number of search iterations was 5000. The modeling parameters were the following: maximum control time t_max = 300, discretization time interval Δt = 30, penalty factors α₁ = 10, α₂ = 10.
At the second step of the proposed approach, we use the obtained optimal trajectories to synthesize a multidimensional control function on the state space of the object. The search for the control function is conducted by a symbolic regression method that looks for the expression that approximates the provided optimal trajectories best.
We used the network operator method (NOP) to synthesize the control function. NOP allows the structure of a mathematical expression to be searched for simultaneously with the search for the optimal values of the parameter vector. In the computational experiment we used the following parameters of NOP: the size of the NOP matrix was 40, the size of the set of input variables was 3, the size of the set of input parameters was 12, the number of outputs was 2, the number of candidate solutions in the initial set was 256, and the maximum number of search iterations was 25,000.
As a result of the computational experiment, a control function in the form of a NOP matrix and a parameter vector was obtained. The mathematical expression for the found control function is as follows:
u₁ = { π/2, if ũ₁ > π/2; −π/2, if ũ₁ < −π/2; ũ₁, otherwise }, u₂ = { 80, if ũ₂ > 80; −80, if ũ₂ < −80; ũ₂, otherwise }, (53)
where
ũ₁ = χ₆(z₃₄, tanh(z₃₅), z₃₇), (54)
ũ₂ = sgn(u₁)√|u₁| − z₃₆ + arctan(z₃₅) + sgn(z₃₄)√|z₃₄| + log(|z₃₃|) + z₃₂⁻¹ + z₃₁ + z₂₈ − z₂₈³ + log(|z₂₆|) − z₂₅ + arctan(z₂₁) + sgn(z₁₇)√|z₁₇| + exp(q₁₂) + tanh(q₁₀) + q₉ + log(q₅) + tanh(q₁), (55)
z₃₇ = min{z₃₆ − z₃₆³, log(|z₃₅|), z₃₄ − z₃₄³, z₃₂⁻¹, sgn(z₂₈)√|z₂₈|, tanh(z₂₇), exp(z₂₅), z₂₀⁻¹, x₂³},
z₃₆ = min{exp(z₂₉), tanh(z₂₈), sgn(z₂₇)√|z₂₇|, arctan(z₂₆), z₂₂³, z₂₀³, tanh(q₆), q₁³, x₃³},
z₃₅ = arctan(z₂₃) + tanh(z₂₂) + tanh(q₁₂) + log(|x₃|),
z₃₄ = max{sgn(z₃₃)√|z₃₃|, log(|z₃₀|), z₂₉⁻¹, z₂₂⁻¹, z₂₀⁻¹},
z₃₃ = min{z₃₂⁻¹, arctan(z₂₉), tanh(z₂₄), z₂₂³, z₁₉, z₁₆, z₁₁⁻¹, tanh(q₃), tanh(q₂), exp(x₂)},
z₃₂ = min{z₃₁³, z₂₆, log(|z₂₅|), z₁₈⁻¹, q₁₀, arctan(q₆), q₂³},
z₃₁ = log(|z₂₇|) + z₂₆² + z₂₄² + arctan(z₂₂) + z₂₁³ + exp(z₂₀) + exp(z₁₇) + z₁₆⁻¹ + q₁₂³ − q₅ + q₂³ + x₁³,
z₃₀ = max{z₂₉, z₂₆, tanh(z₂₅), arctan(z₁₈), sgn(z₁₆)√|z₁₆|, q₁₁},
z₂₉ = χ₆(z₂₈, z₂₇⁻¹, z₂₆³, log(|z₂₄|), z₂₂³, z₂₀, z₁₇³, q₇, q₂⁻¹, q₁),
z₂₈ = max{z₂₇, z₂₃⁻¹, z₂₀, log(|z₁₉|), z₁₈ − z₁₈³, z₁₇, log(|z₁₆|), q₇, log(q₅)},
z₂₇ = χ₆(z₂₄, arctan(z₂₀), q₁₁, q₈ − q₈³, q₆³, q₅, log(q₄)),
z₂₆ = χ₅(z₂₃³, z₂₀, z₁₉ − z₁₉³, z₁₈, sgn(z₁₅)√|z₁₅|, q₁₂, x₃³),
z₂₅ = max{z₂₃², z₂₂, z₂₁ − z₂₁³, arctan(z₁₇), q₁₁, q₉, q₅},
z₂₄ = max{tanh(z₂₁), z₂₀ − z₂₀³, z₁₅³, q₁₂ − q₁₂³, q₁₀, q₈³, q₅³, q₄³, exp(x₂)},
z₂₃ = z₂₀ z₁₈ z₁₆ q₉ q₇ sgn(x₁)√|x₁|,
z₂₂ = χ₆(z₁₉, z₁₇⁻¹, exp(z₁₆), q₁₀³, arctan(q₉), q₈⁻¹, q₂),
z₂₁ = max{tanh(z₁₈), q₈², q₇, q₆, q₃},
z₂₀ = χ₆(z₁₉, z₁₇, arctan(z₁₀), q₆, q₂, x₃⁻¹),
z₁₉ = z₁₆ + arctan(q₁₁) + q₈⁻¹ + q₅ + arctan(x₃),
z₁₈ = χ₅(z₁₅, q₁₂⁻¹, q₇⁻¹, q₄, x₁ − x₁³),
z₁₇ = q₃ + x₃ − q₃x₃,
z₁₆ = q₂x₂ tanh(q₅),
z₁₅ = sgn(q₁₀ + q₅ − q₅³ + q₃⁻¹ + q₁ + x₁) √( q₁₀² + (q₅ − q₅³)² + q₃⁻² + q₁² + x₁² ),
χ₅(a₁, a₂) = a₁ + a₂ − a₁a₂,
χ₅(a₁, …, a_s) = χ₅(a₁, χ₅(a₂, …, χ₅(a_{s−1}, a_s)…)),
χ₆(a₁, …, a_s) = sgn(Σ_{i=1}^{s} a_i) √(Σ_{i=1}^{s} a_i²),
q 1 = 2.3474 , q 2 = 10.5066 , q 3 = 9.9106 , q 4 = 13.1419 , q 5 = 9.6631 , q 6 = 4.4541 ,
q 7 = 2.1899 , q 8 = 4.8552 , q 9 = 3.1116 , q 10 = 6.6172 , q 11 = 12.6812 , q 12 = 15.6148 .
The functions χ5 and χ6 are commutative and associative, and zero is their unit element.
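For checking purposes, the building blocks of the found control function transcribe directly into code. The sketch below (Python is our choice here, not part of the original paper) implements χ5, χ6, and the output saturation used for u1 and u2, following the definitions above.

```python
import math

def chi5(*args):
    """chi_5(a, b) = a + b - a*b, extended associatively; zero is the unit."""
    acc = 0.0
    for a in args:
        acc = acc + a - acc * a
    return acc

def chi6(*args):
    """chi_6(a_1, ..., a_s) = sgn(sum a_i) * sqrt(sum a_i^2)."""
    s = sum(args)
    sgn = (s > 0) - (s < 0)
    return sgn * math.sqrt(sum(a * a for a in args))

def saturate(u_tilde, bound):
    """Clip the raw output to [-bound, bound], as in the piecewise u_1, u_2."""
    return max(-bound, min(bound, u_tilde))
```

For example, chi6(3.0, 4.0) returns 5.0, and saturate(2.0, math.pi / 2) returns π/2, the upper limit of u1.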
To check the solution, we used the found control function to obtain the control and corresponding trajectories for various initial states from (48). The considered initial states included both states present in the training set (51) and states that were not.
Table 1 shows the values of the quality criterion J* obtained using the found control function (53) for the 21 initial states from the finite set (51). The optimal trajectories known for these initial states were previously used as the training set, so this test shows the quality of the approximation. The value of the quality criterion J_ocp obtained by solving the optimal control problem for the same initial state is shown in the table as a reference value. The average deviation of the quality criterion values from the reference ones is 0.0591, the maximum deviation is 0.2514, and the standard deviation is 0.0648.
Table 2 shows the values of the quality criterion J* obtained using the found control function (53) for 10 initial states generated randomly within the domain (48). This test shows the suitability of the found control function for an arbitrary initial state from the domain (48). The value of the quality criterion J_ocp obtained by solving the optimal control problem for the same initial state is shown in the table as a reference value. The average deviation of the quality criterion values from the reference ones is 0.0366, the maximum deviation is 0.1122, and the standard deviation is 0.0341.
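The reported statistics can be reproduced directly from the tabulated (J*, J_ocp) pairs. The sketch below does this for Table 2, taking each deviation as J* − J_ocp and using the sample standard deviation; Python and its statistics module are our choice here, not part of the paper.

```python
import statistics

# (J*, J_ocp) pairs transcribed from Table 2 (random initial states)
pairs = [(0.0176, 0.0034), (0.0048, 0.0018), (0.1136, 0.0014),
         (0.0567, 0.0083), (0.0042, 0.0016), (0.0398, 0.0016),
         (0.0508, 0.0022), (0.0613, 0.0052), (0.0032, 0.0023),
         (0.0464, 0.0043)]

deviations = [j_star - j_ocp for j_star, j_ocp in pairs]

avg = statistics.mean(deviations)      # average deviation: 0.0366
worst = max(deviations)                # maximum deviation: 0.1122
spread = statistics.stdev(deviations)  # sample standard deviation: 0.0341
```

All three values agree with those reported in the text, which confirms that the deviations were taken as J* − J_ocp and that the standard deviation is the sample one.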
Figure 3 and Figure 4 illustrate the results of the computational experiment for the initial state x0 = [1689 1.565 18.92 0 1500]^T. The trajectories obtained using the found control function are shown by black solid lines; for comparison, the trajectories obtained by solving the optimal control problem are shown by grey dashed lines. Figure 5 shows the values of the found control function over time.
The computational experiment showed that the found multidimensional control function yields a near-optimal solution for any initial state from the given domain (48), including initial states that were not in the training set (51).
The analysis of the standard deviation indicates that the training set contained a sufficient number of optimal trajectories. The better standard deviation obtained in the experiment with initial states randomly distributed in (48) can be explained by the fact that the set (51) contained many initial states on the boundaries of the domain (48).

6. Conclusions and Perspectives

The paper provides mathematical formulations of the machine learning problem, both supervised and unsupervised, and defines the basic concepts, such as the work area and the training set. Based on the presented formulations, it is shown that the main task of machine learning is to find a function that determines the correspondence between the input data and the resulting data, and that today this problem can be solved numerically using symbolic regression methods. The problem of obtaining a mathematical expression arises in various situations: approximation of experimental data to determine a physical law or a trend model; analysis and prediction of variables or indicators based on previous observations; identification of a mathematical model of a process or a dynamic object; and generalization of a control law based on the current state of the control object. The application of machine learning based on symbolic regression methods to control opens up the possibility of solving such a complex problem of control theory as general control synthesis. The paper presents a mathematical formulation of the control synthesis problem and provides methods for its solution using machine learning, both directly and on the basis of a training set. An important result of the article is the methodology for solving the general control synthesis problem as machine learning control based on a training set. An approach to constructing a training set from multiple solutions of the optimal control problem is proposed. An example of solving a specific control synthesis problem for a complex technical object based on the approximation of optimal trajectories is given. It is shown that the control obtained by machine learning gives good results not only for input data from the training set, but also for input data outside it.
The concept of machine learning is widely known, but it is very often narrowed to its association with neural network technology. We expand the concept of machine learning by including a description of an unknown function in its formulation. Thus, the function can be fixed in advance, with training aimed only at finding its parameters, as in neural networks, but one can also search for the structure of the function together with its parameters. This became possible with the advent of symbolic regression methods. The difficulty of these methods lies in the need to organize a search in a space that has no metric, which greatly complicates finding the required structure of the function and opens up a wide field of research. One way to address this difficulty is the principle of small variations of the basic solution described in the article. This approach concentrates the search around a basic solution built from the developer's experience or intuition, and it also requires further study.

Author Contributions

Conceptualization, A.D. and E.S.; methodology, A.D., E.S. and S.K.; software, S.K.; validation, E.S. and G.D.; formal analysis, A.D. and G.D.; investigation, S.K.; data curation, S.K.; writing—original draft preparation, A.D., E.S. and S.K.; writing—review and editing, E.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the Ministry of Science and Higher Education of the Russian Federation, project No. 075-15-2020-799.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Computational tree of genetic programming.
Figure 2. The network operator graph of the mathematical expression.
Figure 3. The graphs of trajectories: (a) spacecraft speed over time x1(t); (b) spacecraft altitude over time x3(t). Found solution—black solid line; reference solution—grey dashed line.
Figure 4. The graphs of trajectories: (a) spacecraft speed over distance x1(x4); (b) spacecraft altitude over distance x3(x4). Found solution—black solid line; reference solution—grey dashed line.
Figure 5. The graphs of control values over time: (a) control u1(t); (b) control u2(t). Found solution—black solid line; reference solution—grey dashed line.
Table 1. Results of the computational experiment using initial states from the training set.

Initial state x0 | J* | J_ocp
[1689 1.65 17 0 1500]^T | 0.1777 | 0.0018
[1689 1.6 17 0 1500]^T | 0.0295 | 0.0056
[1689 1.55 17 0 1500]^T | 0.0049 | 0.0029
[1689 1.65 17.5 0 1500]^T | 0.1433 | 0.0060
[1689 1.6 17.5 0 1500]^T | 0.0264 | 0.0044
[1689 1.55 17.5 0 1500]^T | 0.0127 | 0.0024
[1689 1.65 18 0 1500]^T | 0.0780 | 0.0033
[1689 1.6 18 0 1500]^T | 0.0439 | 0.0084
[1689 1.55 18 0 1500]^T | 0.0061 | 0.0023
[1689 1.65 18.5 0 1500]^T | 0.0703 | 0.0019
[1689 1.6 18.5 0 1500]^T | 0.0111 | 0.0019
[1689 1.55 18.5 0 1500]^T | 0.2525 | 0.0011
[1689 1.65 19 0 1500]^T | 0.0240 | 0.0012
[1689 1.6 19 0 1500]^T | 0.0501 | 0.0024
[1689 1.55 19 0 1500]^T | 0.0030 | 0.0009
[1689 1.65 19.5 0 1500]^T | 0.1035 | 0.0036
[1689 1.6 19.5 0 1500]^T | 0.0822 | 0.0027
[1689 1.55 19.5 0 1500]^T | 0.0045 | 0.0045
[1689 1.65 20 0 1500]^T | 0.0954 | 0.0036
[1689 1.6 20 0 1500]^T | 0.0635 | 0.0052
[1689 1.55 20 0 1500]^T | 0.0334 | 0.0099
Table 2. Results of the computational experiment for random initial states.

Initial state x0 | J* | J_ocp
[1689 1.565 18.92 0 1500]^T | 0.0176 | 0.0034
[1689 1.571 17.21 0 1500]^T | 0.0048 | 0.0018
[1689 1.63 19.39 0 1500]^T | 0.1136 | 0.0014
[1689 1.558 19.91 0 1500]^T | 0.0567 | 0.0083
[1689 1.582 17.62 0 1500]^T | 0.0042 | 0.0016
[1689 1.628 18.2 0 1500]^T | 0.0398 | 0.0016
[1689 1.614 19.24 0 1500]^T | 0.0508 | 0.0022
[1689 1.644 18.57 0 1500]^T | 0.0613 | 0.0052
[1689 1.563 18.33 0 1500]^T | 0.0032 | 0.0023
[1689 1.589 18.9 0 1500]^T | 0.0464 | 0.0043
Diveev, A.; Konstantinov, S.; Shmalko, E.; Dong, G. Machine Learning Control Based on Approximation of Optimal Trajectories. Mathematics 2021, 9, 265. https://doi.org/10.3390/math9030265