1. Introduction
In tunnel construction, workers often face extreme working environments. This poses a great challenge for traditional tunnel construction methods, but also opens broad scope for intelligent construction in tunnel engineering [1]. Meeting the complex requirements of the desired tasks requires addressing the critical issue of collision avoidance for rock drilling robotic arms during intelligent construction. Specifically, the robotic arm must plan collision-free paths in real time while evading two dynamic obstacles. The central problem is to guarantee the real-time performance of collision avoidance planning while ensuring that the planned actions drive the robotic arm to the target position, rather than merely away from the obstacles.
The drilling robotic arm possesses seven degrees of freedom (DOF), and motion planning becomes computationally very expensive as the number of DOFs and obstacles grows [2]. Existing algorithms mainly focus on planning a new collision-free path [3], which is not suitable for real-time collision avoidance. The artificial potential field (APF) method and deep learning approaches are therefore considered for this problem. The APF method, a local algorithm, can efficiently generate whole-body collision avoidance actions. Deep reinforcement learning (DRL) is a promising alternative, as its trained action models are capable of identifying globally optimal actions. However, each of these methods has limitations for real-time collision avoidance planning:
To achieve collision avoidance, a myriad of studies have been conducted, among which research on the artificial potential field (APF) is a significant direction [5,6,7,8,9]. To solve the collision avoidance problem of the end-effector, the APF method converts the distance between the end-effector of the robotic arm and the obstacle into a repulsive force [10,11]. Because many scenarios require that the whole body of the robotic arm avoids collision with obstacles, whole-body collision avoidance has been considered in some studies. Li et al. [12] estimate the distance between the obstacle and the robot skeleton in real time using deep visual perception, thereby establishing a multi-joint repulsive force model. There is also research on robots without visual recognition: control points are set on the robotic arm, whole-body collision avoidance is performed by calculating the repulsive force from each control point to the obstacle, and local minima are escaped by placing virtual obstacles [13,14]. However, escaping a local minimum through a virtual obstacle requires first entering the local minimum position, confirming it as such, and only then placing the virtual obstacle to drive the robotic arm away. This is very inefficient for real-time motion planning. In general, there is almost no APF-based method that provides real-time path planning while avoiding local minimum positions.
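The control-point repulsion described above can be sketched as follows. This is a minimal illustration in the classic APF style, not the exact model of Section 3; the influence radius `d0` and gain `eta` are assumed parameters.

```python
import numpy as np

def repulsive_force(p, obs, d0=0.3, eta=1.0):
    """Classic APF repulsive force acting on one control point.

    p, obs : 3-D positions of the control point and the obstacle.
    d0     : influence radius; beyond it the obstacle exerts no force.
    eta    : repulsive gain (illustrative value).
    """
    diff = p - obs
    d = np.linalg.norm(diff)
    if d >= d0 or d == 0.0:
        return np.zeros(3)
    # Magnitude grows sharply as the control point approaches the obstacle.
    mag = eta * (1.0 / d - 1.0 / d0) / d**2
    return mag * (diff / d)

def whole_body_force(control_points, obstacles, **kw):
    # Whole-body avoidance: sum the repulsion over every
    # control-point / obstacle pair along the arm.
    return sum(repulsive_force(p, o, **kw)
               for p in control_points for o in obstacles)
```

Because the force vanishes outside `d0`, distant obstacles do not perturb the arm, which is what makes the method local and fast but also prone to the local minima discussed above.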
To address this problem, reinforcement learning offers a new solution owing to its global planning capability. Cao et al. [15] used a Deep Deterministic Policy Gradient (DDPG) algorithm to train collision avoidance for a three-degree-of-freedom robotic arm under static obstacles. Li et al. [16] combined APF and DDPG to split obstacle avoidance trajectory planning into phases: the APF drives the robotic arm toward the target, while DDPG dominates during the avoidance phase, which improves the obstacle avoidance ability of mobile robotic arms in narrow passages and under obstacle constraints. For collision avoidance motion planning of robotic arms, the exploration space is large, rewards are sparse, and the learning efficiency of the agent is relatively low. Andrychowicz et al. [17] first proposed Hindsight Experience Replay (HER), which enables an agent to learn efficiently from sparse, binary rewards by substituting virtual goals. HER is widely used in path planning research, and scholars add HER to their methods to improve training [18,19]. Improved algorithms have since been proposed on the basis of HER with some success. HER is known to introduce goal bias in multi-goal learning because it changes the likelihood of the sampled transitions and trajectories used in training, so bias-corrected HER (BHER) was proposed to correct this bias [20]. Hindsight Goal Ranking (HGR) was proposed to further improve learning efficiency [21]. Although HER works well for reach-target problems with binary rewards, it is not optimized for the collision avoidance problem.
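HER's virtual-goal trick can be sketched as follows, assuming the standard "future" relabeling strategy and a sparse binary reward; the transition layout and the parameter `k` are illustrative, not the specific configuration used later in this paper.

```python
import random

def her_relabel(episode, k=4):
    """Vanilla HER, 'future' strategy: for each transition, resample up to k
    goals from states achieved later in the same episode and relabel the
    reward as if that state had been the goal all along.

    Each transition is (state, action, reward, next_state, goal).
    """
    relabeled = []
    for t, (s, a, r, s_next, goal) in enumerate(episode):
        relabeled.append((s, a, r, s_next, goal))        # keep the original
        future = episode[t:]                             # candidate goals
        for _ in range(min(k, len(future))):
            _, _, _, achieved, _ = random.choice(future)
            # Binary reward: 0 on reaching the (virtual) goal, -1 otherwise.
            new_r = 0.0 if achieved == s_next else -1.0
            relabeled.append((s, a, new_r, s_next, achieved))
    return relabeled
```

Every failed episode thus still yields successful samples for *some* goal, which is what restores a learning signal under sparse rewards; note, however, that nothing in this relabeling tells the agent anything about obstacles, which motivates the HER-CA variant proposed below.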
To address these challenges, this paper presents an action-based obstacle avoidance method that achieves active collision prevention for the robotic arm, guided by potential field forces in tandem with DRL. The main contributions of this paper are as follows.
We propose a DRL-guided method for collision avoidance based on a simplified robotic arm model. DRL guidance solves the local minima problem of APF, while the simplified arm model reduces the difficulty of DRL training.
We propose the Hindsight Experience Replay for Collision Avoidance (HER-CA) algorithm to improve the training effect. The algorithm allows the agent to learn collision avoidance from collision experience.
We establish a full-body collision avoidance model for the rock drilling robotic arm based on the artificial potential field.
The rest of the paper is structured as follows. Section 2 first presents the simplified robotic arm model, followed by the Twin Delayed Deep Deterministic Policy Gradient (TD3) and HER-CA based training method. Section 3 presents the full-body collision avoidance model based on APF. Section 4 reports experiments comparing HER-CA with Hindsight Experience Replay (HER), together with simulations of robotic arm collision avoidance. Section 5 summarizes and concludes.
4. Experiments
In this section, we conduct three experiments to validate the methodology presented in this paper. First, a comparative analysis between HER and HER-CA assesses the efficacy of each algorithm under diverse parameters. The environment was defined with OpenAI's Gym library, and network training was implemented with the PyTorch library. The remaining experiments simulate robotic arm collision avoidance, evaluating the arm's ability to evade static as well as dynamic obstacles. MATLAB and CoppeliaSim served as the platforms on which these simulations were executed.
4.1. Comparison of HER and HER-CA
First, we configured the environment. The joint limits for the first two joints of the robotic arm were set to ±45° with a 45° offset applied, creating a two-dimensional joint space in which both dimensions range from 0 to 90°. During environment initialization, the starting point, obstacle point, and target point were randomly positioned within the joint space. A path is judged to be in collision when it comes within 3° of an obstacle, and to have reached the goal when it comes within 2° of the target point. In each game, exploration actions were the action model's outputs plus Gaussian noise, and the maximum number of steps per game was five. Both the action model and the value model use a fully connected neural network with 3 hidden layers of 256 neurons each. The action model uses the ReLU activation function, its output is processed by the Tanh function, and its learning rate is set to . The value model uses the ReLU activation function, and its learning rate is set to . All networks use the Adam optimizer.
Each epoch plays 200 games, stores the data from each game in the data pool, and then performs 20 network training steps. Each training step samples 50 transitions at random from the data pool, each of which has an 80% probability of being processed by HER or HER-CA. In addition, a soft update of the network model is performed at the end of each epoch.
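The network configuration described above can be sketched as follows. The learning rates are not reproduced from the text, and the soft-update coefficient `tau` is an assumed value; the layer sizes, activations, and Tanh output squashing follow the description.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Action model: 3 hidden layers of 256 neurons, ReLU, Tanh output."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh(),  # squash actions to [-1, 1]
        )
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Value model: same trunk, scalar Q-value, no output squashing."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def soft_update(target, source, tau=0.005):
    # Polyak averaging of target networks, performed once per epoch.
    with torch.no_grad():
        for tp, sp in zip(target.parameters(), source.parameters()):
            tp.mul_(1.0 - tau).add_(tau * sp)
```

Both models would be optimized with `torch.optim.Adam`, as stated above; TD3 additionally maintains two critics and delays actor updates, which is omitted here for brevity.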
The data were obtained by testing the saved action network models. Specifically, each action model was saved every 10 epochs, and the success rate of each model was measured by running 5000 games with it.
The initial experiment compared the obstacle-resetting and obstacle-moving-away strategies within the HER-CA method. As depicted in Figure 6, the move-away strategy slightly outperforms the resetting strategy, exhibiting a marginally higher average and peak success rate.
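As a rough illustration of the two strategies being compared (the exact HER-CA formulation is given in Section 2), a colliding trajectory can be turned into a collision-free training sample by relabeling the obstacle position: either resetting it to a random position that clears the path, or moving it away along the closest-approach direction. The function below is a hedged sketch under those assumptions, using the 3° collision radius from the experiment setup.

```python
import numpy as np

def relabel_obstacle(path, obstacle, r_col=3.0, strategy="move_away", rng=None):
    """Illustrative sketch of the two obstacle-relabeling strategies.

    path     : (T, 2) array of visited joint-space points (degrees).
    obstacle : (2,) obstacle position the path collided with.
    r_col    : collision radius (3 degrees in the experiments).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = np.linalg.norm(path - obstacle, axis=1)
    if d.min() >= r_col:                       # no collision: keep as-is
        return obstacle
    if strategy == "reset":
        # Resample until the new obstacle clears the whole path.
        while True:
            cand = rng.uniform(0.0, 90.0, size=2)
            if np.linalg.norm(path - cand, axis=1).min() >= r_col:
                return cand
    # "move_away": push the obstacle out along the closest-approach direction.
    nearest = path[d.argmin()]
    away = obstacle - nearest
    n = np.linalg.norm(away)
    direction = away / n if n > 0 else np.array([1.0, 0.0])
    return nearest + direction * (r_col + 1.0)
```

Either way, the stored episode becomes one in which the agent succeeded in avoiding the (relabeled) obstacle, which is how collision experience can still carry a useful learning signal.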
Subsequent experiments compared the effects of HER and HER-CA, with training scenarios involving one obstacle and two obstacles. The experimental results are shown in Figure 7, which indicates that the model trained by HER-CA achieves an average success rate about 10% higher than the model trained by HER.
We used the success-test results over the full training process to calculate the average success rate, and the 5000 test games at the end of training were used for interval estimation to calculate the confidence levels. With one obstacle, the average success rate of HER-CA improved by 9.4%, and the confidence that the improvement exceeds 5% was 71.1%. With two obstacles, the average success rate of HER-CA improved by 10.1%, and the confidence that the improvement exceeds 5% was 88.5%.
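Such confidence levels can be estimated, for example, with a one-sided two-proportion comparison under a normal approximation. The paper does not state its exact estimator, so the following is only an assumed sketch of how a "confidence that the improvement exceeds a margin" figure can be computed from two measured success rates.

```python
import math

def confidence_improvement(p_a, p_b, n, margin=0.05):
    """One-sided confidence (normal approximation) that method A's success
    rate exceeds method B's by more than `margin`, with p_a and p_b each
    estimated from n independent test games.
    """
    # Variance of the difference of two independent sample proportions.
    var = p_a * (1.0 - p_a) / n + p_b * (1.0 - p_b) / n
    z = (p_a - p_b - margin) / math.sqrt(var)
    # Standard normal CDF expressed via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

With 5000 test games per model, a measured gap well above the margin yields confidence near 1, while a gap equal to the margin yields confidence near 0.5, matching the intuition behind the reported 71.1% and 88.5% figures.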
HER-CA yielded a smaller improvement in the one-obstacle case than in the two-obstacle case. The reason may be that a single obstacle is inherently easier to avoid, so HER alone can also be trained with somewhat higher efficiency.
In the third experiment, we tested the performance of HER-CA on a 3D path-finding task. The workspace is a cube with a side length of 45, in which obstacles and target points were randomly generated with coordinates ranging from 3 to 45. The performance of HER-CA and HER was tested for the 2-obstacle and 3-obstacle cases, respectively. A decaying learning rate is used, with an initial learning rate of for the action model and for the value model, decaying by half every 20,000 epochs. The remaining parameters were the same as in the previous experiment.
The experimental results are shown in Figure 8. In the 2-obstacle case, the average success rate of HER-CA improves by 20.1% compared to HER, and the confidence that the improvement exceeds 10% is 82.7%. In the 3-obstacle case, HER-CA reaches a maximum success rate of 90.9%, while HER could not be successfully trained, peaking at only 29.7% at epoch 9000.
4.2. Static Obstacle Avoidance
In this experiment, the robotic arm is tasked to navigate from its initial position to the target position while avoiding two stationary obstacles. The task information of the robotic arm and the definition of the obstacles are shown in Table 3. Comparative experiments using the APF method alone were also conducted. As shown in Figure 9, when using the APF method for collision avoidance, the robotic arm moves directly toward the target position, becomes trapped in a local minimum, and is unable to reach the target point.
In contrast, our guided APF method enables the arm to circumvent the two obstacles and reach its target position, as shown in Figure 10. The minimum distance between the entire robotic arm and the obstacles is plotted in Figure 11. A collision could occur if this minimum distance fell below 150 mm; in this experiment, the shortest distance remained consistently above this threshold, with a minimum of 161 mm at 4.6 s.
4.3. Dynamic Obstacle Avoidance
This experiment was conducted to verify the effectiveness of our active collision avoidance method in bypassing dynamic obstacles to reach the target position. During the experiment, the obstacles were set moving across the path of the robotic arm, while the arm was required to traverse from the starting position to the same goal position as in the previous experiment without collision. The two obstacles start at the upper left side of the beam and move toward the lower right at different speeds; the obstacle information is shown in Table 4. As demonstrated in Figure 12, the robotic arm initially attempts to pass underneath the obstacles. When the obstacles move down and block this path, it changes direction and moves upward, passing through the gap between the two obstacles. In the end, the robotic arm successfully navigates around the dynamic obstacles and smoothly reaches its target position.
The change in each joint angle is shown in Figure 13. The closest distance between the robotic arm and the moving obstacles during this collision avoidance process is 200.5 mm, occurring at 3 s, as graphically represented in Figure 14.