1. Introduction
In tunnel construction, workers often face extreme working environments. This poses a great challenge for traditional tunnel construction methods, but also opens broad scope for intelligent construction in tunnel engineering [1]. Meeting the complex requirements of the desired tasks requires addressing the critical issue of collision avoidance for rock drilling robotic arms during intelligent construction. Specifically, the robotic arm must plan collision-free paths in real time while evading two dynamic obstacles. The central problem is to guarantee the real-time performance of collision avoidance planning while ensuring that the planned actions drive the robotic arm to the target position, rather than merely away from the obstacles.
The drilling robotic arm possesses seven degrees of freedom (DOF), and motion planning becomes computationally very expensive as the number of DOFs and obstacles grows [2]. Existing algorithms mainly focus on planning a new collision-free path [3], which is not suitable for real-time collision avoidance. The artificial potential field (APF) method and deep learning approaches are therefore considered for this problem. The APF method, a local algorithm, can efficiently generate whole-body collision avoidance actions. Deep reinforcement learning (DRL) is a promising alternative, as its trained action models are capable of identifying globally optimal actions. However, each of these methods has limitations for real-time collision avoidance planning:
To achieve collision avoidance, a myriad of studies have been conducted, among which research on the artificial potential field (APF) is a significant direction [5,6,7,8,9]. To solve the collision avoidance problem of the end-effector, the APF method converts the distance between the end-effector of the robotic arm and the obstacle into a repulsive force [10,11]. Because many scenarios require that the whole body of the robotic arm avoids collision with obstacles, whole-body collision avoidance has been considered in some studies. Li et al. [12] estimate the distance between the obstacle and the robot skeleton in real time using deep visual perception, thereby establishing a multi-joint repulsive force model. There is also research on robots without visual recognition: control points are set on the robotic arm, whole-body collision avoidance is performed by calculating the repulsive force from each control point to the obstacle, and local minima are escaped by placing virtual obstacles [13,14]. However, escaping a local minimum through a virtual obstacle requires first entering the local minimum position, confirming it as such, and only then placing the virtual obstacle to drive the robotic arm away. This is very inefficient for real-time motion planning. In general, there is almost no APF-based method that provides real-time path planning while avoiding local minimum positions.
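The control-point repulsion described above can be sketched as follows. This is a minimal illustration in the classic APF style, not the exact model of Section 3; the influence radius `d0` and gain `eta` are assumed parameters.

```python
import numpy as np

def repulsive_force(p, obs, d0=0.3, eta=1.0):
    """Classic APF repulsive force acting on one control point.

    p, obs : 3-D positions of the control point and the obstacle.
    d0     : influence radius; beyond it the obstacle exerts no force.
    eta    : repulsive gain (illustrative value).
    """
    diff = p - obs
    d = np.linalg.norm(diff)
    if d >= d0 or d == 0.0:
        return np.zeros(3)
    # Magnitude grows sharply as the control point approaches the obstacle.
    mag = eta * (1.0 / d - 1.0 / d0) / d**2
    return mag * (diff / d)

def whole_body_force(control_points, obstacles, **kw):
    # Whole-body avoidance: sum the repulsion over every
    # control-point / obstacle pair along the arm.
    return sum(repulsive_force(p, o, **kw)
               for p in control_points for o in obstacles)
```

Because the force vanishes outside `d0`, distant obstacles do not perturb the arm, which is what makes the method local and fast but also prone to the local minima discussed above.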
To address this problem, reinforcement learning offers a new solution owing to its global planning capability. Cao et al. [15] used a Deep Deterministic Policy Gradient (DDPG) algorithm to train collision avoidance for a three-degree-of-freedom robotic arm under static obstacles. Li et al. [16] combined APF and DDPG to split obstacle avoidance trajectory planning into phases: the APF drives the robotic arm toward the target, while DDPG dominates during the avoidance phase, which improves the obstacle avoidance ability of mobile robotic arms in narrow passages and under obstacle constraints. For collision avoidance motion planning of robotic arms, the exploration space is large, rewards are sparse, and the learning efficiency of the agent is relatively low. Andrychowicz et al. [17] first proposed Hindsight Experience Replay (HER), which enables an agent to learn efficiently from sparse, binary rewards by substituting virtual goals. HER is widely used in path planning research, and scholars add HER to their methods to improve training [18,19]. Improved algorithms have since been proposed on the basis of HER with some success. HER is known to introduce goal bias in multi-goal learning because it changes the likelihood of the sampled transitions and trajectories used in training, so bias-corrected HER (BHER) was proposed to correct this bias [20]. Hindsight Goal Ranking (HGR) was proposed to further improve learning efficiency [21]. Although HER works well for reach-target problems with binary rewards, it is not optimized for the collision avoidance problem.
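HER's virtual-goal trick can be sketched as follows, assuming the standard "future" relabeling strategy and a sparse binary reward; the transition layout and the parameter `k` are illustrative, not the specific configuration used later in this paper.

```python
import random

def her_relabel(episode, k=4):
    """Vanilla HER, 'future' strategy: for each transition, resample up to k
    goals from states achieved later in the same episode and relabel the
    reward as if that state had been the goal all along.

    Each transition is (state, action, reward, next_state, goal).
    """
    relabeled = []
    for t, (s, a, r, s_next, goal) in enumerate(episode):
        relabeled.append((s, a, r, s_next, goal))        # keep the original
        future = episode[t:]                             # candidate goals
        for _ in range(min(k, len(future))):
            _, _, _, achieved, _ = random.choice(future)
            # Binary reward: 0 on reaching the (virtual) goal, -1 otherwise.
            new_r = 0.0 if achieved == s_next else -1.0
            relabeled.append((s, a, new_r, s_next, achieved))
    return relabeled
```

Every failed episode thus still yields successful samples for *some* goal, which is what restores a learning signal under sparse rewards; note, however, that nothing in this relabeling tells the agent anything about obstacles, which motivates the HER-CA variant proposed below.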
To address these challenges, this paper presents an action-based obstacle avoidance method that achieves active collision prevention for the robotic arm, guided by potential field forces in tandem with DRL. The main contributions of this paper are as follows.
We propose a DRL-guided method for collision avoidance based on a simplified robotic arm model. DRL guidance solves the local minima problem of APF, while the simplified arm model reduces the difficulty of DRL training.
We propose the Hindsight Experience Replay for Collision Avoidance (HER-CA) algorithm to improve the training effect. The algorithm allows the agent to learn collision avoidance from collision experience.
We establish a full-body collision avoidance model for the rock drilling robotic arm based on the artificial potential field.
The rest of the paper is structured as follows. Section 2 first presents the simplified robotic arm model, followed by the Twin Delayed Deep Deterministic Policy Gradient (TD3) and HER-CA based training method. Section 3 presents the full-body collision avoidance model based on APF. Section 4 reports experiments comparing HER-CA with Hindsight Experience Replay (HER), together with simulations of robotic arm collision avoidance. Section 5 summarizes and concludes.
4. Experiments
In this section, we conduct three experiments to validate the methodology presented in this paper. First, a comparative analysis between HER and HER-CA assesses the efficacy of each algorithm under diverse parameters. The environment was defined with OpenAI's Gym library, and network training was implemented with the PyTorch library. The remaining experiments simulate robotic arm collision avoidance, evaluating the arm's ability to evade static as well as dynamic obstacles. MATLAB and CoppeliaSim served as the platforms on which these simulations were executed.
4.1. Comparison of HER and HER-CA
First, we configured the environment. The joint limits for the first two joints of the robotic arm were set to ±45° with a 45° offset applied, creating a two-dimensional joint space in which both dimensions range from 0 to 90°. During environment initialization, the starting point, obstacle point, and target point were randomly positioned within the joint space. A path is judged to be in collision when it comes within 3° of an obstacle, and to have reached the goal when it comes within 2° of the target point. In each game, exploration actions were the action model's outputs plus Gaussian noise, and the maximum number of steps per game was five. Both the action model and the value model use a fully connected neural network with 3 hidden layers of 256 neurons each. The action model uses the ReLU activation function, its output is processed by the Tanh function, and its learning rate is set to . The value model uses the ReLU activation function, and its learning rate is set to . All networks use the Adam optimizer.
Each epoch plays 200 games, stores the data from each game in the data pool, and then performs 20 network training steps. Each training step samples 50 transitions at random from the data pool, each of which has an 80% probability of being processed by HER or HER-CA. In addition, a soft update of the network model is performed at the end of each epoch.
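The network configuration described above can be sketched as follows. The learning rates are not reproduced from the text, and the soft-update coefficient `tau` is an assumed value; the layer sizes, activations, and Tanh output squashing follow the description.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Action model: 3 hidden layers of 256 neurons, ReLU, Tanh output."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, act_dim), nn.Tanh(),  # squash actions to [-1, 1]
        )
    def forward(self, obs):
        return self.net(obs)

class Critic(nn.Module):
    """Value model: same trunk, scalar Q-value, no output squashing."""
    def __init__(self, obs_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )
    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))

def soft_update(target, source, tau=0.005):
    # Polyak averaging of target networks, performed once per epoch.
    with torch.no_grad():
        for tp, sp in zip(target.parameters(), source.parameters()):
            tp.mul_(1.0 - tau).add_(tau * sp)
```

Both models would be optimized with `torch.optim.Adam`, as stated above; TD3 additionally maintains two critics and delays actor updates, which is omitted here for brevity.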
The data were obtained by testing the saved action network models. Specifically, each action model was saved every 10 epochs, and the success rate of each model was measured by running 5000 games with it.
The initial experiment compared the obstacle-resetting and obstacle-moving-away strategies within the HER-CA method. As depicted in Figure 6, the move-away strategy slightly outperforms the resetting strategy, exhibiting a marginally higher average and peak success rate.
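As a rough illustration of the two strategies being compared (the exact HER-CA formulation is given in Section 2), a colliding trajectory can be turned into a collision-free training sample by relabeling the obstacle position: either resetting it to a random position that clears the path, or moving it away along the closest-approach direction. The function below is a hedged sketch under those assumptions, using the 3° collision radius from the experiment setup.

```python
import numpy as np

def relabel_obstacle(path, obstacle, r_col=3.0, strategy="move_away", rng=None):
    """Illustrative sketch of the two obstacle-relabeling strategies.

    path     : (T, 2) array of visited joint-space points (degrees).
    obstacle : (2,) obstacle position the path collided with.
    r_col    : collision radius (3 degrees in the experiments).
    """
    rng = np.random.default_rng() if rng is None else rng
    d = np.linalg.norm(path - obstacle, axis=1)
    if d.min() >= r_col:                       # no collision: keep as-is
        return obstacle
    if strategy == "reset":
        # Resample until the new obstacle clears the whole path.
        while True:
            cand = rng.uniform(0.0, 90.0, size=2)
            if np.linalg.norm(path - cand, axis=1).min() >= r_col:
                return cand
    # "move_away": push the obstacle out along the closest-approach direction.
    nearest = path[d.argmin()]
    away = obstacle - nearest
    n = np.linalg.norm(away)
    direction = away / n if n > 0 else np.array([1.0, 0.0])
    return nearest + direction * (r_col + 1.0)
```

Either way, the stored episode becomes one in which the agent succeeded in avoiding the (relabeled) obstacle, which is how collision experience can still carry a useful learning signal.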
Subsequent experiments compared the effects of HER and HER-CA, with training scenarios involving one obstacle and two obstacles. The experimental results are shown in Figure 7, which indicates that the model trained by HER-CA achieves an average success rate about 10% higher than the model trained by HER.
We used the success-test results over the full training process to calculate the average success rate, and the 5000 test games at the end of training were used for interval estimation to calculate the confidence levels. With one obstacle, the average success rate of HER-CA improved by 9.4%, and the confidence that the improvement exceeds 5% was 71.1%. With two obstacles, the average success rate of HER-CA improved by 10.1%, and the confidence that the improvement exceeds 5% was 88.5%.
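Such confidence levels can be estimated, for example, with a one-sided two-proportion comparison under a normal approximation. The paper does not state its exact estimator, so the following is only an assumed sketch of how a "confidence that the improvement exceeds a margin" figure can be computed from two measured success rates.

```python
import math

def confidence_improvement(p_a, p_b, n, margin=0.05):
    """One-sided confidence (normal approximation) that method A's success
    rate exceeds method B's by more than `margin`, with p_a and p_b each
    estimated from n independent test games.
    """
    # Variance of the difference of two independent sample proportions.
    var = p_a * (1.0 - p_a) / n + p_b * (1.0 - p_b) / n
    z = (p_a - p_b - margin) / math.sqrt(var)
    # Standard normal CDF expressed via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

With 5000 test games per model, a measured gap well above the margin yields confidence near 1, while a gap equal to the margin yields confidence near 0.5, matching the intuition behind the reported 71.1% and 88.5% figures.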
HER-CA yielded a smaller improvement in the one-obstacle case than in the two-obstacle case. The reason may be that a single obstacle is inherently easier to avoid, so HER alone can also be trained with somewhat higher efficiency.
In the third experiment, we tested the performance of HER-CA on a 3D path-finding task. The workspace is a cube with a side length of 45, in which obstacles and target points were randomly generated with coordinates ranging from 3 to 45. The performance of HER-CA and HER was tested for the 2-obstacle and 3-obstacle cases, respectively. A decaying learning rate is used, with an initial learning rate of for the action model and for the value model, decaying by half every 20,000 epochs. The remaining parameters were the same as in the previous experiment.
The experimental results are shown in Figure 8. In the 2-obstacle case, the average success rate of HER-CA improves by 20.1% compared to HER, and the confidence that the improvement exceeds 10% is 82.7%. In the 3-obstacle case, HER-CA reaches a maximum success rate of 90.9%, while HER could not be successfully trained, peaking at only 29.7% at epoch 9000.
4.2. Static Obstacle Avoidance
In this experiment, the robotic arm is tasked to navigate from its initial position to the target position while avoiding two stationary obstacles. The task information of the robotic arm and the definition of the obstacles are shown in Table 3. Comparative experiments using the APF method alone were also conducted. As shown in Figure 9, when using the APF method for collision avoidance, the robotic arm moves directly toward the target position, becomes trapped in a local minimum, and is unable to reach the target point.
In contrast, our guided APF method enables the arm to circumvent the two obstacles and reach its target position, as shown in Figure 10. The minimum distance between the entire robotic arm and the obstacles is plotted in Figure 11. A collision could occur if this minimum distance fell below 150 mm; in this experiment, the shortest distance remained consistently above this threshold, with a minimum of 161 mm at 4.6 s.
4.3. Dynamic Obstacle Avoidance
This experiment was conducted to verify the effectiveness of our active collision avoidance method in bypassing dynamic obstacles to reach the target position. During the experiment, the obstacles were set moving across the path of the robotic arm, while the arm was required to traverse from the starting position to the same goal position as in the previous experiment without collision. The two obstacles start at the upper left side of the beam and move toward the lower right at different speeds; the obstacle information is shown in Table 4. As demonstrated in Figure 12, the robotic arm initially attempts to pass underneath the obstacles. When the obstacles move down and block this path, it changes direction and moves upward, passing through the gap between the two obstacles. In the end, the robotic arm successfully navigates around the dynamic obstacles and smoothly reaches its target position.
The change in each joint angle is shown in Figure 13. The closest distance between the robotic arm and the moving obstacles during this collision avoidance process is 200.5 mm, occurring at 3 s, as graphically represented in Figure 14.