1. Introduction
Cast iron is a key material in modern urban construction. It has good fatigue resistance and vibration damping, so it is widely used in automobile parts, railways, machinery manufacturing, and other fields. Because of its high strength, it is also widely used in urban water supply systems. According to the American Water Works Association (AWWA), the vast majority of water supply pipes in the United States are gray iron and ductile iron pipes, and more than 90% of the existing water supply pipelines in China are metal pipes. However, cast iron pipes are prone to rust, which damages the surface layer of the pipe, shortens its service life, and raises construction and maintenance costs [1]. To ensure safe operation and to avoid water pollution and wasted resources caused by internal damage, timely and accurate pipeline damage detection is of great value in industrial applications. Corrosion is widespread: it affects not only water supply pipelines but also teeth and ceramics [2,3,4].
Popular pipeline defect detection methods currently include the magnetic flux leakage method [5] and the ultrasonic detection method [6]. However, these methods have limitations: detection is costly, the area that can be inspected is limited, and small areas of corrosion are difficult to detect. Therefore, a low-cost pipeline defect detection method with a wide scope of application is needed [7].
In recent years, digital image processing and machine vision techniques have developed rapidly in the field of structural health monitoring and can be used effectively to investigate defects, such as corrosion and cracks, on the external surface of pipes and other metal surfaces [8]. Kuo et al. [9] constructed a rust identification model based on the statistical properties of image color and the K-means clustering method, which is suitable for images with uneven illumination. However, when the imaged surface is uneven or the corrosion area is large or very deep, the probability of correct detection is low. Medeiros [10] proposed a model for classifying corroded and non-corroded surfaces using texture descriptors obtained from gray-level co-occurrence matrices together with image color features. Safari and Shoorehdeli [11] applied artificial neural networks, Gabor filters, and entropy. Bondada et al. [12] detected and quantitatively assessed pipeline corrosion damage by calculating the average of the image pixel saturation values. Hoang [13] proposed a method to automatically detect corrosion of the inner wall of water supply pipes; by combining an image texture feature extraction algorithm with a support vector machine classifier optimized by differential flower pollination, the method reached a detection accuracy of 92.81%. Qu et al. [14] proposed a method to detect pitting corrosion by combining feature extraction and random forest algorithms, but did not study other corrosion types. Nhat-Duc [15] used an LSHADE metaheuristic algorithm to optimize an SVM model for detecting pitting on component surfaces; the reported accuracy of 91.80% leaves room for improvement. However, the above methods only detect the presence or absence of corrosion on the pipeline, without classifying and identifying different corrosion patterns, which limits their practicality and accuracy for real water supply pipelines with different corrosion types.
Therefore, this paper combines multiple image feature extraction and selection with Support Vector Machines (SVM) to classify and identify different corrosion patterns of pipes. Existing SVM research and applications mainly use Principal Component Analysis (PCA) to reduce the dimensionality of the dataset. However, the principal components extracted by PCA are somewhat ambiguous and are not as complete as the original samples. The Random Forest (RF) algorithm is a good solution to this problem because it ranks and retains the original features. Rahman [16] used the Random Forest algorithm to calculate and rank feature importance and, after selecting the top-ranked features, used SVM to classify proteins. However, random forest feature selection does not consider the impact of correlation between feature variables on recognition accuracy, so this paper uses feature simplification (FS) to reduce the effect of redundant features on the random forest algorithm. For high-dimensional feature data, the feature simplification algorithm improves the performance of random forest feature selection, which in turn improves the efficiency of the algorithm and the accuracy of subsequent recognition.
In addition, SVM depends heavily on parameters such as the kernel function parameter and the penalty factor, so finding the optimal parameters is the key to improving the generalization ability of SVM models. The Particle Swarm Optimization (PSO) algorithm, a swarm intelligence-based stochastic search algorithm, is commonly used to optimize the kernel function parameters and penalty factors of SVM models. Li, F. [17] proposed a PSO-SVM-based method for predicting the failure probability of pressure pipelines. Although the PSO algorithm can optimize the parameters of the SVM model, it lacks stochasticity and easily becomes trapped in local optima. In this paper, the Slime Mold Algorithm (SMA) [18] is used to optimize the parameters of the support vector machine classification model. SMA has strong convergence performance, few tuning parameters, and is easy to use, and it balances local search and global search, which meets the needs of optimizing the internal SVM parameters in this paper.
In summary, this paper combines image feature extraction and selection with a support vector machine classification model to classify and recognize corrosion of the pipeline inner wall. A feature-simplified random forest (FS-RF) algorithm is used to improve the performance of random forest feature selection, and the Slime Mold Algorithm (SMA) is used to optimize the parameters of the SVM model, yielding the SMA-SVM classification model. Finally, the model is applied to a dataset of pipe wall corrosion to classify and identify damage to the pipe’s inner wall.
4. Experiments
4.1. Experimental Environment Platform
The experiments in this study were run on an Intel Core i5-4200M CPU at 2.5 GHz, using the MATLAB R2016a platform and the LIBSVM toolbox.
4.2. Feature-Simplified Random Forest Algorithm
In this paper, feature importance under the traditional random forest algorithm and the improved random forest algorithm is ranked by two metrics, Mean Decrease Accuracy (MDA) and Mean Decrease Gini (MDGini) [30], as shown in Figure 13 and Figure 14.
In Figure 13 and Figure 14, “MDA” indicates how much the prediction accuracy of the RF algorithm decreases when the values of a feature are randomly permuted. The higher the value, the more important the feature is. “MDGini” indicates the influence of each variable on the heterogeneity of the observations at each node of the classification tree. The higher the value, the greater the importance of the variable. Before calculating feature importance, the improved random forest algorithm uses the feature simplification (FS) algorithm to eliminate features with zero or very low weights, and only the remaining features are analyzed for importance (in this paper, the eliminated features are 41, 43, 44, 55, 66, and 77). The features screened out by the improved random forest algorithm are the same as the features with the lowest importance in the ranking produced by the traditional algorithm. The improved random forest algorithm thus effectively reduces the upper bound of the random forest error and makes feature selection more practical.
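The notion of node heterogeneity behind MDGini can be illustrated with a short Python sketch. It computes the Gini impurity of a node and the impurity decrease produced by one split; MDGini averages such decreases over all splits that use a given feature. The class labels and sample counts below are hypothetical and are not taken from the pipeline dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease obtained by splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


# Hypothetical node holding 4 "normal" and 4 "pitting" samples.
parent = ["normal"] * 4 + ["pitting"] * 4

# A split that separates the two classes perfectly removes all impurity.
pure_split = gini_decrease(parent, ["normal"] * 4, ["pitting"] * 4)

# A split that leaves both children as mixed as the parent removes none.
mixed = ["normal", "normal", "pitting", "pitting"]
useless_split = gini_decrease(parent, mixed, mixed)
```

A feature whose splits behave like the first case accumulates a large MDGini value, while a feature whose splits behave like the second contributes almost nothing, which is the kind of feature a simplification step would be expected to remove.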
In Table 4, 1–78 represent the original feature dataset, A1–A78 represent the ranking of the feature parameters by the traditional random forest algorithm, and B1–B78 represent the ranking of the feature parameters by the random forest improved with the feature simplification algorithm. The bolded and italicized features in the table are the features eliminated by the feature simplification algorithm.
After a large number of comparison experiments, the top 70% of the features in the importance ranking of the improved random forest algorithm were selected as the feature vector set for subsequent recognition, namely 1, 2, …, 55, which together account for 94% of the total feature importance.
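The selection step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the importance scores are hypothetical, the low-weight threshold is an assumed value, and the 70% fraction is assumed to be taken relative to the original 78 features (which gives 55). Only the IDs of the eliminated features come from the text.

```python
import math

# Feature IDs reported in the text as eliminated by feature simplification.
ELIMINATED = {41, 43, 44, 55, 66, 77}


def select_features(importance, low_weight=1e-6, keep_fraction=0.70):
    """Drop near-zero-weight features, rank the rest, keep the top fraction.

    importance: dict mapping feature ID -> non-negative importance score.
    Returns the selected feature IDs and their share of the total importance.
    """
    n_total = len(importance)
    # Feature simplification (assumed form): discard zero or very low weights.
    kept = {f: w for f, w in importance.items() if w > low_weight}
    ranked = sorted(kept, key=kept.get, reverse=True)
    # Assumption: the kept fraction refers to the original feature count.
    k = min(len(ranked), math.ceil(keep_fraction * n_total))
    selected = ranked[:k]
    share = sum(kept[f] for f in selected) / sum(importance.values())
    return selected, share


# Hypothetical importance scores for features 1..78 (not the paper's values).
importance = {f: (0.0 if f in ELIMINATED else 1.0 / f) for f in range(1, 79)}
selected, share = select_features(importance)
```

With real MDA or MDGini scores in place of the hypothetical ones, `share` corresponds to the cumulative importance percentage quoted for the selected features.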
4.3. Experimental Parameter Setting
SVM parameter settings: the widely used radial basis function (RBF) kernel is used as the kernel function.
SMA parameter settings: the initial population size is set to 20, and the maximum number of generations is set to 200; the search range of the penalty parameter is 0.01 to 500; the search range of the kernel parameter is 0.01 to 100; and the weighting factor is set to 1.
Figure 15 shows the changing trend of the fitness function value.
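To make the search procedure more concrete, the following Python sketch implements a simplified slime mould search over two bounded parameters, using the population size (20), generation count (200), and search ranges given above. It is an illustration under assumptions rather than the code used in this study: the restart probability `z` is an assumed value, and `toy_error` is a hypothetical stand-in for the cross-validation error of an SVM, which is not reproduced here.

```python
import math
import random


def sma_minimize(fitness, lower, upper, pop_size=20, max_iter=200, z=0.03, seed=0):
    """Simplified slime mould search; returns (best_position, best_fitness, history)."""
    rng = random.Random(seed)
    dim = len(lower)
    pop = [[rng.uniform(lower[j], upper[j]) for j in range(dim)] for _ in range(pop_size)]
    best_pos, best_fit, history = None, math.inf, []
    tiny = 1e-12  # avoids division by zero when all fitness values coincide

    for t in range(1, max_iter + 1):
        scores = [fitness(x) for x in pop]
        order = sorted(range(pop_size), key=scores.__getitem__)  # best first
        b_f, w_f = scores[order[0]], scores[order[-1]]
        if b_f < best_fit:
            best_fit, best_pos = b_f, pop[order[0]][:]
        history.append(best_fit)  # best-so-far value, one entry per generation

        # Fitness-based weights: better half gets W >= 1, worse half W <= 1.
        weight = [[1.0] * dim for _ in range(pop_size)]
        for rank, i in enumerate(order):
            term = math.log10((scores[i] - b_f) / (w_f - b_f + tiny) + 1.0)
            sign = 1.0 if rank < pop_size // 2 else -1.0
            weight[i] = [1.0 + sign * rng.random() * term for _ in range(dim)]

        a = math.atanh(1.0 - t / max_iter)  # step range, shrinks to 0 at the end
        b = 1.0 - t / max_iter              # contraction range, shrinks linearly
        new_pop = []
        for i in range(pop_size):
            if rng.random() < z:  # occasional random restart keeps exploring
                x = [rng.uniform(lower[j], upper[j]) for j in range(dim)]
            else:
                p = math.tanh(abs(scores[i] - best_fit))
                x = []
                for j in range(dim):
                    if rng.random() < p:  # move around the best position
                        xa = pop[rng.randrange(pop_size)]
                        xb = pop[rng.randrange(pop_size)]
                        step = rng.uniform(-a, a) * (weight[i][j] * xa[j] - xb[j])
                        x.append(best_pos[j] + step)
                    else:  # contract the current position
                        x.append(rng.uniform(-b, b) * pop[i][j])
            # Clamp every coordinate back into its search range.
            new_pop.append([min(max(v, lower[j]), upper[j]) for j, v in enumerate(x)])
        pop = new_pop
    return best_pos, best_fit, history


def toy_error(params):
    """Hypothetical error surface with its minimum at penalty 10, kernel 0.1."""
    penalty, kernel = params
    return (math.log10(penalty) - 1.0) ** 2 + (math.log10(kernel) + 1.0) ** 2


best, err, history = sma_minimize(toy_error, lower=[0.01, 0.01], upper=[500.0, 100.0])
```

Plotting `history` against the generation index gives a best-so-far fitness curve of the same kind as a convergence plot. In an actual experiment, `toy_error` would be replaced by the classification error of an SVM trained with the candidate parameters.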
As seen in Figure 15, after optimization by the slime mold algorithm, the penalty parameter of the SVM classification model has a value of 10.3858, and the kernel parameter has a value of 0.1.
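For reference, the short Python sketch below shows how these two values enter an RBF-kernel SVM. It assumes that the kernel parameter corresponds to LIBSVM's gamma; the text does not state this explicitly. The two feature vectors are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Optimized values reported in the text; mapping the kernel parameter to
# LIBSVM's gamma is an assumption.
C_OPT, GAMMA_OPT = 10.3858, 0.1

# LIBSVM training options: -t 2 selects the RBF kernel, -c sets the penalty
# parameter, and -g sets gamma.
libsvm_options = f"-t 2 -c {C_OPT} -g {GAMMA_OPT}"

# Two hypothetical, already normalized feature vectors.
u, v = [0.2, 0.5, 0.1], [0.4, 0.1, 0.3]
similarity = rbf_kernel(u, v, GAMMA_OPT)
```

A small gamma such as 0.1 makes the kernel vary slowly with distance, so nearby feature vectors receive similarity values close to 1, while the penalty parameter controls how strongly misclassified training samples are penalized.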
4.4. Classification Results of the FS-RF-SMA-SVM Model
The sample sets obtained after image feature selection are divided into training and test sets and classified with the SMA-SVM classification model. To better demonstrate the recognition ability of the newly constructed support vector machine classification model optimized by the slime mold algorithm (SMA-SVM) for corrosion detection in water supply pipelines, its performance is compared with the traditional SVM classifier, the support vector machine classification model optimized by differential flower pollination (DFP-SVM) in [13], the support vector machine classification model optimized by particle swarm optimization (PSO-SVM) in [31], and the BP network in [32]. These benchmark models were selected because previous studies have shown them to be effective methods for pattern classification. The confusion matrices of the classification and recognition results are shown in Figures 16–25 (where RF denotes feature selection with the random forest algorithm and FS-RF denotes feature selection with the improved random forest algorithm).
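A confusion matrix of this kind can be built from true and predicted labels as in the Python sketch below. The six class names follow the sample categories used in this paper, but the label lists are hypothetical and serve only to show the construction.

```python
CLASSES = ["normal", "color-order", "texture-order", "pitting", "local", "global"]


def confusion_matrix(true_labels, predicted_labels, classes):
    """Return matrix[i][j] = number of samples of class i predicted as class j."""
    index = {name: k for k, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[guess]] += 1
    return matrix


# Hypothetical labels for six test samples (not taken from the dataset).
true_labels = ["normal", "normal", "pitting", "local", "global", "pitting"]
predicted_labels = ["normal", "normal", "pitting", "global", "global", "local"]

matrix = confusion_matrix(true_labels, predicted_labels, CLASSES)

# Correctly classified samples lie on the diagonal.
correct = sum(matrix[k][k] for k in range(len(CLASSES)))
```

The number of correct samples compared across classifiers is the sum of the diagonal entries, and each off-diagonal entry shows which corrosion type was confused with which.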
From Figures 16–25, it can be seen that, for all five classifier models, the confusion matrices show more correctly classified samples with feature selection by the improved random forest algorithm than with the traditional random forest algorithm. For the same feature data, the confusion matrices show that the SMA-SVM classification model classifies more samples correctly than the traditional SVM classifier. The numbers of correctly classified samples for the support vector machine classification model optimized by differential flower pollination (DFP-SVM) proposed in [13], the support vector machine classification model optimized by particle swarm optimization (PSO-SVM) proposed in [31], and the BP network classification model in [32] can be read from the same figures. Together, the random forest algorithm improved by the feature simplification algorithm and the SVM classification model improved by the slime mold algorithm improve the accuracy of identifying damage images of the pipeline inner wall.
The classification results for six categories of pipeline inner-wall samples are compared under the improved random forest algorithm and the improved SVM classification model. As shown in Table 5, the test set contains 126 normal pipeline samples, 66 color-order corroded pipeline samples, 90 texture-order corroded pipeline samples, 30 point-type (pitting) corroded pipeline samples, 37 locally corroded pipeline samples, and 48 globally corroded pipeline samples, for a total of 397 test samples.
For normal, pitting, and locally corroded pipelines, the proposed algorithm gives the best classification results, with accuracies of 99.21%, 90.00%, and 94.59%, respectively. Although the algorithm is not optimal for color-order corrosion, texture-order corrosion, and global corrosion, its results differ little from those of the best algorithm in each case. Viewed category by category, therefore, the classification results of the compared algorithms are not significantly different. Viewed overall, however, the proposed algorithm gives the best and most stable results, with 376 samples classified correctly, an accuracy of 94.710%. In summary, the recognition and classification results of the improved algorithm are better than those of the other classification algorithms, and the algorithm generalizes well.
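The reported figures can be checked with a few lines of Python. The class sizes and the 376 correct samples are taken from the text; the per-class correct counts (125, 27, and 35) are not stated in the text and are inferred here from the reported percentages.

```python
# Test-set sizes per category, as listed for Table 5.
test_counts = {
    "normal": 126,
    "color-order": 66,
    "texture-order": 90,
    "pitting": 30,
    "local": 37,
    "global": 48,
}

total = sum(test_counts.values())
correct = 376  # correctly classified samples reported for the proposed model
overall_accuracy = 100.0 * correct / total

# Per-class correct counts inferred from the reported percentages
# (an assumption; the text gives only the percentages).
inferred_correct = {"normal": 125, "pitting": 27, "local": 35}
per_class_accuracy = {
    name: round(100.0 * hits / test_counts[name], 2)
    for name, hits in inferred_correct.items()
}
```

The six class sizes sum to 397, and 376 correct samples out of 397 round to the reported 94.710%.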
Next, the overall performance of the classification models after feature selection by the traditional random forest algorithm and the improved random forest algorithm was analyzed in terms of accuracy, precision, recall, F1-score, and root mean square error (RMSE) [33]. Table 6 shows the results. To compare the classification results of the improved algorithm with those of the traditional algorithm and the other optimization algorithms more graphically under these evaluation metrics, Figure 26 presents the results as a bar chart.
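The five evaluation metrics can be computed from a confusion matrix as in the Python sketch below. Several choices here are assumptions, since the text does not specify them: precision and recall are macro-averaged over classes, the F1-score is computed from the macro-averaged precision and recall, and the RMSE is taken over integer class indices. The 3 x 3 matrix is hypothetical.

```python
import math


def evaluate(matrix):
    """Compute accuracy, macro precision/recall, F1 and RMSE from a confusion matrix.

    matrix[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[k][k] for k in range(n)) / total

    precisions, recalls = [], []
    for k in range(n):
        predicted_k = sum(matrix[r][k] for r in range(n))  # column sum
        actual_k = sum(matrix[k])                          # row sum
        precisions.append(matrix[k][k] / predicted_k if predicted_k else 0.0)
        recalls.append(matrix[k][k] / actual_k if actual_k else 0.0)
    precision = sum(precisions) / n  # macro average (assumption)
    recall = sum(recalls) / n        # macro average (assumption)
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0

    # RMSE between true and predicted class indices (assumption).
    squared_error = sum(
        matrix[i][j] * (i - j) ** 2 for i in range(n) for j in range(n)
    )
    rmse = math.sqrt(squared_error / total)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "rmse": rmse}


# Hypothetical three-class confusion matrix (not from the paper).
toy_matrix = [[8, 1, 1],
              [2, 7, 1],
              [0, 1, 9]]
scores = evaluate(toy_matrix)
```

Higher accuracy, precision, recall, and F1-score indicate better classification, whereas a lower RMSE is better, which is why the RMSE is interpreted separately from the other four metrics.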
From the experimental results in Table 6 and Figure 26, a comparison of the recognition and classification results of the BP neural network, SVM, and optimized SVM classification models shows that the accuracy, precision, recall, and F1-score of the algorithm proposed in this paper are higher than those of the other algorithms, and its root mean square error is also favorable. Therefore, the improved classification algorithm in this paper classifies well and is practical. In addition, a comparison of the classification results obtained with the improved random forest algorithm and with the traditional random forest algorithm shows that all five evaluation metrics are better with the improved algorithm, which verifies the effectiveness of the improved feature selection algorithm. Combined with Table 6, it can be concluded that the SVM classification model optimized by SMA gives better classification results for normal, pitting, and locally corroded pipes, and that its results for the other pipe categories differ little from those of the best algorithm. In summary, the FS-RF-SMA-SVM model can provide technical support for pipe damage detection.
5. Conclusions
This study first proposes a random forest feature selection algorithm based on feature simplification. It addresses the unreliability of attribute weights when the traditional random forest algorithm handles a large number of features, takes into account the influence of correlation between feature variables on recognition accuracy, and reduces the influence of redundant features on the algorithm. The slime mold algorithm is then used to optimize the kernel function parameter and penalty factor of the SVM model. Finally, the proposed model is applied to the classification and prediction of pipeline corrosion damage datasets. Experimental results show that the classification accuracy of the proposed SMA-SVM algorithm based on FS-RF feature selection is better than that of the algorithms from the literature. Of the 397 test samples, 376 were classified correctly, giving an accuracy of 94.710%, which is 4.786, 3.023, 4.030, and 0.503 percentage points higher than that of the traditional SVM, DFP-SVM, PSO-SVM, and BP neural network, respectively. The experimental results meet the expected requirements and provide a new approach to damage detection for the inner wall of water supply pipelines.
However, as society develops and market demand grows, the requirements for pipeline detection technology will continue to rise. The work in this paper therefore still needs to be improved, and future research can be strengthened in the following aspects.
- (1) In terms of feature dimensionality reduction, this paper uses an improvement of the traditional random forest algorithm, which works well for the feature data in this paper. However, its classification performance on new feature datasets still needs to be studied; further improving the generality of the algorithm and overcoming the limitations of the feature data are therefore key points for future study.
- (2) In terms of research objects, this paper only studied the common damage (corrosion) on the inner wall of the pipeline, and further research is needed to identify other damage categories, such as pipeline cracks and pipeline fractures.
- (3) In terms of damage identification and classification, popular deep learning techniques can be used to identify pipe wall damage and further improve the accuracy of identification.