Intelligent Detection Method for Concrete Dam Surface Cracks Based on Two-Stage Transfer Learning

Li, Jianyuan; Lu, Xiaochun; Zhang, Ping; Li, Qingquan

doi:10.3390/w15112082

by Jianyuan Li ^1,2,3, Xiaochun Lu ^1,2,*, Ping Zhang ^1,2 and Qingquan Li ^1,2

¹ College of Hydraulic and Environmental Engineering, China Three Gorges University, Yichang 443002, China

² Hubei Key Laboratory of Construction and Management in Hydropower Engineering, China Three Gorges University, Yichang 443002, China

³ State Key Laboratory of Simulation and Regulation of Water Cycle in River Basin, China Institute of Water Resources and Hydropower Research, Beijing 100048, China

* Author to whom correspondence should be addressed.

Water 2023, 15(11), 2082; https://0-doi-org.brum.beds.ac.uk/10.3390/w15112082

Submission received: 27 April 2023 / Revised: 20 May 2023 / Accepted: 23 May 2023 / Published: 31 May 2023

(This article belongs to the Special Issue Application of Artificial Intelligence in Hydraulic Engineering)

Abstract:

The timely identification and detection of surface cracks in concrete dams, an important public safety infrastructure, is of great significance in predicting engineering hazards and ensuring dam safety. Because manual detection is inefficient and inaccurate, it is gradually being replaced by computer vision techniques, among which deep learning semantic segmentation methods offer higher accuracy and robustness than traditional image processing methods. However, scarce image data and insufficient detection performance remain challenges in concrete dam surface crack detection. Therefore, this paper proposes an intelligent detection method for concrete dam surface cracks based on two-stage transfer learning. First, relevant domain knowledge is transferred to the target domain in two stages, cross-domain learning followed by intradomain learning, allowing the model to be fully trained with a small dataset. Second, residual network 50 (ResNet50) is used as the feature extraction network of the UNet model to enhance crack feature extraction and segmentation capability. Finally, multilayer parallel residual attention (MPR) is integrated into the skip connection path of UNet to improve the focus on critical information and produce clearer crack edge segmentation. The results show that the proposed method achieves the best mIoU and mPA, 88.3% and 92.7%, respectively, among the advanced semantic segmentation models compared. Compared with the benchmark UNet model, the proposed method improves mIoU and mPA by 4.6 and 3.2 percentage points, respectively, reduces FLOPs by 36.7%, and improves inference speed by 48.9%. It shows better segmentation performance on dam face crack images, with a low missed detection rate for fine cracks and clear crack edge segmentation, and achieves an accuracy of over 85.7% in crack area prediction.
In summary, the proposed method has higher efficiency and accuracy in concrete dam face crack detection, with greater robustness, and can provide a better alternative or complementary approach to dam safety inspections than the benchmark UNet model.

Keywords:

dam surface cracks; transfer learning; intelligent detection; small-scale datasets

1. Introduction

Concrete dams are an important infrastructure for developing and utilising hydropower and water resources in human society, and they play an extensive role in promoting sustainable socioeconomic development and ensuring energy security. However, concrete dams are exposed to atmospheric and water environments for long periods and are subjected to external effects such as water erosion, temperature changes, dry and wet conditions, and freezing and thawing [1,2]. Thus, defects inevitably appear on their surfaces [3], with cracks being the most significant factor threatening dam safety and stability [4]. Cracks not only exist on dam surfaces but, if left untreated, will also extend to the inside of the dam, affecting the strength and service life of the dam and eventually causing safety accidents such as leakage and collapse [5]. Therefore, timely identification and detection of cracks on concrete dam surfaces are of great importance in predicting engineering hazards and ensuring dam safety. Due to the low efficiency and accuracy of manual detection and the weak adaptability of traditional image segmentation methods, an intelligent detection method for concrete dam surface cracks based on two-stage transfer learning is proposed, which can achieve accurate and intelligent dam crack segmentation from unmanned aerial vehicle (UAV) images.

To identify the safety hazards posed by cracks in concrete dams, early crack detection was mainly manual: inspectors surveyed cracks with auxiliary tools such as binoculars [6] or from manually operated hanging baskets close to the dam surface [7]. However, manual detection is costly, subjective, inefficient, inaccurate, and dangerous [8]. Recently, researchers have investigated dynamic sensor-based crack detection, which, together with online singular spectrum analysis with real-time eigen perturbation, can contribute substantially to crack monitoring [9]. Real-time single-sensor online damage monitoring technology based on eigen perturbation has developed rapidly. This method can process data online and detect structural damage in real time, improve the accuracy of damage assessment by generating new data through eigen perturbation, and monitor the deformation and direction of cracks in real time, providing temporal and spatial advantages in crack identification [10,11]. However, information must still be collected before real-time single-sensor and singular spectrum analysis monitoring based on eigen perturbation can be employed [12,13]: cracks must be located promptly, and information such as their specific shape and area must be obtained to better judge the crack hazard level and plan repairs. The greatest difference between dam surface cracks and other surface cracks lies in their spatial distribution. Other cracks are mainly located on the ground, where instruments and equipment can conveniently be used for close-range inspection. Information related to cracks in dams can instead be obtained efficiently and accurately by processing UAV images with computer vision technology [14,15].

In recent years, crack image segmentation methods based on computer vision have been widely researched and can be divided into traditional image segmentation methods and deep learning semantic segmentation methods [16,17]. Traditional image segmentation methods mainly use low-order visual information, such as the lower grey values of crack pixels relative to the background [18], together with image binarization, image filtering, and digital image processing, to segment and recognise cracks [19,20,21]. For example, Talab et al. [22] first converted an image to greyscale and then separated the background and foreground using a suitable threshold to segment out the cracks. Zhou and Liu [23] used a threshold segmentation algorithm to extract concrete cracks, which was more effective on concrete with no colour difference on the surface. Cho et al. [24] identified and classified crack candidate pixels, then filtered out noise, and finally searched, filled, and thresholded the crack region to detect cracks. Compared with manual inspection, traditional image segmentation methods are safer and more feasible; however, when the background contains considerable noise, they are vulnerable to interference from external factors and their segmentation results are not accurate [25]. Moreover, traditional segmentation methods involve many steps and require repeated manual parameter tuning to adapt to each segmentation scene, so their detection efficiency is low [26]. Because they cannot meet practical engineering needs in terms of accuracy and efficiency, an accurate and efficient intelligent crack segmentation method must be established.

Compared with traditional image segmentation methods based on low-order visual information, the higher-order visual information used in deep learning semantic segmentation methods has higher accuracy and robustness [27]. Semantic segmentation algorithms classify each image pixel by using higher-order visual information to fully consider the association between each pixel point [28], thus allowing for more efficient and accurate segmentation of image classes.

Existing studies show that semantic segmentation algorithms are feasible for crack detection. Jiang et al. [29] proposed a concrete crack image recognition method that segments cracks with the UNet model and then calculates crack width with the maximum inscribed circle algorithm. Jian et al. [30] tested the DeepLabV3 model on the public dataset Crack500 and obtained a test set IoU of only 67%. However, most studies are experimental studies of cracks in buildings, pavement, and bridges using open datasets [31,32,33], while there are fewer studies on methods for concrete dam crack detection. Tang et al. [34] proposed a visual crack width measurement method that, after crack image segmentation with the UNet model, simplifies the redundant data in crack images using an improved image refinement algorithm and width measurement scheme, improving the efficiency of crack shape estimation. Huang et al. [35] proposed an improved DeepLabV3+ network for hydraulic concrete crack segmentation but could not handle fine crack segmentation. Chen et al. [36] proposed A_DCDNet for dam crack detection based on an FCNN, which detects cracks efficiently, but its crack edge segmentation is too coarse. Wang et al. [37] constructed a dam crack dataset and used the benchmark model SegNet to achieve dam crack segmentation, but missed crack detections occurred. Cheng et al. [38] proposed an improved U²Net model for embankment crack detection with good detection accuracy, but the dataset is small, and the model is prone to overfitting and inadequate training if trained directly on it.
Therefore, if the existing methods are directly applied to concrete dam surface crack detection, the following problems remain: (1) Concrete dam cracks differ from cracks in public datasets in distribution and pixel characteristics; thus, public datasets are not suitable as training datasets for concrete dam crack detection (see Figure 1). The cracks in public datasets are more obvious, while concrete dam cracks are more obscure, small, and low in contrast, so public crack images cannot be directly added to the dam face crack dataset; doing so would change the probability distribution of the training data and reduce recognition accuracy and edge segmentation precision. Additionally, the small sample size of the concrete dam crack dataset leads to inadequate model training and missed detections. (2) In concrete dam images, cracks occupy few pixels while the background takes up the vast majority of the image, so the model extracts insufficient crack feature information from the dam face and segments fine cracks poorly. (3) Cracks in actual projects are more complex, and existing algorithms have low robustness and pixel accuracy when detecting cracks in concrete dams because of uneven lighting and the low contrast between cracks and their surroundings, which makes edge segmentation imprecise.

To solve the above problems and improve concrete dam crack detection efficiency and accuracy, this paper explores an intelligent detection method for concrete dam surface cracks based on two-stage transfer learning. A two-stage transfer learning strategy is designed to fully train the model and improve accuracy. The ResNet50 network is used as the backbone feature extraction network to fully extract the dam face crack feature information. Multilayer parallel residual attention (MPR) is constructed and added to improve model robustness and edge segmentation performance. To verify the correctness and feasibility of the proposed method, we conducted experiments on a self-built dam surface crack dataset, for which UAVs collected images at 3–10 m from the dam surface, with crack widths between 3 and 20 mm. The experimental results show that the mIoU, mPA, and frames per second (FPS) of the proposed method are 88.3%, 92.7%, and 36.5, respectively, showing the best performance in dam surface crack segmentation.

The main contributions of this paper are as follows:

(1) An intelligent detection method for concrete dam surface cracks based on two-stage transfer learning is proposed, which shares parameters and features through cross-domain and intradomain learning. The proposed approach alleviates the low segmentation accuracy caused by inadequate model training and overfitting when concrete dam crack datasets are scarce, and it reduces missed detections of fine cracks.
(2) The ResNet50 network is used as the backbone feature extraction network to enhance crack feature extraction by the UNet encoder and improve the ability to segment fine cracks.
(3) The designed multilayer parallel residual attention (MPR) is integrated into the skip connection path of UNet to suppress the interference of irrelevant regions with crack segmentation, improving pixel accuracy and making crack edge contours clearer.

2. Related Work

2.1. Deep Learning-Based Crack Detection

Deep learning-based crack detection currently follows two popular directions: object detection methods and semantic segmentation methods. Object detection methods fall into two main types: (1) two-stage detection [40] and (2) one-stage detection [41]. The two-stage approach first generates a fixed-size feature map for each candidate region using an RoI pooling layer and then obtains the result using bounding box regression; its accuracy is high, but it is computationally intensive. The one-stage method, as the name suggests, predicts the object class and bounding box in a single step, with high computational efficiency. Huang et al. [42] detected cracks in dams by improving YOLOX, and the results achieved high accuracy. Min et al. [43] proposed an improved YOLOv4 to achieve accurate bridge crack identification. Object detection is accurate and efficient and can both identify the type of the target and localise it; however, it cannot recover the specific contour of the target.

Deep learning semantic segmentation uses convolutional neural networks to assign each pixel in an image to the real-world object it represents, that is, to segment the different objects in an image at the pixel level so that the specific outline of the target can be obtained. Because the specific shape and area of cracks are needed to determine the crack hazard level and to plan repairs, this paper uses semantic segmentation for crack detection in concrete dams. In 2015, the fully convolutional network (FCN) was proposed [44], a milestone in deep learning image segmentation. FCN applies a softmax function on top of a convolutional layer to classify each pixel and thereby complete fine image segmentation. However, FCN cannot use global scene category information, which causes feature loss. In the same year, Ronneberger et al. proposed the UNet model [45] based on the FCN architecture, which modified the FCN encoder and decoder to obtain better performance with less training data, although its feature extraction capability is not optimal. To solve the problem that FCN cannot utilise global scene information, Zhao et al. proposed the pyramid scene parsing network (PSPNet) [46] in 2017, which enables semantic segmentation models to fully exploit contextual relationships through a pyramid structure. In 2018, Chen et al. proposed an encoder-decoder structure with atrous separable convolution for semantic image segmentation (DeepLabV3+) [47], which achieves excellent segmentation results but is computationally intensive and may lead to long training times. Subsequently, a number of excellent semantic segmentation models emerged [48] and were applied to crack segmentation in various scenarios [49]. Liu et al. [50] used the DeepLabV3+ model for crack detection in ageing buildings and bridges to obtain information on crack changes in buildings. Rill-García et al. [51] trained an improved fully convolutional neural network (FCNN) on the public CrackForest dataset and achieved road crack detection but with low segmentation accuracy. Liu et al. [52] applied Swin-UNet to a public crack dataset and obtained crack contours efficiently, but the IoU was only 70%. In summary, although semantic segmentation achieves better performance for crack detection than traditional image segmentation, targeted optimisation is needed for concrete dam surface crack detection, where cracks are obscure and datasets are lacking.

2.2. Transfer Learning-Based Crack Detection

Deep learning algorithms require large datasets for training, and insufficient data can seriously degrade crack feature extraction and image detection. Transfer learning is one of the most effective and practical solutions to this limitation [53]. Transfer learning transfers knowledge learned in one domain (the source domain) to another domain (the target domain). In general, it can obtain relatively good test results even with small samples of labelled data [54]. For example, Fan et al. [3] proposed a transfer learning-based underwater dam crack image segmentation model that can accurately segment underwater dam crack images with a small dataset. Li et al. [55] improved crack detection accuracy by sharing model parameters from the source domain through a transfer learning approach. In theory, transfer learning can be performed between any two domains, but the more relevant the pretraining dataset is to the target task dataset, the better the prior knowledge provided by the pretrained model and the more effective the knowledge transfer [56]. Therefore, this paper proposes a two-stage transfer learning strategy to enable better transfer learning.

2.3. Crack Detection Backbone Network

Concrete dam surface cracks are obscure and minute, which makes feature information extraction difficult; therefore, the backbone feature extraction network of the model must be modified in a targeted way to enhance the information extraction capability of the encoder. The dominant backbone network structures in current crack recognition are AlexNet [57], VGG [58], and ResNet [59]. Dung and Anh [60] replaced the backbone feature extraction network of the FCN model with a VGG network. The model achieved a comprehensive evaluation index and an average accuracy of 90% and predicted concrete cracks more accurately than the original model. Related research shows that network depth is crucial for visual recognition tasks; however, practical experiments have found that as more convolutional and pooling layers are stacked, the learning effectiveness of the network decreases and the error rate increases [61]. Residual networks [59] make it possible to train deeper networks with a low classification error rate and are widely used for this reason. Zhao et al. [62] addressed the low recognition accuracy of the UNet model by using ResNet18 as the backbone network to enhance the feature extraction capability of the network. Xu et al. [63] used the ResNet34 residual network as the model encoder to better extract crack detail information. Considering the stronger feature extraction capability of ResNet, this paper replaces the UNet backbone feature extraction network with ResNet50 to increase the depth of the feature extraction network and improve the segmentation accuracy of the model.

2.4. Crack Detection Based on Attention Mechanisms

In computer vision, attention mechanisms mimic the human visual system by assigning different weights according to the importance of different content, which improves attention to key information and enhances the network feature representation [64]. Yu et al. [65] addressed the low accuracy of existing deep learning road crack detection methods and improved the model's ability to detect cracks by embedding an attention mechanism in the UNet model. Given the flexibility of the attention mechanism and its excellent performance in image processing, and inspired by the residual module [59] and the atrous spatial pyramid pooling (ASPP) module in DeepLabV3+ [47], an MPR attention mechanism was designed and integrated into the skip connection path of UNet to increase the model's attention to crack feature information, reduce the effect of noise on crack edge segmentation results, and improve the model's pixel segmentation accuracy.

3. Proposed Approach

3.1. Model Framework

The UNet model [45], one of the most commonly used deep learning semantic segmentation models, is lightweight and easy to deploy and can achieve good segmentation results even when trained on small-scale datasets; therefore, the UNet model is chosen for concrete dam surface crack detection in this paper. The UNet model can be divided into two parts: the encoder and the decoder. The encoder extracts features from the input image to gradually obtain higher-order semantic feature information. The decoder upsamples and convolves the input feature map to gradually recover the image size and the number of feature map channels. Finally, the class of each pixel in the image is predicted. To enhance the information transfer between the encoder and decoder, the UNet model is designed with skip connections so that the high-resolution shallow information output from the corresponding stage of the encoder is fed directly to the decoder, supplementing some of the feature information lost due to downsampling or convolution.

To further improve the segmentation ability and edge recognition accuracy of the UNet model for obscure and small cracks and to make the segmentation of concrete dam surface cracks more accurate, the UNet model is improved in this paper; the structure of the improved model is shown in Figure 2. The specific improvement measures are as follows: (1) A ResNet50 network is built as the feature extraction network of the UNet model encoder. The network is deepened and its extraction capacity increased so that crack feature information is fully obtained through the residual modules, allowing the model to effectively learn the deep features of the dam face cracks and improving the accuracy of the crack segmentation model. (2) MPR attention is designed and added to the skip connection layer to enhance the feature representation of the model by acquiring more semantic information. It suppresses feature responses in irrelevant regions, increases the importance of effective feature information channels, and allows the network to focus on crack feature information, compensating for the loss of detail and enabling the model to segment the dam face crack images more accurately.

3.1.1. ResNet50 Backbone Network

Appropriately increasing the network depth can strengthen the feature extraction ability of the model and obtain more deep semantic information, but researchers have found that as the network deepens, training becomes increasingly difficult and encounters problems such as vanishing and exploding gradients. To effectively solve the degradation problem caused by deepening the network while strengthening crack feature information extraction by the UNet backbone network and improving crack detection accuracy, the ResNet50 backbone network, shown in Table 1, is adopted as the UNet encoder in this paper. Each residual module in the ResNet50 network consists of three consecutive convolutional layers of 1 × 1, 3 × 3, and 1 × 1, and a residual (shortcut) structure is added to retain some of the shallow information, which can prevent the model degradation problem caused by vanishing or exploding gradients. The residual module has 2 types, as shown in Figure 3. If the number of input feature map channels matches the number of output channels, the input is added directly to the output, and the residual module is type A. If the number of input feature map channels does not match the number of output channels, the residual structure performs a 1 × 1 convolution to adjust the number of channels, and the residual module is type B.
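The two residual module types described above can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name `Bottleneck`, the use of batch normalisation and ReLU, and the `stride` argument are assumptions borrowed from common ResNet practice rather than details given in the text.

```python
import torch
from torch import nn


class Bottleneck(nn.Module):
    """Illustrative residual module: 1x1 -> 3x3 -> 1x1 convolutions plus a shortcut."""

    def __init__(self, in_channels, mid_channels, out_channels, stride=1):
        super().__init__()
        # Main path: three consecutive convolutions (BatchNorm/ReLU are assumed here).
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels, kernel_size=3,
                      stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(mid_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, out_channels, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        if in_channels == out_channels and stride == 1:
            # Type A: channel counts match, so the input is added unchanged.
            self.shortcut = nn.Identity()
        else:
            # Type B: a 1x1 convolution adjusts the number of channels.
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))


x = torch.randn(1, 64, 56, 56)
block_b = Bottleneck(64, 64, 256)    # channel mismatch -> type B
block_a = Bottleneck(256, 64, 256)   # channels match -> type A
y = block_a(block_b(x))
```

In this sketch the first block widens 64 channels to 256 through the projected shortcut, and the second block reuses the identity shortcut because its input and output widths agree.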

3.1.2. Multilayer Parallel Residual Attention

MPR is designed to address the problem that cracks provide little information in the image and that pixel segmentation accuracy is low. MPR is added to the skip connection so that the decoder can obtain more important semantic information for learning and increase the edge segmentation accuracy. The MPR consists of 4 parts (Figure 4): the skip layer, the 1 × 1 convolution layer, the 3 × 3 convolution layer, and the atrous convolution layer with a dilation rate of 3. The 4 parts are computed in parallel, and the main code is shown in Table 2. The skip layer passes the input feature map x through unchanged and adds it directly to the outputs F_i(x) of the convolution branches. This step retains the feature information of the previous layer in the feature map y of the next layer to protect information integrity. The 1 × 1 and 3 × 3 convolution layers allow the model to perform feature learning under multiscale convolution, extracting deep semantic information and increasing the nonlinear capability of the model. The atrous convolution layer uses a dilation rate of 3 to give the model a larger receptive field for the same number of parameters and computational effort, which helps to recover the information that is otherwise missing in crack edge identification. Finally, the outputs of the four parts are summed to increase the amount of crack information in each feature map dimension, thus reducing the influence of irrelevant feature information on subsequent crack identification results.
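The four-branch structure described above can be sketched in PyTorch as follows. This is a hedged illustration only, since the code in Table 2 is not reproduced here: the class name `MPRSketch`, the helper `conv_branch`, the BatchNorm/ReLU layers, and the choice to keep the channel count unchanged in every branch are all assumptions.

```python
import torch
from torch import nn


def conv_branch(channels, kernel_size, dilation=1):
    """One parallel branch; padding is chosen so the spatial size is preserved."""
    padding = dilation * (kernel_size - 1) // 2
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size,
                  padding=padding, dilation=dilation, bias=False),
        nn.BatchNorm2d(channels),   # assumed, not stated in the text
        nn.ReLU(inplace=True),      # assumed, not stated in the text
    )


class MPRSketch(nn.Module):
    """Skip layer + 1x1 conv + 3x3 conv + atrous 3x3 conv (dilation 3), summed."""

    def __init__(self, channels):
        super().__init__()
        self.conv1x1 = conv_branch(channels, 1)
        self.conv3x3 = conv_branch(channels, 3)
        self.atrous3x3 = conv_branch(channels, 3, dilation=3)

    def forward(self, x):
        # y = x + F_1(x) + F_2(x) + F_3(x): the four parallel outputs are summed.
        return x + self.conv1x1(x) + self.conv3x3(x) + self.atrous3x3(x)


mpr = MPRSketch(channels=16)
features = torch.randn(2, 16, 32, 32)
out = mpr(features)
n_plain = sum(p.numel() for p in mpr.conv3x3.parameters())
n_atrous = sum(p.numel() for p in mpr.atrous3x3.parameters())
```

The parameter counts `n_plain` and `n_atrous` are equal, which mirrors the statement that dilation enlarges the receptive field without adding parameters.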

3.2. Two-Stage Transfer Learning Strategy

Semantic segmentation is a type of supervised learning that requires a large quantity of labelled training data, which is not easy to obtain for concrete dam surface cracks in complex environments. The labour cost of data acquisition and labelling is high, and the lack of data easily leads to inadequate model training and low detection accuracy. Transfer learning methods have been developed to solve the problem of insufficient training data for deep learning [66]. However, if the difference between two domains is particularly large, the results obtained by directly adopting transfer learning are often poor. Therefore, this paper designs a two-stage learning strategy for knowledge transfer to improve the low detection accuracy and poor modelling results on small sample datasets. The two-stage transfer learning training process is shown in Figure 5. (1) The first stage is cross-domain model knowledge transfer, i.e., sharing model parameters from the source domain with the target domain. The model trained on the PASCAL VOC 2012 dataset in the source domain is used as the pretraining model for the target domain (DatasetA) in the first stage. The parameter information shared between the source and target domains is identified so that the parameters can be updated on DatasetA during training, from the perspective of both the algorithm and the model. This process avoids training the network from scratch and yields a better mentor model for the second stage. (2) The second stage is intradomain feature knowledge transfer, i.e., mapping features of the source and target domains from the original feature space to a new feature space. The model pretrained on DatasetA in the first stage is used as the source domain in the second stage to learn the same information and knowledge structure from the related domain. Then, the feature representations that the source domain shares with the target domain are transferred to the target domain (DatasetB). This ensures that the existing labelled samples in the source domain can be better utilised for classification training in the new space and solves the poor accuracy problem caused by the scarcity of labelled dam face crack images.
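As a rough illustration of how weights can be handed from one stage to the next, the sketch below copies every parameter whose name and shape match between two models. It shows parameter reuse only and does not implement the feature-space mapping described for the second stage; the helper `transfer_matching_weights` and the toy `tiny_segmenter` are hypothetical stand-ins, and the training that would take place between the two transfers is omitted.

```python
import torch
from torch import nn


def transfer_matching_weights(source, target):
    """Copy parameters with matching names and shapes; return how many were copied."""
    source_state = source.state_dict()
    target_state = target.state_dict()
    matched = {
        name: tensor
        for name, tensor in source_state.items()
        if name in target_state and tensor.shape == target_state[name].shape
    }
    target_state.update(matched)
    target.load_state_dict(target_state)
    return len(matched)


def tiny_segmenter(num_classes):
    """Toy stand-in for a segmentation network (feature layer + classifier)."""
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, num_classes, kernel_size=1),
    )


# Stage 1: source-domain model (20 classes + background) -> DatasetA model.
voc_model = tiny_segmenter(num_classes=21)
model_a = tiny_segmenter(num_classes=2)       # crack + background
n_stage1 = transfer_matching_weights(voc_model, model_a)

# Stage 2: DatasetA model -> DatasetB model (same task, same architecture).
model_b = tiny_segmenter(num_classes=2)
n_stage2 = transfer_matching_weights(model_a, model_b)
```

In the first transfer only the feature layer is copied because the classifier shapes differ (21 versus 2 classes), whereas in the second transfer every tensor matches.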

Both stages were trained for 300 epochs. Because the data, tasks, and models of the first and second stages are similar, a freeze training step was added to the second stage to improve training efficiency and prevent the pretrained weights from being corrupted. Specifically, the backbone network was frozen for the first 150 epochs so that only the decoder and classifier were trained, and the whole network was unfrozen for the remaining 150 epochs. The two-stage transfer learning strategy reduced the model training time and resulted in a highly accurate and robust model for intelligent concrete dam surface crack detection.
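The 150 + 150 freeze schedule can be sketched as below. This is an illustrative outline under assumptions: `TinyNet` is a hypothetical placeholder model with `backbone` and `decoder` attributes, and the actual optimisation step is left as a comment.

```python
from torch import nn


def set_backbone_trainable(backbone, trainable):
    """Enable or disable gradient updates for every backbone parameter."""
    for param in backbone.parameters():
        param.requires_grad = trainable


class TinyNet(nn.Module):
    """Hypothetical placeholder with a backbone and a decoder/classifier."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.decoder = nn.Conv2d(8, 2, kernel_size=1)

    def forward(self, x):
        return self.decoder(self.backbone(x))


FREEZE_EPOCHS = 150
TOTAL_EPOCHS = 300

model = TinyNet()
schedule = []
for epoch in range(TOTAL_EPOCHS):
    frozen = epoch < FREEZE_EPOCHS
    set_backbone_trainable(model.backbone, not frozen)
    # A real loop would run one training epoch here.
    schedule.append(frozen)
```

One practical design point, stated as a general observation rather than a detail from the paper: the optimiser has to see the backbone parameters once they are unfrozen, for example by constructing it over all parameters from the start.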

4. Experiments and Evaluation Indicators

4.1. Experimental Setup

The experiments in this paper were run on the Ubuntu 18.04 operating system with an AMD EPYC 7543 32-core CPU and a GeForce RTX 3090 GPU with 24 GB of video memory. The programming language was Python 3.8, and the networks were trained with the PyTorch 1.8.1 deep learning framework and CUDA 11.1. The specific model training parameters are shown in Table 3.

4.2. Datasets

The data used for the experiments in this paper have three parts: the PASCAL VOC 2012 dataset, the first stage cross-domain training crack dataset (DatasetA), and the second stage target domain concrete dam surface crack image dataset (DatasetB). The PASCAL VOC 2012 dataset is a public benchmark from a world-class computer vision challenge for image classification, detection, and semantic segmentation; it consists of 1464 training images and 1449 validation images covering 20 categories and 1 background class. DatasetA consists of 2 parts. The first part is derived from the publicly available datasets CrackForest [30], SDNET2018 [24], and Aft Original Crack DataSet Second [26]. Because the distribution and pixel characteristics of cracks on concrete dam surfaces are very different from those in the open source datasets, only the images with fine and obscure cracks were selected from the 3 public datasets, and those with similarity greater than 0.7 were excluded to improve the dataset quality; 280 images were finally obtained. The second part was collected by us from actual concrete road cracks: 700 small crack images were added to increase the realism of the dataset and bring it closer to the actual engineering context. After applying the Retinex image enhancement algorithm [67] and adjusting image luminosity, contrast, and spatial variation (random rotation, flip), DatasetA was expanded to 3152 images and used as the target domain dataset for cross-domain training in the first stage. The selected open dataset images and road crack images are shown in Figure 6.

The second stage target domain dataset, DatasetB, is our self-built concrete dam surface crack dataset. A DJI Mavic 3 UAV was used to photograph several concrete dams on the upper Jinsha River in China to obtain concrete dam crack image data. The shooting distance was 3–10 m, and image diversity (different angles, background conditions, and light intensity) was emphasised. A total of 350 images were collected, with crack widths between 3 and 20 mm. To ensure that the dataset can adapt to different complex environments and to make the model more generalisable and robust, the images were annotated and then expanded to 1393 images using Retinex enhancement and adjustments of image luminosity, contrast, and spatial variation.
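The geometric and photometric expansion described for DatasetA and DatasetB can be sketched with NumPy as follows. The sketch is a simplification under stated assumptions: rotations are restricted to multiples of 90°, the contrast and brightness ranges are arbitrary illustrative values, Retinex enhancement is omitted, and the function name `augment_pair` is hypothetical. The key point it shows is that spatial transforms must be applied identically to the image and its pixel-level mask, while photometric changes apply to the image only.

```python
import numpy as np


def augment_pair(image, mask, rng):
    """Apply one random augmentation to an image and its annotation mask."""
    # Spatial variation: the same rotation and flip for image and mask.
    k = int(rng.integers(0, 4))
    image = np.rot90(image, k, axes=(0, 1))
    mask = np.rot90(mask, k, axes=(0, 1))
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]

    # Photometric variation: contrast (gain) and luminosity (bias), image only.
    gain = rng.uniform(0.8, 1.2)     # illustrative range, not from the paper
    bias = rng.uniform(-20.0, 20.0)  # illustrative range, not from the paper
    image = np.clip(image.astype(np.float32) * gain + bias, 0, 255)
    return image.astype(np.uint8), np.ascontiguousarray(mask)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 48, 3), dtype=np.uint8)
mask = (rng.random((64, 48)) > 0.95).astype(np.uint8)   # sparse "crack" pixels
aug_image, aug_mask = augment_pair(image, mask, rng)
```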

The intelligent detection method for concrete dam surface cracks in this paper uses supervised learning. To ensure the quality of the dataset and the accuracy of the target information, the images that we collected ourselves for DatasetA and DatasetB were manually annotated at the pixel level using the LabelMe annotation tool. Finally, DatasetB was randomly divided into a training set and a validation set at a ratio of 8:2.
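A random 8:2 split of this kind can be sketched in a few lines. The file names, the seed, and the rounding rule are assumptions made for illustration; the paper does not report the exact sizes of the two subsets.

```python
import random

def split_dataset(items, train_ratio=0.8, seed=0):
    """Shuffle a copy of items and cut it into (train, val) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

# 1393 is the DatasetB size after augmentation; the names are placeholders.
names = [f"dam_{i:04d}.png" for i in range(1393)]
train, val = split_dataset(names)
print(len(train), len(val))  # 1114 279
```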

4.3. Evaluation Indicators

To objectively evaluate the performance of different models, this paper uses the following typical evaluation metrics: mean intersection over union (mIoU), mean pixel accuracy (mPA), the number of parameters, floating point operations (FLOPs), and frames per second (FPS). Intersection over union (IoU) represents the overlap rate between the predicted mask and the actual mask. mIoU is the arithmetic mean of the IoU values over all categories and assesses how precise the overall image segmentation is. Pixel accuracy (PA) is the proportion of correctly segmented pixels, and mPA is the arithmetic mean of the per-class PA values, which assesses the global accuracy of the model. The number of parameters measures model size, FLOPs measure model complexity, i.e., the number of computations, and FPS measures the inference speed of the model.

\mathrm{mIoU} = \frac{1}{k + 1} \sum_{i = 0}^{k} \frac{p_{ii}}{\sum_{j = 0}^{k} p_{ij} + \sum_{j = 0}^{k} p_{ji} - p_{ii}} \qquad (1)

\mathrm{mPA} = \frac{1}{k + 1} \sum_{i = 0}^{k} \frac{p_{ii}}{\sum_{j = 0}^{k} p_{ij}} \qquad (2)

where k + 1 is the number of categories (the k categories to be predicted plus the background), and p_ij denotes the number of pixels of category i that are predicted as category j. Therefore, p_ii counts the correctly classified pixels of category i (true positives), p_ij with j ≠ i counts its false negatives, and p_ji with j ≠ i counts its false positives.
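As a worked illustration of Equations (1) and (2), the sketch below builds the matrix p_ij from two flattened label maps and evaluates both metrics. The function names and the toy masks are assumptions, and the code presumes every class occurs at least once in the ground truth so that no denominator is zero.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """p[i][j] = number of pixels of true class i predicted as class j."""
    p = [[0] * num_classes for _ in range(num_classes)]
    for t, q in zip(y_true, y_pred):
        p[t][q] += 1
    return p

def miou_mpa(p):
    """Evaluate Equations (1) and (2) on a confusion matrix p."""
    n = len(p)                                   # n = k + 1 classes
    ious, pas = [], []
    for i in range(n):
        row = sum(p[i])                          # sum over j of p_ij
        col = sum(p[j][i] for j in range(n))     # sum over j of p_ji
        ious.append(p[i][i] / (row + col - p[i][i]))
        pas.append(p[i][i] / row)
    return sum(ious) / n, sum(pas) / n

# Flattened toy masks: 0 = background, 1 = crack.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0]
p = confusion_matrix(y_true, y_pred, 2)          # [[5, 1], [1, 3]]
miou, mpa = miou_mpa(p)
```

Here the background IoU is 5/7 and the crack IoU is 3/5, so mIoU is about 0.657, while the per-class PA values 5/6 and 3/4 give an mPA of about 0.792. The small crack class pulls both means down far more than it would affect a plain pixel-wise accuracy, which is why class-averaged metrics are used for thin cracks.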

5. Results and Discussion

5.1. Ablation Experiments

To verify the effectiveness of the improvement strategies in this paper, eight sets of ablation experiments were designed under the same dataset and experimental environment to assess the impact of each strategy on concrete dam surface crack extraction accuracy. Table 4 shows that the crack segmentation performance improved under every strategy. After using ResNet50 as the backbone feature extraction network, mIoU and mPA improved by 1.2 and 1.3 percentage points, respectively, compared with the UNet model, the model computations were reduced by 64.1%, and the inference speed increased by 128.1%. This indicates that the ResNet50 backbone can obtain more crack feature information and segment the foreground crack region more accurately, while also reducing the complexity of the UNet model and speeding up inference. After MPR attention was embedded in the UNet model, mIoU and mPA improved by 1.1 and 1.6 percentage points, respectively, compared to the UNet model. The model’s ability to extract important semantic information was enhanced and some useless background feature information was suppressed, which improved pixel segmentation accuracy and made edge details more complete, at the cost of detection time. After using the two-stage transfer learning strategy, both mIoU and mPA improved substantially, showing that the method can effectively alleviate the overfitting and undertraining caused by the lack of data, reduce fragmentation of dam surface cracks, and reduce missed detections. Overall, the model performed best when the three improvements were integrated. The mIoU and mPA of the proposed method were 88.3% and 92.7%, respectively, which are 4.6 and 3.2 percentage points higher than those of the UNet model. The computational cost (FLOPs) of the proposed method was 36.7% lower than that of the UNet model, and the inference speed was 48.9% higher than that of the baseline UNet model. This demonstrates that the method achieves better concrete dam surface crack segmentation performance. Although the larger number of parameters in the proposed method increases the memory taken up by the model weights, the memory footprint had no direct effect on the concrete dam surface crack segmentation in this paper.
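The comparisons above involve two kinds of change: the mIoU and mPA gains are differences between two percentages, whereas the FLOPs and inference speed figures are changes relative to the UNet baseline. The sketch below separates the two. The helper names are assumptions, and the baseline values are back-calculated from the reported figures (88.3% mIoU, 36.5 FPS) rather than quoted from Table 4.

```python
def point_gain(new_pct, old_pct):
    """Absolute difference between two percentages, in percentage points."""
    return new_pct - old_pct

def relative_change(new, old):
    """Relative change of a raw quantity, in percent of the old value."""
    return 100.0 * (new - old) / old

# Baseline mIoU implied by the text: 88.3% minus a 4.6-point gain.
baseline_miou = 88.3 - 4.6
# Baseline FPS implied by 36.5 FPS being 48.9% faster (derived, not reported).
baseline_fps = 36.5 / 1.489

miou_gain = point_gain(88.3, baseline_miou)          # 4.6 percentage points
speed_gain = relative_change(36.5, baseline_fps)     # 48.9 percent
```

Under these assumptions the UNet baseline works out to roughly 83.7% mIoU and about 24.5 FPS.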

5.2. Comparative Trials of Different Attention Mechanisms

To show the advantages of the MPR attention proposed in this paper, Table 5 gives the experimental results of different attention mechanisms on model detection ability under the same experimental environment and dataset. The table shows that SE [68] attention had the least impact on the model inference speed but did not significantly improve model accuracy. Compared to SE attention, the popular convolutional block attention module (CBAM) [69] achieved a more pronounced improvement in crack IoU and PA, indicating that CBAM attention is more capable of extracting fine targets. The improvement in crack accuracy from adding CA [70] attention was negligible. Adding the lightweight efficient channel attention (ECA) [71] module caused a small decrease in accuracy, with a 0.3% decrease in mIoU and a 0.7% decrease in mPA, indicating that ECA is less suitable for fine crack detection and that attention is not beneficial in every scenario. Compared with the other commonly used attention mechanisms, the MPR attention designed in this paper is more effective in improving accuracy and is more suitable for detecting concrete dam surface cracks. Although MPR has some impact on the inference speed, the improvement in detection accuracy is more important for ensuring dam safety and stability, so the trade-off is worthwhile.
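For readers unfamiliar with channel attention, the following sketch shows the three steps of SE attention [68], the simplest of the mechanisms compared above: squeeze, excitation, and scale. It is written in plain Python for readability, omits biases, and uses hand-set weights; it is not the MPR module of this paper, and the function name and toy inputs are assumptions.

```python
import math

def se_attention(x, w1, w2):
    """Squeeze-and-excitation over x, a list of C channels (each an H x W grid).

    w1 has shape (C // r) x C and w2 has shape C x (C // r); biases are omitted.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: reweight every channel by its gate, a value in (0, 1).
    return [[[g * v for v in r] for r in ch] for ch, g in zip(x, gates)]

# Two 2 x 2 channels, reduction ratio r = 2 (one hidden unit).
x = [[[1, 2], [3, 4]], [[4, 4], [4, 4]]]
w1 = [[1.0, 1.0]]
# Strongly positive / negative weights keep channel 0 and suppress channel 1.
out = se_attention(x, w1, [[10.0], [-10.0]])
```

Because the gates depend only on per-channel averages, SE adds very little computation, which is consistent with its small effect on inference speed in Table 5; it also carries no spatial information, which may be one reason it helps less with thin cracks.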

5.3. Training Trials with Different Transfer Learning Approaches

Table 6 shows the effect of different transfer learning approaches on the segmentation accuracy of the model. The cross-domain transfer learning approach achieved the smallest improvement in training accuracy for the target domain, with mIoU and mPA improving by only 0.6% and 0.3%, demonstrating that if the difference between the 2 domains is large, the pretrained model has little effect on the target domain. Compared with cross-domain transfer, the intradomain transfer learning approach achieved a more significant improvement in target domain accuracy, with the IoU and PA of the cracks improving by 1.9% and 1.5%, respectively, indicating that better target domain learning can be achieved through intradomain feature knowledge transfer. In contrast to single-stage transfer learning, this paper uses a two-stage transfer learning approach that combines cross-domain and intradomain learning and obtains the best results. Cross-domain training was first performed from the out-of-domain PASCAL VOC 2012 dataset to the in-domain DatasetA to avoid training from scratch, so that a better in-domain pretrained model was obtained. Then, this pretrained model was used as the mentor model for the target domain to achieve the best transfer learning effect. The final IoU and PA of the cracks improved by 5.6% and 2.9%, respectively. This indicates that in transfer learning, the accuracy improvement depends not only on the difference in data distribution between the source and target domains but also on the quality of the mentor model, which also verifies the effectiveness of the two-stage transfer learning strategy in this paper.
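The hand-off of weights between the two stages can be sketched schematically. The example below is a toy model of the bookkeeping only: parameters are represented as name to (shape, label) pairs, training is replaced by relabelling, and the layer names, shapes, and the name-and-shape matching rule are assumptions. This section does not state how the authors handled the classifier head or whether any layers were frozen.

```python
def init_from(pretrained, model):
    """Copy pretrained entries whose name and shape both match the model.

    Both arguments map parameter names to (shape, values) pairs. Returns the
    initialised model and the list of names that were transferred.
    """
    out, moved = dict(model), []
    for name, (shape, values) in pretrained.items():
        if name in model and model[name][0] == shape:
            out[name] = (shape, values)
            moved.append(name)
    return out, moved

def fresh(num_classes):
    """Toy segmentation model: one backbone tensor plus a per-class head."""
    return {"backbone.conv": ((64, 3), "random"),
            "head.classifier": ((num_classes, 64), "random")}

# Source weights trained on PASCAL VOC 2012 (20 categories + background).
voc = {"backbone.conv": ((64, 3), "voc"), "head.classifier": ((21, 64), "voc")}

# Stage 1 (cross-domain): VOC -> DatasetA. Only the backbone fits a 2-class model.
model_a, moved_1 = init_from(voc, fresh(num_classes=2))
# Relabelling stands in for training on DatasetA.
model_a = {k: (s, "datasetA") for k, (s, _) in model_a.items()}

# Stage 2 (intradomain): DatasetA -> DatasetB. Same task, so every tensor transfers.
model_b, moved_2 = init_from(model_a, fresh(num_classes=2))
```

In this toy setting the first stage transfers only generic backbone features, while the second stage also transfers a crack-specific head, which mirrors the observation that the intradomain step contributes more than the cross-domain step alone.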

5.4. Crack Segmentation Image Comparison

For a more intuitive analysis and to demonstrate that the proposed method has greater crack segmentation capability, the concrete dam surface crack detection results of the UNet model, UNet + ResNet50, UNet + ResNet50 + MPR, and the method in this paper are visualised in this section. The crack segmentation results of the different models are shown in Figure 7. The selected dam face crack images were complex and contained obscure, very fine cracks; therefore, segmentation was challenging. The UNet benchmark model in Figure 7 roughly extracted the dam face crack contours, but the extraction was incomplete and weak for fine cracks, leading to serious missed detections and noise, and the crack edges were segmented coarsely, with low segmentation accuracy. With the addition of the ResNet50 backbone network to UNet, the model’s ability to extract features from the image was enhanced, and the segmentation accuracy improved, allowing the outline of fine cracks to be roughly extracted. However, broken segments and noise remained, and the edge details needed to be improved. Adding MPR attention to UNet + ResNet50 reduced the interference of background regions on subsequent predictions, focused the model more on crack regions, and enhanced fine crack extraction. The resulting crack edge segmentation was more accurate, with clearer contours and better detail retention than the UNet model, but fragmented segmentation and missed detections were still present. After combining the three improved strategies, the proposed method achieved the most complete dam face crack segmentation, which was closest to the manual annotation, and showed strong segmentation ability and robustness for cracks of different scales. These results show that the two-stage transfer learning strategy can compensate for the undersegmentation and overfitting caused by the small quantity of data, can effectively reduce the missed detections caused by thin cracks with few effective pixels, and renders edge details more clearly and completely. The combined subjective and objective results show that the proposed method exhibits the best performance in intelligent concrete dam surface crack segmentation and shows greater robustness in segmenting complex and obscure fine cracks.

5.5. Comparison of Typical Models

Table 7 compares the training evaluation results of this paper’s model with those of the current typical semantic segmentation models UNet [28], PSPNet [29], DeepLabV3+ [30], and SegFormer [31] in the same experimental environment and with the same dataset. In Table 7, the DeepLabV3+ model had the lowest mIoU and mPA, with only 71.6% and 75.0%, respectively, followed by PSPNet; both models segmented worse than expected. This indicates that large networks such as DeepLabV3+ can easily overfit when the sample size is insufficient. The better segmentation accuracy of the UNet and SegFormer models shows that these models retain fairly stable segmentation ability even with small datasets and complex scenarios. Overall, the accuracy indexes of the proposed method were the best, and the IoU and PA of cracks improved significantly compared with the UNet model before improvement, reaching 76.9% and 85.6%, respectively. This indicates that the segmentation accuracy of the improved method for concrete dam surface cracks is better than that of the other models and that the method is more suitable for concrete dam surface crack detection.

5.6. Discussion

5.6.1. The Effectiveness of Transfer Learning

Figure 8 shows the metric curves of the ablation experiments. The mIoU and mPA values of each experimental model increased with the number of iterations, and the training loss decreased. For the models that did not use the transfer learning strategy, the fluctuation in each metric only gradually stabilised after 260 iterations, after which the mIoU, mPA, and training loss curves ran parallel to the X-axis, indicating that training could end. For the models trained with the transfer learning strategy, the improvement in mIoU and mPA and the decrease in loss were evident: mIoU and mPA increased rapidly in the first 10 training rounds and converged smoothly afterwards, with the corresponding loss dropping to much lower values. In summary, the transfer learning strategy improves the segmentation accuracy on small datasets through knowledge and feature sharing and achieves high accuracy in a short training time. Therefore, transfer learning can be applied not only to detection tasks with small datasets but also to scenarios with limited computational power, where it shortens training, reduces the number of training rounds, and saves computational resources.

5.6.2. Crack Area Pixel Value Prediction

Tables 4, 6 and 7 and Figure 7 show that the proposed method performs best in automatic concrete dam surface crack segmentation, segments complex and obscure small cracks more accurately, and can accurately and efficiently detect crack location and shape. This subsection therefore performs a pixel-level statistical analysis of the cracks segmented by this paper’s method to predict the crack area of each image in pixels. Table 8 lists the crack area predictions of the proposed method; the selected images are the same as those segmented in Figure 7. Table 8 shows that the crack area prediction accuracy of the proposed method exceeds 85.7%, and the prediction accuracy for relatively obvious crack areas can exceed 90%. The test results show that, in addition to fast and accurate crack identification and segmentation, the proposed crack segmentation method can also accurately obtain crack area information. In summary, UAV images can be fed into the proposed method to efficiently and accurately identify and segment concrete dam surface cracks, the pixel area of the segmented cracks can then be derived, and the actual crack area can be obtained by converting pixels to physical units [72]. The crack hazard level is judged based on the area obtained, and the appropriate material is then selected for remediation, enabling better targeted repair during dam health diagnosis and addressing the low efficiency and accuracy of manual measurement while avoiding material waste.
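The pixel statistics described here can be sketched as follows. The accuracy formula (one minus the relative error of the predicted pixel count) is an assumption, since the definition used for Table 8 is not given in this section, and `mm_per_pixel` is a hypothetical calibration value that in practice depends on the camera and shooting distance.

```python
def crack_pixels(mask):
    """Number of pixels labelled as crack (value 1) in a binary mask."""
    return sum(v == 1 for row in mask for v in row)

def area_accuracy(pred_pixels, true_pixels):
    """One minus the relative error of the predicted crack pixel count (assumed)."""
    return 1.0 - abs(pred_pixels - true_pixels) / true_pixels

def physical_area_mm2(pixels, mm_per_pixel):
    """Convert a pixel count to mm^2 given the side length of one pixel in mm."""
    return pixels * mm_per_pixel ** 2

# Toy 4 x 4 masks: the annotation has 8 crack pixels, the prediction misses one.
truth = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0]]
pred = [[0, 1, 1, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 1, 0]]

acc = area_accuracy(crack_pixels(pred), crack_pixels(truth))      # 0.875
area = physical_area_mm2(crack_pixels(pred), mm_per_pixel=0.5)    # 1.75
```

A count-based accuracy of this kind does not check where the pixels are, so it complements rather than replaces the IoU of Equation (1).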

5.6.3. Practical Applications

Figure 9a,b shows the segmentation results of the method in this paper on dam crack images without annotation. The method effectively detects the general outline and location of the cracks in the images and can accurately segment different cracks, which demonstrates the effectiveness and reliability of the method for detecting cracks in actual dams. Figure 9c,d shows dam crack images whose environmental backgrounds differ from those in DatasetB; good recognition is also obtained for these images, probably because the weights were pretrained through transfer learning on crack images of similar environments, so crack segmentation under other environmental backgrounds is also robust.

5.6.4. Future Work

Table 4 shows that the combined ability of the method in this paper is optimal. Using ResNet50 as the feature extraction network improved mIoU and inference speed, but the improvement in mPA was not significant. Because mPA measures the proportion of pixels correctly segmented by the model, this indicates that there was still room for improvement in the model’s segmentation and edge recognition capabilities. The experiments found that the mPA improvement was more obvious after adding MPR to the UNet model. Therefore, the two were combined, but the fine crack missed detection problem remained, most likely because the small sample size of the dataset left the model insufficiently trained. After the improved UNet model was trained using the two-stage transfer strategy, a significant reduction in missed detections was observed in Figure 5, demonstrating that transfer learning is very effective for training with small datasets. The method ultimately achieves good inference speed together with the highest segmentation accuracy, but it has a large number of parameters and a large model size, which limits its portability and makes it less suitable for detection on mobile platforms. In the future, the model can be slimmed by pruning the convolutional layers or by replacing the model encoder with a lightweight feature extraction network to reduce the parameters and computations, so that the model can also be deployed on mobile and embedded devices such as Jetson.

6. Conclusions

To improve concrete dam surface crack detection efficiency and accuracy, an intelligent concrete dam surface crack detection method based on two-stage transfer learning is proposed, which can achieve accurate and efficient UAV concrete dam surface crack image segmentation. Compared with other typical semantic segmentation models, the proposed method has the best overall capability and better performance in detecting cracks in concrete dams.

After training using two-stage transfer learning, the proposed method alleviates the inadequate training problem caused by small-scale dam crack datasets, reduces missed detections of fine cracks, and can achieve high accuracy in fewer training rounds. Additionally, using ResNet50 as the feature extraction network of the UNet model fully extracts crack feature information and improves the segmentation capability of the model. Finally, adding MPR to the skip connection path significantly improves pixel accuracy and makes crack edge segmentation finer and more complete.

The experimental results show that the mIoU and mPA of the proposed method reach 88.3% and 92.7%, respectively, and the FPS is 36.5. The proposed method segments cracks on concrete dam surfaces more efficiently and accurately and achieves an accuracy of over 85.7% in predicting the crack area, which makes it better suited to detecting cracks in dams and supporting dam safety.

Author Contributions

J.L.: Conceptualization, methodology, software, writing—original draft, and writing—review and editing. X.L.: Data curation and investigation. P.Z.: supervision and conceptualization. Q.L.: Writing, methodology, review, and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (52009068), the Natural Science Foundation of Hubei Province (2022CFD168) and the Major Science and Technology Projects of the Ministry of Water Resources (SKS-2022102).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Some of the datasets in this study are openly available from https://github.com/cuilimeng/CrackForest-dataset (accessed on 6 December 2022) and https://pan.baidu.com/s/1a5VCmzJunFbg_tyzgszOmQ?pwd=9kwl (accessed on 6 December 2022).

Acknowledgments

We would like to acknowledge the financial support of the National Natural Science Foundation of China (52009068), the Natural Science Foundation of Hubei Province (2022CFD168) and the Major Science and Technology Projects of the Ministry of Water Resources (SKS-2022102).

Conflicts of Interest

The authors declare no conflict of interest.

References

Ren, Q.; Li, M.; Shen, Y.; Zhang, Y.; Bai, S. A Pixel-Level morphological segmentation and feature quantification method for hydraulic concrete cracks. J. Hydropower 2021, 40, 234–246.
Salazar, F.; Vicente, D.J.; Irazábal, J.; De-Pouplana, I.; Mauro, J.S. A review on thermo-mechanical modelling of arch dams during construction and operation: Effect of the reference temperature on the stress field. Arch. Comput. Methods Eng. 2020, 27, 1681–1707.
Fan, X.; Cao, P.; Shi, P.; Chen, X.; Zhou, X.; Gong, Q. An underwater dam crack image segmentation method based on multi-level adversarial transfer learning. Neurocomputing 2022, 505, 19–29.
Huang, Y.; Zhang, X.; Xu, Z.; Huang, J.; Xu, X. Image stitching of underwater dam cracks based on connected domain a priori. Chin. Body Vis. Image Anal. 2020, 25, 408–418.
Xu, B.; Xia, H. A review of research on concrete dam cracking morphology and its hazard analysis methods. J. Water Resour. Water Eng. 2016, 27, 162–168.
Deng, J. Census and treatment process of concrete dam cracks in a hydropower station in Lichuan, Hubei. China Constr. Waterproofing 2021, 03, 52–55.
Huang, C.; Yang, X.; Xia, J. Inspection and treatment of water cracks on the upstream face of Danjiangkou initial project dam. People’s Yangtze River 2015, 46, 45–48+74.
Qu, C.; Wang, C. Concrete pavement crack detection based on attention mechanism and lightweight cavity convolution. Comput. Sci. 2023, 50, 231–236.
Wang, Y.; Gao, R.; Liu, W. Design of concrete crack displacement sensor based on optical fiber bending loss. Sens. Microsyst. 2023, 42, 87–90.
Liu, Z.; Wang, Y.; Sun, F.; Jia, X.; Nie, F. An integration-assisted multi-objective optimization algorithm combining feature perturbation and allocation strategies. Comput. Eng. 2022, 48, 115–123.
Bhowmik, B.; Krishnan, M.; Hazra, B.; Pakrashi, V. Real-time unified single-and multi-channel structural damage detection using recursive singular spectrum analysis. Struct. Health Monit. 2019, 18, 563–589.
Lin, Y.; Shen, Q. Building deformation monitoring based on singular spectrum analysis. Mapp. Stand. 2021, 37, 33–37.
Liu, H.; Hu, J.; Yuan, D.; Jun, C.; Xing, W. Research on seismic coherence attribute calculation method based on singular value spectrum analysis. In Proceedings of the 2022 China Petroleum Physical Exploration Annual Academic Conference (Professional Committee of Petroleum Physical Exploration of China Petroleum Society; Exploration Geophysics Committee of China Geophysical Society), Haikou, China, 27–29 September 2022.
Tian, G.; Liu, J.; Yang, W. A dual neural network for object detection in UAV images. Neurocomputing 2021, 443, 292–301.
Wei, S.; Liu, Y.; Liu, J. Research on the application of concrete structure surface crack detection based on UAV and digital image method. Spec. Struct. 2020, 37, 107–111.
Nguyen, A.; Gharehbaghi, V.; Le, N.T.; Sterling, L.; Chaudhry, U.I.; Crawford, S. ASR crack identification in bridges using deep learning and texture analysis. Structures 2023, 50, 494–507.
Zawad, M.; Shahriar, R.; Zawad, M.; Shahriar, F.; Rahman, M.; Priyom, S.N. A comparative review of image processing based crack detection techniques on civil engineering structures. J. Soft Comput. Civ. Eng. 2021, 5, 58–74.
Hanzaei, S.H.; Afshar, A.; Barazandeh, F. Automatic detection and classification of the ceramic tiles’ surface defects. Pattern Recognit. 2017, 66, 174–189.
Premachandra, C.; Waruna, H.; Premachandra, H.; Parape, C. Image based automatic road surface crack detection for achieving smooth driving on deformed roads. In Proceedings of the 2013 IEEE International Conference on Systems, Man, and Cybernetics, Manchester, UK, 13–16 October 2013; pp. 4018–4023.
Tang, Q.; Tan, Y.; Peng, L.; Cao, H. Research on the identification method of tunnel lining cracks based on digital image technology. J. Railw. Sci. Eng. 2019, 16, 3041–3049.
Gupta, P.; Dixit, M. Image-based crack detection approaches: A comprehensive survey. Multimed. Tools Appl. 2022, 81, 40181–40229.
Talab, A.M.A.; Huang, Z.; Xi, F.; Ming, L.H. Detection crack in image using Otsu method and multiple filtering in image processing techniques. Optik 2016, 127, 1030–1033.
Zhou, Y.; Liu, T. Computer vision-based concrete crack identification. J. Tongji Univ. (Nat. Sci.) 2019, 47, 1277–1285.
Cho, H.; Yoon, H.; Jung, J. Image-based crack detection using crack width transform (CWT) algorithm. IEEE Access 2018, 6, 60100–60114.
Yang, J.; Yang, D. A review of semantic segmentation based on deep learning. Chang. Inf. Commun. 2022, 35, 69–72.
Zhang, S.W.; Bao, T.F.; Gao, X.H. A computer vision-based crack detection method for concrete dams. Adv. Water Resour. Hydropower Sci. Technol. 2021, 41, 83–88.
Li, L.; Zhang, X.; Lian, J.; Zhou, Y.; Zheng, W. Semantic SLAM algorithm combined with road structural features. J. Harbin Inst. Technol. 2021, 53, 175–183.
Zhang, X.; Yao, Q.; Zhao, J.; Jin, Z.; Feng, Y. A review of semantic segmentation methods for fully convolutional neural network images. Comput. Eng. Appl. 2022, 58, 45–57.
Jiang, T.; Tang, Y.; Lu, C.; Shen, G. Research on concrete crack image recognition method based on semantic segmentation. Eng. Surv. 2023, 51, 42–47.
Wang, J.; Yuan, H.; Rui, T.; Zhao, Q.; Yin, C. A semantic segmentation-based crack identification method for concrete structures. Build. Struct. 2022, 52, 923–929.
Dorafshan, S.; Thomas, R.J.; Maguire, M. SDNET2018: An annotated image dataset for non-contact concrete crack detection using deep convolutional neural networks. Data Br. 2018, 21, 1664–1668.
Yang, F.; Zhang, L.; Yu, S.; Prokhorov, D.; Mei, X.; Ling, H. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Trans. Intell. Transp. Syst. 2019, 21, 1525–1535.
Li, L.; Ma, W.; Li, L.; Lu, C. Research on detection algorithm for bridge cracks based on deep learning. Acta Autom. Sin. 2019, 45, 1727–1742.
Tang, Y.; Huang, Z.; Chen, Z.; Chen, M.; Zhou, H.; Zhang, H.; Sun, J. Novel visual crack width measurement based on backbone double-scale features for improved detection automation. Eng. Struct. 2023, 274, 115158.
Huang, S.; Bao, T.; Li, Y.; Niu, H. Semantic segmentation method of hydraulic concrete cracks based on improved DeeplabV3+ network. Adv. Hydropower Sci. Technol. 2023, 43, 81–86.
Chen, B.; Zhang, H.; Li, Y.; Wang, S.; Zhou, H.; Lin, H. Quantify pixel-level detection of dam surface crack using deep learning. Meas. Sci. Technol. 2022, 33, 065402.
Wang, Z.; Zhang, Q.; Fang, D.; Wang, X. Research on dam crack detection method based on deep learning. Water Resour. Plan. Des. 2022, 1, 90–94.
Cheng, H.; Li, Y.; Li, H.; Hu, H. Embankment crack detection in UAV images based on efficient channel attention U2Net. Structures 2023, 50, 430–443.
Cui, L.; Qi, Z.; Chen, Z.; Meng, F.; Shi, Y. Pavement distress detection using random decision forests. In Data Science, Proceedings of the International Conference on Data Science, Sydney, Australia, 8–9 August 2015; Springer: Cham, Switzerland, 2015; pp. 95–102.
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In IEEE Transactions on Pattern Analysis & Machine Intelligence; IEEE: Piscataway, NJ, USA, 2017; Volume 39, pp. 1137–1149.
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
Huang, B.; Kang, F.; Tang, Y. Real-time detection method of concrete dam cracks based on target detection. J. Tsinghua Univ. (Nat. Sci. Ed.) 2023, 1–9.
Du, M.; Yang, G.; Zhang, H. Research on bridge crack detection method based on YOLOv4-EfficientNet B7. J. Tianjin Urban Constr. Univ. 2023, 29, 55–61.
Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890.
Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-decoder with atrous separable convolution for semantic image segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Computer Vision Foundation, Munich, Germany, 8–14 September 2018; pp. 801–818.
Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and efficient design for semantic segmentation with transformers. Adv. Neural Inf. Process. Syst. 2021, 34, 12077–12090.
Yang, J.; Li, H.; Zou, J.; Jiang, S.; Li, R.; Liu, X. Concrete crack segmentation based on UAV-enabled edge computing. Neurocomputing 2022, 485, 233–241.
Liu, Z.; Li, X.; Li, J.; Teng, S. A new approach to automatically calibrate and detect building cracks. Buildings 2022, 12, 1081.
Rill-García, R.; Dokladalova, E.; Dokládal, P. Pixel-accurate road crack detection in presence of inaccurate annotations. Neurocomputing 2022, 480, 1–13.
Liu, S. Research on concrete crack segmentation algorithm based on Swin-Unet. Henan Sci. Technol. 2022, 41, 13–17.
Zhong, H.; Lv, Y.; Yuan, R.; Yang, D. Bearing fault diagnosis using transfer learning and self-attention ensemble lightweight convolutional neural network. Neurocomputing 2022, 501, 765–777.
Zhang, H.; Feng, L.; Hao, Y.; Wang, Y. Dynastic recognition of ancient wall paintings based on attention mechanism and migration learning. Comput. Appl. 2023, 1–9.
Li, Y.; Cheng, H.; Li, H.; Wang, J.; Hu, Q. UAV image based on improved U~2-net with migration learning for embankment crack detection. Adv. Water Resour. Hydropower Sci. Technol. 2022, 42, 52–59.
Yu, X.; Wang, J.; Hong, Q.Q.; Teku, R.; Wang, S.H.; Zhang, Y.D. Transfer learning for medical images analyses: A survey. Neurocomputing 2022, 489, 230–254.
Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
Dung, C.V.; Anh, L.D. Autonomous concrete crack detection using deep fully convolutional neural network. Autom. Constr. 2019, 99, 52–58.
Kang, L.; Ye, P.; Li, Y.; Doermann, D. Convolutional neural networks for no-reference image quality assessment. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 1733–1740.
Zhao, Y.; Hu, H.S.; Tang, M.; Chen, H. Research on crack detection method of concrete bridge based on improved coding-decoding network. Guangzhou Archit. 2022, 50, 1–7.
Xu, G.; Liao, C.; Chen, J.; Dong, B.; Zhou, Y. Extraction of concrete apparent crack information based on HU-ResNet. Comput. Eng. 2020, 46, 279–285.
Zhao, X.B.; Wang, J.J. Bridge crack detection based on improved DeeplabV3+ and migration learning. Comput. Eng. Appl. 2022, 1–10.
Yu, O.; Jing, P.; Zhang, W.; Xie, S.F.; Sli, Z.H.; Song, C. An improved U-Net model for road crack detection based on residuals and attention mechanism. Comput. Eng. 2022, 1–14.
Ge, W.; Yu, Y. Borrowing treasures from the wealthy: Deep transfer learning through selective joint fine-tuning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, HI, USA, 21–26 July 2017; pp. 1086–1095.
Petro, A.B.; Sbert, C.; Morel, J.M. Multiscale retinex. Image Process. Line 2014, 71–88.
Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141.
Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Computer Vision Foundation, Munich, Germany, 8–14 September 2018; pp. 3–19.
Hou, Q.; Zhou, D.; Feng, J. Coordinate attention for efficient mobile network design. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13713–13722.
Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542.
Sun, X. Design of Shellfish Size Detection System Based on Deep Learning; Jiangnan University: Wuxi, China, 2022.

Figure 1. Comparison of crack features. The first row shows images of cracks in the open dataset [31,39], and the second row shows images of cracks in concrete dam surfaces.

Figure 2. Model framework for this paper.

Figure 3. Residual module. (A) The number of input channels matches the number of output channels; (B) the number of input channels does not match the number of output channels.

Figure 4. Multilayer parallel residual attention.

Figure 5. Two-stage transfer learning training process.

Figure 6. Images from DatasetA. From left to right: selected images from CrackForest, SDNET2018, Aft_Original_Crack_DataSet_Second, and a collection of real road cracks.

Figure 7. Crack segmentation results for different models. From left to right: original image, ground truth, UNet, UNet + ResNet50, UNet + ResNet50 + MPR, and our method. (A–E) denote different dam crack images segmented by each model. The red boxes mark the regions where the differences between models are most pronounced.

Figure 8. Variation curves of the ablation experiment metrics. The curves show the metric values of the weights obtained at each training epoch; from left to right: mIoU, mPA, and loss. The red circles mark the model metrics over the last 10 epochs.

Figure 9. Crack segmentation results of the proposed method on actual engineering dams. (a,b) are dam crack images that were not annotated or used in training; (c,d) are dam crack images from DatasetB with different environmental backgrounds.

Table 1. ResNet50 network structure.

Layer Name	Operation	Output Size	Multiplier
Layer0	Input	512² × 3	×1
	Conv7 × 7	256² × 64
	Maxpool	128² × 64
Layer1	Conv1 × 1	128² × 64	×3
	Conv3 × 3	128² × 64
	Conv1 × 1	128² × 256
Layer2	Conv1 × 1	64² × 128	×4
	Conv3 × 3	64² × 128
	Conv1 × 1	64² × 512
Layer3	Conv1 × 1	32² × 256	×6
	Conv3 × 3	32² × 256
	Conv1 × 1	32² × 1024
Layer4	Conv1 × 1	16² × 512	×3
	Conv3 × 3	16² × 512
	Conv1 × 1	16² × 2048
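
As a quick consistency check on Table 1, the sketch below tallies the convolutional layers and the overall downsampling implied by the table. The stage list is transcribed from Table 1; the reading that the fiftieth layer is the fully connected classifier of the original ResNet50 (usually dropped when the network serves as a segmentation encoder) comes from the standard ResNet design and is an assumption here, not something the table states.

```python
# Stages transcribed from Table 1:
# (name, convolutions per bottleneck, repeats, output size, output channels)
STAGES = [
    ("Layer1", 3, 3, 128, 256),
    ("Layer2", 3, 4, 64, 512),
    ("Layer3", 3, 6, 32, 1024),
    ("Layer4", 3, 3, 16, 2048),
]
INPUT_SIZE = 512  # Layer0 input is 512 x 512 x 3


def count_conv_layers(stages):
    """One stem convolution (Conv7 x 7) plus three convolutions per bottleneck."""
    return 1 + sum(convs * repeats for _, convs, repeats, _, _ in stages)


def total_stride(input_size, stages):
    """Overall downsampling factor between the input and the last stage."""
    return input_size // stages[-1][3]


print(count_conv_layers(STAGES))          # 49 convolutions in the table
print(total_stride(INPUT_SIZE, STAGES))   # 32
```

The 49 convolutions plus one classifier layer would account for the "50" in the network name under that assumption, and the factor of 32 is the resolution gap the decoder has to recover.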

Table 2. Main code of the MPR module.

import torch.nn as nn


class MPR(nn.Module):
    """Multilayer parallel residual attention (MPR): layer definitions only."""

    def __init__(self, ch_in, ch_out):
        super(MPR, self).__init__()
        # First branch: 3 x 3 convolution, batch normalisation, ReLU
        self.conv = nn.Sequential(
            nn.Conv2d(ch_in, ch_out, kernel_size=3, stride=1, padding=1, bias=True),
            nn.BatchNorm2d(ch_out),
            nn.ReLU(inplace=True),
        )
        # Second branch: 1 x 1 convolution used as the residual (skip) path
        self.skip = nn.Sequential(
            nn.Conv2d(ch_in, ch_out, kernel_size=1, stride=1, padding=0),
            nn.BatchNorm2d(ch_out),
        )
        # Third branch: 3 x 3 dilated convolution (dilation 3), ReLU, batch normalisation
        self.aspp_block1 = nn.Sequential(
            nn.Conv2d(ch_in, ch_out, 3, stride=1, padding=3, dilation=3),
            nn.ReLU(inplace=True),
            nn.BatchNorm2d(ch_out),
        )
        # Fourth branch: plain 1 x 1 convolution
        self.conv2 = nn.Conv2d(ch_in, ch_out, kernel_size=1, stride=1, padding=0)
        # The forward pass is not listed in this table.
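
One property of the four MPR branches can be checked without running the network: with the kernel sizes, paddings, and dilation listed in Table 2, every branch preserves the spatial size of its input, which is what allows parallel branches to be combined element by element. The sketch below applies the standard convolution output-size formula in plain Python. The input size of 128 is an arbitrary illustrative value, and how the branches are actually merged in the forward pass is not shown in Table 2, so it is not modelled here.

```python
def conv2d_out_size(n, kernel_size, stride=1, padding=0, dilation=1):
    """Spatial output size of a 2D convolution along one dimension."""
    effective_kernel = dilation * (kernel_size - 1) + 1
    return (n + 2 * padding - effective_kernel) // stride + 1


# Hyperparameters of the four branches, transcribed from Table 2.
BRANCHES = {
    "conv": dict(kernel_size=3, stride=1, padding=1),
    "skip": dict(kernel_size=1, stride=1, padding=0),
    "aspp_block1": dict(kernel_size=3, stride=1, padding=3, dilation=3),
    "conv2": dict(kernel_size=1, stride=1, padding=0),
}


def branch_sizes(n):
    """Output size of each branch for an n x n input feature map."""
    return {name: conv2d_out_size(n, **cfg) for name, cfg in BRANCHES.items()}


print(branch_sizes(128))
# {'conv': 128, 'skip': 128, 'aspp_block1': 128, 'conv2': 128}
```

The dilated branch is the interesting case: a 3 × 3 kernel with dilation 3 covers a 7 × 7 window, and padding 3 compensates for exactly that.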

Table 3. Model training parameters.

Parameters	Configuration
Input image size	512 × 512
Batch size	8
Epochs	300
Optimiser	Adam
Momentum	0.9
Initial learning rate	0.0001
Learning rate decay schedule	CosineAnnealingLR
Loss function	Cross Entropy Loss
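
To make the learning rate schedule in Table 3 concrete, the following sketch evaluates the closed-form cosine annealing curve in plain Python. The initial learning rate and epoch count come from Table 3; setting the annealing period `t_max` to the full 300 epochs and the minimum rate `eta_min` to 0 are assumptions, since the table does not state them.

```python
import math


def cosine_annealing_lr(epoch, base_lr=1e-4, t_max=300, eta_min=0.0):
    """Learning rate at a given epoch under cosine annealing without restarts.

    base_lr is the initial learning rate from Table 3; t_max and eta_min
    are assumed values, not taken from the paper.
    """
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


for epoch in (0, 75, 150, 225, 300):
    print(epoch, f"{cosine_annealing_lr(epoch):.2e}")
```

Under these assumptions the rate starts at 0.0001, reaches half of that at epoch 150, and decays to zero at the final epoch.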

Table 4. Comparison of the ablation experiment results.

Model	ResNet50	MPR	Transfer Learning	mIoU/%	mPA/%	Parameters/M	FLOPs/G	FPS
UNet				83.7	89.5	34.5	512.6	24.5
A	√			84.9	90.8	43.9	184.1	55.9
B		√		84.8	91.1	41.5	694.1	17.9
C			√	85.4	91.1	34.5	512.6	24.5
D	√	√		85.4	91.2	71.6	324.0	36.5
E		√	√	87.1	91.7	41.5	694.1	17.9
F	√		√	87.3	91.5	43.9	184.1	55.9
Our	√	√	√	88.3	92.7	71.6	324.0	36.5

Note: “√” indicates that the improvement strategy was added to the original UNet model.
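
The headline comparisons between the baseline UNet and the full model can be recomputed from the first and last rows of Table 4. The sketch below does so; the helper names are illustrative, and the choice to report accuracy metrics as absolute point gains and cost metrics as changes relative to UNet is an assumption about how such figures are usually quoted.

```python
# Values transcribed from the "UNet" and "Our" rows of Table 4.
UNET = {"mIoU": 83.7, "mPA": 89.5, "FLOPs": 512.6, "FPS": 24.5}
OURS = {"mIoU": 88.3, "mPA": 92.7, "FLOPs": 324.0, "FPS": 36.5}


def absolute_gain(metric):
    """Difference in percentage points (used for mIoU and mPA)."""
    return OURS[metric] - UNET[metric]


def relative_change(metric):
    """Change relative to the UNet baseline, in percent (used for FLOPs and FPS)."""
    return 100.0 * (OURS[metric] - UNET[metric]) / UNET[metric]


print(f"mIoU gain:    {absolute_gain('mIoU'):.1f} points")
print(f"mPA gain:     {absolute_gain('mPA'):.1f} points")
print(f"FLOPs change: {relative_change('FLOPs'):.2f} %")
print(f"FPS change:   {relative_change('FPS'):.2f} %")
```

The FLOPs figure comes out negative (a reduction of a little over a third) and the FPS figure positive (an increase of just under a half).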

Table 5. Results of experiments with different added attention mechanisms (IoU and PA only for the crack class).

Model	IoU/%	mIoU/%	PA/%	mPA/%	FPS
UNet	68.0	83.7	79.2	89.5	24.5
UNet + SE	68.4	83.9	81.1	90.4	24.4
UNet + CBAM	68.9	84.2	81.0	90.4	19.1
UNet + CA	68.0	83.7	79.6	89.7	21.5
UNet + ECA	67.4	83.4	77.9	88.8	24.3
UNet + MPR	70.3	84.8	82.5	91.1	17.9

Table 6. Comparison of results for different transfer learning approaches (IoU and PA only for the crack class).

Transfer Learning Approach	Source Domain Datasets	Source Domain Model mIoU/%	Source Domain Model mPA/%	Target Domain Model IoU/%	Target Domain Model mIoU/%	Target Domain Model PA/%	Target Domain Model mPA/%
/	/	/	/	71.3	85.4	82.7	91.2
Cross-domain training	PASCAL VOC 2012	69.5	75.2	72.6	86.0	83.3	91.5
Intradomain training	DatasetA	88.7	93.5	73.2	86.4	84.2	91.9
Two-stage training	PASCAL VOC 2012 + DatasetA	90.2	94.4	76.9	88.3	85.6	92.7

Table 7. Comparison of evaluation indicators of different models.

Model	Crack IoU/%	Background IoU/%	mIoU/%	Crack PA/%	Background PA/%	mPA/%
UNet	68.0	99.4	83.7	79.2	99.8	89.5
PSPNet	52.3	99.2	75.7	62.9	99.7	81.3
DeepLabV3+	44.1	99.1	71.6	50.2	99.8	75.0
SegFormer	64.5	99.4	81.9	77.6	99.7	88.7
Our method	76.9	99.6	88.3	85.6	99.8	92.7
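
The relationship between the per-class and mean columns of Table 7 can be illustrated with a small confusion-matrix computation. The sketch below assumes the common definitions IoU = TP / (TP + FP + FN) and per-class PA = TP / (TP + FN), with mIoU and mPA taken as unweighted means over the two classes. The toy confusion matrix is made up for illustration; only the last two lines use values from Table 7.

```python
def per_class_iou_and_pa(confusion):
    """confusion[i][j] = number of pixels of true class i predicted as class j."""
    n = len(confusion)
    ious, pas = [], []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                         # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp    # pixels wrongly labelled c
        ious.append(tp / (tp + fp + fn))
        pas.append(tp / (tp + fn))
    return ious, pas


def mean(values):
    return sum(values) / len(values)


# Hypothetical confusion matrix, classes ordered as [background, crack].
toy = [[9900, 20],
       [15, 65]]
ious, pas = per_class_iou_and_pa(toy)
print("toy mIoU:", mean(ious), "toy mPA:", mean(pas))

# Per-class values reported for "Our method" in Table 7.
print("mIoU from Table 7 columns:", mean([76.9, 99.6]))
print("mPA from Table 7 columns: ", mean([85.6, 99.8]))
```

Averaging the rounded per-class columns reproduces the reported means to within rounding, which is consistent with the unweighted-mean reading. It also shows why the background class, at over 99% in every row, lifts the mean well above the crack-only figures.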

Table 8. Statistics on the predicted crack area of the model in this paper.

Picture		A	B	C	D	E
Pixel value	Ground truth	78,556	57,768	38,576	45,377	33,107
Pixel value	Our prediction	67,326	52,577	33,304	42,627	32,426
Precision/%		85.7	91.0	86.3	93.9	97.9
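
The precision row of Table 8 can be related to the pixel counts above it by a simple ratio. The sketch below assumes that precision here means the predicted crack pixel count divided by the ground-truth crack pixel count, expressed in percent; the function name is illustrative, and the example uses the counts for pictures A and D.

```python
def area_precision(predicted_pixels, ground_truth_pixels):
    """Predicted crack area as a percentage of the ground-truth crack area."""
    return 100.0 * predicted_pixels / ground_truth_pixels


print(round(area_precision(67_326, 78_556), 1))  # picture A: 85.7
print(round(area_precision(42_627, 45_377), 1))  # picture D: 93.9
```

Because the ratio only compares totals, it measures how much crack area is recovered, not whether the predicted pixels coincide with the annotated ones; the IoU columns of the earlier tables cover that aspect.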

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Li, J.; Lu, X.; Zhang, P.; Li, Q. Intelligent Detection Method for Concrete Dam Surface Cracks Based on Two-Stage Transfer Learning. Water 2023, 15, 2082. https://0-doi-org.brum.beds.ac.uk/10.3390/w15112082

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
