Article

Design and Optimization of CNN Architecture to Identify the Types of Damage Imagery

1 Department of Civil Engineering, The Republic of China Military Academy, Kaohsiung 830, Taiwan
2 Department of Marine Science, The Republic of China Naval Academy, Kaohsiung 813, Taiwan
* Author to whom correspondence should be addressed.
Submission received: 18 August 2022 / Revised: 13 September 2022 / Accepted: 20 September 2022 / Published: 23 September 2022
(This article belongs to the Section Computational and Applied Mathematics)

Abstract:
Damage to the surface construction of reinforced concrete (RC) will impact the security of the facility’s structure. Deep learning can effectively identify various types of damage, which is useful for taking protective measures to avoid further deterioration of the structure. Based on deep learning, the multi-convolutional neural network (MCNN) has the potential for identifying multiple RC damage images. The MCNN6 of this study was evaluated by indicators (accuracy, loss, and efficiency), and the optimized architecture was confirmed. The results show that the identification performance for “crack and rebar exposure” (Type B) by MCNN6 is the best, with an accuracy of 96.81% and a loss of 0.07. The accuracy of the other five types of damage combinations is also higher than 80.0%, and the loss is less than 0.44. Finally, the MCNN6 model can be used in the detection of various damage to achieve automated assessment for RC facility surface conditions.

1. Introduction

Damage to the surface of reinforced concrete (RC) structures includes cracks, efflorescence, flaking, spalling, exposed steel bars, and holes. This damage impacts the security of facility structures. Severe cracks can result in the breakdown of structures; efflorescence can result in corrosion of the rebars, and the rebars can lose their concrete protective layer through spalling. In addition, exposed steel bars can accelerate the deterioration of the structures. If these breakdowns are not addressed immediately, they can cause serious safety problems as defects spread over the life cycle of the infrastructure [1]. The deterioration of RC can be caused by multiple factors, the most common of which is corrosion caused by chloride invasion or concrete carbonization [2]. Common types of damage to RC include efflorescence, cracking, spalling, and rebar exposure. Efflorescence is caused by salts interacting with carbon dioxide and other atmospheric gases after water moves inside the concrete and later evaporates from its surface; this process frequently occurs in buildings made of concrete or cement mortar. Cracks can be defined as the result of accidental interruption and local material faults in structural materials; small cracks result in insufficient serviceability, and large cracks result in structural failure [3]. A crack is a critical indicator in evaluating the conditions of current buildings and infrastructure facilities [4]. Spalling refers to concrete surface peeling without exposing the rebars [5]. Such problems not only ruin the appearance but also influence the structural safety and usage quality of infrastructure facilities.
This typical damage affects the durability and maintainability of concrete [6]. With early evaluation and detection, security measures can be taken to prevent damage and breakdowns [7]. Personnel and equipment should be deployed to identify RC damage as early as possible to prevent further deterioration and its infrastructural and structural consequences. However, traditional identification methods are based on visual inspection by inspectors, and the efficiency and reliability of such identification are often doubted. Furthermore, human visual inspections are commonly restricted by inspector training [8]. The professionalism and experience of personnel directly affect the accuracy of the identification and produce inconsistent inspection results for different types of damage. Jang et al. [9] believe that visual inspection is usually time consuming, laborious, and unreliable and is sometimes unsuitable for areas where the target structure cannot be accessed. In terms of cost, security, accuracy, and reliability, human visual inspection is usually considered inefficient [10]. Therefore, for the efficiency and objectivity of damage assessment, it is necessary to develop an identification method that optimizes the determination of various types of RC damage.
The capability of intelligent systems to learn and improve through experience obtained from large data sets is known as machine learning (ML) [11]. ML is developing most rapidly in the field of predictive analytics, where a technique learns a task against some performance measure by using large volumes of data. Since ML can operate efficiently on large data sets and provide a measure of feature importance for each class, a pattern is learned from the training data, and new instances are inferred from this pattern. Therefore, ML has been widely applied in civil and structural engineering, where large amounts of computation are required. For example, Zhou et al. [12] utilized an artificial neural network (ANN) and an adaptive neuro-fuzzy inference system to predict the shear strength of grouted reinforced concrete (RC) block masonry walls. Chang et al. [13] proposed an artificial intelligence (AI)-based structural health monitoring strategy based on neural network modeling to calculate the damage to structures. Zhang et al. [14] adopted ML to calculate the steel weight loss distribution of RC beams, which can then be applied to predict flexural load. Mangalathu et al. [15] used eight ML methods to identify the mode of seismic failure in RC shear walls. Eleven ML models were applied to predict the shear capacity of steel fiber RC beams [16]. These ML methods have been proven and enhanced for discriminating the damage characteristics of RC structures.
Computer vision technology offers noncontact, long-distance, fast, and low-cost inspection with little interference to the daily operation of facility structures [17]. Thus, some researchers have recently developed algorithms based on computer vision to detect or identify different types of RC damage [18]. Machine learning methods based on computer vision have broken through the limits of human vision and can effectively warn of problems with the structural capability and durability of facilities. Compared with human vision, machine learning can overcome the subjectivity and deviations of visual checking of facility surfaces, and it provides accurate recognition results. Machine learning methods differ in their underlying principles. Well-known machine learning methods used for image classification and the modeling of public buildings include support vector machines [19,20,21], random forests [22], artificial neural networks [23], principal component analysis [24], clustering analysis [25,26], self-organizing maps [27], and deep learning [28,29].
To analyze spectral reflection and image features, some factors must be considered in image recognition and detection, such as the object's material, texture, number, range, shape, size, and color. Color and texture have always been used as classification indicators because they describe the surfaces of objects adequately [30]. In particular, feature extraction is the process of determining the unique features of images, and it is a crucial part of using image processing for object recognition [10]. In addition, these methods extract features from the images and then evaluate whether there is damage, a process that is inevitably influenced by manual labeling [6]. Most supervised machine learning relies on manually labeled features, which is usually time consuming and inefficient. Because the selection of manual features is subjective, it might not contain sufficient information to make the network operate normally, and it is thus helpful to adopt deep learning techniques [31]. Compared with traditional methods, deep learning models allow computers to learn and extract features automatically. These features are essential for supervised learning methods, and deep learning methods do not require manual feature creation [17]. In addition, considering the complexity of various types of damage in a real environment and the inevitable background noise, detection methods based on deep learning have overcome the wide variety that occurs in reality, such as sunlight and shadow, which otherwise limits adaptability [6]. A convolutional neural network (CNN), one of the classic deep learning algorithms, is a feedforward neural network with a deep structure and convolution computing power, and its accuracy in image identification is outstanding [32]. One of the main advantages of the CNN is that it eliminates the dependence on prior knowledge and the manual labeling of images, making feature extraction more efficient.
Unlike previous methods, the CNN does not need experts to specify any features; it learns to identify them from the training data set [33]. Therefore, the CNN can detect and classify images precisely regardless of their sizes, positions, and orientations [34]. Due to the combination of computer vision and deep learning methods, the deterioration of infrastructure can be precisely evaluated [35]. Recently, CNN-based methods have been developed to detect cracks [36,37], spalling [38], rebar exposure [39], welded connections [40], and corrosion [33]. These deep learning methods showed outstanding performance when only a single type of damage was detected. However, in reality, many kinds of RC damage with different shapes, sizes, and colors can exist simultaneously on the surface of buildings or infrastructure. These damage features are thorny problems that image recognition or detection must solve.
Spencer Jr et al. [41] believe that every civil engineering structure is unique, and it is more challenging for deep learning technology to identify damage because of the different colors of the surfaces. In addition, the colors and textures of some building surfaces are similar to those of damage features. For example, concrete is light gray, and spalling is mostly dark gray; the two are easily confused, which results in a high probability of errors in recognition and classification. Improving the detection accuracy is one of the first problems that classification techniques must solve. If a reasonable framework is established and the design level is improved, the recognition accuracy will be significantly improved [42]. Several CNN architectures have been used in the literature, but their identification accuracy depends on the network architecture configuration and the hyperparameter settings. In view of these concerns, this research conducted tests on various RC damage images to clarify the impact of the CNN architecture and parameters on image recognition. Li et al. [6] note that many types of concrete damage images exist under various conditions, and they conducted performance comparison studies to improve the robustness of their proposed method. To obtain higher detection performance for multiple types of damage, this study uses four common types of RC damage: cracks, efflorescence, rebar exposure, and spalling. A recognition model was designed based on deep learning, and training and testing were conducted on multiple types of damage images to achieve automated RC facility status evaluation.
When used to identify RC damage, convolutional neural networks (CNNs) require objective and consistent criteria and can enhance the efficiency of the damage evaluation process through automation. However, the ability of CNNs to extract features from image data must be further optimized, and more accurate methods for evaluating the performance of CNNs must be developed. This study constructed a deep learning model by training and testing a CNN on four classes of images of RC damage to automate the damage identification process while ensuring consistent results. With further refinement of the CNN design and model hyperparameters, the CNN will become more suitable for image processing and damage identification applications. The accuracy of a model is the primary metric used to determine whether that model can facilitate maintenance efforts. If a CNN can only identify one damage class, it will not be useful in a complex environment; therefore, an optimized CNN that can identify multiple classes of damage must be developed. Through continual evaluation of the accuracy and generalization performance of the CNN proposed herein, a damage identification model that can help engineers and other maintenance personnel inspect buildings efficiently and economically can be established. The contributions of this study are summarized as follows.
(1)
The proposed model can accurately and consistently identify damage under various realistic conditions and exhibit good generalization performance.
(2)
The effects of model hyperparameters on the performance of the CNN were evaluated, and the results were used to refine the proposed deep learning model. Ultimately, the CNN could recognize four classes of damage simultaneously.
(3)
Original images of damage to real buildings were used to optimize the CNN, and additional metrics were used to evaluate the model performance.
The remainder of this study is organized as follows. Section 2 describes the CNN architecture and the implementation details of the proposed model. Section 3 presents and analyzes the experimental results, which indicate the advantages of the proposed model. Section 4 provides the conclusions.

2. Methodology

An artificial neural network (ANN) is a network architecture composed of an input layer, hidden layers, and an output layer. The input layer accepts information from the external environment and converts the input data into a signal adapted to the network according to the characteristics of the problem; the output layer delivers the processed information to the external environment, using a nonlinear function to convert its inputs into an output signal, and the calculated probability is used as the basis for prediction. The hidden layer sits between the input and output layers as the connecting structure for solving nonlinear problems. Each node in the hidden and output layers is represented as a weighted combination of all nodes (including bias) in the previous layer, and this combination is passed to an activation function to model nonlinearity [31]. However, because the ANN considers all pixels independently, it does not explicitly consider the spatial relationships between individual pixels and thus cannot extract high-level features [43]. In contrast, deep learning methods can provide a higher level of performance for feature extraction [44].
The main advantage of the deep learning-based CNN is that the image is processed as a multidimensional input instead of as a single vector, the spatial contexts of the image pixels are explicitly considered [45], and damage-sensitive features are automatically extracted during the training process [21]. Additionally, it reduces the neural network's training parameters through partial connection and weight sharing [32]. These characteristics allow CNNs to learn to identify images automatically, and they have been widely used in applications such as image recognition, classification, detection, localization, and segmentation. The CNN is composed of a series of convolutional layers, activation functions, pooling layers, and fully connected layers and finally outputs the prediction results. The CNN is a typical deep neural network with multiple convolutional layers for automatic feature extraction, and it has been successfully applied to image recognition [46]. Because the CNN can keep the features of the image unchanged during recognition while reducing the dimensionality and calculation time, the extracted image features are input into the fully connected layer, and the result of the classification or recognition is then output. CNNs usually have one or more convolutional and pooling layers bound to weights to extract features that are tolerant to scaling, translation, and rotation, as well as a fully connected layer associated with these features, and they can classify images or objects [40]. The architecture, functions, and parameters of the conventional CNN are explained as follows:

2.1. Convolutional Layer

The principle of convolution is to slide a window of a specific size across the input image from left to right and from top to bottom. The sliding stride usually uses the same distance in the height and width directions. The sliding window in the CNN is called the filter (or kernel), and the area covered by the filter is called the receptive field. As the convolutional layer passes the filter over the image, the filter matrix and the pixel values are multiplied elementwise (the dot product); the multiplied values are then summed, and a bias value is added, as in Equation (1). When the summed value of the dot product is larger, the shape of the input image in that region is similar to the shape of the filter. Every position of the input image yields a value; when the filter has slid over all positions, a feature map is generated (Figure 1). A convolutional layer may produce several feature maps for learning different features. Filters can be regarded as feature identifiers, which are used to extract various features from the image. The more filters there are, the deeper the feature maps are, and the easier it is to identify the images. Therefore, the purpose of the CNN's convolutional layer is to retain the spatial arrangement of the images and to obtain local regions of the image as input features. The filter acquires each local feature from the image to be the input of the next layer. During training of the convolutional layer, the filters are adapted to the shapes that appear in the input images. In this way, the CNN retains the computation over each local area, and the convolutional layers concentrate on obtaining the local features of the images.
y = Σ(xij × fij) + b        (1)
In addition to the size of the filter, the stride and padding must be set. The stride is the distance the filter moves each time; the convolutional layer extracts the features of the input image by moving the filter one stride at a time over the input data. When the stride is larger, the overlap of the receptive fields decreases, which reduces the computational cost but may also lose features of the input data. Padding sets the pixel value to zero at the four boundaries of the input image, which allows the input and output images to maintain the same size. The output size is given in Equation (2), where O, M, and F are the sizes of the output image, the input image, and the filter, respectively, and P and S are the padding and the stride, respectively.
O = (M − F + 2P)/S + 1        (2)
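As an illustration of Equations (1) and (2), the following NumPy sketch computes the output size and performs a naive single-channel convolution. The function names and the single-channel simplification are illustrative only, not code from this study.

```python
import numpy as np

def conv_output_size(M, F, P, S):
    """Output size per Equation (2): O = (M - F + 2P)/S + 1."""
    return (M - F + 2 * P) // S + 1

def conv2d(image, kernel, bias=0.0, stride=1, padding=0):
    """Naive single-channel convolution per Equation (1):
    at each window position, y = sum(x_ij * f_ij) + b."""
    if padding:
        image = np.pad(image, padding)  # zero-pad the four boundaries
    M, F = image.shape[0], kernel.shape[0]
    O = (M - F) // stride + 1
    out = np.empty((O, O))
    for r in range(O):
        for c in range(O):
            window = image[r*stride:r*stride+F, c*stride:c*stride+F]
            out[r, c] = np.sum(window * kernel) + bias
    return out
```

For example, a 310-pixel input dimension with a 5 × 5 filter, stride 1, and no padding yields an output dimension of (310 − 5 + 0)/1 + 1 = 306.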

2.2. Activation Layer

To introduce nonlinear features into the model after the linear calculations (elementwise products and summation) in the convolutional layer, a nonlinear activation function is applied after each convolutional layer. The sigmoid, tanh, and rectified linear unit (ReLU) are typical activation functions used in CNNs; among them, the sigmoid is a classic neural network activation function, but during gradient descent its gradient vanishes easily, which terminates gradient propagation and slows the calculation. Therefore, ReLU is introduced as a nonlinear activation function because it increases the training speed and reduces the vanishing gradient without a significant change in accuracy. Since the gradients of ReLU are 0 for negative inputs and 1 for positive inputs, the calculation is much faster than with the sigmoid function [47]. Equation (3) shows the ReLU function, where x and y are the input and output of the ReLU activation layer, respectively. The gradients of ReLU are always 0 or 1: a value less than 0 is output as 0, and a value greater than 0 is output directly. The result is the so-called feature map, and each point on the feature map can be regarded as a region feature of the original image.
ReLU: y = max(0, x),  dy/dx = 1 for x > 0, 0 for x ≤ 0        (3)
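Equation (3) and its gradient can be written directly in NumPy; this is a minimal sketch, not code from the study.

```python
import numpy as np

def relu(x):
    """Equation (3): outputs x for positive inputs, 0 otherwise."""
    return np.maximum(0.0, x)

def relu_grad(x):
    """Gradient of ReLU: 1 for x > 0, 0 for x <= 0."""
    return (x > 0).astype(float)
```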

2.3. Pooling Layer

The pooling layer, also called a subsampling layer, performs subsampling along the spatial dimension of the feature map through a predefined function (such as the maximum or average value) over a local area [48]. Pooling layers can significantly reduce the spatial dimension of the input (changing the length and width while keeping the depth unchanged). The purpose of this layer is not only to retain the critical features of the feature map but also to reduce the computational cost and control overfitting. Reducing the input size through a pooling layer to compress images while retaining important information has the advantage of speeding up the model's operation and reducing sensitivity to differences in feature positions. After pooling, the information concentrates on whether matching characteristics exist in the image rather than on their exact positions. Once specific features are detected in the original input, their relative positions are more significant than their absolute positions; therefore, a feature can still easily be identified when the image is shifted. The two major pooling methods are average pooling and maximum pooling; the former calculates the average of the receptive field, while the latter extracts its maximum value. Similar to the convolutional layer, the stride of the pooling layer determines how many indices must be moved to conduct the next pooling operation on the new subarray [33]. Figure 2 shows average pooling with a 2 × 2 filter and a stride of 1; the average value of each receptive field is used as the subsampled output.
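The average pooling operation of Figure 2 (2 × 2 filter, stride 1) can be sketched as follows; the function name is illustrative.

```python
import numpy as np

def avg_pool(feature_map, size=2, stride=1):
    """Average pooling: replaces each size x size receptive field
    with its mean, sliding by `stride` (Figure 2 uses 2x2, stride 1)."""
    M = feature_map.shape[0]
    O = (M - size) // stride + 1
    out = np.empty((O, O))
    for r in range(O):
        for c in range(O):
            out[r, c] = feature_map[r*stride:r*stride+size,
                                    c*stride:c*stride+size].mean()
    return out
```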

2.4. Dropout Layer

Overfitting occurs when a model fits the training samples too closely and thus cannot produce good results on the validation and test data. Overfitting causes difficulty for machine learning algorithms, and the dropout process is used to solve this problem in deep learning [49]. The concept of dropout is to randomly discard part of the parameters during the model training process and stop specific feature detection work; in other words, it prevents neurons from being activated with a certain probability to ensure that the neural network does not fit the training samples too closely. The main operation is to randomly disconnect hidden layer units with a certain probability, reducing the dependence on some local features and preventing model overfitting. Note that dropout can only be used in training, not in testing.
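A minimal sketch of dropout follows. The inverted-dropout rescaling (dividing surviving activations by the keep probability) is a common implementation choice assumed here, not something stated in the text.

```python
import numpy as np

def dropout(activations, rate=0.5, training=True, rng=None):
    """Inverted dropout: during training, zero each activation with
    probability `rate` and rescale the survivors; at test time the
    layer is an identity, since dropout is used only in training."""
    if not training:
        return activations
    rng = rng or np.random.default_rng()
    mask = rng.random(activations.shape) >= rate   # True = keep neuron
    return activations * mask / (1.0 - rate)
```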

2.5. Data Augmentation Method

Data augmentation is a method for enhancing the performance of a CNN model; it can decrease the loss due to a small database and reduce the possibility of overfitting [50]. Because the CNN model must use a large number of training samples, small variations can be added to the training data through data augmentation to change the training arrays while keeping the labels constant, thus generating more training images without adding new original images [40]. Data augmentation methods include horizontal flip, vertical flip, random crop, grayscale change, shift transform, scale transform, contrast transform, color jittering, rotation, and many others.
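A few of the listed label-preserving augmentations can be sketched with plain NumPy array operations; shift, scale, contrast, and color transforms would need an image library and are omitted here. The function name is illustrative.

```python
import numpy as np

def augment(image):
    """Generate label-preserving variants of one training image:
    horizontal flip, vertical flip, and two rotations."""
    return [
        np.fliplr(image),      # horizontal flip
        np.flipud(image),      # vertical flip
        np.rot90(image, 1),    # rotate 90 degrees
        np.rot90(image, 2),    # rotate 180 degrees
    ]
```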

2.6. Fully Connected Layer

A fully connected layer with specific weights observes the features that best fit the classification results, calculates weights from the dot product between the current and previous layers, and obtains the probability of each classification. Each feature map of the convolutional layer is a feature of the input signal; the deeper the layer, the more abstract the features. To integrate the features of every convolutional layer, a fully connected layer combines these features, and the Softmax function or another classifier outputs the classification results. Softmax is a multinomial logistic function that produces a vector in the range (0, 1) to represent the classification probability distribution over the classes. Softmax is often used as a conversion function for multiclass classification [51], whereas the sigmoid function is suitable for binary classification.
Another task for the fully connected layer is backpropagation; the four main steps of the iterative learning process are forward propagation, calculation of the loss function, backpropagation, and weight updating. The loss function calculates the difference between the predicted value and the actual value, and the weights and biases of the neuron connections are updated according to the error value to minimize the loss. The loss function is an index that evaluates the deviation between the prediction result and the ground truth [52]. The loss function often used in deep learning is cross-entropy, as in Equation (4), where Yi' is the labeling vector that corresponds to the i-th class, whose vector code is 0 or 1, and Yi is the output probability of the i-th class. A pixel-wise Softmax is applied over the final feature map with the cross-entropy loss function, and a smaller loss represents a higher accuracy of the model [53].
Cross Entropy = −Σi Yi' log(Yi)        (4)
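Softmax and the cross-entropy loss of Equation (4) can be sketched as follows. The small epsilon added inside the logarithm is a standard numerical-stability assumption, not part of the paper.

```python
import numpy as np

def softmax(logits):
    """Convert fully connected outputs into a probability
    distribution over the classes (values in (0, 1), summing to 1)."""
    z = logits - logits.max()   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(y_true, y_prob):
    """Equation (4): -sum(Yi' * log(Yi)), where Yi' is the one-hot
    label vector and Yi the predicted class probability."""
    return -np.sum(y_true * np.log(y_prob + 1e-12))
```

For a four-class problem with equal logits, Softmax yields 0.25 per class, and the loss for a one-hot label is −log(0.25) = log 4 ≈ 1.386.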
To update the weights (W), a set of initial weights (Wi) is first given randomly. Then, backpropagation is used to determine which weights cause the most significant loss and to adjust them to reduce the loss. Gradient descent is used to calculate the derivative dL/dW and perform the next weight update, as in Equation (5), in which W is the weight, Wi is the initial weight, L is the loss, and η is the learning rate. The learning rate controls the extent to which gradient descent updates the weights; a higher learning rate produces larger weight updates, so the network learns faster and can take less time to converge to the optimal weights. All of the weights are updated in the direction opposite to the gradient, and backpropagation is performed repeatedly until the loss value reaches its minimum.
W = Wi − η(dL/dW)        (5)
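Repeated application of Equation (5) is plain gradient descent. The sketch below minimizes a toy loss L = (W − 3)², whose gradient is 2(W − 3); the function and variable names are illustrative only.

```python
import numpy as np

def gradient_descent(grad_fn, w_init, lr=0.1, steps=100):
    """Repeated application of Equation (5): W <- W - lr * dL/dW."""
    w = np.asarray(w_init, dtype=float)
    for _ in range(steps):
        w = w - lr * grad_fn(w)
    return w
```

With a suitable learning rate, the weight converges to the minimizer W = 3; too large a learning rate would overshoot instead.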

2.7. Designing and Testing of the CNN Model

When a deep learning method is used to identify, detect, or classify images, a deep learning structure that fits the given data set is first designed, and the performance of the model (such as accuracy, speed, and loss) can then be enhanced by adjusting the network layers and parameters. This study mainly uses a CNN-based framework to design the model and trains and tests it to identify images of RC damage. Perhaps the most common way to split data into sets for model development and assessment, typically called training and validation sets, is cross-validation. Morgan and Jacobs [54] consider that it is difficult to separate out test data in cross-validation and that using just one training, validation, and test split may introduce significant biases associated with the specific data that end up in those splits, leading to suboptimal models and error estimates, particularly for smaller data sets. Cheng and Hoang [55] pointed out that although a reasonable estimate of model performance can be obtained with a large number of data folds, the computational cost can be prohibitive, especially for hybrid intelligent models. Therefore, this study abandons cross-validation and directly splits the data into subsets for training, validation, and testing. The split of the image data set into training and testing sets was based on recommendations and best practices outlined in the relevant literature. The CNN was developed using two-dimensional images, and the division was made to represent the overall distribution of images in the data set. The primary purpose of the study was to develop a convenient damage identification model that domain experts can use to make decisions based on computer-based visual inspections.
To test the performance of the model while training the network, the data set is divided into a training set, a validation set, and a test set. The training set is mainly used for model training, the validation set is used for parameter adjustment, and the test set is used for model performance evaluation. According to Gao and Mosalam [56], the ratio of training to testing data is usually around 8:2; in other work, 70% of the whole data set was used for training [57]. Based on the averages in the literature, and considering that this case has only a small number of images (1600), the proportions for training, validation, and testing were adjusted to 62.5%, 12.5%, and 25%, respectively. The data set of 1600 images, comprising 400 samples each of cracks, efflorescence, rebar exposure, and spalling, is shown in Figure 3; the four classes of damage images were randomly divided into 1000 training images, 200 validation images, and 400 test images.
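The 62.5%/12.5%/25% split of the 1600 images can be sketched as a random partition of shuffled indices; the seed and function name are illustrative.

```python
import numpy as np

def split_dataset(n_images=1600, seed=0):
    """Shuffle image indices and split 62.5% / 12.5% / 25% into
    training, validation, and test sets, as in this study
    (1000 / 200 / 400 images for n_images = 1600)."""
    idx = np.random.default_rng(seed).permutation(n_images)
    n_train = int(n_images * 0.625)
    n_val = int(n_images * 0.125)
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])
```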
These images were captured with digital cameras to record the surface damage of various facilities; this non-contact sensing is quite convenient for collecting image data. The camera and image data are primarily used as visual aids for identifying damage types and assessing conditions. In particular, the maximum measurement distance of the camera is about 5 m, and this method helps obtain image data and surface estimates. This study uses the image sensor of a digital camera, which records image data under hand-held operation. Because of its high pixel count, small size, and easy portability, it is suitable for quick surface inspection and for inspecting restricted spaces during surface quality inspection of various facilities within a small area.
The collection of 1600 RC damage images (Figure 3) came from buildings such as teaching buildings, laboratories, dormitories, gymnasiums, and student restaurants, from which the data sets were selected for training and testing. The images were selected and manually photographed from the outside of the buildings. They were taken from diverse viewing perspectives and scales, free from limitations on the distance between the damage and the camera, the focal length, and the illumination. Each image contains one type of damage: crack, efflorescence, rebar exposure, or spalling. The resolution of the raw images is 2448 × 3264 pixels, which would cause significant calculation costs and training time if input directly into the CNNs, so the 1600 images were resized to a resolution of 310 × 460 pixels. The hyperparameters of the CNN were optimized to construct a model to identify RC damage. Finally, images of the four types of damage were carefully selected and imported into the models for comparison of damage identification.
Different CNN models can be used to handle tasks such as feature extraction, image detection, classification, recognition, and segmentation. Model performance can be improved by establishing a deep network architecture that conforms to the image data set, setting hyperparameters such as the size and number of filters, the network layers, and the activation functions, and iterating and testing repeatedly. This study uses 1600 damage images as a data set for training, validation, and testing and uses deep learning methods for object recognition in images [54]. Through the parameter optimization of the CNN (model training, parameter adjustment, and performance evaluation), models for identifying various damage images can be designed and applied to damage detection to improve the efficiency of building condition evaluation.
A CNN is a deep neural network for supervised learning. The layers, their arrangement, and the number and type of filter kernels in each layer can differ, and the CNN structure affects both model performance and identification accuracy. Understanding the functions and principles of different structures therefore helps in designing deep neural network architectures that meet practical needs. The primary purpose of the convolutional and pooling layers is to extract features and reduce the number of image parameters before the features are input to the fully connected layer for classification. Each neuron is connected only to a kernel-sized region of the previous layer, and the connection weights are shared within the same layer; the CNN structure proposed in this study is shown in Figure 4.
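The feature-extraction operations described above (convolution, ReLU, max pooling) can be illustrated with a minimal numpy sketch; this is not the authors' implementation, only a didactic single-channel version:

```python
import numpy as np

def conv2d(img, kernel):
    # Valid cross-correlation: each output is the weighted sum under the kernel,
    # with the kernel weights shared across all positions (shared weights)
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def maxpool2d(x, k=2):
    # Non-overlapping k x k max pooling; trailing rows/columns are dropped
    h, w = x.shape[0] // k, x.shape[1] // k
    return x[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

relu = lambda x: np.maximum(x, 0)

# One conv -> ReLU -> pool block on a toy 6 x 6 image with a 3 x 3 kernel
feature = maxpool2d(relu(conv2d(np.ones((6, 6)), np.ones((3, 3)))))
print(feature.shape)   # (2, 2)
```

The pooled feature map is what would be flattened and passed to the fully connected layer for classification.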
To enhance model performance, the parameters are adjusted over a series of training and testing images using multiple CNNs (MCNN). Based on the CNN, this research proposes six network architectures (Table 1) and tests them on the identification of the four classes of RC damage. Then, by varying the number and size of the filters, the depth of the convolutional layers, and the number of hidden-layer neurons, and by adding dropout and data augmentation, an optimized recognition model for multiple damage images is designed.
(1)
MCNN1: MCNN1 consists of an input layer, one convolutional layer, one pooling layer, and a fully connected layer (one hidden layer and the output layer). The ReLU activation function is applied to the outputs of the convolutional layer and the hidden layer, and the Softmax function is used for the output layer. The number of filters is 32, the kernel size is 5 × 5, and the hidden layer has 1500 neurons.
(2)
MCNN2: The purpose of the CNN is targeted feature extraction, and deeper networks can usually provide more accurate features [58]. Theoretically, a deep network structure is often more advantageous than a traditional shallow one. Kang et al. [7] likewise found that techniques that make the network deeper raise object detection accuracy. Thus, MCNN2 has two more convolutional layers, two more pooling layers, and one more hidden layer than MCNN1. Sharma et al. [45] observed that doubling the number of filters in each subsequent convolutional layer, which increases the number of feature maps, allows more relevant information to be extracted from the input images. Therefore, MCNN2 increases the number of filters to 64 and 128 in the added layers; the remaining parameters are the same as in MCNN1.
(3)
MCNN3: Atha and Jahanshahi [33] pointed out that the smaller the sliding-window size, the fewer image features the CNN can learn, which affects overall performance; however, a smaller window localizes the damaged area more accurately. Therefore, MCNN3 changes all filter sizes from 5 × 5 to 3 × 3 in the MCNN2 architecture; the remaining layers and parameters are unchanged.
(4)
MCNN4: A dropout layer can be used to reduce overfitting in the fully connected layer. MCNN4 adds a dropout layer with a rate of 0.5 to the architecture of MCNN3.
(5)
MCNN5: MCNN5 removes the 1000-neuron hidden layer, leaving a single hidden layer of 500 neurons, to reduce the model's training time, and tests whether this reduction influences identification accuracy.
(6)
MCNN6: Under widely varying conditions, it is crucial to obtain a well-configured deep structure and a rich data set to address the practical problems that occur in the real world [59]. For deep neural networks, performance can be further improved by increasing the amount of training data [6]. By generating more training images without adding new originals, data augmentation mitigates overfitting with respect to translation, color, and lighting changes [34]. Therefore, MCNN6 applies data augmentation to the architecture of MCNN4 to increase the amount of image data.
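The augmentation settings used for MCNN6 (rescale = 1/255, rotation = 40, shifts/shear/zoom = 0.2; see Table 1) match a standard image-generator configuration. As a simplified, dependency-free stand-in, the sketch below applies only the rescaling and a random horizontal shift; rotation, shear, and zoom from the paper's full pipeline are deliberately omitted:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(img, rescale=1 / 255.0, width_shift=0.2):
    # Rescale pixel values to [0, 1], then apply a random horizontal shift of
    # up to 20% of the width (wrap-around shift for simplicity).
    # Rotation, height shift, shear, and zoom are not reproduced here.
    x = img.astype(np.float32) * rescale
    shift = int(rng.uniform(-width_shift, width_shift) * x.shape[1])
    return np.roll(x, shift, axis=1)

aug = augment(np.full((460, 310, 3), 255, dtype=np.uint8))
print(aug.shape, float(aug.max()))   # (460, 310, 3) 1.0
```

Each augmented copy counts as an additional training sample, which is how data augmentation enlarges the data set without new photography.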
Table 1. Six architectures based on MCNN.
| Model | Layers | Filters | Kernel Size | Hidden Layer Neurons | Dropout | Data Augmentation |
| --- | --- | --- | --- | --- | --- | --- |
| MCNN1 | Conv-1, Maxpool-1, ReLU | 32 | 5 × 5 | 1500 | - | - |
| MCNN2 | Conv-1, Maxpool-1, ReLU; Conv-2, Maxpool-2, ReLU; Conv-3, Maxpool-3, ReLU | 32, 64, 128 | 5 × 5 | 500, 1000 | - | - |
| MCNN3 | Conv-1, Maxpool-1, ReLU; Conv-2, Maxpool-2, ReLU; Conv-3, Maxpool-3, ReLU | 32, 64, 128 | 3 × 3 | 500, 1000 | - | - |
| MCNN4 | Conv-1, Maxpool-1, ReLU; Conv-2, Maxpool-2, ReLU; Conv-3, Maxpool-3, ReLU | 32, 64, 128 | 3 × 3 | 500, 1000 | 0.5 | - |
| MCNN5 | Conv-1, Maxpool-1, ReLU; Conv-2, Maxpool-2, ReLU; Conv-3, Maxpool-3, ReLU | 32, 64, 128 | 3 × 3 | 500 | 0.5 | - |
| MCNN6 | Conv-1, Maxpool-1, ReLU; Conv-2, Maxpool-2, ReLU; Conv-3, Maxpool-3, ReLU | 32, 64, 128 | 3 × 3 | 500, 1000 | 0.5 | rescale = 1/255, rotation = 40, width shift = 0.2, height shift = 0.2, shear = 0.2, zoom = 0.2 |
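Table 1 differs across models in only a handful of hyperparameters, which can be encoded compactly; the field names below are ours, chosen for illustration, not the authors' notation:

```python
# Hyperparameter summary of Table 1 as plain dictionaries.
# "base" is the shared three-block configuration of MCNN3-MCNN6.
base = dict(filters=[32, 64, 128], kernel=3, hidden=[500, 1000],
            dropout=None, augment=False)

models = {
    "MCNN1": dict(filters=[32], kernel=5, hidden=[1500], dropout=None, augment=False),
    "MCNN2": dict(base, kernel=5),          # deeper network, still 5 x 5 kernels
    "MCNN3": dict(base),                    # kernels reduced to 3 x 3
    "MCNN4": dict(base, dropout=0.5),       # dropout added
    "MCNN5": dict(base, hidden=[500], dropout=0.5),  # one hidden layer removed
    "MCNN6": dict(base, dropout=0.5, augment=True),  # data augmentation added
}

# MCNN6 augmentation settings from the last column of Table 1
augmentation = dict(rescale=1 / 255, rotation=40, width_shift=0.2,
                    height_shift=0.2, shear=0.2, zoom=0.2)
print(models["MCNN6"])
```

Reading the table this way makes the ablation structure explicit: each successive model changes exactly one aspect of its predecessor.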

3. Experiments and Results Discussion

3.1. Evaluation of the Model

The performance of a supervised learning model, i.e., its generalization ability in image damage recognition, must be measured by appropriate metrics. Through these metrics, the parameters of the learning algorithm or the variables of the classification target can be adjusted to gradually optimize the model. The evaluation compares the data predicted by the model with the actual data to measure model performance. Table 2 shows the results of identifying the four RC damage image classes with the six models under the various metrics. Among them, accuracy is the most commonly used metric for evaluating supervised learning models; it is the proportion of correct predictions among all predictions, as shown in Equation (6). Instances in which the model correctly identifies an object of interest, or correctly makes no prediction when no object of interest is present, are categorized as true positives (TP) and true negatives (TN), respectively. Instances in which the model predicts an object of interest that is not present, or fails to identify an object of interest that is present, are categorized as false positives (FP) and false negatives (FN), respectively. Classifier evaluation is usually based on prediction accuracy, comparing supervised learning algorithms by the accuracy of the trained classifiers on a specific data set [60]. The accuracy in this study covers the test, training, and validation sets; in particular, the test-set accuracy shows the predictive ability of the model on new data. MCNN6 is the best on the test and validation sets, at 73.6% and 80.21%, respectively. None of the other five models exceeded 70% accuracy on the test set, and MCNN1 was the only model whose accuracy did not exceed 50%, at approximately 25%.
The training-set accuracy of MCNN2 is the best (95%), but its overfitting is also the largest (overfitting is measured as the difference between the training and test accuracies).
Accuracy = (TP + TN)/(TP + FN + FP + TN)  (6)
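Equation (6) translates directly into code; the confusion-matrix counts below are invented purely to demonstrate the calculation:

```python
def accuracy(tp, tn, fp, fn):
    # Equation (6): correct predictions (TP + TN) over all predictions
    return (tp + tn) / (tp + fn + fp + tn)

# Hypothetical counts: 45 TP, 40 TN, 8 FP, 7 FN over 100 predictions
print(accuracy(45, 40, 8, 7))   # 0.85
```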
Efficiency refers to the training time the model needs to recognize images. Because this research performs its computation on a CPU (central processing unit), it is relatively slow compared with computation on a GPU (graphics processing unit); detection performed on a GPU would be more efficient. The experiments show that when the filter size decreases to 3 × 3, the training time is reduced by nearly 50%, and the accuracy is slightly, though not significantly, improved. MCNN4 has the lowest computational cost, with test, training, and validation times of 9.4, 1871, and 371 s, respectively; however, MCNN6 performs best when all metrics are considered. Model training was optimized based on the loss function designed to evaluate performance [61]. The loss is the error between the predicted and actual results; the smaller the loss, the better the image recognition performance. After the dropout layer is added in MCNN4, MCNN5, and MCNN6, the loss is significantly reduced and the overfitting of the model is also improved. The training process shows that the data augmentation technique helps improve model accuracy and effectively avoids overfitting. In addition, compared with MCNN4, MCNN5 reduces the number of hidden layers by removing the 1000-neuron layer; the results show that this change has little effect on model performance. Finally, the training results of the series of hyperparameter adjustments show that MCNN6 has the best recognition ability. The aim is to apply the optimized model to the recognition of various RC damage images, and these evaluation metrics show that the model can meet the needs of practical applications.
Figure 5 shows the accuracy and loss for the damage images during the training and validation steps of the six models. MCNN1 has only one convolutional layer, one pooling layer, and one hidden layer, with 32 filters of kernel size 5 × 5. The test results show that its accuracy remains at approximately 24% and cannot be effectively improved; the loss value stays above 12, which means the model is difficult to train, as shown in Figure 5a. For the remaining five models, as the epochs increase, the accuracy of the training and validation steps increases while the loss decreases, and the overfitting gradually improves. In addition, when the epochs of the best model, MCNN6, are increased to 50, the gap between the training and validation accuracy and loss narrows (Figure 6), and the test accuracy decreases slightly to 72.2%. Although this test accuracy is within the error tolerance, the training time increases considerably.

3.2. Experiments on the Optimized Mode

This study collected four common classes of RC damage images to form a data set: crack, efflorescence, rebar exposure, and spalling. Each damage class has 400 images, for a total of 1600 RC damage images. In each experiment, two classes of damage images are used for classification, giving six image combinations with which to evaluate the ability of the optimized MCNN6 to recognize two damage classes. Type A is crack and efflorescence; Type B is crack and rebar exposure; Type C is crack and spalling; Type D is efflorescence and rebar exposure; Type E is efflorescence and spalling; and Type F is rebar exposure and spalling, as shown in Table 3.
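The six type labels follow directly from choosing two of the four damage classes; a short sketch (the A–F labeling is assumed to follow the order given in the text, which matches lexicographic pair enumeration):

```python
from itertools import combinations

damage = ["crack", "efflorescence", "rebar exposure", "spalling"]
# C(4, 2) = 6 pairwise combinations, labeled Type A through Type F
types = dict(zip("ABCDEF", combinations(damage, 2)))
for label, pair in types.items():
    print(f"Type {label}: {pair[0]} + {pair[1]}")
```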
Figure 7 shows that with MCNN6 trained for 30 epochs, the training-set accuracy for Types A to F is above 80%. The accuracy for Type B is the highest; from epoch 14 onward, it does not fall below 95%. The accuracy for Type D is the second highest; from epoch 18 onward, it does not fall below 90%. For Types A and C, the accuracy does not fall below 90% from epochs 22 and 29, respectively. For Types F and E, from epoch 24 onward, the accuracy does not fall below 85% and 80%, respectively. The red circular dot indicates the starting point of convergence; the accuracy trend gradually increases after this point.
Figure 8 shows that the error between the predicted results of MCNN6 and the actual results decreases as the epochs increase. For Type B, the loss begins to converge at epoch 14 and does not exceed 0.15; at epoch 27 it reaches its minimum of 0.065, which shows that MCNN6 is easiest to train for Type B. Types D, C, and A converge at epochs 15, 24, and 26, respectively, with losses not exceeding 0.25, and the final loss values of Types C and A are very similar. The loss of Type F begins to converge at epoch 18 and does not exceed 0.35. The loss of Type E is the highest of the six types; it converges at epoch 23, does not exceed 0.45, and ends at 0.44. The red diamond indicates the starting point of convergence; the loss gradually decreases after this point.
Commonly, the performance of a deep learning method is defined in terms of accuracy and computational efficiency (i.e., less simulation/computational time) [62]. The MCNN in this study performs multi-class classification; model performance is evaluated on the test-set data, and accuracy, loss, and efficiency are used to confirm the recognition performance of MCNN6 for each pair of damage classes. Table 4 shows the following: (a) MCNN6 has the best recognition performance for Type B, with an accuracy of 96.81% and a loss of 0.07, indicating a high ability to distinguish crack and rebar exposure. (b) The accuracy of MCNN6 for Type D is 90.51%; because efflorescence and rebar exposure look quite different, the model also distinguishes these two damage classes well. (c) The accuracy and loss for Types A and C are very similar, at approximately 88% and 0.20, respectively; MCNN6 has similar recognition capability for crack and spalling and for crack and efflorescence, but Type A is 9.56 min faster than Type C. (d) Type F is rebar exposure and spalling; the accuracy is 85.22% and the loss is 0.31, a slightly reduced performance, likely because rebar exposure and spalling are similar in color and shape. (e) Compared with the other five types, MCNN6 has the worst recognition performance for Type E, with accuracy, loss, and efficiency of 80%, 0.44, and 4677 s, respectively; the model is less able to separate efflorescence from spalling, and this combination takes the longest to train. Finally, this experiment supports the following conclusions: Types A, B, and C have better accuracy, which means that MCNN6 recognizes cracks well because cracks are distinctive compared with the other damage classes; in contrast, Types E and F have poorer accuracy.
This finding shows that MCNN6 does not easily recognize spalling; in other words, spalling is easily confused with efflorescence and rebar exposure.
To achieve automated detection of RC facility conditions, an optimized recognition model is necessary. Because RC facilities can exhibit multiple types of damage simultaneously, the model must perform various damage recognition tasks effectively. This study uses six combinations of RC damage images to evaluate the performance of MCNN6 in multi-class image recognition. The accuracy of MCNN6 across the six damage combinations is 80.0%~96.81%, the loss is 0.07~0.44, and the efficiency is 3302~4677 s. The model proved robust in identifying different classes of damage and generally achieves satisfactory results. Because the CNN has many hyperparameters, each with a different function, this research summarizes the key points of setting them, based on the design, optimization, training, and testing of the CNN on various damage images:
  • The greater the number of filters, the more feature maps are generated; extracting more features from the image makes its content easier to identify. The experiments show that 32, 64, and 128 filters are best; increasing the number to 256 did not improve model accuracy.
  • The smaller the kernel size, the fewer image features the CNN learns per window. The experiments show that a 3 × 3 kernel is best for recognizing these images; with other sizes, or combinations of different kernel sizes, the stability of the model was poor.
  • A CNN requires considerable training data to perform well. This research uses data augmentation to increase the amount of training imagery, and the results show that this approach effectively improves accuracy.
  • Deep architectures are often better than traditional shallow ones. This study found that, with the same number of hidden-layer neurons, the accuracy with multiple hidden layers is slightly higher than with one hidden layer; the number of neurons itself has no significant effect on the model's recognition of damage in images.
  • The quality of the image data (for example, resolution and noise) is a critical factor for damage identification and influences model performance to a certain degree.
  • The number of epochs is related to both accuracy and efficiency. As the epochs increase, accuracy increases up to a limit, and training time increases in proportion to the number of epochs.
To design an optimal CNN recognition model, it is necessary to adjust the structure, the number of network layers, and the hyperparameters, and to train and test continuously to find an architecture suited to the image features. The MCNN6 of this study was evaluated by these metrics, confirming the optimal structure: three convolutional layers, each followed by a pooling layer, with 32, 64, and 128 filters of kernel size 3 × 3, a stride of one, and the ReLU activation function. The fully connected part has two hidden layers (with 500 and 1000 neurons, respectively), and the classification function of the output layer is Softmax. In addition, the dropout rate and epochs are set to 0.5 and 30, respectively, and data augmentation is used to increase the amount of image data. MCNN6 achieves accuracy above 80.0% across a variety of RC damage recognition tasks and provides a favorable direction for automated assessment of facility surface conditions.
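With this architecture fixed, the spatial sizes flowing through the network can be traced by hand. The paper does not state the convolution padding scheme, so the sketch below assumes 'same' padding (spatial size preserved at stride one) and 2 × 2 max pooling with stride 2; under those assumptions, only the pooling layers shrink the feature maps:

```python
def pool(n, k=2):
    return n // k   # 2 x 2 max pooling with stride 2 (floor for odd sizes)

# Resized input images from the paper: 460 x 310 (H x W)
h, w = 460, 310
for filters in (32, 64, 128):
    # 'same' convolution leaves h and w unchanged; pooling halves them
    h, w = pool(h), pool(w)
    print(f"after the {filters}-filter block: {h} x {w} x {filters}")
print("flattened feature length:", h * w * 128)
```

Under these assumptions the flattened vector feeding the 500-neuron hidden layer would have 57 × 38 × 128 = 277,248 elements; with 'valid' padding the figures would be slightly smaller.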
Digital images obtained from building surfaces contain valuable information (damage type and extent). Fast and accurate damage classification through a CNN can assist facility maintainers in assessing the deterioration of building surfaces. This study proposes the MCNN6 architecture, a small neural network with three convolutional layers, three pooling layers, and fully connected layers. Compared with classification using very large neural networks (e.g., VGG16), this model has advantages in training and detection time; combined with its computing efficiency and accuracy, it can meet the efficiency requirements of building detection and condition assessment. MCNN6 recognizes combinations of the four damage classes (Types A–F), showing good versatility in multi-damage recognition. It can perform multi-class image recognition and object classification, demonstrating its capability for advanced feature extraction. A variety of damage (crack, efflorescence, rebar exposure, and spalling) is accurately identified by the model; if verified against expert inspection results, a more accurate damage extent, quantity, location, and degree can be obtained from the images. This innovative method will therefore benefit the development of automated detection systems, shows great potential in non-contact sensing and image classification tasks, and provides strategic directions for technologies in civil engineering and computer science.

4. Conclusions

The difficulty of RC damage recognition lies in using limited images to train a model that generalizes to other images. Deep learning for identifying RC damage images has considerable potential and can replace traditional manual visual inspection, improving the efficiency, speed, and objectivity of detection. Multiple classes of RC damage can appear on the surfaces of buildings or facilities simultaneously, and their shapes, sizes, colors, and characteristics differ. To identify a variety of RC damage effectively, this research designed an optimized CNN architecture through parameter adjustment and a series of training and testing steps. Various combinations of damage images were then used to test the recognition ability of the CNN, and three metrics were implemented to measure model performance. The model was gradually optimized through these metrics to improve image recognition performance and achieve automated RC condition evaluation.
The critical aspect of the model is not its accuracy on the training set but its generalization performance on the test set. The performance of the optimized MCNN6 in this study was evaluated on the test-set data. MCNN6 has the best recognition performance for crack and rebar exposure (Type B), with an accuracy of 96.81% and a loss of 0.07. The accuracy for the other five damage combinations is also higher than 80.0%, and the loss is less than 0.44. Generally, MCNN6 recognizes cracks well but is less effective at recognizing spalling.
The two main components of automated RC damage inspection are image data and image processing technology (including computer vision and deep learning methods). Fine images can be acquired quickly through sensors such as digital cameras, a technology that is now mature and convenient; image processing technology, however, still has room for improvement. In addition, because different damage classes have their own characteristics, they affect the recognition ability of deep learning to varying degrees. It is therefore necessary to test different MCNN architectures to understand how the hyperparameters and network layers affect image recognition. Finally, the results of this research can serve as a reference for optimizing CNN parameters, helping to overcome the limitations of current computer vision and to improve automated recognition models in the future. Deep learning models can effectively identify a variety of complex damage images and, in the future, may replace subjective and inefficient human visual judgment, ensuring high practicability in automated image recognition.

Author Contributions

C.-L.F. edited and wrote the manuscript; Y.-J.C. collected and analyzed the data. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article. The data presented in this study can be requested from the authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mei, Q.; Gül, M. Multi-level feature fusion in densely connected deep-learning architecture and depth-first search for crack segmentation on images collected with smartphones. Struct. Health Monit. 2020, 19, 1726–1744. [Google Scholar] [CrossRef]
  2. Neville, A.M. Properties of Concrete, 4th ed.; Pearson Education Limited: Longman, UK, 2011. [Google Scholar]
  3. Yao, Y.; Tung, S.T.E.; Glisic, B. Crack detection and characterization techniques—An overview. Struct. Control Health Monit. 2014, 21, 1387–1413. [Google Scholar] [CrossRef]
  4. Yang, X.; Li, H.; Yu, Y.; Luo, X.; Huang, T.; Yang, X. Automatic pixel-level crack detection and measurement using fully convolutional network. Comput. Civ. Infrastruct. Eng. 2018, 33, 1090–1109. [Google Scholar] [CrossRef]
  5. Zhang, C.; Chang, C.C.; Jamshidi, M. Concrete bridge surface damage detection using a single-stage detector. Comput. Civ. Infrastruct. Eng. 2020, 35, 389–409. [Google Scholar] [CrossRef]
  6. Li, S.; Zhao, X.; Zhou, G. Automatic pixel-level multiple damage detection of concrete structure using fully convolutional network. Comput. Civ. Infrastruct. Eng. 2019, 34, 616–634. [Google Scholar] [CrossRef]
  7. Kang, D.; Benipal, S.S.; Gopal, D.L.; Cha, Y.J. Hybrid pixel-level concrete crack segmentation and quantification across complex backgrounds using deep learning. Autom. Constr. 2020, 118, 103291. [Google Scholar] [CrossRef]
  8. Graybeal, B.A.; Phares, B.M.; Rolander, D.D.; Moore, M.; Washer, G. Visual inspection of highway bridges. J. Nondestruct. Eval. 2002, 21, 67–83. [Google Scholar]
  9. Jang, K.; Kim, N.; An, Y.K. Deep learning–based autonomous concrete crack evaluation through hybrid image scanning. Struct. Health Monit. 2019, 18, 1722–1737. [Google Scholar]
  10. Kim, H.; Ahn, E.; Shin, M.; Sim, S.H. Crack and noncrack classification from concrete surface images using machine learning. Struct. Health Monit. 2019, 18, 725–738. [Google Scholar] [CrossRef]
  11. Grefenstette, J.J. Genetic algorithms and machine learning. In Proceedings of the Sixth Annual Conference on Computational Learning Theory, Santa Cruz, CA, USA, 26–28 July 1993; pp. 3–4. [Google Scholar]
  12. Zhou, Q.; Zhu, F.; Yang, X.; Wang, F.; Chi, B.; Zhang, Z. Shear capacity estimation of fully grouted reinforced concrete masonry walls using neural network and adaptive neuro-fuzzy inference system models. Constr. Build. Mater. 2017, 153, 937–947. [Google Scholar] [CrossRef]
  13. Chang, C.M.; Lin, T.K.; Chang, C.W. Applications of neural network models for structural health monitoring based on derived modal properties. Measurement 2018, 129, 457–470. [Google Scholar] [CrossRef]
  14. Zhang, M.; Akiyama, M.; Shintani, M.; Xin, J.; Frangopol, D.M. Probabilistic estimation of flexural loading capacity of existing RC structures based on observational corrosion-induced crack width distribution using machine learning. Struct. Saf. 2021, 91, 102098. [Google Scholar] [CrossRef]
  15. Mangalathu, S.; Jang, H.; Hwang, S.H.; Jeon, J.S. Data-driven machine-learning-based seismic failure mode identification of reinforced concrete shear walls. Eng. Struct. 2020, 208, 110331. [Google Scholar] [CrossRef]
  16. Rahman, J.; Ahmed, K.S.; Khan, N.I.; Islam, K.; Mangalathu, S. Data-driven shear strength prediction of steel fiber reinforced concrete beams using machine learning approach. Eng. Struct. 2021, 233, 111743. [Google Scholar] [CrossRef]
  17. Dong, C.Z.; Catbas, F.N. A review of computer vision–based structural health monitoring at local and global levels. Struct. Health Monit. 2020, 20, 692–743. [Google Scholar] [CrossRef]
  18. Adhikari, R.S.; Moselhi, O.; Bagchi, A.; Rahmatian, A. Tracking of defects in reinforced concrete bridges using digital images. J. Comput. Civ. Eng. 2016, 30, 04016004. [Google Scholar] [CrossRef]
  19. Chen, P.H.; Shen, H.K.; Lei, C.Y.; Chang, L.M. Support-vector-machine-based method for automated steel bridge rust assessment. Autom. Constr. 2012, 23, 9–19. [Google Scholar] [CrossRef]
  20. Li, G.; Zhao, X.; Du, K.; Ru, F.; Zhang, Y. Recognition and evaluation of bridge cracks with modified active contour model and greedy search-based support vector machine. Autom. Constr. 2017, 78, 51–61. [Google Scholar] [CrossRef]
  21. Wang, Z.; Cha, Y.J. Unsupervised deep learning approach using a deep auto-encoder with an one-class support vector machine to detect structural damage. Struct. Health Monit. 2020, 20, 406–425. [Google Scholar] [CrossRef]
  22. Shi, Y.; Cui, L.; Qi, Z.; Meng, F.; Chen, Z. Automatic road crack detection using random structured forests. IEEE Trans. Intell. Transp. Syst. 2016, 17, 1–12. [Google Scholar] [CrossRef]
  23. Lee, B.J.; Lee, H.D. Position-invariant neural network for digital pavement crack analysis. Comput. Civ. Infrastruct. Eng. 2004, 19, 105–118. [Google Scholar] [CrossRef]
  24. Abdel-Qader, I.; Pashaie-Rad, S.; Abudayyeh, O.; Yehia, S. PCA-based algorithm for unsupervised bridge crack detection. Adv. Eng. Softw. 2006, 37, 771–778. [Google Scholar] [CrossRef]
  25. Cha, Y.J.; Wang, Z. Unsupervised novelty detection–based structural damage localization using a density peaks-based fast clustering algorithm. Struct. Health Monit. 2018, 17, 313–324. [Google Scholar] [CrossRef]
  26. Diez, A.; Khoa, N.L.D.; Makki Alamdari, M.; Wang, Y.; Chen, F.; Runcie, P. A clustering approach for structural health monitoring on bridges. J. Civ. Struct. Health Monit. 2016, 6, 429–445. [Google Scholar] [CrossRef]
  27. Mathavan, S.; Rahman, M.; Kamal, K. Use of a self-organizing map for crack detection in highly textured pavement images. J. Infrastruct. Syst. 2015, 21, 1–11. [Google Scholar] [CrossRef]
  28. Lui, A.K.F.; Chan, Y.H.; Leung, M.F. Modelling of destinations for data-driven pedestrian trajectory prediction in public buildings. In Proceedings of the IEEE International Conference on Big Data, Orlando, FL, USA, 15–18 December 2021; pp. 1709–1717. [Google Scholar]
  29. Lui, A.K.F.; Chan, Y.H.; Leung, M.F. Modelling of pedestrian movements near an amenity in walkways of public buildings. In Proceedings of the 8th International Conference on Control, Automation and Robotics, Xiamen, China, 8–10 April 2022; pp. 394–400. [Google Scholar]
  30. Zhu, Z.; Brilakis, I. Parameter optimization for automated concrete detection in image data. Autom. Constr. 2010, 19, 944–953. [Google Scholar] [CrossRef]
  31. Wu, R.T.; Jahanshahi, M.R. Data fusion approaches for structural health monitoring and system identification: Past, present, and future. Struct. Health Monit. 2020, 19, 552–586. [Google Scholar]
  32. He, H.X.; Zheng, J.C.; Liao, L.C.; Chen, Y.J. Damage identification based on convolutional neural network and recurrence graph for beam bridge. Struct. Health Monit. 2021, 20, 1392–1408. [Google Scholar] [CrossRef]
  33. Atha, D.J.; Jahanshahi, M.R. Evaluation of deep learning approaches based on convolutional neural networks for corrosion detection. Struct. Health Monit. 2018, 17, 1110–1128. [Google Scholar] [CrossRef]
34. Yeum, C.M.; Dyke, S.J.; Ramirez, J. Visual data classification in post-event building reconnaissance. Eng. Struct. 2018, 155, 16–24.
35. Rubio, J.J.; Kashiwa, T.; Laiteerapong, T.; Deng, W.; Nagai, K.; Escalera, S.; Nakayama, K.; Matsuo, Y.; Prendinger, H. Multi-class structural damage segmentation using fully convolutional networks. Comput. Ind. 2019, 112, 103121.
36. Davoudi, R.; Miller, G.R.; Calvi, P.; Kutz, J.N. Computer vision–based damage and stress state estimation for reinforced concrete and steel fiber–reinforced concrete panels. Struct. Health Monit. 2019, 19, 1645–1665.
37. Kim, B.; Cho, S. Image-based concrete crack assessment using mask and region-based convolutional neural network. Struct. Control Health Monit. 2019, 26, 1–15.
38. Beckman, G.H.; Polyzois, D.; Cha, Y.J. Deep learning-based automatic volumetric damage quantification using depth camera. Autom. Constr. 2019, 99, 114–124.
39. Liang, X. Image-based post-disaster inspection of reinforced concrete bridge systems using deep learning with Bayesian optimization. Comput. Civ. Infrastruct. Eng. 2019, 34, 415–430.
40. Yeum, C.M.; Choi, J.; Dyke, S.J. Automated region-of-interest localization and classification for vision-based visual assessment of civil infrastructure. Struct. Health Monit. 2019, 18, 675–689.
41. Spencer, B.F.; Hoskere, V.; Narazaki, Y. Advances in computer vision-based civil infrastructure inspection and monitoring. Engineering 2019, 5, 199–222.
42. He, L.; Lian, J.; Ma, B. Intelligent damage identification method for large structures based on strain modal parameters. J. Vib. Control 2014, 20, 1783–1795.
43. Coates, A.; Lee, H.; Ng, A.Y. An analysis of single-layer networks in unsupervised feature learning. J. Mach. Learn. Res. 2011, 15, 215–223.
44. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
45. Sharma, A.; Liu, X.; Yang, X.; Shi, D. A patch-based convolutional neural network for remote sensing image classification. Neural Netw. 2017, 95, 19–28.
46. Leng, S.; Lin, J.R.; Hu, Z.Z.; Shen, X. A hybrid data mining method for tunnel engineering based on real-time monitoring data from tunnel boring machines. IEEE Access 2020, 8, 90430–90449.
47. Xu, Y.; Bao, Y.; Chen, J.; Zuo, W.; Li, H. Surface fatigue crack identification in steel box girder of bridges by a deep fusion convolutional neural network based on consumer-grade camera images. Struct. Health Monit. 2019, 18, 653–674.
48. Nogueira, K.; Penatti, O.A.B.; Dos Santos, J.A. Towards better exploiting convolutional neural networks for remote sensing scene classification. Pattern Recognit. 2017, 61, 539–556.
49. Cha, Y.J.; Choi, W.; Suh, G.; Mahmoudkhani, S.; Büyüköztürk, O. Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types. Comput. Civ. Infrastruct. Eng. 2018, 33, 731–747.
50. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
51. Bengio, Y. Learning deep architectures for AI. Found. Trends Mach. Learn. 2009, 2, 1–127.
52. Zhang, L.; Shen, J.; Zhu, B. A research on an improved Unet-based concrete crack detection algorithm. Struct. Health Monit. 2021, 20, 1864–1879.
53. Bai, Y.; Mas, E.; Koshimura, S. Towards operational satellite-based damage-mapping using U-net convolutional network: A case study of 2011 Tohoku Earthquake-Tsunami. Remote Sens. 2018, 10, 1626.
54. Morgan, D.; Jacobs, R. Opportunities and challenges for machine learning in materials science. Annu. Rev. Mater. Res. 2020, 50, 71–103.
55. Cheng, M.Y.; Hoang, N.D. Groutability prediction of microfine cement based soil improvement using evolutionary LS-SVM inference model. J. Civ. Eng. Manag. 2014, 20, 839–848.
56. Gao, Y.; Mosalam, K.M. Deep transfer learning for image-based structural damage recognition. Comput. Civ. Infrastruct. Eng. 2018, 33, 748–768.
57. Ray, R.; Kumar, D.; Samui, P.; Roy, L.B.; Goh, A.; Zhang, W. Application of soft computing techniques for shallow foundation reliability in geotechnical engineering. Geosci. Front. 2020, 12, 375–383.
58. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
59. Cha, Y.J.; Choi, W.; Büyüköztürk, O. Deep learning-based crack damage detection using convolutional neural networks. Comput. Civ. Infrastruct. Eng. 2017, 32, 361–378.
60. Kotsiantis, S.B. Supervised machine learning: A review of classification techniques. Informatica 2007, 31, 249–268.
61. Liu, C.; Sepasgozar, S.M.; Shirowzhan, S.; Mohammadi, G. Applications of object detection in modular construction based on a comparative evaluation of deep learning algorithms. Constr. Innov. 2021, 22, 141–159.
62. Salehi, H.; Burgueño, R. Emerging artificial intelligence methods in structural engineering. Eng. Struct. 2018, 171, 170–189.
Figure 1. The calculation process for the features of the convolutional layer.
Figure 2. Example of average pooling with a 2 × 2 filter and stride of 1.
Figure 3. Four classes of images of RC damage.
Figure 4. The traditional architecture of CNN.
Figure 5. The accuracy and loss of the six models for training and verification of damage images.
Figure 6. Accuracy and loss of MCNN6 training and verification (epoch = 50).
Figure 7. The accuracy of MCNN6 for six types of RC damage.
Figure 8. MCNN6 loss of six types of RC damage.
Table 2. Test results based on the MCNN.

| Metric | MCNN1 | MCNN2 | MCNN3 | MCNN4 | MCNN5 | MCNN6 |
|---|---|---|---|---|---|---|
| Test accuracy (%) | 24.84 | 50.16 | 52.4 | 66.67 | 67.78 | 73.6 |
| Training accuracy (%) | 24.35 | 95 | 88.02 | 89.38 | 87.63 | 73.77 |
| Validation accuracy (%) | 25 | 57.39 | 62.15 | 77.46 | 75 | 80.21 |
| Test efficiency (s) | 23.7 | 15.5 | 12.1 | 9.4 | 8.5 | 6.2 |
| Training efficiency (s) | 5300 | 3843 | 1887 | 1871 | 2035 | 2045 |
| Validation efficiency (s) | 1060 | 770 | 378 | 371 | 407 | 425 |
| Loss | 12.14 | 0.81 | 0.7 | 0.49 | 0.48 | 0.61 |
| Overfitting | 0.005 | 0.448 | 0.356 | 0.227 | 0.199 | 0.002 |
Table 3. Combinations of six types of RC damage images. Each of Types A–F is a combination of the damage classes crack, efflorescence, rebar exposure, and spalling (e.g., Type B combines crack and rebar exposure).
Table 4. Evaluation results of MCNN6 on six types of damage.

| Metric | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| Accuracy (%) | 87.5 | 96.81 | 88.83 | 90.51 | 80.0 | 85.22 |
| Loss | 0.20 | 0.07 | 0.21 | 0.18 | 0.44 | 0.31 |
| Efficiency (s) | 3302 | 3845 | 3875 | 4239 | 4677 | 3990 |
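The per-type comparison in Table 4 can be reproduced with a short sketch. The values are copied from the table; the ranking rule (highest accuracy, with lower loss as a tie-breaker) is an assumption for illustration, not a procedure stated in the paper:

```python
# Sketch: rank the six damage-type combinations by MCNN6 performance.
# Values copied from Table 4; the tie-breaking rule is illustrative only.
results = {
    "A": {"accuracy": 87.50, "loss": 0.20, "efficiency_s": 3302},
    "B": {"accuracy": 96.81, "loss": 0.07, "efficiency_s": 3845},
    "C": {"accuracy": 88.83, "loss": 0.21, "efficiency_s": 3875},
    "D": {"accuracy": 90.51, "loss": 0.18, "efficiency_s": 4239},
    "E": {"accuracy": 80.00, "loss": 0.44, "efficiency_s": 4677},
    "F": {"accuracy": 85.22, "loss": 0.31, "efficiency_s": 3990},
}

# Highest accuracy first; if two types tie on accuracy, prefer the lower loss.
best = max(results, key=lambda t: (results[t]["accuracy"], -results[t]["loss"]))
print(best, results[best]["accuracy"], results[best]["loss"])  # B 96.81 0.07
```

Ranking by this rule recovers the paper's conclusion that Type B (crack and rebar exposure) is identified best, at 96.81% accuracy with a loss of 0.07.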
Fan, C.-L.; Chung, Y.-J. Design and Optimization of CNN Architecture to Identify the Types of Damage Imagery. Mathematics 2022, 10, 3483. https://0-doi-org.brum.beds.ac.uk/10.3390/math10193483