This chapter summarizes previous research on predicting the soundness of sewage pipes, screening pipes for prioritization, and extracting deterioration factors, together with research that uses XAI to interpret machine-learning classification models. We then describe the novelty of the present study with respect to these two bodies of work.
2.1. Summary of Previous Research
There are existing attempts to predict the deterioration of sewage pipes using statistical and machine-learning methods. In addition, the factors related to the deterioration of sewage pipes have been examined by interpreting the constructed models.
As statistical methods, some studies have applied regression analysis [7,8] or logistic regression analysis [9,10,11,12,13]. As an example of regression analysis, Gedam et al. [8] constructed a regression model using pipe age, pipe diameter, pipe material, and pipe depth as explanatory variables; the regression coefficients showed that pipe age affects deterioration in a statistically significant manner. As an example of logistic regression analysis, Ana et al. [10] constructed a binary logistic regression model that classifies the condition of sewer pipes into two classes (good or bad condition). The constructed model showed that pipe age, pipe material, and pipe length are important factors influencing deterioration. However, it has been pointed out that the deterioration of sewage pipes is a non-linear process and that it is difficult for statistical models to predict this process with high accuracy [14]. Machine learning models, by contrast, can capture both linear and non-linear relationships between input factors and the condition of sewage pipes, and machine learning approaches have therefore also been investigated [15].
As an approach involving machine learning models, Sousa et al. [15] compared the performance of artificial neural networks (ANNs) and support vector machines (SVMs) with that of logistic regression and showed that ANNs performed best.
Harvey and McBean [16] compared the performance of decision trees and SVMs and showed that decision trees perform better. In addition, they applied Random Forest to predict the state of the pipe (good or poor) and obtained good results [17].
Laakso et al. [18] built a model to predict the state (good or poor) of a pipe using Random Forest and evaluated the importance of variables using the Boruta algorithm.
Winkler et al. [19] used boosted decision trees, an extension of the decision tree, to perform binary classification of pipe states. The importance of the variables was assessed by calculating the feature importance of the constructed model.
Nguyen et al. [20] built models to predict the pipe condition (Good Condition or Bad Condition) using 17 different machine learning methods (e.g., Random Forest, SVM, KNN) and compared their classification performance. Random Forest had the best classification performance, and its feature importance indicated that pipe material was the most important factor, followed by pipe age.
In September 2011, the Ministry of Land, Infrastructure, Transport and Tourism and the National Institute for Land and Infrastructure Management (NILIM) released the NILIM database (DB), consisting of sewage-pipe deterioration data, to support the introduction of asset management for sewage projects. Matsumiya et al. [21] quantitatively estimated a soundness prediction equation and predicted the amount of renovation work required using the NILIM DB. However, the proposed soundness prediction equation uses only the number of years since installation as an explanatory variable, and it cannot take into consideration differences in soundness based on pipe type, pipe diameter, number of attached pipes, etc. Moreover, although this DB is suitable for predicting the amount of renovation work required for an entire municipality, it cannot support detailed predictions such as which specific sewage pipe is damaged and to what extent.
Fujiu et al. [22,23,24] used the NILIM DB, converted the qualitative and quantitative data obtainable from the DB into differential functions to prevent loss of information, and applied linear discriminant analysis. They were thus able to discriminate each sewage pipe span between Group 0, a set of pipes with a large degree of deterioration (urgency levels I and II), and Group 1, a set of pipes with a small degree of deterioration (urgency levels III and IV).
Meanwhile, we previously used the NILIM DB to construct an urgency classification model using a one-dimensional convolutional neural network (1D-CNN), a type of deep learning method, and concluded that improving the classification performance requires additional variables that can reproduce the environment in which the sewage pipes are buried [25].
This study aims to identify factors influencing the deterioration of sewage pipes by using Explainable AI (XAI) for machine learning models.
In a study attempting to interpret machine learning models using XAI, Xudong et al. [26] tested five algorithms, such as LightGBM, and an artificial neural network as machine learning methods on water network maintenance data and concluded that LightGBM exhibited the best prediction performance. Moreover, this classification model was interpreted using Shapley additive explanations (SHAP), which clarified that the socioeconomic factors of the local community influenced water pipe damage.
Ito et al. [27] focused on the corrosion of communication pipelines. They obtained the buried state of these pipelines from National Land Numerical Information, combined these data with inspection results, and built a binary classification model for determining the presence of corrosion in communication pipelines using XGBoost. In addition, they applied permutation importance to the constructed model and analyzed the importance of explanatory variables.
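As a minimal sketch of the idea behind permutation importance, the following pure-Python example shuffles one feature column at a time and records the average drop in accuracy. The toy data, the feature names, and the hand-written rule `predict` are hypothetical and only illustrate the technique; they are not the data or model of Ito et al. [27].

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows for which the model prediction equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle only column j, breaking its link to the labels.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: [pipe_age_years, pipe_diameter_mm]
X = [[5, 250], [10, 300], [15, 250], [20, 400],
     [35, 300], [40, 250], [45, 400], [50, 300]]
y = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = deteriorated (made-up labels)


def predict(row):
    """Hypothetical stand-in for a trained classifier: uses pipe age only."""
    return 1 if row[0] > 30 else 0


importances = permutation_importance(predict, X, y)
print(dict(zip(["pipe_age_years", "pipe_diameter_mm"], importances)))
```

Because the stand-in model ignores the diameter, shuffling that column leaves the accuracy unchanged and its importance is zero, whereas shuffling the age column degrades the accuracy. Note that the resulting values express only the magnitude of each variable's influence, not its direction.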
Tsukamoto et al. [28] built an evacuation selection behavior model for residents using a neural network and clarified the factors that impacted evacuation behavior selection and evacuation site selection by applying permutation importance (PI) analysis and partial dependence (PD) analysis as XAI methods.
Koori et al. [29] used Random Forest to build a prediction model for landslide occurrence points and used SHAP, which is a type of XAI, to explain the prediction model globally and locally, thereby clarifying the basis of judgment of the prediction result.
Tatsuta et al. [30] used bridge chart information in an attempt to estimate the cause of damage and the repair method using a gradient boosting decision tree (GBDT). They also interpreted the trained model using SHAP to analyze the effects of the specifications on the cause of damage and the repair/reinforcement method, thereby demonstrating the validity of the model.
2.2. Positioning of This Study
This study has two novel aspects. First, multi-class classification models for sewage pipe soundness are built using machine learning methods.
Machine learning methods have been used previously to predict the deterioration of sewage pipes, but all of these studies formulated the task as a binary classification problem by combining the soundness ranks into two groups. The pipelines used in this study have four soundness ranks, and different measures are required at each rank. In this study, the soundness ranks are therefore not combined into binary groups, with the aim of building a more practical model. To the best of the authors' knowledge, no previous study has used machine learning methods for multi-class classification that considers the buried condition of sewage pipes, as in this study.
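The difference between the two problem formulations can be sketched as follows. The example assumes that the four ranks are labeled I to IV, as in the urgency levels described in Section 2.1, and uses the grouping reported there (I and II versus III and IV) for the binary case; the sample labels and function names are hypothetical.

```python
from collections import Counter

# Assumed rank labels, ordered from most to least deteriorated.
RANKS = ["I", "II", "III", "IV"]


def to_binary_group(rank):
    """Binary formulation: Group 0 = ranks I and II, Group 1 = ranks III and IV."""
    return 0 if rank in ("I", "II") else 1


def to_multiclass_label(rank):
    """Multi-class formulation: keep all four ranks as separate classes 0..3."""
    return RANKS.index(rank)


# Hypothetical inspection results for eight pipe spans.
ranks = ["I", "III", "II", "IV", "IV", "II", "III", "I"]

binary_labels = [to_binary_group(r) for r in ranks]
multi_labels = [to_multiclass_label(r) for r in ranks]

print(Counter(binary_labels))  # two classes: ranks I/II and III/IV are merged
print(Counter(multi_labels))   # four classes: each rank remains distinguishable
```

In the binary formulation, a pipe of rank I and a pipe of rank II receive the same target label, so a model trained on it cannot distinguish between ranks that call for different measures.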
Second, the Explainable AI method SHAP is used to interpret the classification model and to examine the deterioration factors of sewage pipes. As mentioned in the previous section, the deterioration process in sewage pipes is non-linear, and machine learning methods can be used to examine the deterioration process more accurately than statistical methods. We therefore consider that interpreting deterioration prediction models built using machine learning methods can provide appropriate knowledge on the deterioration factors. Previous studies have investigated deterioration factors by calculating feature importance in Random Forest. Feature importance makes it possible to identify the magnitude of the influence of the explanatory variables on the prediction, but it cannot identify whether that influence is positive or negative. There are studies that use XAI methods, including SHAP, to interpret classification models, gain new knowledge, and examine the validity of the models. However, to the best of the authors' knowledge, no studies have applied SHAP to sewage pipes to extract the factors influencing their deterioration.
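The distinction between magnitude and direction can be illustrated with a small sketch that computes exact Shapley values, the quantity that SHAP approximates, for a single prediction. As a simplification, the example assumes that "absent" features are replaced by the values of one reference instance, whereas SHAP averages over background data. The scoring function `score`, its coefficients, and the feature values are made up for illustration and do not represent findings of this study.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a single baseline instance."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the instance value, the rest the baseline value.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def score(z):
    """Hypothetical deterioration score with made-up coefficients."""
    age, diameter, depth = z
    return 0.04 * age - 0.002 * diameter + 0.0 * depth


x = [50.0, 400.0, 2.0]         # instance to explain (hypothetical values)
baseline = [25.0, 300.0, 1.5]  # reference instance (hypothetical values)

phi = shapley_values(score, x, baseline)
print(phi)                     # approximately [1.0, -0.2, 0.0]
print([abs(p) for p in phi])   # magnitudes only: the sign information is lost
```

In this toy case, the first feature raises the score and the second lowers it, which a magnitude-only measure such as feature importance would not reveal. The contributions also sum to the difference between the score at `x` and the score at `baseline`, which is the additivity property that gives SHAP its name.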