Article

Diversified Kernel Latent Variable Space and Multi-Objective Optimization for Selective Ensemble Learning-Based Soft Sensor

1 School of Mechanical and Electrical Engineering, Xi’an University of Architecture and Technology, Xi’an 710055, China
2 Engineering Comprehensive Training Center, Xi’an University of Architecture and Technology, Xi’an 710055, China
3 School of Science, Xi’an University of Architecture and Technology, Xi’an 710055, China
* Author to whom correspondence should be addressed.
Submission received: 3 April 2023 / Revised: 16 April 2023 / Accepted: 19 April 2023 / Published: 22 April 2023

Abstract

The improvement of data-driven soft sensor modeling methods and techniques for industrial processes has strongly promoted the development of the intelligent process industry. Among these methods, ensemble learning is an excellent modeling framework. Accuracy and diversity are two key factors that run through every stage of building an ensemble learning-based soft sensor. Existing base model generation and ensemble pruning methods usually consider the two factors separately, which has limited the development of high-performance but low-complexity soft sensors. To address this issue, a selective ensemble learning-based soft sensor modeling method based on multi-kernel latent variable space and evolutionary multi-objective optimization, referred to as MOSE-MLV-VSPLS, is proposed. This method designs a multiple diversity enhancement mechanism in the base model generation stage: diversified input variable subspaces are first constructed using the maximum information coefficient on bootstrapping random resampling subsets, and a set of base models that combine accuracy and diversity is then generated on supervised latent variable subspaces under multiple kernel function perturbations. Further, two quantifiable parameters are designed for accuracy and diversity, and the multi-objective gray wolf optimization algorithm is used to select the subset of base models that jointly optimizes these two criteria, achieving effective ensemble pruning at the model ensemble stage. The MOSE-MLV-VSPLS method is applied to two typical industrial processes, and the experimental results show that the method is effective and superior for selective ensemble-based soft sensor modeling.

1. Introduction

The accurate prediction of key process parameters of industrial processes is one of the important aspects in the development of smart manufacturing in process industries [1,2,3]. Data-driven soft sensors that are constructed based on state variables and detection variables in industrial processes are widely used in chemical, pharmaceutical, and metallurgical industries because of their ability to describe actual process states more realistically and accurately, which strongly supports the development of intelligent levels of industrial processes [4,5,6,7,8,9]. However, it is worth noting that industrial big data often have more complex data characteristics. On one hand, the operational variables of the production process obtained according to a large number of instruments available at the production site show strong correlation and redundancy characteristics. On the other hand, the internal mechanism of the industrial production process is complex, and the relationship between process variables and detection indicators is strongly nonlinear [10,11]. The complex data relationships bring challenges to the performance requirements of accuracy and reliability of industrial process soft sensors.
According to differences in the modeling framework, data-driven soft sensors are classified into global models and local models. The global modeling framework uses a learning algorithm such as an artificial neural network [12,13,14,15] or Gaussian process regression [16] to generate a single soft sensor model intended to describe the whole process from the data. However, it is difficult for even a superior learning algorithm to comprehensively and adequately learn all the information embedded in such complex data relationships, so these soft sensors tend to exhibit unstable generalization performance in applications. In contrast, the local modeling framework adopts the idea of “divide and conquer”: it generates a series of “good but different” base learners for the same problem and takes advantage of “group wisdom” to reach a more reasonable conclusion [17,18]. Among local methods, the ensemble learning modeling approach, which has developed rapidly over the last decade or so, has achieved good results in practical tasks such as classification and parameter prediction [19,20,21].
The improvement of the generalization performance of an ensemble learning-based soft sensor consists of two important steps. The first is to generate a set of base models with high accuracy and diversity levels [17]. Research shows that input variable subspaces that are strongly correlated with the output variables but weakly redundant are a prerequisite for accurate and robust base model construction [22,23,24]. The maximum information coefficient (MIC) is a filter method that can measure variable correlations while taking into account both the linear and nonlinear characteristics of industrial process data, and it addresses redundant input variables using the max-dependency metric [25,26]. Latent variable models (LVMs) constructed with the kernel method use nonlinear mappings to deeply explore the potential data relationships between variables and extract more useful data information through a relatively simple and straightforward training process, which makes them suitable for the rapid construction of soft sensor base models for industrial data [27,28,29].
In addition, maintaining a high level of difference between base models is key to the success of ensemble learning soft sensor models [17]. Therefore, the design of the variable subspace diversity mechanism at the base model generation stage is particularly important. Researchers add perturbation mechanisms at the level of input samples, variable feature extraction, etc. to enhance the randomness of each stage and thereby raise the level of diversity; examples include multiple random resampling [30,31,32] and the random feature subspace method. However, these methods make it difficult to obtain modeling variable subspaces with large variability when the sample size is insufficient or when there is little variability in the variable relationships between local sample sets, so diversity among base models is hard to ensure. It is noteworthy that different kernel function types and kernel parameters have different characteristics in LVMs [33,34,35]; thus, the same input variable structure will be mapped to different latent variable modeling subspaces (LVS). This characteristic of LVMs can be added to the base model diversity enhancement mechanism [29,36].
In ensemble learning, many could be better than all [37,38]. Therefore, selectively ensembling a subset of base models that, by certain standards, shows better comprehensive performance and lower mutual similarity is another key step in constructing a high-performance ensemble learning soft sensor. This selection process is also called ensemble pruning [39,40]. Whether the pruning effect is satisfactory depends on reasonable quantitative indexes for base model accuracy and diversity on the one hand and on quick, easy selection methods on the other. For soft sensor models, accuracy is often measured by quantitative indicators such as the root-mean-square error (RMSE) or mean square error (MSE) [7,18]. However, there is no unified standard for quantifying diversity; mutual information between variables, covariance, and other correlation or complementarity indicators are usually chosen [41].
The base models in the subensemble set obtained by ensemble pruning are expected to be both precise and different, but this selection problem is often NP-hard and even intractable to approximate, so the base model selection method has always been a focus of researchers. In existing work, a comprehensive performance evaluation function of the subensemble set is constructed from quantitative indicators of accuracy and diversity using weighting algorithms and similar methods, and a greedy search or optimization algorithm is then used to solve the evaluation function for ensemble pruning. However, these methods always require multi-step analysis and a relatively large amount of overall computation. In terms of detailed parameter design, not only does the weight between the two indicators need to be constantly adjusted according to changing model conditions and data characteristics, but the number of base models in the subensemble set also needs to be set manually in advance [40,42,43].
Motivated by the above problems of ensemble soft sensor modeling, the mechanism of base model diversity generation and the method of base model ensemble pruning are the focus of this study. The main contributions are as follows:
(1) A diversified base model construction method based on adaptive multi-kernel supervised LVSs is proposed. The method takes advantage of the variation among different kernel function types and kernel function parameters: it uses multiple kernel functions and an adaptive kernel parameter selection algorithm to map each variable space obtained by variable selection to different yet reasonable modeling LVSs, adds random perturbations to the construction of the variable subspaces, and generates the base model parameters.
(2) According to the predictions of each base model on the verification set, a diversity quantification parameter called the comprehensive Euclidean distance (CED) is designed, which is used to evaluate the diversity level of the subensemble model set during ensemble pruning.
(3) A subensemble set generation method is proposed that formalizes ensemble pruning with simultaneous diversity and accuracy requirements as a multi-objective optimization problem. The selection of each base model is defined as a decision variable, and the quantitative parameters of accuracy and diversity are used as the objective functions. The multi-objective gray wolf optimization algorithm is applied to solve the problem, which simplifies the ensemble pruning process.
(4) A detailed comparative experiment is designed to verify the effectiveness of this research method.

2. Methods

In this section, the preliminaries about the maximal information coefficient (MIC), kernel partial least squares (KPLS), and multi-objective gray wolf optimization algorithm (MOGWO) are briefly introduced.

2.1. Maximal Information Coefficient (MIC)

The maximal information coefficient was proposed by Reshef et al. in 2011 and has been called “a correlation for the 21st century” [25]. Relying on its universal, fair, and symmetrical properties, it can capture and quantify a wide range of complex relationships between variables, provides similar scores to equally noisy relationships of different types, and is robust to outliers. Therefore, MIC is a good indicator for measuring the strength of correlations between process variables and target variables in nonlinear industrial process data, helps to remove redundant variables, and supports the construction of highly robust models.
MIC is the maximum value of normalized mutual information obtained from an optimal discrete partition. The range of MIC is between 0 and 1. A higher MIC value indicates that the two variables have a closer relationship; MIC(X,Y) = 0 indicates that the two variables X and Y are independent, whereas MIC(X,Y) = 1 indicates a deterministic relationship between them.
The specific steps for the correlation analysis between m-dimensional process variable X and target variable y using MIC are as follows:
First, combine variable X and variable y into a finite ordered dataset D = {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}. Define a grid G = (r, c) and grid the scatterplot consisting of the dataset D using G. Even when the number of rows r and the number of columns c are fixed, different grids can be obtained by different partitioning methods.
Then, calculate the mutual information between variables X and Y according to the frequency of scattered points in each cell of grid G:

$$ I(X;Y) = H(X|G) + H(Y|G) - H(X,Y|G) = I(D|G(r,c)) \quad (1) $$

where

$$ H(X|G) = -\sum_{i=1}^{r} \frac{n_i}{n}\log_2\frac{n_i}{n}, \quad H(Y|G) = -\sum_{j=1}^{c} \frac{n_j}{n}\log_2\frac{n_j}{n}, \quad H(X,Y|G) = -\sum_{i=1}^{r}\sum_{j=1}^{c} \frac{n_{ij}}{n}\log_2\frac{n_{ij}}{n} $$

where n is the total number of scattered points, n_i is the number of points in the ith row of grid G, n_j is the number of points in the jth column of grid G, and n_{ij} is the number of points in the cell formed by the ith row and jth column of G.
Changing the grid division produces different mutual information values, the maximum of which is

$$ I_{\max}(D, r, c) = \max_{k} I(D|G(r,c)_k), \quad k = 1, 2, \ldots \quad (2) $$
Finally, for any number of rows r and columns c, MIC standardizes the maximum mutual information to [0, 1] by Equation (3):

$$ \mathrm{MIC}(D) = \max_{rc < B(n)} \frac{I_{\max}(D, r, c)}{\log_2 \min(r, c)} \quad (3) $$

where B(n) = n^{0.6} is the mesh division fineness, a function of the sample size.
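To make the procedure concrete, the following minimal Python sketch computes a MIC-style score using equal-frequency grids only; the full algorithm of Reshef et al. also optimizes the partition boundaries within each grid size, so this simplification generally underestimates MIC. All function and variable names here are illustrative.

```python
# Hedged sketch of the MIC computation above, restricted to equipartition grids.
import numpy as np

def mutual_information(x, y, r, c):
    """Mutual information for an r-by-c equal-frequency grid, in bits."""
    n = len(x)
    x_edges = np.quantile(x, np.linspace(0, 1, r + 1))
    y_edges = np.quantile(y, np.linspace(0, 1, c + 1))
    counts, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges])
    p_xy = counts / n
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    nz = p_xy > 0
    return np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz]))

def mic(x, y):
    """MIC(D): max over grids with r*c < B(n) of I_max / log2(min(r, c))."""
    n = len(x)
    b = int(n ** 0.6)                 # mesh division fineness B(n)
    best = 0.0
    for r in range(2, b + 1):
        for c in range(2, b // r + 1):
            if r * c >= b:
                continue
            best = max(best, mutual_information(x, y, r, c) / np.log2(min(r, c)))
    return best

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 500)
print(mic(x, x ** 2 + 0.05 * rng.normal(size=500)))  # strong nonlinear relation
```

A noisy quadratic relationship like the one in the last line scores far higher than an independent pair would, which is the property the variable selection step below relies on.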

2.2. Kernel Partial Least Squares Algorithm (KPLS)

The KPLS algorithm extends PLS through the kernel function: the original data matrix is replaced by a kernel matrix, and PLS is then used to model the kernel matrix linearly [44].
Let the low-dimensional input variable space be X = [x_1, x_2, ..., x_d] ∈ R^{n×d}, where d is the number of input variables and n is the number of samples. Assume a nonlinear mapping Φ: X → L that projects X into the high-dimensional latent space L; all input samples can then be expressed as φ(X) = [φ(x_1), φ(x_2), ..., φ(x_n)].
In practice, when constructing the latent variable space, a kernel function of known form is introduced in place of the explicit nonlinear mapping; the corresponding kernel matrix is then computed, and its eigenvectors are solved to project the dataset in the high-dimensional space.
The kernel function is expressed as

$$ K(x_i, x_j) = \varphi(x_i)^{T} \varphi(x_j) $$

where x_i and x_j are input samples in the low-dimensional space, i, j = 1, 2, ..., n, and K(·,·) is the kernel function.
Therefore, the process of KPLS is briefly described as follows:
First, using the kernel function K, the inner products of the high-dimensional latent variable space data are collected into the n × n Gram kernel matrix:

$$ K(X, X) = \varphi(X)\varphi(X)^{T}, \qquad [K]_{ij} = \varphi(x_i)^{T}\varphi(x_j) $$
Next, like the PLS algorithm, the KPLS model is decomposed into
$$ K = TP^{T} + E, \qquad Y = UQ^{T} + F $$

where T = [t_1, t_2, ..., t_l] ∈ R^{n×l} is the score matrix of the kernel matrix K, and U = [u_1, u_2, ..., u_l] ∈ R^{n×l} is the score matrix of the target variable Y. P and Q are the loading matrices of K and Y, respectively, E and F are the corresponding residual matrices, and l is the number of latent variables.
Finally, nonlinear iterative partial least squares (NIPALS) can be used to solve the model; the specific steps can be found in the literature [19].
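The following is a minimal NIPALS-style KPLS sketch written from the decomposition above, assuming a precomputed Gram matrix; it is not the authors' implementation, and all names are illustrative.

```python
# Hedged KPLS-by-NIPALS sketch: extract score matrices T (for K) and U (for Y).
import numpy as np

def center_kernel(K):
    """Center the Gram matrix in feature space."""
    n = K.shape[0]
    one = np.ones((n, n)) / n
    return K - one @ K - K @ one + one @ K @ one

def kpls_fit(K, Y, n_components, tol=1e-10, max_iter=500):
    K = center_kernel(K.copy())
    Y = Y.astype(float).copy()
    n = K.shape[0]
    T = np.zeros((n, n_components))
    U = np.zeros((n, n_components))
    for a in range(n_components):
        u = Y[:, [0]]                       # initialize with a column of Y
        for _ in range(max_iter):
            t = K @ u
            t /= np.linalg.norm(t)          # kernel-space score
            c = Y.T @ t
            u_new = Y @ c
            u_new /= np.linalg.norm(u_new)  # target-space score
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        T[:, [a]] = t
        U[:, [a]] = u
        # Deflate K and Y with respect to the extracted score t.
        P = np.eye(n) - t @ t.T
        K = P @ K @ P
        Y = Y - t @ (t.T @ Y)
    return T, U
```

Predictions for new samples then follow the standard kernel regression coefficients of Rosipal and Trejo [44], computed from T, U, and the original (centered) training kernel.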
According to the above steps, an LVS is closely tied to its kernel function. Kernel functions broadly divide into local kernels, such as the Gaussian, Cauchy, and Laplacian kernel functions, and global kernels, such as the polynomial kernel function. The forms of common kernel functions include the following:
Polynomial kernel function:

$$ K_{\mathrm{Poly}}(x, y) = (x^{T}y + c)^{d} $$

where d is the order of the polynomial kernel function. At x = 0.2, the kernel function curves for different d are shown in Figure 1. The characteristic curves show that, for fixed kernel parameters, the distance from the test point has little effect on the value of the polynomial kernel function.
Three typical kernel functions with local characteristics are as follows:
Gaussian kernel function:

$$ K_{\mathrm{Gau}}(x, y) = \exp\left(-\frac{\|x - y\|^{2}}{2\sigma^{2}}\right) $$

Cauchy kernel function:

$$ K_{\mathrm{Cauchy}}(x, y) = \left(\frac{\|x - y\|^{2}}{\delta^{2}} + 1\right)^{-1} $$

Laplacian kernel function:

$$ K_{\mathrm{Laplace}}(x, y) = \exp\left(-\frac{\|x - y\|}{\gamma}\right) $$

where σ, δ, and γ are the width parameters of the three kernel functions, respectively.
At x = 0.2, Figure 2 shows the characteristic curves of three kernel functions with local characteristics under the same width parameter.
All three kernel functions have strong local characteristics: the kernel values are relatively large near the test point and become progressively smaller for data far away from it. However, the three kernel functions differ somewhat in their sensitivity to the width parameter.
Based on the above analysis, for the same original low-dimensional nonlinear input variable space, different kernel function types and kernel parameters characterize the inherent properties of the original variable space from different perspectives and map it to diverse LVSs.
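As a quick illustration, the four kernels above can be written in vectorized form as follows; the parameter defaults are arbitrary placeholders, not values from the paper.

```python
# Vectorized forms of the kernel functions discussed above.
import numpy as np

def sq_dists(X, Y):
    """Pairwise squared Euclidean distances between rows of X and rows of Y."""
    return ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)

def k_poly(X, Y, c=1.0, d=3):
    return (X @ Y.T + c) ** d                          # global kernel

def k_gauss(X, Y, sigma=0.5):
    return np.exp(-sq_dists(X, Y) / (2 * sigma ** 2))  # local kernel

def k_cauchy(X, Y, delta=0.5):
    return 1.0 / (sq_dists(X, Y) / delta ** 2 + 1.0)   # local kernel

def k_laplace(X, Y, gamma=0.5):
    return np.exp(-np.sqrt(sq_dists(X, Y)) / gamma)    # local kernel

# The same input mapped through different kernels yields different Gram
# matrices, hence different latent variable spaces -- the diversity mechanism
# exploited in Section 3.1.
X = np.random.default_rng(1).normal(size=(5, 3))
for k in (k_poly, k_gauss, k_cauchy, k_laplace):
    print(k.__name__, k(X, X).shape)
```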

2.3. Multi-Objective Gray Wolf Optimization Algorithm (MOGWO)

Using a swarm intelligence optimization algorithm to solve for the Pareto optimal solution set is a good approach to multi-objective optimization problems [45]. Considering the convergence speed and global search mechanism of the optimization algorithm, this study adopts the multi-objective gray wolf optimization algorithm proposed by Mirjalili et al. [46,47]. In the underlying GWO, the encircling behavior of a wolf around prey is modeled as

$$ D = |C \cdot X_{p}(t) - X(t)| $$

$$ X(t+1) = X_{p}(t) - A \cdot D $$

where t indicates the current iteration, A and C are coefficient vectors, X_p is the position vector of the prey, and X is the position vector of a gray wolf.
The vectors A and C are calculated as

$$ A = 2a \cdot r_{1} - a $$

$$ C = 2 r_{2} $$

where the components of a decrease linearly from 2 to 0 over the course of the iterations, and r_1 and r_2 are random vectors in [0, 1].
The rest of the pack updates its positions under the leadership of the α, β, and δ wolves. The position of a wolf after t + 1 iterations, X(t+1), is given by

$$ D_{\alpha} = |C_{1} \cdot X_{\alpha} - X|, \quad D_{\beta} = |C_{2} \cdot X_{\beta} - X|, \quad D_{\delta} = |C_{3} \cdot X_{\delta} - X| $$

$$ X_{1} = X_{\alpha} - A_{1} \cdot D_{\alpha}, \quad X_{2} = X_{\beta} - A_{2} \cdot D_{\beta}, \quad X_{3} = X_{\delta} - A_{3} \cdot D_{\delta} $$

$$ X(t+1) = \frac{X_{1} + X_{2} + X_{3}}{3} $$

A lies in the range [−2, 2]; when |A| < 1, the wolves close in on the prey and converge toward the optimal solution.
Compared with the GWO algorithm, the MOGWO algorithm still maintains the social characteristics of gray wolf hunting behavior. The fittest solution is considered as the alpha (α) wolf, and the second and third best solutions are named beta (β) and delta (δ) wolves, respectively. The rest of the candidate solutions are assumed to be omega (ω) wolves. The hunting is guided by α, β, and δ. The ω wolves follow these three wolves in order to search for the optimum. However, in a multi-objective search space, the solutions cannot easily be compared due to the Pareto optimality concepts [47].
Two important components are added to obtain the Pareto optimal set. The first is an archive, which stores the non-dominated Pareto optimal solutions found so far; it maintains the diversity of the population while pruning it to avoid flooding with large numbers of similar solutions, reducing the complexity of the algorithm and speeding up optimization. The second is a leader selection method performed by roulette wheel, which picks the alpha, beta, and delta leaders of the hunting process from the archive and enhances the exploration ability of the algorithm.
The specific steps for solving the Pareto optimal solution set based on the MOGWO algorithm can be found in reference [47].
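For reference, a compact sketch of the core single-objective GWO loop implementing the update equations above is given below; the multi-objective variant of [47] layers the archive and roulette-wheel leader selection on top of this same update, which is omitted here. Names and parameter values are illustrative.

```python
# Hedged sketch of the basic GWO position update (single-objective).
import numpy as np

def gwo_minimize(f, dim, bounds, n_wolves=30, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n_wolves, dim))
    for t in range(n_iter):
        fitness = np.apply_along_axis(f, 1, X)
        order = np.argsort(fitness)
        x_alpha, x_beta, x_delta = X[order[:3]]  # three best wolves lead
        a = 2 - 2 * t / n_iter                   # a decreases linearly 2 -> 0
        for i in range(n_wolves):
            X_new = np.zeros(dim)
            for leader in (x_alpha, x_beta, x_delta):
                r1, r2 = rng.random(dim), rng.random(dim)
                A = 2 * a * r1 - a               # A lies in [-a, a]
                C = 2 * r2
                D = np.abs(C * leader - X[i])    # encircling distance
                X_new += leader - A * D
            X[i] = np.clip(X_new / 3, lo, hi)    # average of the three pulls
    fitness = np.apply_along_axis(f, 1, X)
    return X[np.argmin(fitness)]

print(gwo_minimize(lambda x: np.sum(x ** 2), dim=5, bounds=(-10, 10)))
```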

3. Ensemble Soft Sensor Model Construction Based on Diversity Enhancement and Ensemble Pruning

In order to construct an ensemble soft sensor model with a high diversity level among base models and strong generalization performance, this study proposes a selective ensemble modeling framework, MOSE-MLV-VSPLS for short; the framework is illustrated in Figure 3. Its core contents are base model generation based on a diversity enhancement mechanism on the training samples and ensemble pruning based on a multi-objective optimization algorithm on the validation samples.

3.1. Diversified Base Model Construction Method Based on Multi-Kernel Latent Variable Space

The effective generation of diversified base models is an important factor in ensuring the generalization performance of the ensemble soft sensor model. In this study, a base model generation method based on most-relevant variable selection and adaptive multi-kernel supervised latent variable subspaces is proposed for homogeneous ensembles. The basic idea is as follows.
First, a set of subsample sets {X_i, Y_i}_{i=1}^{M} is obtained by bootstrapping random resampling, where M is the number of resampling rounds, i.e., the number of sample subsets. Each subsample set forms a modeling subspace; for example, the kth subspace is S_k = {X_k, Y_k}, k ∈ {1, ..., M}, where X = [x_1, x_2, ..., x_m] ∈ R^{n×m} is the set of process variables, m is the number of input variables, and Y = [y_1, y_2, ..., y_l] ∈ R^{n×l} is the target variable, with l the number of output variables. This is the most common diversity mechanism used in the sample set generation phase.
Next, on each subsample set, delete the process variables whose MIC values are less than the threshold σ, keeping only the process variables most strongly correlated with the target variables, to build the initial modeling variable subspace S_in = {X_in, Y}, where X_in ∈ R^{n×d} denotes the selected process variables and d is the number of variables most relevant to the output variable Y. The variable relationships differ between subsample sets, so the initial variable subspaces will differ as well. This is the diversity perturbation mechanism incorporated at the variable stage.
Then, multi-kernel supervised LVSs are constructed using the KPLS algorithm with different types of kernel functions and adaptive kernel parameters. The number of initial subspaces S_in equals the number of subsample sets, M; with k_n types of kernel functions, the number of LVSs is M_SLVM = M × k_n. For each LVS, the single-objective GWO algorithm is used to optimize the kernel function parameters with the fitting error as the accuracy criterion. This is the diversity perturbation mechanism incorporated in the modeling variable space and model parameters. A schematic of this stage is sketched below.
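The following sketch strings the three perturbation mechanisms together. It assumes the mic() and kernel helpers sketched in Section 2; fit_error() is a hypothetical stand-in for training a KPLS base model (e.g., with kpls_fit above) and returning its fitting RMSE, and the paper's single-objective GWO parameter search is abbreviated to a plain enumeration over candidate parameters.

```python
# Hedged schematic of the base model generation stage of Section 3.1.
import numpy as np

def generate_base_models(X, Y, M=10, mic_threshold=0.3, kernels=None, seed=0):
    """kernels: list of (kernel_fn, candidate_params) pairs."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    base_models = []
    for _ in range(M):
        # (1) Bootstrap resampling: diversity at the sample level.
        idx = rng.integers(0, n, n)
        Xk, Yk = X[idx], Y[idx]
        # (2) MIC-based variable selection: diversity at the variable level.
        mics = np.array([mic(Xk[:, j], Yk[:, 0]) for j in range(m)])
        selected = np.flatnonzero(mics >= mic_threshold)
        # (3) Multi-kernel supervised LVS: diversity at the model level.
        for kernel, params in (kernels or []):
            # fit_error() is an assumed helper, not the paper's code; the GWO
            # search over kernel parameters is replaced by direct enumeration.
            best = min(params, key=lambda p: fit_error(kernel, p, Xk[:, selected], Yk))
            base_models.append({"samples": idx, "variables": selected,
                                "kernel": kernel, "param": best})
    return base_models
```

With M resampling rounds and k_n kernels this yields M × k_n base models, matching the M_SLVM count defined above.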

3.2. Ensemble Pruning Based on Multi-Objective GWO Optimization

The selection of base models before ensembling is another important factor affecting the final performance of the ensemble soft sensor model. This study takes both diversity and accuracy into account: in the ensemble pruning of base models, the selection problem is formalized as a multi-objective optimization problem, and an evolutionary algorithm is used to solve it so that the two criteria are balanced and the goal of “both good and different” is achieved. The basic idea is as follows:
First, define a small-scale subensemble model set for the final ensemble, called SEM_sel. The RMSE is adopted as the accuracy index measuring the comprehensive prediction performance of SEM_sel, denoted RMSE_val. The distance between the predicted outputs of the models serves as the similarity index measuring the diversity of the models in SEM_sel, denoted CED_val. The model pruning problem is then transformed into a multi-objective optimization problem formulated as follows:
$$ \text{minimize} \quad F(x) = [f_{1}(x), f_{2}(x)]^{T} $$

$$ \text{subject to} \quad x \in [a, b] $$
where x is the decision variable vector, and F(x) is the objective vector.
The specific steps to build SEM_sel based on MOGWO, taking both the accuracy and diversity parameters into account, are as follows:
(1) Define the decision variables.
The construction of SEM_sel essentially amounts to deciding whether each base model is selected. Number all base models from 1 to M_SLVM and use a 0–1 code to describe their selection status: code 1 is assigned to selected base models and code 0 to unselected ones. A binary number of M_SLVM bits then represents the selection status of all base models, so the decision variable of the multi-objective optimization problem in this study is defined as a binary number of M_SLVM bits.
(2) Define the constraints.
The purpose of ensemble pruning is to select appropriate base models to form a subensemble model set. Therefore, the size m_sem of the subensemble model set is taken as the constraint condition.
(3) Define the objective function.
Definition 1.
Prediction accuracy objective function.
On the verification set D_val = {Q_val, T_val}, RMSE is used as the prediction accuracy index. First, the prediction accuracy of each base model is estimated as RMSE_val^i; then the average RMSE over all selected base models is used as the overall prediction accuracy. The ensemble pruning accuracy objective function is defined as follows:

$$ f_{1}(M) = \overline{\mathrm{RMSE}}_{val} = \frac{\sum_{i=1}^{m_{sem}} \mathrm{RMSE}_{val}^{i}}{m_{sem}} $$

where m_sem is the number of base models actually selected into SEM_sel, and RMSE_val^i is the RMSE of the ith base model on the validation set.
Definition 2.
Diversity objective function.
In ensemble learning modeling, there is no unified definition of quantitative diversity parameters. This study defines a comprehensive diversity quantification parameter, CED, based on the Euclidean distances between the representations of the base models on the verification set.
On the verification set D_val = {Q_val, T_val}, the predicted output of a base model is denoted yh_val. The smaller the similarity between the predicted outputs of any two base models, the greater the difference between those base models. The similarity syh between any two models is defined as
$$ syh(yh_{val}^{i}, yh_{val}^{j}) = \exp(-dis_{i,j}), \quad i, j = 1, 2, \ldots, M $$

$$ dis_{i,j} = \sqrt{\sum_{k=1}^{n_{val}} \left( yh_{val}^{i}(k) - yh_{val}^{j}(k) \right)^{2}} $$

where n_val is the number of samples in the validation set, and dis_{i,j} denotes the Euclidean distance between the outputs of any two base models.
All the pairwise similarity values obtained from the above equations are averaged to form the comprehensive diversity quantification parameter (a smaller CED_val indicates a more diverse subensemble):

$$ f_{2}(M) = \mathrm{CED}_{val} = \frac{\sum_{i=1}^{m_{sem}-1} \sum_{j=i+1}^{m_{sem}} syh(yh_{val}^{i}, yh_{val}^{j})}{(m_{sem}^{2} - m_{sem})/2} $$
According to the above analysis, the ensemble pruning problem is converted into a multi-objective minimization problem:

$$ \min \left\{ \overline{\mathrm{RMSE}}_{val}, \ \mathrm{CED}_{val} \right\} $$
The above multi-objective optimization problem is solved using the MOGWO algorithm. Each solution in the Pareto optimal set corresponds to one selection of base models after ensemble pruning. A sketch of the two objective functions is given below.
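The following is a direct, hedged transcription of the two objective functions, assuming the base model predictions on the validation set are stacked row-wise; names are illustrative.

```python
# Sketch of the two pruning objectives f1 (accuracy) and f2 (CED diversity).
import numpy as np

def f1_accuracy(yh_val, y_val, selected):
    """Mean validation RMSE over the selected base models.

    yh_val: (M_SLVM, n_val) predictions, y_val: (n_val,) targets,
    selected: indices of the chosen base models.
    """
    errs = yh_val[selected] - y_val[None, :]
    rmse = np.sqrt((errs ** 2).mean(axis=1))
    return rmse.mean()

def f2_diversity(yh_val, selected):
    """CED: average pairwise similarity exp(-Euclidean distance)."""
    sub = yh_val[selected]
    m = len(sub)
    sims = [np.exp(-np.linalg.norm(sub[i] - sub[j]))
            for i in range(m - 1)
            for j in range(i + 1, m)]
    return np.mean(sims)  # smaller value means a more diverse subensemble
```

Because both functions are to be minimized, MOGWO trades low average RMSE against low average pairwise similarity, which is exactly the "both good and different" goal stated above.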

3.3. Model Ensemble

All base models are constructed on an equal footing, with none privileged over the others. Therefore, in this study, the outputs of the m_sem base models selected by ensemble pruning are averaged over the test samples to produce the final output of the ensemble soft sensor model, as sketched below.
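A minimal sketch of the decoding and averaging step follows, assuming a 0–1 decision vector taken from the Pareto set and a matrix of test predictions with one row per base model.

```python
# Decode a 0-1 decision vector and average the selected models' predictions.
import numpy as np

def ensemble_predict(decision, yh_test):
    """decision: binary vector of length M_SLVM; yh_test: (M_SLVM, n_test)."""
    selected = np.flatnonzero(decision)
    return yh_test[selected].mean(axis=0)  # equal-weight average ensemble

decision = np.array([1, 0, 1, 1, 0])       # e.g., models 0, 2, 3 selected
yh_test = np.random.default_rng(2).normal(size=(5, 8))
print(ensemble_predict(decision, yh_test))
```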

4. Case Study

In this section, two typical industrial process case studies were used to verify the effectiveness of the proposed MOSE-MLV-VSPLS algorithm: one predicts the penicillin content in the penicillin fermentation process, and the other predicts the butane content in the debutanizer column process.
The methods used for comparison were as follows:
(1) EPLS (ensemble partial least squares regression): an ensemble soft sensor that ensembles all base models generated by the PLS algorithm on the diversified subsample spaces obtained by bootstrap resampling.
(2) VSEPLS (variable selection-based EPLS): an ensemble soft sensor that applies the MIC algorithm for input variable selection on each subsample set of EPLS.
(3) LV-VSEPLS (latent variable space-based VSEPLS): an ensemble soft sensor in which diversified base models are constructed on latent variable spaces built with a single adaptive kernel function on each variable space of VSEPLS.
(4) MLV-VSEPLS (multiple latent variable space-based VSEPLS): an ensemble soft sensor in which diversified base models are constructed on latent variable spaces built with adaptive multiple kernel functions on each variable space of VSEPLS.
(5) MOSE-MLV-VSPLS (selective ensemble MLV-VSPLS based on the multi-objective optimization algorithm): a selective ensemble soft sensor that ensembles only those base models selected by the ensemble pruning method proposed in this paper, which satisfies the diversity and accuracy criteria simultaneously.
The root-mean-square error (RMSE) and the coefficient of determination (R²) are taken as the prediction performance indicators of the models established by the above methods:

$$ \mathrm{RMSE}_{test} = \sqrt{\frac{1}{n_t}\sum_{i=1}^{n_t}\left(\hat{y}_t^i - y_t^i\right)^2} $$

$$ R^2 = 1 - \frac{\sum_{i=1}^{n_t}\left(\hat{y}_t^i - y_t^i\right)^2}{\sum_{i=1}^{n_t}\left(y_t^i - \bar{y}_t\right)^2} $$

where n_t is the total number of sample points in the test set, ŷ_t^i and y_t^i are the predicted and actual output values of the ith test sample, and ȳ_t is the mean of the actual outputs over the test set.
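These two metrics implement directly from the equations above; the sketch assumes NumPy arrays of predictions and ground truth.

```python
# RMSE and R^2 exactly as defined above.
import numpy as np

def rmse(y_pred, y_true):
    return np.sqrt(np.mean((y_pred - y_true) ** 2))

def r2(y_pred, y_true):
    ss_res = np.sum((y_pred - y_true) ** 2)      # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot
```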

4.1. Fermentation Process for Penicillin Production

4.1.1. Process Description

The penicillin fermentation process is a typical biochemical batch benchmark for soft sensor and fault diagnosis algorithms. The penicillin fermentation simulation platform, which comprehensively reflects the actual penicillin fermentation process and provides standardized biochemical reaction data, was developed by a process modeling and monitoring research group led by Professor Cinar at the Illinois Institute of Technology (IIT) between 1998 and 2002. The platform models the effects of control variables such as pH, temperature, aeration rate, agitator power, and substrate feed rate on bacterial growth and penicillin production, while simultaneously accounting for factors such as biomass growth, carbon dioxide concentration, penicillin production, and heat generation. The flow of the penicillin fermentation process is shown in Figure 4; the detailed process description and dynamic model can be found in [45].
Different reaction results can be obtained for different application requirements by adjusting the initial values of some process variables in the fermentation tank. In the construction of the soft sensor model in this study, penicillin concentration is used as the output variable, and 14 process variables are used as the input variables. See Table 1 for the meaning of specific variables.
Based on the simulation platform, the fermentation cycle was set to 400 h with a sampling interval of 1 h, and the initial value of the substrate concentration for each run was increased or decreased by 0.1–0.2 from the default value. Eight batches of fermentation process data were collected: five batches were randomly selected as training samples (62.5% of the total data volume) for model training, two batches were used as the validation set (25%) for model parameter optimization and ensemble pruning, and one batch was used as the test set (12.5%) for soft sensor performance evaluation.
The model parameters affect the prediction performance of the model. In this study, the parameters of the base models were optimized adaptively using the GWO algorithm with the objective of minimizing the RMSE criterion; the specific parameters and their search ranges were as follows:
(1) The number of supervised latent variables n_l in EPLS and VSEPLS, where n_l ∈ {1, 2, ..., m} and m is the feature dimension of the input variables.
(2) The three kernel function parameters in LV-VSEPLS and MLV-VSEPLS, where σ ∈ {0.5, 0.7, 0.9, 1, 2} for the Gaussian kernel function, d ∈ {2, 3, 4, 5, 6} for the polynomial kernel function, and δ ∈ {0.1, 0.3, 0.5, 0.7, 0.9, 1, 2, 3} for the Cauchy kernel function.
(3) In the MOGWO algorithm, the population size is set to 200 and the number of iterations to 100, and the Pareto optimal solution with the best performance is selected as the final screening combination of base models.

4.1.2. Performance Test

In this study, to demonstrate the effectiveness of the diversity perturbation mechanisms in the base model generation stage, we first ensemble all base models generated under the different diversity mechanisms. At this stage of the experiment, the number of base models is the same under each method and is set to 12, 30, 45, and 60. In the EPLS, VSEPLS, and LV-VSEPLS methods, the number of base models equals the number of resampling rounds. In MLV-VSEPLS, three kinds of kernel functions are used for each resampled sample space, and three different base models are constructed, so the number of base models is three times the number of resampling rounds.
The performance of the penicillin fermentation process under different diversity perturbation mechanisms in the framework of the full ensemble soft sensor is given in Table 2.
Comparing the best RMSE of each method in Table 2, the EPLS method, which incorporates only input data perturbation in base model generation, shows the worst prediction accuracy on the test set. The VSEPLS method adds an input variable selection mechanism, which improves the prediction performance of the ensemble model to some extent compared with EPLS, but the accuracy of both methods remains low. LV-VSEPLS combines input feature perturbation with kernel function parameter perturbation on top of the perturbed input data and improves the prediction accuracy to a certain extent. By comparison, the MLV-VSEPLS method fully explores the latent data relationships of the samples from different perspectives and constructs diverse latent variable modeling subspaces; under the full ensemble framework, it reduces the number of resampling rounds needed and achieves the best performance for the same number of base models. The data also show that the rate of improvement in model accuracy is higher when the number of modeling sample spaces increases from 12 to 30. However, the subsample sets for the ensemble learning algorithms in this study were obtained through bootstrap resampling, and the base learners are homogeneous; because of the limited data samples, the base learners are not independent of each other, so when the number of base models increases from 45 to 60, the improvement rate of the prediction accuracy of the ensemble learning soft sensor models decreases and stabilizes.
Next, comparative experiments are conducted on the full ensemble method and on selective ensemble methods based on different selection criteria and ensemble pruning approaches. In the SE-MLV-VSPLS algorithm, a single quantitative parameter of accuracy or diversity is used as the pruning criterion, and the GWO algorithm is applied to select the base models that optimize that parameter. In the MOSE-MLV-VSPLS method proposed in this study, after multi-objective optimization, one of the Pareto optimal solutions is selected and decoded into the ensemble for the soft sensor. Table 3 shows the performance of each model in this experiment.
The data show that the same performance as the full ensemble is obtained with a smaller number of base models after ensemble pruning based on a single indicator of accuracy or diversity. In contrast, after pruning with the dual objectives of accuracy and diversity, redundant base models are effectively eliminated, the complexity of the ensemble modeling is reduced, and optimal prediction performance is obtained with a smaller number of base models. Figure 5 compares the soft sensors built with four different methods; the horizontal axis represents the sample points and the vertical axis the penicillin concentration. Method 1 is MOSE-MLV-VSPLS, method 2 is MLV-VSEPLS, method 3 is LV-VSEPLS, and method 4 is VSEPLS.
The prediction trend curves show a significant difference between the real values and the predicted outputs of the VSEPLS and LV-VSEPLS methods. In contrast, almost no deviation can be found between the predicted and real outputs under the MOSE-MLV-VSPLS method, and the two curves fit each other well.

4.2. Debutanizer Column

4.2.1. Process Description

Propane and butane must be removed from naphtha; the debutanizer column that performs this separation is an important part of the desulfurization and naphtha separation unit in petroleum refinery production. Following reference [48], a brief schematic of the process is shown in Figure 6.
The goal of the debutanizer column is to minimize the butane concentration at the bottom of the column while maximizing the yield of purified gasoline. Accurate estimation of the butane content is therefore important for improving the effectiveness of the debutanizer column. This industrial process includes 7 process variables, whose specific meanings are shown in Table 4.
In this study, 2285 data points were collected from the process: 1142 randomly sampled points (50% of the total data volume) formed the training set for model training; 343 randomly sampled points (about 15% of the total) formed the validation set for model parameter optimization and ensemble pruning; and the remaining 800 points (about 35% of the total) formed the test set.
As in the penicillin fermentation process of Section 4.1, the GWO algorithm was used to adaptively optimize the kernel function parameters and the number of latent variables during base model construction, with the goal of minimizing the prediction error. According to the data characteristics of the debutanizer column, the selected kernels were the Gaussian, Cauchy, and Laplace kernel functions, with search ranges σ ∈ {0.03, 0.05, 0.07, 0.09, 0.1, 0.2} for the Gaussian kernel, δ ∈ {0.01, 0.02, 0.03, 0.05, 0.07, 0.09, 0.1, 0.2} for the Cauchy kernel, and γ ∈ {0.07, 0.1, 0.2, 0.3, 0.4, 0.5, 0.7} for the Laplace kernel.

4.2.2. Performance Test

Similarly, in the EPLS, VSEPLS, and LV-VSEPLS methods, the number of base models equals the number of sample spaces generated by resampling: 15, 30, and 45, respectively. In the MLV-VSEPLS method, the numbers of resampling rounds are 5, 10, and 15, giving 15, 30, and 45 base models, respectively. Table 5 shows the performance of the full ensemble-based soft sensors on the debutanizer column test set under the different diversity perturbation mechanisms.
From the experimental results in Table 5, the prediction accuracy of the EPLS method, which uses only input data perturbation, is very low for the debutanizer column because of the strong nonlinearity among the seven process variables; adding input variable perturbation improves the accuracy to some extent, but it remains far from adequate. With the kernel trick, the accuracy of the full ensemble soft sensor improves further. However, since a single kernel function is used to construct the latent variable subspaces, the room for accuracy improvement is limited even as the number of resampling rounds increases in the LV-VSEPLS method. By comparison, the MLV-VSEPLS method applies multiple diversity enhancement mechanisms in subsample set generation, initial variable space construction, and latent variable subspace construction for its diversified base models, and it achieves better results under the full ensemble framework.
However, because of poorly performing and redundant base models, the full ensemble method above cannot achieve optimal performance. Ensemble pruning is therefore essential for improving the performance of the ensemble soft sensor models.
Similarly, a comparative study of ensemble pruning methods is designed to show the superiority of the GWO-based multi-objective optimization selection method. With 45 base models, one of the Pareto optimal solutions obtained by multi-objective optimization is decoded, and the target variable is predicted on the test set; the experimental results are shown in Table 6.
To show the performance more clearly, the box plots in Figure 7 compare the RMSE of the selective ensembles and the full ensemble over the statistical results of 25 experiments.
The results in Figure 7 are consistent with those in Table 6: the ensemble pruning method based on the multi-objective optimization algorithm achieves optimal performance with a smaller base model set. Across the repeated experiments, even its maximum RMSE is smaller than the minimum RMSE of the full ensemble soft sensor model.
In detail, the prediction results of MLV-VSEPLS and MOSE-MLV-VSPLS on the test set are shown separately in Figure 8.
The curves in Figure 8 show that, for the MOSE-MLV-VSPLS method proposed in this study, the predicted outputs obtained with ensemble pruning match the actual values well, especially at points with large changes, and achieve the most satisfactory prediction performance.

5. Conclusions

This study proposes a soft sensor construction method named MOSE-MLV-VSPLS within an ensemble learning framework to accurately measure key parameters of industrial processes under complex data relationships. At the base model generation stage, the proposed method first uses the MIC algorithm on each subsample space obtained by bootstrap random resampling to optimize the variable space, then uses adaptive multi-kernel functions to obtain latent variable spaces that describe the data relationships from different perspectives, and finally constructs the diversified base models. Compared with the EPLS algorithm, which does not use variable selection, and the LV-VSEPLS algorithm, which is based on a single kernel function, the proposed algorithm effectively removes redundant variables and significantly increases the diversity of the base models under strongly nonlinear data relationships.
Since both accuracy and diversity affect the performance of an ensemble learning-based soft sensor, a comprehensive diversity quantification indicator is designed for the base model ensemble phase, and ensemble pruning that considers the accuracy and diversity metrics simultaneously is transformed into a multi-objective optimization problem solved by the multi-objective gray wolf optimization algorithm. The final model is an average ensemble of the selected set of high-performing and complementary base models. Compared with the full ensemble approach, the complexity of the ensemble modeling is significantly reduced.
The comparative experimental results on two application examples show that the ensemble soft sensor model built with the MOSE-MLV-VSPLS method achieves satisfactory performance through an effective diversity enhancement mechanism and the selective mean ensemble of a smaller number of base models. The MOSE-MLV-VSPLS method has strong generality in homogeneous ensemble modeling; the ensemble strategy can be updated, and other base models can be used according to different application requirements.
It is worth noting that in this study each LVS in the base model construction uses a single type of kernel function; in the future, hybrid kernel functions could describe the data space better. On the other hand, after converting the ensemble pruning problem into a multi-objective optimization, the success of the proposed method depends to a large extent on a suitable definition of the diversity objective function, which remains open to question. As new process data are continually added to the data space, the model update mechanism of the algorithm also needs further research and strengthening.

Author Contributions

Methodology, L.P.; software, L.P. and L.H.; validation, L.P.; formal analysis, L.H.; investigation, L.P. and Y.S.; resources, L.G.; writing—original draft preparation, L.P.; writing—review and editing, L.H.; supervision, L.G.; project administration, L.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the key research and development program of Shaanxi Province, China (Grant no. 2022GY-134); the Science and Technology Foundation of Xi’an University of Architecture and Technology, China (Grant no. ZR19059); and the Special Scientific Research Project of the Education Department of Shaanxi Provincial Government of China under Grant no. 21JK0732.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data sharing is not applicable to this article, as no datasets were generated during the current study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yang, T.; Yi, X.; Lu, S.; Johansson, K.H.; Chai, T. Intelligent Manufacturing for the Process Industry Driven by Industrial Artificial Intelligence. Engineering 2021, 7, 1224–1230. [Google Scholar] [CrossRef]
  2. Chai, T.; Liu, Q.; Ding, J.; Lu, S.; Song, Y.; Zhang, Y. Perspectives on industrial-internet-driven intelligent optimized manufacturing mode for process industries. Sci. Sin. Technol. 2022, 52, 14–25. [Google Scholar] [CrossRef]
  3. Chai, T.; Ding, J. Smart and Optimal Manufacturing for Process Industry. Chin. J. Eng. Sci. 2018, 20, 51–58. [Google Scholar] [CrossRef]
  4. Jiang, Y.; Yin, S.; Dong, J.; Kaynak, O. A Review on Soft Sensors for Monitoring, Control, and Optimization of Industrial Processes. IEEE Sens. J. 2021, 21, 12868–12881. [Google Scholar] [CrossRef]
  5. Yin, S.; Li, X.; Gao, H.; Kaynak, O. Data-Based Techniques Focused on Modern Industry: An Overview. IEEE Trans. Ind. Electron. 2015, 62, 657–667. [Google Scholar] [CrossRef]
  6. Vijayan, S.V.; Mohanta, H.K.; Pani, A.K. Adaptive Non-Linear Soft Sensor for Quality Monitoring in Refineries Using Just-in-Time Learning—Generalized Regression Neural Network Approach. Appl. Soft Comput. 2022, 119, 108546. [Google Scholar] [CrossRef]
  7. Yuan, X.; Zhou, J.; Wang, Y. A Spatial-Temporal LWPLS for Adaptive Soft Sensor Modeling and Its Application for an Industrial Hydrocracking Process. Chemom. Intell. Lab. Syst. 2020, 197, 103921. [Google Scholar] [CrossRef]
  8. Zhou, P.; Lv, Y.; Wang, H.; Chai, T. Data-Driven Robust RVFLNs Modeling of a Blast Furnace Iron-Making Process Using Cauchy Distribution Weighted M-Estimation. IEEE Trans. Ind. Electron. 2017, 64, 7141–7151. [Google Scholar] [CrossRef]
  9. Sekhar, R.; Solke, N.; Shah, P. Lean Manufacturing Soft Sensors for Automotive Industries. Appl. Syst. Innov. 2023, 6, 22. [Google Scholar] [CrossRef]
  10. Ge, Z.; Song, Z.; Gao, F. Review of Recent Research on Data-Based Process Monitoring. Ind. Eng. Chem. Res. 2013, 52, 3543–3562. [Google Scholar] [CrossRef]
  11. Ge, Z. Review on Data-Driven Modeling and Monitoring for Plant-Wide Industrial Processes. Chemom. Intell. Lab. Syst. 2017, 171, 16–25. [Google Scholar] [CrossRef]
  12. Shang, C.; Yang, F.; Huang, D.; Lyu, W. Data-Driven Soft Sensor Development Based on Deep Learning Technique. J. Process Control 2014, 24, 223–233. [Google Scholar] [CrossRef]
  13. Pani, A.K.; Vadlamudi, V.K.; Mohanta, H.K. Development and Comparison of Neural Network Based Soft Sensors for Online Estimation of Cement Clinker Quality. ISA Trans. 2013, 52, 19–29. [Google Scholar] [CrossRef]
  14. Sekhar, R.; Shah, P.; Panchal, S.; Fowler, M.; Fraser, R. Distance to Empty Soft Sensor for Ford Escape Electric Vehicle. Results Control Optim. 2022, 9, 100168. [Google Scholar] [CrossRef]
  15. Purohit, K.; Srivastava, S.; Nookala, V.; Joshi, V.; Shah, P.; Sekhar, R.; Panchal, S.; Fowler, M.; Fraser, R.; Tran, M.-K.; et al. Soft Sensors for State of Charge, State of Energy, and Power Loss in Formula Student Electric Vehicle. Appl. Syst. Innov. 2021, 4, 78. [Google Scholar] [CrossRef]
  16. Liu, Y.; Chen, T.; Chen, J. Auto-Switch Gaussian Process Regression-Based Probabilistic Soft Sensors for Industrial Multigrade Processes with Transitions. Ind. Eng. Chem. Res. 2015, 54, 5037–5047. [Google Scholar] [CrossRef]
  17. Zhou, Z.-H.; Yu, Y. Ensembling Local Learners Through Multimodal Perturbation. IEEE Trans. Syst. Man Cybern. B 2005, 35, 725–735. [Google Scholar] [CrossRef]
  18. Jin, H.; Li, Z.; Chen, X.; Qian, B.; Yang, B.; Yang, J. Evolutionary Optimization Based Pseudo Labeling for Semi-Supervised Soft Sensor Development of Industrial Processes. Chem. Eng. Sci. 2021, 237, 116560. [Google Scholar] [CrossRef]
  19. Tang, J.; Zhang, J.; Wu, Z.; Liu, Z.; Chai, T.; Yu, W. Modeling Collinear Data Using Double-Layer GA-Based Selective Ensemble Kernel Partial Least Squares Algorithm. Neurocomputing 2017, 219, 248–262. [Google Scholar] [CrossRef]
  20. Rincy, T.N.; Gupta, R. Ensemble Learning Techniques and Its Efficiency in Machine Learning: A Survey. In Proceedings of the 2nd International Conference on Data, Engineering and Applications (IDEA), Bhopal, India, 28–29 February 2020; IEEE: New York, NY, USA, 2020; pp. 1–6. [Google Scholar]
  21. Khaldi, B.; Harrou, F.; Benslimane, S.M.; Sun, Y. A Data-Driven Soft Sensor for Swarm Motion Speed Prediction Using Ensemble Learning Methods. IEEE Sens. J. 2021, 21, 19025–19037. [Google Scholar] [CrossRef]
  22. Marković, I.; Jurić-Kavelj, S.; Petrović, I. Partial Mutual Information Based Input Variable Selection for Supervised Learning Approaches to Voice Activity Detection. Appl. Soft Comput. 2013, 13, 4383–4391. [Google Scholar] [CrossRef]
  23. Zheng, K.; Wang, X.; Wu, B.; Wu, T. Feature Subset Selection Combining Maximal Information Entropy and Maximal Information Coefficient. Appl. Intell. 2020, 50, 487–501. [Google Scholar] [CrossRef]
  24. Venkatesh, B.; Anuradha, J. A Review of Feature Selection and Its Methods. Cybern. Inf. Technol. 2019, 19, 3–26. [Google Scholar] [CrossRef]
  25. Reshef, D.N.; Reshef, Y.A.; Finucane, H.K.; Grossman, S.R.; McVean, G.; Turnbaugh, P.J.; Lander, E.S.; Mitzenmacher, M.; Sabeti, P.C. Detecting Novel Associations in Large Data Sets. Science 2011, 334, 1518–1524. [Google Scholar] [CrossRef] [PubMed]
  26. Lin, G.; Lin, A.; Gu, D. Using Support Vector Regression and K-Nearest Neighbors for Short-Term Traffic Flow Prediction Based on Maximal Information Coefficient. Inf. Sci. 2022, 608, 517–531. [Google Scholar] [CrossRef]
  27. Bartolucci, F.; Bacci, S.; Mira, A. On the Role of Latent Variable Models in the Era of Big Data. Stat. Probab. Lett. 2018, 136, 165–169. [Google Scholar] [CrossRef]
  28. Kong, X.; Jiang, X.; Zhang, B.; Yuan, J.; Ge, Z. Latent Variable Models in the Era of Industrial Big Data: Extension and Beyond. Annu. Rev. Control 2022, 54, 167–199. [Google Scholar] [CrossRef]
  29. Liu, H.; Yang, C.; Huang, M.; Yoo, C. Soft Sensor Modeling of Industrial Process Data Using Kernel Latent Variables-Based Relevance Vector Machine. Appl. Soft Comput. 2020, 90, 106149. [Google Scholar] [CrossRef]
  30. Wang, K.; Chen, T.; Lau, R. Bagging for Robust Non-Linear Multivariate Calibration of Spectroscopy. Chemom. Intell. Lab. Syst. 2011, 105, 1–6. [Google Scholar] [CrossRef]
  31. Chen, T.; Ren, J. Bagging for Gaussian Process Regression. Neurocomputing 2009, 72, 1605–1610. [Google Scholar] [CrossRef]
  32. Ge, Z.; Song, Z. Bagging Support Vector Data Description Model for Batch Process Monitoring. J. Process Control 2013, 23, 1090–1096. [Google Scholar] [CrossRef]
  33. Deng, X.; Chen, Y.; Wang, P.; Cao, Y. Soft Sensor Modeling for Unobserved Multimode Nonlinear Processes Based on Modified Kernel Partial Least Squares With Latent Factor Clustering. IEEE Access 2020, 8, 35864–35872. [Google Scholar] [CrossRef]
  34. Mansouri, M.; Nounou, M.N.; Nounou, H.N. Multiscale Kernel PLS-Based Exponentially Weighted-GLRT and Its Application to Fault Detection. IEEE Trans. Emerg. Top. Comput. Intell. 2019, 3, 49–58. [Google Scholar] [CrossRef]
  35. Aiolli, F.; Donini, M. EasyMKL: A Scalable Multiple Kernel Learning Algorithm. Neurocomputing 2015, 169, 215–224. [Google Scholar] [CrossRef]
  36. Lu, P.; Ye, L.; Tang, Y.; Zhao, Y.; Zhong, W.; Qu, Y.; Zhai, B. Ultra-Short-Term Combined Prediction Approach Based on Kernel Function Switch Mechanism. Renew. Energy 2021, 164, 842–866. [Google Scholar] [CrossRef]
  37. Zhou, Z.-H.; Wu, J.; Tang, W. Ensembling Neural Networks: Many Could Be Better than All. Artif. Intell. 2002, 137, 239–263. [Google Scholar] [CrossRef]
  38. Liu, T.; Chen, S.; Liang, S.; Gan, S.; Harris, C.J. Multi-Output Selective Ensemble Identification of Nonlinear and Nonstationary Industrial Processes. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 1867–1880. [Google Scholar] [CrossRef]
  39. Shao, W.; Chen, S.; Harris, C.J. Adaptive Soft Sensor Development for Multi-Output Industrial Processes Based on Selective Ensemble Learning. IEEE Access 2018, 6, 55628–55642. [Google Scholar] [CrossRef]
  40. Bian, Y.; Wang, Y.; Yao, Y.; Chen, H. Ensemble Pruning Based on Objection Maximization With a General Distributed Framework. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 3766–3774. [Google Scholar] [CrossRef]
  41. Ni, Z.; Xia, P.; Zhu, X.; Ding, Y.; Ni, L. A Novel Ensemble Pruning Approach Based on Information Exchange Glowworm Swarm Optimization and Complementarity Measure. IFS 2020, 39, 8299–8313. [Google Scholar] [CrossRef]
  42. Dai, Q.; Ye, R.; Liu, Z. Considering Diversity and Accuracy Simultaneously for Ensemble Pruning. Appl. Soft Comput. 2017, 58, 75–91. [Google Scholar] [CrossRef]
  43. Mohammed, A.M.; Onieva, E.; Woźniak, M. Selective Ensemble of Classifiers Trained on Selective Samples. Neurocomputing 2022, 482, 197–211. [Google Scholar] [CrossRef]
  44. Rosipal, R.; Trejo, L.J. Kernel Partial Least Squares Regression in Reproducing Kernel Hilbert Space. J. Mach. Learn. Res. 2002, 2, 97–123. [Google Scholar]
  45. Vala, T.M.; Rajput, V.N.; Geem, Z.W.; Pandya, K.S.; Vora, S.C. Revisiting the Performance of Evolutionary Algorithms. Expert Syst. Appl. 2021, 175, 114819. [Google Scholar] [CrossRef]
  46. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61. [Google Scholar] [CrossRef]
  47. Mirjalili, S.; Saremi, S.; Mirjalili, S.M.; Coelho, L.D.S. Multi-Objective Grey Wolf Optimizer: A Novel Algorithm for Multi-Criterion Optimization. Expert Syst. Appl. 2016, 47, 106–119. [Google Scholar] [CrossRef]
  48. Fortuna, L.; Graziani, S.; Xibilia, M.G. Soft Sensors for Product Quality Monitoring in Debutanizer Distillation Columns. Control Eng. Pract. 2005, 13, 499–508. [Google Scholar] [CrossRef]
  49. Fortuna, L.; Graziani, S.; Rizzo, A.; Xibilia, M.G. Soft Sensors for Monitoring and Control of Industrial Processes; Fortuna, L., Ed.; Advances in Industrial Control; Springer: London, UK, 2007; ISBN 978-1-84628-479-3. [Google Scholar]
Figure 1. The characteristic curve of the polynomial kernel function.
Figure 2. Characteristic curves of three kernel functions under different parameters. (a) Width parameter equals 0.05. (b) Width parameter equals 0.5.
Figure 3. The modeling framework of MOSE-MLV-VSPLS.
Figure 4. Flow sheet of the penicillin fermentation process.
Figure 5. Prediction trend curves of different methods in the penicillin test set.
Figure 6. Flowchart of the debutanizer column [49].
Figure 7. Comparison of RMSE with different ensemble soft sensor model algorithms.
Figure 8. Prediction results of different methods on the debutanizer column test set. (a) Predictions of MLV-VSEPLS. (b) Predictions of MOSE-MLV-VSPLS.
Table 1. Input variables in the penicillin fermentation process.

| Input Variable | Variable Description | Units |
|---|---|---|
| x1 | Aeration rate | L/h |
| x2 | Agitator power | W |
| x3 | Substrate feed rate | L/h |
| x4 | Acid flow rate | L/h |
| x5 | Base flow rate | L/h |
| x6 | Cooling water flow rate | L/h |
| x7 | Hot water flow rate | L/h |
| x8 | Substrate feed temperature | K |
| x9 | Dissolved oxygen concentration | g/L |
| x10 | Culture volume | L |
| x11 | Carbon dioxide concentration | g/L |
| x12 | Fermenter temperature | K |
| x13 | pH | - |
| x14 | Generated heat | kcal |
Table 2. Performance of fully ensemble soft sensor models in penicillin fermentation under different diversity perturbation mechanisms.

| Method | Kernel Function | Resampling Times | Number of Subspaces | RMSE | R² |
|---|---|---|---|---|---|
| EPLS | - | 12 | 12 | 0.04977 | 0.9975 |
| | | 30 | 30 | 0.04891 | 0.9976 |
| | | 45 | 45 | 0.04890 | 0.9976 |
| | | 60 | 60 | 0.04889 | 0.9976 |
| VSEPLS | - | 12 | 12 | 0.02794 | 0.9982 |
| | | 30 | 30 | 0.02793 | 0.9982 |
| | | 45 | 45 | 0.02791 | 0.9982 |
| | | 60 | 60 | 0.02790 | 0.9982 |
| LV-VSEPLS | Gaussian kernel function | 12 | 12 | 0.01416 | 0.9979 |
| | | 30 | 30 | 0.01406 | 0.9986 |
| | | 45 | 45 | 0.01378 | 0.9986 |
| | | 60 | 60 | 0.01375 | 0.9998 |
| | Poly kernel function | 12 | 12 | 0.01410 | 0.9987 |
| | | 30 | 30 | 0.01466 | 0.9990 |
| | | 45 | 45 | 0.01395 | 0.9994 |
| | | 60 | 60 | 0.01451 | 0.9989 |
| | Cauchy kernel function | 12 | 12 | 0.01466 | 0.9977 |
| | | 30 | 30 | 0.01464 | 0.9987 |
| | | 45 | 45 | 0.01470 | 0.9987 |
| | | 60 | 60 | 0.01441 | 0.9990 |
| MLV-VSEPLS | Multiple kernel functions | 4 | 12 | 0.01267 | 0.9998 |
| | | 10 | 30 | 0.01223 | 0.9998 |
| | | 15 | 45 | 0.01220 | 0.9999 |
| | | 20 | 60 | 0.011934 | 0.9998 |
Table 3. Performance of selective ensemble soft sensor models under ensemble pruning with different indicators.

| Ensemble Type | Method | Criteria | Number of Subspaces | Size of Subensemble Model | RMSE | R² |
|---|---|---|---|---|---|---|
| ALL | MLV-VSEPLS | - | 30 | 30 | 0.01223 | 0.9994 |
| SEL | SE-MLV-VSPLS | Accuracy | 30 | 9 | 0.01220 | 0.9992 |
| SEL | SE-MLV-VSPLS | Diversity | 30 | 8 | 0.01271 | 0.9990 |
| SEL | MOSE-MLV-VSPLS | Accuracy and diversity | 30 | 10 | 0.01063 | 0.9998 |
Table 4. Input variables in the debutanizer column.

| Input Variable | Variable Description |
|---|---|
| u1 | Top temperature |
| u2 | Top pressure |
| u3 | Reflux flow |
| u4 | Flow to next process |
| u5 | Sixth tray temperature |
| u6 | Bottom temperature |
| u7 | Bottom pressure |
Table 5. Performance of fully ensemble soft sensor models in test sets under different diversity perturbation mechanisms.

| Method | Kernel Function | Resampling Times | Number of Subspaces | RMSE | R² |
|---|---|---|---|---|---|
| EPLS | - | 12 | 12 | 0.1555 | 0.1345 |
| | | 30 | 30 | 0.1553 | 0.1300 |
| | | 45 | 45 | 0.1554 | 0.1321 |
| VSEPLS | - | 15 | 15 | 0.1713 | 0.1250 |
| | | 30 | 30 | 0.1621 | 0.1293 |
| | | 45 | 45 | 0.1613 | 0.1295 |
| LV-VSEPLS | Gaussian kernel function | 15 | 15 | 0.0646 | 0.8576 |
| | | 30 | 30 | 0.0564 | 0.8976 |
| | | 45 | 45 | 0.0541 | 0.9001 |
| | Laplace kernel function | 15 | 15 | 0.0527 | 0.9053 |
| | | 30 | 30 | 0.0519 | 0.9081 |
| | | 45 | 45 | 0.0494 | 0.9196 |
| | Cauchy kernel function | 15 | 15 | 0.0552 | 0.8960 |
| | | 30 | 30 | 0.0511 | 0.9110 |
| | | 45 | 45 | 0.0497 | 0.9159 |
| MLV-VSEPLS | Multiple kernel functions | 5 | 15 | 0.0516 | 0.9142 |
| | | 10 | 30 | 0.0513 | 0.9138 |
| | | 15 | 45 | 0.0503 | 0.9195 |
Table 6. Performance of fully ensemble and selective ensemble under the diversity enhancement mechanism.

| Ensemble Type | Method | Criterion | Number of Subspaces | Size of Subensemble Model | RMSE | R² |
|---|---|---|---|---|---|---|
| ALL | MLV-VSEPLS | - | 45 | 45 | 0.0503 | 0.9195 |
| SEL | SE-MLV-VSPLS | Accuracy | 45 | 12 | 0.0518 | 0.9051 |
| SEL | SE-MLV-VSPLS | Diversity | 45 | 13 | 0.0550 | 0.8942 |
| SEL | MOSE-MLV-VSPLS | Accuracy and diversity | 45 | 16 | 0.0453 | 0.9302 |