1. Introduction
Rolling bearings are widely used in the transmission devices of mechanical equipment and play a key role in energy, power, transportation, aerospace, and other fields [1]. Because of the complex operating environment and long working hours of mechanical equipment, bearings are easily damaged components. Once a fault occurs, it may force a shutdown and even cause property loss and casualties [2]. Therefore, health monitoring should be carried out over the bearing's whole life cycle [3,4]. Prompt fault diagnosis can avoid greater losses. Fault diagnosis relies on signal data collected by sensors; however, in actual operation the collected vibration signal usually contains various noises. Especially in a strong noise environment, the fault features of the vibration signal are weakened, distorted, or even drowned out by noise. Effectively filtering noise and extracting bearing fault information from bearing vibration signals has therefore become a key problem.
Bearing vibration signals are periodic time series with non-stationary and non-linear features. Traditional bearing fault diagnosis methods usually rely on signal processing techniques, which identify faults by locating signal components related to the fault [5]. For example, Qin et al. [6] proposed an improved empirical wavelet transform strategy to process signals, which improved diagnosis performance on signals with a low signal-to-noise ratio (SNR). Cheng et al. [7] improved comprehensive ensemble empirical mode decomposition to reduce the impact of noise, which can effectively reveal bearing fault information. However, bearing operation is complex, and the vibration features of the signal are easily submerged by strong noise. Signal processing techniques rely on expert experience and cannot achieve real-time bearing fault diagnosis. Traditional signal processing techniques therefore struggle to achieve effective diagnosis in a strong noise environment.
In recent years, intelligent fault diagnosis methods have gradually become a research hotspot in the field. Bearing diagnosis algorithms based on machine learning can diagnose faults without prior physical knowledge, and many scholars have carried out extensive, in-depth research on intelligent bearing fault diagnosis in noisy environments. These methods train models to identify fault types by learning fault features and include support vector machines (SVM) [8], artificial neural networks (ANN) [9], Bayesian classifiers (BC) [10], and deep learning (DL) [11]. Deep learning is widely used to achieve end-to-end bearing fault diagnosis by integrating feature extraction and classification. It can automatically learn features useful for fault diagnosis and thereby improve accuracy; examples include deep neural networks [12], convolutional neural networks [13,14,15], sparse autoencoders [16,17], transfer learning [18,19], long short-term memory networks [20,21], and deep residual networks [22,23]. Although the above methods perform well when the noise intensity is low, their diagnosis precision drops in a strong noise environment. The main reason is that strong noise submerges the periodic impact features of the signal, and these methods filter out useful information during denoising. Many scholars therefore tend to apply signal processing to reduce noise before using deep learning algorithms to identify faults. For example, Chen et al. [24] used cyclic spectral coherence to preprocess signals and reduce the difficulty of feature learning, then built a CNN model to learn high-level features and diagnose faults; the combination improved fault diagnosis performance. Xu et al. [25] combined variational mode decomposition with deep convolutional neural networks to overcome the insufficiency of features extracted from a single source; obtaining multi-source features enhanced the model's diagnosis accuracy in a noisy environment. This line of research still has the following problems: (1) signal processing requires substantial expert knowledge and experience; (2) these deep learning methods transmit features as scalars through scalar neurons, losing temporal, spatial, global, and local correlation information.
In 2017, Sabour et al. [26] proposed the capsule network [27,28,29,30,31], which replaces traditional scalar neurons with vector neurons so that a deep neural network can merge the scalar fault features of a vibration signal into vector space. Dynamic routing establishes the relationship between low-level and high-level features, allowing models to extract fault feature information in the time or space dimension more comprehensively. Owing to the properties of vector neurons, the capsule network can mine as much useful information as possible and improve fault diagnosis on noisy signals. Zhu et al. [32] proposed a capsule network bearing fault diagnosis method with strong generalization ability; it uses the STFT to transform signals into feature maps and then uses convolution and inception modules to improve the capsule's nonlinearity, achieving a better fault classification effect. Sun et al. [33] connected wide-kernel convolution, small-kernel convolution, and multi-scale convolution in series to extract fault features of vibration signals layer by layer, which improves the model's feature expression ability. Another fault diagnosis method based on the capsule network obtains time-frequency diagrams through wavelet time-frequency analysis [34]; the diagrams are then input to an Xception module combined with the capsule network, improving the model's reliability. Wang et al. [35] combined wide convolution with multi-scale convolution and introduced an adaptive batch normalization algorithm [36] and a capsule network, which improved the model's noise resistance. The above methods improve noise resistance in two main ways: (1) combining signal processing techniques with an improved capsule network; (2) using single-layer convolution as the first layer of the network and then connecting multiple network layers with the improved capsule network in series to form a deep network. The first way requires expert experience and knowledge and is time consuming. In the second way, because the first layer uses single-scale convolution, the model captures feature information at only one scale. The small kernel size of the first layer means the network cannot attend to the effective low- and middle-frequency components of the signal and thus cannot effectively filter high-frequency noise. A serial network cannot restore or compensate for features missed by a previous layer. When the fault features of the signal are completely submerged by strong noise, accurate fault diagnosis is therefore difficult.
Therefore, to solve the above problems and achieve effective diagnosis in a strong noise environment, this paper proposes a bearing fault diagnosis method based on an enhanced integrated filter network (EIFN). The main contributions of this method are as follows:
(1) The method is an end-to-end bearing fault diagnosis system that integrates noise reduction, feature extraction, and fault recognition. It needs no separate signal processing and does not rely on expert experience and knowledge.
(2) The method integrates multiple convolutional layers (weak filters) of different scales into an enhanced integrated filter, connected in parallel and cascaded ways to achieve an enhanced filtering effect. It captures useful middle- and low-frequency signal content and filters high-frequency noise.
(3) The method integrates feature information from different receptive fields into vector space and uses the properties of vectors to mine correlations between fault features in the time dimension, improving the model's fault diagnosis precision in a strong noise environment.
The structure of the rest of this paper is as follows: Section 2 introduces the basic theory. Section 3 introduces the construction process and detailed architecture of the proposed method. Experimental results and visual analysis are presented in Section 4. Section 5 concludes the paper.
3. Proposed Methodology
The capsule network uses vector neurons and a dynamic routing mechanism to capture fault feature information in the signal and mine the potential correlations between features. However, the traditional capsule network has only one convolution layer with a fixed kernel scale, so the extracted features are single-scale. In a strong noise environment, the network is not sensitive to the periodic fault features of bearing vibration signals. Therefore, EIFN builds an enhanced integrated filter that combines parallel and cascaded connections, integrating feature extraction and noise filtering to improve the model's noise resistance. The extracted scalar features are then integrated into vector space, and the potential correlations between the fault features of the time-domain signal are mined.
3.1. Enhanced Integrated Filter
Section 2.1 shows that a one-dimensional convolution kernel can act as a low-pass filter and achieve better dynamic filtering, so the size of the one-dimensional kernel must be set appropriately. The short-time Fourier transform (STFT) divides the original time-domain signal into segments and applies windows on the basis of the Fourier transform (FT) through a sliding-window mechanism, alleviating the FT's inability to localize frequency components in time. For time-varying non-stationary signals, a wide window suits medium- and low-frequency content with high frequency resolution, while a narrow window suits high-frequency content with high time resolution. Therefore, the enhanced integrated filter proposed in this paper uses a super-wide convolution kernel in the first layer to extract features and low-pass filter the input signal. During feature extraction, the super-wide kernel attends more to the medium- and low-frequency parts of the signal, reducing the interference of high-frequency noise. The advantage of the super-wide convolution kernel is that it is obtained by an optimization algorithm, whereas the STFT window function is a fixed trigonometric function. In summary, the super-wide convolution kernel automatically learns the features useful for fault diagnosis and discards the features that interfere with diagnosis, integrating feature extraction and low-pass filtering to improve the model's noise resistance.
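As a toy illustration of the low-pass behaviour described above (a hand-set averaging kernel, not the paper's learned super-wide kernels), a wide 1D convolution kernel passes a low-frequency component while strongly attenuating a high-frequency one:

```python
import numpy as np

# Toy signal: low-frequency content plus high-frequency interference.
t = np.arange(1024)
low = np.sin(2 * np.pi * t / 256)        # low-frequency component (period 256)
high = 0.5 * np.sin(2 * np.pi * t / 4)   # high-frequency component (period 4)
signal = low + high

kernel = np.ones(64) / 64                # width-64 uniform low-pass kernel
filtered = np.convolve(signal, kernel, mode="same")

# Compare against the clean low-frequency component away from the edges.
inner = slice(64, -64)
err_raw = np.mean((signal[inner] - low[inner]) ** 2)       # energy of the interference
err_filtered = np.mean((filtered[inner] - low[inner]) ** 2)
print(err_filtered < 0.1 * err_raw)  # True: high-frequency part is suppressed
```

A learned kernel, unlike this fixed average, can shape its passband to keep exactly the fault-related components.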
In the first layer, a single convolution with a super-wide kernel still has a problem similar to the STFT: a fixed window cannot ensure frequency and time resolution at the same time. Although the wavelet transform can trade off frequency and time by changing the basis function, a suitable wavelet basis is difficult to determine and change. The single-layer super-wide kernel convolution therefore needs improvement. Based on experience, the first super-wide kernel convolution layer is set at the network entrance, and its kernel size is set to three times the stride [13]. Three further super-wide kernel convolution layers with successively increasing kernel sizes are then added at the network entrance, and a concatenate operation connects the four layers in parallel to fuse features in the channel dimension. Super-wide kernels of diverse sizes filter and retain different feature information in different fields of view. Fusing the extracted feature information not only preserves time and frequency resolution to a certain extent but also reduces high-frequency noise interference by capturing medium- and low-frequency features with super-wide windows. The convolution of the parallel super-wide kernels over the original time-domain signal captures enough features; extracting deeper semantic features from the fused features is the next issue.
EIFN integrates multiple primary filters in parallel and cascaded modes to form the final strong filter, as shown in Figure 3.
Parallel super-wide kernel convolutions can effectively reduce noise, but the feature information they extract is coarse. If the model cascades only a single convolution layer, it is difficult to extract accurate signal features in each channel. Therefore, several small-scale convolutions are constructed to enhance feature expression and highlight the medium- and low-frequency features of the original signal.
3.2. The Architecture of EIFN
The model includes a filter enhancement layer, an expression enhancement layer, a concatenate layer, pooling layers, a primary capsule layer, and a digit capsule layer. The network structure of EIFN is shown in Figure 4.
EIFN first uses the enhanced integrated filter to filter noise and extract features from the original vibration signal, then feeds the result into the primary capsule layer. The primary capsule layer reconstructs the input scalar features into vector neurons and transmits features by dynamic routing. Finally, the lengths of the output vectors of the digit capsule layer correspond to the different rolling bearing fault types.
The filter enhancement layer in EIFN applies four super-wide kernel convolutions of different scales to the original vibration signal. With their large receptive fields, it attends more to the useful middle- and low-frequency signal features and filters high-frequency noise, achieving multi-scale feature fusion. The expression enhancement layer uses several small convolution kernels to further extract features from the output of the upper layer and obtain better feature expression. The two pooling layers reduce the number of parameters, prevent overfitting, and speed up training. The leaky ReLU activation function is used in the filter enhancement layer, expression enhancement layer, and primary capsule layer; it preserves negative-axis information in the feature vector and avoids neurons with constantly zero gradients. L2 regularization is also introduced.
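The hand-off from scalar feature maps to vector capsules can be sketched as below. The channel count, capsule dimension, and convolution hyperparameters are illustrative assumptions; only the reshape-and-squash pattern follows the standard capsule network design the text describes.

```python
import torch
import torch.nn as nn

def squash(v, dim=-1, eps=1e-8):
    """Capsule squashing: keep the vector's direction, map its length into [0, 1)."""
    norm2 = (v ** 2).sum(dim=dim, keepdim=True)
    return (norm2 / (1.0 + norm2)) * v / torch.sqrt(norm2 + eps)

class PrimaryCapsules(nn.Module):
    """Regroup scalar feature maps into vector neurons (illustrative sizes)."""

    def __init__(self, in_channels=32, caps_dim=8, out_channels=32):
        super().__init__()
        self.caps_dim = caps_dim
        self.conv = nn.Conv1d(in_channels, out_channels * caps_dim,
                              kernel_size=3, stride=2, padding=1)

    def forward(self, x):                         # x: (batch, C, L)
        u = self.conv(x)                          # (batch, out_channels*caps_dim, L')
        u = u.view(x.size(0), -1, self.caps_dim)  # group scalars into vectors
        return squash(u)                          # vector length ~ class probability

feats = torch.randn(2, 32, 129)   # e.g. output of the enhanced integrated filter
caps = PrimaryCapsules()(feats)
print(tuple(caps.shape))  # (2, 2080, 8): 8-dimensional capsule vectors
```

Because squashing bounds every capsule's length below 1, the digit capsule layer can read those lengths directly as fault-class probabilities.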
The loss function measures the quality of the model's predictions by describing the deviation between the model's estimates and the observed values. During training, the model updates its weights through the back-propagation algorithm to minimize the loss, optimizing them over continuous update iterations. Since a capsule network allows multiple classes to be present simultaneously, the traditional cross-entropy loss cannot be used in this paper. Therefore, the interval (margin) loss is adopted:

L_c = T_c · max(0, m⁺ − ‖v_c‖)² + λ(1 − T_c) · max(0, ‖v_c‖ − m⁻)²

where T_c stands for the classification indicator function and c refers to the class number. Assuming the correct label is 9, T_c of the 9-th capsule is 1 and T_c of the other capsules is 0. ‖v_c‖ is the length of the output vector and represents the classification probability. The upper bound m⁺ is set to 0.9. When T_c is 0, only the right-hand term is computed, and its weight λ is set to 0.5 to ensure numerical stability of training. The lower bound m⁻ is set to 0.1. The closer ‖v_c‖ is to m⁺ (for present classes) or m⁻ (for absent classes), the smaller the loss value. Both terms are squared so that the loss function conforms to L2 regularization.
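The margin loss above can be written in a few lines. This is a generic sketch of the standard capsule margin loss with the stated constants m⁺ = 0.9, m⁻ = 0.1, λ = 0.5; the function and variable names are illustrative.

```python
import torch

def margin_loss(v_norm, target_onehot, m_pos=0.9, m_neg=0.1, lam=0.5):
    """Capsule margin (interval) loss.

    v_norm:        (batch, num_classes) lengths of the digit-capsule output vectors
    target_onehot: (batch, num_classes) indicator T_c for each class
    """
    # Present classes are pushed above m_pos, absent classes below m_neg.
    present = target_onehot * torch.clamp(m_pos - v_norm, min=0) ** 2
    absent = lam * (1 - target_onehot) * torch.clamp(v_norm - m_neg, min=0) ** 2
    return (present + absent).sum(dim=1).mean()

# Capsule lengths already inside the margins incur zero loss.
v = torch.tensor([[0.95, 0.05, 0.05],
                  [0.05, 0.92, 0.08]])
t = torch.tensor([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])
print(float(margin_loss(v, t)))  # 0.0
```

A wrong prediction, e.g. a short vector for the true class and a long one for a wrong class, activates both terms and yields a strictly positive loss.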