1. Introduction
Rotating machinery is among the most essential equipment in today’s industrial environments. From petroleum, automotive, chemical, pharmaceutical, mining, and power generation plants to consumer goods factories, there is almost always a machine with a rotating component: a gearbox, an axle, a wind, steam, or gas turbine, a centrifugal or oil-free screw compressor, or a pump. About 30% of rotating machinery breakdowns are caused by loose, partially rubbed, misaligned, cracked, or unbalanced rotating parts [1]. Machine breakdowns present complex challenges in day-to-day operations and significantly affect business profitability and operational productivity. Monitoring machine health conditions can prevent breakdowns and reduce the maintenance costs of manufacturing systems [2]. It is, hence, crucial to develop efficient diagnosis systems to analyze the different health conditions of rotating components.
There are two main approaches for coping with fault detection and diagnosis in rotating machinery: (1) physics-based control systems and (2) data-driven models. Recent advancements in computer processing and digital technologies have provided the robustness and computational capability needed for data-driven fault detection and diagnosis models. Implementing these models enables us to monitor and control machine parameters remotely and to derive insights, which is the main reason data-driven fault detection and diagnosis models are used in smart manufacturing systems [2].
Figure 1 shows the main steps that a practitioner should take to implement data-driven fault detection and diagnosis.
The main contributions of this paper are as follows. (1) To achieve higher classification performance in different environments, a hybrid deep learning architecture is designed that takes the Fourier and Wavelet spectra of the vibration signals as input. This architecture uses CNN blocks to extract shift-invariant characteristics of the fault types, an LSTM block that captures their spatiotemporal and sequential features and, finally, a Weighted ELM classifier that is effective in learning from scarce patterns, the necessity of which is examined through experimental comparisons. The proposed classifier is named CLSTM-ELM. (2) A Wasserstein GAN with gradient penalty (WGAN-GP) is developed and employed in the hybrid framework to reproduce rare patterns and enhance the training set. The effectiveness of this proposition is investigated in Section 5. (3) A comprehensive set of scenarios is designed to study the effect of different imbalance severities and noise degrees on the performance of the framework. A sensitivity analysis conducted on these scenarios reveals further insights into the characteristics of the model. (4) Seven state-of-the-art FDD models are chosen to compete with the proposed framework on four different dataset settings. The experimental comparison illustrates how implementing WGAN-GP and W-ELM improves classifier performance and shows the superiority of GAN-CLSTM-ELM over the other algorithms.
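To make the data flow through the proposed stack concrete, the following NumPy fragment sketches one forward pass through a CNN block, an LSTM block, and a random ELM hidden layer. It is an illustrative toy, not the paper's implementation: all layer sizes, kernel counts, and weights are invented for demonstration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu_pool(x, kernels, pool=2):
    """Valid 1-D convolution + ReLU + non-overlapping max pooling.
    x: (time, channels); kernels: (n_kernels, width, channels)."""
    n_k, width, _ = kernels.shape
    t_out = x.shape[0] - width + 1
    out = np.empty((t_out, n_k))
    for k in range(n_k):
        for t in range(t_out):
            out[t, k] = max(np.sum(x[t:t + width] * kernels[k]), 0.0)
    t_pool = out.shape[0] // pool
    return out[:t_pool * pool].reshape(t_pool, pool, n_k).max(axis=1)

def lstm_last_state(x, W, U, b):
    """A plain LSTM recurrence; returns the final hidden state.
    Gates are stacked along the last axis as [input, forget, output, cell]."""
    h_dim = U.shape[0]
    h, c = np.zeros(h_dim), np.zeros(h_dim)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for t in range(x.shape[0]):
        z = x[t] @ W + h @ U + b
        i, f, o = sig(z[:h_dim]), sig(z[h_dim:2*h_dim]), sig(z[2*h_dim:3*h_dim])
        c = f * c + i * np.tanh(z[3*h_dim:])
        h = o * np.tanh(c)
    return h

# toy sizes -- NOT the paper's configuration
T, CH, N_K, KW, H = 128, 2, 8, 5, 16
x = rng.standard_normal((T, CH))                  # stacked FFT/CWT spectra would go here
feat = conv1d_relu_pool(x, 0.1 * rng.standard_normal((N_K, KW, CH)))
h = lstm_last_state(feat,
                    0.1 * rng.standard_normal((N_K, 4 * H)),
                    0.1 * rng.standard_normal((H, 4 * H)),
                    np.zeros(4 * H))
# ELM head: a fixed random hidden layer; only its linear read-out would be learned
elm_features = np.tanh(h @ (0.5 * rng.standard_normal((H, 32))))
print(feat.shape, h.shape, elm_features.shape)
```

The convolution shortens and pools the time axis, the LSTM condenses the remaining sequence into one state vector, and the ELM head maps that vector into a feature space whose linear read-out is solved in closed form rather than by back-propagation.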
The rest of the paper is organized as follows.
Section 2 provides an overview of the principal AI-based approaches proposed for FDD problems. In
Section 3, the theory behind WGAN-GP, LSTM, Convolutional layers and W-ELM is briefly reviewed. Then, the proposed hybrid framework, GAN-CLSTM-ELM, is presented in
Section 4.
Section 5 compares the performance of different FDD algorithms on different imbalance ratios and noise severities. Finally, some research conclusions and future extensions are provided in
Section 6.
2. Review of Current Models
Early data-driven fault detection and diagnosis (hereafter FDD) models benefited from traditional Artificial Intelligence (AI) models, or “shallow learning” models, such as Support Vector Machines (SVM), Decision Trees (DT), and Multi-layer Perceptrons (MLP) [3]. Despite the applicability of traditional AI models to FDD problems, they show poor performance and limitations when dealing with complicated fault patterns such as the above-mentioned rotating machinery faults [
4]. One of the first applications of rotating machinery FDD dates back to 1969 at Boeing Co., when Balderston [
5] illustrated some characteristics of the fault signs on the signals measured by an accelerometer in natural and high frequencies. Ref. [
6] employed the rectified envelope signals with synchronous averaging, later called “envelope analysis”, to identify local bearing faults. Peak localization in the vibration signal spectrum is another classical fault detection method for ball bearing faults [
7].
Recently, with the emergence of novel deep learning architectures and their promising pattern recognition capabilities, many researchers have proposed deep learning solutions for data-driven FDD systems [
8]. These FDD approaches rely on the common assumption that the distribution of classes for different machine health conditions is approximately balanced. In practice, however, the number of instances may differ significantly from one fault class to another. This causes a crucial issue, since a classifier trained on such a data distribution exhibits an accuracy skewed towards the majority class, or fails to learn the rare patterns altogether. Most of the proposed FDD approaches thus suffer from higher misclassification ratios when dealing with scarce conditions, such as in high-precision industries where the number of faults is limited [
9].
Through their deep architectures, deep learning-based methods are capable of adaptively capturing information from sensory signals through non-linear transformations and of approximating complex non-linear functions with small errors [
3]. Auto-encoders (AE) are among the most promising deep learning techniques for automatic feature extraction of mechanical signals. They have been adopted in a variety of FDD problems in the semiconductor industry [
10], foundry processes [
11], gearboxes [
12] and rotating machinery [
13,
14]. Ref. [
15] employed the “stacked” variant of AE to initialize the weights and offsets of a multi-layer neural network and to provide expert knowledge on spacecraft conditions. However, when coping with mechanical signals, a single AE architecture has shown some drawbacks: it may learn only similar features during feature extraction, and the learned features may have shift-variant properties, which potentially leads to misclassification. Some approaches were proposed to make this architecture suitable for signal-based fault diagnosis tasks. Ref. [16] used a local connection network on a normalized sparse AE, called NSAE-LCN, to overcome these shortcomings. Ref. [17] developed a stacked AE to directly learn features of mechanical vibration signals on a motor bearing dataset and a locomotive bearing dataset; specifically, they first used a two-layer AE for sparse filtering and then applied softmax regression to classify the motor condition. The combination of these two techniques enabled the method to achieve high accuracy in bearing fault diagnosis.
Extreme learning machine (ELM) is a competitive machine learning technique, simple in theory and fast in implementation. As an effective and efficient technique, ELM has attracted tremendous attention from various fields in recent years. Some researchers have suggested ELM and Online Sequential ELM (OS-ELM) for learning from imbalanced data [
18,
19,
20]. ELM and OS-ELM can learn extremely fast due to their ability to learn data one-by-one or chunk-by-chunk [
21]. Despite their effective performance on online sequential data, the performance of their classical implementations on highly imbalanced data is controversial; according to [22], for example, OS-ELM tends to have poor accuracy on such data. The authors therefore proposed a voting-based weighted version, called VWOS-ELM, to cope with severely rare patterns, whereas [9] developed a two-stage hybrid strategy using a modified version of OS-ELM, named PL-OSELM. In the offline stage, the principal curve is employed to explore the data distribution and an initial model is developed on it. In the online stage, virtual samples are generated according to the principal curve, and the algorithm selects virtual minority-class samples to supply more valuable training samples.
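The appeal of ELM-family classifiers for imbalanced data is that training is a single closed-form solve rather than an iterative optimization. The NumPy snippet below is a minimal sketch of a Weighted ELM assuming the common weighting scheme in which each sample is weighted by the inverse of its class size; the hidden-layer size, regularization constant, and the toy dataset are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def welm_train(X, y, n_hidden=50, C=100.0):
    """Weighted ELM sketch: samples are weighted by the inverse of their
    class size, so rare classes count as much as large ones. The output
    weights come from the closed form  beta = (I/C + H'WH)^-1 H'WT."""
    n_classes = int(y.max()) + 1
    Win = rng.standard_normal((X.shape[1], n_hidden))
    bias = rng.standard_normal(n_hidden)
    H = np.tanh(X @ Win + bias)                     # random, untrained hidden layer
    T = -np.ones((len(y), n_classes))               # one-vs-all targets in {-1, +1}
    T[np.arange(len(y)), y] = 1.0
    w = 1.0 / np.bincount(y, minlength=n_classes)[y]
    HtW = H.T * w                                   # H^T W without building diag(w)
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ T)
    return Win, bias, beta

def welm_predict(X, Win, bias, beta):
    return np.argmax(np.tanh(X @ Win + bias) @ beta, axis=1)

# toy imbalanced problem: 200 majority samples vs. 10 well-separated minority samples
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(4.0, 1.0, (10, 2))])
y = np.array([0] * 200 + [1] * 10)
pred = welm_predict(X, *welm_train(X, y))
print((pred == y).mean(), (pred[200:] == 1).mean())  # overall and minority accuracy
```

Because the input layer is random and only the linear read-out is solved for, training cost is dominated by one small linear system, which is what makes ELM variants attractive for the fast, repeated retraining used in online FDD settings.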
Considering the promising results obtained by ELM-based classifiers coping with imbalanced data, ELM accordingly became one of the mainstream directions in the FDD research area. In [
23], the authors developed an evolutionary OS-ELM for FDD for bearing elements of high-speed electric multiple units. They employed a K-means synthetic minority oversampling technique (SMOTE) for oversampling the minority class samples. They also used an artificial bee colony (ABC) algorithm to find a near-optimum combination of input weights, hidden layer bias, and number of hidden layer nodes of the OS-ELM. In another paper, Ref. [
24] used density-weighted one-class ELM for fault diagnosis in high-voltage circuit breakers (HVCBs), using vibration signals. Ref. [
25] applied an adaptive class-specific cost regulation ELM (ACCR-ELM), with a variable-length brainstorm algorithm for its parameter optimization, to conveyor belt FDD. The proposed algorithm exhibits stable performance under different imbalance ratios. Ref. [
26] presented a feature extraction scheme covering the time, frequency, and time-frequency domains, feeding the full spectrum of information gained from the vibration signals to the classifier. They also demonstrated that the cost-sensitive gradient boosting decision tree (CS-GBDT) shows satisfactory performance for imbalanced fault diagnosis. In another FDD framework for rolling bearings [
27], the authors coupled an Optimized Unsupervised Extreme Learning Machine (OUSELM) with an Adaptive Sparse Contractive Auto-encoder (ASCAE). The ASCAE achieves a sparser and more sensitive feature extraction from the bearing vibration signals. A cuckoo search algorithm was also proposed to optimize the ELM hyper-parameters. Another variation of ELM was developed by [
28] to deal with imbalanced aircraft engine fault data, which are derived from the engine’s thermodynamic maps. This variation, named SELM, flexibly sets a soft target margin for each training sample; hence, from the perspective of margin learning theory, it does not need to force the margins of all training samples to equal exactly one. After experiments on different datasets, including the aircraft engine data, it was concluded that SELM outperforms ELM.
On the other hand, there are frameworks for imbalanced and noisy FDD that do not employ any ELM variation. Ref. [16] proposed a Deep Normalized Convolutional Neural Network (DNCNN) for FDD under imbalanced conditions. The DNCNN employs a weighted softmax loss which assigns the misclassification errors of the different health conditions an equivalent importance. Subsequently, it minimizes the overall classification error during training and adaptively achieves better performance when dealing with imbalanced fault classification of machinery. Ref. [
29] used WGAN-GP to interpolate stochastically between the actual and virtual instances, ensuring that the transition region between them is stable. They also utilized a stacked AE to classify the enhanced dataset and to assess the usability of the virtual instances. Since a single GAN model encounters hardship and poor performance when dealing with FDD datasets, Ref. [30] proposed a GAN-based framework for small sample size conditions which boosts the adaptability of feature extraction and, consequently, diagnosis accuracy. The effectiveness and satisfactory performance of the proposed method were demonstrated on the CWRU bearing and gearbox datasets. Another novel GAN-based framework, named dual-discriminator conditional GANs (D2CGANs), has recently been proposed to learn from signals of multi-modal fault samples [
31]. This framework automatically synthesizes realistic, high-quality fake signals for each fault class and is used for data augmentation to solve the imbalanced dataset problem. Through experiments on the CWRU bearing dataset, the authors showed that Conditional GANs, Auxiliary Classifier GANs and D2CGANs significantly outperform GANs and Dual-GANs. Ref. [32] proposed a framework which adopts CNN-based GANs with the coordinated use of two auxiliary classifiers. Experimental results on analog-circuit fault diagnosis data suggested that the proposed framework achieves better classification performance than DBN, SVM and artificial neural networks (ANN). Ref. [33] presented CNN-based GANs for rotating machinery FDD which use a Wavelet Transform (WT) technique. The so-called WT-GAN-CNN approach extracts time-frequency image features from one-dimensional raw signals using WT; GANs are then used to generate more training image samples, while the built CNN model accomplishes the FDD on the augmented dataset. The experimental results demonstrated high testing accuracy under severe environmental noise interference and changing working conditions.
5. Results
To evaluate the effectiveness of the proposed method, experiments were run on one of the most widely used bearing fault datasets, the Case Western Reserve University (CWRU) bearing dataset (https://csegroups.case.edu/bearingdatacenter/home (accessed on 22 March 2022)). To conduct a comprehensive comparison, we defined different noise and imbalance conditions on which eight different deep learning-based FDD methods were tested. All experiments were performed using Python 3.9 on a computer with an NVIDIA GeForce GTX 1070 GPU (CUDA 10.1) and 16 GB of memory.
5.1. Dataset Description
The paper employs the CWRU bearing dataset, collected on the test stand shown in Figure 6, which consists of a motor, a torque transducer/encoder, a dynamometer, and control electronics. The dataset covers five different fault types, corresponding to the inner race, the balls, and the outer race in three different orientations: 3 o’clock (directly in the load zone), 6 o’clock (orthogonal to the load zone) and 12 o’clock (opposite to the load zone). Moreover, the faults are collected over a range of severities varying from 0.007 to 0.040 inches in diameter. The dataset is also recorded for motor loads from 0 to 3 horsepower. However, for the sake of simplicity, this paper uses only one motor speed, 1797 RPM. The samples are collected at 12,000 samples/second from two accelerometers mounted on the fan end and drive end of the machine. In the experiments, we took signal bursts of 800 timestamps, equal to approximately 66.7 milliseconds, to generate different datasets of approximately 25,500 signal bursts.
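The burst-extraction step described above can be sketched in a few lines of NumPy. The synthetic record below stands in for a real CWRU file, and the non-overlapping window step is an assumption, since the paper does not state whether its bursts overlap.

```python
import numpy as np

FS = 12_000     # CWRU sampling rate (samples/second)
BURST = 800     # burst length used in the paper (~66.7 ms)

def segment_bursts(signal, burst=BURST, step=BURST):
    """Cut a 1-D vibration record into fixed-length bursts. `step == burst`
    gives non-overlapping windows; choosing `step < burst` would give
    overlapping windows instead."""
    n = (len(signal) - burst) // step + 1
    return np.stack([signal[i * step:i * step + burst] for i in range(n)])

# a 10-second synthetic record standing in for one CWRU drive-end file
shaft_hz = 1797 / 60                 # shaft rotation frequency at 1797 RPM
record = np.sin(2 * np.pi * shaft_hz * np.arange(10 * FS) / FS)
bursts = segment_bursts(record)
print(bursts.shape)                  # (150, 800)
```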
To explore the diagnostic capabilities of the proposed framework under imbalanced conditions, imbalanced sets of samples were selected such that one fault class becomes rare. Table 2 shows the distribution of samples for each machine condition in the selected sets, where the imbalance parameter denotes the percentage of the minority class within the whole dataset; accordingly, as this percentage decreases, the imbalance degree increases. In this paper, we chose the “out3” class to represent the minority class, whose samples correspond to the outer race faults at the position opposite the load zone. In these scenarios, the “health” class, corresponding to the healthy condition, represents the remaining share of the whole dataset, while the other fault classes account for 5% each. The generative algorithm subsequently strives to equalize the sample sizes of the fault classes in the training set by augmenting the minority class.
By adding additive white Gaussian noise with different signal-to-noise ratios (SNRs) to the original samples, the paper is able to examine the performance of the GAN-CLSTM-ELM framework at different noise severity levels. These noisy samples better portray real-world industrial production settings, where the noise varies considerably. The original drive-end and fan-end signals, together with their derived noisy samples, are exhibited in Figure 7.
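The noise-injection step can be reproduced with a few lines of NumPy; the sine burst standing in for a real accelerometer signal and the helper name are illustrative.

```python
import numpy as np

def add_awgn(signal, snr_db, rng=None):
    """Add white Gaussian noise scaled so that the result has the requested
    SNR (in dB) relative to the measured power of `signal`."""
    rng = np.random.default_rng(0) if rng is None else rng
    p_signal = np.mean(signal ** 2)
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))
    return signal + rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)

t = np.arange(8000) / 12_000
clean = np.sin(2 * np.pi * 157.0 * t)   # stand-in for a clean accelerometer burst
noisy = add_awgn(clean, snr_db=10)
# verify: the realized SNR should be close to the requested 10 dB
achieved = 10 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
print(round(achieved, 1))
```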
5.2. GAN Model Selection
As mentioned in the previous section, the Wasserstein loss function and the gradient penalty added to it help stabilize the generative algorithm.
Figure 8 depicts how the proposed WGAN-GP reaches an equilibrium after 9000 epochs, where it can generate realistic samples, whereas the other GAN generators produce samples which cannot deceive their discriminators. As can be clearly seen in
Figure 8, their generator loss values climb significantly higher than those of the discriminators. This comparison demonstrates why the implementation of WGAN-GP is preferred.
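The stabilizing term itself is easy to state. In a real WGAN-GP, the critic's gradient at random interpolates between real and fake batches is obtained by automatic differentiation; the NumPy sketch below sidesteps autograd by using a linear critic, whose input gradient is simply its weight vector, purely to make the penalty term concrete. The coefficient λ = 10 is the value commonly used with WGAN-GP; everything else here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
LAMBDA = 10.0   # gradient-penalty coefficient

def gradient_penalty_linear(w, x_real, x_fake):
    """WGAN-GP penalty for a *linear* critic D(x) = w @ x, whose gradient
    with respect to the input is w everywhere. A deep critic would obtain
    the same term via automatic differentiation at the interpolates."""
    eps = rng.uniform(size=(x_real.shape[0], 1))
    x_hat = eps * x_real + (1.0 - eps) * x_fake       # random interpolates
    grad_norm = np.full(x_hat.shape[0], np.linalg.norm(w))
    return LAMBDA * np.mean((grad_norm - 1.0) ** 2)   # pushes ||grad|| toward 1

x_real = rng.standard_normal((64, 800))   # a batch of real bursts
x_fake = rng.standard_normal((64, 800))   # a batch from the generator
w = np.zeros(800); w[0] = 3.0             # a critic whose gradient norm is 3
gp = gradient_penalty_linear(w, x_real, x_fake)
print(gp)                                 # 10 * (3 - 1)^2 = 40.0
```

The penalty is zero exactly when the critic's gradient norm is 1, which is what enforces the 1-Lipschitz constraint softly instead of the hard weight clipping of the original WGAN.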
Figure 9 shows some real samples of the normal baseline and of the fault conditions associated with the bearing ball, inner race and outer race, with fault diameters of 7 mils and 21 mils.
Figure 10, similarly, visualizes the synthetic samples generated by WGAN-GP after 10,000 epochs.
5.3. The Sensitivity Analysis
In this section, the paper presents a sensitivity analysis of the performance of the proposed model under changes in the minority-class percentage and the SNR. Specifically, we considered 25 points over the ranges of the minority-class percentage and the SNR, and ran the model 10 times at each point to achieve a robust analysis.
Figure 11 and
Figure 12 demonstrate the performance of the GAN-CLSTM-ELM model at these points with respect to different metrics.
As can be seen in the figures, high levels of noise affect the performance of the model, changing the score from 100% to 95.91% at the mildest imbalance level and from 99.7% to 81.45% at the severest one. Over this space, the accuracy, AUC and recall values remain above 96.7%, 92.6% and 81.16%, respectively. The model shows relatively high robustness to both noise and imbalance severity for SNRs greater than 20. In its best-case scenario, with the mildest imbalance and SNR = 100, it attains a score of 100%; in its worst-case scenario with SNR = 50, it still obtains a score of 98.02% and an accuracy of 99.77%. In the following, the paper conducts a comparison to determine how meaningful these numbers are and whether the proposed model can better mitigate the adverse impacts of imbalanced and noisy conditions.
5.4. Model Performance Evaluation
To achieve meaningful comparisons, several novel FDD frameworks were employed to perform the diagnosis under the different scenarios. CLSTM, df-CNN, sdAE, WELM, and CNN have shown promising performance in the literature and were hence selected for this purpose. Three traditional machine learning classifiers, SVM, ANN and Random Forest (RF), are also considered in this experimental comparison to draw insights from both machine learning and deep learning models. CLSTM-ELM was also added to the comparison panel to examine the necessity of the Weighted ELM in the architecture of the proposed framework. A grid search was conducted on the hyper-parameters of these models to achieve higher performance; specifically, the learning rate, batch size and the architecture of the fully connected layers were optimized for each algorithm. This paper uses two augmentation techniques: (i) “classic”, where the samples are flipped, mirrored and various white noises are added to them; and (ii) “GAN”, with WGAN-GP, as discussed earlier in Section 4. All the frameworks are examined with both augmentation techniques. A brief description of the selected frameworks is provided in Table 3.
To avoid the effects of weight initialization and randomness on the results, we ran each framework ten independent times, using a five-fold cross-validation technique for each imbalance and noise condition. The data are stratified such that each fold has the same class distribution. For the minority class, depending on the imbalance level, there are between 51 and 816 samples on which to train. The classic augmentation technique is used to multiply this number by 8 (mirroring and flipping the samples and adding random white noise to them). In each scenario, after training the WGAN-GP on each class, we set it to produce between 512 and 4096 samples for the minority class such that its sample count matches the other classes.
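One plausible reading of this ×8 "classic" augmentation can be sketched as follows; the exact combination of transforms is an assumption, since the paper only lists mirroring, flipping and noise addition.

```python
import numpy as np

def classic_augment(bursts, rng=None, noise_std=0.05):
    """x8 augmentation sketch: {original, time-reversed} x {original,
    sign-flipped} x {clean, + white noise} = 8 variants per burst. The exact
    combination and the noise level are assumptions, not the paper's recipe."""
    rng = np.random.default_rng(0) if rng is None else rng
    out = []
    for x in bursts:
        for v in (x, x[::-1]):          # mirroring along the time axis
            for u in (v, -v):           # flipping the amplitude
                out.append(u)
                out.append(u + rng.normal(0.0, noise_std, size=u.shape))
    return np.stack(out)

# the gamma = 0.25 case: 51 minority-class bursts of 800 timestamps
minority = np.random.default_rng(3).standard_normal((51, 800))
augmented = classic_augment(minority)
print(augmented.shape)                  # 51 * 8 = 408 samples
```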
Figure 13 and
Figure 14 illustrate the corresponding normalized confusion matrices and the model performances with the classic and WGAN-GP augmentations, respectively. Comparing the different scenarios, it can plainly be concluded that GAN-CLSTM-ELM is better able to attenuate the negative effects of imbalanced and noisy conditions than the other frameworks. Regarding the highly imbalanced situation, its score drops gently, by 0.32%, between the first two scenarios (SNR = 100, with minority-class percentages of 4 and 0.25, respectively), while the other frameworks show relatively substantial declines in their scores, ranging from 1.14% (GAN-CNN) to roughly 48% (df-CNN). In the second scenario, while the proposed model correctly identifies all the minority class samples, CLSTM-ELM and GAN-CNN were able to classify roughly 92% of them. This percentage for CLSTM, sdAE, WELM and CNN was between 80 and 85. The df-CNN showed a lackluster performance on the minority class, as it could not correctly diagnose any of the corresponding samples. The figures also show that replacing the fully connected layers with W-ELM in the CLSTM-ELM model slightly increases its robustness when the minority-class percentage plummets from 4 to 0.25.
In the presence of heavy noise, there are sudden falls in the performance of all the algorithms. Comparing the first and the third scenarios (SNR = 100 and SNR = 10 at the same imbalance level), all the CLSTM-based methods, alongside GAN-CNN, had the smallest decrease in score (roughly 5%); thus, they were the most robust algorithms under noisy conditions. Comparing CLSTM-ELM and CLSTM with CNN in both figures, we can infer that the presence of LSTM and CWT makes the model perform better against the noise. Moreover, CNN achieved comparatively poorer results when the minority-class percentage dips below 1. Its combination with a WGAN-GP, however, mitigated this loss, and GAN-CNN achieved a satisfactory result. In the presence of heavy noise, the classification quality of GAN-WELM drastically plunged and, despite its comparatively satisfactory performance in the first two scenarios, the noise made it unable to diagnose the minority class in highly imbalanced situations.
By comparing the confusion matrices of WELM and CLSTM-ELM, it can be concluded that the CLSTM architecture alongside the WELM model improves its performance against the noise. It is worth noting that adding WGAN-GP to the deep learning-based models made them exhibit superiority over their root algorithms. This shows that WGAN-GP can effectively enhance the quality of the classifier not only in imbalanced situations but also in noisy environments. On the other hand, the shallow learning techniques had comparably higher misclassification rates under either noisy or imbalanced conditions. GAN-based augmentation significantly improved the accuracy scores of RF and ANN, but not of SVM, which remained unable to diagnose the minority class in highly imbalanced situations.
Table 4 shows the training time per step and the learning hyper-parameters of each deep learning algorithm. As discussed in [46], CLSTM has a relatively slow training phase. From the table, it can be seen that substituting W-ELM for the fully connected layers makes it slightly faster to train and converge. Within the comparison panel, df-CNN, followed by WELM and CNN, was the quickest classifier. From
Table 5, it can be inferred that the presence of noise makes the computations harder for the SVM and the RF when classifying the samples; their average training times were therefore strongly dependent on the scenario, while the deep learning-based classifiers had steady runtimes across the different situations.
6. Discussion and Conclusions
In many real applications of fault detection and diagnosis, data tend to be imbalanced and noisy, meaning that the number of samples for some fault classes is much smaller than the number of normal data samples, and that there are errors in the actual measurements recorded by the sensors. These two conditions make many traditional FDD frameworks perform poorly in real-world industrial environments.
In this paper, a novel framework called GAN-CLSTM-ELM is proposed, which enhances the performance of rotating machinery FDD systems coping with highly imbalanced and noisy datasets. In this framework, WGAN-GP is first applied to augment the minority class and enhance the training set. A hybrid classifier is then developed, containing a Convolutional LSTM and a Weighted ELM, which learns more efficiently from vibration signals. The framework also benefits from both wavelet and Fourier transform techniques in its feature engineering step, revealing more hidden information about the fault signatures and making the classifier more accurate. The effectiveness of the proposed framework is verified on four dataset settings with different imbalance severities and SNRs. The comparisons with state-of-the-art FDD algorithms demonstrate that the GAN-CLSTM-ELM framework can reduce the misclassification rate and outperform the other methods, the more significantly so as the imbalance degree grows. The efficiency of the WGAN-GP is also confirmed by comparing the results of the proposed model with those of CLSTM-ELM as well as all the other diagnosis models. The experimental results make it discernible that using a generative algorithm helps the classification model alleviate the adverse impacts of low SNRs, which stresses the necessity of employing such hybrid frameworks for practitioners working on noisy industrial applications. The paper also justifies the implementation of W-ELM in the architecture of CLSTM, since the adjusted model maintains sturdy classification when the minority-class percentage decreases, in both noisy and noiseless scenarios. Finally, a sensitivity analysis over 25 dataset settings, built on ranges of imbalance and SNR values, provides insights into how these two factors affect the model’s classification ability.
Extracting the FFT and CWT spectra requires some knowledge of signal processing, yet it is still more convenient than extracting the other hand-crafted features proposed in the literature. Another advantage of the proposed framework is that it attains comparatively high performance under noisy conditions while requiring no complex denoising pre-processing handled by employees with expert knowledge of signal processing. These characteristics make GAN-CLSTM-ELM an attractive option for industrial practitioners in need of relatively easy-to-use software that avoids complicated pre-processing tasks.
Future work will include more experiments on the behavior of different generative algorithms and the development of a more powerful architecture to create high-quality signals with fewer samples. We will also attempt to explore the feasibility of implementing and testing the proposed framework on other applications.