Article

Diagnosis and Prognosis of Faults in High-Speed Aeronautical Bearings with a Collaborative Selection Incremental Deep Transfer Learning Approach

by Tarek Berghout 1 and Mohamed Benbouzid 2,3,*
1 Laboratory of Automation and Manufacturing Engineering, University of Batna 2, Batna 05000, Algeria
2 Institut de Recherche Dupuy de Lôme (UMR CNRS 6027), University of Brest, 29238 Brest, France
3 Logistics Engineering College, Shanghai Maritime University, Shanghai 201306, China
* Author to whom correspondence should be addressed.
Submission received: 27 August 2023 / Revised: 14 September 2023 / Accepted: 28 September 2023 / Published: 2 October 2023
(This article belongs to the Section Aerospace Science and Engineering)

Featured Application

This study proposes a well-structured methodology to assist intelligent decision making and the real-time monitoring of flight conditions in safety-critical aircraft, whose principal use is the establishment of a well-planned maintenance schedule.

Abstract

The diagnosis and prognosis of aeronautical-bearing health conditions are essential to proactively ensuring efficient power transmission, safety, and reduced downtime. The rarity of failures in such safety-critical systems drives this process towards the data-driven analytics of fault injection and aging experiments, rather than complex physics-based modeling. Nonetheless, data-based condition monitoring is very challenging due to data complexity, unavailability, and drift, resulting from distortions generated by harsh operating conditions, the scarcity of failure patterns, and rapid data change, respectively. Accordingly, the objective of this work is three-fold. First, to reduce data complexity and improve feature space representation, a robust data engineering scheme, including feature extraction, denoising, outlier removal, filtering, smoothing, scaling, and balancing, is introduced. Second, collaborative selection-based incremental deep transfer learning (CSIDTL) is introduced to overcome the lack of failure patterns by incrementing the number of source domains across different training rounds. Third, long short-term memory (LSTM) adaptive learning rules are fully taken into account to combat further data complexity and data change problems. The well-structured methodology is applied to a massive dataset of aeronautical bearings dedicated to both diagnostic and prognostic studies, which addresses the above challenges in the form of classification problems with 13 different conditions, 7 operating modes, and 3 stages of damage severity. Conducting CSIDTL following a three-fold cross-validation process allows us to improve classification performance by about 12.15% and 10.87% compared with state-of-the-art methods, reaching classification accuracy rates of 93.63% and 95.65% in diagnosis and prognosis, respectively.

1. Introduction

The condition monitoring of aircraft engines is of utmost importance, as it plays a critical role in ensuring the safety, reliability, and efficiency of aircraft operations. It enables the early detection of failures, facilitates proactive/reactive maintenance, enhances operational efficiency, and ensures compliance with regulatory standards [1]. The real-time condition monitoring of such critical systems depends entirely on the accurate modeling of the systems themselves and of the interaction of their sub-components. As some of the main critical sub-components of aeroengines, bearings are vital for load support, friction reduction, heat dissipation, vibration absorption, reliability, durability, and overall safety. For numerous reasons, monitoring system vibration and bearing quality is critical in high-speed aerospace applications [2]. First, vibrations can indicate potential problems within the aircraft, such as misalignment or imbalanced loads, which can result in mechanical breakdowns and jeopardize safety. Second, bearing condition monitoring assists in identifying wear, excessive heat, and lubrication concerns that, if not treated promptly, can lead to catastrophic failures. Finally, monitoring these indicators allows for preventative maintenance, which reduces downtime and increases aircraft availability [2]. In other words, the proper operation and maintenance of bearings are essential to the efficient and safe operation of aircraft engines. In this context, bearing failure and degradation modeling can usually be performed through physics-based modeling, data-driven modeling, or hybrid physics- and data-driven modeling. However, various difficulties exist in this particular case. Extreme temperatures and strong G-forces are common operating conditions in high-speed aerospace applications that can impair the accuracy and reliability of monitoring systems. Furthermore, the complexity of aerospace systems, as well as the requirement for real-time data processing, presents hurdles in terms of data interpretation and rapid decision making [3]. Because of the intrinsic complexity of aircraft engine modeling, physics-based modeling is a difficult undertaking that generalizes poorly, constraining it to particular, well-defined components and conditions [4,5]. Data-driven models, on the other hand, are strongly encouraged, as they have fewer modeling requirements and the ability to exploit fault injection and aging tests in environments specially built by specialized laboratories. In this regard, this section is dedicated to providing the motivation, research gaps, contributions, and outline of this study, whose main goal is the data-driven modeling of aeronautical bearings for diagnostic and prognostic studies.

1.1. Motivations

Data-driven modeling often encounters three major data challenges: complexity, unavailability, and drift. Data complexity refers to the existence of outliers or anomalies in historical data, which are data points that deviate considerably from the others. These anomalies are caused by extremely complicated working conditions and severe environments, which distort sensor readings. Such outliers can impact the learnability of the model, misleading the learning process or creating confusion in the patterns [6]. This problem should be addressed by considering relevant analytical or trainable methods that can improve data representation, including feature engineering, outlier detection and removal, denoising, smoothing, filtering, etc. [7]. On the other hand, data unavailability is the result of the scarcity of failure patterns, whether under real conditions or in artificial failure generation and aging tests. In practice, such unavailability occurs owing to the rarity of downtime resulting from continuous preventive maintenance. Similarly, in accelerated aging and failure generation tests, failure modes are constrained to a set of operating conditions, and/or the modes do not exactly emulate practical applications at some point. The problem created by this challenge is the lack of representative data to effectively train a machine learning model. Without enough quality data, it becomes difficult for the model to learn different patterns, make accurate predictions, or generalize well to new, unseen data. Therefore, the requirement for generative modeling and/or transfer learning is critical, since they can provide additional sources of expertise to fill data unavailability gaps (see Figure 2 from [6]). Additionally, data drift refers to rapid change and quickly outdated data. Because data are constantly changing and evolving, models trained on old data may not accurately represent the current state of the system. This can lead to biased or inaccurate predictions, as learning models are not able to adapt to rapid changes in data. Correspondingly, the key solution is to constantly update and retrain machine learning models in order to remain up to date with quickly changing data. Indeed, learning models need to be adaptive, up to date, and resilient to this problem [6]. For instance, LSTM networks and adaptive least squares methods with a forgetting mechanism are very helpful, as they include specific penalization parameters acting as a memory that focuses on newly arriving sequences [8].
Motivated by the urgent need for the data-driven diagnosis and prognosis of aeronautical bearing health conditions in light of the aforementioned challenges, this work contributes significant new findings to this field. It offers a well-structured research-gap extraction approach, well-defined analysis criteria, and a collection of contributions filling these gaps. The key points of this work are stated at the end of this section.

1.2. Extracting Research Gaps from Related Works

Developing a methodology to solve a specific problem, or to improve an existing solution, requires extensive literature research and analysis in which research gaps are clarified. Accordingly, this section highlights the methodology for collecting papers, the analysis criteria for related works, and the extracted research gaps.

1.2.1. Research Work Collection Methodology

As previously stated, this study is based on a specific dataset of aeronautical bearings related to diagnostic and prognostic studies. Gathering research articles that process the same dataset/topic is highly valuable in this context: by comparing identical endeavors using the same criteria, it becomes much simpler to trace developments in a given topic and disclose its gaps. It should be emphasized that this work used data from the Politecnico di Torino bearing test bench, which is publicly available [7]. We followed its citations until the end of July 2023 and discovered that it has received considerable attention, with a total of 70 citations according to the Scopus citation monitoring database. The majority of citations are explicitly dedicated to the application rather than only making a passing reference to it. As a result, it was quite challenging to compile and evaluate all the relevant works under this specific circumstance. Accordingly, the search was restricted to the period of 2023, and we came up with a total of six research publications, which were considered sufficient to conduct an in-depth, up-to-date analysis.

1.2.2. Analysis Criteria

We needed to create strong criteria for analyzing the gathered works to guarantee that research gaps were identified and to support the robustness of and need for our suggested technique. These criteria had to address the previous challenges and condition monitoring needs. The bearing dataset introduces two massive sets of bearing tests: the “variable speed and load set” and the “endurance set”. The first one includes 13 subsets of different operating conditions and 7 records of operating modes, including healthy and unhealthy operating modes (further details are revealed in the data description section). It is mainly dedicated to diagnosis studies solving a multi-class classification problem with imbalanced data and higher levels of volume (i.e., 46,592,000 samples × 6 channels), velocity, and variety. The endurance test, with a size of 54,067,200 samples × 6 channels, contains a set of three damage severity levels with class proportions of (0.27, 0.22, 0.50), respectively. It is generated for a specific type of bearing and dedicated to studying degradation process regression as a kind of health stage classification. It is also important to note that the data can exhibit different class distribution ratios, creating the further challenge of imbalanced classification. Accordingly, in addition to the three aforementioned challenges of data-driven modeling, other analysis criteria, such as whether the authors consider prognosis studies, diagnostic studies, or both, must also be discussed. Discussing the treated operating conditions is also of great importance to revealing the complexity of a study and its correspondence to real-world problems. Additionally, the methods used for targeting the above challenges, whether data engineering or learning tools, must be discussed to reveal research interest in reducing data complexity and improving data representativeness. In this context, the following list of criteria was chosen to conduct this analysis: data complexity (i.e., automatic/analytic feature extraction, outlier removal, noise suppression/reduction, data imbalance), data unavailability (i.e., generative modeling, transfer learning), data drift (i.e., adaptive learning), the problem treated (i.e., diagnosis, prognosis, operating conditions), and finally, the learning algorithms used.

1.2.3. Related-Work Analysis

The study in [9] uses a collection of real-world cases, some of which include the previously mentioned aeronautical bearing dataset [7]. To identify sensitive features, minimize noise, and prevent the loss of important information, the authors suggested a multi-scale slip-averaging method based on sensitive multi-scale symbol dynamic entropy, an important breakthrough in feature engineering. The data are randomly divided into training and test sets using a 50% ratio, and classification is performed using a support vector machine (SVM). Three-fold cross-validation is used for SVM evaluation and tuning. While the endurance test associated with prognosis was not addressed, the challenges raised are closely connected to diagnosis. Recall, precision, specificity, and F1 score were the four measures employed in the evaluation. Thus, this work did not target the problems of data unavailability and drift but only focused on data complexity. In [10], the authors suggested a full graph dynamic autoencoder, which consists of modules for graph attention and fully connected autoencoders. Damage severity was not explored in this situation; rather, the method solely considered multi-condition fault identification (i.e., diagnosis). The method was also used in many examples, including the study’s primary dataset, dealing with automated feature extraction, denoising, imbalanced classification, and data drift. However, only a small number of samples (i.e., hundreds) were used, and the problem complexity was lowered by dividing it into just two operating modes (i.e., healthy and unhealthy), even though the total amount of data was massive (i.e., 46,592,000 samples) and the presence of seven operating modes greatly increases the prediction complexity. Furthermore, the suggested model was evaluated using the same metrics previously used in [9]. In general, this approach does not address data unavailability at any point. In [11], a multilayer wavelet attention convolutional neural network was suggested to provide machine fault diagnostics while overcoming the distorting effects of noise. Utilizing a discrete wavelet attention layer, physics-based knowledge is added to the deep network as an extra source of information. While other data complexity challenges (e.g., outliers) are not completely taken into account, an adaptive learning technique is employed to combat data drift, and the discrete wavelet attention layer is used to address data unavailability. Although the exact way in which the authors employed the moving window to reduce the number of samples is unknown, it seems that about 75% of the samples were used for training and the other 25% for testing. Multi-scale three-dimensional Holo–Hilbert spectral entropy, a coarse-grained entropy-based processing technique, was developed in [12], similar to the research performed in [9], to extract features from bearing failures while addressing complexity challenges overall. A bat-optimized support vector machine was then employed for classification (i.e., fault diagnosis only). The work appears to have decent findings; nevertheless, the authors’ usage of a limited number of samples (30 for training and 20 for testing) was deemed insufficient for such an analysis. Despite the findings’ accuracy, it is impossible to generalize the obtained conclusions for such a complicated investigation. Additionally, this study did not focus on data drift or availability problems.
The authors of [13] treated the problem of data unavailability and its drawbacks in the generalization process. To learn more about learning behavior in various settings and to acquire general knowledge, they employed transfer learning across working conditions. To address the issues of feature extraction (i.e., automated data engineering) and fault diagnosis, deep transfer learning based on graph convolutional networks was presented. No particular attention was paid to the effects of numerous distortions, including outliers, noise, and other disturbances. A unique aspect of this experiment is that only data from three loads, with speed fluctuations between 6000 and 24,000 rpm, were chosen to confirm the viability of the suggested approach. As a result, this study had several limitations and did not examine the complete diagnosis dataset. Additionally, no work was performed on prognosis issues. Likewise, the problem of data drift and the system’s dynamics received no attention. Interesting work was performed in [14] to provide solutions to both data unavailability and data drift using domain adaptation, generative modeling, and adaptive learning. Complexity was seen as a problem of automatic feature extraction via deep learning, but no additional data complexity or outlier analysis was considered, with the exception of sliding-window overlap sampling, which increases the number of samples, and normalization. No further details were given regarding dataset splitting and the evaluation methodology.

1.2.4. Research Gaps

Analysis using these precise criteria uncovered some significant research gaps in the literature on data-driven diagnosis and prognosis of aeronautical bearings, the most significant of which are given below:
  • Most of the discussed related works (i) limited their study to a specific set of subsets and working conditions (e.g., [13]), (ii) reduced the problem complexity by turning it into the healthy–non-healthy problem only without in-depth multi-class classification of the seven operating modes (e.g., [10]), or (iii) sub-sampled the dataset by increasing the sliding-window length (e.g., [12]). This considerably reduces the complexity of the problem in general and also reduces the effectiveness of the model in terms of generalization when it comes to real-world application;
  • Data complexity from the noise reduction perspective received interest from only a few works (i.e., [9,11]), while the outlier removal problem was not considered at all;
  • Solving the common problem of imbalanced classification by ensuring fair representation of all classes was not discussed in these cases, which leaves a significant source of model bias to be considered;
  • Most of the time, feature extraction was performed automatically via deep learning (i.e., [10,13,14]); in contrast, it is of great importance to explore the spectral nature of the recorded signal to obtain better representations (e.g., [9,11,12,15]);
  • Most of the time, evaluation procedures were performed by considering training and test sets that were randomly selected and split. This does not guarantee the learning model’s generalizability, nor does it prevent the risk of overfitting;
  • Only a few works considered the use of generative modeling and/or domain adaptation transfer learning to overcome the data unavailability issue [13,14]; however, it is necessary to target such a problem, especially when data are artificially generated (i.e., failure patterns forced to exist and not naturally occurring) and not collected from deterioration or a failure mechanism;
  • Data drift and its dynamics behind the considered system were only addressed in a few works (i.e., [11,14]);
  • None of these works took into account the second endurance subset, which is tightly linked to the prognosis and severity of bearing damage.
Overall, these shortcomings lead to important conclusions about the generalizability of the models discussed in this work and the need for ongoing improvements, including the need for our contributions in this work.

1.3. Contributions

Based on the above-mentioned gaps, our work makes the following contributions:
  • In this work, in an attempt to preserve the original complexity of the problem, a time window with a size of 100 samples is used for extracting time-domain and frequency-domain features, with an overlap of 20 samples to increase the number of samples and provide further insights into the correlations among time windows. Compared with previous works, the number of samples is massively increased, to about 45,732 samples. Unlike previous works, we ensure that the data scatters represent a very complex feature space that is difficult to separate, as in real-world applications, so that the addressed problems are realistic rather than simplified;
  • A robust data engineering scheme, including feature extraction, a list of denoising algorithms, a set of outlier removal methods, filtering, smoothing, scaling, and balancing, is introduced in this work to counter multiple types of data distortion and provide a better, more meaningful representation of the feature space while uncovering hidden patterns;
  • Data imbalance is taken into account by introducing the synthetic minority oversampling technique (SMOTE) to augment classes with low proportions and prevent loss of information;
  • This work takes advantage of 15 time- and frequency-domain features to improve the classification performance of the learning process by making the new feature space more robust to noise and providing an efficient, interpretable representation that is more flexible and adaptable to the system’s current conditions;
  • Compared with previous works, all of which used a random training/test data division process when evaluating the learning model, this work uses a three-fold cross-validation process, resulting in the analysis of approximately 195 confusion matrices for the diagnostic process and about 18 confusion matrices for the prognostic process. These numerous results are all analyzed to ensure the certainty of the performance evaluation procedure;
  • CSIDTL, a methodology for selecting and aggregating top learners to transfer additional information at different levels of complexity in different cycles, is proposed in this work. The goal is to achieve better performance using the pattern separation ability of the top learners in each round by taking advantage of the pretrained learning weights to initialize the learning model each time;
  • This work adopts adaptive learning rules of the LSTM network, which adapts to data shifting in time-series analysis better than convolutional networks and other deep learning models [6];
  • Unlike previous works, the proposed CSIDTL model is further investigated on another complex endurance test classification problem that includes data of higher complexity and cardinality, where many samples of different classes look similar to each other at certain points of the representation. This provides further insights into using the model for prognostics investigations, providing better information for predictive maintenance (i.e., maintenance planning).

1.4. Outline

To clearly illustrate these contributions, this article is divided into four sections. Besides the introduction in Section 1, Section 2 introduces the description of the dataset as well as its processing methodology. The description is oriented towards the information required for understanding and reproducing the experiments in this work. Likewise, some illustrative examples are used to explain the main purposes of the processing scheme and the advantages it brings to data representation. Section 3 introduces the proposed CSIDTL approach, its main concepts, and its learning rules; this section is also devoted to describing the application procedures and the main results with sufficient illustrations and discussion. Finally, Section 4 concludes this work and provides interesting hints on future opportunities.

2. Dataset Description and Processing

The test rig depicted in Figure 1 was utilized to produce the data used in this study. It consists of a high-speed spindle that rotates a shaft supported by three roller bearings, two accelerometers, and a load cell. The load cell is used to measure the axial force applied to the shaft, while the accelerometers are positioned at two separate points along the shaft, A1 and A2. Additionally, the test rig has a lubrication system distributing oil to the bearings throughout the test. The test rig was developed to monitor system vibration while different measurements are taken under varying operating conditions with variously damaged bearings. Specially designed for high-speed aerospace applications, the bearings designated B1, B2, and B3 have varying characteristics in terms of pitch diameter, roller diameter, contact angle, and number of rolling components. In fact, the introductory paper provides a comprehensive analysis of the behavior of different bearing types and sizes under various working conditions. However, it is worth mentioning that the authors did not study each bearing separately. Instead, these bearings were studied as a single system coupled to a single shaft. This is the reason why the authors carried out all their experiments on the B1 bearing (see the first paragraph of Section 3.1 in [7]). The Dynamic Identification and Research Group (DIRG) of Politecnico di Torino provided, from this test bed, a huge dataset with two main subsets (i.e., 46,592,000 samples × 6 channels and 54,067,200 samples × 6 channels, respectively) [7]. The first subset, “variable speed and load”, includes tests under different working conditions up to 30,000 rpm and tests with different types and levels of damage, and it is mainly dedicated to diagnosis studies. The second one is the “endurance” set, comprising experiments verifying the rate of damage severity under standard operating conditions of load and speed, and it is dedicated to prognosis studies. Accordingly, six channels with duration $T$ and sampling frequency $f_s$ were collected. Each time-domain acceleration record consists of $T \times f_s$ samples, while the values of $T$ and $f_s$ differ for the two investigations performed on the bearings. The data were collected using accelerometric acquisitions at variable rotational speed, radial load, and level of damage, as well as temperature measurements. Specific accelerometer sensitivity values were set in the OR38 signal analyzer, so the files contain acceleration time histories in m/s². The accuracy of the OR38 input channels is as follows: phase of ±0.02°, amplitude of ±0.02 dB, frequency of ±0.005%. The introductory paper [7] does not explicitly mention any potential sources of bias or limitations in the dataset. However, it is important to note that the data were collected under specific working conditions and may not be representative of all possible scenarios. Additionally, the analysis techniques used in the paper may have limitations and may not be applicable to all types of data. This work uses both subsets to test the capability of the proposed data engineering scheme associated with the CSIDTL methodology. Accordingly, this section is dedicated to describing the most important features of the dataset and its processing steps while passing through some important illustrative examples.

2.1. Variable Speed and Load

The variable-speed and -load experiment aimed to study the behavior of bearings with damage of different types and sizes under different operating conditions of rotational speed and load. The test was mainly carried out on the bearing in position B1 (see Figure 1a–c), which was designed to be easily removed from its support to allow for the artificial crafting of different types of defects, mounting, and the monitoring (i.e., taking photos) of bearings with damage of different types and sizes during testing. The test involved running the bearings at different speeds and under different loads to collect data on bearing behavior and performance using vibration sensors. Table 1 presents the different experimental conditions in this case, demonstrating that there are 17 operational conditions in this scenario; however, only 13 are available in the public dataset files. Unfortunately, there is no direct and explicit explanation of this matter in the dataset’s original introductory publication [7]. However, according to paragraph 2 on page 265 in [7], the differences between the 17 conditions in Table 1 and the 13 conditions publicly available in the dataset may be attributed to the fact that only a restricted number of samples for each condition were used in the study. The remaining samples might have been removed for a variety of reasons, including data quality concerns or a desire to reduce the computational cost of the study. It is assumed that the load voltage was actually measured during the experiment; its different values are provided in Appendix A, Table A1 in [7].
As a result, a list of several operating modes was investigated, including one healthy mode (0A) and six unhealthy/faulty ones (1A–6A). Using a Rockwell tool, localized faults were produced on the rolling components, resulting in a conical indentation on the inner ring or on a single roller. The damaged elements are reported in Table 2. The size of the resultant circular region is indicated by its approximate measured diameter (i.e., 150 µm, 250 µm, 450 µm). The entire process took around 30 min, and the greatest rotating speeds were not achieved under heavier load conditions due to the restricted power of the inverter.
Figure 2 is a further example showing the indentation of bearing 4A. This indentation is considered to be very useful, as monitoring its evolution makes it possible to understand fault characteristics (e.g., severity and location), assess their impact on bearing performance, guide design improvements, and evaluate corrective maintenance actions.

2.2. Endurance Test

The endurance test verifies the damage propagation rate under standard operating conditions (load and speed) and highlights the influence of the lubricating oil. The bearing was not brought to failure during the endurance test, and the physical variations of the indentation, both in shape and in extension, were quite limited. Identifying the evolution of damage given such small and limited variations is a very challenging task. The endurance test was carried out under the same conditions for each measurement, with a rated speed of 300 Hz and a load of 1800 N. The lubricating oil was changed before starting the endurance test, and the new oil had almost the same viscosity as the previous one; however, the lubricating oil was not specifically designed for high-speed applications. The standard oil was not used during the endurance test because it is neurotoxic, and changing it is a mandatory requirement for laboratory experiment safety. The experiment consisted of a long test lasting around 330 h on the 4A bearing at constant speed and load (this information is not revealed in the original article due to non-disclosure agreements). Three different groups of endurance data can be highlighted. These clusters correspond to the groups of acquisitions at 19–70 h (End1), 70–124 h (End2), and 124–223 h (End3). The clustering of the data is due to the mounting and dismounting of the bearing for the purpose of inspecting the damage and producing the images in Figure 3 at different life stages, while the pictures in sub-figures (c,d) were taken at 300 Hz and 1800 N.
To sum up, Figure 4 shows a summary of the data generation experiments that more clearly addresses the experimental objectives and circumstances. The numbering (1–3), in this case, indicates different steps performed in order, and the time and type of each experiment are indicated.

2.3. Data Processing

To provide a clean and meaningful feature space to feed into the proposed CSIDTL learning process, a set of well-defined steps is followed, namely, feature extraction, a list of denoising algorithms, a set of outlier removal methods, filtering, smoothing, scaling, and balancing. Figure 5 addresses this data processing methodology by introducing these steps in the form of six different layers. The order of the presented layers is very important for the process to run smoothly. It should be mentioned that this section and the following ones use and discuss many tools and metrics; describing their full mathematical background is not appropriate for this paper in terms of length. Instead, a set of well-known references from the literature is cited when describing these tools. This work concentrates solely on the mathematical background of the innovative new formulas that are the focus of this paper and presents them in a form that is easy to replicate in programming terms.
Finally, it should be noted that the tools described in the following section might be well known in the literature. However, the main contribution of this paper in terms of data processing lies in the use of these tools in the form of different layers (which will be explained in the next sub-sections) in specific order. In addition, the combination of these different tools is specifically designed to tackle problems presented by data. The order of these layers is defined based on the authors’ area of expertise in maintaining feature scales according to the algorithmic requirements of deep learning. The following sub-sections will also give examples of layer order and the repetition of certain layers to solve similar problems.

2.3.1. Scaling Layer

Measurements are filtered with moving-average, fixed-window, and third-order one-dimensional median filtering [16]. Median filtering helps to effectively remove impulsive noise from vibration signals without distorting the underlying data. It is particularly useful in applications where preserving sharp edges and details as signal descriptors is important to maintain the general shape of the signal. The scaling layer also involves another step of fixed-time-window averaging based on the time-domain smoothing of signal amplitudes [17]. This further helps reduce noise, smooth the signal, and improve overall data quality while achieving more accurate and reliable results. Afterwards, a final min–max normalization slice is included to scale each channel record into the range [0, 1], allowing for the better tuning of learning systems on channels with different scales [18]. It is important to use this layer in different places and not just at the beginning of the data processing phase. For example, the extraction layer and the denoising layer can change the feature scales, and it is better to rescale the features again after such steps.
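For illustration, the sketch below implements the three scaling-layer operations on a single channel in Python. It is a minimal sketch, assuming SciPy is available; the function name, kernel size, and smoothing-window length are our choices, not values reported by the authors.

```python
import numpy as np
from scipy.signal import medfilt

def scaling_layer(channel: np.ndarray, smooth_window: int = 100) -> np.ndarray:
    """Hypothetical scaling layer: median filter -> moving average -> min-max."""
    x = medfilt(channel, kernel_size=3)           # third-order median filter removes impulsive noise
    kernel = np.ones(smooth_window) / smooth_window
    x = np.convolve(x, kernel, mode="same")       # fixed-time-window amplitude smoothing
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min + 1e-12)  # min-max normalization to [0, 1]
```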

2.3.2. Feature Extraction Layer

In this particular case, a set of well-known features are selected to be extracted from both the time domain and the frequency domain of the vibration signals studied in this work. Based on prior research, these features have proven to be effective in diagnosis and prognosis studies, particularly for bearings. For instance, the work presented in [19,20,21] showcased the capability of these features to achieve higher prognosability; in other words, they allow for distinguishing between data clusters of healthy and non-healthy patterns. Time-domain features include the mean value, standard deviation (Std), skewness, kurtosis, peak to peak (peak2peak), root mean square (RMS), crest factor, form factor, impulse factor, margin factor, and energy. Additionally, frequency-domain features include the mean value of spectral kurtosis (SKMean), standard deviation of spectral kurtosis (SKStd), skewness of spectral kurtosis (SKSkewness), and kurtosis of spectral kurtosis (SKKurtosis). A detailed mathematical representation of these features can be found in Appendix A in [19]. It should be mentioned that a time window with a length of 100 samples is slid over the six channels to extract these features with an overlap of 20 samples. This helps reduce computational complexity by reducing 6 × 100 dimensional windows to 1 × 15 feature vectors while avoiding loss of information through the overlap and the use of the most important features.
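As a rough sketch of this layer, the Python function below slides the 100-sample window with a 20-sample overlap over one channel and computes most of the time-domain features named above; the four spectral-kurtosis features and the margin factor are omitted for brevity, and the implementation details are our assumptions, not the authors' code.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def window_features(signal: np.ndarray, win: int = 100, overlap: int = 20) -> np.ndarray:
    """Sliding-window time-domain feature extraction (subset of the 15 features)."""
    step = win - overlap                 # 80-sample stride gives a 20-sample overlap
    rows = []
    for start in range(0, len(signal) - win + 1, step):
        w = signal[start:start + win]
        rms = np.sqrt(np.mean(w ** 2))
        abs_mean = np.mean(np.abs(w))
        peak = np.max(np.abs(w))
        rows.append([
            np.mean(w), np.std(w), skew(w), kurtosis(w),
            np.ptp(w),            # peak to peak
            rms,                  # root mean square
            peak / rms,           # crest factor
            rms / abs_mean,       # form factor
            peak / abs_mean,      # impulse factor
            np.sum(w ** 2),       # energy
        ])
    return np.asarray(rows)
```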

2.3.3. Denoising Layer

The extracted features are passed through another slice of denoising. The denoising process, in this case, is based on wavelets with a Cauchy prior. Such an algorithm is very useful for denoising because it takes advantage of the inherent sparsity of wavelet coefficients and the robustness provided by the Cauchy prior [9]. The Cauchy prior assumes a heavy-tailed distribution for noise, which is often a more realistic model for real-world data. This helps to remove outliers or extreme noise values, thereby improving denoising performance. Additionally, by using wavelet coefficients, which decompose the signal into various frequency components, the method can effectively separate the noise from the signal in the wavelet domain and adaptively estimate the noise-free coefficients. This allows for the better preservation of important signal features while reducing noise, making it a powerful technique for denoising applications [9]. According to recent works on bearing fault diagnosis, such as [22,23], wavelet denoising can help improve fault signature extraction, increase classification accuracy, improve robustness to noise and outliers, enable multi-resolution signal analysis, improve feature extraction, facilitate the use of traditional machine learning and deep learning methods, and achieve a more interpretable signal representation.
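Since the Cauchy-prior estimator requires a bespoke shrinkage rule, the sketch below substitutes standard universal soft thresholding with PyWavelets to convey the same wavelet-domain denoising idea; it is an illustrative stand-in, not the authors' algorithm.

```python
import numpy as np
import pywt

def wavelet_denoise(x: np.ndarray, wavelet: str = "db4", level: int = 3) -> np.ndarray:
    """Wavelet denoising via soft thresholding (stand-in for the Cauchy prior)."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745      # noise estimate from finest details
    thr = sigma * np.sqrt(2 * np.log(len(x)))           # universal threshold
    shrunk = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(shrunk, wavelet)[: len(x)]
```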

2.3.4. Outlier Removal

Being the most important layer in the proposed data processing flowchart, the outlier removal layer is very effective in improving data quality, especially for aircraft engines, which are highly dynamic systems susceptible to outliers due to harsh operating conditions. In this case, an outlier detection list based on different types of distances and statistical tests (i.e., the Grubbs test and the Mahalanobis, Euclidean, and Minkowski distances) is used. Overall, these distances and statistical tests play a vital role in outlier detection by quantifying the dissimilarity or anomaly of data points compared with the majority of the dataset. They provide valuable insights into the presence of outliers, helping to identify potential errors, anomalies, or interesting patterns in the data [24]. A mixture of these tools is used to strengthen the outlier detection process and overcome the shortcomings of each, helping to eliminate or reduce outliers as much as possible. It should be mentioned that, in such a situation, each class of data must be processed separately by the algorithm, because processing all the data at once would lead to a massive loss of samples, as samples of one class can appear as outliers relative to another. Each class is assumed to follow a different distribution, and samples within the same class are more similar to each other.
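A minimal sketch of one class-wise pass of this layer is given below, using only the Mahalanobis distance with a chi-square cutoff; in the paper, the Grubbs test and the Euclidean/Minkowski distances would be combined in the same per-class fashion, and the cutoff quantile here is our assumption.

```python
import numpy as np
from scipy.stats import chi2

def remove_outliers_per_class(X: np.ndarray, y: np.ndarray, quantile: float = 0.975):
    """Class-wise Mahalanobis-distance outlier removal (one tool of the mixture)."""
    keep = np.zeros(len(X), dtype=bool)
    cutoff = chi2.ppf(quantile, df=X.shape[1])          # squared-distance threshold
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        Xc = X[idx]
        mu = Xc.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(Xc, rowvar=False))
        d2 = np.einsum("ij,jk,ik->i", Xc - mu, cov_inv, Xc - mu)  # squared Mahalanobis
        keep[idx] = d2 < cutoff                          # keep in-distribution samples
    return X[keep], y[keep]
```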

2.3.5. Data Balancing Layer

This work follows the synthetic minority oversampling technique (SMOTE) [25] with k-nearest neighbors, a powerful method used in machine learning and data analysis to solve class imbalance problems. The technique offers several advantages. First, it helps solve the problem of data scarcity, or the lack of representative samples for the minority class, by generating synthetic samples that closely resemble real instances. This improves the model’s ability to learn patterns and make accurate predictions for the minority class. Second, SMOTE reduces the bias towards the majority class, ensuring a fair representation of all classes in the dataset. This leads to better model performance and avoids the risk of misclassification or underestimation of the minority class. Overall, by effectively improving the training data, SMOTE significantly improves the performance and reliability of machine learning models.
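A minimal usage sketch with the imbalanced-learn implementation of SMOTE is shown below; the toy data and parameter values are illustrative, standing in for the extracted 15-dimensional feature matrix and its class labels.

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

# Toy imbalanced data standing in for the 15 extracted features and class labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 15))
y = np.array([0] * 800 + [1] * 150 + [2] * 50)

smote = SMOTE(k_neighbors=5, random_state=0)   # k-NN-based synthetic oversampling
X_bal, y_bal = smote.fit_resample(X, y)
print(Counter(y), "->", Counter(y_bal))        # minority classes raised to majority size
```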

2.4. Illustrative Example of Processed Data

The variable-speed and -load test includes a list of 13 operating conditions, and each condition has seven operating-mode scatters (i.e., healthy (0A) and unhealthy (1A–6A)). On the other hand, the endurance test has operating conditions of speed and load that have not been disclosed due to non-disclosure agreements. It includes three stages of bearing damage severity reflecting the damage propagation process. Accordingly, this sub-section presents examples of data scatters in both cases while comparing prepared data and raw data. Comparing scatter plots of raw and prepared data helps to assess the impact of the data preprocessing techniques, allowing for an examination of how data manipulation has influenced the distribution and pattern of the data points.

2.4.1. Variable-Speed and -Load Set

Regarding the variable-speed and -load test, since data scatters are very numerous, we decided to take some interesting examples of load and speed to be able to observe their effect on data complexity. Accordingly, four conditions, namely, 1, 4, 8, 13, with nominal speed (Hz) and voltage load (mV) of {(100, 0), (100, 900), (300, 500), (500, 500)} corresponding to the qualitative variables of {(minimum speed, minimum load), (minimum speed, maximum load), (average speed, maximum load), (maximum speed, maximum load)} are selected. The reason for selecting only these conditions is that the dataset is massive and contains a wide range of subsets and classes, which cannot be illustrated at once. Therefore, these examples are considered very adequate to obtain a general conclusion on the whole dataset, since they contain information on almost all possible cases of variations in load and working conditions. In our example, we provide rounded values for load voltage with sensitivity of 0.499 mV/N (see first paragraph, page 6 in [7]).
In this context, t-Distributed Stochastic Neighbor Embedding (t-SNE) is used to map the high-dimensional data to a low-dimensional space while preserving similarities among data points, with emphasis on their relative distances. As a result, the data scatters in Figure 6 are obtained to visualize and explore our dataset. Figure 6a refers to healthy operating modes, low distortions, and less harsh working conditions: the speed is minimal, the load is null, and the bearings are completely healthy. In this case, it can be observed that samples of different data classes form pattern agglomerations of different operating modes (e.g., class 4 and class 2) that can be easily distinguished even with a simple linear model. On the other hand, comparing these representations with each other or with other classes clearly reveals the complexity of the data, reflected in higher cardinality. The prepared version in Figure 6e increases this classification capability by showing an additional number of distinguishable classes (e.g., 0, 4, 2, 6). This clearly illustrates the efficiency of data processing, especially denoising and outlier removal. It should be mentioned that these details only appear in two-dimensional t-SNE distributions; the performance of the proposed data engineering would be even more apparent in three-dimensional data visualization. The same explanation can be projected onto Figure 6b–d when comparing them with their prepared versions in Figure 6f–h. This clearly shows that data engineering helps to reveal important patterns in the data that are hidden in the original feature spaces. Finally, further mapping processes using CSIDTL can improve the representations of the feature spaces.
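The visualization step can be reproduced with scikit-learn's t-SNE as sketched below; the toy feature matrix and labels stand in for the prepared 15-dimensional features and the seven operating modes, and the perplexity value is our assumption.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Toy stand-in for the prepared feature matrix and operating-mode labels (0A-6A).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 15))
y = rng.integers(0, 7, size=500)

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
plt.scatter(emb[:, 0], emb[:, 1], c=y, cmap="tab10", s=5)
plt.colorbar(label="operating mode")
plt.show()
```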
So far, the presentation and argumentation have been based on comparing dataset scatters of raw data and prepared data. However, to reach a clear conclusion about the data characteristics beneficial to the next stages of adaptive deep learning, another important observation can be made: in Figure 6e–h, the harsher the conditions (i.e., the greater the speed and load), the more numerous the resulting distortions. This further complicates the process of understanding and distinguishing different kinds of patterns, increasing the level of cardinality, which in turn increases the complexity of the deep learning process.

2.4.2. Endurance Set

Concerning the endurance test, the experiments show a slow progression of the long-term damage of bearing 4A. This means that the data are highly correlated: the propagation of damage is regressive and slow. Therefore, as the experiment is related to damage severity, the correlation among samples is higher, and it is more difficult to highlight the different patterns of each health stage separately. In this case, we modify the outlier removal layer using a loop, repeating the process 20 times while also increasing the span of the time window to 50 samples. In other words, whatever the complexity of the diagnosis problem in the first experiment of variable speed and load, the endurance test presents a set of even more complex data scatters. Figure 7a represents the original feature space, reflecting a mixture of data patterns and a higher level of complexity. Compared with it, the prepared version shows data representations that appear as agglomerations of samples holding similar patterns. This is actually a very positive indication that the data engineering process helps to separate data classes and uncover hidden patterns. An additional comparison between the data of the endurance test and the variable-speed and -load test can be made in this case: if we compare the raw and prepared data of the first experiment with those of the second one, it is clearly confirmed that the endurance data are more complex, as previously indicated. In other words, the model reconstruction process requires further abstractions in this case than in the variable-speed and -load experiment. The agglomeration of the samples shown in Figure 7b is caused by continuous smoothing and filtering as a result of repeating the scaling layer several times, thus allowing similar patterns to be grouped together and thereby improving the quality of the data.

3. Methods, Application, and Result Discussion

This section presents the CSIDTL method, its main learning rules, its application, and the result discussion, focusing only on the new mathematical representations; well-known mathematics available in the literature is not described in this section but is referenced with straight-to-the-point citations.

3.1. Proposed Approach

Figure 8 is a simplified schematic diagram of the proposed CSIDTL. The CSIDTL algorithmic architecture can be simplified in three particular steps.
Step 1: A set of homogeneous learning models, namely, LSTM networks (refer to [26] for details on the mathematical background of LSTM networks), are trained in a loop on the different subsets (13 subsets related to operating conditions) of the variable-speed and -load dataset. This means that each LSTM network is trained for the classification of 7 different operating modes. In the first round, $k = 0$, the LSTM networks' learning parameters are randomly initialized from a specific probability distribution $P$ with mean $\mu$ and standard deviation $\delta$ as in (1).
$\{w_k^i, w_k^r, b_k\} = P(\mu, \delta)$ (1)
In this case, the 3-fold cross-validation technique is used to train each network per condition in different rounds $k$. Consequently, there is a total of 13 × 3 trained LSTM networks in each round.
Step 2: Among these 13 × 3 models, only a few are selected to accomplish the training of the LSTM networks in upcoming rounds. The selection process involves specific criteria (i.e., testing accuracy, in our case) to perform the following CSIDTL rounds. These models are then stored, and their parameters are aggregated and transferred to initialize all deep networks in the following rounds, and so on. It should be mentioned that the number of stored learning models is incremented; this means that in each new round, both the previously selected models and the newest ones are used for the aggregation and initialization of the learning parameters using transfer learning.
Step 3: Weight initialization involves the aggregation of the weights collected under different conditions. This can be referred to as collaborative training, where the global learning parameters combine information across all conditions. The transfer learning concept, in this case, appears in both aggregation and weight initialization, meaning collaboration across conditions and the fine tuning of the LSTM network for upcoming rounds. So, if we consider the global loss in a specific round $k$, $l_G(k)$, as our main objective function to be minimized, it can be presented as the minimization of the loss in both the source domain and the target domain as in (2).
$l_G(k) = l(k-1) + l(k)$ (2)
In our case of the LSTM network, from the set of input weights $w_{(c,f,k)}^i$, recurrent weights $w_{(c,f,k)}^r$, and biases $b_{(c,f,k)}$, where $k = 1:m$ is the number of rounds, we only select a few models and reduce them to a new set of parameters $\{w_s^i, w_s^r, b_s\}$, where $s = 1:n$ is the number of selected models, incremented in each round. Here, $(c, f)$ refer to the condition label and the cross-validation fold index, respectively. Afterwards, these collected parameters are aggregated using an averaging method, as illustrated in Equations (3) and (4), and used as the initial parameters $w_0^i, w_0^r, b_0$ of all the models in all following rounds. $s$ refers to the selected model index; $n$ is the total number of selected models per round; $c$ is the number of conditions; $f$ is the number of folds; $k$ is the index of a round; and $m$ is the maximum number of rounds. Subscript 0 refers to the initial parameters.
$\{w_s^i, w_s^r, b_s\}_{s=1}^{n} \subset \{w_{c,f,k}^i, w_{c,f,k}^r, b_{c,f,k}\}_{k=1}^{m} \;\big|\; \mathrm{Accuracy} \ge 90\%$ (3)
$\{w_0^i, w_0^r, b_0\} = \frac{1}{n}\sum_{s=1}^{n}\{w_s^i, w_s^r, b_s\}$ (4)
$n = n_k + n_{k-1} \;\big|\; n_0 = 0$ (5)
If we consider this approach as an algorithm running on a single microprocessor in a repetitive loop over different working conditions, the pseudo-code of CSIDTL can be presented as in Algorithm 1.
Algorithm 1: Simplified CSIDTL pseudo-code.
Inputs:
  Inputs $x$ and labels $y$;
  Number of conditions $c$;
  Number of cross-validation folds $f$;
  Maximum number of rounds $m$;
Outputs:
  Best initialization parameters $\{w_0^i, w_0^r, b_0\}$
% Initialize the learning model randomly
$\{w_0^i, w_0^r, b_0\} = P(\mu, \delta)$;
For $k = 1:m$
  For $c = 1:13$
    For $f = 1:3$
      % Train the model from $\{w_0^i, w_0^r, b_0\}$ and extract its parameters,
      % s.t. $l_G(k) = l(k-1) + l(k)$  % involvement of transfer learning
    End
  End
  % Select the $n$ best learners' parameters according to the specified criterion
  $\{w_s^i, w_s^r, b_s\}_{s=1}^{n} \subset \{w_{c,f,k}^i, w_{c,f,k}^r, b_{c,f,k}\}_{k=1}^{m} \;|\; \mathrm{Accuracy} \ge 90\%$;
  % Aggregate and initialize the learning parameters for the upcoming round
  $\{w_0^i, w_0^r, b_0\} = \frac{1}{n}\sum_{s=1}^{n}\{w_s^i, w_s^r, b_s\}$;
  $n = n_k + n_{k-1} \;|\; n_0 = 0$;
End
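To make the control flow of Algorithm 1 concrete, the Python sketch below mirrors the round/condition/fold loops with selection and averaging-based aggregation; `train_fn` is an assumed callable (train one LSTM on a given condition subset and fold from the supplied initial weights and return its weights and test accuracy), so this is a structural sketch rather than the authors' implementation.

```python
import numpy as np

def csidtl(train_fn, subsets, n_rounds=5, acc_threshold=0.90):
    """Structural sketch of CSIDTL: select top learners per round, average their
    weights, and reuse the average to initialize all models in the next round."""
    init = None          # None -> random initialization P(mu, delta) inside train_fn
    selected = []        # incrementally grown pool of selected parameter sets
    for k in range(n_rounds):
        for subset in subsets:               # 13 operating-condition subsets
            for fold in range(3):            # 3-fold cross-validation
                weights, acc = train_fn(subset, fold, init)
                if acc >= acc_threshold:     # selection criterion (Accuracy >= 90%)
                    selected.append(weights)
        if selected:                         # aggregate old + new selections
            init = [np.mean([w[i] for w in selected], axis=0)
                    for i in range(len(selected[0]))]
    return init                              # best initialization parameters
```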

3.2. Application and Result Discussion

In this work, we launched two experiments on the two subsets of the dataset using a laptop with a four-core i7 microprocessor, 16 GB of RAM, and 12 MB of cache memory. In the first experiment on the “variable speed and load” set, the CSIDTL algorithm was run for five rounds, taking 5.9170 h. Since the process is very slow, there was little opportunity to involve random search or grid search mechanisms for hyperparameter tuning; therefore, the LSTM network parameters were tuned manually on a trial-and-error basis. Three-fold cross-validation was used to study the performance of the proposed approach, while well-known metrics, namely, Accuracy, Recall, Precision, and F1 score, were used to evaluate the training process. For the mathematical background and significance of these metrics, please refer to [27]. During the evaluation, we focused on collecting testing performance, as it is very important to assess both the approximation and generalization capabilities of learning models. The selection of the best learners was constrained by Accuracy ≥ 90%. All trained models of the three folds were involved in the transfer learning-based selection and aggregation process, while the resulting illustration of each model per condition is the average of the obtained values. Accordingly, a single-layer LSTM network with 40 neurons, an $l_2$ regularization parameter of 1 × 10−4, an initial learning rate of 1 × 10−2, a minibatch size of 200 samples, and a maximum of 600 epochs was adopted. After the training process, the best results were collected and labeled as best-round training. It should be mentioned that the results in this case were numerous, as a matrix of 4 × 13 elements was obtained per round, and about 13 × 3 × 5 confusion matrices were analyzed. Therefore, an average over the entire matrix was used to better illustrate and compare the global results, as it is difficult to showcase such numerous results individually.
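A sketch of the classifier with these hyperparameters, written with Keras, is shown below; the sequence length, the optimizer choice (Adam), and the loss are our assumptions, as the paper does not specify them.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

timesteps = 10  # assumed number of feature windows per input sequence
model = keras.Sequential([
    layers.Input(shape=(timesteps, 15)),               # 15 extracted features per step
    layers.LSTM(40, kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(7, activation="softmax"),             # 7 operating modes
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-2),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, batch_size=200, epochs=600)  # per-fold training
```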

3.2.1. Variable-Speed and -Load Experiment

Accordingly, Figure 9 showcases the obtained global results of the CSIDTL approach compared with the LSTM network, while Table 3 indicates in which rounds the best learners were obtained under each condition. In this case, the classification performance metrics were improved by about 12.15% compared with the traditional LSTM, meaning an accuracy of about 93.63% for the entire variable-speed and -load dataset. This is considered an excellent improvement in the field. More specifically, the LSTM network is a robust generalizer only when data are recorded under less harsh conditions, as seen for conditions 3, 6, and 8 in Figure 9a. In contrast, CSIDTL performed well on the entire subset. This means that the adopted approach of selecting and aggregating the best learners and transferring them across conditions helps to overcome the lack of representative data, rather than relying only on the drift-adaptation mechanism of the LSTM.
Table 3 introduces the best training models per round under each specific condition. It is clear that the load had a greater impact on the training process than the rotational speed. This is why the harsh conditions highlighted in green show that the model needed further and deeper representations (i.e., CSIDTL) than ordinary deep learning (i.e., LSTM). The models also expressed their need for additional information across working conditions, consuming more learning time to reach a better data representation. This fully confirms our initial perception: “learning under data unavailability, complexity, and drift requires generative modeling, adaptive learning, and deep learning all together in a global learning process”.

3.2.2. Endurance Experiment

For the endurance experiment, the best learning parameters obtained (Equations (1) and (2)) from the fifth round of the first experiment were transferred as they were to the learning process dedicated to prognosis. In this case, a single subset with three classes related to different damage severity levels of bearing health conditions was treated. The same CSIDTL learning options were kept; the only change was the number of training rounds. The models achieved great results in round 1 in this case (see Table 4), meaning round 7 if we count the rounds of the first experiment that resulted in the weight initialization of the second experiment. Similarly, there were 3 × 3 confusion matrices, which are too numerous to illustrate here; instead, the averaged results are showcased in Figure 10. Accordingly, after the learning process, which took about 0.1754 h, the learning performance was improved by about 10.87%, reaching about 95.65% accuracy, which is considered a great achievement for a very complex dataset with a higher level of complexity and drift.

3.3. General Discussion

In general terms, the two experiments show the following:
  • The need to transfer learning across conditions to improve classification performance in terms of the generalization of the learning model (Figure 9 and Table 3);
  • CSIDTL better expresses the relationship between the number of training rounds required and data complexity when overcoming data drift problems, as highlighted by the larger number of rounds needed for the more complex conditions (Table 3);
  • The training process showcases the effectiveness of the LSTM's adaptive learning, especially on less complex datasets, which is why it needs to be boosted by transfer learning and collaborative aggregation;
  • The best results on the endurance set were obtained quickly compared with the variable-speed and -load experiment, owing to directly transferring the learning parameters of the fifth round (which carry sufficient additional information from across conditions) to the first learning round;
  • The performance improvement rates in both diagnosis and prognosis also reflect the importance of the procedure in the training process.
Overall, the proposed data engineering scheme helps, as the primary step of data complexity reduction, to extract meaningful features and patterns, improving accuracy and reliability in bearing fault diagnosis and prognosis. Techniques such as time-domain analysis, frequency-domain analysis, wavelet analysis, and statistical feature extraction help to distinguish among the patterns that different fault types leave in sensor data. The transfer learning and aggregation process leverages additional knowledge from diverse models trained under diverse conditions in other domains, which improves the accuracy and effectiveness of fault diagnosis and prediction algorithms. The adaptive deep learning capability of the LSTM effectively follows data shifts, keeping the model up to date and focused on new data. Thus, depending on the performance of the entire proposed data engineering methodology and CSIDTL, this approach could potentially be applied to fault diagnosis, prognosis, and remaining-useful-life assessment, as well as data anomaly detection and mitigation.
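To make the statistical feature extraction step more concrete, the sketch below computes a handful of common time-domain features from one window of vibration samples. The particular feature set shown (RMS, peak-to-peak, kurtosis, skewness, crest factor) is illustrative of this family of techniques and is not necessarily the exact list used in this work.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def time_domain_features(window):
    """Statistical features of a single vibration window (1-D array)."""
    window = np.asarray(window, dtype=float)
    rms = np.sqrt(np.mean(window ** 2))
    return np.array([
        rms,                           # root mean square (energy level)
        np.ptp(window),                # peak-to-peak amplitude
        kurtosis(window),              # impulsiveness indicator
        skew(window),                  # asymmetry of the distribution
        np.max(np.abs(window)) / rms,  # crest factor
    ])
```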
From the perspective of the aerospace industry, where reliability and safety are paramount, coupling this data engineering scheme with CSIDTL could significantly improve maintenance practices. It could help identify potential bearing failures or anomalies, enabling timely maintenance and reducing the risk of critical malfunctions. Additionally, the real-time monitoring provided by CSIDTL-based fault diagnosis/prognosis could help optimize operational efficiency by minimizing downtime and increasing productivity. This methodology could also be extended beyond aerospace to other areas, such as automotive or industrial machinery, where bearing efficiency and performance are vital. As a result, the practical application of such a methodology to real-world fault diagnosis/prognosis has great potential for improving asset management and safety in a variety of industries.

3.4. Comparison with Previous State-of-the-Art Works

As mentioned earlier in the section on research gaps, previous state-of-the-art works transformed the dataset into less complex feature spaces by sub-sampling, by selecting a few subsets to illustrate the performance of their models, or by merging different sub-classes of unhealthy modes to reduce the number of classes, thereby reducing the complexity of the problem to be solved. Most of these experiments relied on a simple random split of the dataset, and prognosis was not the main goal of their studies. As a result, these works reported high accuracy in their settings; as far as generalization is concerned, however, their models fall short compared with this study. In fact, this study is conducted on a far larger feature space, with multi-class classification problems (i.e., seven operating modes) and different case-study scenarios (diagnosis and prognosis). Additionally, this paper treats the problems of data unavailability, drift, and complexity, rather than complexity alone as in previous works. Furthermore, it introduces a well-structured data engineering scheme, justified by the data challenges, that yields an outstanding data representation as the primary step in targeting data complexity. The proposed work delves deeper into the details than previous works, yielding outstanding results despite the increased complexity. For these reasons, a direct numerical comparison between the results obtained here and those of previous works cannot be performed and would not be fair, as our work and previous works follow different approaches and reach different objectives and conclusions.

4. Conclusions

This work supports the hypothesis that diagnosis and prognosis using data-driven methods in highly dynamic systems, in particular safety-critical systems, must take into account three main challenges: data drift, complexity, and unavailability. This work, therefore, addresses this hypothesis by introducing a learning model called CSIDTL that involves adaptive learning, deep learning, and transfer learning to address these challenges, respectively. The model combines incremental selective learning and collaborative transfer learning across successive learning rounds. Compared with traditional deep learning models such as LSTM and, more generally, with previous works that mainly tackle data complexity, the model achieves better results, whose stability is demonstrated using cross-validation. This largely confirms the need for such a mix of learning tools and domains to perform such a complex process. In terms of future prospects, regarding feature engineering, further algorithms and tools for detecting and removing outliers and noise should be investigated to enhance data quality, and tools for quantitatively assessing such quality, rather than relying on data visualization, would be a great advantage. From the perspective of learning systems, this work uses a single-layer LSTM model to realize the CSIDTL process; therefore, introducing deeper architectures and adaptive learning variants, such as bidirectional LSTMs and gated recurrent units, together with a different learning philosophy such as ensemble learning, in addition to further generative modeling tools, could enable the three data-related challenges to be tackled at a higher level.

Author Contributions

Conceptualization, T.B.; methodology, T.B. and M.B.; validation, T.B. and M.B.; formal analysis, T.B. and M.B.; investigation, T.B.; resources, T.B.; data curation, T.B. and M.B.; writing—original draft preparation, T.B.; writing—review and editing, T.B. and M.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All the materials required to reproduce the findings of this study are available at https://zenodo.org/record/8385839 (accessed on 27 August 2023).

Acknowledgments

The authors would like to thank the Dynamic Research and Identification Group (DIRG) of Politecnico di Torino and the authors of the introductory article (Daga, A.P.; Fasana, A.; Marchesiello, S.; Garibaldi, L.) for making their dataset publicly available, which proved very important and useful for conducting the proposed CSIDTL experiments.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Berghout, T.; Mouss, M.-D.; Mouss, L.; Benbouzid, M. ProgNet: A Transferable Deep Network for Aircraft Engine Damage Propagation Prognosis under Real Flight Conditions. Aerospace 2022, 10, 10. [Google Scholar] [CrossRef]
  2. Rejith, R.; Kesavan, D.; Chakravarthy, P.; Narayana Murty, S.V.S. Bearings for Aerospace Applications. Tribol. Int. 2023, 181, 108312. [Google Scholar] [CrossRef]
  3. Wei, Z.; Zhang, S.; Jafari, S.; Nikolaidis, T. Gas Turbine Aero-Engines Real Time on-board Modelling: A Review, Research Challenges, and Exploring the Future. Prog. Aerosp. Sci. 2020, 121, 100693. [Google Scholar] [CrossRef]
  4. Saxena, A.; Goebel, K.; Simon, D.; Eklund, N. Damage Propagation Modeling for Aircraft Engine Run-to-Failure Simulation. In Proceedings of the International Conference on Prognostics and Health Management, Denver, CO, USA, 6–9 October 2008; pp. 1–9. [Google Scholar]
  5. Arias Chao, M.; Kulkarni, C.; Goebel, K.; Fink, O. Aircraft Engine Run-to-Failure Dataset under Real Flight Conditions for Prognostics and Diagnostics. Data 2021, 6, 5. [Google Scholar] [CrossRef]
  6. Berghout, T.; Benbouzid, M. A Systematic Guide for Predicting Remaining Useful Life with Machine Learning. Electronics 2022, 11, 1125. [Google Scholar] [CrossRef]
  7. Daga, A.P.; Fasana, A.; Marchesiello, S.; Garibaldi, L. The Politecnico Di Torino Rolling Bearing Test Rig: Description and Analysis of Open Access Data. Mech. Syst. Signal Process. 2019, 120, 252–273. [Google Scholar] [CrossRef]
  8. Berghout, T.; Mouss, L.H.; Kadri, O.; Saïdi, L.; Benbouzid, M. Aircraft Engines Remaining Useful Life Prediction with an Adaptive Denoising Online Sequential Extreme Learning Machine. Eng. Appl. Artif. Intell. 2020, 96, 103936. [Google Scholar] [CrossRef]
  9. Tan, H.; Xie, S.; Zhou, H.; Ma, W.; Yang, C.; Zhang, J. Sensible Multiscale Symbol Dynamic Entropy for Fault Diagnosis of Bearing. Int. J. Mech. Sci. 2023, 256, 108509. [Google Scholar] [CrossRef]
  10. Yan, S.; Shao, H.; Min, Z.; Peng, J.; Cai, B.; Liu, B. FGDAE: A New Machinery Anomaly Detection Method towards Complex Operating Conditions. Reliab. Eng. Syst. Saf. 2023, 236, 109319. [Google Scholar] [CrossRef]
  11. Wang, H.; Liu, Z.; Peng, D.; Zuo, M.J. Interpretable Convolutional Neural Network with Multilayer Wavelet for Noise-Robust Machinery Fault Diagnosis. Mech. Syst. Signal Process. 2023, 195, 110314. [Google Scholar] [CrossRef]
  12. Zheng, J.; Ying, W.; Tong, J.; Li, Y. Multiscale Three-Dimensional Holo–Hilbert Spectral Entropy: A Novel Complexity-Based Early Fault Feature Representation Method for Rotating Machinery. Nonlinear Dyn. 2023, 111, 10309–10330. [Google Scholar] [CrossRef]
  13. Zhao, X.; Zhu, X.; Yao, J.; Deng, W.; Cao, Y.; Ding, P.; Jia, M.; Shao, H. Intelligent Health Assessment of Aviation Bearing Based on Deep Transfer Graph Convolutional Networks under Large Speed Fluctuations. Sensors 2023, 23, 4379. [Google Scholar] [CrossRef]
  14. Wang, X.; Jiang, H.; Wu, Z.; Yang, Q. Adaptive Variational Autoencoding Generative Adversarial Networks for Rolling Bearing Fault Diagnosis. Adv. Eng. Inform. 2023, 56, 102027. [Google Scholar] [CrossRef]
  15. Thelaidjia, T.; Chetih, N.; Moussaoui, A.; Chenikher, S. Successive Variational Mode Decomposition and Blind Source Separation Based on Salp Swarm Optimization for Bearing Fault Diagnosis. Int. J. Adv. Manuf. Technol. 2023, 125, 5541–5556. [Google Scholar] [CrossRef]
  16. Ohki, M.; Zervakis, M.E.; Venetsanopoulos, A.N. 3-D Digital Filters. Control Dyn. Syst. 1995, 69, 49–88. [Google Scholar]
  17. Smith, S.W. Moving Average Filters. In Digital Signal Processing; Elsevier: Amsterdam, The Netherlands, 2003; pp. 277–284. [Google Scholar]
  18. Han, J.; Kamber, M.; Pei, J. Data Preprocessing. In Data Mining; Elsevier: Amsterdam, The Netherlands, 2012; pp. 83–124. [Google Scholar]
  19. Qiu, G.; Gu, Y.; Chen, J. Selective Health Indicator for Bearings Ensemble Remaining Useful Life Prediction with Genetic Algorithm and Weibull Proportional Hazards Model. Meas. J. Int. Meas. Confed. 2020, 150, 107097. [Google Scholar] [CrossRef]
  20. Zhang, J.; Xu, B.; Wang, Z.; Zhang, J. An FSK-MBCNN Based Method for Compound Fault Diagnosis in Wind Turbine Gearboxes. Meas. J. Int. Meas. Confed. 2021, 172, 108933. [Google Scholar] [CrossRef]
  21. Schneider, T.; Helwig, N.; Schutze, A. Automatic Feature Extraction and Selection for Condition Monitoring and Related Datasets. In Proceedings of the IEEE International Instrumentation and Measurement Technology Conference (I2MTC), Houston, TX, USA, 14–17 May 2018; pp. 1–6. [Google Scholar] [CrossRef]
  22. Fu, S.; Wu, Y.; Wang, R.; Mao, M. A Bearing Fault Diagnosis Method Based on Wavelet Denoising and Machine Learning. Appl. Sci. 2023, 13, 5936. [Google Scholar] [CrossRef]
  23. Yan, R.; Shang, Z.; Xu, H.; Wen, J.; Zhao, Z.; Chen, X.; Gao, R.X. Wavelet Transform for Rotary Machine Fault Diagnosis: 10 Years Revisited. Mech. Syst. Signal Process. 2023, 200, 110545. [Google Scholar] [CrossRef]
  24. Smiti, A. A Critical Overview of Outlier Detection Methods. Comput. Sci. Rev. 2020, 38, 100306. [Google Scholar] [CrossRef]
  25. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  26. Yu, Y.; Si, X.; Hu, C.; Zhang, J. A Review of Recurrent Neural Networks: LSTM Cells and Network Architectures. Neural Comput. 2019, 31, 1235–1270. [Google Scholar] [CrossRef] [PubMed]
  27. Tharwat, A. Classification Assessment Methods. Appl. Comput. Inform. 2021, 17, 168–192. [Google Scholar] [CrossRef]
Figure 1. Overview of the test rig and its main components: (a) the test rig; (b) accelerometers’ and reference system’s positions; (c) roller bearing shaft. Reproduced from [7], Elsevier, 2019.
Figure 2. Example of initial indentation of bearing 4A. Reproduced from [7], Elsevier, 2019.
Figure 3. Illustrative damage propagation examples in the endurance test on bearing 4A: (a) fault patterns after 19 h; (b) fault patterns after 70 h; (c) fault patterns after 124 h; (d) fault patterns after 232 h. Reproduced from [7], Elsevier, 2019.
Figure 4. Dataset generation goals and scenarios.
Figure 5. Illustration of the data processing methodology.
Figure 6. Comparison between raw and processed data scatters of the variable-speed and -load experiments: (a–d) raw data scatters of conditions 1, 4, 8, and 13, respectively; (e–h) prepared data scatters of conditions 1, 4, 8, and 13, respectively.
Figure 7. Comparison between raw and processed data scatters of the endurance experiments: (a) raw data scatters; (b) prepared data scatters.
Figure 8. Schematic diagram of the proposed CSIDTL approach: (a) training a set of learning models separately for each operating condition; (b) using the cross-validation technique to evaluate model performance; (c) selecting and storing the learning parameters of the best learners based on accuracy; (d) unleashing the aggregation process only for the selected parameters and determining the initial parameters of the following rounds.
Figure 9. Comparison of the classification performance of the studied models: (a) LSTM network; (b) LSTM network with the CSIDTL learning methodology.
Figure 10. Results obtained on the endurance test dataset for each round.
Table 1. Different operating conditions of the tested bearings in the variable-speed and -load experiment [7].

Load (N)    Speed (Hz)
0           100, 200, 300, 400, 500
1000        100, 200, 300, 400, 500
1400        100, 200, 300, 400
1800        100, 200, 300
Table 2. Different types of crafted defects of tested bearings [7].

Name    Defect                                          Size (µm)
0A      No defect                                       –
1A      Diameter of an indentation on the inner ring    450
2A      Diameter of an indentation on the inner ring    250
3A      Diameter of an indentation on the inner ring    150
4A      Diameter of an indentation on a roller          450
5A      Diameter of an indentation on a roller          250
6A      Diameter of an indentation on a roller          150
Table 3. Best learners per round.

Condition    Speed (Hz)    Load voltage (mV)    Best round    Elapsed time (h)
1            100           0                    1             0.9237
2            100           500                  1             0.9237
3            100           700                  0 *           1.1060
4            100           900                  5             0.7888
5            200           500                  3             1.1097
6            200           700                  3             1.1097
7            200           900                  0 *           1.1060
8            300           500                  0 *           1.1060
9            300           700                  1             0.9237
10           300           900                  4             1.0546
11           400           500                  3             1.1097
12           400           700                  4             1.0546
13           500           500                  4             1.0546
* Round 0, in this case, means that the LSTM model without CSIDTL recurrence was used.
Table 4. Best learners per round (endurance experiment).

Condition    Speed (Hz)    Load voltage (mV)    Best round    Elapsed time (h)
1            –             –                    1             0.1754
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
