Article

A Disentangled VAE-BiLSTM Model for Heart Rate Anomaly Detection

by Alessio Staffini 1,2,3, Thomas Svensson 1,4,5,*, Ung-il Chung 1,4,6 and Akiko Kishi Svensson 1,5,7

1 Precision Health, Department of Bioengineering, Graduate School of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8655, Japan
2 Advanced Technology Department, ALBERT Inc., Shinjuku Front Tower 15F, 2-21-1, Kita-Shinjuku, Shinjuku-ku, Tokyo 169-0074, Japan
3 Department of Economics and Finance, Catholic University of Milan, Largo Gemelli 1, 20123 Milan, Italy
4 Graduate School of Health Innovation, Kanagawa University of Human Services, Research Gate Building Tonomachi 2-A 2, 3F, 3-25-10 Tonomachi, Kawasaki-ku, Kawasaki-shi 210-0821, Japan
5 Department of Clinical Sciences, Skåne University Hospital, Lund University, 205 02 Malmö, Sweden
6 Clinical Biotechnology, Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8655, Japan
7 Department of Diabetes and Metabolic Diseases, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan
* Author to whom correspondence should be addressed.
Submission received: 14 April 2023 / Revised: 19 May 2023 / Accepted: 1 June 2023 / Published: 3 June 2023
(This article belongs to the Section Biosignal Processing)

Abstract
Cardiovascular diseases (CVDs) remain a leading cause of death globally. According to the American Heart Association, approximately 19.1 million deaths were attributed to CVDs in 2020, in particular ischemic heart disease and stroke. Known risk factors for CVDs include smoking, alcohol consumption, lack of regular physical activity, and diabetes. The last decade has been characterized by the widespread diffusion of wristband-style wearable devices, which can monitor and collect heart rate data, among other information. Wearable devices allow the analysis and interpretation of physiological and activity data obtained from the wearer and can therefore be used to monitor and prevent potential CVDs. However, these data are often provided in a manner that does not allow the general user to immediately comprehend possible health risks, and often require further analytics to draw meaningful conclusions. In this paper, we propose a disentangled variational autoencoder (β-VAE) with a bidirectional long short-term memory network (BiLSTM) backend to detect, in an unsupervised manner, anomalies in heart rate data collected during sleep time with a wearable device from eight heterogeneous participants. Testing was performed on the mean heart rate sampled at both 30 s and 1 min intervals. We compared the performance of our model with that of other well-known anomaly detection algorithms and found that our model outperformed them in almost all considered scenarios and for all considered participants. We also suggest that wearable devices may benefit from the integration of anomaly detection algorithms, in an effort to provide users with more processed and straightforward information.


1. Introduction

Cardiovascular diseases (CVDs) are a group of disorders that affect the heart and the blood vessels and cause millions of deaths every year [1,2]. A recent report [3] highlights that CVDs are the leading cause of death regardless of sex and ethnicity, and that one person dies every 34 s in the United States due to CVDs. Heart attack can also be “silent”, meaning that while the individual does not realise it has happened, damage has nevertheless occurred [4]. In addition, the economic burden associated with CVDs continues to increase [5,6].
Many studies use heart rate (HR) as the primary vital sign when trying to infer CVDs [7,8,9]. It is therefore of paramount importance to develop tools that inform individuals of possible anomalies (to be understood as unusual observations that do not conform with the expected HR pattern) in their HR behaviours, with the aim of preventing as much as possible the onset of CVDs [10].
The onset of CVD is difficult to predict and a conclusive diagnosis is challenging, as both depend on the idiosyncratic characteristics of the individual patient. Detecting small anomalies is a challenging task even for experienced physicians [11]. In recent years, researchers have focused on developing multiple approaches for predicting risk factors, such as logistic regression models [12,13], Cox proportional-hazards regression models [14,15], and accelerated failure time models [16]. A systematic review of prediction models for CVD risk can be found in the work of Damen et al. [17].
Machine learning models, for example random survival forests [18] and K-nearest neighbours algorithms [19], are also now providing interesting results.
Nevertheless, none of these models can be applied interchangeably without any recalibration, due to different risk factor profiles among different regions and ethnicities [20,21]. The complex dependencies between risk factors and CVDs highlight a growing need to take into account the idiosyncrasies of individuals, to better prevent the onset of new cases and to better tackle existing cases.
Recent years have witnessed rapid diffusion in the use of wristband-style wearable devices. Prominent examples include the Fitbit, Apple Watch, and Garmin. These and others can monitor the user’s health and collect HR data [22]. Despite not being as accurate as a standard electrocardiograph (ECG) [23], these devices still provide much useful personalized information that can be used to identify irregularities or abnormal behaviours in vital data, for example HR anomalies. If properly processed, data obtained from wearable devices can accelerate the shift towards patient-centric care, which is a growing worldwide movement [24].
Anomaly detection is an active research field, with many applications in computer vision, manufacturing, finance, and medicine. It is concerned with the identification of observations that differ greatly from the majority of the dataset and from what can be considered "normal" behaviour [25]. Not surprisingly, anomaly detection has also been extensively applied to wearable-device data. To name only a few such studies, Ref. [26] develops anomaly detection algorithms using scoring matrices extracted from wearables; Ref. [27] uses HR and sleep data obtained from Huami wearable devices to identify physiological anomalies caused by COVID-19 infection; Ref. [28] tests the ability of a smartwatch application to detect irregular pulses and identify atrial fibrillation; and Ref. [29] compares multiple forecasting models on minute-by-minute HR data. A detailed review of anomaly detection for wearables data can be found in the work by Sunny et al. [24].
In the present study, we focus on HR values collected during sleep time using the Fitbit Versa (FV), a smartwatch produced by Fitbit Inc. We decided to focus only on sleep time for two reasons. First, the issue of missing values is a notorious problem when using data collected from wearable devices, which requires consideration of the most appropriate way to impute such values according to the situation at hand [24]. By including only participants who wore an FV during the night, we obtained continuous observations and thereby avoided the need to impute values and minimized the associated information loss. Second, anomalies in resting HR are known to be correlated with CVDs [30,31,32]; accordingly, by focusing on sleep time only, we minimized the noise in HR values introduced by individual peculiarities not clearly identifiable by the wearable device, such as alcohol consumption, physical activities, or anxiety/emotional distress. An important extension of this work (which is left for future research) would be to include other important HR predictors (such as activity level, effect of possible medications, and lifestyle habits) and also identify anomalies during the daytime.
In short, our contributions are as follows:
(1)
We propose a combination of multiple unsupervised machine learning algorithms and a sliding-window-based dynamic outlier detection approach to label the data, taking into consideration both contextual and global anomalies.
(2)
We develop an anomaly detection algorithm based on a disentangled variational autoencoder (β-VAE) and a bidirectional long short-term memory network (BiLSTM), and validate its effectiveness on HR wearables data by comparing its performance with well-known and state-of-the-art anomaly detection algorithms. Adding a BiLSTM backend to the VAE model allows us to capture contextual relationships in VAE-processed HR sequences by analyzing both the forward and backward directions of the information flow. Ultimately, this leads the algorithm to better model the considered time series and to learn more accurate patterns.
(3)
We explore the latent space of our proposed algorithm and compare it with that of a standard VAE, giving consideration to how tuning the β parameter helps with anomaly detection and with encoding temporal sequences.
A timely diagnosis is important to effectively combat heart rhythm disturbances [1]. With the introduction of wearable devices, HR tracking has become simpler and can be performed continuously [22]. Automatically identifying anomalies in HR alerts the individual to the possible presence of a CVD, so that he/she can undergo targeted clinical checks. An arrhythmia is an abnormality in the heart rate or rhythm, in which the heart beats too fast (tachycardia, with a rate greater than 100 beats per minute (BPM)), too slow (bradycardia, with a rate below 60 BPM), or with an irregular rhythm. When the heartbeat is irregular, it is necessary to contact a medical doctor and, for more detailed examinations, a cardiologist [5]. Arrhythmia specialists can do much in the treatment of HR disturbances, which might be a sign of the onset of CVDs such as myocardial infarction, arterial hypertension, and heart failure [11]. Cardiac arrhythmias can be treated both surgically and pharmacologically.

2. Materials and Methods

2.1. Study Participants and Wearable Device

The original participants of this study were recruited from five companies in Tokyo, Japan, as part of a larger project with the goal of studying the impact of lifestyle choices on metabolic syndrome (MetS). Participants were recruited from among employees who, based on the results of their annual health check-up, had been categorized as being at risk of MetS or as having MetS. A total of 272 individuals were enrolled in a three-month randomized controlled trial on lifestyle changes. For the present study, as a first filter we focused only on individuals in the intervention group.
All participants received an FV smartwatch, together with instructions on how to wear it. Furthermore, they were asked to complete a questionnaire about their lifestyle, socioeconomic conditions, and past/present medical history. More detailed information on the FV and how it detects HR values can be found on the official company website (https://www.fitbit.com; accessed on 18 January 2023) and in our previous study [29].
Ethical guidelines and current regulations in Japan were respected in conducting the research. All participants were provided with detailed information about the original study and its purpose, and gave written consent for participation.
We selected eight participants (hereafter Participant 1, Participant 2, etc.) with heterogeneous characteristics and different lifestyles as a convenience sample to evaluate the performance of the considered models across diverse individuals. Table 1 summarizes the differentiation in participant characteristics, including their age, sex, medical history, smoking/drinking habits, and exercise habits.
As previously mentioned, to reduce noise and avoid the need to impute values, we focused on sleep time only, and considered eligible only those participants who wore the FV during the night. In principle, the possibility of missing data remains due to instrument malfunctions or power loss [24]. However, we detected no missing values for any considered participant throughout the examination period.
Mean HR data were sampled in both the 30 s and the 1 min frequency scenarios.
For all participants and for every night, we excluded the first and last 30 min of sleep to remove possible irregularities in HR associated with falling asleep and waking up phases.

2.2. Data Labeling and Preprocessing

Even though all the models considered in this study operate in an unsupervised manner and can be used, after training, for online anomaly detection, we needed to label the data to allow comparison of their performance on the test set. Data labeling for big data is known to be a tedious and time-consuming task [33], and many anomaly detection algorithms have been proposed to automatically deal with the problem: for example, Ref. [34] proposed the isolation forest (IF) [35] to create a labeled dataset, which is then used for training a long short-term memory (LSTM) autoencoder for land condition anomaly detection; Ref. [36] applied the one-class support vector machine (OCSVM) [37] to identify anomalies in network traffic; and Ref. [38] used clustering to separate anomalous log files from normal ones, and then fed the labeled data into an XGBoost [39] model to identify decision rules to perform classification.
To our knowledge, however, no clear medical criteria to define anomalies in wearable-device data (as opposed to ECGs) have appeared, and no study has proposed how to automatically label them. To develop an effective detection algorithm, we wanted to consider both global and contextual anomalous HR values. First, we separately applied IF, OCSVM (with a radial basis function kernel), and kernel density estimation (KDE) with a Gaussian kernel as anomaly detectors. Each of these methods has different pros and cons [40,41,42], such as with regard to the number of false negative errors, excessive sensitivity to noise, and detection of global anomalies only. To deal with these problems, we took the intersection of the sets of anomalies identified by each algorithm, in an ensemble learning fashion. By doing so, we identified a set of points that can be considered anomalies with high confidence but are mainly global outliers. We therefore applied the sliding-window concept, selecting a window length of 3%, and classified as an anomaly any point three standard deviations above or below the HR mean of the considered segment. We then merged the set of points identified by the sliding-window approach with the set of anomalies collected by intersecting the unsupervised machine learning methods; although some of the points (mainly global outliers) were present in both sets, doing so allowed us to add contextual outliers with greater confidence. Algorithm 1 schematically presents the data labeling process we utilized.
Algorithm 1: Data Labeling
Input: HR data of participant $i$ ($HR_i$), set of participants $I$
Output: set of anomalies for the participants ($A$)
Models: Isolation Forest ($IF$), One-Class Support Vector Machine ($OCSVM$), Kernel Density Estimation ($KDE$), Sliding Window ($SW$)
for $i = 1, 2, \ldots, |I|$ do
    $a_{i,IF} \leftarrow IF(HR_i)$
    $a_{i,OCSVM} \leftarrow OCSVM(HR_i)$
    $a_{i,KDE} \leftarrow KDE(HR_i)$
    $a_{i,SW} \leftarrow SW(HR_i)$
    $A_i \leftarrow (a_{i,IF} \cap a_{i,OCSVM} \cap a_{i,KDE}) \cup a_{i,SW}$
end for
$A \leftarrow [A_1; A_2; \ldots; A_I]$
return $A$
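As a minimal Python sketch of Algorithm 1 for a single participant (using scikit-learn; the contamination level and the default KDE bandwidth here are illustrative assumptions, not the exact settings used in our experiments):

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import KernelDensity

def label_anomalies(hr, window_frac=0.03, contamination=0.01):
    """Label global and contextual HR anomalies for one participant."""
    x = np.asarray(hr, dtype=float).reshape(-1, 1)

    # Global detectors: each produces a boolean anomaly mask.
    if_mask = IsolationForest(contamination=contamination,
                              random_state=0).fit_predict(x) == -1
    svm_mask = OneClassSVM(kernel="rbf", nu=contamination).fit_predict(x) == -1
    log_dens = KernelDensity(kernel="gaussian").fit(x).score_samples(x)
    kde_mask = log_dens < np.quantile(log_dens, contamination)

    # High-confidence (mainly global) anomalies: intersection of the three.
    global_mask = if_mask & svm_mask & kde_mask

    # Sliding-window rule for contextual anomalies: any point more than
    # three standard deviations from the mean of its window.
    hr = x.ravel()
    w = max(int(len(hr) * window_frac), 2)
    sw_mask = np.zeros(len(hr), dtype=bool)
    for start in range(0, len(hr), w):
        seg = hr[start:start + w]
        sw_mask[start:start + w] = np.abs(seg - seg.mean()) > 3 * seg.std()

    # Union of the two sets, as in Algorithm 1.
    return global_mask | sw_mask
```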
After labeling, we split the data into training, validation, and test sets, considering 5 nights as training, 1 night as validation, and 1 night as testing. Our proposed anomaly detection algorithm and the other competitor models we introduce in the next section operate by being trained on “clean” (i.e., without anomalies) data. Therefore, we removed any anomalies from the training data by replacing their values with the previous non-anomalous HR values.
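The replacement of training anomalies can be expressed compactly; the following sketch assumes a pandas Series of HR values aligned with a boolean anomaly mask:

```python
import pandas as pd

def clean_training_series(hr: pd.Series, anomaly_mask: pd.Series) -> pd.Series:
    """Replace anomalous HR values with the previous non-anomalous value."""
    cleaned = hr.mask(anomaly_mask)  # anomalous points become NaN
    # Forward fill carries the last valid value forward; a backward fill
    # covers the (unlikely) case in which the series starts with an anomaly.
    return cleaned.ffill().bfill()
```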
Table 2 reports the number of anomalies in the test set using both HR value scenarios, namely collection every 30 s and every minute.

2.3. Anomaly Detection Models

The main idea behind the anomaly detection models we implemented in this paper is as follows: first, we forecast values of the dependent variable (HR values) for a certain number of time steps; then, we measure the error between the predicted and true values and label as anomalies those values whose error is above a selected threshold. The threshold was defined in terms of the number of standard deviations from the mean of the squared errors over the training set; for each scenario we tuned this hyperparameter by evaluating the performance of the models on the validation set.
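Schematically, the detection rule reduces to a few lines; in this sketch, k (the number of standard deviations) is the hyperparameter tuned on the validation set:

```python
import numpy as np

def detect_anomalies(y_true, y_pred, train_sq_errors, k=3.0):
    """Flag points whose squared forecast error exceeds the threshold
    mean + k * std, computed over the squared errors on the training set."""
    threshold = train_sq_errors.mean() + k * train_sq_errors.std()
    sq_errors = (np.asarray(y_true) - np.asarray(y_pred)) ** 2
    return sq_errors > threshold
```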

2.3.1. ARIMA

Autoregressive integrated moving average (ARIMA) models are popular in time series forecasting [43,44,45] and have also been applied to anomaly detection [46,47].
The equation of ARIMA($p$, $d$, $q$) can be written as:

$$\Delta^d y_t = \beta_0 + \sum_{i=1}^{p} \beta_i \, \Delta^d y_{t-i} + \varepsilon_t + \sum_{j=1}^{q} \theta_j \, \varepsilon_{t-j},$$

where $\Delta^d$ denotes the $d$-th difference of $y_t$ (the dependent variable at time $t$), expressed as a linear combination of its $p$ lagged observations and $q$ lagged observations of the residual error terms; $\beta_0$ denotes the intercept of the ARIMA model; $B = (\beta_1, \beta_2, \ldots, \beta_p)^T$ and $\Theta = (\theta_1, \theta_2, \ldots, \theta_q)^T$ are vectors of coefficients; and $\varepsilon_t \sim WN(0, \sigma^2)$.
In practice, however, after applying the augmented Dickey–Fuller (ADF) test [48], we found that all our time series were stationary, so no differencing was required, and our formulation was equivalent to an ARMA($p$, $q$) model. Values of $p$ and $q$ were selected by observing the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots, whose specifications very often coincided with those provided by the Akaike information criterion (AIC) [49].
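A minimal statsmodels sketch of this procedure (ADF test followed by order selection; the selection here is shown via AIC only, whereas in practice we cross-checked it against the ACF/PACF plots):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.arima.model import ARIMA

def fit_arma(series, max_p=5, max_q=5):
    """Fit an ARMA(p, q) model after checking stationarity with the ADF test."""
    p_value = adfuller(series)[1]
    d = 0 if p_value < 0.05 else 1  # all our HR series were stationary (d = 0)

    best_fit, best_aic = None, np.inf
    for p in range(1, max_p + 1):
        for q in range(1, max_q + 1):
            fit = ARIMA(series, order=(p, d, q)).fit()
            if fit.aic < best_aic:
                best_fit, best_aic = fit, fit.aic
    return best_fit
```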

2.3.2. LSTM

Long short-term memory (LSTM) [50] is a kind of recurrent neural network (RNN) that deals well with the vanishing gradient problem faced by the classical formulation of the latter [51]. It is suitable for sequence-prediction problems, as it can capture long-term dependencies, which makes it also applicable to anomaly detection tasks [52,53]. LSTM networks comprise gates that control which information from the input should enter the network, be stored, or be discarded. The LSTM equations at a given time step $t$ are as follows:

$$F_t = \sigma(W_F x_t + U_F h_{t-1} + b_F),$$

$$I_t = \sigma(W_I x_t + U_I h_{t-1} + b_I),$$

$$\tilde{C}_t = \tanh(W_C x_t + U_C h_{t-1} + b_C),$$

$$O_t = \sigma(W_O x_t + U_O h_{t-1} + b_O),$$

$$C_t = F_t \odot C_{t-1} + I_t \odot \tilde{C}_t,$$

$$h_t = O_t \odot \tanh(C_t),$$

where $F_t$, $I_t$, and $O_t$ denote the forget gate, the input gate, and the output gate, respectively; $\tilde{C}_t$ and $C_t$ are, respectively, the cell input activation vector and the cell state; $x_t$ is the input vector to the LSTM cell; and $h_t$ denotes the output of the cell. $W$, $U$, and $b$ are, respectively, input weights, recurrent weights, and biases to be learned during training of the network; $\sigma$ is the sigmoid activation function; and $\odot$ denotes the Hadamard product.
We implemented an LSTM anomaly detector similar to the one proposed by [52]. In particular, we stacked three LSTM layers (with 32, 16, and 8 cells, respectively), using a hyperbolic tangent (tanh) activation function. To prevent overfitting, we implemented max-norm regularization (bounding the norms of the weights and biases to be less than 3) and dropout [54] in each hidden layer. Adam [55] was used as the optimizer, and early stopping was applied to further improve generalization performance [56].
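A TensorFlow/Keras sketch consistent with the stated configuration is shown below; the input window length and dropout rate are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, constraints

def build_lstm_forecaster(window=32, dropout=0.2):
    """Stacked LSTM (32/16/8 cells) forecasting the next HR value,
    with max-norm regularization (norm bound 3) and dropout."""
    max_norm = constraints.MaxNorm(3)
    model = tf.keras.Sequential([
        layers.Input(shape=(window, 1)),
        layers.LSTM(32, activation="tanh", return_sequences=True,
                    kernel_constraint=max_norm, bias_constraint=max_norm),
        layers.Dropout(dropout),
        layers.LSTM(16, activation="tanh", return_sequences=True,
                    kernel_constraint=max_norm, bias_constraint=max_norm),
        layers.Dropout(dropout),
        layers.LSTM(8, activation="tanh",
                    kernel_constraint=max_norm, bias_constraint=max_norm),
        layers.Dropout(dropout),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

# Early stopping to improve generalization [56].
early_stop = tf.keras.callbacks.EarlyStopping(patience=10,
                                              restore_best_weights=True)
```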

2.3.3. CAE-LSTM

Autoencoders (AEs) are neural networks that try to learn an efficient lower-dimensional representation of the input data. They consist of an encoder and a decoder. First, the encoder compresses the input vector $x \in \mathbb{R}^n$ into a lower-dimensional embedding $z \in \mathbb{R}^d$, with $d < n$; the decoder then tries to reconstruct $x$ from $z$, producing an output vector $\tilde{x} \in \mathbb{R}^n$. AEs have been used for dimensionality reduction [57,58,59], classification [60,61,62], and anomaly detection [63,64].
We implemented a convolutional autoencoder (CAE) coupled with an LSTM backend, following an idea similar to that described in [65]: first, the autoencoder was trained to learn good embeddings of the input data; we then passed these learned embeddings to an LSTM architecture like the one described in the above section. If the encoder generates meaningful embeddings, using them instead of the raw data allows the LSTM to track events over longer time windows.
We selected a CAE instead of a feedforward AE because it better reconstructed the raw samples. The encoder network was composed of three one-dimensional convolutional layers (16, 8, and 1 filters, with kernel sizes of 5, 5, and 1, respectively, and all with stride lengths of 2), each with a rectified linear unit (ReLU) activation function. The raw samples to compress were of length $n = 32$, and the latent space dimensionality was $d = 4$. The decoder network was a mirrored version of the encoder, which works well in practice [66]. Dropout and early stopping were again used as regularization techniques. With regard to the cost function $C$, which measures how good the decoder reconstructions were, we simply used the mean squared error (MSE) loss, i.e.,

$$C(x, \tilde{x}) = \| x - \tilde{x} \|^2.$$
Figure 1 schematically illustrates the structure of the described autoencoder.
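A Keras sketch of this architecture follows; the padding choices are our assumption (with stride 2 applied three times, windows of length n = 32 are reduced to 4 time steps with 1 filter, i.e., d = 4):

```python
from tensorflow.keras import layers, models

def build_cae(n=32, d=4):
    """Convolutional autoencoder compressing HR windows of length n into
    embeddings of dimensionality d, with a mirrored decoder."""
    encoder = models.Sequential([
        layers.Input(shape=(n, 1)),
        layers.Conv1D(16, kernel_size=5, strides=2, padding="same",
                      activation="relu"),
        layers.Conv1D(8, kernel_size=5, strides=2, padding="same",
                      activation="relu"),
        layers.Conv1D(1, kernel_size=1, strides=2, padding="same",
                      activation="relu"),
        layers.Flatten(),  # -> (d,) latent embedding
    ], name="encoder")

    decoder = models.Sequential([  # mirrored version of the encoder
        layers.Input(shape=(d,)),
        layers.Reshape((d, 1)),
        layers.Conv1DTranspose(8, kernel_size=1, strides=2, padding="same",
                               activation="relu"),
        layers.Conv1DTranspose(16, kernel_size=5, strides=2, padding="same",
                               activation="relu"),
        layers.Conv1DTranspose(1, kernel_size=5, strides=2, padding="same"),
    ], name="decoder")

    cae = models.Sequential([encoder, decoder])
    cae.compile(optimizer="adam", loss="mse")  # MSE reconstruction loss C
    return cae, encoder, decoder
```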

2.3.4. BiLSTM

Bidirectional long short-term memory (BiLSTM) [67] is an extension of the LSTM network and can be thought of as a combination of two LSTMs, one reading the input sequence forward and the other reading it backwards, after which their outputs are combined. This increases the amount of information available to the network. BiLSTM has proved particularly useful for sequential modeling tasks, such as text classification [68] and translation [69]. Recently, it has also been successfully applied to anomaly detection [70].
For a more meaningful comparison, we implemented a BiLSTM anomaly detector model with the same number of layers and cells per layer as its LSTM counterpart, described in the above section. Again, we applied dropout and max-norm regularization to prevent overfitting.
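In Keras terms, the only change with respect to the LSTM sketch above is wrapping each recurrent layer in a Bidirectional wrapper, as in this fragment:

```python
from tensorflow.keras import layers

# Bidirectional counterpart of the first stacked layer; the 16- and 8-cell
# layers are wrapped in the same way.
bi_layer = layers.Bidirectional(
    layers.LSTM(32, activation="tanh", return_sequences=True),
    merge_mode="concat",  # forward and backward outputs are concatenated
)
```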

2.3.5. β-VAE-BiLSTM

Variational autoencoders (VAEs) [71] are a probabilistic version of AEs: instead of outputting a compressed version of the input, the encoder outputs two parameters (the mean vector $\mu \in \mathbb{R}^d$ and the variance vector $\sigma^2 \in \mathbb{R}^d$) which describe the distribution of the input data over the latent space. Given $\mu$ and $\sigma^2$, we can sample a latent vector $z \in \mathbb{R}^d$ using the reparameterization trick, which allows us to perform backpropagation during training:

$$z = \mu + \sigma \odot \varepsilon,$$

with $\varepsilon \sim \mathcal{N}(0, I)$ and $\sigma$ the element-wise standard deviation. The VAE loss consists of two components:

$$\mathcal{L} = \| x - \tilde{x} \|^2 + \beta \, D_{KL}\left( \mathcal{N}(\mu, \sigma^2) \,\|\, \mathcal{N}(0, I) \right),$$

where the first component is the reconstruction loss (as in AEs) and the second component, which acts as a regularization term, is the KL divergence between the encoded distribution and a standard Gaussian, with $\beta$ denoting its strength. When $\beta > 1$, we put a stronger weight on the KL loss, encouraging a more disentangled representation in the latent space [72]; this can help to create a more "regular" latent space, at the cost of a higher reconstruction error [73]. Due to these properties, VAEs have been applied both for data generation [74,75,76] and anomaly detection [77,78,79].
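The sampling step and the β-weighted loss can be sketched as follows (using the usual log-variance parameterization for numerical stability; this is a generic illustration, not our exact implementation):

```python
import tensorflow as tf

def sample_z(mu, log_var):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * eps

def beta_vae_loss(x, x_rec, mu, log_var, beta=3.0):
    """Reconstruction error plus beta-weighted KL(N(mu, sigma^2) || N(0, I))."""
    rec = tf.reduce_sum(tf.square(x - x_rec), axis=-1)
    kl = -0.5 * tf.reduce_sum(
        1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=-1)
    return tf.reduce_mean(rec + beta * kl)
```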
In this paper, we propose a $\beta$-VAE model coupled with a BiLSTM module, which we found helpful because reading the input sequence both forward and backward provides more context for the task. Inspired by [77], our approach is to first encode a vector of consecutive raw HR samples $x_t$ into the latent space using the $\beta$-VAE encoder; the BiLSTM module is then trained over the obtained embeddings $z_t$ to predict the next embedding $z_{t+1}$; finally, the decoder decodes the predicted embedding $\tilde{z}_{t+1}$ into $\tilde{x}_{t+1}$, and by measuring its discrepancy with $x_{t+1}$ we label the anomalies.
Other works have integrated unsupervised anomaly detection with forecasting; for example, SeqVL [80] uses a VAE for anomaly detection, followed by an LSTM module for trend prediction that operates on the decoded samples. Unlike that approach, we applied a BiLSTM module over the latent space embeddings and then decoded the predicted result, in line with [77]. Unlike [77,80], we also found it effective to use a BiLSTM structure (BiLSTM-VAE) in both the encoder and the decoder. More specifically, the encoder and the decoder were each composed of a single layer of 64 BiLSTM units with tanh activation function. A possible drawback of using long windows is the delay in anomaly detection [77]; we addressed this problem by shortening the window length and reducing the number of layers in the model, and we verified that (unlike the CAE-LSTM model) compressing samples of length $n = 10$ into latent embeddings of dimensionality $d = 4$ worked well empirically. Regarding the BiLSTM module trained to forecast the next embedding, we implemented an architecture similar (in terms of number of layers and number of units) to the LSTM described in the previous section. In most cases, we found it effective to provide as input only the current and the previous embeddings (i.e., $z_t$ and $z_{t-1}$) to forecast $z_{t+1}$.
In line with [79] and our empirical observations, we set $\beta = 3$.
Figure 2 shows the structure of our proposed model and how it operates.
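At inference time, the detection loop can be sketched as follows, assuming `encoder`, `bilstm`, and `decoder` are the trained components and `windows` is the array of non-overlapping HR windows:

```python
import numpy as np

def detect(windows, encoder, bilstm, decoder, threshold):
    """Encode HR windows, forecast the next embedding from the two most
    recent ones, decode it, and flag windows whose reconstruction error
    exceeds the threshold."""
    z = encoder.predict(windows)               # (T, d) latent embeddings
    flags = np.zeros(len(windows), dtype=bool)
    for t in range(1, len(windows) - 1):
        z_ctx = z[t - 1:t + 1][None, ...]      # (1, 2, d): z_{t-1} and z_t
        z_next = bilstm.predict(z_ctx)         # predicted z_{t+1}
        x_next = decoder.predict(z_next)[0]    # decoded x~_{t+1}
        err = np.mean((windows[t + 1] - x_next) ** 2)
        flags[t + 1] = err > threshold
    return flags
```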

3. Results and Discussion

We considered three metrics widely used in anomaly detection tasks to evaluate the performance of the models: precision, recall, and F1-score. Their equations are as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP},$$

$$\mathrm{Recall} = \frac{TP}{TP + FN},$$

$$F_1\ \mathrm{score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2\,TP}{2\,TP + FP + FN},$$
where $TP$, $FP$, and $FN$ denote the number of true positives, false positives, and false negatives, respectively. Since for contiguous segments of anomalous values it is usually sufficient to receive a single alert for the segment instead of point-wise alerts, we augmented the metrics with the strategy proposed in [81], sketched below.
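A sketch of that adjustment strategy (our reading of [81]: if any point inside a ground-truth anomalous segment is flagged, the whole segment is counted as detected before computing the point-wise metrics):

```python
import numpy as np

def point_adjust(y_true, y_pred):
    """Mark an entire ground-truth anomalous segment as detected when at
    least one of its points was flagged."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool).copy()
    in_seg, start = False, 0
    for i, is_anom in enumerate(np.append(y_true, False)):  # sentinel closes the last segment
        if is_anom and not in_seg:
            in_seg, start = True, i
        elif not is_anom and in_seg:
            in_seg = False
            if y_pred[start:i].any():
                y_pred[start:i] = True
    return y_pred
```

Precision, recall, and F1-score are then computed on the adjusted predictions in the usual way.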
All the empirical analyses were conducted using Python 3.8.3 on an Anaconda platform (release: July 2020). The obtained results are reported in Table 3 and Table 4 for the 30 s and the 1 min frequency scenarios, respectively. Since neural networks have stochastic components in the training process (for example, the weight initialization), each algorithm was run 20 times for each scenario. Table 3 and Table 4 show the mean of the runs, with the standard deviation in brackets.
From Table 3 and Table 4, we can observe that our proposed model performed significantly better than the competitor models in almost all the considered scenarios, regardless of participant or sampling frequency of the wearable device. ARIMA typically showed acceptable recall but low precision, meaning that many false alerts were raised. LSTM and CAE-LSTM often performed similarly, and we did not find any particular advantage in using the CAE encodings instead of the raw samples when forecasting the next time window. In general, both models tended to have acceptable precision and recall, and performed better than ARIMA. The results of BiLSTM confirm that it is useful to exploit bidirectionality to increase the context available to the anomaly detector. Overall, our proposed $\beta$-VAE-BiLSTM model showed the best performance, typically displaying both good precision and recall, which in turn resulted in a balanced F1-score.
It should be mentioned that the anomalies of some participants (e.g., Participant 3) were more difficult both to define and to identify than those of others, and for these participants all the models displayed relatively poor performance. In particular, we noted the difficulty in predicting anomalies in those who have more "unhealthy" lifestyles (i.e., frequent consumption of alcohol, smoking, and no physical exercise) and in those who have existing diseases. In line with existing studies, we noted that these participants are often characterized by increased sleep fragmentation (which negatively affects sleep duration [82]), reduced sleep quality [83], and increased sleep disturbance [84]; as a consequence, their HR patterns exhibited greater variance and were usually noisy, which made it more difficult for the labeling algorithm to classify points as anomalies, and for the anomaly detection models to identify them. The obtained results further underline the already known importance of leading a healthy lifestyle.
With our proposed model, thanks to the wearable device, the individual is notified if he/she has had any anomalies in the previous time frame (i.e., 10 min or 5 min, depending on the sampling frequency). By doing so, an almost real-time response on the HR status is provided to the individual. Reports on the presence of anomalies suggest further clinical investigations, and are not intended as a diagnostic tool for CVDs. Our proposed model learns patterns from a “clean” (i.e., without anomalies) individual’s HR time series data, and signals as anomalies behaviours that are different from those learned.
Among the key strengths of this study, our focus on sleep only (a state with fewer individual activities that are not identifiable by the wearable device) allowed us to identify "true" HR anomalies with greater confidence. Furthermore, we considered the results provided by two different sampling frequencies for the same participants, compared and validated the proposed model against multiple competitor algorithms, and analyzed both the distribution of the metrics produced by each model and how the different parameters influenced the calculated metrics. Finally, we validated the results obtained in multiple heterogeneous participants.
This study has also some limitations that warrant mention. First, although abnormal values in resting HR are indicative of possible CVDs, we focused on a shortened period of time (night period only). Second, the models are applied to a single time series (resting HR), but it is important to consider the influence of other variables both in characterizing anomalies and in training the algorithms. Third, even if we tried to be comprehensive and consider heterogeneous participants, we considered only the factors listed in Table 1, and we could not control for some important lifestyle factors. Finally, the threshold used for anomaly detection was defined in a relatively simple way and could likely be improved.
By considering only measurements taken during sleep, we substantially reduced the number of variables that need to be considered. We are working to expand the data labeling algorithm and the proposed model to take into account other variables that can be detected by the wearable device, such as blood oxygen level, blood pressure, body temperature, and environmental temperature. Exploiting multivariate time series is likely to provide a more complete picture and consequently lead to better results, in both correct labeling and identification. A significant step forward would be to also analyze daytime data, by including, in addition to the above-mentioned variables, physical activity (a valid proxy could be the number of steps per minute) and weather data.

Temporal Embeddings in the VAE Latent Space

In this subsection, we compare how the temporal sequences in the test set are encoded in the latent space of our trained $\beta$-VAE (with $\beta = 3$) against the encodings obtained with other $\beta$ values, including a standard VAE (i.e., $\beta = 1$). We conducted sensitivity analyses by setting $\beta$ to 0.1, 0.5, 1, and 10; while the results did not vary much, we observed on average a slight deterioration in the metrics with respect to $\beta = 3$ (see Table S1 in the Supplementary Material).
Figure 3 shows the results of mapping the temporal embeddings in the latent space of the $\beta$-VAE (with different $\beta$ values) for Participant 5 (1 min frequency) into a two-dimensional space, obtained by applying t-distributed stochastic neighbor embedding (t-SNE) [85]. The embeddings were scaled into the $[0, 1]^2$ interval to remove the impact of the original scale. Similar mappings were also obtained for the other participants (see Figures S1–S7 in the Supplementary Material). We used t-SNE instead of principal component analysis (PCA) because t-SNE better preserves the local structure of the data and because PCA is restricted to learning linear mappings.
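The mapping itself is straightforward; a sketch with scikit-learn, where `z` denotes the array of latent embeddings of the test sequences:

```python
from sklearn.manifold import TSNE
from sklearn.preprocessing import MinMaxScaler

def tsne_map(z, random_state=0):
    """Project latent embeddings to 2D with t-SNE and rescale the result
    into the [0, 1]^2 interval to remove the impact of the original scale."""
    z_2d = TSNE(n_components=2, random_state=random_state).fit_transform(z)
    return MinMaxScaler().fit_transform(z_2d)
```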
We can notice how, as $\beta$ increases, the latent clusters of normal and abnormal data become less dense and more spread out, in line with the observations in [86]. In principle, this behaviour would be undesirable if we were to classify datapoints as anomalous or not directly from the embeddings in the latent space (for example, using a clustering algorithm). However, we argue that when the embeddings are passed as input to a BiLSTM backend, it is better to treat all the temporal embeddings as coming from the same distribution. In this way, the BiLSTM module is less affected by the possible presence of anomalies in the embeddings given as input and tends to output more "conservative" predictions, which, combined with how we designed the detection system, ultimately results in better performance. However, t-SNE results are a non-linear mapping of the data into a lower-dimensional space, and should therefore be interpreted with caution.
Different detection rules would likely provide different results, and we leave this comparison for future research.

4. Conclusions

In this paper we propose a disentangled VAE-BiLSTM architecture for unsupervised anomaly detection. We applied this architecture to heart rate data of multiple heterogeneous participants. Data were collected during sleep time with a wearable device, and we considered both 30 s and 1 min frequency scenarios. Empirical results showed that our model outperformed the competitor models we investigated in most cases.
To our knowledge, there is no consensus on the characteristics of anomalies in wearable data. Accordingly, we also proposed a simple but effective algorithm for automatic data labeling which takes into account both global and contextual anomalies.
Finally, we observed and compared how increasing the value of the $\beta$ parameter affects the encoding of HR sequences into the latent space of VAE models.

Supplementary Materials

The following supporting information can be downloaded at: https://0-www-mdpi-com.brum.beds.ac.uk/article/10.3390/bioengineering10060683/s1, Table S1: Precision, recall, and F1-score of the β-VAE models. Values indicate the mean (standard deviation) of 20 runs for each model. Text in bold denotes the best results (95% confidence level) per participant; Figure S1: t-SNE maps for Participant 1; Figure S2: t-SNE maps for Participant 2; Figure S3: t-SNE maps for Participant 3; Figure S4: t-SNE maps for Participant 4; Figure S5: t-SNE maps for Participant 6; Figure S6: t-SNE maps for Participant 7; Figure S7: t-SNE maps for Participant 8.

Author Contributions

Conceptualization, A.S. and T.S.; methodology, A.S.; software, A.S. and T.S.; validation, A.S., T.S., U.-i.C. and A.K.S.; formal analysis, A.S.; investigation, A.S.; resources, T.S., U.-i.C. and A.K.S.; data curation, T.S., U.-i.C. and A.K.S.; writing—original draft preparation, A.S.; writing—review and editing, T.S., U.-i.C. and A.K.S.; visualization, A.S.; supervision, T.S.; project administration, T.S., U.-i.C. and A.K.S.; funding acquisition, T.S., U.-i.C. and A.K.S. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the Center of Innovation Program from the Japan Science and Technology Agency (Grant Number JPMJCE1304) and Kanagawa Prefecture’s “A project to expand the use of metabolic syndrome risk index in municipalities” (2018). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Institutional Review Board Statement

The study was conducted in accordance with the relevant ethical guidelines and regulations in Japan. All participants received detailed information about the original study and its purpose and provided written consent to participate. The study was approved by the Ethics Committee of the School of Engineering, The University of Tokyo (approval number: KE18-44).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patients to publish this paper.

Data Availability Statement

The authors cannot publicly provide access to individual data due to participant privacy in accordance with ethical guidelines. Additionally, the written informed consent obtained from study participants does not include a provision for publicly sharing data. Qualifying researchers may apply to access a minimal dataset upon reasonable request by contacting Saori Miyake at the following email address: [email protected]. The code of our model will be made available at: https://github.com/staale92/disentangled-vae-bilstm (accessed on 1 June 2023).

Acknowledgments

The authors wish to express their most sincere gratitude to all staff members at the Center of Innovation, The University of Tokyo for their extensive efforts and help in the conduct of the study. The authors would like to thank all members of Precision Health, The University of Tokyo for their invaluable assistance. The authors also wish to thank Tatsushi Matsubayashi (ALBERT Inc.) and Ryo Furukawa (ALBERT Inc.) for their helpful comments and fruitful discussions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. World Health Organization. Cardiovascular Diseases. Available online: https://www.who.int/health-topics/cardiovascular-diseases#tab=tab_1 (accessed on 20 January 2023).
  2. American Heart Association. 2022 Heart Disease & Stroke Statistical Update Fact Sheet Global Burden of Disease. Available online: https://professional.heart.org/-/media/PHD-Files-2/Science-News/2/2022-Heart-and-Stroke-Stat-Update/2022-Stat-Update-factsheet-GIobal-Burden-of-Disease.pdf (accessed on 20 January 2023).
  3. Centers for Disease Control and Prevention; National Center for Health Statistics. About Multiple Cause of Death, 1999–2020; CDC WONDER Online Database Website; Centers for Disease Control and Prevention: Atlanta, GA, USA, 2022. Available online: https://wonder.cdc.gov/mcd-icd10.html (accessed on 21 February 2022).
  4. Tsao, C.W.; Aday, A.W.; Almarzooq, Z.I.; Alonso, A.; Beaton, A.Z.; Bittencourt, M.S.; Boehme, A.K.; Buxton, A.E.; Carson, A.P.; Commodore-Mensah, Y.; et al. Heart Disease and Stroke Statistics—2022 Update: A Report From the American Heart Association. Circulation 2022, 145, e153–e639. [Google Scholar] [CrossRef]
  5. Mensah, G.A.; Roth, G.A.; Fuster, V. The Global Burden of Cardiovascular Diseases and Risk Factors: 2020 and Beyond. J. Am. Coll. Cardiol. 2019, 74, 2529–2532. [Google Scholar] [CrossRef]
  6. Allarakha, S.; Yadav, J.; Yadav, A.K. Financial Burden and financing strategies for treating the cardiovascular diseases in India. Soc. Sci. Humanit. Open 2022, 6, 100275. [Google Scholar] [CrossRef]
  7. Agliari, E.; Barra, A.; Barra, O.A.; Fachechi, A.; Vento, L.F.; Moretti, L. Detecting cardiac pathologies via machine learning on heart-rate variability time series and related markers. Sci. Rep. 2020, 10, 8845. [Google Scholar] [CrossRef]
  8. Sajadieh, A.; Rasmussen, V.; Hein, H.O.; Hansen, J.F. Familial predisposition to premature heart attack and reduced heart rate variability. Am. J. Cardiol. 2003, 92, 234–236. [Google Scholar] [CrossRef] [PubMed]
  9. Melillo, P.; Izzo, R.; Orrico, A.; Scala, P.; Attanasio, M.; Mirra, M.; DE Luca, N.; Pecchia, L. Automatic Prediction of Cardiovascular and Cerebrovascular Events Using Heart Rate Variability Analysis. PLoS ONE 2015, 10, e0118504. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  10. Neubeck, L.; Coorey, G.; Peiris, D.; Mulley, J.; Heeley, E.; Hersch, F.; Redfern, J. Development of an integrated e-health tool for people with, or at high risk of, cardiovascular disease: The Consumer Navigation of Electronic Cardiovascular Tools (CONNECT) web application. Int. J. Med. Inform. 2016, 96, 24–37. [Google Scholar] [CrossRef] [PubMed]
  11. Chatellier, G.; Blinowska, A.; Menard, J.; Degoulet, P. Do physicians estimate reliably the cardiovascular risk of hypertensive patients? Medinfo 1995, 8 Pt 2, 876–879. [Google Scholar]
  12. Zhang, Y.; Diao, L.; Ma, L. Logistic Regression Models in Predicting Heart Disease. J. Phys. Conf. Ser. 2021, 1769, 012024. [Google Scholar] [CrossRef]
  13. Ciu, T.; Oetama, R.S. Logistic Regression Prediction Model for Cardiovascular Disease. IJNMT (Int. J. New Media Technol.) 2020, 7, 33–38. [Google Scholar] [CrossRef]
  14. Jia, X.; Baig, M.M.; Mirza, F.; GholamHosseini, H. A Cox-Based Risk Prediction Model for Early Detection of Cardiovascular Disease: Identification of Key Risk Factors for the Development of a 10-Year CVD Risk Prediction. Adv. Prev. Med. 2019, 2019, 8392348. [Google Scholar] [CrossRef] [Green Version]
  15. Brophy, J.M.; Dagenais, G.R.; McSherry, F.; Williford, W.; Yusuf, S. A multivariate model for predicting mortality in patients with heart failure and systolic dysfunction. Am. J. Med. 2004, 116, 300–304. [Google Scholar] [CrossRef]
  16. Liu, E.; Lim, K. Using the Weibull accelerated failure time regression model to predict time to health events. bioRxiv 2018. [Google Scholar] [CrossRef]
  17. Damen, J.A.A.G.; Hooft, L.; Schuit, E.; Debray, T.P.A.; Collins, G.S.; Tzoulaki, I.; Lassale, C.M.; Siontis, G.C.M.; Chiocchia, V.; Roberts, C.; et al. Prediction models for cardiovascular disease risk in the general population: Systematic review. BMJ 2016, 353, i2416. [Google Scholar] [CrossRef] [Green Version]
  18. Hsich, E.; Gorodeski, E.Z.; Blackstone, E.H.; Ishwaran, H.; Lauer, M.S. Identifying Important Risk Factors for Survival in Patient With Systolic Heart Failure Using Random Survival Forests. Circ. Cardiovasc. Qual. Outcomes 2011, 4, 39–45. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  19. Pal, M.; Parija, S.; Panda, G.; Dhama, K.; Mohapatra, R.K. Risk prediction of cardiovascular disease using machine learning classifiers. Open Med. 2022, 17, 1100–1113. [Google Scholar] [CrossRef]
  20. Gopal, D.P.; Usher-Smith, J.A. Cardiovascular risk models for South Asian populations: A systematic review. Int. J. Public Health 2016, 61, 525–534. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  21. Zhiting, G.; Jiaying, T.; Haiying, H.; Yuping, Z.; Qunfei, Y.; Jingfen, J. Cardiovascular disease risk prediction models in the Chinese population- a systematic review and meta-analysis. BMC Public Health 2022, 22, 1608. [Google Scholar] [CrossRef] [PubMed]
  22. Fuller, D.; Colwell, E.; Low, J.; Orychock, K.; Tobin, M.A.; Simango, B.; Buote, R.; Van Heerden, D.; Luan, H.; Cullen, K.; et al. Reliability and Validity of Commercially Available Wearable Devices for Measuring Steps, Energy Expenditure, and Heart Rate: Systematic Review. JMIR mHealth uHealth 2020, 8, e18694. [Google Scholar] [CrossRef] [PubMed]
  23. Benedetto, S.; Caldato, C.; Bazzan, E.; Greenwood, D.C.; Pensabene, V.; Actis, P. Assessment of the Fitbit Charge 2 for monitoring heart rate. PLoS ONE 2018, 13, e0192691. [Google Scholar] [CrossRef] [Green Version]
  24. Sunny, J.S.; Patro, C.P.K.; Karnani, K.; Pingle, S.C.; Lin, F.; Anekoji, M.; Jones, L.D.; Kesari, S.; Ashili, S. Anomaly Detection Framework for Wearables Data: A Perspective Review on Data Concepts, Data Analysis Algorithms and Prospects. Sensors 2022, 22, 756. [Google Scholar] [CrossRef] [PubMed]
  25. Chandola, V.; Banerjee, A.; Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 2009, 41, 15. [Google Scholar] [CrossRef]
  26. Wang, P.; Han, Y.; Qin, J.; Wang, B.; Yang, X. Anomaly Detection for Streaming Data from Wearable Sensor Network. In Proceedings of the 2017 IEEE 15th Intl Conf on Dependable, Autonomic and Secure Computing, 15th Intl Conf on Pervasive Intelligence and Computing, 3rd Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress (DASC/PiCom/DataCom/CyberSciTech), Orlando, FL, USA, 6–10 November 2017; pp. 263–268. [Google Scholar] [CrossRef]
  27. Zhu, G.; Li, J.; Meng, Z.; Yu, Y.; Li, Y.; Tang, X.; Dong, Y.; Sun, G.; Zhou, R.; Wang, H.; et al. Learning from Large-Scale Wearable Device Data for Predicting the Epidemic Trend of COVID-19. Discret. Dyn. Nat. Soc. 2020, 2020, 6152041. [Google Scholar] [CrossRef]
  28. Perez, M.V.; Mahaffey, K.W.; Hedlin, H.; Rumsfeld, J.S.; Garcia, A.; Ferris, T.; Balasubramanian, V.; Russo, A.M.; Rajmane, A.; Cheung, L.; et al. Large-Scale Assessment of a Smartwatch to Identify Atrial Fibrillation. N. Engl. J. Med. 2019, 381, 1909–1917. [Google Scholar] [CrossRef] [PubMed]
  29. Staffini, A.; Svensson, T.; Chung, U.-I.; Svensson, A.K. Heart Rate Modeling and Prediction Using Autoregressive Models and Deep Learning. Sensors 2022, 22, 34. [Google Scholar] [CrossRef] [PubMed]
  30. Fox, K.; Borer, J.S.; Camm, A.J.; Danchin, N.; Ferrari, R.; Sendon, J.L.L.; Steg, P.G.; Tardif, J.-C.; Tavazzi, L.; Tendera, M. Resting Heart Rate in Cardiovascular Disease. J. Am. Coll. Cardiol. 2007, 50, 823–830. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  31. Zhang, M.; Han, C.; Wang, C.; Wang, J.; Li, L.; Zhang, L.; Wang, B.; Ren, Y.; Zhang, H.; Yang, X.; et al. Association of resting heart rate and cardiovascular disease mortality in hypertensive and normotensive rural Chinese. J. Cardiol. 2017, 69, 779–784. [Google Scholar] [CrossRef] [PubMed]
  32. Larsson, S.C.; Drca, N.; Mason, A.M.; Burgess, S. Resting Heart Rate and Cardiovascular Disease. Circ. Genom. Precis. Med. 2019, 12, e002459. [Google Scholar] [CrossRef]
  33. Lee, J.; Lim, H.; Chung, K.S. CLC: Noisy Label Correction via Curriculum Learning. In Proceedings of the 2021 IEEE Symposium Series on Computational Intelligence (SSCI), Orlando, FL, USA, 5–7 December 2021; pp. 1–7. [Google Scholar] [CrossRef]
  34. Priyanto, C.Y.; Hendry; Purnomo, H.D. Combination of Isolation Forest and LSTM Autoencoder for Anomaly Detection. In Proceedings of the 2021 2nd International Conference on Innovative and Creative Information Technology (ICITech), Salatiga, Indonesia, 23–25 September 2021; pp. 35–38. [Google Scholar] [CrossRef]
  35. Liu, F.T.; Ting, K.M.; Zhou, Z.-H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422. [Google Scholar] [CrossRef]
  36. Lamrini, B.; Gjini, A.; Daudin, S.; Pratmarty, P.; Armando, F.; Travé-Massuyès, L. Anomaly Detection Using Similarity-based One-Class SVM for Network Traffic Characterization. In Proceedings of the 29th International Workshop on Principles of Diagnosis, Warsaw, Poland, 27–30 August 2018. [Google Scholar]
  37. Schölkopf, B.; Williamson, R.C.; Smola, A.; Shawe-Taylor, J.; Platt, J. Support vector method for novelty detection. Adv. Neural Inf. Process. Syst 1999, 12, 582–588. [Google Scholar]
  38. Henriques, J.; Caldeira, F.; Cruz, T.; Simões, P. Combining K-Means and XGBoost Models for Anomaly Detection Using Log Datasets. Electronics 2020, 9, 1164. [Google Scholar] [CrossRef]
  39. Chen, T.; Guestrin, C. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar] [CrossRef] [Green Version]
  40. Xu, H.; Pang, G.; Wang, Y.; Wang, Y. Deep Isolation Forest for Anomaly Detection. arXiv 2022, arXiv:2206.06602. [Google Scholar] [CrossRef]
  41. Tian, Y.; Mirzabagheri, M.; Bamakan, S.M.H.; Wang, H.; Qu, Q. Ramp loss one-class support vector machine; A robust and effective approach to anomaly detection problems. Neurocomputing 2018, 310, 223–235. [Google Scholar] [CrossRef]
  42. Rosenberger, J.; Müller, K.; Selig, A.; Bühren, M.; Schramm, D. Extended kernel density estimation for anomaly detection in streaming data. Procedia CIRP 2022, 112, 156–161. [Google Scholar] [CrossRef]
  43. Sato, R.C. Disease management with ARIMA model in time series. Einstein 2013, 11, 128–131. [Google Scholar] [CrossRef] [Green Version]
  44. Zhang, R.; Song, H.; Chen, Q.; Wang, Y.; Wang, S.; Li, Y. Comparison of ARIMA and LSTM for prediction of hemorrhagic fever at different time scales in China. PLoS ONE 2022, 17, e0262009. [Google Scholar] [CrossRef]
  45. Kazmi, S.; Bozanta, A.; Cevik, M. Time series forecasting for patient arrivals in online health services. In Proceedings of the CASCON ‘21: Proceedings of the 31st Annual International Conference on Computer Science and Software Engineering, Toronto, ON, Canada, 22–25 November 2021; pp. 43–52.
  46. Moayedi, H.Z.; Masnadi-Shirazi, M.A. Arima model for network traffic prediction and anomaly detection. In Proceedings of the 2008 International Symposium on Information Technology, Kuala Lumpur, Malaysia, 26–29 August 2008; pp. 1–6. [Google Scholar] [CrossRef]
  47. Pincombe, B. Anomaly detection in time series of graphs using arma processes. Asor Bull. 2005, 24, 2. [Google Scholar]
  48. Dickey, D.A.; Fuller, W.A. Distribution of the Estimators for Autoregressive Time Series with a Unit Root. J. Am. Stat. Assoc. 1979, 74, 427–431. [Google Scholar] [CrossRef]
  49. Akaike, H. Information Theory as an Extension of the Maximum Likelihood Principle. In Second International Symposium on Information Theory; Petrov, B.N., Csaki, F., Eds.; Akademiai Kiado: Budapest, Hungary, 1973; pp. 267–281. [Google Scholar] [CrossRef]
  50. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  51. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536. [Google Scholar] [CrossRef]
  52. Malhotra, P.; Vig, L.; Shroff, G.; Agarwal, P. Long short-term memory networks for anomaly detection in time series. In Proceedings: ESANN; Presses Universitaires de Louvain: Louvain-la-Neuve, Belgium, 2015; Volume 89, pp. 89–94. [Google Scholar]
  53. Nguyen, H.; Tran, K.; Thomassey, S.; Hamad, M. Forecasting and Anomaly Detection approaches using LSTM and LSTM Autoencoder techniques with the applications in supply chain management. Int. J. Inf. Manag. 2021, 57, 102282. [Google Scholar] [CrossRef]
  54. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
  55. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  56. Morgan, N.; Bourlard, H. Generalization and parameter estimation in feedforward nets: Some experiments. Adv. Neural Inf. Process. Syst. 1990, 2, 630–637. [Google Scholar]
  57. Wang, J.; He, H.; Prokhorov, D.V. A Folded Neural Network Autoencoder for Dimensionality Reduction. Procedia Comput. Sci. 2012, 13, 120–127. [Google Scholar] [CrossRef] [Green Version]
  58. Arai, H.; Chayama, Y.; Iyatomi, H.; Oishi, K. Significant Dimension Reduction of 3D Brain MRI using 3D Convolutional Autoencoders. In Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 18–21 July 2018; pp. 5162–5165. [Google Scholar] [CrossRef]
  59. Shinde, K.; Itier, V.; Mennesson, J.; Vasiukov, D.; Shakoor, M. Dimensionality reduction through convolutional autoencoders for fracture patterns prediction. Appl. Math. Model. 2023, 114, 94–113. [Google Scholar] [CrossRef]
  60. Gogoi, M.; Begum, S.A. Image Classification Using Deep Autoencoders. In Proceedings of the 2017 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC), Coimbatore, India, 14–16 December 2017; pp. 1–5. [Google Scholar] [CrossRef]
  61. Briciu, A.; Czibula, G.; Lupea, M. AutoAt: A deep autoencoder-based classification model for supervised authorship attribution. Procedia Comput. Sci. 2021, 192, 397–406. [Google Scholar] [CrossRef]
  62. Toma, R.N.; Piltan, F.; Kim, J.-M. A Deep Autoencoder-Based Convolution Neural Network Framework for Bearing Fault Classification in Induction Motors. Sensors 2021, 21, 8453. [Google Scholar] [CrossRef]
  63. Chen, Z.; Yeo, C.K.; Lee, B.S.; Lau, C.T. Autoencoder-based network anomaly detection. In Proceedings of the 2018 Wireless Telecommunications Symposium (WTS), Phoenix, AZ, USA, 17–20 April 2018; pp. 1–5. [Google Scholar] [CrossRef]
  64. Tziolas, T.; Papageorgiou, K.; Theodosiou, T.; Papageorgiou, E.; Mastos, T.; Papadopoulos, A. Autoencoders for Anomaly Detection in an Industrial Multivariate Time Series Dataset. Eng. Proc. 2022, 18, 23. [Google Scholar] [CrossRef]
  65. Wei, W.; Wu, H.; Ma, H. An AutoEncoder and LSTM-Based Traffic Flow Prediction Method. Sensors 2019, 19, 2946. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  66. Michelucci, U. An Introduction to Autoencoders. arXiv 2022, arXiv:2201.03898. [Google Scholar]
  67. Graves, A.; Schmidhuber, J. Framewise phoneme classification with bidirectional LSTM networks. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks, Montreal, QC, Canada, 31 July–4 August 2005; Volume 4, pp. 2047–2052. [Google Scholar]
  68. Liu, G.; Guo, J. Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 2019, 337, 325–338. [Google Scholar] [CrossRef]
  69. Sundermeyer, M.; Alkhouli, T.; Wuebker, J.; Ney, H. Translation Modeling with Bidirectional Recurrent Neural Networks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, 25–29 October 2014; pp. 14–25. [Google Scholar] [CrossRef]
  70. Aljbali, S.; Roy, K. Anomaly Detection Using Bidirectional LSTM. In Intelligent Systems and Applications; Springer: Cham, Switzerland, 2021; pp. 612–619. [Google Scholar]
  71. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
  72. Higgins, I.; Matthey, L.; Pal, A.; Burgess, C.; Glorot, X.; Botvinick, M.; Mohamed, S.; Lerchner, A. Beta-vae: Learning basic visual concepts with a constrained variational framework. In Proceedings of the 5th International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  73. Shao, H.; Yao, S.; Sun, D.; Zhang, A.; Liu, S.; Liu, D.; Wang, J.; Abdelzaher, T. ControlVAE: Controllable Variational Autoencoder. arXiv 2020, arXiv:2004.05988. [Google Scholar]
  74. Wan, Z.; Zhang, T.; He, H. Variational Autoencoder Based Synthetic Data Generation for Imbalanced Learning. In Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu, HI, USA, 27 November–1 December 2017; pp. 1–7. [Google Scholar] [CrossRef]
  75. Saldanha, J.; Chakraborty, S.; Patil, S.; Kotecha, K.; Kumar, S.; Nayyar, A. Data augmentation using Variational Autoencoders for improvement of respiratory disease classification. PLoS ONE 2022, 17, e0266467. [Google Scholar] [CrossRef]
  76. Nishizaki, H. Data augmentation and feature extraction using variational autoencoder for acoustic modelling. In Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), Kuala Lumpur, Malaysia, 12–15 December 2017; pp. 1222–1227. [Google Scholar] [CrossRef]
  77. Lin, S.; Clark, R.; Birke, R.; Schonborn, S.; Trigoni, N.; Roberts, S. Anomaly Detection for Time Series Using VAE-LSTM Hybrid Model. In Proceedings of the ICASSP 2020—2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; pp. 4322–4326. [Google Scholar] [CrossRef]
  78. Zhou, L.; Deng, W.; Wu, X. Unsupervised anomaly localization using VAE and beta-VAE. arXiv 2020, arXiv:2005.10686. [Google Scholar]
  79. Ulger, F.; Yuksel, S.E.; Yilmaz, A. Anomaly Detection for Solder Joints Using β-VAE. IEEE Trans. Components, Packag. Manuf. Technol. 2021, 11, 2214–2221. [Google Scholar] [CrossRef]
  80. Chen, R.Q.; Shi, G.H.; Zhao, W.L.; Liang, C.H. Sequential VAE-LSTM for Anomaly Detection on Time Series. arXiv 2021, arXiv:1910.03818v5. [Google Scholar]
  81. Xu, H.; Feng, Y.; Chen, J.; Wang, Z.; Qiao, H.; Chen, W.; Zhao, N.; Li, Z.; Bu, J.; Li, Z.; et al. Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 187–196. [Google Scholar] [CrossRef] [Green Version]
  82. Palmer, C.; Harrison, G.; Hiorns, R. Association between smoking and drinking and sleep duration. Ann. Hum. Biol. 1980, 7, 103–107. [Google Scholar] [CrossRef]
  83. Pietilä, J.; Helander, E.; Korhonen, I.; Myllymäki, T.; Kujala, U.M.; Lindholm, H. Acute Effect of Alcohol Intake on Cardiovascular Autonomic Regulation During the First Hours of Sleep in a Large Real-World Sample of Finnish Employees: Observational Study. JMIR Ment. Health 2018, 5, e23. [Google Scholar] [CrossRef]
  84. Colrain, I.M.; Nicholas, C.L.; Baker, F.C. Alcohol and the sleeping brain. In Handbook of Clinical Neurology; Elsevier: Amsterdam, The Netherlands, 2014; Volume 125, pp. 415–431. [Google Scholar] [CrossRef] [Green Version]
  85. Van der Maaten, L.; Hinton, G.E. Visualizing data using t-SNE. Journal of Machine Learning Research. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  86. Li, X.; Kiringa, I.; Yeap, T.; Zhu, X.; Li, Y. Anomaly Detection Based on Unsupervised Disentangled Representation Learning in Combination with Manifold Learning. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–10. [Google Scholar] [CrossRef]
Figure 1. The structure of an autoencoder. An input vector $x \in \mathbb{R}^n$ is first compressed by the encoder into a lower-dimensional representation $z \in \mathbb{R}^d$. The decoder then tries to reconstruct $x$ from $z$, producing an output vector $\tilde{x} \in \mathbb{R}^n$. The autoencoder is trained to minimize the cost function $C$, which measures the discrepancy between $x$ and $\tilde{x}$.
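To make the encoder–decoder mapping of Figure 1 concrete, the following is a minimal sketch in PyTorch. The layer sizes, input dimension, and choice of mean squared error as the cost function $C$ are illustrative assumptions, not the architecture used in the study:

```python
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    """Minimal autoencoder: x in R^n -> z in R^d -> x_tilde in R^n."""
    def __init__(self, n: int = 10, d: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, d))
        self.decoder = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, n))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        z = self.encoder(x)       # compression into the latent code z
        return self.decoder(z)    # reconstruction x_tilde

model = Autoencoder()
criterion = nn.MSELoss()          # the cost function C (here: mean squared error)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(64, 10)           # a dummy batch of 64 input vectors
x_tilde = model(x)
loss = criterion(x_tilde, x)      # discrepancy between x and x_tilde
loss.backward()
optimizer.step()
```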
Figure 2. The structure of the proposed $\beta$-VAE-BiLSTM model. Non-overlapping input vectors $x \in \mathbb{R}^n$ are compressed by the encoder into lower-dimensional representations $z \in \mathbb{R}^d$; because we set $\beta = 3$, these embeddings are more disentangled than those of a standard VAE. The BiLSTM module then takes the current embedding $z_t$ (and possibly previous embeddings $z_{t-1}$, etc.) as input to predict $z_{t+1}$. Finally, the decoder operates on the BiLSTM-forecasted embedding $\tilde{z}_{t+1}$, trying to reconstruct $x_{t+1}$ from it. We then measure the element-wise differences between $\tilde{x}_{t+1}$ and $x_{t+1}$ and label the anomalies accordingly.
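A compact sketch of this pipeline is given below. It combines a Gaussian-latent encoder with a BiLSTM forecaster and the standard $\beta$-VAE objective, in which the KL term is weighted by $\beta$ (here $\beta = 3$, as in the model above). The window length, latent dimension, single-layer encoder/decoder, and thresholding rule are illustrative assumptions and do not reproduce the exact architecture reported in the paper:

```python
import torch
import torch.nn as nn

class BetaVAEBiLSTM(nn.Module):
    """Sketch of the beta-VAE-BiLSTM pipeline (dimensions are illustrative)."""
    def __init__(self, n: int = 10, d: int = 3, hidden: int = 32):
        super().__init__()
        self.enc = nn.Linear(n, 2 * d)           # mean and log-variance of q(z|x)
        self.bilstm = nn.LSTM(d, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, d)     # maps BiLSTM states to z_tilde_{t+1}
        self.dec = nn.Linear(d, n)

    def encode(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        return z, mu, logvar

    def forward(self, windows):                  # windows: (batch, T, n)
        z, mu, logvar = self.encode(windows)
        h, _ = self.bilstm(z)                    # process the sequence of embeddings
        z_next = self.head(h[:, -1])             # forecast of the next embedding
        return self.dec(z_next), mu, logvar      # reconstructed x_tilde_{t+1}

def beta_vae_loss(x_pred, x_true, mu, logvar, beta=3.0):
    recon = ((x_pred - x_true) ** 2).sum(dim=-1).mean()
    kl = (-0.5 * (1 + logvar - mu**2 - logvar.exp()).sum(dim=-1)).mean()
    return recon + beta * kl   # beta > 1 encourages a more disentangled latent space

model = BetaVAEBiLSTM()
windows = torch.randn(8, 5, 10)                  # 8 sequences of 5 windows of length 10
x_true = torch.randn(8, 10)                      # the actual next windows
x_pred, mu, logvar = model(windows)
loss = beta_vae_loss(x_pred, x_true, mu, logvar)

# Element-wise differences drive the anomaly labels; the 2-sigma threshold
# below is an illustrative assumption, not the calibration used in the study.
errors = (x_pred - x_true).abs()
anomalies = errors > 2.0 * errors.std()
```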
Figure 3. t-SNE maps for Participant 5: projections of the latent space of the $\beta$-VAE model with (a) $\beta = 0.1$; (b) $\beta = 0.5$; (c) $\beta = 1$; (d) $\beta = 3$; and (e) $\beta = 10$. The numbering near the points denotes the temporal order of the embeddings, each point being the embedding of a segment of length 10 from the 1 min frequency HR time series. Blue points denote normal test data (absence of anomalies in the original sequence), while red points denote abnormal test data (presence of anomalies in the original sequence).
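Projections of this kind can be produced with scikit-learn's t-SNE implementation. The sketch below uses randomly generated stand-ins for the latent embeddings and anomaly labels, and the perplexity value is an illustrative assumption:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Stand-in for the test-set embeddings: one d-dimensional beta-VAE code
# per length-10 HR segment (random here, purely for illustration).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(60, 3))
is_anomalous = rng.random(60) < 0.2          # stand-in labels for the point colors

coords = TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(embeddings)

plt.scatter(coords[~is_anomalous, 0], coords[~is_anomalous, 1], c="blue", label="normal")
plt.scatter(coords[is_anomalous, 0], coords[is_anomalous, 1], c="red", label="abnormal")
for i, (u, v) in enumerate(coords):
    plt.annotate(str(i), (u, v), fontsize=7)  # temporal order of the embeddings
plt.legend()
plt.show()
```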
Table 1. Characteristics of the selected participants.

| Participant | Age (Decade) | Sex | Past Diseases | Present Diseases | Smoking/Drinking Habits | Exercise Habits |
|---|---|---|---|---|---|---|
| Participant 1 | 40s | Male | No diseases | No diseases | Past smoker; consumes alcohol 2–4 times per month | Exercises 3 or more days per week |
| Participant 2 | 50s | Male | 2 diseases | 1 disease | Smoker; consumes alcohol 2–4 times per month | Exercises 1–2 days per week |
| Participant 3 | 30s | Male | 3 diseases | 3 diseases | Smoker; consumes alcohol 4 or more times per week | No exercise |
| Participant 4 | 30s | Female | No diseases | No diseases | Non-smoker; consumes alcohol 2–3 times per week | Exercises 1–2 days per week |
| Participant 5 | 40s | Female | 1 disease | No diseases | Non-smoker; consumes alcohol 2–4 times per month | No exercise |
| Participant 6 | 50s | Female | 1 disease | 1 disease | Non-smoker; consumes alcohol 4 or more times per week | Exercises 3 or more days per week |
| Participant 7 | 50s | Male | 3 diseases | 1 disease | Non-smoker; consumes alcohol 2–3 times per week | Exercises 1–2 days per week |
| Participant 8 | 50s | Female | 2 diseases | No diseases | Non-smoker; consumes alcohol 1 time or less per month | No exercise |
Table 2. Number of anomalies in the test set.

| Participant | Number of Anomalies (30 s) | Number of Anomalies (1 min) |
|---|---|---|
| Participant 1 | 33 | 17 |
| Participant 2 | 10 | 5 |
| Participant 3 | 6 | 5 |
| Participant 4 | 20 | 4 |
| Participant 5 | 21 | 18 |
| Participant 6 | 12 | 4 |
| Participant 7 | 12 | 5 |
| Participant 8 | 8 | 3 |
Table 3. Precision, recall, and F1-score of the tested models in the 30 s frequency scenario. Values indicate the mean (standard deviation) of 20 runs for each model. Text in bold denotes the best results (95% confidence level) per participant.

| Participant | Metrics | ARIMA | LSTM | CAE-LSTM | BiLSTM | β-VAE-BiLSTM |
|---|---|---|---|---|---|---|
| Participant 1 | Precision | 0.372 | 0.689 (0.073) | 0.671 (0.048) | 0.908 (0.032) | 0.949 (0.065) |
| | Recall | 0.970 | 0.628 (0.013) | 0.688 (0.027) | 0.656 (0.045) | 0.715 (0.015) |
| | F1-score | 0.538 | 0.655 (0.034) | 0.678 (0.023) | 0.761 (0.031) | 0.815 (0.029) |
| Participant 2 | Precision | 0.162 | 0.800 (0.163) | 0.812 (0.108) | 0.732 (0.080) | 0.950 (0.068) |
| | Recall | 0.600 | 0.585 (0.036) | 0.575 (0.043) | 0.595 (0.022) | 0.611 (0.016) |
| | F1-score | 0.255 | 0.669 (0.063) | 0.668 (0.034) | 0.653 (0.045) | 0.742 (0.022) |
| Participant 3 | Precision | 0.111 | 0.441 (0.199) | 0.408 (0.083) | 0.517 (0.318) | 0.625 (0.252) |
| | Recall | 0.167 | 0.208 (0.072) | 0.275 (0.079) | 0.300 (0.145) | 0.400 (0.111) |
| | F1-score | 0.133 | 0.265 (0.052) | 0.315 (0.059) | 0.319 (0.105) | 0.460 (0.115) |
| Participant 4 | Precision | 0.351 | 0.708 (0.239) | 0.875 (0.153) | 0.933 (0.220) | 0.966 (0.073) |
| | Recall | 0.650 | 0.175 (0.097) | 0.200 (0.100) | 0.160 (0.110) | 0.590 (0.080) |
| | F1-score | 0.456 | 0.258 (0.156) | 0.312 (0.114) | 0.238 (0.181) | 0.727 (0.065) |
| Participant 5 | Precision | 0.348 | 0.676 (0.120) | 0.860 (0.043) | 0.778 (0.041) | 0.915 (0.024) |
| | Recall | 0.762 | 0.829 (0.038) | 0.769 (0.077) | 0.765 (0.077) | 0.817 (0.017) |
| | F1-score | 0.478 | 0.738 (0.078) | 0.809 (0.042) | 0.768 (0.044) | 0.863 (0.014) |
| Participant 6 | Precision | 0.174 | 0.712 (0.065) | 0.760 (0.196) | 0.745 (0.160) | 0.888 (0.124) |
| | Recall | 0.333 | 0.250 (0.000) | 0.250 (0.000) | 0.250 (0.000) | 0.250 (0.000) |
| | F1-score | 0.229 | 0.369 (0.010) | 0.372 (0.023) | 0.371 (0.019) | 0.389 (0.012) |
| Participant 7 | Precision | 0.188 | 0.438 (0.006) | 0.491 (0.032) | 0.403 (0.047) | 0.747 (0.212) |
| | Recall | 0.500 | 0.658 (0.069) | 0.754 (0.104) | 0.667 (0.053) | 0.767 (0.081) |
| | F1-score | 0.273 | 0.525 (0.023) | 0.591 (0.043) | 0.500 (0.036) | 0.736 (0.105) |
| Participant 8 | Precision | 0.219 | 0.694 (0.350) | 0.843 (0.313) | 1.000 (0.000) | 1.000 (0.000) |
| | Recall | 0.875 | 0.550 (0.061) | 0.462 (0.057) | 0.462 (0.057) | 0.581 (0.006) |
| | F1-score | 0.350 | 0.561 (0.180) | 0.562 (0.142) | 0.630 (0.056) | 0.733 (0.049) |
Table 4. Precision, recall, and F1-score of the tested models in the 1 min frequency scenario. Values indicate the mean (standard deviation) of 20 runs for each model. Text in bold denotes the best results (95% confidence level) per participant.

| Participant | Metrics | ARIMA | LSTM | CAE-LSTM | BiLSTM | β-VAE-BiLSTM |
|---|---|---|---|---|---|---|
| Participant 1 | Precision | 0.448 | 0.722 (0.065) | 0.610 (0.043) | 0.792 (0.166) | 0.919 (0.069) |
| | Recall | 0.765 | 0.753 (0.024) | 0.741 (0.073) | 0.671 (0.073) | 0.806 (0.035) |
| | F1-score | 0.565 | 0.735 (0.035) | 0.666 (0.038) | 0.714 (0.081) | 0.857 (0.038) |
| Participant 2 | Precision | 0.231 | 0.660 (0.073) | 0.705 (0.069) | 0.738 (0.054) | 0.938 (0.108) |
| | Recall | 0.600 | 0.600 (0.000) | 0.600 (0.000) | 0.600 (0.000) | 0.600 (0.000) |
| | F1-score | 0.333 | 0.627 (0.033) | 0.647 (0.031) | 0.661 (0.026) | 0.729 (0.036) |
| Participant 3 | Precision | 0.333 | 0.466 (0.371) | 0.268 (0.133) | 0.548 (0.303) | 0.731 (0.203) |
| | Recall | 0.200 | 0.230 (0.145) | 0.400 (0.000) | 0.560 (0.332) | 0.610 (0.325) |
| | F1-score | 0.250 | 0.224 (0.186) | 0.300 (0.106) | 0.466 (0.174) | 0.611 (0.219) |
| Participant 4 | Precision | 0.143 | 0.600 (0.184) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) |
| | Recall | 0.750 | 0.675 (0.225) | 0.540 (0.049) | 0.538 (0.089) | 0.682 (0.075) |
| | F1-score | 0.240 | 0.570 (0.249) | 0.700 (0.041) | 0.695 (0.068) | 0.809 (0.053) |
| Participant 5 | Precision | 0.478 | 0.800 (0.200) | 0.813 (0.184) | 0.922 (0.073) | 0.933 (0.037) |
| | Recall | 0.611 | 0.903 (0.133) | 0.853 (0.174) | 0.825 (0.194) | 0.872 (0.090) |
| | F1-score | 0.537 | 0.829 (0.124) | 0.805 (0.116) | 0.857 (0.116) | 0.898 (0.045) |
| Participant 6 | Precision | 0.667 | 0.340 (0.143) | 0.850 (0.166) | 0.940 (0.092) | 0.990 (0.044) |
| | Recall | 0.500 | 0.660 (0.358) | 0.500 (0.000) | 0.700 (0.245) | 0.775 (0.249) |
| | F1-score | 0.571 | 0.406 (0.216) | 0.624 (0.047) | 0.776 (0.154) | 0.847 (0.169) |
| Participant 7 | Precision | 0.278 | 0.748 (0.042) | 0.825 (0.238) | 0.736 (0.037) | 0.980 (0.060) |
| | Recall | 1.000 | 0.890 (0.099) | 0.970 (0.071) | 0.930 (0.095) | 0.960 (0.080) |
| | F1-score | 0.435 | 0.811 (0.054) | 0.870 (0.158) | 0.818 (0.038) | 0.968 (0.059) |
| Participant 8 | Precision | 0.200 | 0.404 (0.344) | 1.000 (0.000) | 1.000 (0.000) | 1.000 (0.000) |
| | Recall | 1.000 | 0.667 (0.000) | 0.500 (0.167) | 0.467 (0.164) | 0.667 (0.000) |
| | F1-score | 0.333 | 0.436 (0.211) | 0.650 (0.150) | 0.620 (0.147) | 0.800 (0.000) |
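The scores in Tables 3 and 4 are means and standard deviations of precision, recall, and F1-score over 20 runs per model. For readers reproducing such a comparison, the following is a minimal sketch of the aggregation using scikit-learn; the labels below are random stand-ins, not data from the study:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(0)
precisions, recalls, f1s = [], [], []

for run in range(20):                        # one entry per training run
    y_true = rng.integers(0, 2, size=200)    # dummy ground-truth anomaly labels
    y_pred = rng.integers(0, 2, size=200)    # dummy detector output for this run
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    precisions.append(p); recalls.append(r); f1s.append(f1)

for name, vals in [("Precision", precisions), ("Recall", recalls), ("F1-score", f1s)]:
    print(f"{name}: {np.mean(vals):.3f} ({np.std(vals):.3f})")
```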