Data Mining for Temporal Data Analysis

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Engineering Mathematics".

Deadline for manuscript submissions: closed (1 December 2022) | Viewed by 21453

Special Issue Editor


Guest Editor
Université Rennes, Inria, CNRS, IRISA, Rennes, France
Interests: explainable AI (interpretability of data mining/machine learning models and explainable decisions); time series analysis (anomaly detection, classification, forecasting); outdoor scene analysis with deep learning (problems related to the fusion of multimodal data, domain selection, domain adaptation, class imbalance, ...)

Special Issue Information

Dear Colleagues,

Temporal data in general, and time series in particular, are ubiquitous in today's world. They are recorded by various sensors in many application domains, ranging from bioinformatics, computer vision and natural language processing to medicine, finance and engineering (for example, as a means to build smart cities). Unlike static data, temporal data are complex: they are generally noisy and high-dimensional, they may be non-stationary, and they may be affected by several domain-dependent factors that call for invariance, such as time delay, translation, scale or trend effects. These peculiarities pose a challenge to standard statistical models and machine learning approaches, which mainly assume i.i.d. data, homoscedasticity, normality of residuals, etc.

To tackle such challenging data, we invite our colleagues to submit papers that propose new advanced approaches at the intersection of statistics, time series analysis, signal processing and machine learning.

Prof. Dr. Élisa Fromont
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Classification and regression of univariate and multivariate time series
  • Early classification of temporal data
  • Deep learning and learning representations for temporal data
  • Modeling temporal dependencies
  • Time series forecasting
  • Time series annotation, segmentation and anomaly detection
  • Temporal data clustering
  • Spatial-temporal statistical analysis
  • Explainable temporal data analysis
  • Data mining methods for data streams

Published Papers (6 papers)


Research

31 pages, 1435 KiB  
Article
Error Correction Based Deep Neural Networks for Modeling and Predicting South African Wildlife–Vehicle Collision Data
by Irene Nandutu, Marcellin Atemkeng, Nokubonga Mgqatsa, Sakayo Toadoum Sari, Patrice Okouma, Rockefeller Rockefeller, Theophilus Ansah-Narh, Jean Louis Ebongue Kedieng Fendji and Franklin Tchakounte
Mathematics 2022, 10(21), 3988; https://doi.org/10.3390/math10213988 - 27 Oct 2022
Viewed by 1926
Abstract
The seasonal autoregressive integrated moving average with exogenous factors (SARIMAX) has shown promising results in modeling small and sparse observed time-series data by capturing linear features using independent and dependent variables. Long short-term memory (LSTM) is a promising neural network for learning nonlinear dependence features from data. With the increase in wildlife roadkill patterns, the SARIMAX-only and LSTM-only models would likely fail to learn the precise endogenous and/or exogenous variables driven by this wildlife roadkill data. In this paper, we design and implement an error correction mathematical framework based on LSTM-only. The framework extracts features from the residual error generated by a SARIMAX-only model. The learned residual features correct the output time-series prediction of the SARIMAX-only model. The process combines SARIMAX-only predictions and LSTM-only residual predictions to obtain a hybrid SARIMAX-LSTM. The models are evaluated using South African wildlife–vehicle collision datasets, and the experiments show that compared to single models, SARIMAX-LSTM increases the accuracy of a taxon whose linear components outweigh the nonlinear ones. In addition, the hybrid model fails to outperform LSTM-only when a taxon contains more nonlinear components rather than linear components. Our assumption of the results is that the collected exogenous and endogenous data are insufficient, which limits the hybrid model’s performance since it cannot accurately detect seasonality on residuals from SARIMAX-only and minimize the SARIMAX-LSTM error. We conclude that the error correction framework should be preferred over single models in wildlife time-series modeling and predictions when a dataset contains more linear components. Adding more related data may improve the prediction performance of SARIMAX-LSTM.
(This article belongs to the Special Issue Data Mining for Temporal Data Analysis)
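The additive error-correction scheme summarized in the abstract above (SARIMAX predictions plus LSTM residual predictions) can be written roughly as follows. This is a sketch assuming the usual linear-plus-nonlinear residual formulation of such hybrids; the symbols, the lag window p and the function f standing for the trained LSTM are notation introduced here, not taken from the paper.

```latex
\begin{aligned}
\hat{L}_t &= \text{SARIMAX prediction of } y_t \text{ (past values and exogenous inputs)},\\
e_t &= y_t - \hat{L}_t \quad \text{(residual left by the SARIMAX-only model)},\\
\hat{N}_t &= f_{\mathrm{LSTM}}\left(e_{t-1}, \dots, e_{t-p}\right),\\
\hat{y}_t &= \hat{L}_t + \hat{N}_t .
\end{aligned}
```

When the residuals e_t carry little learnable structure, the correction term contributes little and the hybrid behaves much like SARIMAX alone, which is consistent with the abstract's remark that the hybrid helps most when linear components dominate.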

17 pages, 1114 KiB  
Article
Incremental Decision Rules Algorithm: A Probabilistic and Dynamic Approach to Decisional Data Stream Problems
by Nuria Mollá, Alejandro Rabasa, Jesús J. Rodríguez-Sala, Joaquín Sánchez-Soriano and Antonio Ferrándiz
Mathematics 2022, 10(1), 16; https://doi.org/10.3390/math10010016 - 21 Dec 2021
Cited by 1 | Viewed by 2415
Abstract
Data science is currently one of the most promising fields used to support the decision-making process. Particularly, data streams can give these supportive systems an updated base of knowledge that allows experts to make decisions with updated models. Incremental Decision Rules Algorithm (IDRA) proposes a new incremental decision-rule method based on the classical ID3 approach to generating and updating a rule set. This algorithm is a novel approach designed to fit a Decision Support System (DSS) whose motivation is to give accurate responses in an affordable time for a decision situation. This work includes several experiments that compare IDRA with the classical static but optimized ID3 (CREA) and the adaptive method VFDR. A battery of scenarios with different error types and rates are proposed to compare these three algorithms. IDRA improves the accuracies of VFDR and CREA in most common cases for the simulated data streams used in this work. In particular, the proposed technique has proven to perform better in those scenarios with no error, low noise, or high-impact concept drifts.
(This article belongs to the Special Issue Data Mining for Temporal Data Analysis)
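IDRA is described above as building on the classical ID3 approach. As a point of reference, the following minimal sketch shows only ID3's standard split criterion (entropy-based information gain over categorical attributes); it does not reproduce IDRA's incremental or probabilistic rule updates, which the abstract does not detail. The function names and the row format (a dict per example) are illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attribute):
    """ID3 split criterion: entropy reduction from partitioning on `attribute`.

    rows: list of dicts mapping attribute names to categorical values (assumed format).
    labels: class label of each row.
    """
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    # Weighted average entropy of the subsets produced by the split
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attributes):
    """Attribute with the highest information gain, as ID3 picks at each node."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))
```

In an incremental setting, these counts would have to be updated as new stream examples arrive rather than recomputed from scratch; how IDRA does this is not stated in the abstract.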

19 pages, 901 KiB  
Article
XCM: An Explainable Convolutional Neural Network for Multivariate Time Series Classification
by Kevin Fauvel, Tao Lin, Véronique Masson, Élisa Fromont and Alexandre Termier
Mathematics 2021, 9(23), 3137; https://doi.org/10.3390/math9233137 - 05 Dec 2021
Cited by 44 | Viewed by 7162
Abstract
Multivariate Time Series (MTS) classification has gained importance over the past decade with the increase in the number of temporal datasets in multiple domains. The current state-of-the-art MTS classifier is a heavyweight deep learning approach, which outperforms the second-best MTS classifier only on large datasets. Moreover, this deep learning approach cannot provide faithful explanations as it relies on post hoc model-agnostic explainability methods, which could prevent its use in numerous applications. In this paper, we present XCM, an eXplainable Convolutional neural network for MTS classification. XCM is a new compact convolutional neural network which extracts information relative to the observed variables and time directly from the input data. Thus, XCM architecture enables a good generalization ability on both large and small datasets, while allowing the full exploitation of a faithful post hoc model-specific explainability method (Gradient-weighted Class Activation Mapping) by precisely identifying the observed variables and timestamps of the input data that are important for predictions. We first show that XCM outperforms the state-of-the-art MTS classifiers on both the large and small public UEA datasets. Then, we illustrate how XCM reconciles performance and explainability on a synthetic dataset and show that XCM enables a more precise identification of the regions of the input data that are important for predictions compared to the current deep learning MTS classifier also providing faithful explainability. Finally, we present how XCM can outperform the current most accurate state-of-the-art algorithm on a real-world application while enhancing explainability by providing faithful and more informative explanations.
(This article belongs to the Special Issue Data Mining for Temporal Data Analysis)
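XCM relies on Gradient-weighted Class Activation Mapping (Grad-CAM) to highlight the timestamps that matter for a prediction. The sketch below shows the core Grad-CAM computation for a time-only (1D) set of feature maps: each map is weighted by the average gradient of the class score with respect to it, the weighted maps are summed, and a ReLU keeps only positive evidence. It assumes the activations and gradients have already been obtained (for example from an autodiff framework) and omits the upsampling to input length; it is not XCM's own code.

```python
def grad_cam_1d(activations, gradients):
    """Grad-CAM importance scores over time for one target class.

    activations: K feature maps, each a list of T values A^k_t.
    gradients: same shape, holding d y^c / d A^k_t for the class score y^c.
    Returns a list of T non-negative importance scores.
    """
    # alpha_k: gradient of the class score averaged over time for map k
    alphas = [sum(g) / len(g) for g in gradients]
    n_steps = len(activations[0])
    cam = []
    for t in range(n_steps):
        # Weighted sum of feature maps at time t, followed by a ReLU
        s = sum(alpha * fmap[t] for alpha, fmap in zip(alphas, activations))
        cam.append(max(0.0, s))
    return cam
```

For example, two maps `[[1.0, 2.0, 0.0], [0.0, 1.0, 3.0]]` with gradients `[[1.0] * 3, [-1.0] * 3]` give weights 1 and -1, so the last timestamp, where the negatively weighted map dominates, receives zero importance.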

26 pages, 874 KiB  
Article
F4: An All-Purpose Tool for Multivariate Time Series Classification
by Ángel López-Oriona and José A. Vilar
Mathematics 2021, 9(23), 3051; https://doi.org/10.3390/math9233051 - 27 Nov 2021
Cited by 4 | Viewed by 1948
Abstract
We propose Fast Forest of Flexible Features (F4), a novel approach for classifying multivariate time series, which is aimed to discriminate between underlying generating processes. This goal has barely been addressed in the literature. F4 consists of two steps. First, a set of features based on the quantile cross-spectral density and the maximum overlap discrete wavelet transform are extracted from each series. Second, a random forest is fed with the extracted features. An extensive simulation study shows that F4 outperforms some powerful classifiers in a wide variety of situations, including stationary and nonstationary series. The proposed method is also capable of successfully discriminating between electrocardiogram (ECG) signals of healthy subjects and those with myocardial infarction condition. Additionally, despite lacking shape-based information, F4 attains state-of-the-art results in some datasets of the University of East Anglia (UEA) multivariate time series classification archive.
(This article belongs to the Special Issue Data Mining for Temporal Data Analysis)
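One of F4's two feature families comes from the maximum overlap discrete wavelet transform (MODWT). To illustrate what a MODWT-based feature can look like, the sketch below computes a level-1 Haar MODWT with circular boundary handling and summarizes it by a simple wavelet-variance estimate. The choice of the Haar filter, the single decomposition level and the variance summary are assumptions made for illustration, not F4's documented feature set; the quantile cross-spectral features and the random forest step are not shown.

```python
def haar_modwt_level1(x):
    """Level-1 Haar MODWT with circular (periodic) boundary handling.

    Uses the MODWT Haar filters h = (1/2, -1/2) and g = (1/2, 1/2).
    Returns (wavelet coefficients W, scaling coefficients V).
    """
    n = len(x)
    # For t = 0, x[t - 1] is x[-1], which gives the circular filtering
    w = [(x[t] - x[t - 1]) / 2 for t in range(n)]
    v = [(x[t] + x[t - 1]) / 2 for t in range(n)]
    return w, v


def wavelet_variance_level1(x):
    """Mean of squared level-1 wavelet coefficients (biased: keeps boundary terms)."""
    w, _ = haar_modwt_level1(x)
    return sum(c * c for c in w) / len(w)
```

A useful sanity check is that the MODWT preserves energy: the squared coefficients of W and V sum to the squared values of the input series. Such per-series features would then form the input vector passed to the random forest.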

17 pages, 4413 KiB  
Article
Time Series Clustering with Topological and Geometric Mixed Distance
by Yunsheng Zhang, Qingzhang Shi, Jiawei Zhu, Jian Peng and Haifeng Li
Mathematics 2021, 9(9), 1046; https://doi.org/10.3390/math9091046 - 06 May 2021
Cited by 3 | Viewed by 3174
Abstract
Time series clustering is an essential ingredient of unsupervised learning techniques. It provides an understanding of the intrinsic properties of data upon exploiting similarity measures. Traditional similarity-based methods usually consider local geometric properties of raw time series or the global topological properties of time series in the phase space. In order to overcome their limitations, we put forward a time series clustering framework, referred to as time series clustering with Topological-Geometric Mixed Distance (TGMD), which jointly considers local geometric features and global topological characteristics of time series data. More specifically, persistent homology is employed to extract topological features of time series and to compute topological similarities among persistence diagrams. The geometric properties of raw time series are captured by using shape-based similarity measures such as Euclidean distance and dynamic time warping. The effectiveness of the proposed TGMD method is assessed by extensive experiments on synthetic noisy biological and real time series data. The results reveal that the proposed mixed distance-based similarity measure can lead to promising results and that it performs better than standard time series analysis techniques that consider only topological or geometrical similarity.
(This article belongs to the Special Issue Data Mining for Temporal Data Analysis)
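The geometric side of TGMD uses shape-based measures such as dynamic time warping (DTW). The sketch below implements classic DTW with an absolute-difference local cost, and then combines it with a precomputed topological distance through a hypothetical convex weighting `lam`. The abstract does not state how TGMD mixes the two distances, so `mixed_distance` is an illustrative assumption; the topological distance between persistence diagrams is taken as an input rather than computed here.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j]: minimal cumulative cost of aligning a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def mixed_distance(topo_dist, geo_dist, lam=0.5):
    """Hypothetical convex combination of a topological and a geometric distance."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return lam * topo_dist + (1.0 - lam) * geo_dist
```

For instance, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0 because the repeated value can be absorbed by warping, whereas Euclidean distance is not even defined for series of unequal length. In practice the two distances live on different scales, so some normalization would likely be needed before mixing them.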

17 pages, 2819 KiB  
Article
DBTMPE: Deep Bidirectional Transformers-Based Masked Predictive Encoder Approach for Music Genre Classification
by Lvyang Qiu, Shuyu Li and Yunsick Sung
Mathematics 2021, 9(5), 530; https://doi.org/10.3390/math9050530 - 03 Mar 2021
Cited by 18 | Viewed by 2871
Abstract
Music is a type of time-series data. As the size of the data increases, it is a challenge to build robust music genre classification systems from massive amounts of music data. Robust systems require large amounts of labeled music data, which necessitates time- and labor-intensive data-labeling efforts and expert knowledge. This paper proposes a musical instrument digital interface (MIDI) preprocessing method, Pitch to Vector (Pitch2vec), and a deep bidirectional transformers-based masked predictive encoder (MPE) method for music genre classification. The MIDI files are considered as input. MIDI files are converted to the vector sequence by Pitch2vec before being input into the MPE. By unsupervised learning, the MPE based on deep bidirectional transformers is designed to extract bidirectional representations automatically, which are musicological insight. In contrast to other deep-learning models, such as recurrent neural network (RNN)-based models, the MPE method enables parallelization over time-steps, leading to faster training. To evaluate the performance of the proposed method, experiments were conducted on the Lakh MIDI music dataset. During MPE training, approximately 400,000 MIDI segments were utilized for the MPE, for which the recovery accuracy rate reached 97%. In the music genre classification task, the accuracy rate and other indicators of the proposed method were more than 94%. The experimental results indicate that the proposed method improves classification performance compared with state-of-the-art models.
(This article belongs to the Special Issue Data Mining for Temporal Data Analysis)
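The masked predictive encoder is pre-trained without labels by recovering hidden parts of the input vector sequence (the abstract reports the recovery accuracy). The sketch below shows only a generic BERT-style masking step that such pre-training relies on: a random subset of time steps is replaced by a mask placeholder and their positions are returned so that the reconstruction loss can be computed on them. The 15% default ratio is borrowed from BERT, and `MASK_TOKEN` is a hypothetical placeholder for a learned mask vector; neither is documented for DBTMPE, and Pitch2vec itself is not reproduced.

```python
import random

# Hypothetical placeholder for a learned [MASK] vector
MASK_TOKEN = "<mask>"


def mask_sequence(vectors, mask_ratio=0.15, seed=None):
    """Hide a random subset of time steps for masked-reconstruction pre-training.

    Returns (masked copy of the sequence, sorted list of masked positions).
    The input sequence itself is left unchanged.
    """
    if not vectors:
        return [], []
    rng = random.Random(seed)
    n_mask = max(1, round(mask_ratio * len(vectors)))
    positions = sorted(rng.sample(range(len(vectors)), n_mask))
    masked = list(vectors)
    for p in positions:
        masked[p] = MASK_TOKEN
    return masked, positions
```

Because every position of the masked sequence is visible to a bidirectional transformer at once, the encoder can use context on both sides of each gap, which is what allows the parallelization over time-steps mentioned in the abstract, in contrast to step-by-step RNN processing.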
