Recent Advances in Data Mining and Their Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (31 October 2021) | Viewed by 42612

Special Issue Editors

Prof. Dr. Oliviu Matei

Guest Editor

Electric, Electronic and Computer Engineering Department, Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania
Interests: evolutionary computation; data mining

Prof. Dr. Anca Andreica

Guest Editor

Department of Computer Science, Babeş-Bolyai University, 400000 Cluj-Napoca, Romania
Interests: artificial intelligence

Special Issue Information

Dear Colleagues,

Since its inception, data mining has developed across various specific fields dealing with raw, media, geospatial, web, and text data. Its potential has increased through hybridization with other subsymbolic algorithms, such as genetic algorithms. Data mining adds value in many practical fields, such as health, biology, and emergency situations, but its true power is expressed in the context of the Internet of Things.

The purpose of this Special Issue is to gather a collection of articles reflecting the latest developments in data mining and related fields, in terms of both practical and theoretical applications: optimization methods, parallel and distributed data mining algorithms, learning algorithms, knowledge discovery and extraction, image analysis, classification and clustering, heuristics and metaheuristics, soft computing, operations research, business analytics, and many others.

Contributions are welcome on both theoretical and practical models. The selection criteria will be based on the formal and technical soundness, experimental support, and the relevance of the contribution.

Prof. Dr. Oliviu Matei
Prof. Dr. Anca Andreica
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

Optimization methods
Probabilistic and statistical methods
Parallel and distributed data mining algorithms
Learning algorithms
Feature extraction, selection, and dimensionality reduction
Knowledge discovery and extraction
Image analysis
Classification and clustering
Mining data streams, graphs, and complex data
Mining semistructured data
Spatial data mining
Mining text, web, and social media
Multimedia data mining
Personalization and recommendation systems
Data mining visualization
Heuristics and metaheuristics
Soft computing
Computational complexity
Process modeling
Operations research
Pattern recognition
Uncertainty management

Published Papers (11 papers)

Research

14 pages, 4077 KiB

Open Access Article

Applications of Discrete Wavelet Transform for Feature Extraction to Increase the Accuracy of Monitoring Systems of Liquid Petroleum Products

by Mohammed Balubaid, Mohammad Amir Sattari, Osman Taylan, Ahmed A. Bakhsh and Ehsan Nazemi

Mathematics 2021, 9(24), 3215; https://0-doi-org.brum.beds.ac.uk/10.3390/math9243215 - 13 Dec 2021

Cited by 28 | Viewed by 2466

Abstract

This paper presents a methodology to monitor the liquid petroleum products which pass through transmission pipes. A simulation setup consisting of an X-ray tube, a detector, and a pipe was established using a Monte Carlo n-particle X-version transport code to investigate a two-by-two mixture of four different petroleum products, namely, ethylene glycol, crude oil, gasoline, and gasoil, in different volumetric ratios. After collecting the signals of each simulation, discrete wavelet transform (DWT) was applied as the feature extraction system. Then, a statistical feature, the standard deviation, was calculated from the fifth-level approximation and the second- to fifth-level details to provide appropriate inputs for neural network training. Three multilayer perceptron neural networks were utilized to predict the volume ratio of three types of petroleum products, and the volume ratio of the fourth product could easily be obtained from the results of the three presented networks. Finally, a root mean square error of less than 1.77 was obtained in predicting the volume ratio, which was much more accurate than in previous research. This high accuracy was due to the use of DWT for feature extraction.
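The feature extraction step described in this abstract can be illustrated with a minimal sketch. The abstract does not name the wavelet family, so the Haar wavelet is assumed here purely for simplicity, and the function names `haar_step` and `dwt_std_features` are hypothetical. The sketch decomposes a signal to five levels and returns the standard deviation of the fifth-level approximation followed by the standard deviations of the second- to fifth-level details.

```python
import math
import statistics


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    approx, detail = [], []
    for a, b in zip(signal[0::2], signal[1::2]):
        approx.append((a + b) / math.sqrt(2))
        detail.append((a - b) / math.sqrt(2))
    return approx, detail


def dwt_std_features(signal, levels=5):
    """Standard deviations of the last approximation and of details 2..levels.

    Assumption: Haar wavelet and a signal length divisible by 2**levels.
    """
    if len(signal) == 0 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a positive multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    # First feature: spread of the coarsest approximation coefficients.
    features = [statistics.pstdev(approx)]
    # Remaining features: spread of the detail coefficients, skipping level 1.
    features += [statistics.pstdev(d) for d in details[1:]]
    return features


# Hypothetical detector signal with 64 samples.
example_signal = [math.sin(0.1 * i) for i in range(64)]
example_features = dwt_std_features(example_signal)
```

With five levels this yields five values per signal, which could then serve as inputs to a regression model such as a multilayer perceptron.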

(This article belongs to the Special Issue Recent Advances in Data Mining and Their Applications)

31 pages, 3705 KiB

Open Access Article

AI versus Classic Methods in Modelling Isotopic Separation Processes: Efficiency Comparison

by Vlad Mureșan, Mihaela-Ligia Ungureșan, Mihail Abrudean, Honoriu Vălean, Iulia Clitan, Roxana Motorga, Emilian Ceuca and Marius Fișcă

Mathematics 2021, 9(23), 3088; https://0-doi-org.brum.beds.ac.uk/10.3390/math9233088 - 30 Nov 2021

Cited by 1 | Viewed by 1529

Abstract

This paper compares the efficiency of artificial intelligence methods with that of classical methods in modelling industrial processes, taking the separation process of the ¹⁸O isotope as a case study. Firstly, the behavior of the considered isotopic separation process is learned using neural networks. The comparison between the efficiency of these methods is highlighted by the simulations of the process model, using the mentioned modelling techniques. In this context, the final part of the paper presents the proposed model being simulated in different scenarios that can occur in practice, thus resulting in some interesting interpretations and conclusions. The paper proves the feasibility of using artificial intelligence methods for modelling industrial processes; the obtained models are intended for use in designing automatic control systems.

(This article belongs to the Special Issue Recent Advances in Data Mining and Their Applications)

16 pages, 2629 KiB

Open Access Article

Network Analysis Based on Important Node Selection and Community Detection

by Attila Mester, Andrei Pop, Bogdan-Eduard-Mădălin Mursa, Horea Greblă, Laura Dioşan and Camelia Chira

Mathematics 2021, 9(18), 2294; https://0-doi-org.brum.beds.ac.uk/10.3390/math9182294 - 17 Sep 2021

Cited by 16 | Viewed by 3122

Abstract

The stability and robustness of a complex network can be significantly improved by determining important nodes and by analyzing their tendency to group into clusters. Several centrality measures for evaluating the importance of a node in a complex network exist in the literature, each one focusing on a different perspective. Community detection algorithms can be used to determine clusters of nodes based on the network structure. This paper shows by empirical means that node importance can be evaluated from a dual perspective: by combining the traditional centrality measures regarding the whole network as one unit, and by analyzing the node clusters yielded by community detection. These approaches offer not only overlapping results but also complementary information regarding the top important nodes. To confirm this mechanism, we performed experiments on synthetic and real-world networks, and the results indicate an interesting relation between important nodes at the community and network levels.

(This article belongs to the Special Issue Recent Advances in Data Mining and Their Applications)

14 pages, 1378 KiB

Open Access Article

Using K-Means Cluster Analysis and Decision Trees to Highlight Significant Factors Leading to Homelessness

by Andrea Yoder Clark, Nicole Blumenfeld, Eric Lal, Shikar Darbari, Shiyang Northwood and Ashkan Wadpey

Mathematics 2021, 9(17), 2045; https://0-doi-org.brum.beds.ac.uk/10.3390/math9172045 - 25 Aug 2021

Cited by 7 | Viewed by 3918

Abstract

Homelessness has been a persistent social concern in the United States. A combination of political and economic events since the 1960s has driven increases in poverty that, by 1991, had surpassed 1928 depression era levels in some accounts. This paper explores how the emerging field of behavioral economics can use machine learning and data science methods to explore preventative responses to homelessness. In this study, machine learning data mining strategies, specifically K-means cluster analysis and, later, decision trees, were used to understand how environmental factors and resultant behaviors can contribute to the experience of homelessness. Prevention of the first homeless event is especially important, as studies show that if a person has experienced homelessness once, they are 2.6 times more likely to have another homeless episode. Study findings demonstrate that when someone is at risk of not being able to pay utility bills at the same time as they experience challenges with two or more of the other social determinants of health, the individual is statistically significantly more likely to have their first homeless event. Additionally, men over 50 who are not in the workforce, have a health hardship, and experience two or more other social determinants of health hardships at the same time have a high, statistically significant probability of experiencing homelessness for the first time.

(This article belongs to the Special Issue Recent Advances in Data Mining and Their Applications)

14 pages, 459 KiB

Open Access Article

Modeling Recidivism through Bayesian Regression Models and Deep Neural Networks

by Rolando de la Cruz, Oslando Padilla, Mauricio A. Valle and Gonzalo A. Ruz

Mathematics 2021, 9(6), 639; https://0-doi-org.brum.beds.ac.uk/10.3390/math9060639 - 17 Mar 2021

Cited by 5 | Viewed by 2175

Abstract

This study aims to analyze and explore criminal recidivism with different modeling strategies: one based on an explanation of the phenomenon and another based on a prediction task. We compared three common statistical approaches for modeling recidivism: the logistic regression model, the Cox regression model, and the cure rate model. The parameters of these models were estimated from a Bayesian point of view. Additionally, for prediction purposes, we compared the Cox proportional model, a random survival forest, and a deep neural network. To conduct this study, we used a real dataset corresponding to a cohort of men convicted of sexual crimes against women in 1973 in England and Wales. The results show that the logistic regression model tends to give more precise estimations of the probabilities of recidivism both globally and with the subgroups considered, but at the expense of running a model for each time point of interest. The cure rate model with a relatively simple distribution, such as Weibull, provides acceptable estimations, and these tend to be better with longer follow-up periods. The Cox regression model can provide the most biased estimations with certain subgroups. The prediction results show the deep neural network’s superiority compared to the Cox proportional model and the random survival forest.

(This article belongs to the Special Issue Recent Advances in Data Mining and Their Applications)

25 pages, 4984 KiB

Open Access Article

Discovery of Resident Behavior Patterns Using Machine Learning Techniques and IoT Paradigm

by Josimar Reyes-Campos, Giner Alor-Hernández, Isaac Machorro-Cano, José Oscar Olmedo-Aguirre, José Luis Sánchez-Cervantes and Lisbeth Rodríguez-Mazahua

Mathematics 2021, 9(3), 219; https://0-doi-org.brum.beds.ac.uk/10.3390/math9030219 - 22 Jan 2021

Cited by 21 | Viewed by 2769

Abstract

In recent years, technological paradigms such as the Internet of Things (IoT) and machine learning have become very important due to the benefits that their application brings to various areas of knowledge. It is interesting to note that implementing these two technologies promotes more and better automatic control systems that adjust to each user’s particular preferences in the home automation area. This work presents Smart Home Control, an intelligent platform that offers fully customized automatic control schemes for a home’s domotic devices by obtaining residents’ behavior patterns and applying machine learning to the records of state changes of each device connected to the platform. The platform uses the C4.5 machine learning algorithm and the Weka API to identify the behavior patterns necessary to build home devices’ configuration rules. In addition, an experimental case study that validates the platform’s effectiveness is presented, in which behavior patterns of smart home residents were identified according to the usage history of the IoT devices. The discovery of behavior patterns is essential to improve the automatic configuration schemes of personalization according to the residents’ history of device use.

(This article belongs to the Special Issue Recent Advances in Data Mining and Their Applications)

21 pages, 5873 KiB

Open Access Article

Design and Analysis of a Cluster-Based Intelligent Hybrid Recommendation System for E-Learning Applications

by Sundaresan Bhaskaran, Raja Marappan and Balachandran Santhi

Mathematics 2021, 9(2), 197; https://0-doi-org.brum.beds.ac.uk/10.3390/math9020197 - 19 Jan 2021

Cited by 52 | Viewed by 5179

Abstract

Recently, different recommendation techniques in e-learning have been designed that are helpful to both the learners and the educators in a wide variety of e-learning systems. Customized learning, which requires e-learning systems designed based on educational experience that suit the interests, goals, abilities, and willingness of both the learners and the educators, is required in some situations. In this research, we develop an intelligent recommender using split and conquer strategy-based clustering that can adapt automatically to the requirements, interests, and levels of knowledge of the learners. The recommender analyzes and learns the styles and characteristics of learners automatically. The different styles of learning are processed through the split and conquer strategy-based clustering. The proposed cluster-based linear pattern mining algorithm is applied to extract the functional patterns of the learners. Then, the system provides intelligent recommendations by evaluating the ratings of frequent sequences. Experiments were conducted on different groups of learners and datasets, and the proposed model suggested essential learning activities to learners based on their style of learning, interest classification, and talent features. It was experimentally found that the proposed cluster-based recommender improves the recommendation performance by resulting in more lessons completed when compared to learners present in the no-recommender cluster category. It was found that more than 65% of the learners considered all criteria to evaluate the proposed recommender. The simulation of the proposed recommender showed that for learner size values of <1000, better metric values were produced. When the learner size exceeded 1000, significant differences were obtained in the evaluated metrics. The significant differences were analyzed in terms of a computational structure depending on |L|, the recommendation list size, and the attributes of learners. The learners were also satisfied with the accuracy and speed of the recommender. For the sample dataset considered, a significant difference was observed in the standard deviation σ and mean μ of parameters, in terms of the Recall (List, User) and Ranking Score (User) measures, compared to other methods. The devised method performed well concerning all the considered metrics when compared to other methods. The simulation results signify that this recommender minimized the mean absolute error metric for the different clusters in comparison with some well-known methods.

(This article belongs to the Special Issue Recent Advances in Data Mining and Their Applications)

19 pages, 11754 KiB

Open Access Article

Recognizing Human Races through Machine Learning—A Multi-Network, Multi-Features Study

by Adrian Sergiu Darabant, Diana Borza and Radu Danescu

Mathematics 2021, 9(2), 195; https://0-doi-org.brum.beds.ac.uk/10.3390/math9020195 - 19 Jan 2021

Cited by 13 | Viewed by 11040

Abstract

The human face holds a privileged position in multi-disciplinary research as it conveys much information: demographical attributes (age, race, gender, ethnicity), social signals, emotion expression, and so forth. Studies have shown that due to the distribution of ethnicity/race in training datasets, biometric algorithms suffer from the “cross race effect”: their performance is better on subjects closer to the “country of origin” of the algorithm. The contributions of this paper are two-fold: (a) first, we gathered, annotated and made public a large-scale database of (over 175,000) facial images by automatically crawling the Internet for celebrities’ images belonging to various ethnicities/races, and (b) we trained and compared four state-of-the-art convolutional neural networks on the problem of race and ethnicity classification. To the best of our knowledge, this is the largest, data-balanced, publicly-available face database annotated with race and ethnicity information. We also studied the impact of various face traits and image characteristics on the race/ethnicity deep learning classification methods and compared the obtained results with the ones extracted from psychological studies and anthropomorphic studies. Extensive tests were performed in order to determine the facial features to which the networks are sensitive. These tests and a recognition rate of 96.64% on the problem of human race classification demonstrate the effectiveness of the proposed solution.

(This article belongs to the Special Issue Recent Advances in Data Mining and Their Applications)

25 pages, 437 KiB

Open Access Article

A Comparative Performance Evaluation of Classification Algorithms for Clinical Decision Support Systems

by Bayu Adhi Tama and Sunghoon Lim

Mathematics 2020, 8(10), 1814; https://0-doi-org.brum.beds.ac.uk/10.3390/math8101814 - 16 Oct 2020

Cited by 17 | Viewed by 3159

Abstract

Classification algorithms are widely used in clinical decision support systems. However, it is not always straightforward to understand the behavior of such algorithms on a multiple disease prediction task. When a new classifier is introduced, we will, in most cases, ask ourselves whether the classifier performs well on a particular clinical dataset or not. The decision to utilize a classifier mostly relies upon the type of data and classification task, and is thus often made arbitrarily. In this study, a comparative evaluation of a wide array of classifiers pertaining to six different families, i.e., tree, ensemble, neural, probability, discriminant, and rule-based classifiers, is carried out. A number of real-world publicly available datasets covering different diseases are used in the experiment in order to demonstrate the generalizability of the classifiers in multiple disease prediction. A total of 25 classifiers, 14 datasets, and three different resampling techniques are explored. This study reveals that the classifier that is likely to become the best performer is the conditional inference tree forest (cforest), followed by linear discriminant analysis, generalized linear model, random forest, and Gaussian process classifier. This work contributes to the existing literature a thorough benchmark of classification algorithms for multiple disease prediction.

(This article belongs to the Special Issue Recent Advances in Data Mining and Their Applications)

18 pages, 3989 KiB

Open Access Article

Feasibility of Automatic Seed Generation Applied to Cardiac MRI Image Analysis

by Radu Mărginean, Anca Andreica, Laura Dioşan and Zoltán Bálint

Mathematics 2020, 8(9), 1511; https://0-doi-org.brum.beds.ac.uk/10.3390/math8091511 - 04 Sep 2020

Cited by 7 | Viewed by 1978

Abstract

We present a method of using interactive image segmentation algorithms to reduce specific image segmentation problems to the task of finding small sets of pixels identifying the regions of interest. To this end, we empirically show the feasibility of automatically generating seeds for GrowCut, a popular interactive image segmentation algorithm. The principal contribution of our paper is a method for automating seed generation for the task of whole-heart segmentation of MRI scans, which achieves competitive unsupervised results (0.76 Dice on the MMWHS dataset). Moreover, we show that segmentation performance is robust to seeds with imperfect precision, suggesting that GrowCut-like algorithms can be applied to medical imaging tasks with little modeling effort.
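For readers unfamiliar with the Dice score quoted in this abstract, the following is a minimal sketch of how such a score is commonly computed for two binary masks. The function name `dice_coefficient` and the flat-list mask representation are assumptions made for illustration and are not taken from the paper.

```python
def dice_coefficient(predicted, reference):
    """Dice score 2|A ∩ B| / (|A| + |B|) for two flat binary masks.

    Assumption: masks are equal-length sequences of truthy/falsy values.
    """
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same length")
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    if total == 0:
        # Both masks are empty; treat this as perfect agreement.
        return 1.0
    return 2.0 * overlap / total


# Two foreground pixels predicted, one of which matches the reference.
example_score = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

A score of 1 means the two masks coincide exactly, and a score of 0 means they share no foreground pixels.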

(This article belongs to the Special Issue Recent Advances in Data Mining and Their Applications)

21 pages, 4796 KiB

Open Access Article

Soil Temperature Estimation with Meteorological Parameters by Using Tree-Based Hybrid Data Mining Models

by Mohammad Taghi Sattari, Anca Avram, Halit Apaydin and Oliviu Matei

Mathematics 2020, 8(9), 1407; https://0-doi-org.brum.beds.ac.uk/10.3390/math8091407 - 21 Aug 2020

Cited by 15 | Viewed by 3431

Abstract

The temperature of the soil at different depths is one of the most important factors used in different disciplines, such as hydrology, soil science, civil engineering, construction, geotechnology, ecology, meteorology, agriculture, and environmental studies. In addition to physical and spatial variables, meteorological elements are also effective in changing soil temperatures at different depths. The use of machine-learning models is increasing day by day in many complex and nonlinear branches of science. These data-driven models seek solutions to complex and nonlinear problems using data observed in the past. In this research, decision tree (DT), gradient boosted trees (GBT), and hybrid DT–GBT models were used to estimate soil temperature. The soil temperatures at 5, 10, and 20 cm depths were estimated using the daily minimum, maximum, and mean temperature, sunshine intensity and duration, and precipitation data measured between 1993 and 2018 at Divrigi station in Sivas province in Turkey. To predict the soil temperature at different depths, the time windowing technique was used on the input data. According to the results, the hybrid DT–GBT, GBT, and DT methods, in that order, estimated the soil temperature at 5 cm depth most successfully. However, the best estimate was obtained with the DT model at soil depths of 10 and 20 cm. According to the results of the research, the accuracy of the models also increased with increasing soil depth. In the prediction of soil temperature, sunshine duration and air temperature were determined to be the most important factors, and precipitation was the least significant meteorological variable. According to the evaluation criteria used, such as the Nash-Sutcliffe coefficient, R, MAE, RMSE, and Taylor diagrams, all three data-based models (DT, GBT, and hybrid DT–GBT) can be recommended for predicting soil temperature.
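The time windowing mentioned in this abstract can be pictured with a short sketch. The abstract does not give the window width or the exact layout of the inputs, so the function `make_windows`, its `width` parameter, and the sample values below are hypothetical. Each target is paired with the most recent `width` daily input rows, flattened into a single feature vector.

```python
def make_windows(rows, targets, width):
    """Pair each target with the `width` most recent input rows, flattened.

    Assumption: rows[i] holds the meteorological inputs for day i and
    targets[i] holds the soil temperature measured on the same day.
    """
    if len(rows) != len(targets):
        raise ValueError("rows and targets must have the same length")
    if width < 1:
        raise ValueError("width must be at least 1")
    samples = []
    for end in range(width - 1, len(rows)):
        window = rows[end - width + 1 : end + 1]
        features = [value for row in window for value in row]
        samples.append((features, targets[end]))
    return samples


# Hypothetical data: [mean air temperature, sunshine hours] for four days.
daily_inputs = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
soil_temperature = [0.1, 0.2, 0.3, 0.4]
example_samples = make_windows(daily_inputs, soil_temperature, width=2)
```

The resulting feature vectors could then be passed to any tabular regressor, such as a decision tree or a gradient boosted tree model.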

(This article belongs to the Special Issue Recent Advances in Data Mining and Their Applications)
