Recent Advances in Data Mining and Their Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (31 October 2021) | Viewed by 42612

Special Issue Editors


Guest Editor
Electric, Electronic and Computer Engineering Department, Technical University of Cluj-Napoca, 400114 Cluj-Napoca, Romania
Interests: evolutionary computation; data mining

Guest Editor
Department of Computer Science, Babeş-Bolyai University, 400000 Cluj-Napoca, Romania
Interests: artificial intelligence

Special Issue Information

Dear Colleagues,

Since its inception, data mining has developed in a range of specific fields dealing with raw, multimedia, geospatial, web, and text data. Its potential has increased through hybridization with other subsymbolic algorithms, such as genetic algorithms. Data mining adds value in many practical fields, such as health, biology, and emergency situations, but its true power is expressed in the context of the Internet of Things.

The purpose of this Special Issue is to gather a collection of articles reflecting the latest developments in data mining and related fields, in terms of both practical and theoretical applications: optimization methods, parallel and distributed data mining algorithms, learning algorithms, knowledge discovery and extraction, image analysis, classification and clustering, heuristics and metaheuristics, soft computing, operations research, business analytics, and many others.

Contributions are welcome on both theoretical and practical models. The selection criteria will be based on the formal and technical soundness, experimental support, and the relevance of the contribution.

Prof. Dr. Oliviu Matei
Prof. Dr. Anca Andreica
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Optimization methods
  • Probabilistic and statistical methods
  • Parallel and distributed data mining algorithms
  • Learning algorithms
  • Feature extraction, selection, and dimensionality reduction
  • Knowledge discovery and extraction
  • Image analysis
  • Classification and clustering
  • Mining data streams, graphs, and complex data
  • Mining semistructured data
  • Spatial data mining
  • Mining text, web, and social media
  • Multimedia data mining
  • Personalization and recommendation systems
  • Data mining visualization
  • Heuristics and metaheuristics
  • Soft computing
  • Computational complexity
  • Process modeling
  • Operations research
  • Pattern recognition
  • Uncertainty management

Published Papers (11 papers)

Research

14 pages, 4077 KiB  
Article
Applications of Discrete Wavelet Transform for Feature Extraction to Increase the Accuracy of Monitoring Systems of Liquid Petroleum Products
by Mohammed Balubaid, Mohammad Amir Sattari, Osman Taylan, Ahmed A. Bakhsh and Ehsan Nazemi
Mathematics 2021, 9(24), 3215; https://0-doi-org.brum.beds.ac.uk/10.3390/math9243215 - 13 Dec 2021
Cited by 28 | Viewed by 2466
Abstract
This paper presents a methodology to monitor the liquid petroleum products that pass through transmission pipes. A simulation setup consisting of an X-ray tube, a detector, and a pipe was established using a Monte Carlo n-particle X-version transport code to investigate a two-by-two mixture of four different petroleum products, namely, ethylene glycol, crude oil, gasoline, and gasoil, in different volumetric ratios. After collecting the signals of each simulation, discrete wavelet transform (DWT) was applied as the feature extraction system. Then, the standard deviation was calculated as a statistical feature from the fifth-level approximation and from the second- to fifth-level details, providing appropriate inputs for neural network training. Three multilayer perceptron neural networks were utilized to predict the volume ratio of three types of petroleum products, and the volume ratio of the fourth product could easily be obtained from the results of the three presented networks. Finally, a root mean square error of less than 1.77 was obtained in predicting the volume ratio, which was much more accurate than in previous research. This high accuracy was due to the use of DWT for feature extraction.
(This article belongs to the Special Issue Recent Advances in Data Mining and Their Applications)
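The feature-extraction step described in this abstract (the standard deviation of the fifth-level approximation and of the second- to fifth-level details) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not name the wavelet family, so a plain Haar wavelet is assumed here, and the function names `haar_step` and `dwt_std_features` are hypothetical, not taken from the paper.

```python
import math
import statistics


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    scale = math.sqrt(2.0)
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) / scale for a, b in pairs]
    detail = [(a - b) / scale for a, b in pairs]
    return approx, detail


def dwt_std_features(signal, levels=5, detail_levels=(2, 3, 4, 5)):
    """Return [std(A5), std(D2), std(D3), std(D4), std(D5)] for one signal.

    Assumption: Haar wavelet and population standard deviation; the
    original study may have used a different wavelet or estimator.
    """
    if len(signal) == 0 or len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be a positive multiple of 2**levels")
    approx = list(signal)
    details = {}
    for level in range(1, levels + 1):
        # Each pass halves the length and stores that level's detail band.
        approx, details[level] = haar_step(approx)
    features = [statistics.pstdev(approx)]
    features += [statistics.pstdev(details[k]) for k in detail_levels]
    return features
```

Calling `dwt_std_features` on a 64-sample signal yields five non-negative numbers, one per band, which could then serve as the input vector of a small regression network.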

31 pages, 3705 KiB  
Article
AI versus Classic Methods in Modelling Isotopic Separation Processes: Efficiency Comparison
by Vlad Mureșan, Mihaela-Ligia Ungureșan, Mihail Abrudean, Honoriu Vălean, Iulia Clitan, Roxana Motorga, Emilian Ceuca and Marius Fișcă
Mathematics 2021, 9(23), 3088; https://0-doi-org.brum.beds.ac.uk/10.3390/math9233088 - 30 Nov 2021
Cited by 1 | Viewed by 1529
Abstract
This paper compares the efficiency of artificial intelligence methods with that of classical methods in modelling industrial processes, taking the separation process of the ¹⁸O isotope as a case study. First, the behavior of the considered isotopic separation process is learned using neural networks. The comparison between the efficiency of these methods is highlighted by simulations of the process model using the mentioned modelling techniques. The final part of the paper presents the proposed model simulated in different scenarios that can occur in practice, leading to some interesting interpretations and conclusions. The paper demonstrates the feasibility of using artificial intelligence methods for industrial process modelling; the obtained models are intended for use in designing automatic control systems.

16 pages, 2629 KiB  
Article
Network Analysis Based on Important Node Selection and Community Detection
by Attila Mester, Andrei Pop, Bogdan-Eduard-Mădălin Mursa, Horea Greblă, Laura Dioşan and Camelia Chira
Mathematics 2021, 9(18), 2294; https://0-doi-org.brum.beds.ac.uk/10.3390/math9182294 - 17 Sep 2021
Cited by 16 | Viewed by 3122
Abstract
The stability and robustness of a complex network can be significantly improved by determining important nodes and by analyzing their tendency to group into clusters. Several centrality measures for evaluating the importance of a node in a complex network exist in the literature, each one focusing on a different perspective. Community detection algorithms can be used to determine clusters of nodes based on the network structure. This paper shows by empirical means that node importance can be evaluated from a dual perspective: by combining the traditional centrality measures regarding the whole network as one unit, and by analyzing the node clusters yielded by community detection. These approaches offer not only overlapping results but also complementary information regarding the top important nodes. To confirm this mechanism, we performed experiments on synthetic and real-world networks, and the results indicate an interesting relation between important nodes at the community and network levels.

14 pages, 1378 KiB  
Article
Using K-Means Cluster Analysis and Decision Trees to Highlight Significant Factors Leading to Homelessness
by Andrea Yoder Clark, Nicole Blumenfeld, Eric Lal, Shikar Darbari, Shiyang Northwood and Ashkan Wadpey
Mathematics 2021, 9(17), 2045; https://0-doi-org.brum.beds.ac.uk/10.3390/math9172045 - 25 Aug 2021
Cited by 7 | Viewed by 3918
Abstract
Homelessness has been a persistent social concern in the United States. A combination of political and economic events since the 1960s has driven increases in poverty that, by 1991, had surpassed 1928 depression era levels in some accounts. This paper explores how the emerging field of behavioral economics can use machine learning and data science methods to explore preventative responses to homelessness. In this study, machine learning data mining strategies, specifically K-means cluster analysis and, later, decision trees, were used to understand how environmental factors and resultant behaviors can contribute to the experience of homelessness. Prevention of the first homeless event is especially important, as studies show that a person who has experienced homelessness once is 2.6 times more likely to have another homeless episode. Study findings demonstrate that when someone is at risk of not being able to pay utility bills at the same time as they experience challenges with two or more of the other social determinants of health, the individual is statistically significantly more likely to have their first homeless event. Additionally, men over 50 who are not in the workforce, have a health hardship, and experience two or more other social determinants of health hardships at the same time have a high, statistically significant probability of experiencing homelessness for the first time.

14 pages, 459 KiB  
Article
Modeling Recidivism through Bayesian Regression Models and Deep Neural Networks
by Rolando de la Cruz, Oslando Padilla, Mauricio A. Valle and Gonzalo A. Ruz
Mathematics 2021, 9(6), 639; https://0-doi-org.brum.beds.ac.uk/10.3390/math9060639 - 17 Mar 2021
Cited by 5 | Viewed by 2175
Abstract
This study aims to analyze and explore criminal recidivism with different modeling strategies: one based on an explanation of the phenomenon and another based on a prediction task. We compared three common statistical approaches for modeling recidivism: the logistic regression model, the Cox regression model, and the cure rate model. The parameters of these models were estimated from a Bayesian point of view. Additionally, for prediction purposes, we compared the Cox proportional model, a random survival forest, and a deep neural network. To conduct this study, we used a real dataset corresponding to a cohort of men convicted of sexual crimes against women in 1973 in England and Wales. The results show that the logistic regression model tends to give more precise estimations of the probabilities of recidivism both globally and within the subgroups considered, but at the expense of running a model for each time point of interest. The cure rate model with a relatively simple distribution, such as Weibull, provides acceptable estimations, and these tend to be better with longer follow-up periods. The Cox regression model can provide the most biased estimations with certain subgroups. The prediction results show the deep neural network’s superiority compared to the Cox proportional model and the random survival forest.

25 pages, 4984 KiB  
Article
Discovery of Resident Behavior Patterns Using Machine Learning Techniques and IoT Paradigm
by Josimar Reyes-Campos, Giner Alor-Hernández, Isaac Machorro-Cano, José Oscar Olmedo-Aguirre, José Luis Sánchez-Cervantes and Lisbeth Rodríguez-Mazahua
Mathematics 2021, 9(3), 219; https://0-doi-org.brum.beds.ac.uk/10.3390/math9030219 - 22 Jan 2021
Cited by 21 | Viewed by 2769
Abstract
In recent years, technological paradigms such as the Internet of Things (IoT) and machine learning have become very important due to the benefit that their application represents in various areas of knowledge. It is interesting to note that implementing these two technologies promotes more and better automatic control systems that adjust to each user’s particular preferences in the home automation area. This work presents Smart Home Control, an intelligent platform that offers fully customized automatic control schemes for a home’s domotic devices by obtaining residents’ behavior patterns and applying machine learning to the records of state changes of each device connected to the platform. The platform uses the C4.5 machine learning algorithm and the Weka API to identify the behavior patterns necessary to build home devices’ configuration rules. In addition, an experimental case study that validates the platform’s effectiveness is presented, where behavior patterns of smart home residents were identified according to the IoT device usage history. The discovery of behavior patterns is essential to improve the automatic configuration schemes of personalization according to the residents’ history of device use.

21 pages, 5873 KiB  
Article
Design and Analysis of a Cluster-Based Intelligent Hybrid Recommendation System for E-Learning Applications
by Sundaresan Bhaskaran, Raja Marappan and Balachandran Santhi
Mathematics 2021, 9(2), 197; https://0-doi-org.brum.beds.ac.uk/10.3390/math9020197 - 19 Jan 2021
Cited by 52 | Viewed by 5179
Abstract
Recently, different recommendation techniques in e-learning have been designed that are helpful to both the learners and the educators in a wide variety of e-learning systems. Customized learning, which requires e-learning systems designed based on educational experience that suit the interests, goals, abilities, and willingness of both the learners and the educators, is required in some situations. In this research, we develop an intelligent recommender using split and conquer strategy-based clustering that can adapt automatically to the requirements, interests, and levels of knowledge of the learners. The recommender analyzes and learns the styles and characteristics of learners automatically. The different styles of learning are processed through the split and conquer strategy-based clustering. The proposed cluster-based linear pattern mining algorithm is applied to extract the functional patterns of the learners. Then, the system provides intelligent recommendations by evaluating the ratings of frequent sequences. Experiments were conducted on different groups of learners and datasets, and the proposed model suggested essential learning activities to learners based on their style of learning, interest classification, and talent features. It was experimentally found that the proposed cluster-based recommender improves the recommendation performance by resulting in more lessons completed when compared to learners present in the no-recommender cluster category. It was found that more than 65% of the learners considered all criteria to evaluate the proposed recommender. The simulation of the proposed recommender showed that for learner size values of <1000, better metric values were produced. When the learner size exceeded 1000, significant differences were obtained in the evaluated metrics. The significant differences were analyzed in terms of a computational structure depending on L, the recommendation list size, and the attributes of learners. The learners were also satisfied with the accuracy and speed of the recommender. For the sample dataset considered, a significant difference was observed in the standard deviation σ and mean μ of parameters, in terms of the Recall (List, User) and Ranking Score (User) measures, compared to other methods. The devised method performed well concerning all the considered metrics when compared to other methods. The simulation results signify that this recommender minimized the mean absolute error metric for the different clusters in comparison with some well-known methods.

19 pages, 11754 KiB  
Article
Recognizing Human Races through Machine Learning—A Multi-Network, Multi-Features Study
by Adrian Sergiu Darabant, Diana Borza and Radu Danescu
Mathematics 2021, 9(2), 195; https://0-doi-org.brum.beds.ac.uk/10.3390/math9020195 - 19 Jan 2021
Cited by 13 | Viewed by 11040
Abstract
The human face holds a privileged position in multi-disciplinary research as it conveys much information: demographic attributes (age, race, gender, ethnicity), social signals, emotion expression, and so forth. Studies have shown that, due to the distribution of ethnicity/race in training datasets, biometric algorithms suffer from the “cross-race effect”: their performance is better on subjects closer to the “country of origin” of the algorithm. The contributions of this paper are two-fold: (a) first, we gathered, annotated, and made public a large-scale database of over 175,000 facial images by automatically crawling the Internet for celebrities’ images belonging to various ethnicities/races, and (b) we trained and compared four state-of-the-art convolutional neural networks on the problem of race and ethnicity classification. To the best of our knowledge, this is the largest data-balanced, publicly available face database annotated with race and ethnicity information. We also studied the impact of various face traits and image characteristics on the race/ethnicity deep learning classification methods and compared the obtained results with the ones extracted from psychological studies and anthropomorphic studies. Extensive tests were performed in order to determine the facial features to which the networks are sensitive. These tests and a recognition rate of 96.64% on the problem of human race classification demonstrate the effectiveness of the proposed solution.

25 pages, 437 KiB  
Article
A Comparative Performance Evaluation of Classification Algorithms for Clinical Decision Support Systems
by Bayu Adhi Tama and Sunghoon Lim
Mathematics 2020, 8(10), 1814; https://0-doi-org.brum.beds.ac.uk/10.3390/math8101814 - 16 Oct 2020
Cited by 17 | Viewed by 3159
Abstract
Classification algorithms are widely considered for clinical decision support systems. However, it is not always straightforward to understand the behavior of such algorithms on a multiple disease prediction task. When a new classifier is introduced, we will, in most cases, ask ourselves whether the classifier performs well on a particular clinical dataset or not. The choice of classifier mostly depends on the type of data and classification task, and it is therefore often made arbitrarily. In this study, a comparative evaluation is carried out on a wide array of classifiers belonging to six different families, i.e., tree, ensemble, neural, probability, discriminant, and rule-based classifiers. A number of real-world, publicly available datasets covering different diseases are taken into account in the experiment in order to demonstrate the generalizability of the classifiers in multiple disease prediction. A total of 25 classifiers, 14 datasets, and three different resampling techniques are explored. This study reveals that the classifier that is likely to become the best performer is the conditional inference tree forest (cforest), followed by linear discriminant analysis, generalized linear model, random forest, and Gaussian process classifier. This work contributes to the existing literature a thorough benchmark of classification algorithms for multiple disease prediction.

18 pages, 3989 KiB  
Article
Feasibility of Automatic Seed Generation Applied to Cardiac MRI Image Analysis
by Radu Mărginean, Anca Andreica, Laura Dioşan and Zoltán Bálint
Mathematics 2020, 8(9), 1511; https://0-doi-org.brum.beds.ac.uk/10.3390/math8091511 - 04 Sep 2020
Cited by 7 | Viewed by 1978
Abstract
We present a method of using interactive image segmentation algorithms to reduce specific image segmentation problems to the task of finding small sets of pixels identifying the regions of interest. To this end, we empirically show the feasibility of automatically generating seeds for GrowCut, a popular interactive image segmentation algorithm. The principal contribution of our paper is a method for automating seed generation for the task of whole-heart segmentation of MRI scans, which achieves competitive unsupervised results (0.76 Dice on the MMWHS dataset). Moreover, we show that segmentation performance is robust to seeds with imperfect precision, suggesting that GrowCut-like algorithms can be applied to medical imaging tasks with little modeling effort.
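The Dice score quoted in this abstract measures the overlap between a predicted segmentation and a reference one, as twice the shared area divided by the sum of both areas. A minimal sketch, assuming binary masks flattened into equal-length 0/1 sequences (the function name `dice_coefficient` and the empty-mask convention are illustrative choices, not taken from the paper):

```python
def dice_coefficient(predicted, reference):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(predicted) != len(reference):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as foreground in both masks.
    overlap = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    if total == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2.0 * overlap / total
```

A value of 1.0 means the two masks coincide and 0.0 means they share no foreground pixel, so a reported 0.76 indicates substantial but imperfect overlap.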

21 pages, 4796 KiB  
Article
Soil Temperature Estimation with Meteorological Parameters by Using Tree-Based Hybrid Data Mining Models
by Mohammad Taghi Sattari, Anca Avram, Halit Apaydin and Oliviu Matei
Mathematics 2020, 8(9), 1407; https://0-doi-org.brum.beds.ac.uk/10.3390/math8091407 - 21 Aug 2020
Cited by 15 | Viewed by 3431
Abstract
The temperature of the soil at different depths is one of the most important factors used in different disciplines, such as hydrology, soil science, civil engineering, construction, geotechnology, ecology, meteorology, agriculture, and environmental studies. In addition to physical and spatial variables, meteorological elements are also effective in changing soil temperatures at different depths. The use of machine-learning models is increasing day by day in many complex and nonlinear branches of science. These data-driven models seek solutions to complex and nonlinear problems using data observed in the past. In this research, decision tree (DT), gradient boosted trees (GBT), and hybrid DT–GBT models were used to estimate soil temperature. The soil temperatures at 5, 10, and 20 cm depths were estimated using the daily minimum, maximum, and mean temperature, sunshine intensity and duration, and precipitation data measured between 1993 and 2018 at Divrigi station in Sivas province in Turkey. To predict the soil temperature at different depths, the time windowing technique was used on the input data. According to the results, the hybrid DT–GBT, GBT, and DT methods, in that order, estimated the soil temperature at 5 cm depth most successfully. However, the best estimate was obtained with the DT model at soil depths of 10 and 20 cm. According to the results of the research, the accuracy of the models also increased with increasing soil depth. In the prediction of soil temperature, sunshine duration and air temperature were determined to be the most important factors, and precipitation was the least significant meteorological variable. According to the evaluation criteria used, such as the Nash-Sutcliffe coefficient, R, MAE, RMSE, and Taylor diagrams, all three (DT, GBT, and hybrid DT–GBT) data-based models can be recommended for predicting soil temperature.
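The time windowing mentioned in this abstract can be pictured as turning a daily series into supervised samples, where each target is paired with the meteorological readings of the most recent days. The sketch below is one plausible reading only: the abstract does not state the window length or whether the current day is included, so both are assumptions here, and `make_windows` is a hypothetical helper name.

```python
def make_windows(daily_inputs, soil_temps, window):
    """Pair each day's soil temperature with the inputs of the last `window` days.

    daily_inputs: one list of meteorological values per day.
    soil_temps:   one target value per day, same length as daily_inputs.
    Assumption: the window ends on (and includes) the target day.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if len(daily_inputs) != len(soil_temps):
        raise ValueError("inputs and targets must cover the same days")
    samples = []
    for t in range(window - 1, len(daily_inputs)):
        # Flatten the readings of days t-window+1 .. t into one feature row.
        recent_days = daily_inputs[t - window + 1 : t + 1]
        features = [value for day in recent_days for value in day]
        samples.append((features, soil_temps[t]))
    return samples
```

Each resulting `(features, target)` pair could then be passed to a tree-based regressor; a series of n days with window w produces n - w + 1 samples.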