Machine Learning and Data Mining: Techniques and Tasks

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Mathematics and Computer Science".

Deadline for manuscript submissions: closed (31 December 2022) | Viewed by 15968

Special Issue Editor


Prof. Dr. Liangxiao Jiang
Guest Editor
School of Computer Science, China University of Geosciences, Wuhan 430074, China
Interests: machine learning and data mining; Bayesian learning; nearest neighbor learning; decision tree learning; cost-sensitive learning; crowdsourcing learning; deep learning; classification; ranking; class probability estimation; clustering; regression; distance measure; feature selection

Special Issue Information

Dear Colleagues,

In recent years, advances in data gathering, storage, and distribution have created a great need for computational theories and techniques to aid in intelligent data analysis. Machine learning and data mining form a rapidly growing area of research and application that builds on theories and techniques from many fields.

The aim of this Special Issue is to provide a forum for researchers to present their original contributions describing their experience and approaches to a wide range of machine learning techniques applied to a variety of data mining tasks, including but not limited to:

  • Machine learning techniques: Bayesian learning, nearest neighbor learning, decision tree learning, cost-sensitive learning, crowdsourcing learning, deep learning, etc.
  • Data mining tasks: Classification, ranking, class probability estimation, clustering, regression, distance measure, feature selection, etc.

Submissions should be original and unpublished. Extended versions of conference publications will be considered if they contain at least 50% new content.

Prof. Dr. Liangxiao Jiang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website and then using the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers are published continuously in the journal (as soon as accepted) and listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine learning
  • data mining
  • pattern recognition
  • artificial intelligence
  • knowledge-based systems
  • intelligent data analysis

Published Papers (10 papers)


Research

20 pages, 6580 KiB  
Article
Discriminative Semantic Feature Pyramid Network with Guided Anchoring for Logo Detection
by Baisong Zhang, Sujuan Hou, Awudu Karim, Jing Wang, Weikuan Jia and Yuanjie Zheng
Mathematics 2023, 11(2), 481; https://doi.org/10.3390/math11020481 - 16 Jan 2023
Cited by 4 | Viewed by 1252
Abstract
Logo detection is a technology that identifies logos in images and returns their locations. With logo detection technology, brands can check how often their logos are displayed on social media platforms and elsewhere online and how they appear. It has received a lot of attention for its wide applications across different sectors, such as brand identity protection, product brand management, and logo duration monitoring. In particular, logo detection technology can help brands measure their logo coverage, track their brand perception, secure their brand value, increase the effectiveness of their marketing campaigns, and build brand awareness. However, compared with general object detection, logo detection is more challenging due to the existence of both small logo objects and large aspect ratio logo objects. In this paper, we propose a novel approach, named Discriminative Semantic Feature Pyramid Network with Guided Anchoring (DSFP-GA), which addresses these challenges by aggregating semantic information and generating anchor boxes with different aspect ratios. More specifically, our approach mainly consists of two components, namely the Discriminative Semantic Feature Pyramid (DSFP) and Guided Anchoring (GA). The former fuses semantic features into low-level feature maps to obtain discriminative representations of small logo objects, while the latter is integrated into DSFP to generate large aspect ratio anchor boxes for detecting large aspect ratio logo objects. Extensive experimental results on four benchmarks demonstrate the effectiveness of the proposed DSFP-GA. Moreover, we conduct visual analysis and ablation studies to illustrate the strength of the proposed DSFP-GA when detecting both small logo objects and large aspect ratio logo objects.
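As a rough, self-contained illustration of the anchoring side of this idea (not the authors' DSFP-GA code), the sketch below enumerates anchor boxes over a set of scales and aspect ratios; the large ratios stand in for the elongated, banner-style logos the paper targets, whereas guided anchoring would instead predict suitable shapes per feature-map location.

```python
import numpy as np

def make_anchors(base_size=64, scales=(0.5, 1.0, 2.0),
                 aspect_ratios=(0.5, 1.0, 2.0, 4.0, 8.0)):
    """Generate (x1, y1, x2, y2) anchor boxes centred at the origin.

    Large aspect ratios (e.g., 4:1 and 8:1) approximate elongated logo
    shapes; a guided-anchoring head would predict shapes per location
    instead of enumerating a fixed set like this.
    """
    anchors = []
    for s in scales:
        area = (base_size * s) ** 2
        for ar in aspect_ratios:              # ar = width / height
            w = np.sqrt(area * ar)
            h = np.sqrt(area / ar)
            anchors.append((-w / 2, -h / 2, w / 2, h / 2))
    return np.array(anchors)

print(make_anchors().shape)  # (15, 4): 3 scales x 5 aspect ratios
```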

25 pages, 1144 KiB  
Article
Reinforcement Learning-Based Delay-Aware Path Exploration of Parallelized Service Function Chains
by Zhongwei Huang, Dagang Li, Chenhao Wu and Hua Lu
Mathematics 2022, 10(24), 4698; https://doi.org/10.3390/math10244698 - 11 Dec 2022
Cited by 2 | Viewed by 1382
Abstract
The parallel processing of the service function chain (SFC) is expected to provide better low-delay service delivery, because it breaks through the bottleneck of the traditional serial processing mode, in which service delay increases linearly with the SFC length. However, the provision of parallelized SFCs (PSFCs) is much more difficult due to the unique construction of PSFCs, the inevitable parallelization overhead, and the delay-balancing requirement of PSFC branches; therefore, existing mechanisms for serial SFCs cannot be directly applied to PSFCs. After a comprehensive review of recent related work, we find that traffic scheduling mechanisms for PSFCs are still lacking. In this paper, a delay-aware traffic scheduling mechanism (DASM) for PSFCs is proposed. DASM first transforms a PSFC into several serial SFCs by releasing the upstream VNF constraints so as to handle them independently while keeping their parallel relations. Secondly, DASM realizes delay-aware PSFC traffic scheduling based on a reinforcement learning (RL) method. To the best of the authors' knowledge, this is the first attempt to address the PSFC traffic scheduling problem by transforming PSFCs into independent serial SFCs. Simulation results show that the proposed DASM outperforms advanced PSFC scheduling strategies in terms of delay balance and throughput.
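The abstract does not spell out which RL algorithm drives DASM, so the snippet below is only a generic, hedged sketch of value-based scheduling: tabular Q-learning against a toy environment whose states, actions, and reward (e.g., the negative end-to-end delay of a scheduled PSFC) are placeholders rather than the encoding used in the paper.

```python
import numpy as np

n_states, n_actions = 50, 4          # placeholder state/action spaces
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1
rng = np.random.default_rng(0)

def step(state, action):
    # Hypothetical environment: returns (next_state, reward, done).
    return int(rng.integers(n_states)), -float(rng.random()), bool(rng.random() < 0.05)

for episode in range(200):
    s, done = int(rng.integers(n_states)), False
    while not done:
        # Epsilon-greedy action selection.
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        s_next, r, done = step(s, a)
        # Standard Q-learning temporal-difference update.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
```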

20 pages, 3855 KiB  
Article
A Method Combining Multi-Feature Fusion and Optimized Deep Belief Network for EMG-Based Human Gait Classification
by Jie He, Farong Gao, Jian Wang, Qiuxuan Wu, Qizhong Zhang and Weijie Lin
Mathematics 2022, 10(22), 4387; https://doi.org/10.3390/math10224387 - 21 Nov 2022
Cited by 6 | Viewed by 1218
Abstract
In this paper, a gait classification method based on a deep belief network (DBN) optimized by the sparrow search algorithm (SSA) is proposed. Multiple features obtained from surface electromyography (sEMG) are fused, and the fused features are used to train the model. First, sample features such as the time-domain and frequency-domain features of the denoised sEMG are extracted, and the fused features are then obtained by feature combination. Second, the SSA is utilized to optimize the architecture of the DBN and its weight parameters. Finally, the optimized DBN classifier is trained and used for gait recognition. The classification results are obtained by varying different factors, and the recognition rate is compared with that of previous classification algorithms. The results show that the recognition rate of SSA-DBN is higher than that of the other classifiers, and the recognition accuracy is improved by about 2% compared with the unoptimized DBN. This indicates that, for gait recognition, SSA can optimize the network performance of the DBN, thus improving the classification accuracy.
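A minimal sketch of the multi-feature fusion step only (the SSA and DBN stages are not reproduced): the snippet below computes a few standard time- and frequency-domain sEMG features from one denoised window and concatenates them; the exact feature set used in the paper may differ.

```python
import numpy as np

def emg_features(window, fs=1000):
    """Fuse common time- and frequency-domain features of one sEMG window."""
    mav = np.mean(np.abs(window))                      # mean absolute value
    rms = np.sqrt(np.mean(window ** 2))                # root mean square
    zc = np.sum(np.diff(np.sign(window)) != 0)         # zero crossings
    spectrum = np.abs(np.fft.rfft(window)) ** 2
    freqs = np.fft.rfftfreq(window.size, d=1 / fs)
    mnf = np.sum(freqs * spectrum) / np.sum(spectrum)  # mean frequency
    return np.array([mav, rms, zc, mnf])               # fused feature vector

x = np.random.default_rng(0).standard_normal(256)      # stand-in for real sEMG
print(emg_features(x))
```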

18 pages, 2218 KiB  
Article
Evaluation of Educational Interventions Based on Average Treatment Effect: A Case Study
by Jingyu Liang and Jie Liu
Mathematics 2022, 10(22), 4333; https://doi.org/10.3390/math10224333 - 18 Nov 2022
Cited by 1 | Viewed by 1057
Abstract
Relative to randomized controlled trials (RCTs), which raise privacy and ethical concerns, observational studies are becoming dominant in education research. In an observational study, it is necessary and important to correctly evaluate the effects of different interventions (i.e., covariates) on student performance with observational data. However, such evaluations are prone to biased estimates because the distributions of the "control" and "treatment" student groups can hardly be equivalent to those in RCTs. Moreover, the collected covariates on possible educational interventions (i.e., treatments) may be confounded with student characteristics that are not included in the data. In this work, an estimation method based on the Rubin causal model (RCM) is proposed to calculate the average treatment effect (ATE) of different educational interventions. Specifically, with the selected covariates, the propensity score (i.e., the probability of treatment exposure conditional on covariates) is used as a criterion to stratify the observational data into sub-classes with balanced covariate distributions between the control and treatment groups. Combined with Neyman's estimation, the ATE is then obtained. We verify the effectiveness of this method with real observational data on student performance and its covariates.
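A compact, hedged sketch of the general propensity-score stratification recipe described here, using synthetic data and a scikit-learn logistic-regression propensity model (not the authors' code or data):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                         # synthetic covariates
t = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))        # treatment depends on X
y = 60 + 5 * t + 3 * X[:, 0] + rng.normal(0, 5, 1000)  # outcome, true ATE = 5

# 1) Propensity score: P(treatment | covariates).
ps = LogisticRegression(max_iter=1000).fit(X, t).predict_proba(X)[:, 1]

# 2) Stratify into sub-classes with similar propensity scores.
strata = pd.qcut(ps, q=5, labels=False)

# 3) Neyman difference-in-means within each stratum, then a
#    sample-size-weighted average across strata.
ate = 0.0
for s in np.unique(strata):
    m = strata == s
    diff = y[m & (t == 1)].mean() - y[m & (t == 0)].mean()
    ate += diff * m.sum() / len(y)
print(f"stratified ATE estimate: {ate:.2f}")           # should be close to 5
```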

15 pages, 1282 KiB  
Article
A Malicious Webpage Detection Method Based on Graph Convolutional Network
by Yilin Wang, Siqing Xue and Jun Song
Mathematics 2022, 10(19), 3496; https://doi.org/10.3390/math10193496 - 25 Sep 2022
Viewed by 1336
Abstract
In recent years, with the rapid development of the Internet and information technology, video websites, shopping websites, and other portals have grown rapidly. However, malicious webpages can disguise themselves as benign websites and steal users' private information, which seriously threatens network security. Current detection methods for malicious webpages do not fully utilize the syntactic and semantic information in the web source code. In this paper, we propose a GCN-based malicious webpage detection method (GMWD), which constructs a text graph to describe webpage source code and then uses a GCN model to learn the syntactic and semantic correlations within and between webpage source codes. We replace word nodes in the text graph with phrase nodes to better maintain the syntactic and semantic integrity of the webpage source code. In addition, we use the URL links appearing in the source code as auxiliary detection information to further improve the detection accuracy. Experiments showed that the proposed method achieves 99.86% accuracy and a 0.137% false negative rate, performing better than other related malicious webpage detection methods.
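For readers unfamiliar with graph convolution, a minimal numpy sketch of one GCN layer is given below; the text-graph construction with phrase nodes and auxiliary URL information described in the paper is not reproduced, and the adjacency and feature matrices are placeholders.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution layer: ReLU(D^-1/2 (A + I) D^-1/2 @ H @ W)."""
    A_hat = A + np.eye(A.shape[0])                # add self-loops
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    A_norm = d_inv_sqrt @ A_hat @ d_inv_sqrt      # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)        # ReLU activation

A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], dtype=float)  # toy text graph
H = np.random.default_rng(0).normal(size=(3, 4))  # 3 nodes, 4 input features
W = np.random.default_rng(1).normal(size=(4, 2))  # trainable weights
print(gcn_layer(A, H, W).shape)                   # (3, 2)
```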

18 pages, 5835 KiB  
Article
Knowledge Trajectories on Public Crisis Management Research from Massive Literature Text Using Topic-Clustered Evolution Extraction
by Feng Wu, Wanqiang Xu, Chaoran Lin and Yanwei Zhang
Mathematics 2022, 10(12), 1966; https://doi.org/10.3390/math10121966 - 7 Jun 2022
Cited by 2 | Viewed by 1558
Abstract
Current research has ignored the hiddenness and stochasticity of the evolution of public crisis management research, leaving the knowledge trajectories unclear. This paper introduces a combined approach, LDA-HMM, to mine the hidden topics, present the evolutionary trajectories of the topics, and predict the trends in the coming years in order to fill these research gaps. We reviewed 8543 articles in WOS from 1997 to 2021 and extracted 39 hidden topics from the text using LDA; 33 remained after manual labeling. The development of the topics over the years verifies that they co-evolve with public crisis events. The confusion and transition features indicate that most topics are confused with or transferred to other topics. The transition network and the direction of the topics show that six main transfer paths exist and that, over the course of the evolution, the topics have become more focused. By training the HMM, we predict the trends for the next five years; the results show that the heat of topics focusing on traditional crisis issues will decrease, while the focus on non-traditional issues will increase. We use the average error to evaluate the model's predictions against other approaches and conclude that it performs better. This study has practical implications for preventing crisis events, optimizing related policies, and grasping key research areas in the future.
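As a small illustration of the LDA stage only (the HMM-based trend prediction is not shown), the snippet below fits scikit-learn's LatentDirichletAllocation to a four-document toy corpus standing in for the WOS abstracts.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = [  # toy stand-ins for the 8543 WOS abstracts
    "flood emergency response and evacuation planning",
    "public health crisis and epidemic risk communication",
    "earthquake disaster relief and resource allocation",
    "social media rumor spread during public emergencies",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(docs)
vocab = vec.get_feature_names_out()

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
for k, topic in enumerate(lda.components_):
    top_words = [vocab[i] for i in topic.argsort()[-4:][::-1]]  # top-4 words per topic
    print(f"topic {k}: {top_words}")
```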

21 pages, 529 KiB  
Article
Using Locality-Sensitive Hashing for SVM Classification of Large Data Sets
by Maria D. Gonzalez-Lima and Carenne C. Ludeña
Mathematics 2022, 10(11), 1812; https://doi.org/10.3390/math10111812 - 25 May 2022
Cited by 6 | Viewed by 1647
Abstract
We propose a novel method using Locality-Sensitive Hashing (LSH) for solving the optimization problem that arises in the training stage of support vector machines for large data sets, possibly in high dimensions. LSH was introduced as an efficient way to look for neighbors in high-dimensional spaces. Random-projection-based LSH functions create bins so that, with high probability, points belonging to the same bin are close and points that are far apart do not fall in the same bin. Based on these bins, it is not necessary to consider the whole original set, only representatives from each bin, thus reducing the effective size of the data set. A key aspect of our proposal is that we work with the feature space and use only the projections to search for closeness in this space. Moreover, instead of choosing the projection directions at random, we sample a small subset and solve the associated SVM problem. Projections in this direction allow for a more precise sample in many cases, and an approximation of the solution of the large problem is found in a fraction of the running time with little degradation in classification error. We present two algorithms, theoretical support, and numerical experiments showing their performance on real-life problems taken from the LIBSVM database.
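A minimal sketch of the binning idea, assuming plain Gaussian random projections (the paper additionally derives projection directions from an SVM solved on a small subsample): points are hashed by the sign pattern of a few projections, one representative is kept per bin, and the SVM is trained on the representatives only.

```python
import numpy as np
from sklearn.svm import SVC

def lsh_bins(X, n_projections=8, seed=0):
    """Hash each row of X to a bin id given by the signs of random projections."""
    rng = np.random.default_rng(seed)
    P = rng.normal(size=(X.shape[1], n_projections))
    codes = (X @ P > 0).astype(int)
    return codes @ (2 ** np.arange(n_projections))   # sign pattern -> integer bin id

rng = np.random.default_rng(1)
X = rng.normal(size=(5000, 20))                      # synthetic "large" data set
y = (X[:, 0] + X[:, 1] > 0).astype(int)

bins = lsh_bins(X)
reps = np.array([np.flatnonzero(bins == b)[0] for b in np.unique(bins)])
svm = SVC(kernel="rbf").fit(X[reps], y[reps])        # train on representatives only
print(len(reps), "representatives; accuracy on full set:", round(svm.score(X, y), 3))
```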

15 pages, 391 KiB  
Article
A Novel Hybrid Approach: Instance Weighted Hidden Naive Bayes
by Liangjun Yu, Shengfeng Gan, Yu Chen and Dechun Luo
Mathematics 2021, 9(22), 2982; https://doi.org/10.3390/math9222982 - 22 Nov 2021
Cited by 4 | Viewed by 1514
Abstract
Naive Bayes (NB) is easy to construct but surprisingly effective, and it is one of the top ten classification algorithms in data mining. The conditional independence assumption of NB ignores the dependency between attributes, so its probability estimates are often suboptimal. Hidden naive Bayes (HNB) adds a hidden parent to each attribute, which can reflect dependencies from all the other attributes. Compared with other Bayesian network algorithms, it offers significant improvements in classification performance and avoids structure learning. However, the assumption in HNB that each instance is equivalent in terms of probability estimation is not always true in real-world applications. In order to reflect the different influences of different instances in HNB, the HNB model is modified into an improved HNB model. A novel hybrid approach called instance weighted hidden naive Bayes (IWHNB) is proposed in this paper. IWHNB combines instance weighting with the improved HNB model in one uniform framework. Instance weights are incorporated into the improved HNB model to calculate probability estimates in IWHNB. Extensive experimental results show that IWHNB obtains significant improvements in classification performance compared with NB, HNB, and other state-of-the-art competitors. Meanwhile, IWHNB maintains the low time complexity that characterizes HNB.
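For readers unfamiliar with HNB, its classification rule can be summarized roughly as follows (notation simplified; IWHNB modifies the frequency counts behind these conditional probability estimates with instance weights):

```latex
c(\mathbf{x}) = \arg\max_{c}\; P(c)\prod_{i=1}^{n} P\!\left(a_i \mid a_{hp_i}, c\right),
\qquad
P\!\left(a_i \mid a_{hp_i}, c\right) = \sum_{j=1,\ j \neq i}^{n} W_{ij}\, P\!\left(a_i \mid a_j, c\right),
```

where the hidden parent of attribute A_i averages the influences of all the other attributes, with weights W_ij typically derived from the conditional mutual information between A_i and A_j given the class.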

19 pages, 332 KiB  
Article
Attribute Selecting in Tree-Augmented Naive Bayes by Cross Validation Risk Minimization
by Shenglei Chen, Zhonghui Zhang and Linyuan Liu
Mathematics 2021, 9(20), 2564; https://doi.org/10.3390/math9202564 - 13 Oct 2021
Cited by 7 | Viewed by 1514
Abstract
As an important improvement to naive Bayes, Tree-Augmented Naive Bayes (TAN) exhibits excellent classification performance and efficiency since it allows every attribute to depend on at most one other attribute in addition to the class variable. However, its performance might be lowered because some attributes might be redundant. In this paper, we propose an attribute Selective Tree-Augmented Naive Bayes (STAN) algorithm, which builds a sequence of approximate models, each involving only a certain number of top-ranked attributes, and searches for the model that minimizes the cross-validation risk. Five different approaches to ranking the attributes have been explored. As the models can be evaluated simultaneously in one learning pass through the data, the algorithm is efficient and can avoid local optima in the model space. Extensive experiments on 70 UCI data sets demonstrate that STAN achieves superior performance while maintaining efficiency and simplicity.
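A hedged sketch of the selection loop, with plain naive Bayes standing in for TAN and repeated cross-validation standing in for STAN's single-pass evaluation; attribute ranking here uses mutual information, one of several ranking approaches the paper explores.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_digits(return_X_y=True)

# Rank attributes by mutual information with the class.
order = np.argsort(mutual_info_classif(X, y, random_state=0))[::-1]

# Evaluate nested top-k attribute subsets and keep the one with the
# lowest cross-validation risk (error).
best_k, best_risk = None, np.inf
for k in range(5, X.shape[1] + 1, 5):
    risk = 1 - cross_val_score(GaussianNB(), X[:, order[:k]], y, cv=5).mean()
    if risk < best_risk:
        best_k, best_risk = k, risk
print(f"selected top-{best_k} attributes, CV error {best_risk:.3f}")
```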

14 pages, 298 KiB  
Article
Adapting Hidden Naive Bayes for Text Classification
by Shengfeng Gan, Shiqi Shao, Long Chen, Liangjun Yu and Liangxiao Jiang
Mathematics 2021, 9(19), 2378; https://doi.org/10.3390/math9192378 - 25 Sep 2021
Cited by 8 | Viewed by 2039
Abstract
Due to its simplicity, efficiency, and effectiveness, multinomial naive Bayes (MNB) has been widely used for text classification. As in naive Bayes (NB), its assumption of the conditional independence of features is often violated, which reduces its classification performance. Of the numerous approaches to alleviating this assumption, structure extension has attracted less attention from researchers. To the best of our knowledge, only structure-extended MNB (SEMNB) has been proposed so far. SEMNB averages all weighted super-parent one-dependence multinomial estimators; therefore, it is an ensemble learning model. In this paper, we propose a single model called hidden MNB (HMNB) by adapting the well-known hidden NB (HNB). HMNB creates a hidden parent for each feature, which synthesizes the influences of all the other qualified features. To learn HMNB, we propose a simple but effective learning algorithm that does not incur a computationally expensive structure-learning process. The same idea can also be used to improve complement NB (CNB) and the one-versus-all-but-one model (OVA), and the resulting models are denoted as HCNB and HOVA, respectively. Extensive experiments on eleven benchmark text classification datasets validate the effectiveness of HMNB, HCNB, and HOVA.
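As a point of reference only, the snippet below trains the plain MNB baseline that HMNB extends, using scikit-learn on two 20 Newsgroups categories; the hidden-parent structure of HMNB (and of HCNB/HOVA) is not reproduced here.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

categories = ["sci.space", "rec.autos"]
train = fetch_20newsgroups(subset="train", categories=categories)
test = fetch_20newsgroups(subset="test", categories=categories)

# Bag-of-words counts feeding a multinomial naive Bayes classifier.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(train.data, train.target)
print(f"MNB baseline accuracy: {clf.score(test.data, test.target):.3f}")
```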