Advances in Big Data and Machine Learning

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 March 2023) | Viewed by 23410

Special Issue Editor


Guest Editor
Department of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Taipei 10672, Taiwan
Interests: deep learning and big data; biometric recognition; information security; cloud and fault computing; multimedia applications; medical applications

Special Issue Information

Dear Colleagues,

Deep learning is a branch of machine learning. It applies linear or non-linear transforms in multiple processing layers to automatically extract features that sufficiently represent the characteristics of the data. In traditional machine learning, useful features are usually designed and analyzed by hand. Deep learning, by contrast, extracts features automatically, and this ability has made possible applications that earlier machine learning methods could not achieve. With the development of the Internet, more and more digital data are generated on social networks, so extracting useful knowledge from big data has become increasingly important. Among the many data mining methods, deep learning is now one of the mainstream approaches. Research papers on big data and deep learning are welcome for submission.

Prof. Shi-Jinn Horng
Guest Editor
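
As an illustration of the layered feature extraction described in the letter above, the following minimal sketch stacks two linear transforms, each followed by a non-linear ReLU step. The function names and the hand-picked weights are hypothetical and chosen only for this example; in real deep learning the weights are learned from data rather than written by hand.

```python
def linear(weights, bias, x):
    """Linear transform: one output per weight row, y_i = w_i . x + b_i."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    """Non-linear step: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def extract_features(layers, x):
    """Pass x through every (weights, bias) layer, applying ReLU after each."""
    for weights, bias in layers:
        x = relu(linear(weights, bias, x))
    return x


# Hypothetical hand-picked weights (an assumption for illustration only).
layers = [
    ([[1.0, -1.0], [-1.0, 1.0]], [0.0, 0.0]),  # layer 1: 2 inputs -> 2 hidden values
    ([[1.0, 1.0]], [0.0]),                     # layer 2: 2 hidden values -> 1 feature
]

# With these weights the two layers together compute |x1 - x2|,
# a feature that a single linear transform cannot represent.
features = extract_features(layers, [3.0, 1.0])
print(features)
```

The point of the sketch is only that composing simple transforms across layers yields a feature (here an absolute difference) that no single linear layer provides.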

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • big data
  • deep learning
  • feature extraction
  • data mining

Published Papers (8 papers)

Research

0 pages, 2861 KiB  
Article
RETRACTED: Continual Learning Approach for Continuous Data Stream Analysis in Dynamic Environments
by K. Prasanna, Mudassir Khan, Saeed M. Alshahrani, Ajmeera Kiran, P. Phanindra Kumar Reddy, Mofadal Alymani and J. Chinna Babu
Appl. Sci. 2023, 13(14), 8004; https://0-doi-org.brum.beds.ac.uk/10.3390/app13148004 - 08 Jul 2023
Cited by 2 | Viewed by 2239 | Retraction
Abstract
Continuous data stream analysis primarily focuses on the unanticipated changes in the transmission of data distribution over time. Conceptual change is defined as the signal distribution changes over the transmission of continuous data streams. A drift detection scenario is set forth to develop methods and strategies for detecting, interpreting, and adapting to conceptual changes over data streams. Machine learning approaches can produce poor learning outcomes in the conceptual change environment if the sudden change is not addressed. Furthermore, due to developments in concept drift, learning methodologies have been significantly systematic in recent years. The research introduces a novel approach using the fully connected committee machine (FCM) and different activation functions to address conceptual changes in continuous data streams. It explores scenarios of continual learning and investigates the effects of over-learning and weight decay on concept drift. The findings demonstrate the effectiveness of the FCM framework and provide insights into improving machine learning approaches for continuous data stream analysis. We used a layered neural network framework to experiment with different scenarios of continual learning on continuous data streams in the presence of change in the data distribution using a fully connected committee machine (FCM). In this research, we conduct experiments in various scenarios using a layered neural network framework, specifically the fully connected committee machine (FCM), to address conceptual changes in continuous data streams for continual learning under a conceptual change in the data distribution. Sigmoidal and ReLU (Rectified Linear Unit) activation functions are considered for learning regression in layered neural networks. When the layered framework is trained from the input data stream, the regression scheme changes consciously in all scenarios. A fully connected committee machine (FCM) is trained to perform the tasks described in continual learning with M hidden units on dynamically generated inputs. In this method, we run Monte Carlo simulations with the same number of units on both sides, K and M, to define the advancement of intersections between several hidden units and the calculation of generalization error. This is applied to over-learnability as a method of over-forgetting, integrating weight decay, and examining its effects when a concept drift is presented.
(This article belongs to the Special Issue Advances in Big Data and Machine Learning)

16 pages, 2046 KiB  
Article
Feature Drift in Fake News Detection: An Interpretable Analysis
by Chenbo Fu, Xingyu Pan, Xuejiao Liang, Shanqing Yu, Xiaoke Xu and Yong Min
Appl. Sci. 2023, 13(1), 592; https://0-doi-org.brum.beds.ac.uk/10.3390/app13010592 - 01 Jan 2023
Cited by 1 | Viewed by 1653
Abstract
In recent years, fake news detection and its characteristics have attracted a number of researchers. However, most detection algorithms are driven by data rather than theories, which causes the existing approaches to perform well only on specific datasets. In the extreme case, several features perform well only on specific datasets. In this study, we first define feature drift in fake news detection methods, then demonstrate its existence and use interpretable models (i.e., Shapley Additive Explanations and Partial Dependency Plots) to verify it. Furthermore, by controlling the distribution of tweets’ creation times, a novel sampling method is proposed to explain the reason for feature drift. Finally, the Anchors method is used in this paper as a supplementary interpretation to further exhibit the potential characteristics of feature drift. Our work provides deep insights into the temporal patterns of fake news detection, proving that the model’s performance is also highly related to the distribution of datasets.

16 pages, 1108 KiB  
Article
A Novel Stream Mining Approach as Stream-Cluster Feature Tree Algorithm: A Case Study in Turkish Job Postings
by Yunus Doğan, Feriştah Dalkılıç, Alp Kut, Kemal Can Kara and Uygar Takazoğlu
Appl. Sci. 2022, 12(15), 7893; https://0-doi-org.brum.beds.ac.uk/10.3390/app12157893 - 06 Aug 2022
Viewed by 1275
Abstract
Large numbers of job postings with complex content can be found on the Internet at present. Therefore, analysis through natural language processing and machine learning techniques plays an important role in the evaluation of job postings. In this study, we propose a novel data structure and a novel algorithm that aim at effective storage and analysis, in data warehouses, of big and complex data such as job postings. State-of-the-art approaches in the literature, such as database queries, semantic networking, and clustering algorithms, were tested in this study to compare their results with those of the proposed approach using 100,000 Kariyer.net job postings in Turkish, an agglutinative language with a grammatical structure differing from that of other languages. The algorithm proposed in this study also utilizes stream logic. Considering the growth potential of job postings, this study aimed to recommend new sub-qualifications to advertisers for new job postings through the analysis of similar postings stored in the system. Finally, complexity and accuracy analyses demonstrate that the proposed approach, using the Cluster Feature approach, can obtain state-of-the-art results on Turkish job posting texts.

13 pages, 370 KiB  
Article
Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences
by Jiwon Hong, Dongho Jeong and Sang-Wook Kim
Appl. Sci. 2022, 12(8), 4088; https://0-doi-org.brum.beds.ac.uk/10.3390/app12084088 - 18 Apr 2022
Cited by 1 | Viewed by 2283
Abstract
Cyberattacks are widely carried out using malicious documents. A malicious document is an electronic document containing malicious code along with some human-readable plain-text data. In this paper, we propose a novel framework that takes advantage of such plain-text data to determine whether a given document is malicious. We extracted plain-text features from a corpus of electronic documents and utilized them to train a classification model for detecting malicious documents. Our extensive experimental results with different combinations of three well-known vectorization strategies and three popular classification methods on five types of electronic documents demonstrate that our framework provides high prediction accuracy in detecting malicious documents.

17 pages, 5148 KiB  
Article
Building Unmanned Store Identification Systems Using YOLOv4 and Siamese Network
by Shi-Jinn Horng and Pin-Siang Huang
Appl. Sci. 2022, 12(8), 3826; https://0-doi-org.brum.beds.ac.uk/10.3390/app12083826 - 10 Apr 2022
Cited by 3 | Viewed by 2034
Abstract
Labor is the largest expense in retail stores. To increase the profit of retail stores, unmanned stores could be a solution for reducing labor cost. Deep learning is well suited to tasks such as recognition and classification; in particular, it has high accuracy and can be implemented in real time. In this paper, we use multiple deep learning models to solve the problems often encountered in unmanned stores. Instead of using multiple different sensors, only five cameras are used as sensors to build a high-accuracy, low-cost unmanned store; to make full use of space, we then propose a method for calculating stacked goods, so that the space can be used effectively. For checkout, without a checkout counter, we use a Siamese network combined with the deep learning model to directly and instantly identify purchased products. To protect the store from theft, we propose a new architecture that can detect possible theft from any angle of the store and prevent unnecessary financial losses in unmanned stores. As all the customers’ buying records are identified and recorded in the server, they can be used to identify the popularity of each product. In particular, this can reduce the stock of unpopular products and reduce inventory.

22 pages, 25305 KiB  
Article
Automatic Hate Speech Detection in English-Odia Code Mixed Social Media Data Using Machine Learning Techniques
by Sudhir Kumar Mohapatra, Srinivas Prasad, Dwiti Krishna Bebarta, Tapan Kumar Das, Kathiravan Srinivasan and Yuh-Chung Hu
Appl. Sci. 2021, 11(18), 8575; https://0-doi-org.brum.beds.ac.uk/10.3390/app11188575 - 15 Sep 2021
Cited by 14 | Viewed by 5164
Abstract
Hate speech on social media may spread quickly among online users and may subsequently escalate into local violence and heinous crimes. This paper proposes a hate speech detection model by means of machine learning and text mining feature extraction techniques. In this study, the authors collected English-Odia code-mixed hate speech data from a Facebook public page and manually organized them into three classes. To build both binary and ternary datasets, the three-class data are further converted into binary classes. The modeling of hate speech combines a machine learning algorithm with feature extraction. Support vector machine (SVM), naïve Bayes (NB) and random forest (RF) models were trained using the whole dataset, with features extracted from word unigrams, bigrams, trigrams, combined n-grams, term frequency-inverse document frequency (TF-IDF), combined n-grams weighted by TF-IDF, and word2vec for both datasets. Using the two datasets, we developed two kinds of models with each feature: binary models and ternary models. The models based on SVM with word2vec achieved better performance than the NB and RF models for both the binary and ternary categories. The results reveal that the ternary models showed less confusion between hate and non-hate speech than the binary models.

13 pages, 3499 KiB  
Article
Classification of Apple Disease Based on Non-Linear Deep Features
by Hamail Ayaz, Erick Rodríguez-Esparza, Muhammad Ahmad, Diego Oliva, Marco Pérez-Cisneros and Ram Sarkar
Appl. Sci. 2021, 11(14), 6422; https://0-doi-org.brum.beds.ac.uk/10.3390/app11146422 - 12 Jul 2021
Cited by 24 | Viewed by 3320
Abstract
Diseases in apple orchards (rot, scab, and blotch) worldwide cause a substantial loss in the agricultural industry. Traditional hand-picking methods are subjective and depend on human effort. Conventional machine learning methods for apple disease classification depend on hand-crafted features that are complex and not robust. Advanced artificial methods such as Convolutional Neural Networks (CNNs) have become a promising way of achieving higher accuracy, although they need a high volume of samples. This work investigates different Deep CNN (DCNN) applications to apple disease classification using deep generative images to obtain higher accuracy. To achieve this, our work progressively modifies a baseline model by using an end-to-end trained DCNN model that has fewer parameters and better recognition accuracy than existing models (i.e., ResNet, SqueezeNet, and MiniVGGNet). We have performed a comparative study with state-of-the-art CNNs as well as conventional methods proposed in the literature, and the comparative results confirm the superiority of our proposed model.

Other

2 pages, 145 KiB  
Retraction
RETRACTED: Prasanna et al. Continual Learning Approach for Continuous Data Stream Analysis in Dynamic Environments. Appl. Sci. 2023, 13, 8004
by K. Prasanna, Mudassir Khan, Saeed M. Alshahrani, Ajmeera Kiran, P. Phanindra Kumar Reddy, Mofadal Alymani and J. Chinna Babu
Appl. Sci. 2024, 14(2), 476; https://0-doi-org.brum.beds.ac.uk/10.3390/app14020476 - 05 Jan 2024
Viewed by 3649
Abstract
The journal retracts the article “Continual Learning Approach for Continuous Data Stream Analysis in Dynamic Environments” [...]