Research

0 pages, 2861 KiB

Open Access Article

RETRACTED: Continual Learning Approach for Continuous Data Stream Analysis in Dynamic Environments

by K. Prasanna, Mudassir Khan, Saeed M. Alshahrani, Ajmeera Kiran, P. Phanindra Kumar Reddy, Mofadal Alymani and J. Chinna Babu

Appl. Sci. 2023, 13(14), 8004; https://0-doi-org.brum.beds.ac.uk/10.3390/app13148004 - 08 Jul 2023

Cited by 2 | Viewed by 2239 | Retraction

Abstract

Continuous data stream analysis primarily focuses on unanticipated changes in the data distribution over time. Conceptual change is defined as a change in the signal distribution over the transmission of continuous data streams. A drift detection scenario is set forth to develop methods and strategies for detecting, interpreting, and adapting to conceptual changes over data streams. Machine learning approaches can produce poor learning outcomes in a conceptual change environment if a sudden change is not addressed. Furthermore, owing to developments in concept drift research, learning methodologies have become considerably more systematic in recent years. This research introduces a novel approach that uses the fully connected committee machine (FCM) and different activation functions to address conceptual changes in continuous data streams. It explores scenarios of continual learning and investigates the effects of over-learning and weight decay on concept drift. The findings demonstrate the effectiveness of the FCM framework and provide insights into improving machine learning approaches for continuous data stream analysis. Using a layered neural network framework, specifically the FCM, we conduct experiments in various continual learning scenarios on continuous data streams in the presence of a change in the data distribution. Sigmoidal and ReLU (Rectified Linear Unit) activation functions are considered for learning regression in layered neural networks. When the layered framework is trained from the input data stream, the regression scheme changes consciously in all scenarios. An FCM with M hidden units is trained on dynamically generated inputs to perform the tasks described in continual learning. In this method, we run Monte Carlo simulations with the same number of units on both sides, K and M, to define the advancement of intersections between several hidden units and the calculation of the generalization error. This is applied to over-learnability as a method of over-forgetting, integrating weight decay, and examining its effects when a concept drift is presented.

(This article belongs to the Special Issue Advances in Big Data and Machine Learning)

16 pages, 2046 KiB

Open Access Article

Feature Drift in Fake News Detection: An Interpretable Analysis

by Chenbo Fu, Xingyu Pan, Xuejiao Liang, Shanqing Yu, Xiaoke Xu and Yong Min

Appl. Sci. 2023, 13(1), 592; https://0-doi-org.brum.beds.ac.uk/10.3390/app13010592 - 01 Jan 2023

Cited by 1 | Viewed by 1653

Abstract

In recent years, fake news detection and its characteristics have attracted a number of researchers. However, most detection algorithms are driven by data rather than theories, which causes the existing approaches to perform well only on specific datasets. In the extreme case, some features perform well only on specific datasets. In this study, we first define feature drift in fake news detection methods, then demonstrate its existence and use interpretable models (i.e., Shapley Additive Explanations and Partial Dependency Plots) to verify it. Furthermore, by controlling the distribution of tweets’ creation times, a novel sampling method is proposed to explain the reason for feature drift. Finally, the Anchors method is used in this paper as a supplementary interpretation to further exhibit the potential characteristics of feature drift. Our work provides deep insights into the temporal patterns of fake news detection, proving that the model’s performance is also highly related to the distribution of datasets.

(This article belongs to the Special Issue Advances in Big Data and Machine Learning)

16 pages, 1108 KiB

Open Access Article

A Novel Stream Mining Approach as Stream-Cluster Feature Tree Algorithm: A Case Study in Turkish Job Postings

by Yunus Doğan, Feriştah Dalkılıç, Alp Kut, Kemal Can Kara and Uygar Takazoğlu

Appl. Sci. 2022, 12(15), 7893; https://0-doi-org.brum.beds.ac.uk/10.3390/app12157893 - 06 Aug 2022

Viewed by 1275

Abstract

Large numbers of job postings with complex content can be found on the Internet at present. Therefore, analysis through natural language processing and machine learning techniques plays an important role in the evaluation of job postings. In this study, we propose a novel data structure and a novel algorithm that aim at the effective storage and analysis, in data warehouses, of big and complex data such as job postings. State-of-the-art approaches in the literature, such as database queries, semantic networking, and clustering algorithms, were tested in this study to compare their results with those of the proposed approach using 100,000 Kariyer.net job postings in Turkish, an agglutinative language whose grammatical structure differs from that of other languages. The algorithm proposed in this study also utilizes stream logic. Considering the growth potential of job postings, this study aimed to recommend new sub-qualifications to advertisers for new job postings through the analysis of similar postings stored in the system. Finally, complexity and accuracy analyses demonstrate that the proposed approach, using the Cluster Feature approach, can obtain state-of-the-art results on Turkish job posting texts.

(This article belongs to the Special Issue Advances in Big Data and Machine Learning)
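
As a rough illustration of the cluster feature idea mentioned in the abstract above, the sketch below assumes the classic BIRCH-style summary (point count, linear sum, sum of squared norms). It is not the paper's Stream-Cluster Feature Tree algorithm, whose details the abstract does not give; the class name `ClusterFeature` and the toy points are hypothetical.

```python
from dataclasses import dataclass, field
from math import sqrt


@dataclass
class ClusterFeature:
    """Assumed BIRCH-style summary: count, linear sum, sum of squared norms."""
    n: int = 0
    linear_sum: list = field(default_factory=list)
    square_sum: float = 0.0

    def add_point(self, point):
        """Absorb one streamed point without storing the point itself."""
        if not self.linear_sum:
            self.linear_sum = [0.0] * len(point)
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, point)]
        self.square_sum += sum(x * x for x in point)

    def merge(self, other):
        """Summaries are additive, so merging is a component-wise sum."""
        if self.n == 0:
            return ClusterFeature(other.n, list(other.linear_sum), other.square_sum)
        if other.n == 0:
            return ClusterFeature(self.n, list(self.linear_sum), self.square_sum)
        return ClusterFeature(
            self.n + other.n,
            [a + b for a, b in zip(self.linear_sum, other.linear_sum)],
            self.square_sum + other.square_sum,
        )

    def centroid(self):
        """Mean of all absorbed points, recovered from the summary alone."""
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        """Root mean squared distance of the absorbed points from the centroid."""
        c = self.centroid()
        variance = self.square_sum / self.n - sum(x * x for x in c)
        return sqrt(max(variance, 0.0))


# Hypothetical toy data: two small clusters summarized, then merged.
cf_a = ClusterFeature()
for p in [(1.0, 2.0), (3.0, 4.0)]:
    cf_a.add_point(p)
cf_b = ClusterFeature()
cf_b.add_point((5.0, 6.0))
merged = cf_a.merge(cf_b)
print(merged.n, merged.centroid(), merged.radius())
```

Because each summary has a fixed size and can be updated or merged in time proportional to the number of dimensions, this kind of structure suits stream settings where raw records cannot all be kept in memory.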

13 pages, 370 KiB

Open Access Article

Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences

by Jiwon Hong, Dongho Jeong and Sang-Wook Kim

Appl. Sci. 2022, 12(8), 4088; https://0-doi-org.brum.beds.ac.uk/10.3390/app12084088 - 18 Apr 2022

Cited by 1 | Viewed by 2283

Abstract

Cyberattacks are widely carried out using malicious documents. A malicious document is an electronic document containing malicious code along with some human-readable plain-text data. In this paper, we propose a novel framework that takes advantage of such plain-text data to determine whether a given document is malicious. We extracted plain-text features from a corpus of electronic documents and utilized them to train a classification model for detecting malicious documents. Our extensive experimental results with different combinations of three well-known vectorization strategies and three popular classification methods on five types of electronic documents demonstrate that our framework provides high prediction accuracy in detecting malicious documents.

(This article belongs to the Special Issue Advances in Big Data and Machine Learning)

17 pages, 5148 KiB

Open Access Article

Building Unmanned Store Identification Systems Using YOLOv4 and Siamese Network

by Shi-Jinn Horng and Pin-Siang Huang

Appl. Sci. 2022, 12(8), 3826; https://0-doi-org.brum.beds.ac.uk/10.3390/app12083826 - 10 Apr 2022

Cited by 3 | Viewed by 2034

Abstract

Labor is the largest expense in retail stores. To increase the profit of retail stores, unmanned stores could be a solution for reducing labor cost. Deep learning is well suited to recognition, classification, and similar tasks; in particular, it has high accuracy and can be implemented in real time. In this paper, we use multiple deep learning models to solve the problems often encountered in unmanned stores. Instead of using multiple different sensors, only five cameras are used as sensors to build a high-accuracy, low-cost unmanned store; to make full use of space, we then propose a method for calculating stacked goods. For checkout, without a checkout counter, we use a Siamese network combined with the deep learning model to identify purchased products directly and instantly. To protect the store from theft, a new architecture is proposed that can detect possible theft from any angle of the store and prevent unnecessary financial losses in unmanned stores. As all the customers’ buying records are identified and recorded on the server, they can be used to identify the popularity of each product. In particular, this can reduce the stock of unpopular products and reduce inventory.

(This article belongs to the Special Issue Advances in Big Data and Machine Learning)

22 pages, 25305 KiB

Open Access Article

Automatic Hate Speech Detection in English-Odia Code Mixed Social Media Data Using Machine Learning Techniques

by Sudhir Kumar Mohapatra, Srinivas Prasad, Dwiti Krishna Bebarta, Tapan Kumar Das, Kathiravan Srinivasan and Yuh-Chung Hu

Appl. Sci. 2021, 11(18), 8575; https://0-doi-org.brum.beds.ac.uk/10.3390/app11188575 - 15 Sep 2021

Cited by 14 | Viewed by 5164

Abstract

Hate speech on social media may spread quickly through online users and may subsequently even escalate into local vile violence and heinous crimes. This paper proposes a hate speech detection model by means of machine learning and text mining feature extraction techniques. In this study, the authors collected hate speech in English-Odia code-mixed data from a public Facebook page and manually organized it into three classes. In order to build binary and ternary datasets, the data are further converted into binary classes. The modeling of hate speech employs the combination of a machine learning algorithm and feature extraction. Support vector machine (SVM), naïve Bayes (NB) and random forest (RF) models were trained using the whole dataset, with features extracted from word unigrams, bigrams, trigrams, combined n-grams, term frequency-inverse document frequency (TF-IDF), combined n-grams weighted by TF-IDF, and word2vec for both datasets. Using the two datasets, we developed two kinds of models with each feature: binary models and ternary models. The models based on SVM with word2vec achieved better performance than the NB and RF models for both the binary and ternary categories. The results reveal that the ternary models achieved less confusion between hate and non-hate speech than the binary models.

(This article belongs to the Special Issue Advances in Big Data and Machine Learning)
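
To make the "combined n-grams weighted by TF-IDF" feature named in the abstract above more concrete, here is a minimal sketch. The abstract does not specify the tokenizer or the TF-IDF variant, so whitespace tokenization, raw term frequency, and a smoothed inverse document frequency are all assumptions; the function names and the toy sentences are hypothetical, and the classifier stage (SVM, NB, or RF) is not shown.

```python
import math
from collections import Counter


def word_ngrams(text, n_min=1, n_max=3):
    """Return all word n-grams of length n_min..n_max (whitespace tokens assumed)."""
    tokens = text.lower().split()
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_min=1, n_max=3):
    """Map each document to a sparse {n-gram: weight} dictionary."""
    counts = [Counter(word_ngrams(d, n_min, n_max)) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    n_docs = len(docs)
    vectors = []
    for c in counts:
        vec = {}
        for gram, tf in c.items():
            # Assumed smoothed IDF; other variants are equally plausible.
            idf = math.log((1 + n_docs) / (1 + doc_freq[gram])) + 1.0
            vec[gram] = tf * idf
        vectors.append(vec)
    return vectors


# Hypothetical toy corpus, not data from the paper.
docs = ["you are great", "you are awful", "great day today"]
vectors = tfidf_vectors(docs)
for gram, weight in sorted(vectors[1].items()):
    print(f"{gram!r}: {weight:.3f}")
```

In this toy corpus, "awful" occurs in only one document and therefore receives a higher weight than "you are", which occurs in two; such weighted vectors would then be passed to a classifier.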

13 pages, 3499 KiB

Open Access Feature Paper Article

Classification of Apple Disease Based on Non-Linear Deep Features

by Hamail Ayaz, Erick Rodríguez-Esparza, Muhammad Ahmad, Diego Oliva, Marco Pérez-Cisneros and Ram Sarkar

Appl. Sci. 2021, 11(14), 6422; https://0-doi-org.brum.beds.ac.uk/10.3390/app11146422 - 12 Jul 2021

Cited by 24 | Viewed by 3320

Abstract

Diseases in apple orchards (rot, scab, and blotch) cause substantial losses in the agricultural industry worldwide. Traditional hand-picking methods are subjective and depend on human effort. Conventional machine learning methods for apple disease classification depend on hand-crafted features that are complex and not robust. Advanced methods such as Convolutional Neural Networks (CNNs) have become a promising way of achieving higher accuracy, although they need a high volume of samples. This work investigates different Deep CNN (DCNN) applications to apple disease classification using deep generative images to obtain higher accuracy. To achieve this, our work progressively modifies a baseline model by using an end-to-end trained DCNN model that has fewer parameters and better recognition accuracy than existing models (i.e., ResNet, SqueezeNet, and MiniVGGNet). We have performed a comparative study with state-of-the-art CNNs as well as conventional methods proposed in the literature, and the comparative results confirm the superiority of our proposed model.

(This article belongs to the Special Issue Advances in Big Data and Machine Learning)

Other

2 pages, 145 KiB

Open Access Retraction

RETRACTED: Prasanna et al. Continual Learning Approach for Continuous Data Stream Analysis in Dynamic Environments. Appl. Sci. 2023, 13, 8004

by K. Prasanna, Mudassir Khan, Saeed M. Alshahrani, Ajmeera Kiran, P. Phanindra Kumar Reddy, Mofadal Alymani and J. Chinna Babu

Appl. Sci. 2024, 14(2), 476; https://0-doi-org.brum.beds.ac.uk/10.3390/app14020476 - 05 Jan 2024

Viewed by 3649

Abstract

The journal retracts the article “Continual Learning Approach for Continuous Data Stream Analysis in Dynamic Environments” [...]

(This article belongs to the Special Issue Advances in Big Data and Machine Learning)
