Research

23 pages, 1326 KiB

Open AccessArticle

UAV Dynamic Object Tracking with Lightweight Deep Vision Reinforcement Learning

by Hy Nguyen, Srikanth Thudumu, Hung Du, Kon Mouzakis and Rajesh Vasa

Algorithms 2023, 16(5), 227; https://0-doi-org.brum.beds.ac.uk/10.3390/a16050227 - 27 Apr 2023

Cited by 2 | Viewed by 1800

Several approaches have applied Deep Reinforcement Learning (DRL) to Unmanned Aerial Vehicles (UAVs) to do autonomous object tracking. These methods, however, are resource intensive and require prior knowledge of the environment, making them difficult to use in real-world applications. In this paper, we [...] Read more.

Several approaches have applied Deep Reinforcement Learning (DRL) to Unmanned Aerial Vehicles (UAVs) to do autonomous object tracking. These methods, however, are resource intensive and require prior knowledge of the environment, making them difficult to use in real-world applications. In this paper, we propose a Lightweight Deep Vision Reinforcement Learning (LDVRL) framework for dynamic object tracking that uses the camera as the only input source. Our framework employs several techniques such as stacks of frames, segmentation maps from the simulation, and depth images to reduce the overall computational cost. We conducted the experiment with a non-sparse Deep Q-Network (DQN) (value-based) and a Deep Deterministic Policy Gradient (DDPG) (actor-critic) to test the adaptability of our framework with different methods and identify which DRL method is the most suitable for this task. In the end, a DQN is chosen for several reasons. Firstly, a DQN has fewer networks than a DDPG, hence reducing the computational resources on physical UAVs. Secondly, it is surprising that although a DQN is smaller in model size than a DDPG, it still performs better in this specific task. Finally, a DQN is very practical for this task due to the ability to operate in continuous state space. Using a high-fidelity simulation environment, our proposed approach is verified to be effective. Full article

(This article belongs to the Special Issue Machine Learning in Pattern Recognition)

► Show Figures

Figure 1

19 pages, 1873 KiB

Open AccessArticle

JointContrast: Skeleton-Based Interaction Recognition with New Representation and Contrastive Learning

by Ji Zhang, Xiangze Jia, Zhen Wang, Yonglong Luo, Fulong Chen, Gaoming Yang and Lihui Zhao

Algorithms 2023, 16(4), 190; https://0-doi-org.brum.beds.ac.uk/10.3390/a16040190 - 30 Mar 2023

Cited by 2 | Viewed by 1422

Abstract

Skeleton-based action recognition depends on skeleton sequences to detect categories of human actions. In skeleton-based action recognition, the recognition of action scenes with more than one subject is named as interaction recognition. Different from the single-subject action recognition methods, interaction recognition requires an [...] Read more.

Skeleton-based action recognition depends on skeleton sequences to detect categories of human actions. In skeleton-based action recognition, the recognition of action scenes with more than one subject is named as interaction recognition. Different from the single-subject action recognition methods, interaction recognition requires an explicit representation of the interaction information between subjects. Recalling the success of skeletal graph representation and graph convolution in modeling the spatial structural information of skeletal data, we consider whether we can embed the inter-subject interaction information into the skeletal graph and use graph convolution for a unified feature representation. In this paper, we propose the interaction information embedding skeleton graph representation (IE-Graph) and use the graph convolution operation to represent the intra-subject spatial structure information and inter-subject interaction information in a uniform manner. Inspired by recent pre-training methods in 2D vision, we propose unsupervised pre-training methods for skeletal data as well as contrast loss. In SBU datasets, JointContrast achieves

98.2 %

recognition accuracy. in NTU60 datasets, JointContrast respectively achieves

94.1 %

and

96.8 %

recognition accuracy under Cross-Subject and Cross-View evaluation metrics. Full article

(This article belongs to the Special Issue Machine Learning in Pattern Recognition)

► Show Figures

Figure 1

21 pages, 2472 KiB

Open AccessArticle

RUemo—The Classification Framework for Russia-Ukraine War-Related Societal Emotions on Twitter through Machine Learning

by Piyush Vyas, Gitika Vyas and Gaurav Dhiman

Algorithms 2023, 16(2), 69; https://0-doi-org.brum.beds.ac.uk/10.3390/a16020069 - 20 Jan 2023

Cited by 17 | Viewed by 4526

Abstract

The beginning of this decade brought utter international chaos with the COVID-19 pandemic and the Russia-Ukraine war (RUW). The ongoing war has been building pressure across the globe. People have been showcasing their opinions through different communication media, of which social media is [...] Read more.

The beginning of this decade brought utter international chaos with the COVID-19 pandemic and the Russia-Ukraine war (RUW). The ongoing war has been building pressure across the globe. People have been showcasing their opinions through different communication media, of which social media is the prime source. Consequently, it is important to analyze people’s emotions toward the RUW. This paper therefore aims to provide the framework for automatically classifying the distinct societal emotions on Twitter, utilizing the amalgamation of Emotion Robustly Optimized Bidirectional Encoder Representations from the Transformers Pre-training Approach (Emoroberta) and machine-learning (ML) techniques. This combination shows the originality of our proposed framework, i.e., Russia-Ukraine War emotions (RUemo), in the context of the RUW. We have utilized the Twitter dataset related to the RUW available on Kaggle.com. The RUemo framework can extract the 27 distinct emotions of Twitter users that are further classified by ML techniques. We have achieved 95% of testing accuracy for multilayer perceptron and logistic regression ML techniques for the multiclass emotion classification task. Our key finding indicates that:First, 81% of Twitter users in the survey show a neutral position toward RUW; second, there is evidence of social bots posting RUW-related tweets; third, other than Russia and Ukraine, users mentioned countries such as Slovakia and the USA; and fourth, the Twitter accounts of the Ukraine President and the US President are also mentioned by Twitter users. Overall, the majority of tweets describe the RUW in key terms related more to Ukraine than to Russia. Full article

(This article belongs to the Special Issue Machine Learning in Pattern Recognition)

► Show Figures

Figure 1

32 pages, 5442 KiB

Open AccessArticle

Digital Authorship Attribution in Russian-Language Fanfiction and Classical Literature

by Anastasia Fedotova, Aleksandr Romanov, Anna Kurtukova and Alexander Shelupanov

Algorithms 2023, 16(1), 13; https://0-doi-org.brum.beds.ac.uk/10.3390/a16010013 - 26 Dec 2022

Cited by 4 | Viewed by 1917

Abstract

This article is the third paper in a series aimed at the establishment of the authorship of Russian-language texts. This paper considers methods for determining the authorship of classical Russian literary texts, as well as fanfiction texts. The process of determining the author [...] Read more.

This article is the third paper in a series aimed at the establishment of the authorship of Russian-language texts. This paper considers methods for determining the authorship of classical Russian literary texts, as well as fanfiction texts. The process of determining the author was first considered in the classical version of classification experiments using a closed set of authors, and experiments were also completed for a complicated modification of the problem using an open set of authors. The use of methods to identify the author of the text is justified by the conclusions about the effectiveness of the fastText and Support Vector Machine (SVM) methods with the selection of informative features discussed in our past studies. In the case of open attribution, the proposed methods are based on the author’s combination of fastText and One-Class SVM as well as statistical estimates of a vector’s similarity measures. The feature selection algorithm for a closed set of authors is chosen based on a comparison of five different selection methods, including the previously considered genetic algorithm as a baseline. The regularization-based algorithm (RbFS) was found to be the most efficient method, while methods based on a complete enumeration (FFS and SFS) are found to be ineffective for any set of authors. The accuracy of the RbFS and SVM methods in the case of classical literary texts averaged 83%, which outperforms other selection methods by 3 to 10% for an identical number of features, and the average accuracy of fastText was 84%. For the open attribution in cross-topic classification, the average accuracy of the method based on the combination of One-Class SVM with RbFS and fastText was 85%, and for in-group classification, it was 75 to 78%, depending on the group, which is the best result among the open attribution methods considered. Full article

(This article belongs to the Special Issue Machine Learning in Pattern Recognition)

► Show Figures

Figure 1

16 pages, 1293 KiB

Open AccessArticle

SAPBERT: Speaker-Aware Pretrained BERT for Emotion Recognition in Conversation

by Seunguook Lim and Jihie Kim

Algorithms 2023, 16(1), 8; https://0-doi-org.brum.beds.ac.uk/10.3390/a16010008 - 22 Dec 2022

Cited by 2 | Viewed by 2762

Abstract

Emotion recognition in conversation (ERC) is receiving more and more attention, as interactions between humans and machines increase in a variety of services such as chat-bot and virtual assistants. As emotional expressions within a conversation can heavily depend on the contextual information of [...] Read more.

Emotion recognition in conversation (ERC) is receiving more and more attention, as interactions between humans and machines increase in a variety of services such as chat-bot and virtual assistants. As emotional expressions within a conversation can heavily depend on the contextual information of the participating speakers, it is important to capture self-dependency and inter-speaker dynamics. In this study, we propose a new pre-trained model, SAPBERT, that learns to identify speakers in a conversation to capture the speaker-dependent contexts and address the ERC task. SAPBERT is pre-trained with three training objectives including Speaker Classification (SC), Masked Utterance Regression (MUR), and Last Utterance Generation (LUG). We investigate whether our pre-trained speaker-aware model can be leveraged for capturing speaker-dependent contexts for ERC tasks. Experiments show that our proposed approach outperforms baseline models through demonstrating the effectiveness and validity of our method. Full article

(This article belongs to the Special Issue Machine Learning in Pattern Recognition)

► Show Figures

Figure 1

17 pages, 2132 KiB

Open AccessArticle

Deep Learning Models for Yoga Pose Monitoring

by Debabrata Swain, Santosh Satapathy, Biswaranjan Acharya, Madhu Shukla, Vassilis C. Gerogiannis, Andreas Kanavos and Dimitris Giakovis

Algorithms 2022, 15(11), 403; https://0-doi-org.brum.beds.ac.uk/10.3390/a15110403 - 31 Oct 2022

Cited by 13 | Viewed by 4821

Abstract

Activity recognition is the process of continuously monitoring a person’s activity and movement. Human posture recognition can be utilized to assemble a self-guidance practice framework that permits individuals to accurately learn and rehearse yoga postures without getting help from anyone else. With the [...] Read more.

Activity recognition is the process of continuously monitoring a person’s activity and movement. Human posture recognition can be utilized to assemble a self-guidance practice framework that permits individuals to accurately learn and rehearse yoga postures without getting help from anyone else. With the use of deep learning algorithms, we propose an approach for the efficient detection and recognition of various yoga poses. The chosen dataset consists of 85 videos with 6 yoga postures performed by 15 participants, where the keypoints of users are extracted using the Mediapipe library. A combination of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) has been employed for yoga pose recognition through real-time monitored videos as a deep learning model. Specifically, the CNN layer is used for the extraction of features from the keypoints and the following LSTM layer understands the occurrence of sequence of frames for predictions to be implemented. In following, the poses are classified as correct or incorrect; if a correct pose is identified, then the system will provide user the corresponding feedback through text/speech. This paper combines machine learning foundations with data structures as the synergy between these two areas can be established in the sense that machine learning techniques and especially deep learning can efficiently recognize data schemas and make them interoperable. Full article

(This article belongs to the Special Issue Machine Learning in Pattern Recognition)

► Show Figures

Figure 1

22 pages, 5031 KiB

Open AccessArticle

Comparing Approaches for Explaining DNN-Based Facial Expression Classifications

by Kaya ter Burg and Heysem Kaya

Algorithms 2022, 15(10), 367; https://0-doi-org.brum.beds.ac.uk/10.3390/a15100367 - 03 Oct 2022

Cited by 5 | Viewed by 1973

Abstract

Classifying facial expressions is a vital part of developing systems capable of aptly interacting with users. In this field, the use of deep-learning models has become the standard. However, the inner workings of these models are unintelligible, which is an important issue when [...] Read more.

Classifying facial expressions is a vital part of developing systems capable of aptly interacting with users. In this field, the use of deep-learning models has become the standard. However, the inner workings of these models are unintelligible, which is an important issue when deploying them to high-stakes environments. Recent efforts to generate explanations for emotion classification systems have been focused on this type of models. In this work, an alternative way of explaining the decisions of a more conventional model based on geometric features is presented. We develop a geometric-features-based deep neural network (DNN) and a convolutional neural network (CNN). Ensuring a sufficient level of predictive accuracy, we analyze explainability using both objective quantitative criteria and a user study. Results indicate that the fidelity and accuracy scores of the explanations approximate the DNN well. From the performed user study, it becomes clear that the explanations increase the understanding of the DNN and that they are preferred over the explanations for the CNN, which are more commonly used. All scripts used in the study are publicly available. Full article

(This article belongs to the Special Issue Machine Learning in Pattern Recognition)

► Show Figures

Figure 1

21 pages, 716 KiB

Open AccessArticle

Towards Sentiment Analysis for Romanian Twitter Content

by Dan Claudiu Neagu, Andrei Bogdan Rus, Mihai Grec, Mihai Augustin Boroianu, Nicolae Bogdan and Attila Gal

Algorithms 2022, 15(10), 357; https://0-doi-org.brum.beds.ac.uk/10.3390/a15100357 - 28 Sep 2022

Cited by 4 | Viewed by 2444

Abstract

With the increased popularity of social media platforms such as Twitter or Facebook, sentiment analysis (SA) over the microblogging content becomes of crucial importance. The literature reports good results for well-resourced languages such as English, Spanish or German, but open research space still [...] Read more.

With the increased popularity of social media platforms such as Twitter or Facebook, sentiment analysis (SA) over the microblogging content becomes of crucial importance. The literature reports good results for well-resourced languages such as English, Spanish or German, but open research space still exists for underrepresented languages such as Romanian, where there is a lack of public training datasets or pretrained word embeddings. The majority of research on Romanian SA tackles the issue in a binary classification manner (positive vs. negative), using a single public dataset which consists of product reviews. In this paper, we respond to the need for a media surveillance project to possess a custom multinomial SA classifier for usage in a restrictive and specific production setup. We describe in detail how such a classifier was built, with the help of an English dataset (containing around

15, 000

tweets) translated to Romanian with a public translation service. We test the most popular classification methods that could be applied to SA, including standard machine learning, deep learning and BERT. As we could not find any results for multinomial sentiment classification (positive, negative and neutral) in Romanian, we set two benchmark accuracies of ≈78% using standard machine learning and ≈81% using BERT. Furthermore, we demonstrate that the automatic translation service does not downgrade the learning performance by comparing the accuracies achieved by the models trained on the original dataset with the models trained on the translated data. Full article

(This article belongs to the Special Issue Machine Learning in Pattern Recognition)

► Show Figures

Figure 1

Journal Menu

Journal Browser

Machine Learning in Pattern Recognition

Share This Special Issue

Special Issue Editors

Special Issue Information

Keywords

Published Papers (8 papers)

Research

Further Information

Guidelines

MDPI Initiatives

Follow MDPI