Recent Trends in Natural Language Processing and Its Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 April 2023) | Viewed by 26960

Special Issue Editors


Guest Editor
School of Communication and Film, Hong Kong Baptist University, 224 Waterloo Rd., Kowloon Tong, Hong Kong
Interests: artificial intelligence; natural language processing; complex network analysis

Guest Editor
Department of Humanities and Social Sciences, University for Foreigners of Perugia, 06123 Perugia, Italy
Interests: natural language processing; evolutionary computation; computational optimization

Special Issue Information

Dear Colleagues,

Advances in Artificial Intelligence have led to significant achievements in tasks that are challenging even for humans. Among its many applications, Natural Language Processing (NLP) has recently been introduced into a variety of fields to tackle important tasks such as machine translation, natural language understanding, question answering, and fake news detection. Significant challenges remain, and new techniques are being introduced to solve these novel problems.

This Special Issue focuses on recent trends in NLP and its applications. The aim is to gather researchers from various fields and backgrounds who use NLP methods to solve a diverse range of problems. It is an opportunity for them to report their latest work and achievements and to bring new perspectives on future directions of the NLP field.

The topics of interest for this Special Issue include but are not limited to:

  • Natural language understanding and generation
  • Machine translation
  • Question answering and dialogue systems
  • Knowledge extraction and modelling
  • Knowledge-graph-based approaches
  • Text summarization and style transfer
  • Text classification, topic extraction, and discourse analysis
  • Fake news, misinformation, and disinformation detection
  • Echo chamber and polarization detection
  • Sentiment analysis, emotion recognition, and stance detection
  • Document analysis, information extraction, and text mining
  • Behaviour modelling
  • Multi- and cross-language approaches
  • Transfer learning in NLP
  • Text complexity
  • Linguistic analysis of texts
  • Topic modelling

Dr. Paolo Mengoni
Dr. Valentino Santucci
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website and then proceeding to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (13 papers)


Editorial


2 pages, 175 KiB  
Editorial
Special Issue “Recent Trends in Natural Language Processing and Its Applications”
by Paolo Mengoni and Valentino Santucci
Appl. Sci. 2023, 13(12), 7284; https://0-doi-org.brum.beds.ac.uk/10.3390/app13127284 - 19 Jun 2023
Cited by 1 | Viewed by 816
Abstract
The recent advancements in Artificial Intelligence have paved the way for remarkable achievements in tasks that have traditionally posed challenges even for humans [...] Full article
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)

Research


15 pages, 1112 KiB  
Article
Neural Network-Based Bilingual Lexicon Induction for Indonesian Ethnic Languages
by Kartika Resiandi, Yohei Murakami and Arbi Haza Nasution
Appl. Sci. 2023, 13(15), 8666; https://0-doi-org.brum.beds.ac.uk/10.3390/app13158666 - 27 Jul 2023
Viewed by 698
Abstract
Indonesia has a variety of ethnic languages, most of which belong to the same language family: the Austronesian languages. Due to the shared language family, words in Indonesian ethnic languages are very similar. However, previous research suggests that these Indonesian ethnic languages are endangered. To help prevent this, we propose the creation of bilingual dictionaries between ethnic languages, using a neural network approach to extract transformation rules, employing character-level embedding and the Bi-LSTM method in a sequence-to-sequence model. The model has an encoder and a decoder. The encoder reads the input sequence character by character and generates a context vector summarizing the input. The decoder produces the output sequence one character per timestep, with each output character conditioned on the previously generated character. The first experiment focuses on the Indonesian and Minangkabau languages, with 10,277 word pairs. To evaluate the model’s performance, five-fold cross-validation was used. The character-level seq2seq method (Bi-LSTM as the encoder and LSTM as the decoder), with an average precision of 83.92%, outperformed SentencePiece byte-pair encoding (vocabulary size of 33), with an average precision of 79.56%. Furthermore, to evaluate the neural network’s ability to find the pattern, a rule-based approach was used as the baseline; the neural network approach obtained 542 more correct translations than the baseline. We implemented the best setting (character-level embedding with Bi-LSTM as the encoder and LSTM as the decoder) for four other Indonesian ethnic languages: Malay, Palembang, Javanese, and Sundanese, whose input dictionaries are half the size. The average precision scores for these languages are 65.08%, 62.52%, 59.69%, and 58.46%, respectively. This shows that the neural network approach can identify transformation patterns from Indonesian to closely related languages (such as Malay and Palembang) better than to distantly related languages (such as Javanese and Sundanese). Full article
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)
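The transformation-rule idea behind the paper's rule-based baseline can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the word pairs below are a toy sample chosen to exhibit the well-known Indonesian final -a to Minangkabau -o correspondence:

```python
from collections import Counter

def common_prefix_len(a, b):
    """Length of the longest common prefix of two words."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def extract_suffix_rules(pairs):
    """Collect suffix-rewrite rules (src_suffix, tgt_suffix) from word pairs."""
    rules = Counter()
    for src, tgt in pairs:
        k = common_prefix_len(src, tgt)
        rules[(src[k:], tgt[k:])] += 1
    return rules

def apply_best_rule(word, rules):
    """Rewrite the longest matching source suffix using the most frequent rule."""
    for n in range(len(word), -1, -1):
        matches = {r: c for r, c in rules.items() if r[0] == word[n:]}
        if matches:
            src_sfx, tgt_sfx = max(matches, key=matches.get)
            return word[:n] + tgt_sfx
    return word

# Toy word pairs illustrating the Indonesian -a ~ Minangkabau -o correspondence
pairs = [("apa", "apo"), ("kata", "kato"), ("mata", "mato"), ("nama", "namo")]
rules = extract_suffix_rules(pairs)
```

The neural seq2seq model in the paper learns such correspondences implicitly at the character level rather than as explicit rules.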

17 pages, 1928 KiB  
Article
SA-SGRU: Combining Improved Self-Attention and Skip-GRU for Text Classification
by Yuan Huang, Xiaohong Dai, Junhao Yu and Zheng Huang
Appl. Sci. 2023, 13(3), 1296; https://0-doi-org.brum.beds.ac.uk/10.3390/app13031296 - 18 Jan 2023
Cited by 9 | Viewed by 2193
Abstract
When reading texts for text classification tasks, a large number of words are irrelevant, and the traditional self-attention mechanism suffers from limitations in its weight distribution. Therefore, this paper proposes a text classification model that combines an improved self-attention mechanism with a Skip-GRU (skip gated recurrent unit) network (SA-SGRU). Firstly, Skip-GRU, an enhanced version of the GRU (gated recurrent unit), skips content that is unimportant for text classification when reading texts and captures only effective global information. Then, the improved self-attention mechanism is introduced to redistribute the weights of the deep text sequences. Secondly, an optimized CNN (convolutional neural network) is combined to extract the local features of texts. Finally, a Softmax classifier produces the classification results for the sample labels. Experimental results show that the proposed method achieves better performance on three public datasets than other baseline methods. Ablation experiments also demonstrate the effectiveness of each module in the proposed model. Full article
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)
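The weight redistribution that self-attention performs can be illustrated with plain scaled dot-product attention in NumPy. The paper's improved mechanism is not detailed here, so this sketch shows only the standard form it builds on:

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of vectors.

    X: (seq_len, d) array of token representations.
    Returns the re-weighted representations and the attention weights.
    """
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                  # pairwise similarity scores
    scores -= scores.max(axis=1, keepdims=True)    # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ X, weights

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))                    # 5 tokens, 8-dim features
out, w = self_attention(X)
```

Each output row is a convex combination of all token vectors; an "improved" mechanism would change how the weight rows are computed or constrained.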

18 pages, 1198 KiB  
Article
A Cloud-Native Web Application for Assisted Metadata Generation and Retrieval: THESPIAN-NER
by Alessandro Bombini, Ahmad Alkhansa, Laura Cappelli, Achille Felicetti, Francesco Giacomini and Alessandro Costantini
Appl. Sci. 2022, 12(24), 12910; https://0-doi-org.brum.beds.ac.uk/10.3390/app122412910 - 15 Dec 2022
Cited by 3 | Viewed by 1584
Abstract
Within the context of the Competence Centre for the Conservation of Cultural Heritage (4CH) project, the design and deployment of a platform-as-a-service cloud infrastructure for the first European competence centre for cultural heritage (CH) has begun, and some web services have been integrated into the platform. The first integrated service is the INFN-CHNet web application for FAIR storage of scientific analyses on CH: THESPIAN-Mask. It is based on a CIDOC-CRM-compatible ontology, CRMhs, describing the scientific metadata. To ease the process of metadata generation and data injection, another web service has been developed: THESPIAN-NER. It is a tool based on a deep neural network for named entity recognition (NER), enabling users to upload their Italian-written report files and obtain labelled entities. Those entities are used as keywords either to (semi)automatically build custom queries for the database or to fill in (part of) the metadata form describing the file to be uploaded. The services have been made freely available in the 4CH PaaS cloud platform. Full article
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)

13 pages, 1469 KiB  
Article
Detecting Hateful and Offensive Speech in Arabic Social Media Using Transfer Learning
by Zakaria Boulouard, Mariya Ouaissa, Mariyam Ouaissa, Moez Krichen, Mutiq Almutiq and Karim Gasmi
Appl. Sci. 2022, 12(24), 12823; https://0-doi-org.brum.beds.ac.uk/10.3390/app122412823 - 14 Dec 2022
Cited by 8 | Viewed by 1952
Abstract
The democratization of access to the internet and social media has given every individual an opportunity to openly express his or her ideas and feelings. Unfortunately, this has also created room for extremist, racist, misogynist, and offensive opinions expressed as articles, posts, or comments. While controlling offensive speech in English-, Spanish-, and French-speaking social media communities and websites has reached a mature level, this is much less the case for their counterparts in Arabic-speaking countries. This paper presents a transfer learning solution to detect hateful and offensive speech on Arabic websites and social media platforms. It compares the performance of different BERT-based models trained to classify comments as either abusive or neutral. The training dataset contains comments in standard Arabic as well as four dialects; their English translations are also used for comparative purposes. The models were evaluated based on five metrics: accuracy, precision, recall, F1-score, and the confusion matrix. Full article
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)

14 pages, 1697 KiB  
Article
Applying a Character-Level Model to a Short Arabic Dialect Sentence: A Saudi Dialect as a Case Study
by Tahani Alqurashi
Appl. Sci. 2022, 12(23), 12435; https://0-doi-org.brum.beds.ac.uk/10.3390/app122312435 - 05 Dec 2022
Cited by 1 | Viewed by 1574
Abstract
Arabic dialect identification (ADI) has recently drawn considerable interest among researchers in the language recognition and natural language processing fields. This study investigated the use of a character-level model, which is effectively unrestricted in its vocabulary, to identify fine-grained Arabic dialects in the form of short written text. The four main Saudi dialects across the country were considered. The proposed ADI approach consists of five main phases: dialect data collection, data preprocessing and labelling, character-based feature extraction, character-based modelling (both deep learning and classical machine learning), and model performance evaluation. Several classical machine learning methods were applied to the dataset, including logistic regression, stochastic gradient descent, variations of the naive Bayes models, and support vector classification. For deep learning, a character-level convolutional neural network (CNN) was adapted with a bidirectional long short-term memory approach. The collected data were tested under various classification tasks, including two-, three-, and four-way ADI tasks. The results revealed that the classical machine learning algorithms outperformed the CNN approach. Moreover, term frequency–inverse document frequency combined with a character n-gram model ranging from unigrams to four-grams achieved the best performance among the tested parameters. Full article
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)
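The best-performing feature set, TF-IDF over character n-grams of orders one to four, is easy to sketch from scratch. This is a minimal illustration of the feature construction, not the study's pipeline (which would typically rely on a library implementation), and the two "documents" are toy strings:

```python
import math
from collections import Counter

def char_ngrams(text, n_min=1, n_max=4):
    """Count character n-grams of orders n_min..n_max in a string."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts

def char_tfidf(docs, n_min=1, n_max=4):
    """TF-IDF vectors (as dicts) over character n-grams for a list of strings."""
    tfs = [char_ngrams(d, n_min, n_max) for d in docs]
    df = Counter()                       # document frequency of each n-gram
    for tf in tfs:
        df.update(tf.keys())
    n_docs = len(docs)
    return [{g: tf[g] * math.log(n_docs / df[g]) for g in tf} for tf in tfs]

vecs = char_tfidf(["salam", "kalam"])    # two toy "documents"
```

N-grams shared by every document get a zero IDF weight, so only dialect-discriminating character patterns carry signal into the classifier.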

24 pages, 3158 KiB  
Article
Exploiting Stacked Autoencoders for Improved Sentiment Analysis
by Kanwal Ahmed, Muhammad Imran Nadeem, Dun Li, Zhiyun Zheng, Yazeed Yasin Ghadi, Muhammad Assam and Heba G. Mohamed
Appl. Sci. 2022, 12(23), 12380; https://0-doi-org.brum.beds.ac.uk/10.3390/app122312380 - 03 Dec 2022
Cited by 11 | Viewed by 1614
Abstract
Sentiment analysis is an ongoing research field within the discipline of data mining. The majority of academics employ deep learning models for sentiment analysis due to their ability to self-learn and process vast amounts of data. However, the performance of deep learning models depends on the values of their hyperparameters, and determining suitable values is a cumbersome task. The goal of this study is to increase the accuracy of stacked autoencoders for sentiment analysis using a heuristic optimization approach. We propose a hybrid model, GA(SAE)-SVM, combining a genetic algorithm (GA), a stacked autoencoder (SAE), and a support vector machine (SVM) for fine-grained sentiment analysis. Features are extracted using continuous bag-of-words (CBOW) and then input into the SAE, whose hyperparameters are optimized by the GA. The features extracted by the SAE are passed to the SVM for final classification. A comparison with random search and grid search for parameter optimization shows that GA optimization is faster than grid search and selects better values than random search, resulting in improved accuracy. We evaluate the proposed model on eight benchmark datasets, where it outperforms baseline and state-of-the-art techniques. Full article
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)
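The role of the GA in hyperparameter search can be conveyed with a minimal real-valued genetic algorithm. This is a generic sketch, not the paper's algorithm; the objective below is a toy stand-in for validation loss as a function of two hyperparameters, not an SAE:

```python
import random

def genetic_search(fitness, bounds, pop_size=20, generations=30, seed=0):
    """Minimal real-valued GA: keep the elite half, breed by blending, mutate."""
    rng = random.Random(seed)
    dim = len(bounds)
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        elite = sorted(pop, key=fitness)[:pop_size // 2]  # better half survives
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]   # blend crossover
            j = rng.randrange(dim)                        # mutate one gene
            lo, hi = bounds[j]
            child[j] = min(hi, max(lo, child[j] + rng.gauss(0, 0.1 * (hi - lo))))
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)

# Toy stand-in for "validation loss as a function of two hyperparameters"
loss = lambda h: (h[0] - 0.3) ** 2 + (h[1] - 0.7) ** 2
best = genetic_search(loss, bounds=[(0, 1), (0, 1)])
```

Unlike grid search, each generation spends its evaluations near previously good configurations, which is why a GA typically needs far fewer fitness evaluations.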

21 pages, 2745 KiB  
Article
Exploring Pandemics Events on Twitter by Using Sentiment Analysis and Topic Modelling
by Zhikang Qin and Elisabetta Ronchieri
Appl. Sci. 2022, 12(23), 11924; https://0-doi-org.brum.beds.ac.uk/10.3390/app122311924 - 22 Nov 2022
Cited by 6 | Viewed by 1876
Abstract
At the end of 2019, while the world was being hit by the COVID-19 virus and, consequently, living through a global health crisis, many other pandemics were putting humankind in danger. Social media play a paramount role in these contexts because they help health systems cope with emergencies, supporting activities such as the identification of public concerns, the detection of infection symptoms, and the tracing of virus diffusion. In this paper, we analysed comments on events related to cholera, Ebola, HIV/AIDS, influenza, malaria, Spanish influenza, swine flu, tuberculosis, typhus, yellow fever, and Zika, collecting 369,472 tweets from 3 March to 15 September 2022. Our analysis started with the collection of comments composed of unstructured texts, to which we applied natural language processing solutions. We then employed topic modelling and sentiment analysis techniques to obtain a picture of people’s concerns about and attitudes towards these pandemics. According to our findings, people’s discussions were mostly about malaria, influenza, and tuberculosis, and the focus was on the diseases themselves. As regards emotions, the most common were fear, trust, and disgust, with trust mainly relating to HIV/AIDS tweets. Full article
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)
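A minimal lexicon-based emotion counter conveys the flavour of emotion tagging over tweet collections. The lexicon below is invented purely for illustration (real studies typically use a curated resource such as NRC EmoLex), and this is not the tooling used in the paper:

```python
import re
from collections import Counter

# Tiny invented emotion lexicon, for illustration only
LEXICON = {
    "outbreak": "fear", "death": "fear", "scared": "fear",
    "vaccine": "trust", "doctor": "trust", "recover": "trust",
    "filthy": "disgust", "rotten": "disgust",
}

def emotion_profile(tweets):
    """Count lexicon emotions across a collection of tweets."""
    counts = Counter()
    for tweet in tweets:
        for token in re.findall(r"[a-z']+", tweet.lower()):
            if token in LEXICON:
                counts[LEXICON[token]] += 1
    return counts

profile = emotion_profile(["Scared of the outbreak!",
                           "Trust the vaccine and see a doctor."])
```

Aggregating such per-tweet counts per disease keyword is one simple way to obtain the kind of emotion breakdown the paper reports.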

17 pages, 2671 KiB  
Article
Research on Short Video Hotspot Classification Based on LDA Feature Fusion and Improved BiLSTM
by Linhui Li, Dan Dai, Hongjiu Liu, Yubo Yuan, Lizhong Ding and Yujie Xu
Appl. Sci. 2022, 12(23), 11902; https://0-doi-org.brum.beds.ac.uk/10.3390/app122311902 - 22 Nov 2022
Cited by 3 | Viewed by 1591
Abstract
Short video hot spot classification is a fundamental method for grasping the focus of consumers and improving the effectiveness of video marketing. Traditional short text classification is limited by sparse content and inconspicuous feature extraction. To solve these problems, this paper proposes a short video hot spot classification model combining latent Dirichlet allocation (LDA) feature fusion and an improved bi-directional long short-term memory (BiLSTM) network, namely the LDA-BiLSTM-self-attention (LBSA) model, and applies it to hot spot classification of Carya cathayensis walnut short video review data from the TikTok platform. Firstly, the LDA topic model was used to expand the topic features of the Word2Vec word vectors, which were then fused and input into the BiLSTM model to learn the text features. Afterwards, the self-attention mechanism was employed to assign different weights to the output of the BiLSTM according to importance, to enhance the precision of feature extraction and complete the hot spot classification of the review data. Experimental results show that the precision of the proposed LBSA model reached 91.52%, a significant improvement over traditional models in terms of precision and F1 value. Full article
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)

19 pages, 1297 KiB  
Article
Roman Urdu Sentiment Analysis Using Transfer Learning
by Dun Li, Kanwal Ahmed, Zhiyun Zheng, Syed Agha Hassnain Mohsan, Mohammed H. Alsharif, Myriam Hadjouni, Mona M. Jamjoom and Samih M. Mostafa
Appl. Sci. 2022, 12(20), 10344; https://0-doi-org.brum.beds.ac.uk/10.3390/app122010344 - 14 Oct 2022
Cited by 11 | Viewed by 3697
Abstract
Numerous studies have been conducted to meet the growing need for analytic tools capable of processing the increasing amounts of textual data available online, and sentiment analysis (SA) has emerged as a frontrunner in this field. Current studies focus on the English language, while minority languages, such as Roman Urdu, are ignored because of their complex syntax and lexical varieties. In recent years, deep learning (DL) models have become the standard in this field, yet their full potential for text SA has not been explored despite their early success. For sentiment analysis, convolutional neural networks (CNNs) have achieved strong accuracy, although they still have some imperfections: first, CNNs need a significant amount of data to train; second, they presume that all words have the same impact on the polarity of a statement. To fill these gaps, this study proposes a CNN with an attention mechanism and transfer learning to improve SA performance. Compared to state-of-the-art methods, the proposed model achieved greater classification accuracy in our experiments. Full article
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)

18 pages, 864 KiB  
Article
Threatening URDU Language Detection from Tweets Using Machine Learning
by Aneela Mehmood, Muhammad Shoaib Farooq, Ansar Naseem, Furqan Rustam, Mónica Gracia Villar, Carmen Lili Rodríguez and Imran Ashraf
Appl. Sci. 2022, 12(20), 10342; https://0-doi-org.brum.beds.ac.uk/10.3390/app122010342 - 14 Oct 2022
Cited by 10 | Viewed by 2261
Abstract
Technology’s expansion has contributed to the rise in popularity of social media platforms. Twitter is one of the leading social media platforms that people use to share their opinions. Such opinions may, deliberately or not, contain threatening text, which can be disturbing for other users; consequently, the detection of threatening content on social media is an important task. Contrary to high-resource languages like English and Dutch, which have several such approaches, the low-resource Urdu language does not have this luxury. Therefore, this study presents an intelligent threatening language detector for Urdu. A stacking model is proposed that uses an extra trees (ET) classifier and Bernoulli naive Bayes (BNB) as the base learners, while logistic regression (LR) is employed as the meta learner. A performance analysis is carried out against a support vector classifier, ET, LR, BNB, a fully connected network, a convolutional neural network, long short-term memory, and a gated recurrent unit. Experimental results indicate that the stacked model performs better than both the machine learning and deep learning models. With 74.01% accuracy, 70.84% precision, 75.65% recall, and a 73.99% F1 score, the model outperforms the existing benchmark study. Full article
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)
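The core of stacking, out-of-fold base-learner predictions used as meta-learner features, can be sketched in plain Python. The NearestCentroid toy learner and the 1-D data below are illustrative stand-ins, not the paper's ET/BNB base learners or its Urdu features:

```python
def kfold_indices(n, k):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]

class NearestCentroid:
    """Toy base learner: predict the class whose feature mean is closest."""
    def fit(self, X, y):
        self.centroids = {c: sum(x for x, t in zip(X, y) if t == c) / y.count(c)
                          for c in set(y)}
        return self
    def predict(self, X):
        return [min(self.centroids, key=lambda c: abs(x - self.centroids[c]))
                for x in X]

def oof_meta_features(models, X, y, k=3):
    """Stacking, step 1: out-of-fold base predictions become meta-learner inputs."""
    meta = [[None] * len(models) for _ in X]
    for train, test in kfold_indices(len(X), k):
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        for m, model in enumerate(models):
            preds = model().fit(Xtr, ytr).predict([X[i] for i in test])
            for i, p in zip(test, preds):
                meta[i][m] = p
    return meta

# Toy 1-D "feature" data: two well-separated classes
X = [0.1, 0.2, 0.15, 0.9, 1.0, 0.95]
y = [0, 0, 0, 1, 1, 1]
meta = oof_meta_features([NearestCentroid], X, y, k=3)
```

The meta learner (LR in the paper) is then fitted on `meta` against `y`; using out-of-fold predictions rather than in-fold ones prevents the meta learner from overfitting to base learners that have memorized their training data.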

17 pages, 2134 KiB  
Article
Spoken Language Identification System Using Convolutional Recurrent Neural Network
by Adal A. Alashban, Mustafa A. Qamhan, Ali H. Meftah and Yousef A. Alotaibi
Appl. Sci. 2022, 12(18), 9181; https://0-doi-org.brum.beds.ac.uk/10.3390/app12189181 - 13 Sep 2022
Cited by 13 | Viewed by 3407
Abstract
Following recent advancements in deep learning and artificial intelligence, spoken language identification applications are playing an increasingly significant role in our day-to-day lives, especially in the domain of multilingual speech recognition. In this article, we propose a spoken language identification system that operates on sequences of feature vectors. The proposed system uses a hybrid Convolutional Recurrent Neural Network (CRNN), which combines a Convolutional Neural Network (CNN) with a Recurrent Neural Network (RNN), for spoken language identification on seven languages, including Arabic, chosen from subsets of the Mozilla Common Voice (MCV) corpus. The CRNN architecture exploits the advantages of both the CNN and RNN architectures. At the feature extraction stage, the system compares the Gammatone Cepstral Coefficient (GTCC) and Mel Frequency Cepstral Coefficient (MFCC) features, as well as a combination of both. The speech signals were represented as frames and used as input to the CRNN architecture. The experimental results indicate higher performance with combined GTCC and MFCC features than with either feature set alone. The average accuracy of the proposed system was 92.81% in the best spoken language identification experiment. Furthermore, the system can learn language-specific patterns in various filter-size representations of speech files. Full article
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)
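The frame representation fed to the CRNN can be sketched with NumPy. The frame length and hop below are typical speech-processing values (25 ms windows with a 10 ms hop at 16 kHz), assumed for illustration rather than taken from the paper; cepstral features such as MFCC or GTCC would then be computed per frame:

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

sr = 16000                                   # assumed sampling rate (Hz)
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)            # one second of a 440 Hz tone
frames = frame_signal(sig, frame_len=400, hop=160)  # 25 ms windows, 10 ms hop
```

Stacking per-frame feature vectors row by row yields the (time, features) matrix that the CNN front end of a CRNN consumes.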

17 pages, 2584 KiB  
Article
Computational Linguistics with Deep-Learning-Based Intent Detection for Natural Language Understanding
by Hala J. Alshahrani, Khaled Tarmissi, Hussain Alshahrani, Mohamed Ahmed Elfaki, Ayman Yafoz, Raed Alsini, Omar Alghushairy and Manar Ahmed Hamza
Appl. Sci. 2022, 12(17), 8633; https://0-doi-org.brum.beds.ac.uk/10.3390/app12178633 - 29 Aug 2022
Cited by 3 | Viewed by 1761
Abstract
Computational linguistics explores how human language is interpreted and processed automatically. Research in this area draws on the logical and mathematical features of natural language and advances methods and statistical procedures for automated language processing. Slot filling and intent detection are significant modules in task-based dialogue systems. Intent detection is a critical task in any natural language understanding (NLU) system and constitutes the base of a task-based dialogue system. Building high-quality, real-time conversational solutions for edge devices demands intent-detection methods that can be deployed on-device: accurate, lightweight, and fast methods that operate effectively in resource-limited environments. Earlier works have explored several machine learning (ML) techniques for detecting intent in user queries. In this article, we propose Computational Linguistics with Deep-Learning-Based Intent Detection and Classification (CL-DLBIDC) for natural language understanding. The presented CL-DLBIDC technique receives word embeddings as input, obtained with the GloVe approach, and learns meaningful features to determine the probable intention of the user query, using a deep learning modified neural network (DLMNN) model for intent detection and classification. For hyperparameter tuning, the mayfly optimization (MFO) algorithm was used. The CL-DLBIDC method was analysed experimentally under a set of simulations, and the results were scrutinized from distinct aspects. The simulation outcomes demonstrate the superior performance of the CL-DLBIDC algorithm over other DL models. Full article
(This article belongs to the Special Issue Recent Trends in Natural Language Processing and Its Applications)
