Advances in Artificial Intelligence Methods for Natural Language Processing

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 April 2022) | Viewed by 45723

Special Issue Editors


Prof. Dr. Julian Szymanski
Guest Editor
Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, 80-233 Gdansk, Poland
Interests: natural language processing; intelligent signal analysis; artificial intelligence; data/text mining; machine learning; classification; pattern recognition; clustering

Dr. Andrzej Sobecki
Guest Editor
Department of Computer Architecture, Gdansk University of Technology, Gdańsk, Poland
Interests: artificial neural networks; data mining; computer engineering; machine learning; anti-plagiarism systems

Prof. Dr. Higinio Mora
Guest Editor
Computing Technology and Data Processing Department, University of Alicante, Alicante, Spain
Interests: computer architecture; image and text processing; cloud computing

Prof. Dr. Doina Logofătu
Guest Editor
Faculty of Computer Science and Engineering, Frankfurt University of Applied Sciences, Frankfurt am Main, Germany
Interests: artificial intelligence; algorithms; document categorisation; multiobjective optimization

Special Issue Information

Dear Colleagues,

Recent years have shown significant progress in natural language processing using methods related to artificial intelligence. This progress comes both from new algorithms, language representations, and datasets, and from increasingly efficient hardware.

Advances in the NLP domain influence information retrieval methods, dialogue systems, the automatic categorization of large text repositories, and more. The purpose of this Special Issue is therefore to publish high-quality research papers as well as review articles addressing recent advances in the field of computational linguistics. We welcome work related to NLP, such as machine translation, dialogue systems and chatbots, summarization, natural language generation, and natural language understanding.

Topics of Interest:

We are seeking papers on (but not limited to) the following general topics related to NLP:

  • Automatic categorization (classification and clustering) of documents
  • Text representation
  • Language resources and tools   
  • Sentiment analysis
  • Effective algorithms for text processing (HPC 4 NLP)
  • Machine translation
  • Text summarization
  • Generation of natural language
  • Cognitive models of language understanding
  • Anti-plagiarism systems 
  • Word sense disambiguation
  • Information retrieval
  • Ontologies for natural language
  • Deep learning in NLP
  • Entity identification and linking

Prof. Dr. Julian Szymanski
Dr. Andrzej Sobecki
Prof. Dr. Higinio Mora
Prof. Dr. Doina Logofătu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (18 papers)


Research

23 pages, 2053 KiB  
Article
Automatic Text Summarization for Hindi Using Real Coded Genetic Algorithm
by Arti Jain, Anuja Arora, Jorge Morato, Divakar Yadav and Kumar Vimal Kumar
Appl. Sci. 2022, 12(13), 6584; https://0-doi-org.brum.beds.ac.uk/10.3390/app12136584 - 29 Jun 2022
Cited by 16 | Viewed by 2828
Abstract
In the present scenario, Automatic Text Summarization (ATS) is in great demand to address the ever-growing volume of text data available online and to discover relevant information faster. In this research, an ATS methodology is proposed for the Hindi language using a Real Coded Genetic Algorithm (RCGA) over a health corpus available in the Kaggle dataset. The methodology comprises five phases: preprocessing, feature extraction, processing, sentence ranking, and summary generation. Rigorous experimentation on varied feature sets is performed, where distinguishing features, namely sentence similarity and named entity features, are combined with others for computing the evaluation metrics. The top 14 feature combinations are evaluated through the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) measure. RCGA computes appropriate feature weights through strings of features, chromosome selection, and the reproduction operators Simulated Binary Crossover and Polynomial Mutation. To extract the highest-scored sentences as the corpus summary, different compression rates are tested. In comparison with existing summarization tools, the ATS extractive method gives a summary reduction of 65%.
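The two reproduction operators named above are standard real-coded GA components. The following minimal NumPy sketch shows Simulated Binary Crossover and polynomial mutation applied to a chromosome of sentence-feature weights; the distribution indices, bounds, and 14-feature chromosome size are illustrative assumptions, not the authors' settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def sbx_crossover(p1, p2, eta=15, low=0.0, high=1.0):
    """Simulated Binary Crossover (SBX) between two real-coded parents."""
    u = rng.random(p1.shape)
    beta = np.where(u <= 0.5,
                    (2 * u) ** (1 / (eta + 1)),
                    (1 / (2 * (1 - u))) ** (1 / (eta + 1)))
    c1 = 0.5 * ((1 + beta) * p1 + (1 - beta) * p2)
    c2 = 0.5 * ((1 - beta) * p1 + (1 + beta) * p2)
    return np.clip(c1, low, high), np.clip(c2, low, high)

def polynomial_mutation(x, eta=20, low=0.0, high=1.0, p_m=0.1):
    """Polynomial mutation applied gene-wise with probability p_m."""
    y = x.copy()
    for i in range(len(y)):
        if rng.random() < p_m:
            u = rng.random()
            delta = ((2 * u) ** (1 / (eta + 1)) - 1 if u < 0.5
                     else 1 - (2 * (1 - u)) ** (1 / (eta + 1)))
            y[i] = np.clip(y[i] + delta * (high - low), low, high)
    return y

# Each chromosome is a vector of sentence-feature weights in [0, 1].
parent_a = rng.random(14)   # 14 genes, one per feature combination
parent_b = rng.random(14)
child_a, child_b = sbx_crossover(parent_a, parent_b)
mutant = polynomial_mutation(child_a)
```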

28 pages, 2082 KiB  
Article
An Empirical Evaluation of Document Embeddings and Similarity Metrics for Scientific Articles
by Joaquin Gómez and Pere-Pau Vázquez
Appl. Sci. 2022, 12(11), 5664; https://0-doi-org.brum.beds.ac.uk/10.3390/app12115664 - 02 Jun 2022
Cited by 4 | Viewed by 2566
Abstract
The comparison of documents has a wide range of applications in several fields, such as article or patent search, bibliography recommendation systems, and the visualization of document collections. One of the key tasks that such problems have in common is the evaluation of a similarity metric. Many such metrics have been proposed in the literature, and lately deep learning techniques have gained a lot of popularity. However, it is difficult to analyze how those metrics perform against each other. In this paper, we present a systematic empirical evaluation of several of the most popular similarity metrics when applied to research articles. We analyze the results of those metrics in two ways: with a synthetic test that uses scientific papers and Ph.D. theses, and in a real-world scenario where we evaluate their ability to cluster papers from different areas of research.
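The core operation benchmarked above is a similarity computation between document vectors. A minimal scikit-learn sketch follows; TF-IDF stands in for the paper's embedding models, and the toy documents are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Graph neural networks for molecule property prediction.",
    "Message passing networks applied to chemical graphs.",
    "A survey of medieval manuscript preservation techniques.",
]

# Any document embedding works here; TF-IDF keeps the sketch self-contained.
vectors = TfidfVectorizer().fit_transform(docs)
sims = cosine_similarity(vectors)
print(sims.round(2))  # the two chemistry abstracts score highest together
```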

25 pages, 455 KiB  
Article
Quantum Natural Language Processing: Challenges and Opportunities
by Raffaele Guarasci, Giuseppe De Pietro and Massimo Esposito
Appl. Sci. 2022, 12(11), 5651; https://0-doi-org.brum.beds.ac.uk/10.3390/app12115651 - 02 Jun 2022
Cited by 15 | Viewed by 5526
Abstract
The meeting between Natural Language Processing (NLP) and Quantum Computing has been very successful in recent years, leading to the development of several approaches under the name of Quantum Natural Language Processing (QNLP). This is a hybrid field in which the potential of quantum mechanics is exploited and applied to critical aspects of language processing, involving different NLP tasks. The approaches developed so far span from those that demonstrate a quantum advantage only at the theoretical level to those that implement algorithms on quantum hardware. This paper surveys these approaches, categorizing them by type, i.e., theoretical work and work implemented on classical or quantum hardware; by task, i.e., general-purpose tasks such as syntax-semantic representation or specific NLP tasks such as sentiment analysis or question answering; and by the resource used in the evaluation phase, i.e., whether a benchmark dataset or a custom one was used. The advantages offered by QNLP are discussed, both in terms of performance and methodology, and some considerations are given about the possible use of QNLP approaches in place of state-of-the-art deep-learning-based ones.

18 pages, 1026 KiB  
Article
Domain Adversarial Network for Cross-Domain Emotion Recognition in Conversation
by Hongchao Ma, Chunyan Zhang, Xiabing Zhou, Junyi Chen and Qinglei Zhou
Appl. Sci. 2022, 12(11), 5436; https://0-doi-org.brum.beds.ac.uk/10.3390/app12115436 - 27 May 2022
Cited by 1 | Viewed by 1397
Abstract
Emotion Recognition in Conversation (ERC) aims to automatically recognize the emotion of each utterance in a conversation. Because collecting and labeling such data is difficult, the task lacks large-scale corpora, which makes it hard to complete the supervised training required by large-scale neural networks. Introducing a large-scale generative conversational dataset can assist with modeling dialogue. However, after introducing the external dataset, the spatial distributions of feature vectors in the source and target domains are inconsistent. To alleviate this problem, we propose a Domain Adversarial Network for Cross-Domain Emotion Recognition in Conversation (DAN-CDERC) model, consisting of a domain adversarial model and an emotion recognition model. The domain adversarial model consists of encoders, a generator, and a domain discriminator. First, the encoders and generator learn contextual features from a large-scale source dataset. The discriminator performs domain adaptation by discriminating the domain, making the feature spaces of the source and target domains consistent so as to obtain domain-invariant features. DAN-CDERC then transfers the learned domain-invariant dialogue context knowledge from the domain adversarial model to the emotion recognition model to assist in modeling the dialogue context. Because it uses a domain adversarial network, DAN-CDERC obtains dialogue-level contextual information that is domain invariant, thereby reducing the negative impact of inconsistency between domain spaces. Empirical studies illustrate that the proposed model outperforms the baseline models on three benchmark emotion recognition datasets.
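The domain discriminator described above is commonly trained adversarially via a gradient reversal layer (Ganin and Lempitsky). The PyTorch sketch below shows that generic building block under assumed feature dimensions; it is not the authors' DAN-CDERC implementation.

```python
import torch
from torch import nn

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies gradients by -lambda backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None

class DomainDiscriminator(nn.Module):
    def __init__(self, feat_dim=256, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.clf = nn.Sequential(
            nn.Linear(feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 2),            # source vs. target domain
        )

    def forward(self, features):
        # Reversed gradients push the encoder toward domain-invariant features.
        return self.clf(GradReverse.apply(features, self.lambd))

disc = DomainDiscriminator()
logits = disc(torch.randn(8, 256))  # 8 utterance-level context vectors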

28 pages, 1908 KiB  
Article
Spider Taylor-ChOA: Optimized Deep Learning Based Sentiment Classification for Review Rating Prediction
by Santosh Kumar Banbhrani, Bo Xu, Hongfei Lin and Dileep Kumar Sajnani
Appl. Sci. 2022, 12(7), 3211; https://0-doi-org.brum.beds.ac.uk/10.3390/app12073211 - 22 Mar 2022
Cited by 5 | Viewed by 1881
Abstract
Review rating prediction is an important sentiment assessment task that aims to discover the intensity of users' sentiment toward a target product from several reviews. This paper devises a technique based on sentiment classification for predicting review ratings. Here, the review data are taken from a database, and significant features, such as SentiWordNet-based statistical features, term frequency–inverse document frequency (TF-IDF), and the numbers of capitalized words, numerical words, punctuation marks, elongated words, hashtags, emoticons, and sentences, are mined in feature extraction. The features are mined for sentiment classification, which is performed by random multimodal deep learning (RMDL). The training of RMDL is done using the proposed Spider Taylor-ChOA, which is devised by combining spider monkey optimization (SMO) and the Taylor-based chimp optimization algorithm (Taylor-ChOA). Concurrently, the features are considered input for review rating prediction, which determines positive and negative reviews using a hierarchical attention network (HAN), also trained using the proposed Spider Taylor-ChOA. The proposed Spider Taylor-ChOA-based RMDL performed best, with the highest precision of 94.1%, recall of 96.5%, and F-measure of 95.3%; the Spider Taylor-ChOA-based HAN achieved a precision of 93.1%, recall of 95.4%, and F-measure of 94.3%.
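The surface features listed above can be sketched with simple pattern counts. The regular expressions below are illustrative assumptions, not the authors' definitions.

```python
import re

def review_features(text: str) -> dict:
    """Count the surface cues listed in the abstract; patterns are illustrative."""
    return {
        "capitalized": len(re.findall(r"\b[A-Z]{2,}\b", text)),
        "numerical":   len(re.findall(r"\b\d+(?:\.\d+)?\b", text)),
        "punctuation": len(re.findall(r"[!?.,;:]", text)),
        "elongated":   len(re.findall(r"\b\w*(\w)\1{2,}\w*\b", text)),
        "hashtags":    len(re.findall(r"#\w+", text)),
        "emoticons":   len(re.findall(r"[:;=][-^']?[)(DPpO]", text)),
        "sentences":   len(re.findall(r"[.!?]+", text)) or 1,
    }

print(review_features("LOVE this phone!!! Sooo good #bargain :) 10/10"))
```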

17 pages, 482 KiB  
Article
Commonsense Knowledge-Aware Prompt Tuning for Few-Shot NOTA Relation Classification
by Bo Lv, Li Jin, Yanan Zhang, Hao Wang, Xiaoyu Li and Zhi Guo
Appl. Sci. 2022, 12(4), 2185; https://0-doi-org.brum.beds.ac.uk/10.3390/app12042185 - 19 Feb 2022
Cited by 3 | Viewed by 2105
Abstract
Compared with the traditional few-shot task, few-shot none-of-the-above (NOTA) relation classification focuses on a realistic few-shot learning scenario in which a test instance might not belong to any of the target categories. This undoubtedly increases the task's difficulty, because the few given support samples cannot represent the distribution of the NOTA category in the embedding space. The model needs to make full use of the syntactic and word-meaning information learned in the pretraining stage to distinguish the NOTA category from the support sample categories in the embedding space. However, previous fine-tuning methods mainly focus on optimizing extra classifiers (on top of pretrained language models (PLMs)) and neglect the connection between pretraining objectives and downstream tasks. In this paper, we propose the commonsense knowledge-aware prompt tuning (CKPT) method for the few-shot NOTA relation classification task. First, a simple and effective prompt-learning method is developed by constructing relation-oriented templates, which can further stimulate the rich knowledge distributed in PLMs to better serve downstream tasks. Second, external knowledge is incorporated into the model by a label-extension operation, which forms knowledgeable prompt tuning to improve and stabilize prompt tuning. Third, to distinguish NOTA pairs and positive pairs in the embedding space more accurately, a learned scoring strategy is proposed, which introduces a learned threshold classification function and improves the loss function by adding a new term focused on NOTA identification. Experiments on two widely used benchmarks (FewRel 2.0 and Few-shot TACRED) show that our method is a simple and effective framework, and a new state of the art is established in the few-shot classification field.
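Relation-oriented templates of the kind described above can be sketched as cloze prompts with a label-word verbalizer. The template string and label words below are hypothetical illustrations, not CKPT's actual templates.

```python
# Hypothetical relation-oriented cloze template; CKPT's actual templates
# and label words may differ.
TEMPLATE = "{sentence} In this sentence, {head} is the [MASK] of {tail}."

LABEL_WORDS = {                     # label-extension style verbalizer
    "founder_of":  ["founder", "creator"],
    "employee_of": ["employee", "worker"],
    "NOTA":        ["none", "unrelated"],
}

def build_prompt(sentence: str, head: str, tail: str) -> str:
    return TEMPLATE.format(sentence=sentence, head=head, tail=tail)

prompt = build_prompt("Jobs started Apple in 1976.", "Jobs", "Apple")
print(prompt)
# A masked language model scores LABEL_WORDS at the [MASK] slot; the
# relation whose label words score highest (or NOTA) is predicted.
```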

23 pages, 1776 KiB  
Article
MUFFLE: Multi-Modal Fake News Influence Estimator on Twitter
by Cheng-Lin Wu, Hsun-Ping Hsieh, Jiawei Jiang, Yi-Chieh Yang, Chris Shei and Yu-Wen Chen
Appl. Sci. 2022, 12(1), 453; https://0-doi-org.brum.beds.ac.uk/10.3390/app12010453 - 04 Jan 2022
Cited by 4 | Viewed by 2094
Abstract
To alleviate the impact of fake news on our society, predicting the popularity of fake news posts on social media is a crucial problem worthy of study. However, most related studies on fake news emphasize detection only. In this paper, we focus on the issue of fake news influence prediction, i.e., inferring how popular a fake news post might become on social platforms. To achieve our goal, we propose a comprehensive framework, MUFFLE, which captures multi-modal dynamics by encoding the representation of news-related social networks, user characteristics, and content in text. The attention mechanism developed in the model can provide explainability for social or psychological analysis. To examine the effectiveness of MUFFLE, we conducted extensive experiments on real-world datasets. The experimental results show that our proposed method outperforms both state-of-the-art methods of popularity prediction and machine-based baselines in top-k NDCG and hit rate. Through the experiments, we also analyze the feature importance for predicting fake news influence via the explainability provided by MUFFLE.
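The two ranking metrics reported above are standard. A minimal NumPy sketch of NDCG@k and hit rate follows; the toy ranking is invented.

```python
import numpy as np

def ndcg_at_k(relevance_ranked, k):
    """relevance_ranked: true relevance scores listed in predicted order."""
    rel = np.asarray(relevance_ranked, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    ideal = np.sort(np.asarray(relevance_ranked, dtype=float))[::-1][:k]
    idcg = float((ideal * discounts[:ideal.size]).sum())
    return dcg / idcg if idcg > 0 else 0.0

def hit_rate_at_k(relevance_ranked, k):
    """1.0 if any relevant item appears in the top k, else 0.0."""
    return float(any(r > 0 for r in relevance_ranked[:k]))

ranked = [3, 0, 2, 0, 1]          # popularity of posts in predicted order
print(ndcg_at_k(ranked, 3), hit_rate_at_k(ranked, 3))
```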

16 pages, 1019 KiB  
Article
Relation-Aware Graph Transformer for SQL-to-Text Generation
by Da Ma, Xingyu Chen, Ruisheng Cao, Zhi Chen, Lu Chen and Kai Yu
Appl. Sci. 2022, 12(1), 369; https://0-doi-org.brum.beds.ac.uk/10.3390/app12010369 - 31 Dec 2021
Cited by 2 | Viewed by 2296
Abstract
Generating natural language descriptions for structured representations (e.g., graphs) is an important yet challenging task. In this work, we focus on SQL-to-text, a task that maps a SQL query into the corresponding natural language question. Previous work represents SQL as a sparse graph and utilizes a graph-to-sequence model to generate questions, where each node can only communicate with k-hop nodes. Such a model degenerates when adapted to more complex SQL queries, due to its inability to capture long-term dependencies and its lack of SQL-specific relations. To tackle this problem, we propose a relation-aware graph transformer (RGT) that considers both the SQL structure and various relations simultaneously. Specifically, an abstract SQL syntax tree is constructed for each SQL query to provide the underlying relations, and we customize self-attention and cross-attention strategies to encode the relations in the SQL tree. Experiments on the WikiSQL and Spider benchmarks demonstrate that our approach yields improvements over strong baselines.
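One common way to encode relations in attention, following Shaw et al.'s relation-aware self-attention, is to bias the attention logits with learned relation embeddings. The single-head PyTorch sketch below illustrates that generic idea under assumed dimensions; it is not the authors' RGT formulation.

```python
import torch
from torch import nn

class RelationAwareAttention(nn.Module):
    """Single-head attention whose logits are biased by pairwise relation ids."""
    def __init__(self, dim, num_relations):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(dim, dim) for _ in range(3))
        self.rel_bias = nn.Embedding(num_relations, 1)  # scalar bias per relation
        self.scale = dim ** -0.5

    def forward(self, x, rel_ids):
        # x: (batch, n, dim); rel_ids: (batch, n, n) relation id between nodes i, j
        logits = self.q(x) @ self.k(x).transpose(-2, -1) * self.scale
        logits = logits + self.rel_bias(rel_ids).squeeze(-1)
        return torch.softmax(logits, dim=-1) @ self.v(x)

attn = RelationAwareAttention(dim=64, num_relations=8)
x = torch.randn(2, 10, 64)                      # 10 SQL-tree nodes
rel = torch.randint(0, 8, (2, 10, 10))          # e.g., parent/child/sibling ids
out = attn(x, rel)                              # (2, 10, 64)
```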

12 pages, 537 KiB  
Article
A Multilayer CARU Framework to Obtain Probability Distribution for Paragraph-Based Sentiment Analysis
by Wei Ke and Ka-Hou Chan
Appl. Sci. 2021, 11(23), 11344; https://0-doi-org.brum.beds.ac.uk/10.3390/app112311344 - 30 Nov 2021
Cited by 12 | Viewed by 1801
Abstract
Paragraph-based datasets are hard to analyze with a simple RNN, because long sequences suffer from the problem of long-term dependencies. In this work, we propose a Multilayer Content-Adaptive Recurrent Unit (CARU) network for paragraph information extraction. In addition, we present a CNN-based model as an extractor to explore and capture useful features in the hidden state, which represents the content of the entire paragraph. In particular, we introduce Chebyshev pooling at the end of the CNN-based extractor instead of maximum pooling. This projects the features into a probability distribution, providing an interpretable evaluation for the final analysis. Experimental results demonstrate the superiority of the proposed approach compared to state-of-the-art models.

15 pages, 275 KiB  
Article
DATLMedQA: A Data Augmentation and Transfer Learning Based Solution for Medical Question Answering
by Shuohua Zhou and Yanping Zhang
Appl. Sci. 2021, 11(23), 11251; https://0-doi-org.brum.beds.ac.uk/10.3390/app112311251 - 26 Nov 2021
Cited by 6 | Viewed by 2781
Abstract
With the outbreak of COVID-19 prompting an increased focus on self-care, more and more people hope to obtain disease knowledge from the Internet. In response to this demand, medical question answering and question generation tasks have become an important part of natural language processing (NLP). However, samples of medical questions and answers are limited, and question generation systems cannot fully meet the needs of non-professionals for medical questions. In this research, we propose a BERT medical pretraining model that uses GPT-2 for question augmentation and T5-Small for topic extraction, calculates the cosine similarity of the extracted topics, and uses XGBoost for prediction. With GPT-2 augmentation, the prediction accuracy of our model outperforms the state-of-the-art (SOTA) model. Our experimental results demonstrate the outstanding performance of our model in medical question answering and question generation tasks, and its great potential for solving other biomedical question answering challenges.
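The final two stages named above, topic cosine similarity feeding an XGBoost predictor, can be sketched as follows; the TF-IDF topic vectors, toy question-answer pairs, and hyperparameters are illustrative assumptions (requires the xgboost package).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from xgboost import XGBClassifier

# Placeholder data: (question topic, candidate-answer topic, is_correct).
pairs = [("fever causes", "reasons for high temperature", 1),
         ("fever causes", "broken bone treatment", 0),
         ("covid symptoms", "signs of coronavirus infection", 1),
         ("covid symptoms", "car engine maintenance", 0)]

vec = TfidfVectorizer().fit([t for p in pairs for t in p[:2]])
feats = np.array([
    cosine_similarity(vec.transform([q]), vec.transform([a]))[0, 0]
    for q, a, _ in pairs
]).reshape(-1, 1)
labels = np.array([y for *_, y in pairs])

# Cosine similarity is the single feature here; a real system would add more.
clf = XGBClassifier(n_estimators=50, max_depth=3, eval_metric="logloss")
clf.fit(feats, labels)
print(clf.predict(feats))
```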

23 pages, 8928 KiB  
Article
Efficient Estimate of Low-Frequency Words’ Embeddings Based on the Dictionary: A Case Study on Chinese
by Xianwen Liao, Yongzhong Huang, Changfu Wei, Chenhao Zhang, Yongqing Deng and Ke Yi
Appl. Sci. 2021, 11(22), 11018; https://0-doi-org.brum.beds.ac.uk/10.3390/app112211018 - 21 Nov 2021
Cited by 2 | Viewed by 1415
Abstract
Obtaining high-quality embeddings of out-of-vocabulary (OOV) and low-frequency words is a challenge in natural language processing (NLP). To address it, we propose a new method that uses a dictionary to estimate these embeddings: the explanatory note of a dictionary entry accurately describes the semantics of the corresponding word, so we adopt a sentence representation model to extract the semantics of the explanatory note and regard those semantics as the embedding of the corresponding word. We design a new sentence representation model to encode sentences so as to extract the semantics of explanatory notes more efficiently. Based on the assumption that higher-quality word embeddings lead to better performance, we design an extrinsic experiment to evaluate the quality of low-frequency words' embeddings. The experimental results show that the embeddings of low-frequency words estimated by our proposed method have higher quality. In addition, both intrinsic and extrinsic experiments show that our proposed sentence representation model represents the semantics of sentences well.
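As a rough baseline for the gloss-to-embedding idea above, one can simply average pretrained vectors of the definition's words; the toy three-dimensional vectors below stand in for real embeddings, and the paper's learned sentence encoder would replace the average.

```python
import numpy as np

# Toy pretrained word vectors; a real system would load, e.g., fastText.
emb = {
    "small":  np.array([0.9, 0.1, 0.0]),
    "wild":   np.array([0.2, 0.8, 0.1]),
    "animal": np.array([0.1, 0.7, 0.6]),
}

def gloss_embedding(gloss: str) -> np.ndarray:
    """Estimate an OOV word's vector from its dictionary definition by
    averaging vectors of in-vocabulary gloss words (a simple stand-in
    for the paper's learned sentence encoder)."""
    vecs = [emb[w] for w in gloss.lower().split() if w in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

print(gloss_embedding("a small wild animal"))  # embedding for an unseen headword
```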

11 pages, 891 KiB  
Article
Text Classification Model Enhanced by Unlabeled Data for LaTeX Formula
by Hua Cheng, Renjie Yu, Yixin Tang, Yiquan Fang and Tao Cheng
Appl. Sci. 2021, 11(22), 10536; https://0-doi-org.brum.beds.ac.uk/10.3390/app112210536 - 09 Nov 2021
Cited by 3 | Viewed by 2051
Abstract
Generic language models pretrained on large, unspecific domains are currently the foundation of NLP. Labeled data are limited in most model training due to the cost of manual annotation, especially in domains with massive numbers of proper nouns, such as mathematics and biology, where this affects the accuracy and robustness of model prediction. However, directly applying a generic language model to a specific domain does not work well. This paper introduces a BERT-based text classification model enhanced by unlabeled data (UL-BERT) in the LaTeX formula domain. A two-stage pretraining model based on BERT (TP-BERT) is pretrained on unlabeled data from the LaTeX formula domain. A double-prediction pseudo-labeling (DPP) method is introduced to obtain high-confidence pseudo-labels for unlabeled data by self-training. Moreover, a multi-round teacher–student model training approach is proposed for training UL-BERT with few labeled data and more unlabeled data with pseudo-labels. Experiments on classification in the LaTeX formula domain show that UL-BERT significantly improves classification accuracy, enhancing the F1 score by up to 2.76% while needing fewer resources in model training. We conclude that our method may be applicable to other specific domains with enormous unlabeled data and limited labeled data.
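One plausible reading of the double-prediction rule above is to keep a pseudo-label only when two prediction runs agree with high confidence; the exact criterion is the paper's. A minimal NumPy sketch under that assumption:

```python
import numpy as np

def select_pseudo_labels(probs_a, probs_b, threshold=0.9):
    """Keep an unlabeled example only when two teacher predictions agree on
    the class and both are confident (an assumed reading of the DPP rule)."""
    a_cls, b_cls = probs_a.argmax(1), probs_b.argmax(1)
    keep = ((a_cls == b_cls)
            & (probs_a.max(1) >= threshold)
            & (probs_b.max(1) >= threshold))
    return np.flatnonzero(keep), a_cls[keep]

# Two prediction runs over 4 unlabeled formulas (e.g., different dropout seeds).
run_a = np.array([[0.95, 0.05], [0.55, 0.45], [0.10, 0.90], [0.92, 0.08]])
run_b = np.array([[0.97, 0.03], [0.60, 0.40], [0.93, 0.07], [0.91, 0.09]])
idx, labels = select_pseudo_labels(run_a, run_b)
print(idx, labels)   # examples 0 and 3 survive; 2 disagrees, 1 is unsure
```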

10 pages, 531 KiB  
Article
A Neural N-Gram-Based Classifier for Chinese Clinical Named Entity Recognition
by Ching-Sheng Lin, Jung-Sing Jwo and Cheng-Hsiung Lee
Appl. Sci. 2021, 11(18), 8682; https://0-doi-org.brum.beds.ac.uk/10.3390/app11188682 - 17 Sep 2021
Viewed by 1716
Abstract
Clinical Named Entity Recognition (CNER) focuses on locating named entities in electronic medical records (EMRs), and the obtained results play an important role in the development of intelligent biomedical systems. In addition to research on alphabetic languages, the study of non-alphabetic languages has attracted considerable attention as well. In this paper, a neural model is proposed to address the extraction of entities from EMRs written in Chinese. To avoid noise caused by errors in Chinese word segmentation, we employ character embeddings as the only feature, without extra resources. In our model, concatenated n-gram character embeddings are used to represent the context semantics, and a self-attention mechanism is applied to model long-range dependencies between embeddings. The concatenation of the new representations obtained by the attention module is taken as the input to a bidirectional long short-term memory (BiLSTM) network, followed by a conditional random field (CRF) layer to extract entities. An empirical study conducted on the CCKS-2017 Shared Task 2 dataset shows that our model outperforms other approaches.
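Concatenated n-gram character embeddings as described above can be sketched in PyTorch as follows; the vocabulary sizes and embedding width are illustrative assumptions.

```python
import torch
from torch import nn

class NGramCharEmbedding(nn.Module):
    """Represent position i by concatenating embeddings of the character
    unigram, bigram, and trigram at i (vocabulary sizes are illustrative)."""
    def __init__(self, vocab_sizes=(6000, 20000, 50000), dim=64):
        super().__init__()
        self.tables = nn.ModuleList(nn.Embedding(v, dim) for v in vocab_sizes)

    def forward(self, uni_ids, bi_ids, tri_ids):
        # Each id tensor has shape (batch, seq_len) and indexes its own table.
        parts = [table(ids) for table, ids
                 in zip(self.tables, (uni_ids, bi_ids, tri_ids))]
        return torch.cat(parts, dim=-1)        # (batch, seq_len, 3 * dim)

embedder = NGramCharEmbedding()
ids = [torch.randint(0, 6000, (2, 30)) for _ in range(3)]  # toy n-gram ids
out = embedder(*ids)
print(out.shape)  # torch.Size([2, 30, 192]); fed to self-attention + BiLSTM-CRF
```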

14 pages, 486 KiB  
Article
Full-Abstract Biomedical Relation Extraction with Keyword-Attentive Domain Knowledge Infusion
by Xian Zhu, Lele Zhang, Jiangnan Du and Zhifeng Xiao
Appl. Sci. 2021, 11(16), 7318; https://0-doi-org.brum.beds.ac.uk/10.3390/app11167318 - 09 Aug 2021
Cited by 5 | Viewed by 2273
Abstract
Relation extraction (RE) is an essential task in natural language processing. Given a context, RE aims to classify an entity-mention pair into a set of pre-defined relations. In the biomedical field, building an efficient and accurate RE system is critical for the construction of a domain knowledge base to support upper-level applications. Recent advances have witnessed a focus shift from sentence-level to document-level RE problems, which are more challenging due to the need for inter- and intra-sentence semantic reasoning. This type of distant dependency is difficult for a learning algorithm to understand and capture. To address the challenge, prior efforts either attempted to improve cross-sentence text representations or to infuse domain or local knowledge into the model, and both strategies demonstrated efficacy on various datasets. In this paper, a keyword-attentive knowledge infusion strategy is proposed and integrated into BioBERT. A domain keyword collection mechanism is developed to discover the most relation-suggestive word tokens for bio-entities in a given context. By manipulating the attention masks, the model can be guided to focus on the semantic interaction between bio-entities linked by the keywords. We validated the proposed method on the BioCreative V Chemical Disease Relation dataset with an F1 of 75.6%, outperforming the state of the art by 5.6%.
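In the spirit of the attention-mask manipulation described above, the toy NumPy construction below restricts entity tokens to attend only to entities and collected keywords; it is an illustration, not BioBERT's internals.

```python
import numpy as np

def keyword_attention_mask(tokens, entities, keywords):
    """Boolean mask M where M[i, j] == True lets token i attend to token j.
    Entity tokens attend only to entities and keywords; other tokens keep
    full attention (illustrative, not the paper's exact masking)."""
    n = len(tokens)
    mask = np.ones((n, n), dtype=bool)
    focus = [i for i, t in enumerate(tokens) if t in entities | keywords]
    for i, t in enumerate(tokens):
        if t in entities:
            mask[i, :] = False
            mask[i, focus] = True
    return mask

tokens = ["aspirin", "reduces", "risk", "of", "stroke"]
mask = keyword_attention_mask(tokens, entities={"aspirin", "stroke"},
                              keywords={"reduces"})
print(mask.astype(int))
```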

25 pages, 755 KiB  
Article
Study of Statistical Text Representation Methods for Performance Improvement of a Hierarchical Attention Network
by Adam Wawrzyński and Julian Szymański
Appl. Sci. 2021, 11(13), 6113; https://0-doi-org.brum.beds.ac.uk/10.3390/app11136113 - 30 Jun 2021
Cited by 2 | Viewed by 1706
Abstract
To effectively process textual data, many approaches have been proposed to create text representations. The transformation of text into a numerical form that computers can process is crucial for further applications in downstream tasks such as document classification, document summarization, and so forth. In our work, we study the quality of text representations built with statistical methods and compare them to approaches based on neural networks. We describe in detail nine different algorithms used for text representation and evaluate them on five diverse datasets: BBCSport, BBC, Ohsumed, 20Newsgroups, and Reuters. The selected statistical models include Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TFIDF) weighting, Latent Semantic Analysis (LSA), and Latent Dirichlet Allocation (LDA). For the second group, based on deep neural networks, Partition-Smooth Inverse Frequency (P-SIF), Doc2Vec-Distributed Bag of Words Paragraph Vector (Doc2Vec-DBoW), Doc2Vec-Memory Model of Paragraph Vectors (Doc2Vec-DM), the Hierarchical Attention Network (HAN), and Longformer were selected. The text representation methods were benchmarked in the document classification task, with the BoW and TFIDF models used as baselines. Based on the identified weaknesses of the HAN method, an improvement in the form of a Hierarchical Weighted Attention Network (HWAN) is proposed. Incorporating statistical features into HAN latent representations improves the results, or provides comparable ones, on four out of five datasets. The article also presents how the length of the processed text affects the results of the HAN model and variants of the HWAN model.
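The four statistical representations listed above can be benchmarked on document classification with scikit-learn. The sketch below uses two 20Newsgroups categories (one of the paper's datasets); the feature caps and the logistic-regression classifier are our own choices, not the paper's setup.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

data = fetch_20newsgroups(subset="train", categories=["sci.med", "rec.autos"])

pipelines = {
    "BoW":   make_pipeline(CountVectorizer(max_features=5000),
                           LogisticRegression(max_iter=1000)),
    "TFIDF": make_pipeline(TfidfVectorizer(max_features=5000),
                           LogisticRegression(max_iter=1000)),
    "LSA":   make_pipeline(TfidfVectorizer(max_features=5000),
                           TruncatedSVD(n_components=100),
                           LogisticRegression(max_iter=1000)),
    "LDA":   make_pipeline(CountVectorizer(max_features=5000),
                           LatentDirichletAllocation(n_components=20,
                                                     random_state=0),
                           LogisticRegression(max_iter=1000)),
}
for name, pipe in pipelines.items():
    score = cross_val_score(pipe, data.data, data.target, cv=3).mean()
    print(f"{name}: {score:.3f}")
```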

13 pages, 1256 KiB  
Article
Financial Context News Sentiment Analysis for the Lithuanian Language
by Rokas Štrimaitis, Pavel Stefanovič, Simona Ramanauskaitė and Asta Slotkienė
Appl. Sci. 2021, 11(10), 4443; https://0-doi-org.brum.beds.ac.uk/10.3390/app11104443 - 13 May 2021
Cited by 22 | Viewed by 2747
Abstract
Financial analysis is not limited to enterprise performance analysis; it is worth analyzing as wide an area as possible to obtain a full impression of a specific enterprise. News website content is a data source that expresses the public's opinion on enterprise operations, status, and so on, and it is therefore worth analyzing news portal article texts. While sentiment analysis of English texts, including financial texts, exists and is accurate, work on the more complex Lithuanian language has mostly concentrated on the sentiment analysis of comment texts and does not provide high accuracy. Therefore, in this paper, a supervised machine learning model was implemented to perform sentiment analysis on financial context news gathered from Lithuanian-language websites. The analysis was made using three classification algorithms commonly used in the field of sentiment analysis, and hyperparameter optimization using grid search was performed to discover the best parameters of each classifier. All experimental investigations were made using newly collected datasets from four Lithuanian news websites. The results of the applied machine learning algorithms show that the highest accuracy is obtained using a non-balanced dataset via the multinomial Naive Bayes algorithm (71.1%). The other algorithms' accuracies were slightly lower: long short-term memory (71%) and a support vector machine (70.4%).
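The grid search over a multinomial Naive Bayes pipeline described above looks roughly as follows in scikit-learn; the Lithuanian snippets, labels, and parameter grid are placeholders, not the paper's data or settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Placeholder corpus; the paper uses financial news from four Lithuanian sites.
texts = ["pelnas augo sparciai", "akcijos smuko po ataskaitos",
         "bankas skelbia rekordini pelna", "nuostoliai didejo antra ketvirti"]
labels = [1, 0, 1, 0]   # 1 = positive, 0 = negative

pipe = Pipeline([("tfidf", TfidfVectorizer()), ("nb", MultinomialNB())])
grid = GridSearchCV(pipe,
                    {"tfidf__ngram_range": [(1, 1), (1, 2)],
                     "nb__alpha": [0.1, 0.5, 1.0]},
                    cv=2)
grid.fit(texts, labels)
print(grid.best_params_, grid.best_score_)
```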

17 pages, 2824 KiB  
Article
Evaluation of English–Slovak Neural and Statistical Machine Translation
by Lucia Benkova, Dasa Munkova, Ľubomír Benko and Michal Munk
Appl. Sci. 2021, 11(7), 2948; https://0-doi-org.brum.beds.ac.uk/10.3390/app11072948 - 25 Mar 2021
Cited by 10 | Viewed by 2901
Abstract
This study focuses on the comparison of phrase-based statistical machine translation (SMT) systems and neural machine translation (NMT) systems, using automatic metrics of translation quality for the English–Slovak language pair. As the statistical approach is the predecessor of neural machine translation, it was assumed that the neural network approach would generate results of better quality. An experiment was performed using residuals to compare the automatic accuracy metric scores (BLEU_n) of the statistical machine translation with those of the neural machine translation. The results confirmed the assumption of better neural machine translation quality regardless of the system used: there were statistically significant differences between the SMT and NMT in favor of the NMT on all BLEU_n scores. The neural machine translation achieved a better quality of translation of journalistic texts from English into Slovak, regardless of whether the system was trained on general texts, such as Google Translate, or on a specific domain, such as the European Commission's (EC's) tool.
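BLEU_n scores of the kind compared above can be computed with NLTK; the reference and the two hypotheses below are toy stand-ins for the SMT and NMT outputs scored in the paper.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Toy data: one reference and two system outputs (stand-ins for SMT and NMT).
refs = [[["the", "parliament", "approved", "the", "budget"]]]
smt  = [["parliament", "approved", "budget"]]
nmt  = [["the", "parliament", "approved", "the", "budget"]]

smooth = SmoothingFunction().method1
for n in range(1, 5):                  # BLEU_1 .. BLEU_4
    w = tuple([1.0 / n] * n)           # uniform weights over n-gram orders
    s = corpus_bleu(refs, smt, weights=w, smoothing_function=smooth)
    m = corpus_bleu(refs, nmt, weights=w, smoothing_function=smooth)
    print(f"BLEU_{n}: SMT={s:.3f}  NMT={m:.3f}")
```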

14 pages, 848 KiB  
Article
Analyzing and Controlling Inter-Head Diversity in Multi-Head Attention
by Hyeongu Yun, Taegwan Kang and Kyomin Jung
Appl. Sci. 2021, 11(4), 1548; https://0-doi-org.brum.beds.ac.uk/10.3390/app11041548 - 08 Feb 2021
Cited by 5 | Viewed by 3075
Abstract
Multi-head attention, a powerful strategy in the Transformer, is assumed to utilize information from diverse representation subspaces. However, measuring the diversity between heads' representations, or exploiting that diversity, has rarely been studied. In this paper, we quantitatively analyze the inter-head diversity of multi-head attention by applying recently developed similarity measures between two deep representations: Singular Vector Canonical Correlation Analysis (SVCCA) and Centered Kernel Alignment (CKA). By doing so, we empirically show that multi-head attention does diversify the representation subspaces of each head as the number of heads increases. Based on our analysis, we hypothesize that there exists an optimal inter-head diversity with which a model can achieve better performance. To examine this hypothesis, we inspect three techniques for controlling inter-head diversity: (1) a Hilbert-Schmidt Independence Criterion regularizer among representation subspaces, (2) an orthogonality regularizer, and (3) Drophead, which randomly zeroes out each head in every training step. In experiments on various machine translation and language modeling tasks, we show that controlling inter-head diversity leads to the best performance among baselines.
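Linear CKA, one of the two similarity measures applied above, has a compact closed form. The NumPy sketch below compares one head's representations with a near-copy and with an unrelated head; SVCCA is omitted, and the data are random stand-ins.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two sets of representations;
    rows are examples (e.g., tokens), columns are one head's features."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro")
                   * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
head_a = rng.normal(size=(128, 64))                  # one head on 128 tokens
head_b = head_a + 0.1 * rng.normal(size=(128, 64))   # a nearly identical head
head_c = rng.normal(size=(128, 64))                  # an unrelated head
print(linear_cka(head_a, head_b))   # close to 1.0
print(linear_cka(head_a, head_c))   # markedly lower
```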
