Sentiment Analysis of COVID-19 Tweets Using Deep Learning and Lexicon-Based Approaches

Ainapure, Bharati Sanjay; Pise, Reshma Nitin; Reddy, Prathiba; Appasani, Bhargav; Srinivasulu, Avireni; Khan, Mohammad S.; Bizon, Nicu

doi:10.3390/su15032573

¹ Department of Computer Engineering, Faculty of Science and Technology, Vishwakarma University, Pune 411056, Maharashtra, India
² Department of Electronics and Telecommunication Engineering, G. H. Raisoni College of Engineering and Management, Pune 412207, Maharashtra, India
³ School of Electronics Engineering, Kalinga Institute of Industrial Technology, Patia 751024, Bhubaneswar, India
⁴ Department of Electronics & Communication Engineering, Mohan Babu University, Tirupati 517102, Andhra Pradesh, India
⁵ Department of Computer & Information Sciences, East Tennessee State University, Johnson City, TN 37614, USA
⁶ Faculty of Electronics, Communication and Computers, University of Pitesti, 110040 Pitesti, Romania
⁷ ICSI Energy Department, National Research and Development Institute for Cryogenic and Isotopic Technologies, 240050 Ramnicu Valcea, Romania
⁸ Doctoral School, University Politehnica of Bucharest, Splaiul Independentei Street No. 313, 060042 Bucharest, Romania
* Author to whom correspondence should be addressed.

Sustainability 2023, 15(3), 2573; https://0-doi-org.brum.beds.ac.uk/10.3390/su15032573

Submission received: 30 December 2022 / Revised: 24 January 2023 / Accepted: 25 January 2023 / Published: 31 January 2023

(This article belongs to the Special Issue Covid-19 and Its Impact on Environmental, Economic And Social Sustainability)

Abstract:

Social media is a platform where people communicate, share content, and build relationships. Due to the current pandemic, many people are turning to social networks such as Facebook, WhatsApp, and Twitter to express their feelings. In this paper, we analyse the sentiments of Indian citizens about the COVID-19 pandemic and vaccination drive using text messages posted on the Twitter platform. The sentiments were classified using deep learning and lexicon-based techniques. A lexicon-based approach was used to classify the polarity of the tweets using the tools VADER and NRCLex. Recurrent neural networks based on Bi-LSTM and GRU were trained, achieving 92.70% and 91.24% accuracy, respectively, on the COVID-19 dataset. Accuracy values of 92.48% and 93.03% were obtained for the vaccination tweets classification with Bi-LSTM and GRU, respectively. The developed models can assist healthcare workers and policymakers in making the right decisions in upcoming pandemic outbreaks.

Keywords:

deep learning; Bi-LSTM; GRU; tweets; lexicon; sentiment analysis; social network analysis

1. Introduction

Social media is a platform over the Internet where users share their ideas, exchange information, and build relationships. Two-thirds of Internet users access social networks and social sites [1]. The recent COVID-19 pandemic has increased the use of social media. According to statistics published by [1], an increasing number of people used social media platforms to convey their feelings to their friends and family during the pandemic. Another reason for the increase in social media use is the decline of the newspaper supply during COVID-19, owing to concerns that the novel coronavirus could be transmitted through newspapers. Before the lockdown, people used to spend an hour reading the newspapers. This decreased by 22% during the lockdown, which resulted in a significant shift towards social media usage. Twitter is one online micro-blogging site on which people like spending more time. Therefore, Twitter is an important source of sentiments where users express their opinions about every topic. The users of this channel range from politicians, celebrities, and business representatives to ordinary people. It is possible to access, visualise and interpret users’ views from different socioeconomic and interest groups on Twitter [2,3]. Hence, analysis of Twitter data during this pandemic has attracted a great number of researchers.

Sentiment analysis or opinion mining (SAOM) is a domain that automates extracting public opinions or thoughts expressed in a written language (text) across social media, blogs, reviews, news, etc. It aims to analyse people’s personal experiences, opinions, emotions, or attitudes towards an entity, such as products, individuals, organisations, services, etc. The term sentiment means one’s personal feeling or experience or “an attitude towards something” or “an opinion” [4]. SAOM is an active field of research and an interdisciplinary area that includes text mining, Natural Language Processing (NLP), and data mining [5]. Sentiment analysis and opinion mining tasks are usually carried out at various levels: word level, sentence level, document level, and aspect level [4,6]. The sentiment extractor automatically classifies the opinions expressed in the tweet as positive, negative, or neutral. Table 1 shows some example tweets and their sentiment polarity level.

In the proposed work, two deep learning-based models, Bidirectional Long Short-Term Memory (Bi-LSTM) and Gated Recurrent Unit (GRU) networks, and two lexicon analysers, the National Research Council of Canada Emotion Lexicon (NRCLex) and the Valence Aware Dictionary for Sentiment Reasoning (VADER), were used to identify the public sentiments related to COVID-19 and vaccination. The work focuses on analysing public emotions based on tweets posted by people from India. The study also quantifies the sentiment with a positive, neutral, or negative value called polarity [7,8]. It provides insights into emotions such as happiness, anger, neutrality, etc., to support the best policies against upcoming pandemic waves in India. The proposed methodology described in this paper can be useful for Indian government policymakers to take proper managerial decisions by being aware of the public’s emotions towards the COVID-19 pandemic and vaccination drives. The managerial implications of the analysis are that the policymakers can:

(i.): Understand the concerns and issues raised by people about current facilities related to COVID-19 pandemic and vaccination drives;
(ii.): Ensure sufficient provisions are made;
(iii.): Understand misunderstandings about vaccination;
(iv.): Take appropriate initiatives to create awareness about the current situation.

A large dataset of around 1,000,000 tweets from India related to COVID-19 and vaccination was used as a case study. The following are the unique contributions of this study:

The main intention of the study is to prove the significance of sentiment analysis using two approaches: lexicon-based and deep learning methods.
The size of tweets datasets used in this study is large compared with previous studies.
The classification accuracy obtained is higher than the results of existing similar works.
The study aims to analyse people’s emotions towards COVID-19 pandemic and vaccination as positive, negative and neutral. Furthermore, the tweets are classified into positive (“joy”, “positive” and “trust” etc.) and negative (“fear”, “sadness” and “negative” etc.) emotion affects.
The policymakers can use these approaches to understand public emotions and make appropriate decisions about future outbreaks and planning resources such as COVID care hospitals, setting up COVID care centres and planning for vaccination drives.

This paper is arranged into six sections. Section 2 presents the literature review. Section 3 explains the proposed methodology. Section 4 and Section 5 explain the implementation of lexicon and deep learning methods and the results. The conclusions of the study are discussed in Section 6.

2. Literature Review

After COVID-19 was declared as a pandemic, many researchers have investigated the sentiment analysis of COVID-19 posts extracted from social media. Different perspectives of sentiment analysis include:

Trend analysis for different time intervals using COVID-19 datasets [9,10,11]
Topic modelling [12,13]
Sentiment analysis on social distancing [14], vaccination [15]
Disease surveillance [16,17].

Twitter is a commonly used social media platform for conveying one’s opinions. The current approaches used for Twitter sentiment analysis fall into four main categories: lexicon-based, machine learning techniques, deep learning, and hybrid methods [18,19].

In the past decade there have been several research studies conducted to analyse sentiments of the tweets with the above approaches. National Research Council (NRC) and VADER models were used to calculate the intensity score of the sentiments for the Canadian tweets about COVID-19. Data were collected from 24 February 2020 to 14 October 2020 from four cities in Canada, i.e., Toronto, Montreal, Vancouver, and Calgary. Time series analysis was carried out on these data. The authors computed sentiment scores for vaccine, mask, and lockdown. The results were compared with the sentiment scores of the tweets posted from four cities in the United States. The analysis showed that sentiment scores vary depending on the location and time, and people were positive about using masks but negative about vaccines and lockdown. This analysis has limitations in interpreting the meaning of negative sentiments [20].

A lexicon-based approach was used to perform topic-based sentiment analysis of the tweets about COVID-19 pandemic. This research focused on two main pandemic periods: 1 March 2020 to 30 April 2020 and 1 September 2020 to 31 October 2020. In the first phase, topic extraction was performed using Latent Dirichlet Allocation (LDA), and in the second phase, lexicon-based sentiment analysis was carried out using VADER. For this purpose, around 600,000 English tweets about COVID-19 were extracted and processed [21]. This work has a few limitations: topic lemmatisation is performed several times to obtain good results, resulting in a slower process. The data from the two periods selected for the study produced contradictory results and required computationally intensive data pre-processing.

Sentiment analysis and topic modelling of tweets on COVID-19 vaccines were performed on the tweets posted from 11 March 2020 (when COVID-19 became a pandemic worldwide) to 31 January 2021. The tweets were extracted from Georgia State University’s Panacea Lab dataset. Emotions expressed by tweets were analysed using NRCLex. The result of the sentiment analysis showed that people’s sentiments towards COVID-19 vaccination became more positive over time. The maximum sentiment score was observed during early November 2020, when the Pfizer vaccine was reported to be highly effective. In emotion analysis, it was found that trust emotion reached highest score with 22.78% around the same time, and also, the percentage of tweets with fear decreased after that period [22].

Lexicon and VADER methods were used to analyse the sentiment of tweets on COVID-19 from six countries (France, Italy, USA, India, UK, and Spain). The tweets were collected from 15 March to 15 April 2020 from Twitter. Tweets were classified into Negative, Neutral, or Positive sentiment classes. Both approaches showed negativity about the pandemic from all countries. The lexicon-based approach indicated that the UK has the highest negative sentiment score of 23.03%, followed by France (22.71%), USA (22.01%), and India (18.39%). In the VADER-based approach, the results were: 35.92% in France, 35.68% in UK, and 35.38% in USA, while India has a minimum score of 31.03% [23].

In another study, a lexicon-based sentiment analysis approach was used to analyse the sentiments of public towards COVID-19 pandemic during initial phase of vaccination from Ohio and Michigan. NRCLex and VADER libraries from Python were used to calculate the sentiment scores of tweets. Tweets were classified into four sentiment categories: positive, negative, neutral and compound. The results revealed that tweets from Ohio state exhibited more negative feelings associated with the emotions of “fear” and “sadness” compared to the tweets from Michigan state [24].

The machine learning approaches for sentiment analysis are mainly based on supervised learning and ensemble techniques. In the supervised learning technique, a dataset of labelled instances of tweets is input to train a machine learning model using classification algorithms such as Support Vector Machine (SVM), Bayesian classifier, and Entropy classifier [25,26,27,28,29] to classify the tweets into sentiment categories such as positive, neutral, and negative. The trained model is used to predict the sentiment of new tweets. The main disadvantage of the machine learning approach is that we need to generate a large labelled training dataset as the model’s performance is dependent on the dataset [30,31]. It is difficult to obtain a correctly labelled dataset of adequate size. Features such as Part-of-Speech, Hashtag, negation, Term frequency, Term Presence, and n-gram are used to extract the semantic orientation of the text. Some researchers have tried ensemble strategies for sentiment analysis. In ensemble frameworks, a model is built by combining several base classifiers, e.g., Naïve Bayes and an SVM classifier [32]. Furthermore, sarcasm in the word is identified using a multitask learning framework based on deep learning methods and OntoSenticNet [33,34].

COVID-19 vaccination tweet analysis was carried out using Naïve Bayes and decision tree algorithms. The Natural Language Toolkit (NLTK) library, available in Python, was used to tokenise the tweets. Sentiments were classified into three classes: neutral, positive and negative. In the decision tree algorithm, unigram and bigram methods were applied to achieve an accuracy of 96% in prediction. However, this research considered only vaccination tweets [35].

In recent years, deep learning approaches such as Bidirectional Encoder Representations from Transformers (BERT), Long-Short Term Memory (LSTM) and Bidirectional Emotional Recurrent Unit (BiERU) [36] approaches have been successfully employed for NLP tasks. The BERT model was built to analyse the sentiments of COVID-19 Tweets from India. The tweets were collected during the lockdown period from 23 March 2020 to 15 July 2020. Along with BERT, three other models, SVM, Logistic Regression, and LSTM, were built. It was observed that the BERT model obtained a maximum precision of 89%. The study revealed a high incidence among keywords and their related terms in the tweets. Emotions extracted in the study cannot be applied globally, as it is based on tweets from a single country [37].

A deep learning model based on LSTM was built to predict the public’s feelings towards the pandemic. The dataset consisted of tweets collected from nine states in the United States. The Python library TextBlob was used to classify three emotions: negative, positive, and neutral. The results indicated that most of the sentiments were neutral. Along with LSTM, another machine learning algorithm, SVM, was used to predict emotions. The purpose of the LSTM model is unclear from the study [38]. Using Bangladesh tweets, a Bi-LSTM model was trained to analyse public emotions related to COVID-19 vaccines and vaccination campaigns. The model consisted of two Bi-LSTM layers, with a first layer of 100 units and a second layer of 32 units. The drawback of this model is that it predicts only two classes of emotions, i.e., positive and negative, and there was no provision to handle ambiguous tweets [39].

An ensemble deep learning model was proposed to analyse COVID-19 sentiments in real time for Indian and European tweets. The dataset contained 3100 tweets from India and Europe, collected from 23 March 2020 to 1 November 2021. The model was built using five steps: data collection, pre-processing, feature extraction, exploratory analysis and prediction. Using the ensemble classifiers GRU and Capsule Neural Network (CapsNet), tweets were classified into four sentiment categories: joy, sadness, anger and fear. The model achieved a classification accuracy of 97.28% and 95.20% for the Indian and European datasets, respectively. Training this model was computationally expensive [40].

The deep learning technique CNN-LSTM was used to analyse sentiments related to the COVID-19 vaccine. In this study, a total of 803,278 Persian tweets were collected for the period between 1 April 2021 and 30 September 2021. This method classified tweets into three categories: negative, positive and neutral. During the analysis, it was observed that: (1) there were more negative tweets on national and international vaccines in the initial stage (2) There were notable variations in positive and negative sentiments towards vaccination among Iranian people. The limitation of this method is that the duration of tweets collection was very short, which might lead to misinterpretation of the analysis [41].

Hybrid techniques combine lexicon-based and machine-learning methods to take advantage of both [12]. A deep learning method was used to extract the sentiments from Korean tweets about COVID-19 vaccines. The tweets were collected from 23 February 2021 to 22 March 2021. As the tweets were in the Korean language, the KNU Korean Sentiment Lexicon dictionary was used to extract the topics. Eight topics were mined using the LDA model, and the sentiment polarity of each topic selected for the study was computed using Bi-LSTM. The analysis revealed a rise in negative tweets after the surge in COVID-19 confirmed cases, as the vaccination process was limited to only healthcare workers. The outcomes of this analysis cannot be generalised to reflect the entire Korean population, as very few tweets were used in this study. Most of the tweets collected were based on indirect experiences. Repeating the study after the vaccination roll-out would likely yield different sentiment scores. Therefore, further in-depth future work is required [42].

Neurosymbolic Artificial Intelligence (AI) is another recent approach for sentiment analysis. The technique involves the extraction of polarity from the text in an explainable manner using rules and symbols. This method can be applied to enhance the performance of machine learning or deep learning models built to perform sentiment analysis [43,44,45]. However, none of the studies has used this approach for COVID-19 tweet analysis [46]. The limitations of the previous studies are:

i.: The datasets used were of limited numbers of tweets. These studies analysed public sentiment about either COVID-19 or vaccination.
ii.: The researchers applied only a single sentiment computation technique, i.e., either lexicon-based or machine learning, to compute the sentiment.

Considering existing studies’ limitations, the proposed work analysed tweets on both COVID-19 and vaccination. The size of the datasets included is large compared with the existing studies. The sentiment analysis is performed using both lexicon and deep learning approaches. This helps evaluate the most effective model to predict public opinion about COVID-19 and vaccination.

3. Proposed Methodology with Relevant Case Studies

During the COVID-19 pandemic, many people tweeted about the disease and its vaccine. This study tries to understand people’s reactions in India to the COVID-19 pandemic and the COVID vaccines. A tweet can contain text of up to 280 characters, images, hashtags, and videos. The scope of tweets is public, i.e., tweets sent by users are visible publicly, but the sender can restrict the delivery of a message by making it available only to their followers. According to statista.com, Twitter has 237.8 million daily active users [47]. On Twitter, hashtags are very important and popular. A hashtag is a word preceded by the symbol “#” that combines keywords without punctuation or white space. Hashtags help users to find similar interest groups to follow. In this study, hashtags related to COVID-19 and vaccination were extracted to analyse the data. The proposed methodology for sentiment analysis of tweets (COVID and vaccines) is shown in Figure 1.

3.1. Data Collection and Preprocessing

In this study, two datasets containing tweets from India were used. The first dataset includes around 80,000 tweets about COVID-19, posted over three months from March 2020 to May 2020 [48]. Another dataset [49] contains 218,791 tweets about vaccination. The proposed methodology tries to analyse sentiments and emotional effects of people towards the pandemic and vaccine using these datasets. Table 2 and Table 3 show an overview of the dataset for COVID-19 and vaccination, respectively.

Data collected from Twitter cannot be used directly to prepare models. The text of a tweet is noisy, with typos and grammatical mistakes. In addition, the presence of acronyms (e.g., lol, gr8), sarcasm, slang words, emoticons (e.g., :) and :( ), URLs, extra spaces, and reserved words, together with the need for lemmatisation and stemming, makes it difficult to interpret the sentiment [4]. Therefore, data cleaning becomes very important. Data cleaning is the process of removing noise from the data so that only useful, high-quality information remains. Python’s built-in regular expression module “re” was used to pre-process the raw data. The module provides a set of functions for pattern matching with regular expressions. URLs, white space, emoticons, punctuation and stop words were removed from the COVID-19 and vaccination tweets using common functions available in the “re” module.
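The cleaning steps described above could be sketched as follows. This is a minimal illustration and not the authors' code: the patterns, the tiny stop-word set, and the function name clean_tweet are all assumptions chosen for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word set; a real pipeline would use a fuller list.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "for", "on"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Simple Western-style emoticons such as :) :-( ;D, only when they stand alone.
EMOTICON_RE = re.compile(r"(?<!\S)[:;=][\-']?[\)\(\]\[dDpP/\\|]+(?!\S)")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")  # punctuation and symbols, including '#'
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove URLs, emoticons, punctuation, extra spaces and stop words."""
    text = URL_RE.sub(" ", text)
    text = EMOTICON_RE.sub(" ", text)
    text = text.lower()
    text = NON_WORD_RE.sub(" ", text)
    tokens = [t for t in SPACE_RE.split(text.strip()) if t and t not in STOP_WORDS]
    return " ".join(tokens)


print(clean_tweet("Stay safe :) the vaccine is here!! https://t.co/abc #COVID19"))
```

URLs are removed before punctuation so that their fragments do not survive as stray tokens, and emoticons are removed before lowercasing so that forms such as ":D" are still recognised.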

4. Sentiment Analysis Using Lexicon-Based Approaches

The lexicon-based approach was used to visualise and analyse the emotions for the same datasets. Lexicon-based strategies are very effective and straightforward methods. They depend on a sentiment dictionary, i.e., a lexicon with a predefined list of lexical features (e.g., words, phrases etc.) where each word is labelled with a semantic orientation as either positive or negative or neutral. Semantic orientation (SO) refers to the polarity and intensity (the extent to which the document, sentence or word is positive or negative) of words, phrases or texts. In many lexicon-based studies, adjectives have been used as lexical features to determine the text’s semantic orientation [50]. Researchers have also used adverbs, verbs and nouns as features, and a list of adjectives and their SO values are collected into a lexicon. Dictionaries can be created manually by language experts or by automatic expansion from an initial list of seed sentiment words.

Using lexicon-based methods, the semantic orientation of a document can be calculated based on the semantic orientation of words and phrases [51]. The input text is pre-processed and represented as a bag of words. These words’ sentiment values (positive or negative) are extracted by matching the words with the dictionary. An aggregation function, such as sum or average, applies to individual SO scores to predict the overall sentiment of the text. Along with sentiment value, the local context of a word is usually considered, such as intensity measure and negation [52]. Figure 2 shows the general process of lexicon-based sentiment analysis of tweets, and Figure 3 presents the process with an example sentence.
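As a rough illustration of the procedure just described (dictionary lookup, local context, then aggregation), the following sketch scores a text with a toy lexicon. The words, SO values, intensifier weights and the sum aggregation are all hypothetical choices made for the example; they are not taken from any of the lexicons used in this study.

```python
# Hypothetical toy lexicon: word -> semantic orientation (SO) score.
LEXICON = {"good": 2.0, "great": 3.0, "safe": 1.5,
           "bad": -2.0, "scared": -2.5, "sick": -1.5}
INTENSIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5}  # assumed weights
NEGATIONS = {"not", "no", "never"}


def score_text(text: str) -> float:
    """Sum the SO scores of lexicon words, adjusted for the preceding context."""
    tokens = text.lower().split()
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        so = LEXICON[tok]
        prev = tokens[i - 1] if i >= 1 else ""
        if prev in INTENSIFIERS:          # e.g. "very good"
            so *= INTENSIFIERS[prev]
            prev = tokens[i - 2] if i >= 2 else ""
        if prev in NEGATIONS:             # e.g. "not good", "not very good"
            so = -so
        total += so
    return total


def label(score: float) -> str:
    """Map the aggregated score to a polarity class."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in ("the vaccine is very good", "not good", "i feel sick but safe"):
    s = score_text(sentence)
    print(sentence, "->", s, label(s))
```

Replacing the sum with an average would normalise for text length; which aggregation works better depends on the lexicon and the data.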

Lexicon-based strategies are simple, efficient and easy to understand. They do not require labelled training data to attain good classification performance. They can account for negation (e.g., not attractive) and intensification (e.g., very interesting). These properties can make a lexicon-based model preferable to a trained classifier when labelled data are scarce. Examples of some of the popular lexicons used for detecting the sentiment of a given text are SentiWordNet, Bing, AFINN, NRCLex, VADER and MPQA.

4.1. Sentiment Analysis with VADER

VADER is a lexicon and rule-based sentiment analysis tool that is precisely attuned to the emotions conveyed on social media. The lexicon incorporates 7500 lexical features with validated valence scores that indicate the word’s polarities (negative and positive) and intensity. Lexical features are labelled with sentiment scores on a scale from “-4: Extremely Negative” to “+4: Extremely Positive” and “0: Neutral (or Neither, N/A)” [53]. The lexicon contains numerous lexical properties common to the sentiments expressed in microblogs:

i.: A complete list of Western-style emoticons (for example, “:-(”)
ii.: Sentiment-related abbreviations (for example, LOL and ROFL)
iii.: Frequently used slangs with sentiment value (for example, nah and meh)

Table 4 lists a few of the words with their polarity and valence score as given in the VADER Lexicon [54].

VADER sentiment analysis depends on the dictionary approach to map lexical features to sentiment intensity scores. Lexical features such as emoticons “:-)”, acronyms such as “OMG” and commonly used informal words (slang) such as “Flex” are also mapped to sentiment scores. The sentiment score of the sentence is increased or decreased proportionately. VADER developers have incorporated simple heuristic rules to consider the effect of intensifiers (extremely, very, slightly), punctuation (!), and capitalization (AMAZING), which also affect the overall sentiment of the sentence.

VADER applies another heuristic to resolve the semantic ambiguity introduced by conjunction words such as but, though, whereas, yet etc., in a sentence. The conjunctions join two clauses with conflicting opinions and affect the sentence’s sentiment. For example, “I got COVID but symptoms are mild”. The second part of the text dominates the overall sentiment of the sentence. In VADER, the sentiment terms after “but” are assigned higher valence scores than the terms before the conjunction word to determine the compound polarity score of the sentence.

The compound score of the text is calculated by adding up the valence scores of each term, adjusted according to the rules. The final score is scaled to map to a value between +1 (most extreme positive) and −1 (most extreme negative).
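The rescaling step is not spelled out above. The sketch below assumes the normalisation x / sqrt(x² + α) with α = 15 found in the open-source VADER implementation, and the commonly recommended cut-offs of ±0.05 for turning a compound score into a class label. Both are assumptions for illustration, not values reported in this study.

```python
import math


def normalise(valence_sum: float, alpha: float = 15.0) -> float:
    """Map an unbounded sum of valence scores into the open interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)


def polarity_label(compound: float, threshold: float = 0.05) -> str:
    """Assumed convention: scores within +/- threshold are treated as neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for total in (-6.0, -1.0, 0.0, 1.0, 6.0):
    c = normalise(total)
    print(total, round(c, 4), polarity_label(c))
```

The function is odd and monotonic, so larger summed valences always give larger compound scores, while very long or very emphatic texts saturate near ±1 instead of growing without bound.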

The polarity (pos, neu, neg) and compound sentiment scores of some positive and negative tweets were computed using VADER for the COVID-19 tweets dataset. Table 5 shows the results obtained. The polarity and the sentiment scores of the tweets are computed using the sentiment analyser class of the VADER tool in the Python module. The bar graph in Figure 4 shows the count of overall positive (21,907), negative (13,755), and neutral (8517) tweets in the dataset. It can be observed that the count of positive tweets exceeds the negative and neutral tweets. Figure 5 shows the overall positive, negative, and neutral tweets count for the three cities of India: Pune, Delhi, and Mumbai.

4.2. Emotion Effects of Vaccine Tweets with NRCLex

NRCLex is an MIT-licensed PyPI project by Mark M. Bailey. It is used to predict the sentiments and emotions of an input text. The dictionary contains around 27,000 words and is based on the NLTK library’s WordNet synonym sets and the NRC Canada affect lexicon [55]. The lexicon comprises a list of English words and their mapping to eight elementary emotions (joy, anger, surprise, fear, sadness, anticipation, trust, and disgust) and two polarities (positive and negative). The NRCLex module was imported in Python code to classify the emotions of tweets from the vaccine dataset.
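The idea of mapping words to emotion categories and counting them can be sketched without the library itself. The mini lexicon below is hypothetical and is not the NRC data; it only mimics the shape of a word-to-affects mapping, and emotion_counts is an illustrative helper name.

```python
from collections import Counter

# Hypothetical mini lexicon (NOT the NRC data): word -> associated affects.
EMOTION_LEXICON = {
    "vaccine": ["positive", "trust"],
    "hope": ["positive", "joy", "anticipation", "trust"],
    "death": ["negative", "fear", "sadness"],
    "afraid": ["negative", "fear"],
}


def emotion_counts(text: str) -> Counter:
    """Count how often each affect is triggered by the words of a text."""
    counts = Counter()
    for token in text.lower().split():
        counts.update(EMOTION_LEXICON.get(token, []))
    return counts


counts = emotion_counts("Hope the vaccine works")
print(counts.most_common())
```

Summing such counters over all tweets gives a per-emotion distribution of the kind plotted as a bar graph.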

The graph in Figure 6 shows the result of vaccine tweets classified into different emotional effects. The tweets dataset was extracted from the Kaggle platform [56]. The graph reveals that the positive sentiments (‘positive’, ‘joy’, ‘trust’) score higher than the negative emotions (‘negative’, ‘disgust’, ‘anger’). The result indicates that people are positive about vaccination. Table 6 shows the distribution of the number of tweets into various emotion categories.

5. Sentiment Analysis Using Deep Learning Approach

Deep Learning is a machine learning technique inspired by the structure and functioning of the human brain. A deep neural network consists of two or more layers of computing units called neurons working in parallel to simulate the workings of the human brain [57,58]. Deep learning models have led to revolutions in text mining, computer vision, and speech recognition [59].

A Recurrent Neural Network (RNN) is a variation of deep neural learning in which time-series or sequential data are used to train the network. An RNN is used to train the model in the proposed work. Here, RNN is used to identify the sentiments in the tweets. It differs from traditional deep neural networks because it has “memory”. In RNN, previous information in the sequence influences current input and output. For example, to predict the next term in a sentence, it is necessary to remember the previous terms. The hidden layer plays a very important role in RNN as it remembers the states of previous information. Figure 7 shows the general architecture of RNN. The same weight and bias are assigned across all the hidden layers in the RNN to reduce the complexity of the network. This makes the RNN remember the previous outputs that are fed as input to the next layer. The diagram shows that all the intermediate layers are combined to form a single recurrent unit. X is the input layer, h is the hidden layer, Y is the output layer. The parameters X, Y and Z are used to improve the network performance. Equation (1) is applied to compute the current state.

h_{t} = f(h_{t-1}, x_{t})    (1)

where x_{t} is the input state, h_{t} is the current state and h_{t-1} is the previous state.
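To make the recurrence in Equation (1) concrete, the sketch below assumes the common choice f = tanh applied to an affine combination of the previous state and the current input. The weight names W_h, W_x and b are illustrative; the same weights are reused at every time step, which is the weight sharing described above.

```python
import math


def rnn_step(h_prev, x_t, W_h, W_x, b):
    """One step of h_t = tanh(W_h h_{t-1} + W_x x_t + b), using plain lists."""
    h_t = []
    for i in range(len(b)):
        z = b[i]
        z += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        z += sum(W_x[i][k] * x_t[k] for k in range(len(x_t)))
        h_t.append(math.tanh(z))
    return h_t


def run_rnn(inputs, h0, W_h, W_x, b):
    """Apply the same step (same weights) to every element of the sequence."""
    h = h0
    for x_t in inputs:
        h = rnn_step(h, x_t, W_h, W_x, b)
    return h


# Tiny example: hidden size 1, input size 1.
final_state = run_rnn([[1.0], [0.0]], [0.0], [[0.5]], [[1.0]], [0.0])
print(final_state)
```

Even though the second input is zero, the final state is non-zero because it still carries information from the first input; this is the "memory" of the network.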

There are two problems with RNN: 1. Vanishing gradient and 2. Exploding gradient. A vanishing gradient problem arises in the network when there is a small value of gradient present, causing an insignificant parameter update. In exploding gradient problems, the slope tends to grow exponentially due to the accumulation of large gradient errors, which causes large updates to the model. Learning a long data sequence for the model will become difficult due to these problems. As a result, the model will experience low accuracy, more learning time, and poor performance. These problems can be handled using two variants of RNN: LSTM [60] and GRU [61].

The general architecture of LSTM is shown in Figure 8. The network comprises three gates: an input gate, an output gate, and a forget gate. The input gate uses the Tanh or sigmoid functions to add information to the cell using the following equations:

f(x) = \tanh(x) = \frac{2}{1 + e^{-2x}} - 1    (2)

\tanh(x) = 2\,\mathrm{sigmoid}(2x) - 1    (3)

where x is the input value.
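Equations (2) and (3) express the same function in two ways, which can be checked numerically. The helper names below are illustrative only.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def tanh_eq2(x: float) -> float:
    """Equation (2): tanh written with a single exponential."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def tanh_eq3(x: float) -> float:
    """Equation (3): tanh as a rescaled, shifted sigmoid."""
    return 2.0 * sigmoid(2.0 * x) - 1.0


for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(x, tanh_eq2(x), tanh_eq3(x), math.tanh(x))
```

The sigmoid squashes values into (0, 1), which suits gates that decide how much information passes, whereas tanh squashes into (−1, 1), which suits candidate values that may increase or decrease the cell state.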

Selecting the relevant information from the current cell and presenting it as output is the responsibility of the output gate. This is again done with the help of the tanh and sigmoid functions. The forget gate removes the less important information from the cell. It takes two inputs, h_{t−1} and x_t, and produces a value between 0 and 1. A value of 1 means that the information from the previous state is kept and transmitted through the network; a value of 0 means that it is not transmitted. All three gates control the LSTM memory using the following equations:

Γ_{u} = σ(W_{u} [h^{〈t-1〉}, X^{〈t〉}] + b_{u})  (update gate)    (4)

Γ_{f} = σ(W_{f} [h^{〈t-1〉}, X^{〈t〉}] + b_{f})  (forget gate)    (5)

Γ_{o} = σ(W_{o} [h^{〈t-1〉}, X^{〈t〉}] + b_{o})  (output gate)    (6)

where σ is the sigmoid function, W is the weight matrix, b is the bias vector, X is the input value, Γ_u is the update gate control, Γ_f is the forget gate control, and Γ_o is the output gate control.

\tilde{c}^{〈t〉} = \tanh(W_{cc} h^{〈t-1〉} + W_{cx} X^{〈t〉} + b_{c})  (new memory cell)    (7)

c^{〈t〉} = Γ_{u} ⊙ \tilde{c}^{〈t〉} + Γ_{f} ⊙ c^{〈t-1〉}    (8)

h^{〈t〉} = Γ_{o} ⊙ \tanh(c^{〈t〉})    (9)

where Equations (7)–(9) are for the new memory cell, the memory cell, and the hidden state, respectively.
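A single LSTM step can be sketched in plain Python as follows. The sketch uses the standard LSTM form, in which each bias is added inside the activation and the forget gate scales the previous cell state. The parameter dictionary, the stacking of W_cc and W_cx into one matrix W_c acting on [h, X], and the function names are assumptions made for illustration.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Compute W v + b for a list-of-rows matrix W and list vectors v, b."""
    return [sum(w * vj for w, vj in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(h_prev, c_prev, x_t, p):
    """One LSTM step; p holds W_u, W_f, W_o, W_c and b_u, b_f, b_o, b_c."""
    v = h_prev + x_t  # list concatenation: [h^{<t-1>}, X^{<t>}]
    gate_u = [sigmoid(z) for z in affine(p["W_u"], v, p["b_u"])]    # update gate
    gate_f = [sigmoid(z) for z in affine(p["W_f"], v, p["b_f"])]    # forget gate
    gate_o = [sigmoid(z) for z in affine(p["W_o"], v, p["b_o"])]    # output gate
    c_tilde = [math.tanh(z) for z in affine(p["W_c"], v, p["b_c"])]  # candidate
    # New cell state: admit the candidate, retain part of the old state.
    c_t = [u * ct + f * cp
           for u, ct, f, cp in zip(gate_u, c_tilde, gate_f, c_prev)]
    h_t = [o * math.tanh(c) for o, c in zip(gate_o, c_t)]
    return h_t, c_t


# Hidden size 1, input size 1, all-zero parameters: every gate equals 0.5.
params = {k: [[0.0, 0.0]] for k in ("W_u", "W_f", "W_o", "W_c")}
params.update({k: [0.0] for k in ("b_u", "b_f", "b_o", "b_c")})
h, c = lstm_step([0.0], [2.0], [1.0], params)
print(h, c)
```

With all parameters at zero, every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved; this makes the role of the forget gate easy to see.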

It has already been shown that traditional LSTM models perform well for textual sentiment analysis [62,63]. Traditional LSTM networks handle long-term dependencies well because they learn from past data. Bi-LSTM networks, however, learn from both past and future data: they preserve past information using a forward pass and future information using a backward pass. Bi-LSTM networks are better at learning, are suitable for complicated data, and achieve better accuracy [64]. Therefore, Bi-LSTM and GRU networks have been used for emotion prediction on Indian tweets collected from Twitter from March 2020 to May 2020. Figure 9 shows the architecture of the proposed model using a deep network approach.
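The idea of combining a forward and a backward pass can be sketched without any deep learning library. In the illustration below, the recurrent update `toy_step` is a hypothetical stand-in for an LSTM cell, and the same update is shared by both directions for brevity. A real Bi-LSTM layer learns a separate set of weights for each direction.

```python
def run_direction(seq, step, h0=0.0):
    """Apply a recurrent update along seq and collect the state after each item."""
    h, outputs = h0, []
    for x in seq:
        h = step(x, h)
        outputs.append(h)
    return outputs

def bidirectional(seq, step):
    """Pair each position's forward state (past context) with its backward state (future context)."""
    fwd = run_direction(seq, step)
    bwd = run_direction(seq[::-1], step)[::-1]  # re-align to the original order
    return list(zip(fwd, bwd))

def toy_step(x, h):
    # Hypothetical recurrent update: decayed previous state plus current input.
    return 0.5 * h + x

states = bidirectional([1.0, 2.0, 3.0], toy_step)
print(states)
```

Each position ends up with two states, one summarising the tokens before it and one summarising the tokens after it. This pairing is what gives a bidirectional network access to both past and future context.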

The tweets were cleaned and pre-processed, as explained in Section 3.1, to create a new dataset of tweets along with their corresponding sentiment labels, i.e., positive (1), negative (0), and neutral (2). The Keras TextVectorization layer was then used to convert each string into an encoded representation, which is read as input by the Embedding layer. The dataset was then split into two parts to build the model: 80% of the dataset was used for training, and the remaining 20% was used for testing.
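The encoding and splitting steps can be sketched in plain Python. The sketch is a conceptual stand-in, not the Keras TextVectorization layer itself. It mirrors the Keras integer-mode convention of reserving index 0 for padding and index 1 for out-of-vocabulary tokens. The function names, the toy tweets, the sequence length, and the random seed are all assumptions made for illustration.

```python
import random
from collections import Counter

def build_vocab(texts, max_tokens=None):
    """Map each token to an integer id, most frequent first (0 = padding, 1 = unknown)."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    words = [w for w, _ in counts.most_common(max_tokens)]
    return {w: i + 2 for i, w in enumerate(words)}

def encode(text, vocab, seq_len):
    """Integer-encode a text, truncating or zero-padding to seq_len."""
    ids = [vocab.get(tok, 1) for tok in text.lower().split()][:seq_len]
    return ids + [0] * (seq_len - len(ids))

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows and split them into training and testing parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(len(rows) * (1 - test_fraction)))
    return rows[:cut], rows[cut:]

# Toy usage with hypothetical pre-processed tweets.
texts = ["covid vaccine update", "vaccine help today", "covid cases rise"]
vocab = build_vocab(texts)
encoded = encode("covid booster", vocab, seq_len=4)  # "booster" is out of vocabulary
train, test = train_test_split(range(10))            # 80% / 20% split
print(encoded, len(train), len(test))
```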

5.1. Model Evaluation

Model calibration was performed by adjusting model parameters such as the number of epochs, loss function, training algorithm, batch size, and learning rate. The networks were trained using the Adam optimiser with a learning rate of 0.001. Adam is an effective optimisation algorithm used in deep learning. The loss function was set to sparse categorical cross-entropy, which is well suited to multi-class classification problems with integer class labels. The model was trained using different combinations of batch size and number of epochs. The model’s performance was found to be optimal for a batch size of 64, so the batch size was set to 64.
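The loss can be written out explicitly. The sketch below computes sparse categorical cross-entropy as the mean negative log-probability assigned to the true class, where the labels are integer class ids. It is a framework-free illustration rather than the Keras implementation, and the example probabilities are hypothetical.

```python
import math

def sparse_categorical_crossentropy(labels, probs, eps=1e-12):
    """Mean of -log p[true class]; labels are integer class ids."""
    losses = [-math.log(max(p[y], eps)) for y, p in zip(labels, probs)]
    return sum(losses) / len(losses)

# Hypothetical predictions for three tweets over (negative, positive, neutral).
labels = [1, 0, 2]
probs = [
    [0.1, 0.8, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.2, 0.6],
]
loss = sparse_categorical_crossentropy(labels, probs)
print(round(loss, 4))
```

The "sparse" variant takes the class index directly, so the labels 0, 1, and 2 do not need to be one-hot encoded first.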

The model performance is measured using standard metrics such as accuracy, F1-score, specificity, sensitivity, and recall. For evaluation, we computed four values from the predictions: true positives (TP) are positive tweets correctly predicted as positive, true negatives (TN) are negative tweets correctly predicted as negative, false positives (FP) are negative tweets incorrectly predicted as positive, and false negatives (FN) are positive tweets incorrectly predicted as negative.

Accuracy is the proportion of class labels identified correctly by the model and is computed using the following equation.

\mathrm{Accuracy} = \frac{TN + TP}{TN + TP + FP + FN}

(10)

Specificity is the metric used to monitor the true negative predictions made by the model. It is the ratio of the true negative predictions given by the model to the total number of actual negatives.

\mathrm{Specificity} = \frac{TN}{FP + TN}

(11)

The sensitivity of the model measures the true positive predictions. It is the ratio of the true positive predictions to the total number of actual positives, i.e., the sum of true positives and false negatives.

\mathrm{Sensitivity} = \frac{TP}{FN + TP}

(12)

The F1-Score is the harmonic mean of precision and recall, where Precision = TP / (TP + FP) and Recall = TP / (TP + FN). It balances precision against recall: it is high only when both are high, and it reaches its maximum value of 1 when both precision and recall equal 1.

\mathrm{F1\text{-}Score} = 2 \cdot \frac{\mathrm{Recall} \cdot \mathrm{Precision}}{\mathrm{Precision} + \mathrm{Recall}}

(13)
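For a three-class problem, these metrics are usually computed per class in a one-vs-rest manner from the confusion matrix. The sketch below shows one way to do this, assuming rows hold actual classes and columns hold predicted classes. The confusion matrix values are hypothetical and are not results from this study.

```python
def one_vs_rest_counts(cm, k):
    """Return (TP, FP, FN, TN) for class k from a square matrix cm[actual][predicted]."""
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                      # class k predicted as another class
    fp = sum(row[k] for row in cm) - tp       # another class predicted as class k
    tn = sum(map(sum, cm)) - tp - fn - fp
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    """Accuracy, specificity, sensitivity, precision and F1 (assumes non-zero denominators)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                   # same quantity as sensitivity
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),
        "sensitivity": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }

# Hypothetical confusion matrix for classes 0 (negative), 1 (positive), 2 (neutral).
cm = [
    [50, 5, 5],
    [4, 40, 6],
    [6, 4, 30],
]
for k in range(3):
    print(k, metrics(*one_vs_rest_counts(cm, k)))
```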

5.2. Results and Discussion

To carry out the experiment, the Keras library and TensorFlow framework were used on Google Colab to create the Bi-LSTM and GRU networks. TensorFlow is an open-source library for deep learning applications, and Keras provides high-level deep learning APIs running on top of TensorFlow [65]. Google Colab is a cloud-based platform that provides free access to computing resources for machine learning tasks [66]. In this study, the Bi-LSTM network was built with seven layers and trained for five epochs. The loss function and the Adam optimiser were used along with two types of activation function: a set of dense layers with 128 and 64 units uses the Rectified Linear Unit (ReLU) activation function, and the final dense layer, which has three units, uses the softmax activation function. The model was trained with a batch size of 64 and the verbose parameter set to four. The process was repeated for both datasets, i.e., COVID-19 and vaccination. The same setup was used for the GRU network, and an increase in accuracy was observed. Table 7 and Table 8 report the training accuracy, training loss, validation accuracy, and validation loss per epoch for the GRU and Bi-LSTM networks on the COVID-19 dataset. Table 9 and Table 10 show the performance measurements of the GRU and Bi-LSTM models for the COVID-19 and vaccination datasets.
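The role of the final three-unit softmax layer can be illustrated in isolation. The sketch below is a framework-free illustration and is not the Keras code used in the study. It converts three raw output scores into class probabilities and maps the most probable class to the label encoding given earlier (0 = negative, 1 = positive, 2 = neutral). The function names and the example scores are assumptions.

```python
import math

LABELS = {0: "negative", 1: "positive", 2: "neutral"}

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)                      # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_label(scores):
    """Return the most probable sentiment label and the class probabilities."""
    probs = softmax(scores)
    k = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[k], probs

label, probs = predict_label([0.2, 2.0, -1.0])  # hypothetical raw scores
print(label, [round(p, 3) for p in probs])
```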

The model accuracy and loss during training and validation with the GRU and Bi-LSTM networks are shown in Figure 10 and Figure 11 for the vaccination dataset. With the GRU network, the training loss decreased as the number of epochs increased, but after the 5th epoch the loss increased and the accuracy decreased. With the Bi-LSTM model, training accuracy and loss increased after the third epoch.
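The per-epoch values reported for the COVID-19 dataset in Table 7 and Table 8 can be inspected in the same way. The sketch below assumes one wanted to select the epoch with the lowest validation loss. This selection rule is an assumption made for illustration and is not a procedure described in this study. The function name `best_epoch` is hypothetical.

```python
# Validation loss per epoch, copied from Table 7 (GRU) and Table 8 (Bi-LSTM).
gru_val_loss = [1.3290, 0.3238, 0.7250, 0.6595, 0.8642]
bilstm_val_loss = [0.2256, 0.5215, 0.6220, 0.5600, 0.8423]

def best_epoch(val_losses):
    """1-based index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__) + 1

print(best_epoch(gru_val_loss), best_epoch(bilstm_val_loss))
```

In both tables the training loss keeps falling through the fifth epoch, while the lowest validation loss is reached in an earlier epoch.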

The models were evaluated on the test data and achieved final accuracies of 92.59% and 91.24% with GRU and Bi-LSTM, respectively, for the COVID-19 dataset. With the vaccination dataset, 92.70% accuracy was achieved with Bi-LSTM and 92.46% with GRU. Figure 12 depicts the confusion matrix of predicted versus actual sentiments for the GRU network across three categories: positive, negative, and neutral. Figure 13 depicts the confusion matrix for the Bi-LSTM network.

Table 11 and Table 12 show the comparative analysis of the proposed methodology (lexicon-based and deep learning-based) with other similar works.

The results in Table 11 and Table 12 indicate the superior performance of the proposed method over the existing techniques. The proposed deep learning models are more accurate in analysing the polarity of the tweets as positive, negative, or neutral. Furthermore, the proposed study considers both the COVID-19 tweets and the vaccination tweets for analysis.

6. Conclusions

This paper has presented lexicon-based and deep learning-based approaches for sentiment analysis of tweets. The proposed work aimed to understand people’s feelings about the COVID-19 pandemic and vaccines based on the messages posted on Twitter in English, and to discover their concerns about this topic. The sentiment analysis was performed with (i) lexicon-based techniques using the tools VADER and NRCLex, and (ii) deep learning methods such as Bi-LSTM and GRU.

The tweets were classified into positive, negative, and neutral categories. In addition, sentiment scores and the different emotional effects of the vaccination tweets were calculated. Based on the results, it can be concluded that most of the tweets in the vaccination dataset had positive polarity. With the Bi-LSTM approach, the classification accuracy achieved was 92.7%, and with GRU, the accuracy was 91.24% for COVID-19 tweets.

For vaccination tweets, the accuracy obtained was 92.48% with the Bi-LSTM model and 93.03% with the GRU model. It can be concluded that these sentiment analysis techniques can be a powerful tool to extract, identify, and analyse people’s perceptions of disease and vaccination during the pandemic period. The results of this analysis can help health sectors and government organisations to understand the public’s concerns about the disease and take the necessary actions. Further, the work can be extended to analyse the sentiments of people from different countries and their sentiments related to the new booster vaccine. In future, this study aims to include neurosymbolic AI techniques for better interpretation of sentiment analysis results.

Author Contributions

Conceptualisation, B.S.A. and R.N.P.; methodology, B.S.A. and R.N.P.; software, B.S.A., R.N.P., P.R. and B.A.; validation, P.R., A.S. and B.A.; investigation, M.S.K. and B.A.; resources, B.S.A. and R.N.P.; data curation, B.S.A., A.S. and R.N.P.; writing—original draft preparation, B.S.A., B.A. and R.N.P.; supervision: N.B.; project administration, B.S.A. and B.A.; formal analysis, N.B.; funding acquisition, N.B.; visualisation, A.S. and M.S.K.; writing—review and editing, N.B.; Figure and table, B.S.A. and R.N.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Social Media Landscape, Demographics and Digital Ad Spend in India. Available online: https://sannams4.com/digital-and-social-media-landscape-in-india/ (accessed on 28 June 2021).
2. Kamyab, M.; Tao, R.; Mohammadi, M.H.; Rasool, A. Sentiment analysis on Twitter: A text mining approach to the Afghanistan status reviews. ACM Int. Conf. Proc. Ser. 2018, 9, 14–19.
3. Pise, R.; Ainapure, B. Designing User Interfaces with a Data Science Approach; IGI Global: Hershey, PA, USA, 2022; Volume i, p. 325. ISBN1 9781799891239. ISBN2 1799891232.
4. Farhadloo, M.; Rolland, E. Fundamentals of sentiment analysis and its applications. In Studies in Computational Intelligence; Springer: Berlin/Heidelberg, Germany, 2016; Volume 639, pp. 1–24.
5. Cambria, E.; Poria, S.; Gelbukh, A.; Nacional, I.P.; Thelwall, M. Sentiment Analysis Is a Big Suitcase. IEEE Intell. Syst. 2017, 32, 102–107.
6. Liang, B.; Su, H.; Gui, L.; Cambria, E.; Xu, R. Aspect-based sentiment analysis via affective knowledge enhanced graph convolutional networks. Knowl.-Based Syst. 2022, 235, 107643.
7. Pang, B.; Lee, L. Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2008, 2.
8. Cambria, E. Affective Computing and Sentiment Analysis. IEEE Intell. Syst. 2016, 31, 102–107.
9. Ahmed, M.S.; Aurpa, T.T.; Anwar, M.M. Detecting sentiment dynamics and clusters of Twitter users for trending topics in COVID-19 pandemic. PLoS ONE 2021, 16, e0253300.
10. Chakraborty, A.K.; Das, S.; Kolya, A.K. Sentiment Analysis of Covid-19 Tweets Using Evolutionary Classification-Based LSTM Model. Adv. Intell. Syst. Comput. 2021, 1355, 75–86.
11. Storey, V.C.; O’leary, D.E. Text Analysis of Evolving Emotions and Sentiments in COVID-19 Twitter Communication. Cognit. Comput. 2022, 1, 3.
12. Abd-Alrazaq, A.; Alhuwail, D.; Househ, M.; Hai, M.; Shah, Z. Top Concerns of Tweeters during the COVID-19 Pandemic: Infoveillance Study. J. Med. Internet Res. 2020, 22, e19016.
13. Chandrasekaran, R.; Mehta, V.; Valkunde, T.; Moustakas, E. Topics, Trends, and Sentiments of Tweets about the COVID-19 Pandemic: Temporal Infoveillance Study. J. Med. Internet Res. 2020, 22, e22624.
14. Shofiya, C.; Abidi, S. Sentiment analysis on covid-19-related social distancing in Canada using twitter data. Int. J. Environ. Res. Public Health 2021, 18, 5993.
15. Aygun, I.; Kaya, B.; Kaya, M. Aspect Based Twitter Sentiment Analysis on Vaccination and Vaccine Types in COVID-19 Pandemic with Deep Learning. IEEE J. Biomed. Health Inform. 2021, 26, 2360–2369.
16. Tsai, M.H.; Wang, Y. Analyzing twitter data to evaluate people’s attitudes towards public health policies and events in the era of COVID-19. Int. J. Environ. Res. Public Health 2021, 18, 6272.
17. Crocamo, C.; Viviani, M.; Famiglini, L.; Bartoli, F.; Pasi, G.; Carrà, G. Surveilling COVID-19 Emotional Contagion on Twitter by Sentiment Analysis. Eur. Psychiatry 2021, 64, E17.
18. Le, B.; Nguyen, H. Twitter Sentiment Analysis Using Machine Learning Techniques. In Advances in Intelligent Systems and Computing; Le Thi, H., Nguyen, N., Do, T., Eds.; Advanced Computational Methods for Knowledge Engineering; Springer: Berlin/Heidelberg, Germany, 2015; Volume 358.
19. Birjali, M.; Kasri, M.; Beni-hssane, A. A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowl.-Based Syst. 2021, 226, 107134.
20. Zhang, Q.; Yi, G.Y.; Chen, L.-P.; He, W. Text mining and sentiment analysis of COVID-19 tweets. arXiv 2021, arXiv:2106.15354.
21. Abdulaziz, M.; Alotaibi, A.; Alsolamy, M.; Alabbas, A. Topic based Sentiment Analysis for COVID-19 Tweets. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 626–636.
22. Lyu, J.C.; Han, E.L.; Luli, G.K. COVID-19 Vaccine–Related Discussion on Twitter: Topic Modeling and Sentiment Analysis. J. Med. Internet Res. 2021, 23, e24435.
23. Hota, H.S.; Sharma, D.K.; Verma, N. Lexicon-based sentiment analysis using Twitter data: A case of COVID-19 outbreak in India and abroad. Data Sci. COVID-19 2021, 68, 1–12.
24. Jabalameli, S.; Xu, Y.; Shetty, S. Spatial and sentiment analysis of public opinion toward COVID-19 pandemic using twitter data: At the early stage of vaccination. Int. J. Disaster Risk Reduct. 2022, 80, 1–17.
25. Pang, B.; Lee, L.; Vaithyanathan, S. Thumbs up? arXiv 2002, arXiv:0205070.
26. Patil, R.; Tamane, S. The Importance of Data Cleaning: Three Visualization Examples. Int. J. Electr. Comput. Eng. 2018, 8, 3966–3975.
27. Alsaeedi, A.; Khan, M.Z. A Study on Sentiment Analysis Techniques of Twitter Data. IJACSA Int. J. Adv. Comput. Sci. Appl. 2019, 10.
28. Shah, S.; Kumar, K.; Sarvananguru, R.K. Sentimental Analysis of Twitter Data using Classifier Algorithms. Int. J. Electr. Comput. Eng. 2016, 6, 357.
29. Bian, J.; Topaloglu, U.; Yu, F. Towards Large-scale Twitter Mining for Drug-related Adverse Events. In Proceedings of the SHB’12: Proceedings of the 2012 International Workshop on Smart Health and Wellbeing, Maui, HI, USA, 29 October 2012; Volume 2019, p. 25.
30. Ainapure, B.S.; Pise, R.; Wagh, A.A.; Tejnani, J. Prognosis of COVID-19 Patients with Machine Learning Techniques. Ann. Rom. Soc. Cell Biol. 2021, 25, 20183–20200.
31. Pise, R.; Patil, K.; Pise, N. Automatic Classification of Mosquito Genera Using Transfer Learning. J. Theor. Appl. Inf. Technol. 2022, 100, 1929–1940.
32. Ankit; Saleena, N. An Ensemble Classification System for Twitter Sentiment Analysis. Procedia Comput. Sci. 2018, 132, 937–946.
33. Dragoni, M.; Poria, S.; Cambria, E. OntoSenticNet: A Commonsense Ontology for Sentiment Analysis. IEEE Intell. Syst. 2018, 33, 77–85.
34. Majumder, N.; Poria, S.; Peng, H.; Chhaya, N.; Cambria, E.; Gelbukh, A. Sentiment and Sarcasm Classification with Multitask Learning. IEEE Intell. Syst. 2019, 34, 38–43.
35. Chinnasamy, P.; Suresh, V.; Ramprathap, K.; Jebamani, B.J.A.; Rao, K.S.; Kranthi, M.S. COVID-19 vaccine sentiment analysis using public opinions on Twitter. Mater. Today Proc. 2020, 64, 448–451.
36. Li, W.; Shao, W.; Ji, S.; Cambria, E.; Member, S. BiERU: Bidirectional Emotional Recurrent Unit for Conversational Sentiment Analysis. Neurocomputing 2022, 476, 73–82.
37. Chintalapudi, N.; Battineni, G.; Amenta, F. Sentimental analysis of COVID-19 tweets using deep learning models. Infect. Dis. Rep. 2021, 13, 329–339.
38. Yeasmin, N.; Mahbub, N.I.; Baowaly, M.K.; Singh, B.C.; Alom, Z.; Aung, Z.; Azim, M.A. Analysis and Prediction of User Sentiment on COVID-19 Pandemic Using Tweets. Big Data Cogn. Comput. 2022, 6, 65.
39. Zulfiker, M.S.; Kabir, N.; Biswas, A.A.; Zulfiker, S.; Uddin, M.S. Analyzing the public sentiment on COVID-19 vaccination in social media: Bangladesh context. Array 2022, 15, 100204.
40. Sunitha, D.; Patra, R.K.; Babu, N.V.; Suresh, A.; Gupta, S.C. Twitter sentiment analysis using ensemble based deep learning model towards COVID-19 in India and European countries. Pattern Recognit. Lett. 2022, 158, 164–170.
41. Bokaee Nezhad, Z.; Deihimi, M.A. Twitter sentiment analysis from Iran about COVID 19 vaccine. Diabetes Metab. Syndr. Clin. Res. Rev. 2022, 16, 102367.
42. Shim, J.G.; Ryu, K.H.; Lee, S.H.; Cho, E.A.; Lee, Y.J.; Ahn, J.H. Text mining approaches to analyze public sentiment changes regarding covid-19 vaccines on social media in korea. Int. J. Environ. Res. Public Health 2021, 18, 6549.
43. Díaz-Rodríguez, N.; Lamas, A.; Sanchez, J.; Franchi, G.; Donadello, I.; Tabik, S.; Filliat, D.; Cruz, P.; Montes, R.; Herrera, F. EXplainable Neural-Symbolic Learning (X-NeSyL) methodology to fuse deep learning representations with expert knowledge graphs: The MonuMAI cultural heritage use case. Inf. Fusion 2022, 79, 58–83.
44. He, K.; Mao, R.; Gong, T.; Li, C.; Cambria, E. Meta-based Self-training and Re-weighting for Aspect-based Sentiment Analysis. IEEE Trans. Affect. Comput. 2022.
45. Cambria, E.; Liu, Q.; Decherchi, S.; Xing, F.; Kwok, K. SenticNet 7: A Commonsense-based Neurosymbolic AI Framework for Explainable Sentiment Analysis. In Proceedings of the Language Resources and Evaluation Conference, Marseille, France, 20–25 June 2022; pp. 3829–3839.
46. Gandhi, A.; Adhvaryu, K.; Poria, S.; Cambria, E.; Hussain, A. Multimodal sentiment analysis: A systematic review of history, datasets, multimodal fusion methods, applications, challenges and future directions. Inf. Fusion 2022, 91, 424–444.
47. Twitter Global mDAU 2022 Statista. Available online: https://0-www-statista-com.brum.beds.ac.uk/statistics/970920/monetizable-daily-active-twitter-users-worldwide/ (accessed on 19 August 2022).
48. Tweets|Kaggle. Available online: https://www.kaggle.com/elgendy5576/tweets/comments (accessed on 29 June 2021).
49. Covid Vaccine Tweets|Kaggle. Available online: https://www.kaggle.com/kaushiksuresh147/covidvaccine-tweets (accessed on 29 June 2021).
50. Taboada, M.; Brooke, J.; Tofiloski, M.; Voll, K.; Stede, M. Lexicon-based methods for sentiment analysis. Comput. Linguist. 2011, 37, 267–307.
51. Chiappe, L.M. Thumbs up or thumbs down?: Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the ACL ’02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, Philadelphia, PA, USA, 6–12 July 2002; pp. 417–424.
52. Jurek, A.; Mulvenna, M.D.; Bi, Y. Improved lexicon-based sentiment analysis for social media analytics. Secur. Inform. 2015, 4, 9.
53. Hutto, C.J.; Gilbert, E. VADER: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the International AAAI Conference on Web and Social Media, Ann Arbor, MI, USA, 1–4 June 2014; pp. 216–225.
54. GitHub—Cjhutto/vaderSentiment: VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) Is a Lexicon and Rule-Based Sentiment Analysis Tool That Is Specifically Attuned to Sentiments Expressed in Social Media, and Works Well on Texts from Other Domains. Available online: https://github.com/cjhutto/vaderSentiment (accessed on 29 June 2021).
55. GitHub—Metalcorebear/NRCLex: An Affect Generator Based on TextBlob and the NRC Affect Lexicon. Note That Lexicon License Is for Research Purposes Only. Available online: https://github.com/metalcorebear/NRCLex (accessed on 29 June 2021).
56. Kaggle: Your Machine Learning and Data Science Community. Available online: https://www.kaggle.com/ (accessed on 20 August 2022).
57. Mitchell, T.M. Machine Learning; McGraw-Hill Education (India) Private Limited: Noida, India, 1997.
58. Muller, A.C. Introduction to Machine Learning with Python; O’Reilly Media: Sebastopol, CA, USA, 2017.
59. Patel, J.; Goyal, R. Applications of Artificial Neural Networks in Medical Science. Curr. Clin. Pharmacol. 2008, 2, 217–226.
60. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
61. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
62. Chandra, R.; Krishna, A. COVID-19 sentiment analysis via deep learning during the rise of novel cases. PLoS ONE 2021, 16, e0255615.
63. Ridhwan, K.M.; Hargreaves, C.A. Leveraging Twitter data to understand public sentiment for the COVID-19 outbreak in Singapore. Int. J. Inf. Manag. Data Insights 2021, 1, 100021.
64. Abduljabbar, R.L.; Dia, H.; Tsai, P.W. Unidirectional and bidirectional LSTM models for short-term traffic prediction. J. Adv. Transp. 2021, 2021, 5589075.
65. About Keras. Available online: https://keras.io/about/ (accessed on 22 August 2022).
66. Welcome to Colaboratory-Colaboratory. Available online: https://colab.research.google.com/?utm_source=scs-index (accessed on 22 August 2022).
67. Abiola, O.; Abayomi-Alli, A.; Tale, O.A.; Misra, S.; Abayomi-Alli, O. Sentiment analysis of COVID-19 tweets from selected hashtags in Nigeria using VADER and Text Blob analyser. J. Electr. Syst. Inf. Technol. 2023, 10, 1–20.
68. Khakharia, A.; Shah, V.; Gupta, P. Sentiment Analysis of COVID-19 Vaccine Tweets Using Machine Learning. SSRN Electron. J. 2021.
69. Anitha, S.; Metilda, M. Apache Hadoop based effective sentiment analysis on demonetization and covid-19 tweets. Glob. Transit. Proc. 2022, 3, 338–342.
70. Chakraborty, K.; Bhatia, S.; Bhattacharyya, S.; Platos, J.; Bag, R.; Hassanien, A.E. Sentiment Analysis of COVID-19 tweets by Deep Learning Classifiers—A study to show how popularity is affecting accuracy in social media. Appl. Soft Comput. J. 2020, 97, 106754.
71. Srikanth, J.; Damodaram, A.; Teekaraman, Y.; Kuppusamy, R.; Thelkar, A.R. Sentiment Analysis on COVID-19 Twitter Data Streams Using Deep Belief Neural Networks. Comput. Intell. Neurosci. 2022, 2022, 8898100.
72. Golchin, B.; Riahi, N. Emotion Detection in Twitter Messages Using Combination of Long Short-Term Memory and Convolutional Deep Neural Networks. Int. J. Comput. Inf. Eng. 2021, 15, 578–585.
73. Lwin, M.O.; Sheldenkar, A.; Lu, J.; Schulz, P.J.; Shin, W.; Panchapakesan, C.; Gupta, R.K.; Yang, Y. The Evolution of Public Sentiments during the COVID-19 Pandemic: Case Comparisons of India, Singapore, South Korea, the United Kingdom, and the United States. JMIR Infodemiology 2022, 2, e31473.
74. Nimmi, K.; Janet, B.; Selvan, A.K.; Sivakumaran, N. Pre-trained ensemble model for identification of emotion during COVID-19 based on emergency response support system dataset. Appl. Soft Comput. 2022, 122, 108842.
75. Al-Hashedi, A.; Al-Fuhaidi, B.; Mohsen, A.M.; Ali, Y.; Gamal Al-Kaf, H.A.; Al-Sorori, W.; Maqtary, N. Ensemble classifiers for Arabic sentiment analysis of social network (Twitter data) towards COVID-19-related conspiracy theories. Appl. Comput. Intell. Soft Comput. 2022, 2022, 6614730.
76. Das, S.; Kolya, A.K. Predicting the pandemic: Sentiment evaluation and predictive analysis from large-scale tweets on COVID-19 by deep convolutional neural network. Evol. Intell. 2022, 15, 1913–1934.
77. Shahi, T.B.; Sitaula, C.; Paudel, N. A Hybrid Feature Extraction Method for Nepali COVID-19-Related Tweets Classification. Comput. Intell. Neurosci. 2022, 2022, 5681574.
78. Singh, C.; Imam, T.; Wibowo, S.; Grandhi, S. A Deep Learning Approach for Sentiment Analysis of COVID-19 Reviews. Appl. Sci. 2022, 12, 3709.
79. Kumar, V. Spatiotemporal sentiment variation analysis of geotagged COVID-19 tweets from India using a hybrid deep learning model. Sci. Rep. 2022, 12, 1849.
80. Chandrasekaran, G.; Hemanth, J. Deep learning and TextBlob based sentiment analysis for coronavirus (COVID-19) using twitter data. Int. J. Artif. Intell. Tools 2022, 31, 2250011.

Figure 1. Methodology of the proposed work.

Figure 2. Lexicon-based sentiment analysis.

Figure 3. Process of lexicon-based sentiment analysis.

Figure 4. Positive, Negative, and Neutral tweets count.

Figure 5. Tweet count by city.

Figure 6. Emotion effects of vaccine tweet.

Figure 7. Recurrent Neural Network.

Figure 8. Generalised structure of LSTM.

Figure 9. Deep neural network approach.

Figure 10. Training accuracy and model loss with GRU network.

Figure 11. Training accuracy and model loss with Bi-LSTM network.

Figure 12. Confusion matrix for predicted v/s actual with GRU model (a) measured with COVID-19 dataset (b) measured with vaccination dataset.

Figure 13. Confusion matrix for predicted v/s actual with Bi-LSTM model (a) measured with COVID-19 dataset (b) measured with vaccination dataset.

Table 1. Sample tweets and sentiment polarity levels.

Sample Tweet	Sentiment Polarity
It is fun and interesting to learn # sentiment analysis : )	Positive
# COVID-19 vaccine introduced in India will be as effective as any vaccine developed by other nations!!!	Positive
@abc123 The movie was a major disaster with several meek plot points and disappointed the fans :(	Negative
Pune is the worst affected city by the #COVID-19 pandemic in Maharashtra.	Negative
The Maharashtra government announced a five-level #unlock plan for the state.	Unbiased/Neutral
The larger Mumbai division reported 2051 COVID-19 cases.	Unbiased/Neutral

Table 2. Overview of the COVID-19 dataset.

User Name	Screen Name	Location	Tweet At	Original Tweet	Sentiment
16	44968	Bangaluru	4/3/2020	#AirSewa	Extremely Positive
24	44976	Chandigarh	6/3/2020	Sellers are	Extremely Positive
1838	46790	Bidar, India	13/3/2020	Don’t Panic, Take care	Extremely Positive
1935	46887	Bangaluru, India	13/3/2020	For More Details---	Neutral
2280	47232	Bangaluru, India	14/3/2020	#nifty50 the	Positive
2736	47688	Chandigarh, Indian	14/3/2020	We should stock up on food in case cities we live in shutdown supermarkets because of this damn #Coronavirus? ehhh	Negative
3425	48377	Bangaluru, India	16/3/2020	Babu Don’t Think	Extremely Positive
3430	48382	Bangaluru, India	16/3/2020	Some online Shop	Positive

Table 3. Overview of the vaccination dataset.

User Name	Screen Name	Location	Tweet At	Original Tweet	Sentiment
31	35142	Pune	4/3/2020	grateful opportunity vaccinated help vaccinate today	Positive
104	35155	Mumbai, Indian	6/3/2020	COVID vaccine update hours experience arm soreness gone lethargy gone yet still intensely fever	Negative
1281	36790	Bangaluru, India	13/3/2020	yo grandmother going invited vaccine COVID vaccine	Positive
1824	38177	Bangaluru, India	13/3/2020	weeks ago, joked put charge COVID vaccine clinics would get	Neutral
1234	38277	Pune, India	14/3/2020	coronavirus vaccine guide everything need know far coronavirus pandemic	Positive
3573	39934	Mumbai	16/3/2020	flu vaccine needs COVID vaccine well flu vaccine protects fr.	Positive

Table 4. Words and their Valence Score.

Word	Polarity	Valence/Intensity
Okay	positive	0.9
Good	positive	1.9
Great	positive	3.1
Horrible	negative	−2.5
Frowning emoticon :(	negative	−2.2
Sucks	negative	−1.5

Table 5. Compound sentiment scores.

Sentences	Neg	Neu	Pos	Compound Score
The COVID Vaccines are working successfully	0	0.61	0.39	0.4939
Thank God, The COVID Vaccines are working successfully.	0	0.391	0.609	0.7783
THANK God, The COVID Vaccines are working SUCCESSFULLY.	0	0.35	0.65	0.8506
Thank God !!! The COVID Vaccines are working successfully !!!	0	0.438	0.562	0.8388
COVID-19 is dangerous to humanity	0.383	0.617	0	−0.4767
COVID-19 is very dangerous to humanity	0.361	0.639	0	−0.5256
COVID-19 is VERY DANGEROUS to humanity	0.447	0.553	0	−0.7058
COVID-19 is VERY DANGEROUS to humanity!!!	0.45	0.55	0	−0.774
OH MY GOD!!! COVID-19 is VERY DANGEROUS to humanity !!! :(	0.43	0.434	0.136	−0.796
I was vaccinated but got COVID !!!	0.622	0.162	0.221	−0.812

Table 6. Emotion categories.

Emotions	Count
Disgust	21,413
Surprise	34,028
Anger	38,179
Sadness	45,909
Joy	48,057
Fear	57,867
Anticipation	71,074
Negative	84,773
Trust	94,754
Positive	197,164

Table 7. GRU Model accuracy and loss during training with COVID-19 dataset.

Epoch No.	Training Accuracy	Training Loss	Validation Accuracy	Validation Loss
1st	0.6221	0.7797	0.5753	1.3290
2nd	0.8240	0.4539	0.9244	0.3238
3rd	0.8513	0.3834	0.8585	0.7250
4th	0.8691	0.3338	0.9078	0.6595
5th	0.8894	0.2860	0.8809	0.8642

Table 8. Bi-LSTM Model accuracy and loss during training with COVID-19 dataset.

Epoch No.	Training Accuracy	Training Loss	Validation Accuracy	Validation Loss
1st	0.6843	0.6926	0.9400	0.2256
2nd	0.8317	0.4317	0.8807	0.5215
3rd	0.8596	0.3619	0.9002	0.6220
4th	0.8788	0.3109	0.9038	0.5600
5th	0.8997	0.2620	0.8804	0.8423

Table 9. Performance metrics of COVID-19 dataset for GRU and Bi-LSTM.

Deep Learning Method	Class	Specificity	Sensitivity	Recall	F1-Score
GRU	0 (Negative)	0.91	0.91	0.93	0.925
	1 (Positive)	0.92	0.92	0.94	0.924
	2 (Neutral)	0.91	0.91	0.92	0.925
Bi-LSTM	0 (Negative)	0.90	0.90	0.91	0.915
	1 (Positive)	0.91	0.91	0.91	0.912
	2 (Neutral)	0.91	0.91	0.91	0.912

Table 10. Performance metrics of vaccination dataset for GRU and Bi-LSTM.

Deep Learning Method	Class	Specificity	Sensitivity	Recall	F1-Score
GRU	0 (Negative)	0.92	0.92	0.92	0.926
	1 (Positive)	0.92	0.92	0.91	0.916
	2 (Neutral)	0.91	0.91	0.92	0.921
Bi-LSTM	0 (Negative)	0.92	0.92	0.91	0.925
	1 (Positive)	0.93	0.93	0.92	0.924
	2 (Neutral)	0.92	0.92	0.91	0.912

Table 11. Comparative analysis of lexicon-based approach.

Reference	Dataset Description	Topics Labels	Approach for Sentiment Analysis
[20]	30,655 tweets from Canada and 69,742 tweets from United States about COVID-19	Neutral, positive and negative	NRCLex and VADER
[21]	600,000 COVID-19 tweets	Neutral, positive and negative	VADER
[22]	803,278 Vaccination tweets	Negative and positive	NRCLex
[23]	119,495 tweets about COVID-19	Negative, positive and neutral	VADER
[24]	67,983 Vaccination tweets	Negative, positive, neutral and compound	VADER and NRCLex
Proposed	80,000 tweets about COVID-19 and 218,791 tweets about vaccination	Neutral, positive and negative	VADER and NRCLex

Table 12. Comparative analysis of deep learning approach.

Reference	Topics Labels	Approach for Sentiment Analysis	Result
[37]	Joy, sad, fear and anger	Fine-tuned BERT	89%
		Linear regression	75%
		SVM	74.75%
		LSTM	65%
[42]	Positive, Negative and Neutral	LSTM	90.59%
[42]	Positive, Negative and Neutral	Bi-LSTM	90.83%
[67]	Positive, Negative and Neutral	VADER	NA
[68]	Positive and negative	Multinomial Naive Bayes and Logistic regression	88%
		SVM	96.26%
		Linear regression	97.3%
[28]	Positive, Negative and Neutral	Naïve Bayes Classifier Algorithm	80%
[29]	Positive and Negative	SVM	74%
[32]	Positive and Negative	Naïve Bayes, Random Forest, SVM, and Linear regression	74.67%
[69]	Positive, Negative and Neutral	SVM	70.66%
		Naïve Bayes	66.97%
		RNN	69.34%
[70]	Positive, Negative and Neutral	Fuzzy rule-based SVM	79%
[71]	Positive and negative	Deep Belief Neural Networks (N-gram model)	90.3%
[72]	Positive and negative	Bi-LSTM	93%
[73]	Positive, Negative and Neutral	CrystalFeel	Sentiment comparison across five countries
[74]	Fear, anger, sad, sadness etc.	Ensemble technique using pre-trained networks BERT, DistilBERT, RoBERT	86.46%
[75]	Positive and negative	Ensemble Classifiers	84.6%
[76]	Positive, Negative and Neutral	CNN	90.67%
[77]	Positive, Negative and Neutral	SVM + radial basis function (RBF)	72.1%
[78]	Positive, Negative and Neutral	LSTM + RNN + Attention Mechanism	84.6%
[79]	Positive and negative	Hybrid deep learning using Bi-LSTM and CNN	89.68%
[80]	Positive and negative	Bi-LSTM	87%
Proposed Work	Positive, Negative, and Neutral	Bi-LSTM	92.70% (COVID-19 tweets) and 92.48% (vaccination tweets)
Proposed Work	Positive, Negative, and Neutral	GRU	91.24% (COVID-19 tweets) and 93.03% (vaccination tweets)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

