Article

Statistics-Based Outlier Detection and Correction Method for Amazon Customer Reviews

1
Department of Electrical and Computer Engineering, New Jersey Institute of Technology, Newark, NJ 07102, USA
2
Department of Electrical and Computer Engineering, Faculty of Engineering, and Center of Research Excellence in Renewable Energy and Power Systems, King Abdulaziz University, Jeddah 21481, Saudi Arabia
*
Author to whom correspondence should be addressed.
Submission received: 23 October 2021 / Revised: 25 November 2021 / Accepted: 30 November 2021 / Published: 7 December 2021
(This article belongs to the Section Signal and Data Analysis)

Abstract

People nowadays use the internet to share their assessments, impressions, ideas, and observations about various subjects or products on numerous social networking sites. These sites serve as a great source of data for data analytics, sentiment analysis, natural language processing, etc. Conventionally, the true sentiment of a customer review matches its corresponding star rating. There are exceptions when the star rating of a review is opposite to its true nature, and this work labels such reviews as the outliers in a dataset. The state-of-the-art methods for anomaly detection involve manual searching, predefined rules, or traditional machine learning techniques to detect such instances. This paper conducts a sentiment analysis and outlier detection case study for Amazon customer reviews, and it proposes a statistics-based outlier detection and correction method (SODCM), which helps identify such reviews and rectify their star ratings to enhance the performance of a sentiment analysis algorithm without any data loss. This paper focuses on applying SODCM to datasets containing customer reviews of various products, which are (a) scraped from Amazon.com and (b) publicly available. The paper also studies the datasets and evaluates the effect of SODCM on the performance of a sentiment analysis algorithm. The results exhibit that SODCM achieves higher accuracy and recall than other state-of-the-art anomaly detection algorithms.

1. Introduction

Sentiment analysis, emotion artificial intelligence, and intent analysis are often used to describe the same concept, i.e., opinion mining. Sentiment analysis uses a combination of natural language processing (NLP), computational linguistics, and text mining to analyze, derive, calibrate, and evaluate textual information in the form of sentences, phrases, documents, etc. [1]. NLP has earned a lot of attention recently.
People have started to rely on consumer reviews and sentiments shared over social media sites, blogs, and consumer feedback websites on the internet before purchasing or opting for a particular product or service. It has also become a vital tool for decision-makers who plan to improve, modify, or perform necessary actions based on public opinions.
Sentiment analysis is used extensively in various domains such as marketing, politics, sports, and stocks for information extraction, improvement of automated chatbot response systems, or product modification. Most companies use sentiment analysis to research consumer requirements and understand market trends. Positive reviews of a product or service drive online marketing, while negative comments motivate companies to improve their products or services based on customer demands. Social media has become a robust platform that helps understand public opinions, acceptance, or issues regarding specific laws or lawmakers. Sentiment analysis helps one study the endorsement rate of these policies based on previous trends, which allows lawmakers to prepare and motivate the public accordingly. Similarly, this method aids in fan engagement and player/team reputation build-up in sports. It also helps one study a company’s prominence in the market, which impacts its stock valuation. These are just a few of the many applications of sentiment analysis.
With the expansion in data available through the internet, researchers have started focusing on both the academic and commercial applications of sentiment analysis. The boost in smartphone usage has increased the development of mobile games and apps. Oyebode et al. [2] used sentiment analysis to analyze mental health apps on smartphones and classify their features as positive or negative. This analysis led to design modifications that addressed the negative factors of the apps and helped increase their effectiveness. Afzaal et al. [3] used aspect-based sentiment analysis to implement a smartphone tourism app that identifies the most recommended restaurants and hotels in a city by extracting and classifying information from tourist reviews. In the fashion industry, online reviews play a vital role as they help designers understand a shopper’s experience via the latter’s feedback. Li and Xu [4] proposed an aspect-based fashion recommendation model with an attention mechanism. They used convolutional neural networks, long short-term memory networks, and attention mechanisms to process customer and product reviews simultaneously. They then combined them to apprehend both local and global aspects of the reviews, which helps predict the customer rating.
Outlier detection is a salient data analysis concern that focuses on identifying oddities in datasets. Outlier (a.k.a. anomaly, noise, and exception) detection helps recognize an entity that prominently differs from most of the samples in a dataset [5]. Such entities may represent bank frauds, spam emails, structural defects, and errors in a dataset. Anomaly detection faces many challenges due to (a) the characteristics of the input data or the nature of outliers, (b) noise in a dataset that might mimic an outlier, (c) inaccurate boundaries between standard data and outliers, and (d) computational complexity. In [6], Wang et al. explained the importance of designing an efficient and scalable outlier detection algorithm because the expected number of outliers grows with the volume of a dataset. It is also critical to promptly identify and rectify the outliers in a dataset so that we can obtain high-quality data.
The definition of an outlier may vary across scenarios. For example, in this 5-star Amazon review for a hand sanitizer, “Do not buy. Doesn’t sanitize for covid19. Does not contain alcohol. Fake description as sanitizer.”, the star rating is positive while the sentiment of the review text is negative. Much like this example, this paper defines an outlier as a review whose sentiment is opposite to its corresponding star rating. Anomaly detection is an eminently researched topic in various domains [7], but there is an inadequate study on outlier detection using sentiment analysis of a dataset. Anomaly detection is classified predominantly into supervised and unsupervised learning: the former applies when the dataset is labeled, while the latter applies when it is not. The techniques used to identify anomalies are based broadly on classification, clustering, distance, machine learning, and statistical approaches. This paper proposes an outlier detection method using a combination of statistical and distance-based techniques. The concerned dataset is scraped from the Amazon website and consists of several Amazon products from various departments.
The rest of the paper is structured as follows: Section 2 reviews relevant sentiment analysis and outlier detection work. Section 3 discusses and analyzes the datasets used, and Section 4 presents the proposed statistics-based outlier detection and correction method (SODCM). Section 5 summarizes the experimental results. Section 6 presents the conclusion and future work.

2. Related Work

Social media has become a powerful platform for people to share their opinions and concerns on topics ranging from socio-economic to political to technological advancements. Iglesias et al., in [8], discussed advancements in various approaches in the field of sentiment analysis, their contributions, and their applications in various domains. The work in [9] compiles the studies related to various limitations of sentiment analysis on social media datasets. It discusses problems ranging from ones as trivial as spelling and grammatical mistakes to situations as critical as rumor-mongering, community shaming, riots, and protests arising from posts or comments on the internet. It also highlights the increasing impact of research conducted on sentiment analysis applied to social media datasets. The study in [10] analyzed previous literature based on modern social media applications. It also featured their impacts in healthcare, disaster management, and business.
In [11], Nazir et al. explained that a sentence that holds an opinion consists of quintuple parameters (e, a, s, h, t), where e is the target or entity, a is the aspect or feature of e, s is the nature of the opinion on e or a, h is the opinion holder, and t is the time when h expresses the sentiment. For instance, in this 5-star Amazon review for a hand sanitizer, “With having to use hand sanitizers so much due to the COVID situation, this is the best one I have found. Love the residual effects and the fact that is doesn’t dry out my skin. Would recommend over other brands.”, e is the hand sanitizer, a is the residual effect, the nature of the opinion s is positive, the opinion holder h is the Amazon reviewer, and the time t is during the COVID-19 pandemic. Sentiment analysis focuses explicitly on s, the nature of the opinion.
Sentiments or emotions tenaciously drive a consumer’s decisions and views regarding a product or service. The research in [12] focused on social media’s impact on people from a spatial and temporal vantage point. Using Alteryx, it filtered the tweets based on residential users from the 2016 United States Geo-tweets dataset. The results show a higher impact of tweets, especially those with positive sentiments, based on several features such as location, content, and time. Cosmetic brands apply sentiment analysis to obtain a clear and comprehensive insight into consumers’ thoughts on product quality and desires. In [13], Park implemented Term Frequency–Inverse Document Frequency to analyze the polarity of customer opinions and brand satisfaction for 26 different cosmetic companies. The research also focused on the factors affecting the nature of consumers’ views.
Understanding a consumer’s buying choices is a challenging assignment for a machine learning algorithm. Hu et al., in [14], introduced a credibility, interest, and sentiment enhanced recommendation model, which consists of five segments: review feature extraction, interest mining on the aesthetics of the comment, candidate feature sentiment assignment based on the nature of their fastText sentiment, reviewer credibility evaluation that weighs the credibility of a reviewer to avoid fake reviewers, and a recommendation module that utilizes the credibility-weighted sentiment score of the feature selected by the buyer. The reviews also depend on a reviewer’s experience, which might differ from one customer to another. Li et al. focused on this problem in [15] by recommending an algorithm inspired by Dempster–Shafer evidence theory. They used hotel customer reviews of four different properties as a case study and extracted information from various travel websites to identify the practicability and capability of the algorithm. Their approach can help managers develop strategies based on customer reviews to outperform their competitors.
Aspect-based sentiment analysis (ABSA) identifies the feature/aspect of an entity/target in an opinion/review and then performs sentiment analysis on each element analyzed. In this 3-star Amazon review on gloves, “Good value for the money, however, they do not hold up very well. They rip easily”, the two aspects the consumer discusses are (a) affordability, whose sentiment is positive as the gloves are cheap, and (b) durability, which carries a negative polarity. In [16], feature-focused sentiment analysis was applied to the customer comments and review votes of various mobile products collected from Amazon. The result indicated that the method helps the manufacturers in product development and helps the buyers make a personalized decision based on multiple features of the product. Ali et al. [17] studied customer reviews and feedback for ridesharing services in India and Pakistan to improve several organizations through Kansei engineering. Since the languages commonly used are Urdu/Hindi and English, the work converted all the reviews into English and performed ABSA. They also extracted the most frequently used aspects to further improve the services provided based on customer demands. ABSA also helps classify reviews or comments based on various product or service features related to the opinion. ABSA has several challenges; for example, attention-based models may sometimes (a) let a given aspect incorrectly target grammatically irrelevant words, (b) fail to diagnose special sentence structures such as double negatives, and (c) weigh only one vector to depict context and target. In [18], Zhang et al. proposed a knowledge-guided capsule network to address the above limitations using a Bi-LSTM and a capsule attention network. The study in [19] summarizes the state-of-the-art ABSA methods using lexicon-based, machine learning, and deep learning approaches.
In this digital age, since information is so readily available, buyers tend to read customer reviews and comments before purchasing a product, which affects their purchasing decisions. Researchers usually focus on the review body, but a review contains more information than that, which is generally not exploited, such as the review time, the number of helpful votes, the reviewer id, and the review rating. In [20], Benlahbib and Nfaoui visualized the reputation of a product differently by considering all these parameters and projecting the reputation value, opinion category, top positive review, and top negative review. They incorporated the time of review and the number of helpful votes for each review, and used a Bidirectional Encoder Representations from Transformers (BERT) model to predict the probability of the nature of a review's sentiment. They also proposed equations that calculate the reputation value for a product. Extensive research is being conducted not only on sentiment analysis in English but also in several other languages such as Arabic [21], Persian [22], Urdu [23], Hindi [24], Russian [25], Chinese [26], and Indonesian [27].
Several studies have been conducted on sentiment analysis [28] and its application to e-commerce. With the increase in online consumption, e-commerce enhancement has become a hot research topic. Many scholars introduced methods based on deep neural networks [29], probabilistic classifiers [30], linear classifiers [31], lexicon-based approaches [32], or decision trees [33] to increase accuracy and efficiency. In [34], Wang et al. proposed an iterative sentiment analysis model called SentiDiff, which predicts polarities in Twitter messages by considering the interconnections between the textual information of Twitter messages and sentiment diffusion patterns. Shofiya and Abidi [35] used a support vector machine to identify the keywords and extract the sentiment polarity of Twitter data specific to Canada on social distancing due to COVID-19. Zhang et al. [36] introduced a convolutional multi-head self-attention memory network to glean valuable and intricate semantic information from sequences and aspects of a sentence. This algorithm uses a convolutional network to capture n-gram grammatical knowledge and multi-head self-attention to acknowledge the linguistic information of the sequence through the memory network. Abdalgader et al. [37] applied a lexicon-based word polarity identification method by studying the semantic relatedness between the set of the target word and synonyms of words surrounding the target on several benchmark datasets. Their results outperformed several existing methods that use term-level pairwise relatedness between words around the target over a fixed-size window. The performance of various sentiment analysis methods differs due to such factors as datasets, feature representations, or classification processes. Liu et al. [19] conducted a detailed survey on several deep learning approaches for aspect-based sentiment analysis, covering benchmark datasets, evaluation metrics, and the performance of the existing deep learning methods.
Outliers are extreme values that diverge from the rest of the data samples [38,39]. They may occur due to an imbalanced dataset, experimental error, or novelty. The research in [39] defines an outlier in its experiment as any tweet in a Twitter dataset that is not relevant to the topic in consideration. Once the outliers are detected and eliminated, the algorithm’s accuracy improves significantly. Similarly, in [40], it was observed that if outliers are identified and erased by a density-based clustering algorithm before a convolutional neural network is applied to the documents to be classified, the efficiency increases and the computational cost decreases. Kim et al. [41] applied a combination of four outlier detection methods, namely (a) Gaussian density estimation, (b) Parzen window density estimation, (c) principal component analysis, and (d) K-means clustering, to identify malicious activities in an institution using a user log database. Outlier identification methods can be broadly categorized into statistical-based [42], distance-based [43], graph-based [44], clustering-based [45], density-based [46], and ensemble-based [47] ones. Once the outliers are detected, it is crucial to decide whether to delete, keep, or modify them. This usually depends on an outlier’s effect on the dataset if it is deleted or tampered with. The treatment of an outlier can vary across applications and datasets; for instance, if in a population estimation survey the number of people taller than 7 ft is very low, these data can be verified and kept because they are natural outliers. In contrast, if in a dataset of various brands of shoes the prices of one or two are extraordinarily high, those outliers can be deleted before calculating the average cost of a pair of shoes.

3. Datasets

With the advancements in the internet and cloud computing [48], data collection has become more accessible. Public datasets are found in abundance for research purposes. Amazon is one of the many colossal data sources that encourage scholars to scrape publicly available data from its website for research purposes. Based on a survey from Feedvisor, an article in Forbes concluded that 89% of buyers choose Amazon over other e-commerce websites to make online purchases [49]. Two types of datasets are used in this paper: (a) collected datasets and (b) publicly available datasets. The collected datasets used in this paper [50] consist of product reviews that we collected ourselves from Amazon.com, spanning the years 2008 to 2020 and seven different domains, namely, book (Becoming by Michelle Obama), pharmaceutical (Turmeric Curcumin Supplement by Natures Nutrition), electronics (Echo Dot 3rd Gen by Amazon), grocery (Sparkling Ice Blue Variety Pack), healthcare (EnerPlex 3-Ply Reusable Face Mask), entertainment (Harry Potter: The Complete 8-Film Collection), and personal care (Nautica Voyage By Nautica).
Each review carries multiple pieces of information, such as the reviewer name, date and place of the comment, star rating, verified purchase status, the number of buyers who found the review helpful, and the images added by the reviewer. The dataset scraped from Amazon consists of 35,000 customer reviews, including the product name, comment date, star rating, and the number of helpful votes. Figure 1 shows the number of reviews against each star rating accumulated for all seven collected datasets. It can be observed that the extremely positive star rating (5-star) dominates the dataset, and there are very few negative (1- and 2-star) and moderately positive (3- and 4-star) star ratings. The skewed nature of the dataset results in a J-shaped distribution. Multiple reasons exist behind such a bias towards extremely positive reviews. People usually agree with and write positive ratings and comments quickly but are generally skeptical about negative ratings or comments. When a consumer notices an extremely positive review, it usually influences the consumer’s opinion, sometimes resulting in a switched star rating. It has also been observed that a higher rating easily influences a consumer to increase the valuation, while the reverse is not true [51]. Table 1 represents the consumer review distribution across the different star ratings in each collected dataset individually. The results show the same bias of customer reviews towards a 5-star rating as compared to the rest.
Figure 2 represents a graphical distribution of the average number of helpful votes per review. It can be inferred that customers find the extremely negative reviews the most helpful for making buying decisions or understanding a product. Extremely negative reviews are usually critical of the product, its features, packaging, delivery, usefulness, cost, and authenticity. It becomes easier for a consumer to decide about buying a product if they understand the various aspects of the product and the extremely negative experiences of former buyers. Table 2 compiles the average helpful votes per customer review in each dataset. It can be observed that most customers find extremely negative reviews the most informative and beneficial.
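As a minimal sketch of how these per-star summaries can be produced, the following Python snippet computes the review counts behind Figure 1 and Table 1 and the average helpful votes behind Figure 2 and Table 2; the file path and the column names star_rating and helpful_votes are assumptions and may differ for each collected dataset.

import pandas as pd

# Hypothetical file and column names; adjust to the actual dataset layout.
reviews = pd.read_csv("collected_reviews.csv")

# Number of reviews per star rating (reveals the J-shaped distribution).
rating_counts = reviews["star_rating"].value_counts().sort_index()

# Average helpful votes per review, grouped by star rating.
avg_helpful = reviews.groupby("star_rating")["helpful_votes"].mean()

print(rating_counts)
print(avg_helpful.round(2))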

4. Statistics-Based Outlier Detection and Correction Method (SODCM)

4.1. Interquartile Range

Traditionally, a dataset can be represented by the five-number summary, which includes the lowest value, the first quartile, the median, the third quartile, and the highest value, where the first and third quartiles are the middle numbers between the lowest value and the median and between the median and the highest value, respectively [52]. These values exhibit more information about a dataset than just its rows and columns. Figure 3 is an example of the box plot distribution of a dataset.
Q_1 and Q_3 are the midpoints of the first and second halves of an ordered dataset, respectively, and Q_2 is the median of the dataset. For example, in the ordered dataset A = {1, 1, 2, 3, 5, 6, 7}, Q_2 is 3, which is the median or the fourth number of the dataset; Q_1 is 1, the center value of the first half; and Q_3 is 6, the midpoint of the second half of the dataset.
The difference between Q_3 and Q_1 is the interquartile range (IQR), which reflects the spread of the dataset about the median:
IQR = Q_3 − Q_1
The lower and upper fences can be represented as:
F_L = Q_1 − 1.5 · IQR
F_U = Q_3 + 1.5 · IQR
Data points that lie beyond the bounds of F_L and F_U are outliers. The factor 1.5 preserves the sensitivity of the method: a larger scale than 1.5 would treat true outliers as ordinary data points, while a smaller one would flag ordinary data points as outliers.
In a dataset, there are two types of outliers: suspected (potential) outliers and definite outliers. A potential outlier (O_P) is a data point suspected to be an outlier; it satisfies:
F_L < O_P < Q_1   or   Q_3 < O_P < F_U
A definite outlier (O_D) is a data point that is an absolute outlier; it satisfies:
O_D < F_L   or   F_U < O_D
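The short Python sketch below reproduces the worked example above (A = {1, 1, 2, 3, 5, 6, 7}) with the median-of-halves quartile convention used in this section; note that library defaults such as numpy.percentile apply linear interpolation and would give Q_1 = 1.5 and Q_3 = 5.5 for this dataset instead.

import numpy as np

def quartiles_median_of_halves(values):
    """Return (Q1, Q2, Q3), with Q1/Q3 as medians of the lower/upper halves."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    q2 = np.median(x)
    lower, upper = x[: n // 2], x[(n + 1) // 2:]   # exclude the median itself when n is odd
    return np.median(lower), q2, np.median(upper)

A = [1, 1, 2, 3, 5, 6, 7]
q1, q2, q3 = quartiles_median_of_halves(A)
iqr = q3 - q1
f_l, f_u = q1 - 1.5 * iqr, q3 + 1.5 * iqr                       # lower and upper fences

x = np.asarray(A, dtype=float)
definite = x[(x < f_l) | (x > f_u)]                             # definite outliers O_D
potential = x[((x > f_l) & (x < q1)) | ((x > q3) & (x < f_u))]  # potential outliers O_P

print(q1, q2, q3, iqr, f_l, f_u)   # 1.0 3.0 6.0 5.0 -6.5 13.5
print(definite, potential)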

4.2. Definitions for SODCM

R consists of all the customer reviews in a dataset such that R = {r_1, r_2, r_3, ..., r_N}, where r_i denotes the i-th review and r_i* is the star rating of r_i. In order to understand our proposed statistics-based outlier detection and correction method (SODCM), the following definitions are presented.
Definition 1.
r_i is positive if r_i* ≥ 4, where r_i ∈ R. Any review with a star rating of four or more is considered a positive star-rated review; the set of all such reviews is denoted by S+.
Definition 2.
r_i is negative if r_i* < 4, where r_i ∈ R. Any review with less than a four-star rating is considered a negative star-rated review; the set of all such reviews is denoted by S−.
Definition 3.
TV(r_i) = 1 if r_i ∈ S+ and TV(r_i) = −1 if r_i ∈ S−. The target value of review r_i, denoted by TV, is 1 if it is a positive star-rated review and −1 otherwise.
Definition 4.
VD(r_i) = d(TV(r_i), CV(r_i)), where CV(r_i) is the compound sentiment score of r_i predicted by a sentiment analysis algorithm. The value difference of review r_i, denoted by VD(r_i), is the Euclidean distance between TV(r_i) and CV(r_i) of the corresponding review. Since the range of both TV and CV is [−1, 1], the range of VD is [0, 2].
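As an illustration of Definitions 1–4, the sketch below computes TV and VD for a single review, using TextBlob's polarity score (which, like CV, lies in [−1, 1]) as a stand-in for the compound sentiment score; TextBlob is also the sentiment analyzer used in Section 5.

from textblob import TextBlob

def target_value(star_rating: int) -> int:
    """TV = 1 for positive star-rated reviews (4 stars or more), otherwise -1."""
    return 1 if star_rating >= 4 else -1

def value_difference(review_text: str, star_rating: int) -> float:
    """VD = Euclidean distance between TV and CV; for scalars this is |TV - CV|, in [0, 2]."""
    tv = target_value(star_rating)
    cv = TextBlob(review_text).sentiment.polarity   # CV in [-1, 1]
    return abs(tv - cv)

# A 4-star review with clearly negative wording yields a large VD, i.e., an outlier candidate.
print(round(value_difference("Terrible product, it broke after one day.", 4), 3))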

4.3. Proposed Algorithm

The star rating assigned to a customer’s review is generally considered the ideal sentiment of the comment. There are instances when a customer might have assigned a positive star rating, but the nature of the feedback is negative. This 4-star Amazon customer review on a thermometer, “Purchased the thermometer to have a method to check temperatures by non-contact. The thermometer’s box and content was not sealed which bothered me because of COVID.”, carries a negative sentiment but has a positive rating, which is contradictory. Such ratings can be corrected to match the nature of the reviews to improve the efficiency of a sentiment analysis algorithm.
SODCM consists of two major parts, namely the (a) detection of these outliers and (b) correction of these identified anomalies. It has the following six steps:
Input: 
The input for SODCM is any dataset containing customer reviews (r_i) and their corresponding star ratings (r_i*);
Step 1: 
TV is calculated using r_i*. If r_i belongs to S+, then TV = 1, and if r_i belongs to S−, then TV = −1. Since this work focuses on the binary classification of the sentiments of customer reviews, the values assigned to TV are 1 or −1;
Step 2: 
VD is calculated between TV and CV. The value of VD is always non-negative. Since both TV and CV lie in [−1, 1], the range of VD is between 0 and 2. Figure 4 is an example of the box plot distribution of S+. Since the minimum value VD can hold is 0, Figure 4a depicts the box plot of S+ when F_L is negative and Figure 4b depicts the box plot of S+ when F_L is positive. Figure 5 is an example of the box plot distribution of S−. Since the maximum value VD can hold is 2, Figure 5a depicts the box plot of S− when F_U > 2 and Figure 5b depicts the box plot of S− when F_U ≤ 2;
Step 3: 
After analyzing the dataset, it can be construed that S+ contains some reviews whose sentiment does not match the nature of their star rating; these are considered outliers. On the other hand, S− has very few reviews whose opinions match the essence of their respective star ratings; hence, the reviews that are correctly assigned to their corresponding star ratings are considered the outliers. This implies that most negative comments are incorrectly rated; therefore, the outliers in this case are the correctly rated comments, and the incorrectly labeled reviews are all the reviews in S− excluding the outliers. Hence, the dataset is split into S+ and S−;
Step 4: 
In S+, if F_L is negative, then O_s is calculated as Q_3 + IQR; otherwise, it is F_U − IQR · F_L. Since the range of VD is [0, 2], the least value it can hold is 0. In S−, if F_U > 2, then O_s is calculated as Q_1 − IQR; otherwise, it is Q_3 − IQR · F_U. We compute O_s as follows:
For S+:   O_s = Q_3 + IQR if F_L < 0;   O_s = F_U − IQR · F_L if F_L ≥ 0
For S−:   O_s = Q_1 − IQR if F_U > 2;   O_s = Q_3 − IQR · F_U if F_U ≤ 2
Step 5: 
For S+, a review r_i is an outlier if VD(r_i) ≥ O_s; that is, customer comments in S+ whose VD(r_i) ≥ O_s are outliers. For S−, a review r_i is an outlier if VD(r_i) ≤ O_s; that is, customer comments in S− whose VD(r_i) ≤ O_s are outliers. These five steps complete the outlier detection process;
Step 6: 
TV of reviews labeled as outliers in S+ is reversed, meaning a comment with TV = 1 is re-labeled as −1 and vice versa. On the contrary, for S−, TV of reviews that are not labeled as outliers is reversed. This step is vital as it performs outlier correction by changing the nature of r_i*;
Output: 
The output of the proposed algorithm is the dataset consisting of reviews with the corrected nature of their star ratings, which means that a positive-natured review is labeled as 1 and a negative-natured review as −1. SODCM helps detect the outliers and correct them without eliminating or modifying any review.
The above steps are realized in SODCM. After its execution, we can perform a more accurate sentiment analysis of the revised dataset and obtain the performance metrics of SODCM.
Theorem 1.
The time complexity of SODCM is O(n).
Proof. 
Each of Steps 1 to 6 requires O(n) time, except Step 4, which needs O(1). Hence, the entire algorithm (Algorithm 1) has complexity O(n). □
Algorithm 1 Statistics-based outlier detection and correction method (SODCM)
Input: D // dataset containing r_i and r_i*
Output: D* // modified dataset after outlier detection and correction
Step 1:
  if r_i* ≥ 4 then
    TV = 1;
  else
    TV = −1;
  end if
Step 2:
  INITIALIZE VD to array [0];
  for each r_i do
    VD[i] = d_E(TV, CV);
  end for
Step 3:
  INITIALIZE S+ to array [0];
  INITIALIZE S− to array [0];
  for each r_i* do
    if r_i* ≥ 4 then
      S+[i] = [r_i, r_i*, VD[i]];
    else
      S−[i] = [r_i, r_i*, VD[i]];
    end if
  end for
Step 4:
  Function IQRcalculation(S, VD)
    Sort(VD);
    Let Q_1 = first quartile(VD);
    Let Q_3 = third quartile(VD);
    IQR = Q_3 − Q_1;
    F_L = Q_1 − 1.5 · IQR;
    F_U = Q_3 + 1.5 · IQR;
    if S == S+ then
      if F_L < 0 then
        O_s = Q_3 + IQR;
      else
        O_s = F_U − IQR · F_L;
      end if
    else
      if F_U > 2 then
        O_s = Q_1 − IQR;
      else
        O_s = Q_3 − IQR · F_U;
      end if
    end if
    return O_s;
  end Function
  O_S+ = IQRcalculation(S+, VD);
  O_S− = IQRcalculation(S−, VD);
Step 5:
  INITIALIZE O+ to array [0];
  INITIALIZE O− to array [0];
  for each r_i in S+ do
    if VD(r_i) ≥ O_S+ then
      O+[i] = 'yes';
    else
      O+[i] = 'no';
    end if
  end for
  for each r_i in S− do
    if VD(r_i) ≤ O_S− then
      O−[i] = 'yes';
    else
      O−[i] = 'no';
    end if
  end for
Step 6:
  for each r_i in S+ do
    if O+[i] = 'yes' then
      TV[i] = toggle(TV[i]);
    end if
  end for
  for each r_i in S− do
    if O−[i] = 'no' then
      TV[i] = toggle(TV[i]);
    end if
  end for
  D* = concat(S+, S−);
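To complement the pseudocode, a compact and runnable Python sketch of Algorithm 1 is given below. It is a sketch rather than the authors' implementation: it assumes TextBlob polarity as the compound score CV, numpy's default quartile interpolation (which may differ slightly from the median-of-halves convention of Section 4.1), and the threshold expressions O_s = F_U − IQR · F_L and O_s = Q_3 − IQR · F_U for the non-boundary branches of Step 4.

import numpy as np
from textblob import TextBlob

def sodcm(reviews):
    """reviews: list of (review_text, star_rating); returns corrected TV labels."""
    # Step 1: target values from star ratings.
    tv = np.array([1 if stars >= 4 else -1 for _, stars in reviews])
    # Step 2: value difference between TV and the sentiment score CV (TextBlob polarity).
    cv = np.array([TextBlob(text).sentiment.polarity for text, _ in reviews])
    vd = np.abs(tv - cv)

    corrected = tv.copy()
    pos = tv == 1                                    # Step 3: split into S+ and S-.
    for positive_set in (True, False):
        mask = pos if positive_set else ~pos
        if mask.sum() < 2:
            continue                                 # too few reviews for quartiles
        q1, q3 = np.percentile(vd[mask], [25, 75])   # Step 4: per-subset threshold O_s.
        iqr = q3 - q1
        f_l, f_u = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        if positive_set:
            o_s = q3 + iqr if f_l < 0 else f_u - iqr * f_l   # assumed operator reading
            outliers = mask & (vd >= o_s)            # Step 5: mismatched positive reviews.
            to_flip = outliers                       # Step 6: flip outliers in S+.
        else:
            o_s = q1 - iqr if f_u > 2 else q3 - iqr * f_u    # assumed operator reading
            outliers = mask & (vd <= o_s)            # Step 5: correctly rated negatives.
            to_flip = mask & ~outliers               # Step 6: flip non-outliers in S-.
        corrected[to_flip] = -corrected[to_flip]
    return corrected.tolist()

# Toy usage (real runs use thousands of reviews per dataset).
sample = [
    ("Love it, works perfectly.", 5),
    ("Do not buy, it broke immediately and does not work.", 5),
    ("Cheap and useless, complete waste of money.", 1),
]
print(sodcm(sample))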

5. Experimental Results

The proposed SODCM identifies and rectifies outliers in all the datasets consisting of Amazon customer reviews of products from various domains. All three algorithms (SODCM and the two comparison methods introduced below) are executed on both (a) the collected Amazon review datasets and (b) an Amazon review dataset publicly available in the amazon-reviews-pds S3 bucket in the AWS US East Region [53]. There are several datasets consisting of product reviews from various domains, and we chose Amazon product review datasets for seven domains, namely apparel, beauty, fashion, furniture, jewelry, luggage, and toys. Each of these datasets consists of 100,000 customer reviews. The algorithm used for sentiment analysis is TextBlob [54], a Python library for NLP. The experiment is performed in two stages. Initially, the algorithm is applied to each star rating of a dataset separately to study the results. SODCM then evaluates the complete dataset at the later stage of the research.
Table A1, Table A2, Table A3, Table A4 and Table A5 in Appendix A represent the results from reviews evaluated based on each star rating individually. For Table A1 and Table A2, the initial value of O_s is F_U, and O_s is then decremented in steps of 0.1 until it reaches 0.8. For Table A3, Table A4 and Table A5, the initial value of O_s is F_L, and O_s is then incremented in steps of 0.1 until it reaches 1.2. The results are saved in a CSV file and evaluated manually to check the number of outliers detected correctly and incorrectly. In all the tables, O_D represents the total number of outliers detected, O_I is the number of reviews incorrectly labeled as outliers, and O_C equals the number of reviews correctly labeled as outliers. O_I and O_C are validated manually for cross-verification. SODCM is implemented for all the datasets and ratings separately.
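The sketch below shows how such an O_s sweep could be organized for one positive star-rated subset, starting at F_U and decreasing in steps of 0.1; the manually validated labels (true_nature) and the use of label agreement as the accuracy metric are illustrative assumptions rather than the paper's exact evaluation script.

import numpy as np

def sweep_positive(vd, true_nature, stop=0.8, step=0.1):
    """vd: VD values of one positive star-rating subset (e.g., all 5-star reviews);
    true_nature: manually validated labels, +1 (truly positive) or -1 (truly negative)."""
    vd = np.asarray(vd, dtype=float)
    true_nature = np.asarray(true_nature)
    q1, q3 = np.percentile(vd, [25, 75])
    f_u = q3 + 1.5 * (q3 - q1)                      # starting threshold, as in Tables A1-A2
    thresholds = [round(f_u, 2)] + [round(t, 1) for t in
                                    np.arange(np.floor(f_u * 10) / 10, stop - 1e-9, -step)]
    rows = []
    for o_s in thresholds:
        flagged = vd >= o_s                                       # reviews declared outliers
        o_d = int(flagged.sum())                                  # O_D: total outliers detected
        o_c = int((flagged & (true_nature == -1)).sum())          # O_C: correctly flagged
        predicted = np.where(flagged, -1, 1)                      # labels after flipping outliers
        accuracy = float((predicted == true_nature).mean())
        rows.append((o_s, o_d, o_d - o_c, o_c, round(accuracy, 3)))  # (O_s, O_D, O_I, O_C, acc)
    return rows

# Hypothetical usage with arrays of equal length:
# print(sweep_positive(vd_5star, labels_5star))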
The performance of SODCM is compared with two state-of-the-art outlier detection methods published this year: (a) a class-based approach [55] and (b) a deep-learning-based approach [56]. Table 3 and Table 4 present the performance comparison of SODCM with the methods in [55,56] on the collected datasets and on the publicly available datasets, respectively. The bold numbers in all tables indicate the best results among the three methods. Table 5 compiles the statistical metrics for SODCM, namely the p-value, T-score, and CI, where CI represents the 95% confidence interval in the form of [x, y].
From Table A1, Table A2, Table A3, Table A4 and Table A5, it can be concluded that SODCM detects an optimal number of outliers in all the datasets and shows a favorable ratio between the correctly and incorrectly detected outliers, thus resulting in a high degree of accuracy. The accuracy decreases considerably once the value of O_s reaches one. Moreover, the increase or decrease in O_s for positive or negative star-rated reviews, respectively, results in a rise in incorrectly labeled outliers. It can also be concluded from Table 3 and Table 4 that the accuracy and recall percentage of SODCM for all the datasets surpass those of [55,56]. Hence, SODCM outperforms the methods in [55,56] in outlier detection and correction.
Table 5 reflects that the p-value is less than 0.001, which is robust evidence against the null hypothesis. An extremely low p-value signifies that the results are not accidental and that the improvement is due to SODCM. The T-score for all the datasets is high, indicating strong evidence against the null hypothesis. This means that there is a considerable difference between the star ratings collected from the website and the star ratings improved by SODCM based on the nature of the reviews. CI in Table 5 represents a 95% chance that the actual error of the model is between x ± y. Hence, the smaller the CI, the more precise the estimate of the model.
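As an illustration of how the Table 5 metrics could be obtained, the sketch below runs a paired t-test between the original and SODCM-corrected target values with scipy; the exact quantities paired in the paper's test are not spelled out, so this pairing is an assumption made for illustration.

import numpy as np
from scipy import stats

def significance_metrics(original_tv, corrected_tv, confidence=0.95):
    """Paired t-test plus a confidence interval on the mean label difference."""
    original_tv = np.asarray(original_tv, dtype=float)
    corrected_tv = np.asarray(corrected_tv, dtype=float)
    t_score, p_value = stats.ttest_rel(corrected_tv, original_tv)

    diff = corrected_tv - original_tv
    mean, sem = diff.mean(), stats.sem(diff)
    half_width = sem * stats.t.ppf((1 + confidence) / 2, len(diff) - 1)
    ci = (mean - half_width, mean + half_width)     # reported as [x, y] in Table 5
    return t_score, p_value, ci

# Dummy example: labels of eight positive reviews, two of which SODCM flipped.
orig = [1, 1, 1, 1, 1, 1, -1, -1]
corr = [1, -1, 1, -1, 1, 1, -1, -1]
print(significance_metrics(orig, corr))

Since pingouin 0.4.0 is listed among the dependencies in Appendix A, pingouin.ttest, which reports a 95% confidence interval column directly, may be closer to what the authors actually used.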

6. Conclusions and Future Work

SODCM is a novel approach for identifying anomalies in a customer review dataset and rectifying them by correcting their corresponding star ratings. The results exhibit that the performance of the proposed algorithm surpasses other state-of-the-art approaches, and the statistical tests provide evidence for rejecting the null hypothesis. The advantage of SODCM over most other methods is that this data analysis pipeline preserves the outliers in order to correct them and thus prevents any information loss. From this dataset study, it can also be inferred that the outlier definition differs for positive and negative reviews: among positive star-rated reviews, the minority are those whose review text contradicts the star rating, while the reverse is true for negative star-rated reviews. Moreover, Amazon customer review datasets are generally highly imbalanced irrespective of the product or its department, and they follow a J-shaped distribution. By studying the count of helpful votes in the datasets, it is noticed that extremely negative reviews are the most critical ones, which help in the decision-making of the majority of customers.
Since it can be concluded that SODCM performs well on datasets consisting of Amazon customer reviews, future work should focus on applying the proposed method to product reviews from other marketplace datasets such as eBay, Etsy, Best Buy, Target, Walmart, etc., to obtain a better insight into the discrepancies between star ratings and the related reviews. This will help confirm whether SODCM can detect and rectify anomalies without deleting any data, thus preserving the overall dataset knowledge. The algorithm can be implemented in several real-life scenarios such as assessing product performance [57,58,59,60,61,62], conducting market research, and flagging reviews through rating and review irregularity detection and rectifying them without any data loss [63,64]. In this paper, the sentiment analysis algorithm used is TextBlob, a Python-based NLP package. It would be interesting to study the behavior and impact of SODCM when combined with other state-of-the-art sentiment analysis algorithms such as BERT, XLNet, ELECTRA, OpenAI’s GPT-3, RoBERTa, or StructBERT.

Author Contributions

Conceptualization, I.C. and M.Z.; Investigation, I.C., K.S. and A.A. (Ahmed Alabdulwahab); Writing—Original Draft Preparation, I.C.; Writing—Review and Editing, M.Z. and A.A. (Abdullah Abusorrah); Funding, K.S. and A.A. (Ahmed Alabdulwahab); Research result validation, A.A. (Abdullah Abusorrah), K.S. and A.A. (Ahmed Alabdulwahab). All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Institutional Fund Projects under grant no (IFPNC-001-135-2020). Therefore, the authors gratefully acknowledge technical and financial support from the Ministry of Education and King Abdulaziz University, Jeddah, Saudi Arabia.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This research uses two types of datasets: (a) a collected dataset, which we scraped from the Amazon website, and (b) a publicly available dataset. The collected data presented in the study are openly available in a publicly accessible repository, Harvard Dataverse, at doi/10.7910/DVN/W96OFO. The publicly available Amazon customer review data are available as TSV files in the amazon-reviews-pds S3 bucket in the AWS US East Region.

Acknowledgments

We would like to thank Yue Liu, School of Artificial Intelligence and Automation, Beijing University of Technology, Beijing, China, for her help in drawing some figures and revising the text. We also appreciate the anonymous reviewers for their constructive comments that helped improve this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

This appendix presents all the experimental tables supporting the results of this work. All the experiments were conducted in Python 3.7 on a Jupyter notebook. The models were tested locally on an Apple M1 chip with 8 GB of RAM and a 512 GB SSD. Several Python libraries were used, including NLTK 3.5, pandas 1.2.0, matplotlib 3.3.3, TextBlob 0.15.3, scikit-learn 0.19.0, NumPy 1.19.5, scipy 1.7.1, and pingouin 0.4.0. For Table A1 and Table A2, the value of O_s ranges between F_U and 0.8 with a gradual decrement in steps of 0.1. For Table A3, Table A4 and Table A5, the value of O_s ranges between F_L and 1.2 with a gradual increment in steps of 0.1. The results, saved in a CSV file, were manually evaluated twice by two different analysts to determine the number of outliers detected correctly and incorrectly. In all the tables, O_D represents the total number of outliers detected, O_I is the number of reviews incorrectly labeled as outliers, and O_C equals the number of reviews correctly labeled as outliers. O_I and O_C are validated manually for cross-verification.
From Table A1, Table A2, Table A3, Table A4 and Table A5, it can be observed that an optimal number of outliers is successfully detected in all the datasets by the proposed SODCM. This leads to a high degree of accuracy, since the numbers of correctly and incorrectly detected outliers reach a good balance. As the value of O_s approaches 1, the sentiment analysis accuracy of the modified dataset decreases considerably, and the increase and decrease in O_s for positive and negative star-rated reviews, respectively, result in a rise in incorrectly labeled outliers.
Table A1. SODCM applied to 5-star review comments.
Dataset    O_s    O_D    O_I    O_C    Accuracy
Book1.181559460.973
1.1417417570.978
1.19226660.982
1.0297872100.967
0.94851093760.922
0.810051298260.795
Electronics1.184351340.929
1.105755700.946
1.1868780.951
1.02311542310.991
0.95732982750.853
0.811784357430.61
Entertainment1.747151140.886
1.7172150.887
1.6264220.89
1.5409 310.894
1.478439340.895
1.46513520.902
1.39415790.911
1.2147211260.926
1.1257462110.96
1.07053513540.903
0.99374544830.832
0.812245906340.745
Grocery1.6251240.924
1.5322300.926
1.4454410.93
1.355544500.933
1.37110610.937
1.210217850.947
1.1162351270.964
1.06042453590.905
0.97743014730.855
0.810613547070.771
Health Care1.365263230.938
1.345283250.939
1.3337260.941
1.27612640.954
1.1119171020.969
1.04001832170.937
0.96722913810.847
0.810754086670.713
Personal Care1.687171160.934
1.6211200.935
1.5453420.942
1.425503470.945
1.4609510.947
1.3789690.953
1.210016840.96
1.1161431180.979
1.06712334380.861
0.98012955060.82
0.810393756640.745
Pharmaceutical1.75131120.896
1.7141130.897
1.6284140.901
1.5387310.903
1.44815330.906
1.37728490.914
1.213057730.929
1.12421141280.961
1.0120314610570.769
0.9145917712820.697
0.8174420716160.595
Table A2. SODCM applied to 4-star review comments.
Dataset    O_s    O_D    O_I    O_C    Accuracy
Book1.1745140.981
1.1386150.986
1.16150.986
1.014680.977
0.9268180.922
0.84912370.817
Electronics1.1949270.928
1.114212190.939
1.1234190.941
1.09427670.991
0.92601221380.835
0.85222053170.588
Entertainment1.5871010.872
1.54050.881
1.45050.884
1.3655050.884
1.3130130.908
1.2190190.926
1.1301290.958
1.07318550.914
0.99533620.849
0.812752750.754
Grocery1.5684040.918
1.57070.922
1.4121110.929
1.326141130.932
1.3151140.934
1.2226160.944
1.13910290.969
1.011562530.919
0.914869790.871
0.8198821160.797
Health Care1.3129180.927
1.269131120.931
1.2212190.94
1.1449350.965
1.013354790.936
0.9214931210.847
0.83611252360.685
Personal Care1.697070.943
1.67070.943
1.5132110.95
1.429142120.952
1.4152130.953
1.3183150.956
1.2254210.964
1.1416350.983
1.0184123610.849
0.9213147660.815
0.8264187770.755
Pharmaceutical1.751010.894
1.72020.896
1.64130.898
1.55140.9
1.48350.903
1.313490.91
1.2269170.927
1.14417270.951
1.02431391040.79
0.92901531370.729
0.83831612220.609
Table A3. SODCM applied to 3-star review comments.
Dataset    O_s    O_D    O_I    O_C    Accuracy
Book0.8342110.951
0.93120.967
1.06150.983
1.0066150.983
1.19450.935
1.2161060.822
Electronics0.8178170.933
0.9246180.953
0.993598510.997
1.06511540.994
1.1179791000.851
1.23762101660.604
Entertainment0.2794040.87
0.34040.87
0.45050.872
0.58080.876
0.5309090.878
0.6141130.886
0.7161150.889
0.8293260.909
0.9467390.936
1.014665810.907
1.1210901200.808
1.22791431360.7
Grocery0.5721100.902
0.61100.902
0.72110.91
0.7714130.925
0.85230.932
0.910460.97
1.0211290.947
1.12918110.888
1.24026140.805
Health Care0.6154130.889
0.75140.893
0.89180.908
0.8129180.908
0.9187110.942
1.05124270.931
1.16832360.866
1.211366470.695
Personal Care0.2412020.902
0.32020.902
0.42020.902
0.53030.907
0.533030.907
0.67160.929
0.79270.94
0.8122100.956
0.9165110.978
1.04018220.891
1.14622240.858
1.25933260.788
Pharmaceutical0.4821010.893
0.51010.893
0.61010.893
0.71010.893
0.7011010.893
0.82110.904
0.98350.989
1.03725120.723
1.14431130.648
1.26246160.457
Table A4. SODCM applied to 2-star review comments.
Dataset    O_s    O_D    O_I    O_C    Accuracy
Book0.7931010.978
0.92021
0.97062021
1.04130.956
1.17340.891
1.2171070.673
Electronics0.8275140.933
0.911380.955
1.0215160.992
1.001256190.993
1.16122390.859
1.212465590.627
Entertainment0.3461010.82
0.41010.82
0.51010.82
0.5662020.825
0.62020.825
0.72020.825
0.88170.855
0.9233200.93
1.05013370.935
1.16818500.845
1.29531640.71
Grocery0.3421010.894
0.42110.903
0.52110.903
0.5852110.903
0.63210.913
0.74220.923
0.87250.951
0.98260.961
1.0227150.903
1.12912170.836
1.23817210.75
Health Care0.5991010.887
0.72020.892
0.7986060.913
0.87160.918
0.913490.948
1.04120210.908
1.15829290.821
1.29155360.653
Personal Care0.5121010.947
0.61010.947
0.71010.947
0.7341010.947
0.81010.947
0.94040.973
1.0219120.877
1.12410140.85
1.23216160.78
Pharmaceutical0.250000.862
0.30000.862
0.40000.862
0.50000.862
0.60000.862
0.70000.862
0.80000.862
0.96510.980
1.0181530.784
1.1221930.705
1.2232030.588
Table A5. SODCM applied to 1-star review comments.
Dataset    O_s    O_D    O_I    O_C    Accuracy
Book0.70610280.931
0.8244200.956
0.9325270.97
0.914335280.972
1.07225470.959
1.111055550.892
1.2190123670.752
Electronics0.8092110.929
0.9131120.958
0.987255200.989
1.0316250.994
1.18930590.844
1.2202106960.55
Entertainment0.3092020.838
0.43120.84
0.55140.846
0.5456150.851
0.610190.853
0.7161150.865
0.8252230.882
0.9487410.925
1.012747800.926
1.1185641210.818
1.22621081540.674
Grocery0.3572020.933
0.42020.933
0.56330.939
0.67340.94
0.6068350.942
0.79540.943
0.813670.949
0.93211210.977
1.012177440.89
1.1168110580.821
1.2229153760.731
Health Care0.6085230.933
0.78350.938
0.795144100.948
0.8144100.948
0.9288200.97
1.09246460.925
1.114777700.836
1.2244148960.679
Personal Care0.3125050.922
0.47070.925
0.5110110.931
0.575120120.933
0.6131120.933
0.7173140.94
0.8243210.951
0.9378290.971
1.013786510.876
1.1167105620.831
1.2231156750.733
Pharmaceutical0.352020.874
0.42020.874
0.53120.876
0.5663120.876
0.65320.88
0.711650.891
0.8221660.913
0.94231110.951
1.0166135310.808
1.1211172390.722
1.2280236440.588

References

  1. Garcia-Diaz, V.; Espada, J.P.; Crespo, R.G.; G-Bustelo, B.C.P.; Lovelle, J.M.C. An approach to improve the accuracy of probabilistic classifiers for decision support systems in sentiment analysis. Appl. Soft Comput. 2018, 67, 822–833. [Google Scholar] [CrossRef]
  2. Oyebode, O.; Alqahtani, F.; Orji, R. Using Machine Learning and Thematic Analysis Methods to Evaluate Mental Health Apps Based on User Reviews. IEEE Access 2020, 8, 111141–111158. [Google Scholar] [CrossRef]
  3. Afzaal, M.; Usman, M.; Fong, A. Tourism Mobile App with Aspect-Based Sentiment Classification Framework for Tourist Reviews. IEEE Trans. Consum. Electron. 2019, 65, 233–242. [Google Scholar] [CrossRef]
  4. Li, W.; Xu, B. Aspect-Based Fashion Recommendation with Attention Mechanism. IEEE Access 2020, 8, 141814–141823. [Google Scholar] [CrossRef]
  5. Chenaghlou, M. Data Stream Clustering and Anomaly Detection. Ph.D. Thesis, The Univerisity of Melbourne, Parkville, Australia, 2019. [Google Scholar]
  6. Wang, H.; Bah, M.J.; Hammad, M. Progress in Outlier Detection Techniques: A Survey. IEEE Access 2019, 7, 107964–108000. [Google Scholar] [CrossRef]
  7. de la Torre-Abaitua, G.; Lago-Fernández, L.F.; Arroyo, D. A compression-based method for detecting anomalies in textual data. Entropy 2021, 23, 618. [Google Scholar] [CrossRef]
  8. Iglesias, C.A.; Moreno, A. Sentiment Analysis for social media. Appl. Sci. 2019, 9, 5037. [Google Scholar] [CrossRef] [Green Version]
  9. Chakraborty, K.; Bhattacharyya, S.; Bag, R. A Survey of Sentiment Analysis from Social Media Data. IEEE Trans. Comput. Soc. Syst. 2020, 7, 450–464. [Google Scholar] [CrossRef]
  10. Hou, Q.; Han, M.; Cai, Z. Survey on data analysis in social media: A practical application aspect. Big Data Min. Anal. 2020, 3, 259–279. [Google Scholar] [CrossRef]
  11. Nazir, A.; Rao, Y.; Wu, L.; Sun, L. Issues and Challenges of Aspect-based Sentiment Analysis: A Comprehensive Survey. IEEE Trans. Affect. Comput. 2020. [Google Scholar] [CrossRef]
  12. Hu, T.; She, B.; Duan, L.; Yue, H.; Clunis, J. A Systematic Spatial and Temporal Sentiment Analysis on Geo-Tweets. IEEE Access 2019, 8, 8658–8667. [Google Scholar] [CrossRef]
  13. Park, J. Framework for Sentiment-Driven Evaluation of Customer Satisfaction with Cosmetics Brands. IEEE Access 2020, 8, 98526–98538. [Google Scholar] [CrossRef]
  14. Hu, S.; Kumar, A.; Al-Turjman, F.; Gupta, S.; Seth, S. Shubham Reviewer Credibility and Sentiment Analysis Based User Profile Modelling for Online Product Recommendation. IEEE Access 2020, 8, 26172–26189. [Google Scholar] [CrossRef]
  15. Li, M.; Ma, Y.; Cao, P. Revealing Customer Satisfaction with Hotels Through Multi-Site Online Reviews: A Method Based on the Evidence Theory. IEEE Access 2020, 8, 225226–225239. [Google Scholar] [CrossRef]
  16. Jerripothula, K.R.; Rai, A.; Garg, K.; Rautela, Y.S. Feature-Level Rating System Using Customer Reviews and Review Votes. IEEE Trans. Comput. Soc. Syst. 2020, 7, 1210–1219. [Google Scholar] [CrossRef]
  17. Ali, S.; Wang, G.; Riaz, S. Aspect Based Sentiment Analysis of Ridesharing Platform Reviews for Kansei Engineering. IEEE Access 2020, 8, 173186–173196. [Google Scholar] [CrossRef]
  18. Zhang, B.; Li, X.; Xu, X.; Leung, K.-C.; Chen, Z.; Ye, Y. Knowledge Guided Capsule Attention Network for Aspect-Based Sentiment Analysis. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 2538–2551. [Google Scholar] [CrossRef]
  19. Liu, H.; Chatterjee, I.; Zhou, M.; Lu, X.S.; Abusorrah, A. Aspect-Based Sentiment Analysis: A Survey of Deep Learning Methods. IEEE Trans. Comput. Soc. Syst. 2020, 7, 1358–1375. [Google Scholar] [CrossRef]
  20. Benlahbib, A.; Nfaoui, E.H. Aggregating Customer Review Attributes for Online Reputation Generation. IEEE Access 2020, 8, 96550–96564. [Google Scholar] [CrossRef]
  21. Almaghrabi, M.; Chetty, G. Improving Sentiment Analysis in Arabic and English Languages by Using Multi-Layer Perceptron Model (MLP). In Proceedings of the 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), Sydney, Australia, 6–9 October 2020; pp. 745–746. [Google Scholar]
  22. Basiri, M.E.; Abdar, M.; Kabiri, A.; Nemati, S.; Zhou, X.; Allahbakhshi, F.; Yen, N.Y. Improving Sentiment Polarity Detection Through Target Identification. IEEE Trans. Comput. Soc. Syst. 2020, 7, 113–128. [Google Scholar] [CrossRef]
  23. Younas, A.; Nasim, R.; Ali, S.; Wang, G.; Qi, F. Sentiment Analysis of Code-Mixed Roman Urdu-English Social Media Text using Deep Learning Approaches. In Proceedings of the 2020 IEEE 23rd International Conference on Computational Science and Engineering (CSE), Guangzhou, China, 29 December 2020—1 January 2021; pp. 66–71. [Google Scholar]
  24. Yadav, V.; Verma, P.; Katiyar, V. E-Commerce Product Reviews Using Aspect Based Hindi Sentiment Analysis. In Proceedings of the 2021 International Conference on Computer Communication and Informatics (ICCCI), Coimbatore, India, 27–29 January 2021; pp. 1–8. [Google Scholar]
  25. Yaqub, U.; Malik, M.A.; Zaman, S. Sentiment Analysis of Russian IRA Troll Messages on Twitter during US Presidential Elections of 2016. In Proceedings of the 2020 7th International Conference on Behavioural and Social Computing (BESC), Bournemouth, UK, 5–7 November 2020; pp. 1–6. [Google Scholar]
  26. Li, G.; Zheng, Q.; Zhang, L.; Guo, S.; Niu, L. Sentiment Infomation based Model for Chinese text Sentiment Analysis. In Proceedings of the 2020 IEEE 3rd International Conference on Automation, Electronics and Electrical Engineering (AUTEEE), Shenyang, China, 20–22 November 2020; pp. 366–371. [Google Scholar]
  27. Saputra, F.T.; Wijaya, S.H.; Nurhadryani, Y. Defina Lexicon Addition Effect on Lexicon-Based of Indonesian Sentiment Analysis on Twitter. In Proceedings of the 2020 International Conference on Informatics, Multimedia, Cyber and Information System (ICIMCIS), Jakarta, Indonesia, 19–20 November 2020; pp. 136–141. [Google Scholar]
  28. García-Mendoza, C.V.; Gambino, O.J.; Villarreal-Cervantes, M.G.; Calvo, H. Evolutionary Optimization of Ensemble Learning to Determine Sentiment Polarity in an Unbalanced Multiclass Corpus. Entropy 2020, 22, 1020. [Google Scholar] [CrossRef] [PubMed]
  29. Wang, Y.; Chen, Q.; Ahmed, M.; Li, Z.; Pan, W.; Liu, H. Joint Inference for Aspect-level Sentiment Analysis by Deep Neural Networks and Linguistic Hints. IEEE Trans. Knowl. Data Eng. 2021, 33, 2002–2014. [Google Scholar] [CrossRef]
  30. Jadon, P.; Bhatia, D.; Mishra, D.K. A BigData approach for sentiment analysis of twitter data using Naive Bayes and SVM Algorithm. In Proceedings of the 2019 Sixteenth International Conference on Wireless and Optical Communication Networks (WOCN), Bhopal, India, 19–21 December 2019. [Google Scholar]
  31. Saranya, G.; Geetha, G.; Meenakshi, K.; Karpagaselvi, S. Sentiment analysis of healthcare Tweets using SVM Classifier. In Proceedings of the 2020 International Conference on Power, Energy, Control and Transmission Systems (ICPECT), Chennai, India, 29–30 April 2020; pp. 1–3. [Google Scholar]
  32. Zhang, B.; Xu, D.; Zhang, H.; Li, M. STCS Lexicon: Spectral-Clustering-Based Topic-Specific Chinese Sentiment Lexicon Construction for Social Networks. IEEE Trans. Comput. Soc. Syst. 2019, 6, 1180–1189. [Google Scholar] [CrossRef]
  33. Singh, J.; Tripathi, P. Sentiment analysis of Twitter data by making use of SVM, Random Forest and Decision Tree algorithm; Sentiment analysis of Twitter data by making use of SVM, Random Forest and Decision Tree algorithm. In Proceedings of the 10th IEEE International Conference on Communication Systems and Network Technologies (CSNT), Bhopal, India, 18–19 June 2021. [Google Scholar] [CrossRef]
  34. Wang, L.; Niu, J.; Yu, S. SentiDiff: Combining Textual Information and Sentiment Diffusion Patterns for Twitter Sentiment Analysis. IEEE Trans. Knowl. Data Eng. 2020, 32, 2026–2039. [Google Scholar] [CrossRef]
  35. Shofiya, C.; Abidi, S. Sentiment Analysis on COVID-19-Related Social Distancing in Canada Using Twitter Data. Int. J. Environ. Res. Public Health 2021, 18, 5993. [Google Scholar] [CrossRef] [PubMed]
  36. Zhang, Y.; Xu, B.; Zhao, T. Convolutional multi-head self-attention on memory for aspect sentiment classification. IEEE/CAA J. Autom. Sin. 2020, 7, 1038–1044. [Google Scholar] [CrossRef]
  37. Abdalgader, K.; Al Shibli, A. Experimental Results on Customer Reviews Using Lexicon-Based Word Polarity Identification Method. IEEE Access 2020, 8, 179955–179969. [Google Scholar] [CrossRef]
  38. Chen, H.; Zhang, X.; Du, S.; Wu, Z.; Zheng, N. A correntropy-based affine iterative closest point algorithm for robust point set registration. IEEE/CAA J. Autom. Sin. 2019, 6, 981–991. [Google Scholar] [CrossRef]
  39. Shanmugam, M.; Agawane, A.; Tiwari, A.; Deolekar, R.V. Twitter Sentiment Analysis using Novelty Detection. In Proceedings of the 2020 Third International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 20–22 August 2020; pp. 1258–1263. [Google Scholar]
  40. Schmitt, M.F.L.; Spinosa, E.J. Outlier Detection on Semantic Space for Sentiment Analysis with Convolutional Neural Networks. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8. [Google Scholar]
  41. Kim, J.; Park, M.; Kim, H.; Cho, S.; Kang, P. Insider threat detection based on user behavior modeling and Anomaly Detection Algorithms. Appl. Sci. 2019, 9, 4018. [Google Scholar] [CrossRef] [Green Version]
  42. Neagu, B.C.; Grigoras, G.; Scarlatache, F. Outliers discovery from Smart Meters data using a statistical based data mining approach. In Proceedings of the 2017 10th International Symposium on Advanced Topics in Electrical Engineering (ATEE), Bucharest, Romania, 23–25 March 2017; pp. 555–558. [Google Scholar]
  43. Ahmed, I.; Dagnino, A.; Ding, Y. Unsupervised anomaly detection based on minimum spanning tree approximated distance measures and its application to hydropower turbines. IEEE Trans. Autom. Sci. Eng. 2019, 16, 654–667. [Google Scholar] [CrossRef]
  44. Cui, M.; Wang, J.; Florita, A.R.; Zhang, Y. Generalized Graph Laplacian Based Anomaly Detection for Spatiotemporal MicroPMU Data. IEEE Trans. Power Syst. 2019, 34, 3960–3963. [Google Scholar] [CrossRef]
  45. Verma, P.; Sinha, M.; Panda, S. Fuzzy c-Means Clustering-Based Novel Threshold Criteria for Outlier Detection in Electronic Nose. IEEE Sens. J. 2021, 21, 1975–1981. [Google Scholar] [CrossRef]
  46. Corain, M.; Garza, P.; Asudeh, A. DBSCOUT: A Density-based Method for Scalable Outlier Detection in Very Large Datasets. In Proceedings of the 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, 19–22 April 2021; pp. 37–48. [Google Scholar]
47. Sapegin, A.; Meinel, C. K-metamodes: Frequency- and ensemble-based distributed k-modes clustering for security analytics. In Proceedings of the 2020 19th IEEE International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, 14–17 December 2020; pp. 344–351. [Google Scholar]
  48. Ghahramani, M.H.; Zhou, M.; Hon, C.T. Toward cloud computing QoS architecture: Analysis of cloud systems and cloud services. IEEE/CAA J. Autom. Sin. 2017, 4, 6–18. [Google Scholar] [CrossRef]
  49. Masters, K. 89% of Consumers Are More Likely to Buy Products from Amazon than Other E-Commerce Sites: Study. Forbes 20 March 2019. Available online: https://www.forbes.com/sites/kirimasters/2019/03/20/study-89-of-consumers-are-more-likely-to-buy-products-from-amazon-than-other-e-commerce-sites/?sh=273313e64af1 (accessed on 12 November 2021).
50. Chatterjee, I. Amazon Customer Review. Harvard Dataverse. 2021. Available online: https://doi.org/10.7910/DVN/W96OFO (accessed on 29 November 2021).
  51. Hu, N.; Zhang, J.; Pavlou, P.A. Overcoming the J-shaped distribution of product reviews. Commun. ACM 2009, 52, 144–147. [Google Scholar] [CrossRef]
52. Hussain, A.; Aleem, M. GoCJ: Google Cloud Jobs Dataset for Distributed and Cloud Computing Infrastructures. Data 2018, 3, 38. [Google Scholar] [CrossRef]
  53. Amazon Customer Reviews Dataset. Available online: https://s3.amazonaws.com/amazon-reviews-pds/readme.html (accessed on 12 November 2021).
54. TextBlob Documentation, Release 0.16. Available online: https://buildmedia.readthedocs.org/media/pdf/textblob/latest/textblob.pdf (accessed on 18 October 2021).
  55. Riahi-Madvar, M.; Nasersharif, B.; Azirani, A.A. Subspace Outlier Detection in High Dimensional Data using Ensemble of PCA-based Subspaces. In Proceedings of the 2021 26th International Computer Conference, Computer Society of Iran (CSICC), Tehran, Iran, 3–4 March 2021; pp. 1–5. [Google Scholar]
  56. Studiawan, H.; Sohel, F.; Payne, C. Anomaly Detection in Operating System Logs with Deep Learning-based Sentiment Analysis. IEEE Trans. Dependable Secur. Comput. 2021, 18, 2136–2148. [Google Scholar] [CrossRef]
  57. Tian, R.; Ruan, K.; Li, L.; Le, J.; Greenberg, J.; Barbat, S. Standardized evaluation of camera-based driver state monitoring systems. IEEE/CAA J. Autom. Sin. 2019, 6, 716–732. [Google Scholar] [CrossRef]
  58. Tian, G.; Zhang, H.; Zhou, M.; Li, Z. AHP, Gray Correlation, and TOPSIS Combined Approach to Green Performance Evaluation of Design Alternatives. IEEE Trans. Syst. Man Cybern. Syst. 2018, 48, 1093–1105. [Google Scholar] [CrossRef]
  59. Feng, Y.; Zhou, M.; Tian, G.; Li, Z.; Zhang, Z.; Zhang, Q.; Tan, J. Target Disassembly Sequencing and Scheme Evaluation for CNC Machine Tools Using Improved Multiobjective Ant Colony Algorithm and Fuzzy Integral. IEEE Trans. Syst. Man Cybern. Syst. 2018, 49, 2438–2451. [Google Scholar] [CrossRef]
  60. Han, W.; Lu, X.S.; Zhou, M.; Shen, X.; Wang, J.; Xu, J. An Evaluation and Optimization Methodology for Efficient Power Plant Programs. IEEE Trans. Syst. Man Cybern. Syst. 2020, 50, 707–716. [Google Scholar] [CrossRef]
  61. Ghahramani, M.; Qiao, Y.; Zhou, M.; O Hagan, A.; Sweeney, J. AI-based modeling and data-driven evaluation for smart manufacturing processes. IEEE/CAA J. Autom. Sin. 2020, 7, 1026–1037. [Google Scholar] [CrossRef]
  62. Tian, G.; Hao, N.; Zhou, M.; Pedrycz, W.; Zhang, C.; Ma, F.; Li, Z. Fuzzy Grey Choquet Integral for Evaluation of Multicriteria Decision Making Problems with Interactive and Qualitative Indices. IEEE Trans. Syst. Man Cybern. Syst. 2020, 51, 1–14. [Google Scholar] [CrossRef]
  63. Luo, X.; Zhou, M.; Leung, H.; Xia, Y.; Zhu, Q.; You, Z.; Li, S. An Incremental-and-Static-Combined Scheme for Matrix-Factorization-Based Collaborative Filtering. IEEE Trans. Autom. Sci. Eng. 2016, 13, 333–343. [Google Scholar] [CrossRef]
  64. Shang, M.; Luo, X.; Liu, Z.; Chen, J.; Yuan, Y.; Zhou, M. Randomized latent factor model for high-dimensional and sparse matrices from industrial applications. IEEE/CAA J. Autom. Sin. 2019, 6, 131–141. [Google Scholar] [CrossRef]
Figure 1. J-shaped distribution of the tallied reviews from all the accumulated datasets.
Figure 2. Average helpful votes per review across different star ratings.
Figure 3. Box plot (with interquartile range) of a normal distribution for outlier detection.
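Figure 3 illustrates the standard box-plot rule: values lying beyond 1.5 × IQR from the quartiles are treated as outliers. The snippet below is a minimal sketch of that rule in Python, assuming the scores of interest sit in a NumPy array named scores; it is illustrative only and does not claim to reproduce SODCM's exact thresholds.

```python
import numpy as np

def iqr_fences(scores, k=1.5):
    """Return the lower and upper box-plot fences for a 1-D score array.

    k = 1.5 is the conventional interquartile-range multiplier drawn in a
    standard box plot; SODCM's exact cut-offs may differ.
    """
    q1, q3 = np.percentile(scores, [25, 75])  # first and third quartiles
    iqr = q3 - q1                             # interquartile range
    return q1 - k * iqr, q3 + k * iqr

# Example: flag values outside the fences as candidate outliers.
scores = np.random.default_rng(0).normal(size=1000)
lower, upper = iqr_fences(scores)
candidates = scores[(scores < lower) | (scores > upper)]
```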
Figure 4. Box plot (with interquartile range) of the S+ distribution for outlier detection: (a) the box plot of S+ when FL is negative; (b) the box plot of S+ when FL is positive.
Figure 5. Box plot (with interquartile range) of the S distribution for outlier detection: (a) the box plot of S when FU > 2; (b) the box plot of S when FU ≤ 2.
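Figures 4 and 5 are built from the distributions of review sentiment scores. One simple way to obtain such scores is TextBlob's polarity attribute [54]; the sketch below only illustrates splitting polarities into positive and negative groups, with all variable names chosen here for illustration rather than taken from the paper.

```python
from textblob import TextBlob  # reference [54]

reviews = [
    "Absolutely love this product, it works perfectly.",
    "Terrible quality, it broke after two days.",
]

# TextBlob polarity lies in [-1, 1]; positive scores form one distribution
# and negative scores the other (zero-polarity reviews are ignored here).
polarities = [TextBlob(text).sentiment.polarity for text in reviews]
positive_scores = [p for p in polarities if p > 0]
negative_scores = [p for p in polarities if p < 0]
```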
Table 1. Review distribution across different star ratings.

Dataset          5-Star  4-Star  3-Star  2-Star  1-Star
Book               4104     219      62      46     569
Electronics        3567     770      94      51     518
Entertainment      2485    1062     797     271     385
Grocery            3402     683     134     104     677
Health Care        3014     910     263     196     617
Personal Care      3287     338     641     200     534
Pharmaceutical     3190     855     184     114     657
Table 2. Average helpful votes per review across different star ratings.

Dataset          5-Star  4-Star  3-Star  2-Star  1-Star
Book               6.27    1.89    9.72   10      52.3
Electronics        4.61    4.54    1.23    1.52   28.11
Entertainment      1.31    0.11    1.27    0.69    5.19
Grocery            0.77    0.35    0.23    0.58    1.45
Health Care        1.01    1.04    0.45    0.38    1.22
Personal Care      0.54    0.46    0.06    0.23    1.41
Pharmaceutical     4.19    2.34    0.43    0.72    9.64
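Tables 1 and 2 are plain group-by summaries over the review records. The sketch below shows the corresponding pandas aggregations on a toy DataFrame; the column names star_rating and helpful_votes are hypothetical and may not match the fields of the scraped datasets.

```python
import pandas as pd

# Toy review records; the real datasets would be loaded from file.
df = pd.DataFrame({
    "star_rating":   [5, 5, 4, 1, 3, 5, 2, 1],
    "helpful_votes": [0, 2, 1, 30, 0, 1, 4, 12],
})

# Table 1: number of reviews per star rating.
review_counts = df["star_rating"].value_counts().sort_index(ascending=False)

# Table 2: average helpful votes per review for each star rating.
avg_helpful = df.groupby("star_rating")["helpful_votes"].mean().sort_index(ascending=False)

print(review_counts, avg_helpful, sep="\n")
```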
Table 3. Performance comparison of SODCM with state-of-the-art approaches.

Dataset          Method  Accuracy (%)  Recall (%)   O_D
Book             SODCM       96.9         98.4        75
                 [55]        84.1         52.2       410
                 [56]        86.1         50.2       955
Electronics      SODCM       93.1         96.5        60
                 [55]        67.3         49.8       193
                 [56]        71.3         48.5       638
Entertainment    SODCM       87.6         93.8        23
                 [55]        67.7         51.8       158
                 [56]        79.1         48.9      1434
Grocery          SODCM       92.3         96.1        31
                 [55]        75.7         49.7       406
                 [56]        85.8         48.1      1194
Health Care      SODCM       93.1         96.5        43
                 [55]        74.8         51.1        99
                 [56]        86.2         49.1      1025
Personal Care    SODCM       93.3         96.6        31
                 [55]        76.3         50.9       717
                 [56]        86.2         48.9      1177
Pharmaceutical   SODCM       89.4         94.7        17
                 [55]        78.7         51.0       239
                 [56]        77.3         47.2       971
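The Accuracy (%) and Recall (%) columns in Tables 3 and 4 follow the usual binary-classification definitions over outlier labels. The snippet below is a generic evaluation sketch using scikit-learn with hypothetical label vectors; it is not the authors' evaluation code.

```python
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 1 = review is an outlier, 0 = review is normal.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]  # manually verified ground truth
y_pred = [1, 0, 0, 1, 0, 0, 0, 0]  # labels produced by a detector

accuracy = accuracy_score(y_true, y_pred)  # fraction of reviews labeled correctly
recall = recall_score(y_true, y_pred)      # fraction of true outliers that were found
detected = sum(y_pred)                     # analogous to the O_D column
print(f"Accuracy: {accuracy:.1%}, Recall: {recall:.1%}, Detected: {detected}")
```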
Table 4. Performance comparison of SODCM with state-of-the-art methods on public datasets.

Dataset     Method  Accuracy (%)  Recall (%)   O_D
Apparel     SODCM       89.1         94.5       809
            [55]        78.8         65.3      6404
            [56]        80.1         65.3       585
Beauty      SODCM       90.4         95.1       936
            [55]        81.2         65.4      9501
            [56]        83.1         65.5       643
Fashion     SODCM       92.3         96.1      1061
            [55]        81.6         62.2      3257
            [56]        81.4         62.1       604
Furniture   SODCM       90.8         95.3       922
            [55]        80.4         64.8      3743
            [56]        81.2         64.1       675
Jewelry     SODCM       91.3         95.6       700
            [55]        81.2         64.4      6345
            [56]        82.4         64.4       562
Luggage     SODCM       92.1         96.2       831
            [55]        82.1         63.6      4000
            [56]        83.3         63.8       599
Toy         SODCM       90.2         95.1       662
            [55]        83.2         65.7      9444
            [56]        84.1         65.2       634
Table 5. Metrics comparison for SODCM.

Dataset          p-Value       T-Score   CI
Book             1.77 × 10⁻⁹     9.05    [0.02, 0.04]
Electronics      1.43 × 10⁻⁶    16.67    [0.06, 0.08]
Entertainment    8.46 × 10⁻⁸    25.67    [0.11, 0.13]
Grocery          1.48 × 10⁻⁷    18.93    [0.07, 0.08]
Health Care      7.26 × 10⁻⁶    17.27    [0.06, 0.08]
Personal Care    1.08 × 10⁻⁶    17.38    [0.06, 0.07]
Pharmaceutical   3.62 × 10⁻⁹    23.63    [0.10, 0.12]
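The p-values, t-scores, and confidence intervals reported in Table 5 are standard outputs of a paired comparison between two sets of scores. The sketch below shows one generic way to obtain such statistics with SciPy on hypothetical paired samples; it is not a reconstruction of the exact test configuration used in the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical paired accuracy scores, e.g., before and after a correction step.
before = np.array([0.84, 0.86, 0.79, 0.85, 0.86, 0.86, 0.77])
after  = np.array([0.97, 0.93, 0.88, 0.92, 0.93, 0.93, 0.89])

diff = after - before
t_score, p_value = stats.ttest_rel(after, before)  # paired t-test

# 95% confidence interval for the mean paired difference.
ci = stats.t.interval(0.95, df=diff.size - 1, loc=diff.mean(), scale=stats.sem(diff))
print(f"t = {t_score:.2f}, p = {p_value:.2e}, CI = [{ci[0]:.2f}, {ci[1]:.2f}]")
```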
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
