Article

Sentiment Digitization Modeling for Recommendation System

1 National Program of Excellence in Software Center, Chosun University, 61452 Gwangju, Korea
2 Department of Electronics Engineering, Chosun University, 61452 Gwangju, Korea
* Author to whom correspondence should be addressed.
Sustainability 2020, 12(12), 5191; https://doi.org/10.3390/su12125191
Submission received: 9 May 2020 / Revised: 22 June 2020 / Accepted: 23 June 2020 / Published: 25 June 2020
(This article belongs to the Special Issue Human-Centric Urban Services)

Abstract

As the importance of providing personalized services increases, various studies on personalized recommendation systems are being actively conducted. Among the many methods used in recommendation systems, the most widely used is collaborative filtering. However, this method has low accuracy because its recommendations are limited to quantitative information, such as user ratings or amount of use. To address this issue, many studies have attempted to improve recommendation accuracy by using other types of information in addition to quantitative information. Although sentiment analysis of reviews is popular, previous studies share the limitation that the results of sentiment analysis cannot be directly reflected in recommendation systems. Therefore, this study aims to quantify the sentiments presented in reviews and reflect the results in the ratings; that is, it proposes a new algorithm that quantifies the sentiments of user-written reviews and converts them into quantitative information that can be directly reflected in recommendation systems. To achieve this, the user reviews, which are qualitative information, must first be quantified. Thus, in this study, sentiment scores are calculated through sentiment analysis using a text mining technique. The data used herein are movie reviews. A domain-specific sentiment dictionary was constructed, and sentiment scores of the reviews were calculated based on it. The collaborative filtering proposed in this study, which reflects the sentiment scores of user reviews, was verified to be more accurate than traditional collaborative filtering, which reflects only user rating data. By precisely reflecting user opinions through quantified user reviews, the proposed method overcomes the limitation of previous studies that inferred user sentiment from rating data alone and successfully enhances the accuracy of the recommendation system. Based on these findings, recommendation accuracy is expected to improve further with additional analysis.

1. Introduction

It is estimated that more than 2.5 trillion MB of data are generated worldwide every day, and the pace of generation is increasing by 60% each year. Online accessibility has made it possible to store large amounts of data while mitigating the physical limitations of offline storage. However, with the increase in accessible information, more people feel overwhelmed by the flood of information newly generated every day. As reviewing every piece of information is nearly impossible, finding and selecting information that suits each user's preferences is difficult. Thus, the need for a recommendation system that can filter out unnecessary information from large amounts of data and provide information according to individual preferences is gradually increasing.
Previous studies on recommendation systems that provide personalized information to users generally focused on analyses of structured data that are easy to quantify, such as user purchase history, product ratings, and number of visits [1]. Among these data, ratings have been the most popular index of user preference in recommendation systems. However, in recent years, recommendations that rely only on user rating data as the index of user preference have been shown to provide low accuracy [2,3]. This indicates a limitation in recommendation system development, as the detailed and elaborate preferences of users are not reflected. A user may give the same rating of 10 points to two different movies, yet the intensity of the sentiments found in that user's review texts may differ between the two. Despite identical quantitative ratings, this difference can be a crucial factor that reduces the accuracy of the recommendation system.
Owing to its high usability and the ease of processing explicit rating information mathematically, collaborative filtering using user ratings has been widely adopted. However, questions have been raised about whether user rating information properly reflects user preferences. This uncertainty has led to various studies that develop collaborative filtering models that better reflect user preferences [4].
Many researchers have focused on text reviews in making rating predictions. The text reviews serve as a useful tool in the recommendation systems as they provide detailed preferences of what the user might have for a specific item. Various studies have also confirmed that the user text reviews play an important role in developing a recommendation system [5,6].
Since human-centered urban services should identify what people need, address their various needs, and set a vision by reinterpreting the city, various fields should be involved, such as culture, architecture, city, civil engineering, transportation, and machinery. As a city without people has no meaning, it is natural for the environment to change centered on human beings. In order to develop a new city, we should pay attention to each person who belongs to it, not the physical environment. Furthermore, if there is a request from a single individual, no matter how trivial it is, a series of efforts are required to actively respond to the request.
In order to analyze and utilize such requests, making the most of big data is more important than anything else. Big data analysis is a method of analyzing the huge amounts of data arising from the use of the wired/wireless Internet and social network services (SNS), and various studies on it have been actively underway.
In particular, movie data are typical data that help capture human emotion indicators and can be applied to personalized urban services. Accordingly, research on recommendation systems through movie analysis using big data has been steadily increasing. The public's preferences for products and movies are analyzed by classifying the emotions appearing in product reviews, product use reviews, and movie reviews written by unspecified individuals as negative or positive. The analyzed preferences are then combined with personalized information and used in various recommendation areas suited to each user's propensity.
In this study, to improve the accuracy of existing recommendation systems, which use only quantitative data, a model is proposed that precisely corrects ratings by quantifying the sentiments of user reviews, which are qualitative data. Specifically, a domain-specific dictionary is constructed from the reviews generated by users; using this dictionary, sentiment scores are extracted from the review texts and reflected in the ratings (i.e., the quantitative data), yielding an algorithm that improves the accuracy of the recommendation system.

2. Related Research

2.1. Recommendation System

The aim of a recommendation system is to suggest items that fit the user's preference criteria, based on various factors such as demographic information, purchase history, and expected interest. Since the successful recommendation system implementations by Netflix and Amazon, various efforts have also been made in Korea to recommend items relevant to user preferences. Examples include the movie recommendation system of WATCHA, top news recommendation through NAVER AiRS, and Kakao's RUBICS, which recommends content by analyzing user responses in real time. Thus, recommendation systems are widely used in our daily lives, and studies related to these systems are continuously being conducted. Collaborative filtering is the most frequently used recommendation algorithm [7].
Collaborative filtering is a preference prediction method based on similarities between users or items, under the basic assumption that users with similar preferences for one particular item will show similar preferences for other items [8]. Collaborative filtering methods can be broadly classified into memory-based and model-based algorithms. The memory-based algorithm, also referred to as a neighborhood model, predicts the rating that a user might give by constructing a user-item matrix of all users and finding similar users or items based on the user or item information. This method is divided into user-based and item-based collaborative filtering [9].
User-based collaborative filtering recommends items that are commonly preferred by neighboring users, after identifying neighbors whose preferences, based on the rating information they have provided, are similar to those of the target user.
The basic concept is shown in Figure 1. In the figure, the user with the most similar preferences to the recommendation target user is “User C,” who purchased “Item 1,” “Item 2,” and “Item 5.” Based on the information of “User C,” “Item 7,” which has been preferred by “User C” but not by the target user yet, is chosen and recommended to the target user.
Item-based collaborative filtering recommendation is used for YouTube and Netflix video recommendations and Amazon product recommendations [10].
The basic concept is shown in Figure 2. Among the candidate items, the item with the highest similarity to the items the user has already rated is chosen. The selected "Item 3" is presented as the recommended item to "User E", who has not yet purchased it.
The model-based algorithm builds on the basic process of memory-based collaborative filtering and uses machine learning or data mining techniques for clustering, classification, and prediction [11]. Matrix factorization and clustering models are techniques that predict ratings on unrated items by modeling users based on their past ratings. The matrix factorization method is a rating prediction method that uses latent factors instead of a direct relationship between a user and an item. The most widely used algorithms are the singular value decomposition (SVD) and SVD++ algorithms. The clustering model groups individuals with similar characteristics together using similarity measures between individuals [12]. The most widely used algorithms include the k-means and DBSCAN algorithms.

2.2. Recommendation System Reflecting User Reviews

As an example of a recommendation system reflecting user reviews, one study developed a model that estimates the trend and intensity of positive or negative sentiment found in user reviews. Based on this model, sentiment analysis was conducted with a collaborative filtering recommendation system to classify whether a user is an optimist or a pessimist [13,14]. Collaborative filtering was then carried out for each group, using the user reviews as the criterion for group classification. Another study extracted words highly relevant to the ratings and, based on the frequency of the extracted words, conducted multi-category classification featuring a neutral category in addition to the positive and negative categories [15]. Thus, many studies have moved beyond a positive/negative dichotomy to classify multiple sentiments, such as neutral. Further, consistent attempts have been made to predict ratings by constructing sentiment dictionaries from groups of sentiment sentences related to opinions and assessments in reviews and then applying them to movie reviews to infer ratings from 1 to 10 according to polarity [16]. However, these studies suffer from information loss, as they do not directly reflect the review data in the algorithms. Recent studies on sentiment analysis have investigated sentiment scores that measure the degree of sentiment rather than merely its class, and numerous studies that quantify the degree of sentiment are also actively being conducted.

2.3. Recommendation System Using Sentiment Analysis

As social media and social network services (SNS) have become popular, the public has become a producer of information as well as a consumer. People generally express their feelings and opinions by posting comments and thoughts on various websites and SNS platforms. Consequently, as the flow of review data increased, interest in sentiment analysis increased as well.
Sentiment analysis is a method that extracts the subjective attitudes or sentiments of people based on text mining, one of the important areas of big data. Among previous studies and recommendation systems that use sentiment analysis, one study proposed a movie recommendation system that extracts emotion-related words from user reviews and comments to recommend personalized movies to individuals. In another study, the possibility of recommending appropriate movies to users was demonstrated through analysis of the sentiments of emotion-related words extracted from the sentiment lexicon SentiWordNet [17]. Another study defined a review ontology, an emotion word dictionary for movies, to evaluate the sentiments of ratings and reviews [18]. Based on this, the study recommended customized movies to users by using a collaborative filtering method and a context-based technique to analyze the sentiment level of emotion-related words in movie reviews. In addition, one study used feedback given by restaurant customers to build a recommendation system based on user attributes and characteristics, classifying positive and negative emotions and calculating sentiment scores through SentiWordNet [19]. Beyond the mentioned examples, another study showed a recommendation performance improvement by reflecting user review mining in the traditional rating-based recommendation algorithm [20].
Previous studies on recommendation systems using sentiment analysis mainly rely on sentiment lexicons such as SentiWordNet. Various studies on sentiment analysis in English have been actively conducted using lexicons such as AFINN, SentiWordNet, and EmoLex. However, studies on sentiment analysis in Korean are relatively scarce compared with English. This may be due to the linguistic characteristics of the Korean language. To conduct sentiment analysis, part-of-speech tagging of nouns, verbs, and adjectives must first be performed during natural language processing. In English, part-of-speech tagging can be performed relatively easily using the spaces between words, as English tends to have part-of-speech breaks at spaces. In contrast, Korean is an agglutinative language, so in many cases the part of speech cannot be distinguished by the spaces between words. For this reason, studies on sentiment analysis of Korean texts have not been actively conducted.

2.4. Dictionary Construction-Based Sentiment Analysis

Dictionary-based sentiment analysis is a methodology that quantifies user reviews by matching the collected, preprocessed review data against a pre-constructed sentiment dictionary. Although the degree of sentiment is easy to interpret when a dictionary is used, sentiment analysis with a general-purpose sentiment dictionary can yield poor accuracy, because the same word can carry completely opposite sentiments depending on the domain. Therefore, to conduct sentiment analysis correctly, specialized dictionaries reflecting the domain characteristics of each application field should be constructed. Because the same vocabulary can be used with different meanings depending on the characteristics of the topic being analyzed, constructing a separate sentiment dictionary for each domain is recommended for higher performance, rather than using a general sentiment dictionary [21].
To conduct sentiment analysis, numerous studies have been conducted on constructing appropriate sentiment dictionaries for given domains. One study improved the classification prediction accuracy by constructing a sentiment dictionary by extracting hospital-specific sentiment vocabulary and polarity values using 4300 “voice of customer” data, collected from a medical institution webpage. Another study constructed a sentiment dictionary specialized for the stock market domain from economic news data to improve the prediction accuracy of the stock market index [22]. In addition, one study verified that conducting sentiment analysis using a sentiment dictionary constructed for a specific topic significantly improves the prediction accuracy, compared to using a general sentiment dictionary [23].
The above previous studies can be summarized as follows. Typically, quantitative data, such as ratings, purchase history, and number of visits, have been used in collaborative filtering, the most widely used approach. In recent years, the commonly used rating data have been found to be one of the major causes of the low accuracy of recommendation systems, and various methods have been proposed to improve accuracy. A popular approach uses user reviews in the recommendation system; however, it has been limited to classifying reviews into positive, neutral, and negative sentiments and thus cannot reflect the detailed user satisfaction expressed in the reviews. As studies analyzing the degree of sentiment have increased, studies that apply the results of such analyses to recommendation systems have also been actively conducted. However, in the case of Korean, the number of publicly available sentiment dictionaries is very limited, and the existing dictionaries are general-purpose rather than domain-specific, which limits recommendation accuracy.

3. Proposed Method

This study proposes a model that aims to improve the accuracy of the recommendation system by calculating review sentiment scores and integrating them with user ratings. In more detail, a domain-specific sentiment dictionary is constructed to derive the sentiment scores of user reviews. Then, based on the dictionary, the sentiment scores of the user review data are calculated and reflected in the recommendation system.
Figure 3 shows the proposed algorithm.

3.1. Step 1, Data Collection (Web Crawling)

Texts express emotions, assessments, attitudes, facts, and so on. Among the many types of text, movie reviews were used for the analysis because they efficiently express user sentiments in short passages of no more than 140 characters [24]. The data were collected from NAVER Movies (movie.naver.com), operated by NAVER, the largest online platform in Korea. A web crawling process was used to access users and collect the movie ratings and reviews they had left.
The procedure for collecting review and rating data is shown in Figure 4. The original intent was to use the NAVER Movie API; however, the information required for the study could not be collected through the API. Thus, the data were collected using a Python-based web crawler that automatically accumulates information by visiting the website. After accessing users through the web crawler, the movie titles, ratings, and reviews left by each user were collected. The collected data comprised 3856 users, 32,486 individual movie titles, and 100,230 movie ratings and reviews. The rating data were on a scale of 1 to 10 points, without decimal points.
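For illustration, the collection step could be structured as in the following Python sketch; the URL pattern and CSS selectors are hypothetical placeholders, since the actual page markup is not described here.

```python
import requests
from bs4 import BeautifulSoup

def crawl_user_reviews(user_id, page=1):
    """Fetch one page of a user's movie titles, ratings, and reviews."""
    url = f"https://movie.naver.com/user/{user_id}/reviews?page={page}"  # hypothetical URL pattern
    soup = BeautifulSoup(requests.get(url, timeout=10).text, "html.parser")
    records = []
    for row in soup.select("li.review-item"):  # hypothetical CSS selector
        records.append({
            "user": user_id,
            "title": row.select_one(".movie-title").get_text(strip=True),
            "rating": int(row.select_one(".rating").get_text(strip=True)),
            "review": row.select_one(".review-text").get_text(strip=True),
        })
    return records
```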

3.2. Step 2, Sampling the Data

A data sparsity issue was found in the collected data: each user had rated only a small fraction of the full set of movies, so the number of unrated movies greatly exceeded the number of rated ones. If the number of rated movies is small, incorrect similarities may be obtained when searching for similar users or items during recommendation. To reduce this sparsity, only users who left both ratings and reviews for at least 10 movies were selected for the experiment. In total, 537 users and 4211 movie titles were selected from the collected data.
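This sampling step can be expressed concisely with pandas; the column names below are illustrative, as they are not specified in the paper.

```python
import pandas as pd

def sample_active_users(reviews: pd.DataFrame, min_items: int = 10) -> pd.DataFrame:
    """Keep only users who left both a rating and a review on >= min_items movies.

    reviews: DataFrame with columns ["user", "title", "rating", "review"].
    """
    complete = reviews.dropna(subset=["rating", "review"])
    counts = complete.groupby("user")["title"].nunique()
    active_users = counts[counts >= min_items].index
    return complete[complete["user"].isin(active_users)]
```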

3.3. Step 3, Rating-Normalization

User rating data are generally unevenly distributed according to user preferences. Owing to different rating criteria, some users tend to give higher ratings, whereas others tend to give lower ones. In the former case, a rating of 5 points may indicate an uninteresting movie, whereas the same rating could indicate an interesting movie in the latter case. Hence, viewing the same rating from two different people in the same way fails to reflect each person's rating tendencies and criteria and can lead to poor prediction accuracy. To reduce bias caused by such external factors, the data were normalized based on each user's evaluation tendency, given that normalization yields more accurate user similarities and movie preferences.
In this study, we attempted to normalize user ratings by reflecting the rating tendency of users based on the differences in their preference for various items.

3.3.1. Movie Recommendation Approach Applying Differences in User Preferences or Partiality for Items

Based on user rating information, the average rating difference between items is calculated, and then the target user's rating of a new item is predicted using it.
To calculate the differences in preference between items, the average rating differences among items are derived from the item ratings given by users. The average preference difference d_{i,j} between two items i and j can be derived using Equation (1).

d_{i,j} = \frac{\sum_{a \in U_i \cap U_j} (r_{a,i} - r_{a,j})}{|U_i \cap U_j|}    (1)

In Equation (1), U_i and U_j denote the sets of users who rated items i and j, respectively, and r_{a,i} - r_{a,j} is the rating difference between the two items in user a's evaluation.
The preference prediction can be derived from Equation (2), which uses the average rating difference obtained from Equation (1) to derive the rating \hat{r}_{u,i} given by user u for the new item i.

\hat{r}_{u,i} = \frac{\sum_{j \in I_u} (r_{u,j} + d_{i,j})}{|I_u|}    (2)
Using Equation (2), the rating that user u will give to the new item i is predicted from the ratings of items j: the average preference difference d_{i,j} between items i and j is added to user u's rating r_{u,j} for each rated item j, and the resulting predicted values are averaged to obtain \hat{r}_{u,i}. This corresponds to the case where the importance of each predicted value is treated as a constant value of 1.
\hat{r}_{u,i} = \frac{\sum_{j \in I_u} (r_{u,j} + d_{i,j}) \cdot |U_i \cap U_j|}{\sum_{j \in I_u} |U_i \cap U_j|}    (3)

The number of users who evaluated both items i and j, |U_i \cap U_j|, can be regarded as the weight of item j. Multiplying this weight into Equation (2) reflects the relative importance of each item j; Equation (3) corresponds to the application of this weighted average.
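As a minimal illustration of Equations (1)-(3), the following Python sketch computes the pairwise preference differences and the weighted prediction, assuming a dense user × item rating matrix with NaN marking missing ratings (array names are illustrative).

```python
import numpy as np

def preference_differences(R):
    """Average rating difference d[i, j] between items (Equation (1)).

    R: array of shape (n_users, n_items), with np.nan for missing ratings.
    Returns (d, c), where c[i, j] = |U_i ∩ U_j| is the co-rating count.
    """
    n_items = R.shape[1]
    d = np.zeros((n_items, n_items))
    c = np.zeros((n_items, n_items), dtype=int)
    for i in range(n_items):
        for j in range(n_items):
            both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
            c[i, j] = both.sum()
            if c[i, j] > 0:
                d[i, j] = np.mean(R[both, i] - R[both, j])
    return d, c

def predict_weighted(R, d, c, u, i):
    """Predict user u's rating for item i by the weighted average of Equation (3).

    Setting all weights to 1 instead recovers the simple average of Equation (2).
    """
    rated = np.flatnonzero(~np.isnan(R[u]))
    rated = rated[rated != i]
    weights = c[i, rated]
    if weights.sum() == 0:
        return np.nan
    return np.sum((R[u, rated] + d[i, rated]) * weights) / weights.sum()
```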

3.3.2. Recommendation Method Applying User Rating Tendency

The accuracy of rating prediction is improved by combining the recommendation method based on item preference differences with the tendencies users show when deciding ratings.
The manner in which users decide the rating for a given item varies depending on the personal preference of each user. For example, in the case of movie ratings, when there are two users u1 and u2 who judge the rating based on the five different criteria of storyline, characters, story development, entertainment value, and cinematography, user u1 may rate a movie as 10 out of 10 as long as the movie satisfies the entertainment value criterion, regardless of other criteria. In contrast, user u2 may rate the movie as 6 out of 10 if any one of the five criteria are not satisfied. As such, the rating tendencies differ for each user depending on the user’s preference; therefore, a process for converting subjective data into more objective data is required to apply the rating data obtained from various users to predict a different user’s rating for a new item. Accordingly, if the collected rating data can be appropriately normalized based on users’ rating tendencies, a more accurate recommendation can be provided to a new user, u3. User rating normalization is the process of adjusting the data distribution of user ratings such that the entire sample data has the same median, and it is performed as follows.
1. Normalization based on a median of 5.5
Since the median rating score is 5.5 when users can rate items on a scale of 1 to 10, normalization is conducted around the median value of 5.5 in order to adjust each user's rating distribution to a minimum of 1 and a maximum of 10. For example, when the maximum rating score given by user u1 is 8.0 out of 10, a rating of 7.0 by user u1 is normalized as follows. Since 7.0 is greater than the median 5.5, the score is normalized as 5.5 + (7 − 5.5) × (10 − 5.5)/(8 − 5.5), which places it on the common scale of 1 to 10. Normalization considering the minimum rating score is conducted similarly. For example, when the minimum rating score given by a user is 2.0, that user's rating of 3.0 is normalized as 5.5 − (5.5 − 3) × (5.5 − 1)/(5.5 − 2).
2. Normalization of Data Between Median and Minimum in the Rating Value Range
When the range of a user’s rating data is within the range constituted by the minimum value and median value of the common scale only, the maximum value of the user rating is set as the median value of the common rating scale, 5.5. Additionally, the minimum value of the user rating is set as the minimum value of the common rating scale, 1. Subsequently, normalization is conducted based on the median value of this user rating range, which is 3.25.
3. Normalization of Data Between Median and Maximum Rating Value Range
Similar to the case described in 2, when the range of a user’s rating data is localized such that it lies between the maximum and median rating values of the common scale only, the maximum value of the user rating is set as the maximum value of the common rating scale. Additionally, the minimum value of the user rating is set as the median value of the common rating scale, and normalization is performed based on the median value of this user rating range, which is 7.75.
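A Python sketch of these three normalization cases is given below. It follows one plausible reading of the procedure; the handling of degenerate users whose ratings are all identical is not specified above and is an assumption here.

```python
def normalize_rating(r, user_min, user_max, lo=1.0, hi=10.0):
    """Normalize one rating to the common 1-10 scale around the median 5.5."""
    med = (lo + hi) / 2.0                      # common-scale median, 5.5
    if user_max <= med:                        # case 2: all ratings below the median
        if user_max == user_min:
            return (lo + med) / 2.0            # degenerate user: midpoint 3.25
        return lo + (r - user_min) * (med - lo) / (user_max - user_min)
    if user_min >= med:                        # case 3: all ratings above the median
        if user_max == user_min:
            return (med + hi) / 2.0            # degenerate user: midpoint 7.75
        return med + (r - user_min) * (hi - med) / (user_max - user_min)
    # case 1: stretch each side of the median independently
    if r >= med:
        return med + (r - med) * (hi - med) / (user_max - med)
    return med - (med - r) * (med - lo) / (med - user_min)
```

For example, normalize_rating(7.0, user_min=1.0, user_max=8.0) returns 8.2, matching the worked example above.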
Based on the recommendation method that uses the user preference differences among various items, user rating normalization is applied according to the rating decision tendencies of users. The normalized rating data are applied to Equation (1). Through Equation (1), by using the rating data of users who rated both items i and j, the user rating differences for items i and j are aggregated. During this process, the rating data that have been normalized according to the individual user rating tendency for items i and j can serve as a more objective index in predicting the target user’s rating.
After normalizing the selected user rating data, the normalized ratings were transformed into a user × item rating matrix expressing the user, item, and rating relations, as shown in Table 1 [25].

3.4. Step 4, Review-Preprocessing

Before conducting morphological analysis, which enables a more accurate review data analysis, meaningless words, special characters, punctuation marks, English words, numbers, etc., were removed. Morphological analysis was then conducted to extract the necessary parts of speech from the reviews. Among various morphological analyzers, the RHINO library, a Korean morphological analyzer, was used to select and extract only the nouns, verbs, and adjectives, the parts of speech most frequently used in sentiment analysis.
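A minimal sketch of this cleanup step follows; the RHINO tagger's own API is not described here, so its call appears only as a hypothetical placeholder in a comment.

```python
import re

def clean_review(text: str) -> str:
    """Drop special characters, punctuation, English words, and numbers."""
    text = re.sub(r"[^\uAC00-\uD7A3\s]", " ", text)  # keep Hangul syllables only
    return re.sub(r"\s+", " ", text).strip()         # collapse repeated whitespace

# tokens = rhino.tag(clean_review(raw))  # hypothetical: keep nouns/verbs/adjectives
```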

3.5. Step 5, Review-Sentiment Analysis

3.5.1. Step 5-1, Review Data Collecting

In constructing the sentiment dictionary, data from NAVER Lab were used as additional data. A total of 200,000 reviews were obtained, each with an integer rating between 1 and 10. If the rating was from 1 to 3, a label of 0 (negative) was assigned to the review; if the rating was 9 or 10, a label of 1 (positive) was assigned. From these data, 100,000 reviews were extracted with an equal ratio of negative and positive labels. Subsequently, 75,000 movie reviews were used to construct the dictionary, and 25,000 were used as test data to verify the accuracy of the dictionary.
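The labeling and balanced sampling step can be sketched as follows, assuming the reviews are held in a pandas DataFrame with an integer rating column (column names are illustrative).

```python
import pandas as pd

def label_and_balance(reviews: pd.DataFrame, n_per_class: int = 50_000,
                      seed: int = 0) -> pd.DataFrame:
    """Label ratings 1-3 as negative (0) and 9-10 as positive (1), then balance.

    Ratings 4-8 are discarded as ambiguous; each class is sampled to the same size.
    """
    neg = reviews[reviews["rating"] <= 3].assign(label=0)
    pos = reviews[reviews["rating"] >= 9].assign(label=1)
    balanced = pd.concat([neg.sample(n_per_class, random_state=seed),
                          pos.sample(n_per_class, random_state=seed)])
    return balanced.sample(frac=1, random_state=seed)  # shuffle
```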

3.5.2. Step 5-2, Review Data Preprocessing

Morphological analysis was carried out using the same preprocessing procedure described in Step 4. After the morphological analysis, the RHINO library was used to select and extract only the nouns, verbs, and adjectives from the review data.
To construct a sentiment dictionary, words and phrases were extracted from the training data. In this study, the adjectives, nouns, and verbs that directly describe and express sentiments were extracted from the training data to construct word and phrase graphs. Among these, the ones that clearly express emotions were defined as sentiment words and sentiment phrases. Table 2 shows the number of words and phrases extracted to construct the word and phrase graphs, and Tables 3 and 4 display the defined words and phrases, respectively.

3.5.3. Step 5-3, Dictionary Construction

The preprocessed review data were transformed into a document-term matrix. During this process, the term frequency-inverse document frequency (TF-IDF) weighting method, which indicates the importance of a word in a document based on its frequency, was used to vectorize the texts.
Figure 5 displays the sentiment dictionary construction flow chart.
The independent variables are the TF-IDF matrix of review words, and the dependent variables are the 0/1 label values of each review. Regression analyses were used to construct the dictionary. After obtaining the regression coefficient of each word, the sentiment dictionary was constructed by placing words with coefficients greater than 0 into the positive dictionary and words with coefficients less than 0 into the negative dictionary. However, because text data are unstructured and high-dimensional, selecting and extracting variables during regression analysis is important for improving performance. Thus, ridge, lasso, and ElasticNet regressions were used [26].
Ridge regression shrinks the regression coefficients by adding a penalty term to the regression model [27]. Ridge regression is a linear regression with an L2 constraint. The ridge estimates are obtained using Equation (4).
\hat{\beta}^{ridge} = \underset{\beta}{\arg\min} \left\{ \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{P} x_{ij}\beta_j \right)^2 + \lambda \sum_{j=1}^{P} \beta_j^2 \right\}    (4)
In the equation, \lambda (\lambda \ge 0) determines the amount of shrinkage of the regression coefficients. As \lambda increases, the shrinkage also increases, and the regression coefficients tend toward zero.
Lasso regression shrinks the regression coefficients by adding a penalty term to the regression model, similar to ridge regression [28]. This estimation method enables variable selection by driving the regression coefficients of insignificant variables to exactly zero. The lasso estimates are obtained using Equation (5).
\hat{\beta}^{lasso} = \underset{\beta}{\arg\min} \left\{ \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{P} x_{ij}\beta_j \right)^2 + \lambda \sum_{j=1}^{P} |\beta_j| \right\}    (5)
As the value of \lambda (\lambda \ge 0) in Equation (5) increases, the regression coefficients tend toward zero.
The main difference between the two models of the ridge regression and lasso regression is that the ridge model uses the square of the coefficients; however, the lasso model uses the absolute value. Because the coefficients of each independent variable are close to zero, but not actually zero, the ridge model employs all the independent variables, even if the penalty value is large. However, because some variables become zero if the penalty value is large, the lasso model employs only the selected variables that are not zero.
ElasticNet is an algorithm that combines both ridge and lasso regressions. The ElasticNet estimates are obtained by Equation (6).
\hat{\beta}^{ElasticNet} = \underset{\beta}{\arg\min} \left\{ \sum_{i=1}^{N} \left( y_i - \beta_0 - \sum_{j=1}^{P} x_{ij}\beta_j \right)^2 + \lambda \left[ \alpha \sum_{j=1}^{P} |\beta_j| + (1 - \alpha) \sum_{j=1}^{P} \beta_j^2 \right] \right\}    (6)
ElasticNet linearly combines the penalties of the ridge and lasso methods and adjusts \lambda to derive an optimized model. It also adds an extra parameter \alpha to balance the two: the lasso effect increases as \alpha increases, whereas the ridge effect increases as \alpha decreases.
When using the ridge, lasso, and ElasticNet regression methods, a cross-validation method was used to estimate the shrinkage parameter λ . After obtaining the optimal λ value that returns the smallest error through fivefold cross validation, the word that has a regression coefficient value greater than 0 for the given λ value was classified into the positive dictionary, and the word with a value less than 0 was classified into the negative dictionary, thereby constructing a positive and a negative dictionary.
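A minimal sketch of this dictionary-construction step is shown below, using scikit-learn's TfidfVectorizer and ElasticNetCV. Whitespace tokenization stands in for the RHINO morphological analysis, and scikit-learn's alpha and l1_ratio parameters play the roles of the paper's \lambda and \alpha, respectively.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import ElasticNetCV

def build_sentiment_dictionary(docs, labels, alpha_mix=0.3):
    """Split TF-IDF regression coefficients into positive/negative dictionaries.

    docs: preprocessed review strings; labels: 0 (negative) / 1 (positive).
    """
    vec = TfidfVectorizer()
    X = vec.fit_transform(docs)
    # Fivefold cross-validation chooses the shrinkage parameter (the paper's lambda,
    # scikit-learn's alpha); l1_ratio corresponds to the paper's alpha.
    model = ElasticNetCV(l1_ratio=alpha_mix, cv=5)
    model.fit(X, labels)
    words = vec.get_feature_names_out()
    positive = {w: c for w, c in zip(words, model.coef_) if c > 0}
    negative = {w: c for w, c in zip(words, model.coef_) if c < 0}
    return positive, negative
```

Replacing ElasticNetCV with RidgeCV or LassoCV yields the ridge- and lasso-based dictionaries in the same way.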
The words of each constructed dictionary were manually checked and any unnecessary words were removed. For example, if the dictionary contained nouns that were not related to sentiments, such as actor’s names, location names, etc., the corresponding words were removed.

3.5.4. Step 5-4, Dictionary Accuracy Verification

In verifying the accuracy of the dictionary, a test dataset of 25,000 reviews was used. Based on the dictionary, the sentiment score of each review was calculated, and the review was classified as positive if the score was greater than 0 and negative if less than 0. The sentiment scores are obtained through Equation (7).
Sentiment\ Score = \frac{|Pos_{word}| - |Neg_{word}|}{|Pos_{word}| + |Neg_{word}|}    (7)
To examine the accuracy of the sentiment dictionary, sentiment scores are calculated based on the frequencies of positive and negative words. The sentiment score ranges from −1.0 to +1.0. Reviews with scores in the range of +0.1 to +1.0 are identified as positive, while reviews with scores in the range of −1.0 to −0.1 are identified as negative. Subsequently, the sentiment scores obtained through sentiment analysis are applied to the rating data to generate new ratings.
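Equation (7) can be implemented in a few lines, as in the sketch below; treating reviews that contain no dictionary words as neutral (score 0) is an assumption, since that case is not specified above.

```python
def sentiment_score(tokens, positive, negative):
    """Equation (7): (|pos| - |neg|) / (|pos| + |neg|) over a review's tokens."""
    pos = sum(1 for t in tokens if t in positive)
    neg = sum(1 for t in tokens if t in negative)
    if pos + neg == 0:
        return 0.0  # no dictionary words found; assumed neutral
    return (pos - neg) / (pos + neg)
```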
The misclassification ratio was used as the measure for evaluating the positive and negative prediction results, and dictionaries with high performance were selected for the analysis by measuring accuracy based on the confusion matrix in Table 5 [29].
In this matrix, TP (true positive), FP (false positive), TN (true negative), and FN (false negative) represent the result values.
TP indicates that the classifier correctly classified a positive case as positive. Conversely, FP indicates that the classifier incorrectly classified a negative case as positive. Similarly, TN denotes that the classifier correctly classified a negative case as negative, while FN denotes that the classifier incorrectly classified a positive case as negative. From the confusion matrix, the accuracy, precision, and recall can be derived. Equations (8)-(11) express the calculations of accuracy, precision, recall, and F-measure, respectively.
Accuracy = \frac{TP + TN}{TP + FP + TN + FN}    (8)

Precision = \frac{TP}{TP + FP}    (9)

Recall = \frac{TP}{TP + FN}    (10)

F\text{-}measure = \frac{2 \times Precision \times Recall}{Precision + Recall}    (11)
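The four measures follow directly from the confusion-matrix counts, as in this small sketch.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F-measure (Equations (8)-(11))."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f_measure
```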
Figure 6 displays the accuracy calculated from the positive and negative vocabulary frequencies of each dictionary using Equations (7) and (8). Judging from the number of words in each dictionary, negative sentiment was expressed with a more diverse vocabulary than positive sentiment.
The lasso-based dictionary contained 398 positive and 421 negative sentiment words, with 70% accuracy. The ridge-based dictionary contained 3164 positive and 3425 negative sentiment words, with 79% accuracy. When constructing the ElasticNet-based dictionary, \alpha = 0.3 was chosen as it returned the highest accuracy; a total of 2875 positive and 2954 negative words were extracted, with 83% accuracy. Consequently, this study used the ElasticNet-based positive and negative dictionaries, which had the highest accuracy, to calculate the sentiment scores of user reviews.
Furthermore, this study used the SVM (support vector machine), RF (random forest), and NNet (neural network) algorithms, which are popular methods for recognizing and classifying sentiments. The training and test data were labeled according to the sentiment words collected in the sentiment dictionary. The classifier models were trained on the training data, and the trained models were used to classify the sentiments of the test data. The SVM model used the radial basis function (RBF) kernel; the RF model used 500 trees with 10 variables; the NNet model used 10 hidden layers. As with the earlier regression analysis, fivefold cross-validation was used for the classification performance test. For performance measurement, recall, precision, and F-measure were selected to test the accuracy of the models.
Table 6 displays the classification performance of each classifier on the sentiment dictionary.
Comparing the general sentiment dictionary with the domain-specific dictionary constructed for this analysis revealed the following distinctions. Except for the NNet model, recall values were generally higher than precision values when using the general dictionary, whereas precision values were higher than recall values when using the constructed sentiment dictionary. Additionally, the constructed dictionary returned a higher F-measure than the general sentiment dictionary, owing to the smaller difference between recall and precision. These results suggest that the constructed dictionary yields a more stable and accurate sentiment analysis.

3.6. Step 6, New Rating Reflecting Sentiment Digitization

The sentiment score of each review was numerically derived from the positive and negative words featured in the constructed sentiment dictionary, using Equation (7). The sentiment scores obtained through the sentiment analysis are reflected in the rating data to generate new rating data. An example of the generated ratings is shown on the right side of Table 7. Although the previous user ratings were integer values from 1 to 10, the newly generated ratings of the proposed method are real numbers.
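The exact rule for folding the sentiment score into the rating is not stated above; the additive adjustment below is one plausible reading, chosen because it turns the integer ratings into real numbers as described (the strength parameter is an assumption).

```python
def adjusted_rating(rating, score, strength=1.0, lo=1.0, hi=10.0):
    """Shift an integer rating by the review's sentiment score in [-1, +1].

    One plausible reading of the combination step; the paper does not
    specify the exact formula or the adjustment strength.
    """
    new_rating = rating + strength * score
    return min(max(new_rating, lo), hi)  # keep the result on the 1-10 scale
```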

3.7. Step 7, Rating Prediction

In predicting user ratings, user-based and item-based collaborative filtering, which are memory-based methods, and the SVD and SVD++ algorithms, which are popular model-based matrix factorization methods, were used.
User-based collaborative filtering selects neighboring users who have preferences similar to those of the target user, based on the rating information entered by users, and then recommends items that are commonly preferred by those neighbors. The most important step in predicting ratings through user-based collaborative filtering is calculating user similarities. The similarity between two users a and b, Similarity(a, b), is obtained by Equation (12).
Similarity(a, b) = \frac{\sum_{i \in I} (r_{a,i} - \bar{r}_a)(r_{b,i} - \bar{r}_b)}{\sqrt{\sum_{i \in I} (r_{a,i} - \bar{r}_a)^2} \sqrt{\sum_{i \in I} (r_{b,i} - \bar{r}_b)^2}}    (12)
Here, I denotes the entire set of items, r_{a,i} denotes the rating given by user a to item i, and \bar{r}_a indicates the average rating of all items that user a has rated. Once users with similar preferences are selected through the similarity measure, the user rating is predicted based on their purchase history using the weighted sum method.
The predicted rating that user a would provide on the item i is obtained through Equation (13).
\hat{r}_{a,i} = \bar{r}_a + \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_u) \times W_{a,u}}{\sum_{u \in U} |W_{a,u}|}    (13)
Here, \bar{r}_a indicates the average rating of all items given by the recommendation target user, \bar{r}_u denotes the average rating of all items given by the other user u, and W_{a,u} represents the similarity weight between user u and the target user a, where higher similarity yields a larger weight.
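A minimal NumPy sketch of Equations (12) and (13) follows, assuming the same dense rating matrix with NaN for missing entries; item-based prediction with Equations (14) and (15) is analogous.

```python
import numpy as np

def pearson_similarity(R, a, b):
    """Similarity(a, b) of Equation (12); NaN marks missing ratings."""
    both = ~np.isnan(R[a]) & ~np.isnan(R[b])
    if not both.any():
        return 0.0
    da = R[a, both] - np.nanmean(R[a])  # deviations from each user's overall mean
    db = R[b, both] - np.nanmean(R[b])
    denom = np.sqrt((da ** 2).sum() * (db ** 2).sum())
    return float((da * db).sum() / denom) if denom else 0.0

def predict_user_based(R, a, i):
    """Weighted-sum prediction of Equation (13) for user a and item i."""
    num = den = 0.0
    for u in range(R.shape[0]):
        if u != a and not np.isnan(R[u, i]):
            w = pearson_similarity(R, a, u)
            num += (R[u, i] - np.nanmean(R[u])) * w
            den += abs(w)
    mean_a = np.nanmean(R[a])
    return mean_a + num / den if den else mean_a
```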
In item-based collaborative filtering, a specific item is selected as the standard, and a neighboring item with similar user rating patterns is selected. Based on the neighboring item's ratings, the rating that the target user might give to the specific item is predicted. The similarity between two items i and j, Similarity(i, j), is obtained by Equation (14).
Similarity(i, j) = \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U} (r_{u,i} - \bar{r}_i)^2} \sqrt{\sum_{u \in U} (r_{u,j} - \bar{r}_j)^2}}    (14)
Here, U indicates the set of all users who rated both items i and j, r_{u,i} represents the rating of item i given by user u, and \bar{r}_i represents the average rating of item i over all users.
The item-based collaborative filtering predicts rating scores through a simple weighted average method, as shown by Equation (15).
\hat{r}_{u,i} = \frac{\sum_{n \in N} r_{u,n} \times W_{i,n}}{\sum_{n \in N} |W_{i,n}|}    (15)
Here, N denotes the set of neighboring items similar to item i, and W_{i,n} is the similarity weight between the item to be predicted and neighboring item n; the prediction is calculated by reflecting the user's ratings of similar items, weighted by their similarity to the item to be predicted.
Among the model-based matrix factorization methods, SVD and SVD++ are the most widely used in collaborative filtering. SVD is a method of decomposing a matrix into a product of matrices. The singular value decomposition of the m × n matrix M of all users and items can be expressed as the product of three matrices, as shown in Equation (16).
M = U \Sigma V^T    (16)
In the equation, U is an m × m user matrix, \Sigma is an m × n diagonal matrix with the singular values on its diagonal, and V^T is an n × n movie matrix.
However, as the matrix M is sparse, the SVD may not be defined, owing to the many missing values not provided by users. To address this problem, a regularized model, Equation (17), is used to predict ratings by deriving factor vectors that minimize the error function over the ratings given by users.
\min_{p, q} \sum_{(u,i) \in M} (r_{ui} - \hat{r}_{ui})^2 + \lambda \left( \|q_i\|^2 + \|p_u\|^2 \right)    (17)
For the minimization, stochastic gradient descent (SGD) is used: the prediction error r_{ui} - q_i^T p_u is calculated, and the parameters are adjusted with learning rate \gamma through the update rules in Equations (18) and (19), from which \hat{r}_{ui} can be predicted.
q_i \leftarrow q_i + \gamma \left[ (r_{ui} - q_i^T p_u) p_u - \lambda q_i \right]    (18)

p_u \leftarrow p_u + \gamma \left[ (r_{ui} - q_i^T p_u) q_i - \lambda p_u \right]    (19)
In contrast to SVD, which considers explicit feedback information only, the SVD++ method considers both implicit and explicit feedback information.
Building on the SVD method, SVD++ reflects the characteristics of all items a user has interacted with, regardless of whether the user has given them rating scores. The rating prediction using the SVD++ method is obtained using Equation (20).
\hat{r}_{ui} = \mu + b_u + b_i + q_i^T \left( p_u + |R(u)|^{-\frac{1}{2}} \sum_{j \in R(u)} y_j \right)    (20)
The rating prediction \hat{r}_{ui} is derived from the sum of \mu, the average rating over all the data, and b_u and b_i, the individual bias values of the user and the item, respectively. To capture the additional association between user and item, the explicit rating matrix and the implicit feedback matrix are decomposed based on SVD, and by searching for a low-dimensional hidden space that jointly expresses users and items, d-dimensional latent vectors q_i for the item and p_u for the user are obtained. R(u) denotes the set of items for which user u has expressed a preference, and y_j is a latent factor vector that characterizes user u through the items j the user has interacted with.
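As a sketch of how the SVD and SVD++ predictions can be reproduced, the following uses the scikit-surprise library, which implements both algorithms with the SGD updates above; the paper does not name the software it used, and the input file and column names are hypothetical.

```python
import pandas as pd
from surprise import SVD, SVDpp, Dataset, Reader
from surprise.model_selection import cross_validate

# Hypothetical input: one row per (user, movie, sentiment-adjusted rating).
ratings = pd.read_csv("reviews_with_sentiment.csv")  # columns: user, title, new_rating

reader = Reader(rating_scale=(1, 10))                # ratings lie on the 1-10 scale
data = Dataset.load_from_df(ratings[["user", "title", "new_rating"]], reader)

# Fivefold cross-validation of both matrix factorization models, reporting
# the MAE and RMSE measures used in Section 4.
for algo in (SVD(), SVDpp()):
    cross_validate(algo, data, measures=["MAE", "RMSE"], cv=5, verbose=True)
```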

4. Performance Evaluation Method and Experiment Results

4.1. Performance Evaluation Method

To examine the difference between the recommendation method reflecting only the rating data and the method integrating the rating data with sentiment scores, the mean absolute error (MAE) and root-mean-square error (RMSE) were used as evaluation measures. These two measures, which capture the difference between the predicted and actual user ratings, are the most frequently used measures for recommendation systems based on collaborative filtering [30,31].
MAE is defined as shown in Equation (21).
MAE = \frac{\sum_{i,j} |R_{ij} - \hat{R}_{ij}|}{N}    (21)
RMSE is defined as shown in Equation (22).
RMSE = \sqrt{\frac{\sum_{i,j} (R_{ij} - \hat{R}_{ij})^2}{N}}    (22)
Here, N indicates the number of data points, R_{ij} denotes the actual rating of item j given by user i, and \hat{R}_{ij} denotes the predicted rating. MAE is calculated by summing the absolute errors between the actual and predicted values and dividing by the number of predictions. RMSE is calculated by summing the squared differences between the actual and predicted values, dividing by the number of predictions, and taking the square root. In both measures, smaller error values indicate better prediction accuracy of the recommendation system.
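Both measures are straightforward to compute, as in this sketch.

```python
import numpy as np

def mae(actual, predicted):
    """Mean absolute error, Equation (21)."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.mean(np.abs(actual - predicted))

def rmse(actual, predicted):
    """Root-mean-square error, Equation (22)."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    return np.sqrt(np.mean((actual - predicted) ** 2))
```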

4.2. Experiment Results

Using the described MAE and RMSE, the performances of the existing method using only ratings for prediction and the prediction method proposed in this paper were compared. For the data, 80% was used as training data, and 20% as test data. Further, cross-validation was conducted to evaluate the rating prediction performances. The experimental results of fivefold cross-validation, using the user-based collaborative filtering algorithm, are shown in Table 8.
The “Original Rating” represents the basic performance of the system, reflecting only the rating, whereas the “Proposed Rating” represents the performance of the proposed method, which reflects the sentiment scores in predicting the user rating.
User-based collaborative filtering returned an MAE of 2.3056 and an RMSE of 3.0803 for the Original Rating, and an MAE of 2.2442 and an RMSE of 3.0342 for the Proposed Rating. The proposed method improved the MAE by 0.0614 and the RMSE by 0.0461.
The results of the item-based collaborative filtering are shown in Table 9, where the MAE improved by 0.0833 and the RMSE by 0.083, compared to the existing method of reflecting only the rating data.
Table 10 displays the MAE and RMSE results of the SVD algorithm. In this study, to confirm that the optimized prediction by combining sentiment scores with rating data yields a higher accuracy than the method reflecting only the rating data, the prediction accuracies were measured under the same conditions. The result indicates that the MAE value improved by 0.0991 and the RMSE improved by 0.1208.
Table 11 displays the MAE and RMSE results of using the SVD++ algorithm. The proposed method of combining sentiment scores showed a MAE improvement of 0.2036 and RMSE improvement of 0.1916.
Table 12 shows the performance results of the proposed method using the test data.
For the MAE measures of the test data evaluation, the user-based and item-based collaborative filtering methods obtained MAE improvements of 0.059 and 0.0862, respectively, while the SVD and SVD++ algorithms showed improvements of 0.1012 and 0.188, respectively. For the RMSE measures, the user-based and item-based collaborative filtering methods showed improvements of 0.0431 and 0.0882, respectively, and the SVD and SVD++ algorithms showed improvements of 0.1103 and 0.1756, respectively. The analysis results suggested that the proposed method of reflecting the sentiment scores in the rating prediction yields significantly better overall prediction performance than the existing method of reflecting only the rating data.
Among all the algorithms, SVD++ showed the largest performance improvement over the existing method of reflecting only the rating data. The proposed method of reflecting sentiment scores achieved better rating prediction accuracy with every algorithm. In particular, as shown in Table 8, the model-based collaborative filtering methods performed better than the memory-based methods. This may be due to the use of a sparse matrix, an environment in which model-based algorithms return more accurate user ratings. There are two main drawbacks to using sparse-matrix data. The first is the cold start problem, which occurs when ratings cannot be predicted for users who have not entered a single rating, owing to the lack of data for measuring similarity. The second is the first rater problem, which occurs when an item has never been purchased, so no recommendation can be made until some user rates it.
In this study, the cold start problem was avoided by limiting the data to users who had rated and written reviews for at least 10 movies. The first rater problem was eliminated by collecting the movie title, rating, and review data on a per-user basis, so every movie had at least one rating. Nonetheless, the data remained sparse because the number of movies each user had rated was small relative to the total number of movie titles. Hence, the probability of locating users whose item preferences were similar to the target user's was low, resulting in relatively low performance of the memory-based recommendation system compared with the model-based system. The model-based collaborative filtering method, however, goes beyond simply comparing similarities between users or items; it uses the patterns and attributes implied in the data, so a user's rating of a specific item can be predicted even without rating information. It is assumed that the SVD method achieved better rating prediction performance by reducing the dimensions of the matrix, directly removing insignificant users or items and thereby reducing the data sparsity and noise.

4.3. Evaluation Method Using Feature Selection Approach

In the analysis so far, feature selection was not applied to the TF-IDF features generated at the preprocessing stage; in this section, feature selection is examined as a way to further improve prediction performance. Feature selection is often used in data mining to increase prediction performance and efficiency by reducing the data dimension and the required time and cost. It is advantageous for reducing model complexity with minimal information loss while maintaining the required accuracy. The feature selection performed during model construction can affect model accuracy: if features are incorrectly selected, the prediction accuracy of the model may drop drastically. This also suggests that removing unnecessary features is beneficial, as they can hinder both the effectiveness and the efficiency of sentiment classification.
First, ElasticNet, SVM, and Naïve Bayes models were constructed without conducting feature selection on the TF-IDF features generated at the preprocessing stage. Then, feature selection models using weighting techniques were constructed by selecting relevant sub-features based on feature weights. This technique allows different weights to be used to select the sub-features, and several weight thresholds were tested to find the optimal one. The feature weighting was conducted using various methods, namely SVM, information gain, information gain ratio, principal component analysis (PCA), and chi-squared statistical weighting. In addition, by gradually changing the weight threshold from 0 to 0.9 at intervals of 0.1, the results of each method were compared across weight configurations.
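The following sketch illustrates the SVM-weighting variant under the assumption (not stated above) that the weights are the absolute coefficients of a linear SVM, min-max normalized to [0, 1].

```python
import numpy as np
from sklearn.svm import LinearSVC

def svm_weight_selection(X, y, threshold=0.2):
    """Keep features whose normalized |SVM coefficient| meets the threshold."""
    w = np.abs(LinearSVC(dual=False).fit(X, y).coef_.ravel())
    w = (w - w.min()) / (w.max() - w.min() or 1.0)  # min-max normalize to [0, 1]
    keep = np.flatnonzero(w >= threshold)
    return X[:, keep], keep
```

A downstream classifier (e.g., ElasticNet) is then trained on the reduced feature matrix; with threshold=0.2 this corresponds to the best-performing configuration reported below.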

4.3.1. Results of Simple Modeling Techniques

Table 13 shows the performance of the models generated with the ElasticNet, SVM, and Naïve Bayes algorithms without feature selection on the dataset. The SVM model performed better than the other algorithms in terms of accuracy, AUC (area under the curve), and precision, whereas the Naïve Bayes model performed better in terms of recall. However, compared to the other algorithms, the Naïve Bayes model showed significantly poorer performance in AUC and precision.

4.3.2. Sentiment Classification Results of the Feature Selection Models Using Weighting Techniques

1. Distribution of Terms Based on Feature Weighting
Table 14 shows the top term lists based on the normalized weight assigned to each word or phrase term by each feature weighting technique. As shown in the table, although terms like 'waste' and 'worst' appear in all lists, the top terms differ depending on which weighting technique is applied. Examining the rate of change in the weights, the PCA method tends to show a rapid drop in the normalized feature weights, while the SVM and gain ratio methods show smaller drops.
Depending on the weighting method applied, the experimental results showed significant differences in the distribution of the weighted terms. In the case of the Gain Ratio weighting, the distribution was similar to the normal distribution, whereas the PCA weighting had most terms located between the 0 and 0.1 weights, and other weighting methods also exhibited higher term appearances in the lower weights. These distribution results suggest that the sentiment classification results will vary depending on the weighting method used.
2. Sentiment Classification Performance of Feature Weighting Models
The following tables display the sentiment classification accuracy, AUC, precision, recall, and F-measure obtained with the ElasticNet, SVM, and Naïve Bayes feature selection models using the various weighting methods. Each row denotes a different weight threshold; the results were derived using only the terms whose normalized weights were greater than or equal to that threshold.
In the ElasticNet feature selection model with SVM weighting, high performance was observed in most performance indices when the weight threshold was greater than or equal to 0.2 (>=0.2). Conversely, in the SVM feature selection model, high performance was observed in most performance indices when the weight threshold was greater than or equal to 0.1 (>=0.1). Lastly, in the Naïve Bayes feature selection model, the weight threshold value for the highest performance for each performance index was not consistent. When comparing the three feature selection algorithms, the best overall performance was observed when the SVM weight threshold was set to 0.2 or greater, in the ElasticNet algorithm. Table 15, Table 16 and Table 17 show the classification performance results for each weight of the feature selection method using SVM weighting.
When information gain weighting was applied, overall high performance was observed at a weight threshold of 0, that is, when the model was built using all the terms. This result suggests that information gain weighting contributes little to feature selection for document sentiment classification. Table 18, Table 19 and Table 20 display the classification performance of the feature selection models with information gain weighting.
When Gain Ratio weighting was applied, the ElasticNet feature selection algorithm exhibited the best overall performance at a weight threshold of 0.3 or higher. For the other algorithms, in contrast, no specific threshold consistently improved overall performance. Table 21, Table 22 and Table 23 display the classification performance of the feature selection models with Gain Ratio weighting.
Under PCA weighting, the normalized weights of most terms fall below 0.1, so adjusting the weight threshold was expected to contribute little to feature selection. Indeed, for all three algorithms, the model without feature selection performed best. Table 24, Table 25 and Table 26 display the classification performance of the feature selection models with PCA weighting.
When chi-squared statistical weighting was applied, as with the information gain and PCA weighting methods, adjusting the weight threshold did not yield effective feature selection performance. Table 27, Table 28 and Table 29 display the classification performance of the feature selection models with chi-squared statistical weighting.
As these results show, the SVM weighting method exhibited the highest overall performance among the weighting methods, and it produced the most stable performance improvement when applied to the ElasticNet algorithm with a threshold of 0.2 or greater.
3. Sentiment Classification Performance Results of Various Feature Selection Models
Among the feature selection models using the feature weighting technique in the preceding experiment, the ElasticNet feature selection model with SVM weighting achieved the highest performance. Table 30, Table 31 and Table 32 compare the sentiment classification performance of the simple method without feature selection, feature selection with SVM weighting (weight threshold of 0.2 or greater), the forward selection method, and the backward elimination method. As the results show, for the ElasticNet algorithm, feature selection with SVM weighting was the most effective, attaining the highest value on all five indices. For classification with the SVM algorithm, the best-performing feature selection method differed by index, but feature selection with SVM weighting and backward elimination produced the best overall performance. Similarly, the Naïve Bayes algorithm with SVM-weighted feature selection obtained the highest value in three of the five indices. A sketch of the two wrapper baselines follows.
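For reference, the forward selection and backward elimination baselines could be reproduced with scikit-learn's SequentialFeatureSelector, as sketched below; the inner estimator, the number of retained features, and the cross-validation setting are assumptions, and wrapper search of this kind is expensive on a large TF-IDF vocabulary.

# Hypothetical sketch of the two wrapper baselines compared in Tables 30-32.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.naive_bayes import MultinomialNB

# Greedy forward selection: start empty, add the feature that improves CV score most.
forward = SequentialFeatureSelector(
    MultinomialNB(), n_features_to_select=100, direction="forward", cv=3)

# Backward elimination: start with all features, drop the least useful one each step.
backward = SequentialFeatureSelector(
    MultinomialNB(), n_features_to_select=100, direction="backward", cv=3)

# X_fwd = forward.fit_transform(X, labels)   # reduced document-term matrix
# X_bwd = backward.fit_transform(X, labels)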

5. Conclusions

To improve the accuracy of the existing collaborative filtering method, which generates recommendations using only quantitative data, this study proposed a new recommendation algorithm that improves collaborative filtering performance by also reflecting qualitative data, i.e., user reviews. A domain-specific sentiment dictionary was constructed, and based on this dictionary, the sentiment scores of the reviews were quantified and integrated with the rating data to generate new ratings that reflect the sentiment scores. Rating predictions were then conducted with both the proposed method, which uses the newly generated sentiment-reflecting ratings, and the existing method, which uses the original ratings only. As a result, the user-based and item-based collaborative filtering methods obtained MAE improvements of 0.0596 and 0.0862, respectively, while the SVD and SVD++ methods showed improvements of 0.1012 and 0.188, respectively. For RMSE, the user-based and item-based collaborative filtering methods showed improvements of 0.0441 and 0.0882, respectively, and the SVD and SVD++ methods showed improvements of 0.1103 and 0.1696, respectively (Table 12). Based on these results, the proposed method was verified to improve rating prediction accuracy regardless of algorithm type, for the user-based and item-based collaborative filtering methods as well as for the SVD and SVD++ methods.
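To make the evaluation protocol concrete, the following sketch assumes the scikit-surprise library and a hypothetical DataFrame `df` holding user, item, the original rating, and the sentiment-adjusted rating; the 10-point rating scale and the 80/20 split are also assumptions.

# Illustrative MAE/RMSE comparison of original vs. sentiment-adjusted ratings
# with the four algorithms reported above (UBCF, IBCF, SVD, SVD++).
from surprise import Dataset, Reader, KNNBasic, SVD, SVDpp, accuracy
from surprise.model_selection import train_test_split

def score(df, rating_col, algo):
    """Train on 80% of the chosen rating column and return (MAE, RMSE)."""
    reader = Reader(rating_scale=(1, 10))              # 10-point scale assumed
    data = Dataset.load_from_df(df[["user", "item", rating_col]], reader)
    trainset, testset = train_test_split(data, test_size=0.2)
    predictions = algo.fit(trainset).test(testset)
    return accuracy.mae(predictions), accuracy.rmse(predictions)

algos = {
    "UBCF": KNNBasic(sim_options={"user_based": True}),
    "IBCF": KNNBasic(sim_options={"user_based": False}),
    "SVD": SVD(),
    "SVD++": SVDpp(),
}
# for name, algo in algos.items():
#     score(df, "original_rating", algo)    # baseline ratings
#     score(df, "sentiment_rating", algo)   # ratings fused with review sentiment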
In addition, when the sentiment analysis performance of the machine learning classifiers RF, SVM, and NNet was compared between the general sentiment dictionary and the constructed domain-specific sentiment dictionary, the models using the general dictionary showed a large gap between precision and recall, with recall generally far higher than precision, and obtained lower F-measure values than the models using the constructed dictionary. These results suggest that the sentiment dictionary constructed in this study yields more stable and accurate sentiment analysis. Furthermore, in sentiment classification with feature selection, the SVM weighting method showed the best overall performance. When the simple modeling techniques were compared with the feature selection techniques using the SVM, information gain, gain ratio, PCA, and chi-squared statistical weighting methods, the models incorporating feature weighting generally yielded better results than the simple models. Overall, among the feature weighting configurations, the ElasticNet algorithm with SVM weighting at a threshold of 0.2 produced the most stable and effective performance improvement.
The recommendation system algorithm proposed in this study is expected to reflect user preferences in recommendation systems more accurately. The method used herein quantifies user review data, resolving the limitation of previous studies that determined user preferences from rating data alone.
In the future, a variety of sentiment-based recommendation systems should be developed using the proposed algorithm. Moreover, dictionaries that include adverbs should be constructed to further improve the performance of the recommendation system algorithm. In this study, only nouns, verbs, and adjectives were used in constructing the dictionary; adverbs, which are useful for expressing the degree and nuance of sentiment, were not reflected. Including adverbs in the sentiment dictionaries would allow the degree of sentiment to be expressed in more detail, which is expected to further refine the sentiment scores and improve the accuracy of the recommendation system.

Author Contributions

Conceptualization, T.-Y.K.; methodology, T.-Y.K.; software, T.-Y.K.; validation, T.-Y.K. and S.-H.K.; formal analysis, T.-Y.K. and S.B.P.; investigation, S.-H.K.; resources, S.-H.K.; data curation, T.-Y.K. and S.-H.K.; writing—original draft preparation, T.-Y.K.; writing—review and editing, T.-Y.K. and S.B.P.; visualization, T.-Y.K.; supervision, S.B.P. and S.-H.K.; project administration, S.B.P. and S.-H.K.; funding acquisition, T.-Y.K. and S.B.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1F1A1041186) and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2017R1A6A1A03015496).

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. User-based collaborative filtering.
Figure 2. Item-based collaborative filtering.
Figure 3. The proposed algorithm configuration.
Figure 4. Data crawling process.
Figure 5. The sentiment dictionary construction flow chart.
Figure 6. Dictionary accuracy and Pos/Neg word frequency.
Table 1. Example of user-item rating matrix.
         Item 1   Item 2   Item 3   Item M
User 1   2        ?        4        1
User 2   ?        8        6        ?
User 3   2        ?        ?        7
User N   1        5        4        2
Table 2. The number of extracted words and phrases.
         Words                                             Phrases
         Noun (NNG)   Verb (VV)   Adjective (VA)   Total
         1017         296         81               1394    2430
Table 3. The examples of pre-defined sentiment words.
Sentiment (Count)   Sentiment Words
Positive (14)       good/VA, fun/VA, fine/VA, sad/VA, like/VV, funny/VA, pretty/VA, stand out/VV, sad/VA, amaze/VV, appeal/NNG, awesome/VA, heartbreaking/VA, strong recommendation/NNG
Negative (14)       awkward/VA, fall for/VV, tedious/VA, worst/NNG, annoy/VV, irritating/NNG, rowdy/VA, obvious/VA, disturb/VV, be criticized/VV, bad/VA, embarrassing/VV, boring/VA, unpleasant/VA
Table 4. The examples of pre-defined sentiment phrases.
Sentiment (Count)   Sentiment Phrases
Positive (30)       (expectation/NNG, high/VA), (scene/NNG, sad/VA), (movie/NNG, heartbreaking/VA), (impress/NNG, be/VV), (storyline/NNG, decent/VA), (tears/NNG, fall/VV), (thriller/NNG, is/VV), (portrayal/NNG, stand out/VV), (goosebumps/NNG, rise/VV), (love/NNG, overflow/VV), (perfect score/NNG, give/VV), (stress/NNG, relieve/VV), (inspiration/NNG, deep/VA)
Negative (30)       (expectation/NNG, different from/VA), (movie/NNG, exaggerate/VV), (money/NNG, waste/VA), (constraint/NNG, overdo/VV), (impression/NNG, weak/VA), (time/NNG, waste/VA), …, (hands and feet/NNG, cringe/VV), (feeling/NNG, crap/VA), (suspense/NNG, lack/VA), (unreality/NNG, excessive/VA), (immersion/NNG, difficult/VA), (imagination/NNG, no/VA), (limit/NNG, feel/VV)
Table 5. 2 × 2 confusion matrix.
                     Actual Positive        Actual Negative
Predicted Positive   TP (True Positive)     FP (False Positive)
Predicted Negative   FN (False Negative)    TN (True Negative)
Table 6. The results of sentiment analysis for movie reviews using the proposed sentiment dictionary.
Classifier   Evaluation Measure   Classification Performance (%)
                                  General Dictionary   Proposed Dictionary
RF           Recall               86.76                71.95
             Precision            57.28                85.51
             F-measure            69.01                78.15
SVM          Recall               86.76                62.20
             Precision            55.66                93.10
             F-measure            67.82                74.57
NNet         Recall               66.18                85.37
             Precision            54.88                78.65
             F-measure            60.00                81.87
Table 7. Original ratings and proposed new ratings.
         Original ratings                     Proposed new ratings
         Item 1   Item 2   Item 3   Item M    Item 1   Item 2   Item 3   Item M
User 1   2        ?        5        9         2.24     ?        4.48     9.45
User 2   ?        7        6        ?         ?        7.75     6.21     ?
User 3   5        ?        ?        3         4.48     ?        ?        2.98
User N   ?        5        ?        4         ?        5.12     ?        4.34
Table 8. User-based collaborative filtering mean absolute error (MAE)/root-mean-square error (RMSE).
         Original Rating        Proposed New Rating
Test #   MAE       RMSE         MAE       RMSE
1        2.3051    3.0656       2.2817    3.1120
2        2.2964    3.1229       2.2273    3.0731
3        2.3310    3.0303       2.2178    3.0231
4        2.2743    3.1152       2.2923    2.9964
5        2.3185    3.0675       2.2021    2.9667
Avg.     2.3056    3.0803       2.2442    3.0342
Table 9. Item-based collaborative filtering MAE/RMSE.
         Original Rating        Proposed New Rating
Test #   MAE       RMSE         MAE       RMSE
1        2.1514    2.8701       2.045     2.8025
2        2.1333    2.8812       2.0211    2.8062
3        2.0954    2.9122       2.012     2.781
4        2.103     2.858        2.0451    2.8175
5        2.1173    2.9009       2.0608    2.8003
Avg.     2.1201    2.8845       2.0368    2.8015
Table 10. Singular value decomposition (SVD) MAE/RMSE.
         Original Rating        Proposed New Rating
Test #   MAE       RMSE         MAE       RMSE
1        1.9898    2.6216       1.8727    2.4924
2        1.9778    2.6147       1.8984    2.5038
3        1.9356    2.6021       1.8755    2.4563
4        1.9736    2.6174       1.8683    2.5134
5        1.9801    2.6122       1.8536    2.4976
Avg.     1.9714    2.6136       1.8723    2.4928
Table 11. SVD++ MAE/RMSE.
         Original Rating        Proposed New Rating
Test #   MAE       RMSE         MAE       RMSE
1        1.9514    2.6302       1.7802    2.4964
2        1.8772    2.6473       1.7432    2.5213
3        2.0354    2.6524       1.7556    2.4628
4        1.9559    2.6319       1.8250    2.4499
5        1.9989    2.6482       1.7168    2.3213
Avg.     1.9638    2.6419       1.7602    2.4503
Table 12. Experiment results of proposed model with test data.
                          MAE                     RMSE
                          Original   Proposed     Original   Proposed
Memory-based CF   UBCF    2.3013     2.2417       3.0867     3.0426
                  IBCF    2.0984     2.0122       2.8946     2.8064
Model-based CF    SVD     1.9754     1.8742       2.6146     2.5043
                  SVD++   1.9625     1.7745       2.6228     2.4532
Table 13. Basic model classification performance.
            ElasticNet   SVM   Naïve Bayes
Accuracy    79           80    75
AUC         85           88    63
Precision   66           55    90
Recall      76           89    63
F-measure   71           68    74
Table 14. Weight by variable.
SVM              Wgt    Info Gain     Wgt    Gain Ratio       Wgt    PCA           Wgt    Chi           Wgt
Waste            1.00   Worst         1.00   Worst            1.00   Really        1.00   Waste         1.00
Worst            0.99   Waste         0.97   Worst Movie      0.84   Too           0.76   Worst         0.91
Fun              0.98   Rating        0.76   No fun           0.82   Fun           0.13   Rating        0.84
Rating           0.89   Love          0.50   Garbage Movie    0.77   Impression    0.09   Impression    0.45
Director         0.78   Impression    0.44   Nuclear no run   0.75   Waste         0.07   Probability   0.44
Boring           0.77   Probability   0.43   Waste            0.75   Movie Rally   0.06   Love          0.42
Disappointment   0.74   Garbage       0.37   Movie Worst      0.74   Rating        0.06   Just          0.42
No fun           0.66   Worst Movie   0.36   Childish         0.74   Love          0.06   Director      0.39
No               0.64   Just          0.35   Actor Waste      0.74   Time          0.06   Garbage       0.38
Just             0.60   Best          0.32   Comment          0.73   Worst         0.04   Story         0.33
Table 15. ElasticNet classification performance (support vector machine (SVM) weight).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    79   83    84    80    77    73    70    69    67    66
AUC         85   89    90    83    75    72    65    63    60    58
Precision   66   71    69    61    50    43    32    29    21    15
Recall      76   82    88    82    83    79    80    77    79    83
F-measure   71   76    77    70    63    56    45    42    34    26
Table 16. SVM classification performance (SVM weight).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    80   80    80    78    74    73    69    69    68    65
AUC         88   89    87    83    75    71    64    64    61    57
Precision   55   58    53    48    39    35    26    26    22    15
Recall      89   88    91    92    90    88    88    85    85    81
F-measure   68   70    67    63    54    50    40    40    35    25
Table 17. Naïve Bayes classification performance (SVM weight).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    75   81    80    77    77    73    69    69    67    66
AUC         63   82    85    81    77    71    64    63    61    58
Precision   90   74    57    49    46    38    28    30    21    16
Recall      63   77    87    86    89    85    81    77    83    84
F-measure   74   75    69    62    61    52    42    43    34    27
Table 18. ElasticNet classification performance (Information Gain).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    79   79    75    70    68    67    67    67    65    65
AUC         85   85    78    68    64    59    59    59    55    56
Precision   66   56    44    28    22    19    18    19    11    12
Recall      76   85    84    86    88    87    90    89    95    91
F-measure   71   67    58    42    35    31    30    31    20    21
Table 19. SVM classification performance (Information Gain).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    80   76    72    70    68    67    67    67    65    65
AUC         88   82    76    59    64    58    59    59    56    56
Precision   55   44    33    25    20    18    19    18    12    12
Recall      89   90    89    92    91    87    91    88    96    92
F-measure   68   59    48    39    33    30    31    30    21    22
Table 20. Naïve Bayes classification performance (Information Gain).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    75   77    72    69    69    68    67    67    65    65
AUC         63   82    75    67    65    59    58    59    55    56
Precision   90   51    40    25    23    20    18    19    10    12
Recall      63   83    79    86    92    89    88    87    96    93
F-measure   74   63    53    39    37    33    30    31    19    21
Table 21. ElasticNet classification performance (Gain Ratio).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    79   80    81    81    79    78    76    68    64    63
AUC         85   86    87    88    81    75    70    60    54    53
Precision   66   67    67    65    61    55    41    20    8     6
Recall      76   78    81    82    81    83    94    93    98    99
F-measure   71   72    74    72    70    66    57    33    15    12
Table 22. SVM classification performance (Gain Ratio).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    80   79    78    80    78    77    75    69    64    62
AUC         88   88    88    88    81    76    69    60    54    52
Precision   55   56    51    53    49    46    37    21    7     4
Recall      89   87    90    92    89    90    95    96    99    98
F-measure   68   68    65    67    63    60    53    34    13    8
Table 23. Naïve Bayes classification performance (Gain Ratio).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    75   75    75    74    78    78    74    68    63    63
AUC         63   64    64    60    79    74    69    60    50    50
Precision   90   89    90    94    56    53    37    20    7     5
Recall      63   63    63    61    82    84    95    93    100   100
F-measure   74   74    74    74    66    65    53    33    13    9
Table 24. ElasticNet classification performance (principal component analysis (PCA)).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    79   62    61    61    61    61    61    61    61    61
AUC         85   54    52    53    52    53    52    51    51    52
Precision   66   7     2     3     3     2     3     4     3     3
Recall      76   60    59    59    67    63    60    62    60    54
F-measure   71   12    4     5     6     5     5     7     5     6
Table 25. SVM classification performance (PCA).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    80   61    61    61    61    61    61    61    61    61
AUC         88   53    51    50    51    53    51    51    52    52
Precision   55   5     0     1     1     0     2     1     1     1
Recall      89   50    NaN   70    59    NaN   50    59    63    68
F-measure   68   9     NaN   2     2     NaN   5     2     3     3
Table 26. Naïve Bayes classification performance (PCA).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    75   62    61    61    61    61    60    61    61    61
AUC         63   54    53    52    53    53    51    53    51    52
Precision   90   12    8     9     9     8     8     9     7     7
Recall      63   55    50    51    50    53    47    51    50    50
F-measure   74   20    14    15    15    14    13    16    12    13
Table 27. ElasticNet classification performance (Chi Squared Statistics).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    79   78    75    70    69    67    67    67    67    65
AUC         85   84    77    69    65    59    59    59    59    56
Precision   66   57    45    34    26    19    18    19    19    13
Recall      76   83    82    77    83    87    90    89    90    92
F-measure   71   67    58    47    39    31    30    31    31    22
Table 28. SVM classification performance (Chi Squared Statistics).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    80   76    72    70    68    67    67    67    67    64
AUC         88   82    74    69    65    58    59    59    58    55
Precision   55   45    33    28    22    18    19    18    17    10
Recall      89   90    87    85    88    87    91    88    88    91
F-measure   68   60    48    42    36    30    31    30    29    18
Table 29. Naïve Bayes classification performance (Chi Squared Statistics).
Weight      0    0.1   0.2   0.3   0.4   0.5   0.6   0.7   0.8   0.9
Accuracy    75   77    72    70    69    68    67    67    67    65
AUC         63   81    73    68    66    59    58    59    58    56
Precision   90   49    37    29    26    20    18    19    17    12
Recall      63   86    82    82    88    89    88    87    89    93
F-measure   74   62    51    43    40    33    30    31    28    22
Table 30. ElasticNet classification performance (based on model fitness).
            Not Selected   SVM Weighting (≥0.2)   Forward Selection   Backward Elimination
Accuracy    79             84                     67                  79
AUC         85             90                     59                  86
Precision   66             69                     20                  68
Recall      76             88                     84                  76
F-measure   71             77                     33                  72
Table 31. SVM classification performance (based on model fitness).
            Not Selected   SVM Weighting (≥0.2)   Forward Selection   Backward Elimination
Accuracy    80             80                     66                  80
AUC         88             87                     59                  88
Precision   55             53                     17                  59
Recall      89             91                     83                  86
F-measure   68             67                     28                  70
Table 32. Naïve Bayes classification performance (based on model fitness).
            Not Selected   SVM Weighting (≥0.2)   Forward Selection   Backward Elimination
Accuracy    75             80                     67                  74
AUC         63             85                     59                  63
Precision   90             57                     20                  89
Recall      63             87                     85                  62
F-measure   74             69                     33                  73
