Article
Peer-Review Record

Analyzing Social Media Data Using Sentiment Mining and Bigram Analysis for the Recommendation of YouTube Videos

by Ken McGarry
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 30 May 2023 / Revised: 30 June 2023 / Accepted: 12 July 2023 / Published: 16 July 2023
(This article belongs to the Special Issue Recommendation Algorithms and Web Mining)

Round 1

Reviewer 1 Report

This paper combines sentiment analysis with graph theory to analyze user posts, and likes/dislikes on a variety of social media to provide recommendations for YouTube videos. The work seems novel. However, different parts of the paper need revisions. The following are my comments for the improvement of different parts of this paper.

1. The author states – “The data downloaded and preprocessed from Twitter consists of…..” Rewrite this section to clearly highlight how this data was extracted. Was the Standard Search API or the Advanced Search API used? How long did the data collection take? How did the author design their data collection to comply with the rate limits of accessing the Twitter API?

2. What is the relevance behind using the sentimentR package for sentiment analysis? Why were other approaches such as VADER not used?

3. In the context of working with Tweets, how was bot-generated content detected and eliminated? For instance, if a bot account on Twitter tweets something highly negative multiple times - that could influence the results of sentiment analysis. How were such scenarios addressed?

4. A review of recent works related to Sentiment Analysis is presented in Section 2.1. However, several papers reviewed and cited in this section are not recent ones. For instance, [17] was published 8 years ago, [29] was published 13 years ago, [31] was published 20 years ago, and so on. Consider adding the review of more recent works in this field such as https://doi.org/10.3390/bdcc7020116 and https://doi.org/10.1016/j.technovation.2022.102666.

5. A comparison with prior works is missing. Please include a comparative study (qualitative and quantitative) with prior works in this field to highlight the novelty of this work.

6. Please double-check the references for minor typos. For instance, in [60] the URL provided is incorrect. 

Author Response

Reviewer 1 Comments and Suggestions for Authors

1. I have rewritten parts of this section to inform readers that the Twitter data was downloaded over a period of three weeks. I started working on this climate change issue 4-5 years ago but the work stalled; I used the standard API at the time. I recently experienced issues with the Twitter API and could no longer get access, so I supplemented the Twitter data with a Kaggle climate change dataset. Sorry, I should have been clearer on this point.

2. I used the sentimentR package because I have used R and RStudio for sentiment analysis for some time, and it appears to give equivalent results; I therefore did not consider other approaches such as VADER. All programming and data analysis was conducted in R, which has a very comprehensive set of libraries for many problems and applications. A minimal sketch of how such a comparison could look is given after point 7 below.

3. I did not look into bot-generated content directly as a data analysis task. Twitter now has stricter policies since Elon Musk took over; however, both my Twitter sources predate this change. Twitter is using AI to detect and remove bots at a very high rate. I agree with Reviewer 1 that this is a major issue, as it almost broke the $44 billion deal. For Reddit, we have the moderators. YouTube appears to use AI to detect bots; generally these are viewbots that try to inflate the view counts of videos and "watch" them. I have mentioned the problem of bots to highlight the issues that may bias the data analysis. Despite the bad intentions behind bumping up view counts for financial or other reasons, bots are not illegal, but I agree that bias is present.

4. I have added many more recent references. I have reworked the introduction to make the aims and objectives clearer, and I have added text to the related work section to highlight the issues and our claims to novelty.

5. I have added a qualitative comparison (a table) with prior works, based on the closest methods in the literature. A quantitative comparison with prior works is probably unnecessary and would need more time; however, I have compared three baseline methods with our system, as the ROC curves in the results show.

6. I have gone through all of the links and fixed them. I deleted one DOI link, since the fault lay with the publisher, and placed two faulty links in the "note" field of the BibTeX entries.

7. I have added a workflow diagram to the methods section, as one reviewer suggested, since the process was not clear. It covers the initial search terms for global warming, the searching of social media, and the building of the recommendation model.
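
As mentioned in point 2, here is a minimal sketch in R of how sentence-level polarity from sentimentr could be placed side by side with VADER compound scores. It assumes the CRAN packages sentimentr and vader; the sample tweets are invented for illustration, and this is a sketch of the kind of check discussed, not the paper's actual pipeline.

    # Hedged sketch: compare sentimentr polarity with VADER compound scores
    # on a few invented tweets.
    # Requires install.packages(c("sentimentr", "vader")).
    library(sentimentr)
    library(vader)

    tweets <- c(
      "Climate change is destroying our planet and nobody seems to care!",
      "Great documentary on renewable energy, highly recommended.",
      "Another alarmist hoax, the data proves nothing."
    )

    # sentimentr: average sentence-level polarity per tweet
    sent_scores <- sentiment_by(get_sentences(tweets))
    print(sent_scores$ave_sentiment)

    # vader: compound score per tweet, for a rough side-by-side check
    vader_scores <- vapply(tweets,
                           function(t) as.numeric(get_vader(t)[["compound"]]),
                           numeric(1))
    print(unname(vader_scores))

Note that the two scales differ (sentimentr's average polarity is unbounded around zero, while VADER's compound score lies in [-1, 1]), so rank agreement rather than absolute agreement is the fairer comparison.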

Author Response File: Author Response.pdf

Reviewer 2 Report

The article is on a popular topic, although many similar approaches (sentiment analysis of social media for recommendation purposes) have appeared in the literature over the last decade.

- the related work section is very short and shallow, and is not actually so related. Aren't there any researchers working on similar topics (sentiment analysis, YouTube videos, graph-theory techniques, and integration of other social network profiles)?

- the descriptions in Sections 3.4, 3.5, and 3.6 are very theoretical and probably come from the documentation of the respective libraries. Is it appropriate to use such text, especially when the author's contribution is very limited (if any)? How did the author use these code libraries, and were there any problems he faced or interventions he made to make the complete system work properly?

- this comment also applies to Section 3.7, which should be central to the paper. NMF is a method the reader can study from the related bibliography; the theoretical foundation of the process gives no added value to the article, and I am also skeptical about how ethical it is to include the documentation of NMF, especially without any references. Did the author invent the method?

- Does Table 6 actually present recommendations for 5 users? Do all these users receive the same recommendations?

- I fail to see the practical value of your approach. As you also mention, "mainly we cannot usually identify posters from one forum to another". Taking also into account the sparsity of data for each user on YouTube for videos of a specific topic, how likely is it that your proposed approach can provide realistic recommendations?

- how does your approach compare to similar/alternative research in the field?

- The example you describe is not clear and the reader cannot follow it. You should provide clearer descriptions of the steps of the process. How is your mechanism activated? Does a user enter YouTube and search for a specific topic? What preconditions have to hold so that the mechanism can work? Should he/she provide their username on other social networks? Have you tried a different search topic?

- your datasets are not clear. How many records did you export from the social networks? What time span?

Author Response

  1. I have improved the related work section and reworked the introduction to make the aims and objectives clearer. I have made it clear why we used the R libraries rather than different systems: staying on the same platform makes data integration easier, i.e., the same data structures can be passed between steps without saving to files and converting them to the format required by the next piece of software. I have added text to the related work section to highlight the issues and our claims to novelty. Section three is now just a discussion of the data; section four is now methods only.
  2. I have added references for the NMF discussion. We made a slight modification to NMF in a previous paper, which I now cite in this section, so I am not without experience: Ken McGarry, Yitka Graham, Sharon McDonald, Anuam Rashid, "RESKO: Repositioning drugs by using side effects and knowledge from ontologies", Knowledge-Based Systems, Volume 160, 2018, Pages 34-48, ISSN 0950-7051. A generic sketch of standard NMF is given after point 7 below.
  3. I have made a table with the dataset sizes and the time period of collection. I have also added a qualitative table for system comparison.
  4. I have added references to back up the material in Sections 3.4, 3.5, and 3.6 that the reviewer found "very theoretical". The reviewer is correct: the majority of the equations come from the package documentation (the vignettes), which is now cited.
  5. "I fail to see the practical value of your approach. ". I am sorry I have not been clear enough. The key value is user profiling, many papers are now taking this approach based on tenor/tone of language. We are not likely to be monitoring the same people across multiple social media forums but we do user profiling to a certain extent. The same personality types and concerns they have are similar across these platforms. That is to say climate change skeptics and climate change belivers use similar scientific arguments and tone/tenor of language.
  6. I have modified Table 6 by adding numbers to identify the randomly selected users. My apologies for neglecting this.
  7. I have not attempted different search topics such as the war in Ukraine or Covid. A brief perusal of YouTube indicates the motivations of the users would be the same: you either support Ukraine or are against it (usually the majority are supportive of Ukraine; other issues concern opinions on giving Ukraine aid). Covid discussion (here in the UK at least) is usually about the merits, or lack thereof, of lockdowns to save lives versus economic meltdown, and about the origin of the virus.
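
To make point 2 concrete, here is a generic sketch of standard non-negative matrix factorisation using the CRAN NMF package. The toy matrix is invented, and this shows the textbook method only, not the modified variant from the RESKO paper.

    # Hedged sketch of standard NMF: factorise V ~ W %*% H with all
    # entries non-negative. Rows might represent users, columns terms
    # or videos; the values here are random stand-ins.
    # Requires install.packages("NMF").
    library(NMF)

    set.seed(42)
    V <- matrix(runif(20 * 10), nrow = 20, ncol = 10)  # toy non-negative matrix

    res <- nmf(V, rank = 3, seed = 123)
    W <- basis(res)  # 20 x 3 matrix: row loadings on latent factors
    H <- coef(res)   # 3 x 10 matrix: factor loadings on columns

    # Root-mean-square reconstruction error as a rough goodness-of-fit check
    sqrt(mean((V - W %*% H)^2))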


Author Response File: Author Response.pdf

Reviewer 3 Report

The main issue with the paper is that the objective is unclear. The introduction is extremely deficient in this regard, as well as the 'related work' section. Both fail to adequately frame the problem. Section 3 inappropriately mixes data and methods. The description of the methods merely introduces basic concepts but fails to explore them adequately. The results described in Section 4 should be compared with those obtained using at least two other methods among the many available in the literature.

Author Response

1. I have reworked the introduction to make the aims and objectives clearer.

2. I have added text to the related work section to highlight the issues and our claims to novelty.

3. Section three is now just a discussion of the data; section four is now methods only.

4. In the results section (now section four) I point out that my method is compared with three other recommendation systems; the ROC curves demonstrate this. I have added tables for the data used and a qualitative comparison with other systems. A sketch of this kind of ROC comparison is given below.
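
For readers, here is a minimal sketch of such an ROC comparison using the CRAN pROC package. The labels and scores below are simulated stand-ins, not the paper's systems or data.

    # Hedged sketch: overlay ROC curves for two scoring systems with pROC.
    # Requires install.packages("pROC").
    library(pROC)

    set.seed(1)
    labels  <- rbinom(200, 1, 0.5)            # simulated ground-truth relevance
    score_a <- labels + rnorm(200, sd = 0.8)  # stand-in for a stronger system
    score_b <- labels + rnorm(200, sd = 1.5)  # stand-in for a weaker baseline

    roc_a <- roc(labels, score_a)
    roc_b <- roc(labels, score_b)

    plot(roc_a, col = "blue")
    lines(roc_b, col = "red")
    legend("bottomright",
           legend = sprintf("AUC = %.2f", c(auc(roc_a), auc(roc_b))),
           col = c("blue", "red"), lty = 1)

Higher area under the curve (AUC) indicates better ranking of relevant items, which is why the comparison of the proposed system against the three baselines is presented this way.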

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The author has revised the paper as per all my comments and feedback. I do not have any additional comments at this point. I recommend the publication of the paper in its current form. 

Reviewer 2 Report

Dear author,

The manuscript is significantly improved and my only suggestion now is to provide some discussion as to how your approach (and results) compare to alternative approaches in the literature. 
