Open Access Article

Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter

1 School of Computer Science and Statistics, Trinity College Dublin, D02 PN40 Dublin, Ireland
2 ADAPT Centre, D02 PN40 Dublin, Ireland
* Author to whom correspondence should be addressed.
Received: 30 September 2020 / Revised: 12 November 2020 / Accepted: 13 November 2020 / Published: 15 November 2020
Twitter enables millions of active users to send and read concise messages on the internet every day. Yet some people use Twitter to propagate violent and threatening messages, resulting in cyberbullying. Previous research has focused on whether or not cyberbullying behavior exists in a tweet (binary classification). In this research, we developed a model for detecting the severity of cyberbullying in a tweet. The model is feature-based: it uses features derived from the content of a tweet to train a machine learning classifier that labels tweets as non-cyberbullied or as low-, medium-, or high-level cyberbullied. In this study, we introduced pointwise semantic orientation as a new input feature, alongside predicted features (gender, age, and personality type) and Twitter API features. Results from experiments with our proposed framework in a multi-class setting are promising with respect to Kappa (84%), classifier accuracy (93%), and F-measure (92%). Overall, 40% of the classifiers increased performance in comparison with baseline approaches. Our analysis shows that the features with the highest odds ratios are: for low-level severity, the age group between 19–22 years and users with <1 year of Twitter account activation; for medium-level severity, neuroticism, the age group between 23–29 years, and being a Twitter user for one to two years; and for high-level severity, neuroticism, extraversion, and the number of times a tweet has been favorited by other users. We believe that this multi-class classification approach provides a step forward in identifying severity at different levels (low, medium, high) when the content of a tweet is classified as cyberbullied. Lastly, the current study focused only on the Twitter platform; other social network platforms can be investigated using the same approach to detect cyberbullying severity patterns.
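As an illustration of the evaluation metrics named above, the following minimal sketch computes accuracy, Cohen's Kappa, and an F-measure for a four-class severity labelling. It is not the authors' evaluation code. The label names, the toy predictions, and the choice of macro-averaging for the F-measure are all assumptions made for this example; the abstract does not state which averaging was used.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_freq = Counter(y_true)
    pred_freq = Counter(y_pred)
    # Chance agreement: sum over classes of the product of marginal rates.
    p_expected = sum(true_freq[c] * pred_freq[c] for c in true_freq) / (n * n)
    # Undefined when p_expected == 1 (only one class on both sides).
    return (p_observed - p_expected) / (1 - p_expected)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (macro-averaging is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy data, invented purely for illustration (not from the study).
y_true = ["none", "none", "low", "low", "medium", "high"]
y_pred = ["none", "none", "low", "medium", "medium", "high"]

if __name__ == "__main__":
    print("accuracy:", accuracy(y_true, y_pred))
    print("kappa:   ", cohen_kappa(y_true, y_pred))
    print("macro F1:", macro_f1(y_true, y_pred))
```

With imbalanced classes, Kappa and a per-class-averaged F-measure are usually more informative than accuracy alone, because a classifier that always predicts the majority class can still reach high accuracy.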
Keywords: cyberbullying; Twitter; social networks; algorithms
MDPI and ACS Style

Talpur, B.A.; O’Sullivan, D. Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter. Informatics 2020, 7, 52. https://0-doi-org.brum.beds.ac.uk/10.3390/informatics7040052

AMA Style

Talpur BA, O’Sullivan D. Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter. Informatics. 2020; 7(4):52. https://0-doi-org.brum.beds.ac.uk/10.3390/informatics7040052

Chicago/Turabian Style

Talpur, Bandeh A.; O’Sullivan, Declan. 2020. "Multi-Class Imbalance in Text Classification: A Feature Engineering Approach to Detect Cyberbullying in Twitter" Informatics 7, no. 4: 52. https://0-doi-org.brum.beds.ac.uk/10.3390/informatics7040052
