Cyberbullying Detection Using PCA Extracted GLOVE Features and RoBERTaNet Transformer Learning Model
Cascone, Lucia; Nappi, Michele
2024-01-01
Abstract
Online platforms nurture social interaction, yet they have also led to the global proliferation of antisocial behaviors such as cyberbullying, trolling, and hate speech. Identifying hate speech and aggression has become indispensable in the fight against cyberbullying and online harassment. Cyberbullying encompasses the use of aggressive and offensive language, including rude, insulting, hateful, and teasing comments, to inflict harm on individuals through social media platforms. Human moderation is both slow and costly, which makes it impractical given the exponential growth of data. Consequently, automated detection systems are imperative to combat trolling effectively. This study addresses the challenge of automatically detecting cyberbullying in tweets drawn from a publicly available cyberbullying dataset. The proposed methodology leverages the robustly optimized bidirectional encoder representations from transformers approach (RoBERTa), integrating global vectors for word representation (GLOVE) word embedding features extracted with principal component analysis (PCA). Furthermore, the proposed approach is benchmarked against state-of-the-art machine learning, deep learning, and transformer-based methods, using the GLOVE word embedding technique. Statistical analyses show that the proposed model outperforms its counterparts, achieving 0.98 accuracy and recall and 0.97 precision and F1 score in detecting cyberbullying tweets. Results from k-fold cross-validation further corroborate the superior performance of the proposed model.
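
The abstract reports accuracy, precision, recall, and F1 score, and validates them with k-fold cross-validation. As a minimal sketch of how such figures are typically computed for a binary "cyberbullying / not cyberbullying" task, the Python below derives the four metrics from true and predicted labels and builds k train/test index splits. The function names `binary_metrics` and `k_fold_indices` are hypothetical, the toy labels are illustrative only, and the folds are contiguous and unshuffled by assumption; the abstract states neither the value of k nor the splitting strategy, and this is not the authors' evaluation code.

```python
from typing import Dict, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds.

    Each fold is used once as the test set while the remaining
    samples form the training set. No shuffling or stratification
    is applied (an assumption made for this sketch).
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")

    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


if __name__ == "__main__":
    # Toy labels, illustrative only: 1 = cyberbullying, 0 = not cyberbullying.
    y_true = [1, 1, 1, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
    print(binary_metrics(y_true, y_pred))
    for train_idx, test_idx in k_fold_indices(len(y_true), k=4):
        print("train:", train_idx, "test:", test_idx)
```

In a cross-validated evaluation, a model would be trained on each `train` split, scored with `binary_metrics` on the matching `test` split, and the per-fold scores averaged. For class-imbalanced data, shuffled and stratified folds are usually preferable to the contiguous ones used here.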