Cyberbullying Detection Using PCA Extracted GLOVE Features and RoBERTaNet Transformer Learning Model

Cascone, Lucia; Nappi, Michele
2024-01-01

Abstract

Online platforms nurture social interaction, but they have also enabled the global spread of antisocial behaviors such as cyberbullying, trolling, and hate speech. Identifying hate speech and aggression has become indispensable in the fight against cyberbullying and online harassment. Cyberbullying is the use of aggressive and offensive language, including rude, insulting, hateful, and teasing comments, to harm individuals through social media platforms. Human moderation is slow and costly, which makes it impractical given the exponential growth of data. Automated detection systems are therefore needed to combat trolling effectively. This study addresses the automatic detection of cyberbullying in tweets drawn from a publicly available cyberbullying dataset. The proposed methodology uses the robustly optimized bidirectional encoder representations from transformers approach (RoBERTa), combined with global vectors for word representation (GLOVE) word embedding features extracted by principal component analysis (PCA). The proposed approach is benchmarked against state-of-the-art machine learning, deep learning, and transformer-based methods that use the GLOVE word embedding technique. Statistical analyses show that the proposed model outperforms its counterparts, achieving an accuracy and recall of 0.98 and a precision and F1 score of 0.97 in detecting cyberbullying tweets. Results from k-fold cross-validation further corroborate the superior performance of the proposed model.
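As a rough illustration of two ideas named in the abstract, the sketch below reduces embedding features with PCA and computes accuracy, precision, recall, and F1 for a binary classifier. It is not the authors' implementation: the helper names `pca_reduce` and `binary_metrics`, the random stand-in vectors, the dimensions (100 reduced to 20), and the toy labels are all assumptions made for the example, and no RoBERTa model or real GLOVE vectors are involved.

```python
import numpy as np


def pca_reduce(features, n_components):
    """Project row vectors onto their top principal components (hypothetical helper)."""
    centered = features - features.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for labels in {0, 1} (hypothetical helper)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Assumption: random vectors stand in for tweet-level embedding features.
rng = np.random.default_rng(0)
tweet_vectors = rng.normal(size=(200, 100))
reduced = pca_reduce(tweet_vectors, n_components=20)

# Assumption: toy labels, where 1 marks a cyberbullying tweet.
scores = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
```

In a real pipeline the reduced features would feed a classifier, and the metrics would be averaged over the folds of a k-fold cross-validation rather than computed on a single toy split.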

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4887274