Exploring the ability of emerging large language models to detect cyberbullying in social posts through new prompt-based classification approaches
Cirillo S.; Desiato D.; Polese G.; Solimando G.
2025
Abstract
The spread of new social networks in recent years, especially among adolescents, has amplified the circulation of social posts encouraging harmful behaviors that target people based on factors such as race, sex, or personal beliefs. This phenomenon makes it necessary to develop intelligent tools capable of efficiently analyzing social media content. Recent Large Language Models (LLMs) have demonstrated advanced text generation and comprehension capabilities, making them effective tools for identifying harmful posts. In this paper, we perform a large-scale evaluation of 20 generative LLMs in detecting cyberbullying phenomena in real social media posts through a new ad-hoc prompt-based Machine Learning approach (Prompt-based ML). We evaluate the LLMs on binary and multiclass classification tasks over thousands of real posts from X, Facebook, and Reddit, and also compare their performance with that of 24 machine learning and natural language processing models. Specifically, the comparative analysis aims to assess the cyberbullying discrimination capability of LLMs with respect to traditional models, and to use the obtained findings to select suitable models for identifying harmful content on social network platforms. Furthermore, we provide an evaluation, involving three domain experts, of the clarity, coherence, and relevance of the explanations produced by LLMs after identifying cyberbullying in social posts. Experimental results highlight the strong performance of LLMs, particularly Claude 3.0 and Mistral family models, in identifying different types of cyberbullying. The domain expert evaluation of explainability showed that LLMs belonging to the Claude and Mistral families obtained better scores for clarity, coherence, and relevance in their explanations compared to other models.
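For readers unfamiliar with prompt-based classification, the following is a minimal Python sketch of how a generative LLM might be queried for the binary task described in the abstract. The prompt template, label set, and `call_llm` client are illustrative assumptions, not the paper's actual Prompt-based ML design; extending the sketch to the multiclass setting would only require changing the label set and prompt wording.

```python
"""Minimal sketch of prompt-based binary cyberbullying classification.

The prompt template, label set, and `call_llm` stub are illustrative
assumptions, NOT the authors' exact Prompt-based ML setup: `call_llm`
stands in for any chat-completion client (e.g., a Claude or Mistral API).
"""

from typing import Callable, Tuple

LABELS = {"cyberbullying", "not cyberbullying"}

PROMPT_TEMPLATE = (
    "You are a content-moderation assistant. Classify the following "
    "social media post as 'cyberbullying' or 'not cyberbullying', then "
    "briefly explain your decision.\n\n"
    "Post: {post}\n\n"
    "Reply with the label on the first line and the explanation below it."
)


def classify_post(post: str, call_llm: Callable[[str], str]) -> Tuple[str, str]:
    """Return (label, explanation) for one post via a single LLM call."""
    reply = call_llm(PROMPT_TEMPLATE.format(post=post))
    first_line, _, explanation = reply.partition("\n")
    label = first_line.strip().lower()
    if label not in LABELS:
        # Conservative fallback when the model's reply cannot be parsed.
        label = "not cyberbullying"
    return label, explanation.strip()


if __name__ == "__main__":
    # Dummy client so the sketch runs without any provider credentials.
    fake_llm = lambda prompt: "not cyberbullying\nThe post contains no attack."
    print(classify_post("Have a great day, everyone!", fake_llm))
```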