Selecting a Reduced Set of Features for Supporting the Stance Detection Task

Damiano E.; Gaeta A.; Orciuoli F.
2023-01-01

Abstract

Stance detection is one of the many NLP tasks gaining importance as information spreads across the internet and social media. Although several studies conduct stance classification experiments using machine learning models, few of them use a set of variables that includes emotional aspects. In this work, starting from a set of tweets, we describe the feature extraction procedure and evaluate our results using two machine learning models (random forest and support vector machine). The set of considered features includes variables measuring emotional, sentiment-based, readability, morality, and engagement-related aspects. We examine which features can be considered useful by applying the Mann-Whitney test and the recursive feature elimination procedure. In doing so, we provide a specific analysis of the emotional and sentiment-based aspects. We also find that features taking into account both the target and the message seem to be particularly advantageous. Features measuring lexical aspects, such as the readability and complexity of the text, also seem to contribute well. The best results are obtained using a smaller, selected set of variables, which achieves an F1 score of 50%. Although this is not very high in general, it is comparable with the scores obtained by other models that mainly consider lexical aspects of the same data.
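As an illustration of the Mann-Whitney screening step mentioned in the abstract, the following minimal sketch tests each feature for a difference between two stance classes and keeps those below a significance threshold. It is not the authors' implementation: the feature names (`anger`, `readability`), the toy values, the binary labels, and the threshold `alpha = 0.05` are all assumptions made for the example. The test is written with the standard library only, using the normal approximation with tie correction; in practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used instead.

```python
import math
from itertools import groupby


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected,
    no continuity correction). Returns (U statistic of x, p-value)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y],
                    key=lambda t: t[0])
    rank_sum_x = 0.0
    tie_term = 0.0
    pos = 0  # number of observations already ranked
    for _, group in groupby(pooled, key=lambda t: t[0]):
        group = list(group)
        t = len(group)
        avg_rank = pos + (t + 1) / 2.0  # mean of ranks pos+1 .. pos+t
        rank_sum_x += avg_rank * sum(1 for _, src in group if src == 0)
        tie_term += t ** 3 - t
        pos += t
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values identical: no evidence of a difference
        return u_x, 1.0
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p_value


def screen_features(rows, labels, names, alpha=0.05):
    """Keep the features whose distributions differ between the two classes."""
    kept = []
    for j, name in enumerate(names):
        pos_vals = [r[j] for r, lab in zip(rows, labels) if lab == 1]
        neg_vals = [r[j] for r, lab in zip(rows, labels) if lab == 0]
        _, p = mann_whitney_u(pos_vals, neg_vals)
        if p < alpha:
            kept.append(name)
    return kept


# Hypothetical toy data: one row per tweet, columns = (anger, readability).
names = ["anger", "readability"]
anger_1 = [0.9, 0.8, 0.85, 0.7, 0.95, 0.75, 0.88, 0.92]
anger_0 = [0.1, 0.2, 0.15, 0.3, 0.05, 0.25, 0.12, 0.18]
read_1 = [1, 4, 5, 8, 9, 12, 13, 16]
read_0 = [2, 3, 6, 7, 10, 11, 14, 15]
rows = list(zip(anger_1, read_1)) + list(zip(anger_0, read_0))
labels = [1] * 8 + [0] * 8

kept = screen_features(rows, labels, names)
print(kept)
```

In this toy data the hypothetical `anger` values are fully separated between the two classes, while the `readability` values are interleaved, so only the former passes the screen. The surviving features could then be passed to a wrapper method such as recursive feature elimination around a random forest or support vector classifier.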
Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4847571