A fuzzy partition-based method to classify social messages assessing their emotional relevance

Senatore, S.
2022-01-01

Abstract

With the surge in available data, Machine Learning, and Deep Learning in particular, has become the leading solution for classification and prediction tasks aimed at data-efficient learning. These models learn by training on many diverse samples, a process that is computationally expensive or time-consuming. Moreover, in many real-world scenarios the data available for training is unsuitable because it is unlabeled or covers only part of the reference domain. This paper proposes an alternative approach to document classification that leverages the distribution of the data projected into the multi-dimensional feature space to assess the weight of each feature in the final classification. Rather than relying on traditional iterative classification methods, the approach builds a measure that assesses the relevance of the features describing the domain of interest. The idea is to use this measure to select relevant features and then express its values in natural language through fuzzy variables and linguistic labels, making them more immediately comprehensible to humans. The approach has been applied to emotion extraction from social media messages. Its novelty is twofold: first, the well-known TF-IDF measure is reinterpreted as a relevance measure for the emotions discovered in text content; second, the discovered emotion relevance is described by fuzzy linguistic labels, defined on an ad hoc fuzzy partition, so that the classification is expressed in natural language that is easier for humans to understand.
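
The two steps named in the abstract (a TF-IDF-style relevance score for emotions, then a mapping to fuzzy linguistic labels) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the tiny `LEXICON`, the smoothed inverse-frequency term `log(1 + N/df)`, the normalization by the largest score, and the three-label triangular partition `low`/`medium`/`high`.

```python
import math
from collections import Counter

# Hypothetical word-to-emotion lexicon (illustrative only, not from the paper).
LEXICON = {
    "happy": "joy", "love": "joy", "great": "joy",
    "angry": "anger", "hate": "anger",
    "scared": "fear", "afraid": "fear",
}

def emotion_tfidf(messages):
    """Return one {emotion: relevance} dict per message, scored TF-IDF style."""
    counts = []
    for text in messages:
        tokens = text.lower().split()
        counts.append(Counter(LEXICON[t] for t in tokens if t in LEXICON))
    n = len(messages)
    # Document frequency: number of messages in which each emotion occurs.
    df = Counter(e for c in counts for e in c)
    scores = []
    for c in counts:
        total = sum(c.values())
        # Assumed smoothing: log(1 + n/df) keeps every score strictly positive.
        scores.append({e: (k / total) * math.log(1 + n / df[e]) for e, k in c.items()})
    return scores

def triangular(x, a, b, c):
    """Triangular membership function with peak at b and support (a, c)."""
    if x == b:
        return 1.0
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

# Assumed fuzzy partition of [0, 1]; memberships sum to 1 at every point.
PARTITION = {
    "low": (-0.5, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.5),
}

def linguistic_label(x):
    """Return the label with the highest membership for x in [0, 1]."""
    memberships = {label: triangular(x, *abc) for label, abc in PARTITION.items()}
    return max(memberships, key=memberships.get)

def describe(messages):
    """Label each emotion's relevance per message, normalized by the largest score."""
    scores = emotion_tfidf(messages)
    top = max((v for s in scores for v in s.values()), default=1.0)
    return [{e: linguistic_label(v / top) for e, v in s.items()} for s in scores]
```

Calling `describe` on a list of messages returns, for each message, a dictionary that maps every detected emotion to one of the three labels. No iterative training is involved: the scores come directly from counts over the corpus, which is the property the abstract emphasizes.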

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4778651