From Vectors to Networks: Comparing conventional and graph-based approaches to Unsupervised Text Categorisation

Michelangelo Misuraca
2025

Abstract

One of the primary tasks of text mining is to organise a large number of unlabelled documents into a smaller set of meaningful, coherent clusters of documents with similar content. Clustering algorithms typically operate on document × term matrices, in which each document is represented as a vector. Alternatively, a collection of documents can be represented with a document × document structure, which can be read as an adjacency matrix and drawn as a graph. In network analysis, community detection is applied to such graphs to identify groups of nodes that share common characteristics and perform similar functions. This paper evaluates different data structures and grouping criteria, showing the effectiveness of the various alternatives in a text categorisation strategy. We conduct a comparative study of classical text clustering methods and community detection approaches, examining and discussing their performance.
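
The two data structures named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy documents, raw term counts as weights, cosine similarity, the 0.3 threshold, and the helper names. Connected components are used only as the simplest possible stand-in for a grouping step; they are not a community detection algorithm and not the method evaluated by the author.

```python
import math
from collections import Counter


def doc_term_matrix(docs):
    """Build a document x term matrix of raw term counts (assumed weighting)."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenised for term in tokens})
    rows = []
    for tokens in tokenised:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def cosine(u, v):
    """Cosine similarity between two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def doc_doc_adjacency(rows, threshold=0.3):
    """Build a document x document adjacency matrix by thresholding similarity."""
    n = len(rows)
    return [
        [1 if i != j and cosine(rows[i], rows[j]) >= threshold else 0
         for j in range(n)]
        for i in range(n)
    ]


def connected_components(adj):
    """Group nodes that are reachable from one another (stand-in grouping step)."""
    seen = set()
    groups = []
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        group = []
        while stack:
            node = stack.pop()
            group.append(node)
            for neighbour, linked in enumerate(adj[node]):
                if linked and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        groups.append(sorted(group))
    return groups


# Hypothetical toy corpus, invented for this sketch.
docs = [
    "graph network community detection",
    "community detection on a graph",
    "term vector clustering",
    "clustering of term vector data",
]

vocab, rows = doc_term_matrix(docs)      # vector view: one row per document
adjacency = doc_doc_adjacency(rows)      # network view: documents as nodes
groups = connected_components(adjacency)
print(groups)
```

In the vector view a clustering algorithm would work directly on `rows`; in the network view a community detection method would work on `adjacency`. The threshold controls how dense the graph is, so it is a modelling choice rather than a fixed constant.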
ISBN: 978-3-032-03041-2
Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4916295