Among all the different clustering approaches proposed so far, graph-based algorithms are particularly suited for dealing with data that does not come from a Gaussian or a spherical distribution. They can be used for detecting clusters of any size and shape without the need of specifying the actual number of clusters; moreover, they can be profitably used in cluster detection problems.Despite of the fact that graph-based methods are gaining more and more popularity in different scientific areas, the choice of an appropriate algorithm for a given application is still the most crucial task. In this paper, we then present a detailed performance evaluation of five different graph-based clustering approaches on a database of synthetically generated graphs. The main findings of such an analysis were that algorithms based on the Minimum Spanning Tree perform better than other approaches.Four of the algorithms selected for comparison have been chosen from the open literature. While these algorithms do not require the setting of the number of clusters, they need, however, some parameters to be provided by the user. So, as the fifth algorithm under comparison, we propose an approach that overcomes this limitation, proving to be an effective solution in real applications where a completely unsupervised method for cluster detection is desirable. This was confirmed by a further comparative analysis carried out on four datasets coming from the UCI Machine Learning Repository.

Benchmarking graph-based clustering algorithms

FOGGIA, PASQUALE;PERCANNELLA, Gennaro;VENTO, Mario
2009

Abstract

Among all the different clustering approaches proposed so far, graph-based algorithms are particularly suited for dealing with data that does not come from a Gaussian or a spherical distribution. They can be used for detecting clusters of any size and shape without the need of specifying the actual number of clusters; moreover, they can be profitably used in cluster detection problems.Despite of the fact that graph-based methods are gaining more and more popularity in different scientific areas, the choice of an appropriate algorithm for a given application is still the most crucial task. In this paper, we then present a detailed performance evaluation of five different graph-based clustering approaches on a database of synthetically generated graphs. The main findings of such an analysis were that algorithms based on the Minimum Spanning Tree perform better than other approaches.Four of the algorithms selected for comparison have been chosen from the open literature. While these algorithms do not require the setting of the number of clusters, they need, however, some parameters to be provided by the user. So, as the fifth algorithm under comparison, we propose an approach that overcomes this limitation, proving to be an effective solution in real applications where a completely unsupervised method for cluster detection is desirable. This was confirmed by a further comparative analysis carried out on four datasets coming from the UCI Machine Learning Repository.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11386/2261591
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 12
  • ???jsp.display-item.citation.isi??? 7
social impact