Among all the different clustering approaches proposed so far, graph-based algorithms are particularly suited for dealing with data that does not come from a Gaussian or a spherical distribution. They can be used for detecting clusters of any size and shape without the need of specifying the actual number of clusters; moreover, they can be profitably used in cluster detection problems.Despite of the fact that graph-based methods are gaining more and more popularity in different scientific areas, the choice of an appropriate algorithm for a given application is still the most crucial task. In this paper, we then present a detailed performance evaluation of five different graph-based clustering approaches on a database of synthetically generated graphs. The main findings of such an analysis were that algorithms based on the Minimum Spanning Tree perform better than other approaches.Four of the algorithms selected for comparison have been chosen from the open literature. While these algorithms do not require the setting of the number of clusters, they need, however, some parameters to be provided by the user. So, as the fifth algorithm under comparison, we propose an approach that overcomes this limitation, proving to be an effective solution in real applications where a completely unsupervised method for cluster detection is desirable. This was confirmed by a further comparative analysis carried out on four datasets coming from the UCI Machine Learning Repository.
Benchmarking graph-based clustering algorithms
FOGGIA, PASQUALE;PERCANNELLA, Gennaro;VENTO, Mario
2009-01-01
Abstract
Among all the different clustering approaches proposed so far, graph-based algorithms are particularly suited for dealing with data that does not come from a Gaussian or a spherical distribution. They can be used for detecting clusters of any size and shape without the need of specifying the actual number of clusters; moreover, they can be profitably used in cluster detection problems.Despite of the fact that graph-based methods are gaining more and more popularity in different scientific areas, the choice of an appropriate algorithm for a given application is still the most crucial task. In this paper, we then present a detailed performance evaluation of five different graph-based clustering approaches on a database of synthetically generated graphs. The main findings of such an analysis were that algorithms based on the Minimum Spanning Tree perform better than other approaches.Four of the algorithms selected for comparison have been chosen from the open literature. While these algorithms do not require the setting of the number of clusters, they need, however, some parameters to be provided by the user. So, as the fifth algorithm under comparison, we propose an approach that overcomes this limitation, proving to be an effective solution in real applications where a completely unsupervised method for cluster detection is desirable. This was confirmed by a further comparative analysis carried out on four datasets coming from the UCI Machine Learning Repository.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.