Beyond classical consensus clustering: The least squares approach to multiple solutions

MURINO, Loredana; RAICONI, Giancarlo; TAGLIAFERRI, Roberto
2011

Abstract

Clustering is one of the most important unsupervised learning problems: it consists of finding a common structure in a collection of unlabeled data. However, because the problem is ill-posed, different runs of the same clustering algorithm on the same data set usually produce different solutions, so choosing a single solution is quite arbitrary. On the other hand, in many applications the problem of multiple solutions becomes intractable; it is therefore often more desirable to provide a small group of "good" clusterings rather than a single solution. In the present paper we propose least squares consensus clustering. This technique extracts a small number of different clustering solutions from an initial (large) ensemble obtained by applying any clustering algorithm to a given data set. We also define a measure of quality and present a graphical visualization of each consensus clustering that makes the strength of the consensus immediately interpretable. We have carried out several numerical experiments on both synthetic and real data sets to illustrate the proposed methodology.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: http://hdl.handle.net/11386/3035888