Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score

Coretto, Pietro
2023-01-01

Abstract

Cluster analysis requires fixing the number of clusters and often many hyper-parameters. In practice, one produces several partitions, and a final one is chosen based on validation or selection criteria. There is an abundance of validation methods that, implicitly or explicitly, assume a certain clustering notion. In this paper, we focus on groups that can be well separated by quadratic or linear boundaries. The reference cluster concept is defined through the quadratic discriminant function and parameters describing clusters’ size, center and scatter. We develop two cluster-quality criteria that are consistent with groups generated from a class of elliptic–symmetric distributions. Using bootstrap resampling of the proposed criteria, we propose a selection rule that allows choosing among many clustering solutions, possibly obtained from different methods. Extensive experimental analysis shows that the proposed methodology achieves a better overall performance compared to established alternatives from the literature.
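
As an illustration of the quadratic discriminant function mentioned in the abstract, the following is a minimal sketch of the textbook Gaussian quadratic discriminant score, computed from a cluster's size (proportion), center (mean) and scatter (covariance). It is not the paper's cluster-quality criteria or selection rule; the function names `quadratic_score` and `assign_by_score` are hypothetical and chosen only for this example.

```python
import numpy as np

def quadratic_score(x, prop, mean, cov):
    """Textbook Gaussian quadratic discriminant score for one cluster.

    x: (n, p) data points; prop: cluster proportion (size);
    mean: (p,) cluster center; cov: (p, p) cluster scatter matrix.
    Returns log(prop) - 0.5*log det(cov) - 0.5*squared Mahalanobis distance.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    _, logdet = np.linalg.slogdet(cov)
    # Squared Mahalanobis distance of every row of x from the center.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return np.log(prop) - 0.5 * logdet - 0.5 * maha

def assign_by_score(x, props, means, covs):
    """Assign each point to the cluster with the largest score (illustrative only)."""
    scores = np.column_stack(
        [quadratic_score(x, pi, m, s) for pi, m, s in zip(props, means, covs)]
    )
    return scores.argmax(axis=1), scores

if __name__ == "__main__":
    # Two hypothetical, well-separated clusters in the plane.
    x = np.array([[0.0, 0.0], [5.0, 5.0]])
    labels, _ = assign_by_score(
        x, [0.5, 0.5], [np.zeros(2), np.full(2, 5.0)], [np.eye(2), np.eye(2)]
    )
    print(labels)
```

With equal scatter matrices the boundary between two clusters induced by this score is linear; with different scatter matrices it is quadratic, which matches the kind of separation the abstract refers to.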

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4826203