Likelihood-type methods for comparing clustering solutions
Pietro Coretto
2019
Abstract
Selecting an optimal clustering solution is a longstanding problem. In model-based clustering this amounts to choosing the architecture of the model mixture distribution. The decisions to be made concern the cluster prototype distribution, the number of mixture components and, optionally, restrictions on the clusters' geometry. Classical proposals address this issue via penalized model selection criteria based on the observed likelihood function. In this study, we compare these techniques with the less explored cross-validation alternative, which is popular for many other data-driven optimized methods. We analyze both classical methods such as BIC, AIC, AIC3 and ICL, and several cross-validation schemes where the risk is defined in terms of minus the log-likelihood function. Selection methods are compared by using the Iris dataset.
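To make the two families of selection methods concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes the standard textbook forms of AIC, AIC3 and BIC on the "smaller is better" scale, and it assumes the common entropy-penalized approximation of ICL (BIC plus twice the entropy of the posterior membership probabilities); the paper may use a different ICL variant. The function names `penalized_criteria` and `cv_risk`, and all argument names, are hypothetical and chosen only for illustration.

```python
import math


def penalized_criteria(loglik, n_params, n_obs, posteriors=None):
    """Classical penalized criteria on the 'smaller is better' scale.

    loglik     : maximized observed log-likelihood of the fitted mixture
    n_params   : number of free parameters of the mixture
    n_obs      : sample size
    posteriors : optional list of rows, one per observation, holding the
                 posterior membership probabilities (needed only for ICL)
    """
    out = {
        "AIC": -2.0 * loglik + 2.0 * n_params,
        "AIC3": -2.0 * loglik + 3.0 * n_params,
        "BIC": -2.0 * loglik + n_params * math.log(n_obs),
    }
    if posteriors is not None:
        # Assumption: ICL approximated as BIC plus twice the entropy of the
        # soft assignment; terms with zero probability contribute nothing.
        entropy = -sum(
            t * math.log(t) for row in posteriors for t in row if t > 0.0
        )
        out["ICL"] = out["BIC"] + 2.0 * entropy
    return out


def cv_risk(heldout_log_densities):
    """Cross-validation risk: average minus log-likelihood on held-out data.

    heldout_log_densities : log mixture density of each held-out point,
                            evaluated at parameters fitted on the training fold
    """
    return -sum(heldout_log_densities) / len(heldout_log_densities)


# Illustrative numbers only, not results from the paper.
criteria = penalized_criteria(loglik=-100.0, n_params=5, n_obs=150)
risk = cv_risk([-1.0, -3.0])
```

Under either approach, the candidate model with the smallest value is selected. The penalized criteria reuse the same data for fitting and scoring and correct for this through the penalty, whereas the cross-validation risk scores each fitted model on observations it has not seen.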