Unsupervised learning: Clustering
Serra A.; Tagliaferri R.
2018-01-01
Abstract
This article provides an introduction to unsupervised cluster analysis. Clustering is the organisation of unlabelled data into similarity groups called clusters. A cluster is a collection of data items that are similar to one another and dissimilar to the data items in other clusters. Three main elements are needed to perform cluster analysis: a proximity measure, to evaluate the similarity between patterns (the classical choice is the Euclidean distance); a quality measure, to evaluate the results of the analysis; and a clustering algorithm. Three classical algorithms are described: k-means, hierarchical clustering and Expectation Maximisation. Finally, an example of their application to a gene expression dataset for patient sub-typing is provided.
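The three ingredients named in the abstract can be illustrated with a minimal Python sketch, assuming only the standard library. Here `euclidean` plays the role of the proximity measure, `kmeans` is a plain Lloyd-style k-means, and the within-cluster sum of squares stands in for the quality measure. The choice of quality measure, the farthest-first initialisation and all function names are illustrative assumptions, not the authors' implementation, and the six two-dimensional points are toy data rather than gene expression profiles.

```python
import math


def euclidean(a, b):
    """Proximity measure: Euclidean distance between two patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_farthest_first(data, k):
    """Hypothetical initialisation: start from the first pattern, then
    repeatedly add the pattern farthest from all chosen centroids."""
    centroids = [list(data[0])]
    while len(centroids) < k:
        nxt = max(data, key=lambda p: min(euclidean(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(data, k, max_iter=100):
    """Lloyd-style k-means; returns (labels, centroids)."""
    centroids = init_farthest_first(data, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each pattern joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: euclidean(p, centroids[j]))
                      for p in data]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(data, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels, centroids


def within_cluster_ss(data, labels, centroids):
    """Assumed quality measure: total squared distance of each pattern
    to the centroid of its cluster (lower means more compact clusters)."""
    return sum(euclidean(p, centroids[lab]) ** 2 for p, lab in zip(data, labels))


# Toy data: two well-separated groups of three points each.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
        (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(data, k=2)
wcss = within_cluster_ss(data, labels, centroids)
print(labels)           # [0, 0, 0, 1, 1, 1]
print(round(wcss, 4))   # 0.1933
```

Farthest-first initialisation is used only to make this toy run deterministic. Random initialisation with several restarts, keeping the run with the lowest within-cluster sum of squares, is a common alternative.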