A “Blind” Approach to Clustering Through Data Compression
CARPENTIERI, Bruno
2013
Abstract
Data compression, data prediction, data classification, learning, and data mining are all facets of the same (multidimensional) coin. In particular, data compression can be used as a metric for clustering. In this paper we test a clustering method that does not rely on any knowledge or theoretical analysis of the problem domain; it relies only on general-purpose compression techniques. Our experiments on different kinds of digital data show impressive results: the system is versatile and, under appropriate conditions, robust. Experimental results are presented for the clustering of digital data representing heterogeneous data, text in different languages, drugs, cereals, and music.
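
To illustrate how a general-purpose compressor can act as a clustering metric, here is a minimal sketch based on the normalized compression distance (NCD), a well-known compression-based measure. The abstract does not state which distance or compressor the paper uses, so the choice of NCD, the use of `zlib`, and the helper names `compressed_size`, `ncd`, and `distance_matrix` are all assumptions made for illustration.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size. Smaller values suggest
    that the two inputs share more structure.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(items: list[bytes]) -> list[list[float]]:
    """Pairwise NCD values; no domain knowledge is used at any point."""
    return [[ncd(a, b) for b in items] for a in items]


if __name__ == "__main__":
    # Hypothetical toy inputs, not data from the paper.
    samples = [
        b"the quick brown fox jumps over the lazy dog " * 20,
        b"the quick brown fox leaps over the lazy cat " * 20,
        bytes(range(256)) * 4,
    ]
    for row in distance_matrix(samples):
        print(" ".join(f"{d:.2f}" for d in row))
```

The resulting matrix could then be passed to any standard clustering procedure (for example, hierarchical clustering). The method is "blind" in that only the compressor sees the data.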