A “Blind” Approach to Clustering Through Data Compression

CARPENTIERI, Bruno
2013-01-01

Abstract

Data compression, data prediction, data classification, learning, and data mining are all facets of the same (multidimensional) coin. In particular, data compression can be used as a metric for clustering. In this paper we test a clustering method that does not rely on any knowledge or theoretical analysis of the problem domain; it relies only on general-purpose compression techniques. Our experiments on different kinds of digital data show impressive results: the system is versatile and, under appropriate conditions, robust. We present experimental results for clustering digital data that represent heterogeneous data, text in different languages, drugs, cereals, and music.
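As an illustration of how compression can act as a distance for clustering, the sketch below computes the normalized compression distance (NCD) between byte strings and pairs each sample with its closest neighbor. This is only a minimal sketch under stated assumptions: the abstract does not say which distance or compressor the paper uses, so NCD, the zlib compressor, and the helper names `compressed_size`, `ncd`, and `nearest` are illustrative choices, not the author's method.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (zlib is an assumed stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar inputs, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def nearest(name: str, samples: dict[str, bytes]) -> str:
    """Name of the sample closest to `name` under NCD, excluding `name` itself."""
    others = (other for other in samples if other != name)
    return min(others, key=lambda other: ncd(samples[name], samples[other]))


# Hypothetical toy data: two text-like samples and two digit-like samples.
samples = {
    "text_a": b"the quick brown fox jumps over the lazy dog. " * 20,
    "text_b": b"the quick brown fox jumps over the lazy cat. " * 20,
    "digits_a": b"0123456789 9876543210 " * 40,
    "digits_b": b"0123456789 9876543211 " * 40,
}

for name in samples:
    print(name, "->", nearest(name, samples))
```

No domain knowledge enters the computation: the only signal is how much better two samples compress together than apart. A full clustering pipeline would feed the pairwise distances into a standard clustering procedure, which is left out here.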


Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4250091
