IndiBits: Incremental Discovery of Relaxed Functional Dependencies using Bitwise Similarity

Breve, Bernardo; Caruccio, Loredana; Cirillo, Stefano; Deufemia, Vincenzo; Polese, Giuseppe
2023-01-01

Abstract

A central challenge in data profiling is to extract metadata efficiently from dynamic information sources without reprocessing the whole dataset from scratch after every modification. In this paper, we present IndiBits, an algorithm for discovering relaxed functional dependencies (RFDs), which express relationships among data based on approximate matching paradigms. IndiBits dynamically infers and updates the RFDs holding on a dataset as modification operations are performed on it. It exploits a binary representation of data similarities, a new validation method, and specific search methods to update the set of RFDs based on the previously holding RFDs and the type of modification applied to the data. Experimental results demonstrate the effectiveness of IndiBits on real-world datasets, also in comparison with FD and RFD discovery algorithms, in both static and dynamic scenarios.
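
To make the idea of a binary representation of data similarities concrete, the following is a minimal sketch and not the IndiBits algorithm itself, whose details the abstract does not give. It assumes numeric attributes, absolute difference as the distance, and an RFD X -> A that holds when every pair of tuples similar on all attributes of X is also similar on A. All names (`pair_mask`, `holds`, `masks_for_insert`), the toy rows, and the thresholds are hypothetical.

```python
from itertools import combinations

def pair_mask(t1, t2, thresholds):
    """Bit i is set when t1 and t2 differ by at most thresholds[i] on attribute i."""
    mask = 0
    for i, (a, b, thr) in enumerate(zip(t1, t2, thresholds)):
        if abs(a - b) <= thr:
            mask |= 1 << i
    return mask

def all_masks(rows, thresholds):
    """One similarity bitmask per unordered pair of tuples."""
    return [pair_mask(r1, r2, thresholds) for r1, r2 in combinations(rows, 2)]

def holds(masks, lhs_bits, rhs_bit):
    """X -> A holds if every pair similar on all of X is also similar on A."""
    return all((m & lhs_bits) != lhs_bits or (m & rhs_bit) for m in masks)

def masks_for_insert(rows, new_row, thresholds):
    """Only pairs involving the inserted tuple are new; older masks are unchanged."""
    return [pair_mask(r, new_row, thresholds) for r in rows]

# Toy data (assumed): attributes are (age, salary in thousands, rank).
rows = [(30, 50, 1), (31, 52, 1), (45, 90, 3)]
thresholds = (1, 5, 0)
AGE, SALARY = 0b001, 0b010

masks = all_masks(rows, thresholds)
before = holds(masks, AGE, SALARY)          # age -> salary on the initial data

new_row = (30, 80, 2)
new_masks = masks_for_insert(rows, new_row, thresholds)
after = before and holds(new_masks, AGE, SALARY)
print(before, after)
```

Under this definition, an insertion can only invalidate a previously holding RFD through pairs that involve the new tuple, so the sketch checks just those pairs instead of recomputing every mask. How IndiBits handles deletions and updates, and how it searches for new RFDs, is not covered here.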
Year: 2023
ISBN: 979-8-3503-2227-9

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4853654