IndiBits: Incremental Discovery of Relaxed Functional Dependencies using Bitwise Similarity
Breve, Bernardo;Caruccio, Loredana;Cirillo, Stefano;Deufemia, Vincenzo;Polese, Giuseppe
2023-01-01
Abstract
A central challenge in data profiling is extracting metadata efficiently from dynamic information sources without reprocessing the whole dataset from scratch after each modification. In this paper, we present IndiBits, an algorithm for discovering relaxed functional dependencies (RFDs), which represent data relationships based on approximate matching paradigms. IndiBits dynamically infers and updates the RFDs holding on a dataset as modification operations are performed on it. It exploits a binary representation of data similarities, a new validation method, and specific search methods to update the set of RFDs based on the previously holding RFDs and the type of modification performed on the data. Experimental results demonstrate the effectiveness of IndiBits on real-world datasets, including in comparison with FD and RFD discovery algorithms in both static and dynamic scenarios.
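
The abstract does not spell out how the binary representation or the validation method work, so the following is only an illustrative sketch of the general idea, not the IndiBits algorithm itself. It assumes numeric attributes, an absolute-difference distance, and one threshold per attribute; the function names (`pair_bits`, `violates`, `holds`, `still_holds_after_insert`) and the sample data are hypothetical. Each pair of tuples is encoded as a bit vector in which bit i is set when the two tuples are similar on attribute i, and an RFD is checked with bit masks. After an insertion, a previously holding RFD only needs to be re-checked on the pairs that involve the new tuple.

```python
from itertools import combinations


def pair_bits(t1, t2, thresholds):
    """Encode a tuple pair: bit i is set when |t1[i] - t2[i]| <= thresholds[i]."""
    bits = 0
    for i, (a, b, th) in enumerate(zip(t1, t2, thresholds)):
        if abs(a - b) <= th:
            bits |= 1 << i
    return bits


def violates(bits, lhs_mask, rhs_mask):
    """A pair violates LHS -> RHS when it is similar on every LHS attribute
    but not on every RHS attribute."""
    return (bits & lhs_mask) == lhs_mask and (bits & rhs_mask) != rhs_mask


def holds(rows, thresholds, lhs_mask, rhs_mask):
    """Static check: examine every pair of tuples in the dataset."""
    return not any(
        violates(pair_bits(a, b, thresholds), lhs_mask, rhs_mask)
        for a, b in combinations(rows, 2)
    )


def still_holds_after_insert(rows, new_row, thresholds, lhs_mask, rhs_mask):
    """Incremental check for an RFD that held on `rows` before the insertion:
    only the pairs involving `new_row` can introduce a violation."""
    return not any(
        violates(pair_bits(r, new_row, thresholds), lhs_mask, rhs_mask)
        for r in rows
    )


# Hypothetical data: (height, weight, shoe_size), one threshold per attribute.
rows = [(170, 65, 42), (171, 66, 42), (180, 80, 45)]
thresholds = (1, 5, 1)

# Candidate RFD: height (bit 0) -> shoe_size (bit 2).
lhs_mask, rhs_mask = 0b001, 0b100

holds_initially = holds(rows, thresholds, lhs_mask, rhs_mask)
ok_after_close_insert = still_holds_after_insert(
    rows, (181, 79, 45), thresholds, lhs_mask, rhs_mask
)
ok_after_conflicting_insert = still_holds_after_insert(
    rows, (170, 90, 44), thresholds, lhs_mask, rhs_mask
)
```

In this sketch the incremental check compares the new tuple against the n existing tuples instead of re-examining all pairs, which is the kind of saving the abstract refers to when it mentions avoiding processing the dataset from scratch. Deletions and updates would need different handling, which is not shown here.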