Relaxed Functional Dependency Discovery in Incremental Scenarios
Breve B.;Caruccio L.;Cirillo S.;Deufemia V.;Polese G.
2023-01-01
Abstract
Extracting metadata from dynamic data sources is a particularly challenging task in data profiling research, since the inferred metadata must be updated after each modification without reprocessing the whole dataset from scratch. This discussion paper presents IndiBits, an approach for discovering relaxed functional dependencies (rfds for short), which express relationships among data based on approximate matching paradigms. IndiBits exploits a binary representation of data similarities, a new validation method, and specific search methods to dynamically update the set of rfds, based on the previously holding rfds and the type of modification performed on the data. Experimental results on real-world datasets demonstrate the effectiveness of IndiBits, including in comparison with fd and rfd discovery algorithms in both static and dynamic scenarios.
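
To make the notions in the abstract concrete, the following is a minimal illustrative sketch, not the IndiBits algorithm itself. It assumes one common reading of an rfd: whenever two tuples are similar on the left-hand-side attributes (distance within a threshold), they must also be similar on the right-hand-side attribute. Each tuple pair is encoded as a bitmask of per-attribute similarities, and an insertion only requires the masks of pairs involving the new tuple. All names, distance functions, thresholds, and sample rows are assumptions made for illustration.

```python
from itertools import combinations
from typing import Callable, Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def abs_diff(a: float, b: float) -> float:
    return abs(a - b)


def pair_mask(t1: Sequence, t2: Sequence,
              distances: Sequence[Callable], thresholds: Sequence[float]) -> int:
    """Bit i is set when the two tuples are similar on attribute i."""
    mask = 0
    for i, (dist, thr) in enumerate(zip(distances, thresholds)):
        if dist(t1[i], t2[i]) <= thr:
            mask |= 1 << i
    return mask


def all_pair_masks(rows, distances, thresholds) -> list[int]:
    """Similarity masks for every unordered pair of tuples (static case)."""
    return [pair_mask(a, b, distances, thresholds) for a, b in combinations(rows, 2)]


def masks_for_insert(rows, new_row, distances, thresholds) -> list[int]:
    """Only the pairs involving the inserted tuple need new masks."""
    return [pair_mask(old, new_row, distances, thresholds) for old in rows]


def holds(masks: Sequence[int], lhs: int, rhs: int) -> bool:
    """The rfd lhs -> rhs holds if every pair similar on all lhs bits
    is also similar on the rhs bit."""
    return all((m & rhs) != 0 for m in masks if (m & lhs) == lhs)


# Hypothetical sample data: attributes are (name, age, city).
DISTANCES = [edit_distance, abs_diff, edit_distance]
THRESHOLDS = [1, 2, 0]
NAME, AGE, CITY = 0b001, 0b010, 0b100

rows = [("Smith", 30, "Rome"), ("Smyth", 31, "Rome"), ("Jones", 55, "Milan")]
masks = all_pair_masks(rows, DISTANCES, THRESHOLDS)
holds_before = holds(masks, NAME, CITY)

# Insert a tuple and update the masks without recomputing the old pairs.
new_row = ("Smith", 30, "Turin")
new_masks = masks_for_insert(rows, new_row, DISTANCES, THRESHOLDS)
holds_after = holds(masks + new_masks, NAME, CITY)
```

In this sketch the dependency from name to city holds on the initial rows and is invalidated by the inserted tuple, while only three new pair masks are computed. A real incremental discovery approach would additionally have to handle deletions and updates and to search for the new minimal rfds, which this example does not attempt.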