Dependency Visualization in Data Stream Profiling
Breve B.; Caruccio L.; Cirillo S.; Deufemia V.; Polese G.
2021-01-01
Abstract
Data stream profiling concerns the automatic extraction of metadata from a data stream, without the possibility of storing the stream itself. Among the metadata of interest, functional dependencies (FDs) and their extensions, relaxed functional dependencies (RFDs), represent important semantic properties of data. Many algorithms now exist for automatically discovering them from static datasets, and some are being proposed for data streams. However, one of the main problems is that the streaming nature of the data requires a different monitoring paradigm, since the large number of (R)FDs that might hold on a given dataset changes continuously as new data are read from the stream. In this paper, we present a tool for visualizing RFDs discovered from a data stream. The tool allows users to explore results for different types of RFDs, and uses quantitative measures to monitor how discovery results evolve. Moreover, the tool enables the comparison of RFDs discovered across several executions, and provides visual manipulation operators to dynamically compose and filter results. A user study has been conducted to assess the effectiveness of the proposed visualization tool.