Functional dependencies (fds) and their extensions, relaxed functional dependencies (rfds), represent important semantic properties of data. They have been widely used over the years for several advanced database operations. Thanks to the availability of discovery algorithms for inferring them from data, in recent years (relaxed) fds have been exploited in many new application contexts, including data cleansing and query relaxation. One of the main problems in this context is the potentially large number of rfds that might hold on a given dataset, which can make it difficult for a user to gain insights from them. At the same time, a challenge that has recently arisen is monitoring how dependencies change during discovery processes run over data streams, also known as continuous discovery processes. To this end, in this paper we present a tool for visualizing the evolution of discovered rfds during continuous discovery processes. It allows users to analyze detailed results for different types of rfds, and uses quantitative measures to monitor how discovery results evolve. Finally, to facilitate the analysis of results in long discovery processes, the tool enables the comparison of rfds holding in different time-slots. The effectiveness of the proposed tool has been evaluated in a case study focused on dependencies discovered from streams of data associated with tweets posted on the Twitter social network.
Visualizing dependencies during incremental discovery processes
Breve B.;Caruccio L.;Cirillo S.;Deufemia V.;Polese G.
2020-01-01