UniSa - IRIS Institutional Research Information System

Functional dependencies (FDs) are one of the metadata used to assess data quality and to perform data cleaning operations. However, in order to pursue robustness with respect to data errors, it has been necessary to devise imprecise versions of functional dependencies, yielding relaxed functional dependencies (RFDs). Among them, there exists the class of RFDs relaxing on the extent, i.e., those admitting the possibility that an FD holds on a subset of data. In the literature several algorithms to automatically discover RFDs from big data collections have been defined. They achieve good performances with respect to the inherent problem complexity. However, most of them are capable of discovering RFDs only by batch processing the entire dataset. This is not suitable in the era of big data, where the size of a database instance can grow with high-velocity, and the insertion of new data can invalidate previously holding RFDs. Thus, it is necessary to devise incremental discovery algorithms capable of updating the set of holding RFDs upon data insertions, without processing the entire dataset. To this end, in this paper we propose an incremental discovery algorithm for RFDs relaxing on the extent. It manages the validation of candidate RFDs and the generation of possibly new RFD candidates upon the insertion of the new tuples, while limiting the size of the overall search space. Experimental results show that the proposed algorithm achieves extremely good performances on real-world datasets.

Incremental Discovery of Imprecise Functional Dependencies

Caruccio, Loredana;Cirillo, Stefano

2020

Abstract

Functional dependencies (FDs) are one of the metadata used to assess data quality and to perform data cleaning operations. However, in order to pursue robustness with respect to data errors, it has been necessary to devise imprecise versions of functional dependencies, yielding relaxed functional dependencies (RFDs). Among them, there exists the class of RFDs relaxing on the extent, i.e., those admitting the possibility that an FD holds on a subset of data. In the literature several algorithms to automatically discover RFDs from big data collections have been defined. They achieve good performances with respect to the inherent problem complexity. However, most of them are capable of discovering RFDs only by batch processing the entire dataset. This is not suitable in the era of big data, where the size of a database instance can grow with high-velocity, and the insertion of new data can invalidate previously holding RFDs. Thus, it is necessary to devise incremental discovery algorithms capable of updating the set of holding RFDs upon data insertions, without processing the entire dataset. To this end, in this paper we propose an incremental discovery algorithm for RFDs relaxing on the extent. It manages the validation of candidate RFDs and the generation of possibly new RFD candidates upon the insertion of the new tuples, while limiting the size of the overall search space. Experimental results show that the proposed algorithm achieves extremely good performances on real-world datasets.

Scheda breve

Scheda completa

Scheda completa (DC)

Anno

2020

Appare nelle tipologie:

1.1.1 Articolo su rivista con DOI

File in questo prodotto:

File	Dimensione	Formato
Incremental Discovery of Imprecise Functional Dependencies.pdf accesso aperto Descrizione: post-print Tipologia: Documento in Post-print (versione successiva alla peer review e accettata per la pubblicazione) Licenza: DRM non definito Dimensione 1.33 MB Formato Adobe PDF Visualizza/Apri	1.33 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11386/4748740

Citazioni

ND

27

ND

social impact