Discovering Relaxed Functional Dependencies based on Multi-attribute Dominance

Loredana Caruccio;Vincenzo Deufemia;Giuseppe Polese
2020-01-01

Abstract

With the advent of big data and data lakes, data are often integrated from multiple sources. Such integrated data are frequently of poor quality because of inconsistencies, errors, and similar defects. One way to check data quality is to infer functional dependencies (FDs). However, many modern applications need to extract properties and relationships that FDs cannot capture, either because exceptions must be admitted or because data values must be compared by similarity rather than equality. Relaxed FDs (RFDs) have been introduced to meet these needs, but discovering them from data adds further complexity to an already complex problem, in part because similarity and validity thresholds must be specified. We propose DOMINO, a new discovery algorithm for RFDs that exploits the concept of dominance to derive similarity thresholds on attribute values while inferring RFDs. An experimental evaluation on real datasets demonstrates the discovery performance and effectiveness of the proposed algorithm.
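
To make the idea of comparing values by similarity rather than equality concrete, the following is a minimal sketch of checking whether a candidate RFD holds on a small table under given similarity thresholds. It is not the DOMINO algorithm and does not discover thresholds. The function name `holds_rfd`, the sample rows, and the use of absolute difference as the distance between numeric values are all assumptions made for illustration.

```python
from itertools import combinations


def holds_rfd(rows, lhs_thresholds, rhs_attr, rhs_threshold):
    """Check a candidate relaxed FD on a list of dict rows (hypothetical helper).

    The dependency holds if every pair of rows whose distance is within the
    threshold on each left-hand-side attribute is also within rhs_threshold
    on the right-hand-side attribute. Distance is assumed to be the absolute
    difference of numeric values.
    """
    for a, b in combinations(rows, 2):
        similar_on_lhs = all(
            abs(a[attr] - b[attr]) <= limit
            for attr, limit in lhs_thresholds.items()
        )
        if similar_on_lhs and abs(a[rhs_attr] - b[rhs_attr]) > rhs_threshold:
            return False  # this pair violates the dependency
    return True


# Illustrative data only, not taken from the paper's datasets.
rows = [
    {"height": 180, "shoe": 44},
    {"height": 180, "shoe": 45},
    {"height": 165, "shoe": 38},
    {"height": 166, "shoe": 39},
]

# Thresholds of 0 reduce the check to a classical FD height -> shoe.
exact = holds_rfd(rows, {"height": 0}, "shoe", 0)

# Tolerating a difference of 1 on both sides relaxes the dependency.
relaxed = holds_rfd(rows, {"height": 1}, "shoe", 1)
```

In this sketch the classical FD fails because two rows share a height but differ in shoe size, while the relaxed version holds once a tolerance of 1 is allowed on both sides. Admitting a bounded number of exceptions (a validity threshold) is not covered here.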

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4733595