UniSa - IRIS Institutional Research Information System

Decentralized and Incremental Discovery of Relaxed Functional Dependencies Using Bitwise Similarity

Breve B.; Caruccio L.; Cirillo S.; Deufemia V.; Polese G.

2024

Abstract

Over the past decade, there have been numerous extensions to the definition of Functional Dependency (FD), culminating in the introduction of the Relaxed Functional Dependency (RFD), which offers more flexible constraints than traditional FDs. This increased flexibility makes RFDs well-suited for exploring and profiling datasets with lower data quality. However, efficiently identifying RFDs within dynamic data sources presents a significant challenge, as it requires processing the entire dataset from scratch whenever modifications occur. To tackle this problem, incremental discovery algorithms have been defined, but their performance often degrades as the frequency and the size of update batches increase. This paper presents a new algorithm, namely D-INDIBITS, which relies on a new decentralized architecture to balance the workload that drives the incremental discovery process of INDIBITS, an algorithm based on bitwise operators for computing attribute similarities. Experiments demonstrate D-INDIBITS's effectiveness compared to FD and RFD discovery algorithms on both static and dynamic real-world data. With batches of modifications of sizes 10k and 100k, D-INDIBITS is capable of updating the set of RFDs in a few seconds, whereas all other approaches often take more than 3 hours.

Year

2024

Appears in types:

1.1.1 Journal article with DOI

Files in this item:

File	Size	Format
TKDE-2024.pdf (open access; type: publisher's version, published with the publisher's layout; license: Creative Commons)	3.8 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4869753
