Many modern application contexts, especially those related to the semantic Web, advocate for automatic techniques capable of extracting relationships between semi-structured data, for several purposes, such as the identification of inconsistencies or patterns of semantically related data, query rewriting, and so forth. One way to represent such relationships is to use relaxed functional dependencies (rfds), since they can embed approximate matching paradigms to compare unstructured data, and admit the possibility of exceptions for them. To this end, thresholds might need to be specified in order to limit the similarity degree in approximate comparisons or the occurrence of exceptions. Thanks to the availability of huge amount of data, including unstructured data available on the Web, nowadays it is possible to automatically discover rfds from data. However, due to the many different combinations of similarity and exception thresholds, the discovery process has an exponential complexity. Thus, it is vital devising proper optimization strategies, in order to make the discovery process feasible. To this end, in this paper, we propose a genetic algorithm to discover rfds from data, also providing an empirical evaluation demonstrating its effectiveness.

Evolutionary mining of relaxed dependencies from big data collections

CARUCCIO, LOREDANA;DEUFEMIA, Vincenzo;POLESE, Giuseppe
2017

Abstract

Many modern application contexts, especially those related to the semantic Web, advocate for automatic techniques capable of extracting relationships between semi-structured data, for several purposes, such as the identification of inconsistencies or patterns of semantically related data, query rewriting, and so forth. One way to represent such relationships is to use relaxed functional dependencies (rfds), since they can embed approximate matching paradigms to compare unstructured data, and admit the possibility of exceptions for them. To this end, thresholds might need to be specified in order to limit the similarity degree in approximate comparisons or the occurrence of exceptions. Thanks to the availability of huge amount of data, including unstructured data available on the Web, nowadays it is possible to automatically discover rfds from data. However, due to the many different combinations of similarity and exception thresholds, the discovery process has an exponential complexity. Thus, it is vital devising proper optimization strategies, in order to make the discovery process feasible. To this end, in this paper, we propose a genetic algorithm to discover rfds from data, also providing an empirical evaluation demonstrating its effectiveness.
2017
9781450352253
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11386/4691706
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 16
  • ???jsp.display-item.citation.isi??? ND
social impact