UniSa - IRIS Institutional Research Information System

Evolutionary mining of relaxed dependencies from big data collections

Caruccio, Loredana; Deufemia, Vincenzo; Polese, Giuseppe

2017

Abstract

Many modern application contexts, especially those related to the Semantic Web, call for automatic techniques capable of extracting relationships among semi-structured data for several purposes, such as identifying inconsistencies or patterns of semantically related data, query rewriting, and so forth. One way to represent such relationships is to use relaxed functional dependencies (rfds), since they can embed approximate matching paradigms to compare unstructured data and admit the possibility of exceptions. To this end, thresholds might need to be specified to limit the similarity degree in approximate comparisons or the occurrence of exceptions. Thanks to the availability of huge amounts of data, including unstructured data available on the Web, it is now possible to automatically discover rfds from data. However, due to the many possible combinations of similarity and exception thresholds, the discovery process has exponential complexity. Thus, it is vital to devise proper optimization strategies to make the discovery process feasible. In this paper, we propose a genetic algorithm to discover rfds from data and provide an empirical evaluation demonstrating its effectiveness.
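To make the roles of the two kinds of thresholds concrete, the following is a minimal sketch of how a single candidate rfd could be checked against a small table. It is not the paper's algorithm: the helper names (`similar`, `rfd_holds`), the use of `difflib.SequenceMatcher` as the similarity function, and the pair-based violation ratio as the exception measure are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a: str, b: str, threshold: float) -> bool:
    """Hypothetical helper: True when the similarity ratio reaches the threshold."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def rfd_holds(rows, lhs, rhs, lhs_threshold, rhs_threshold, max_violation_ratio):
    """Check a relaxed dependency lhs -> rhs over a list of dict rows.

    Assumption: a pair of rows violates the dependency when the rows are
    similar on lhs but not similar on rhs, and the dependency holds when the
    share of violating pairs does not exceed max_violation_ratio.
    """
    pairs = list(combinations(rows, 2))
    if not pairs:
        return True
    violations = sum(
        1
        for r1, r2 in pairs
        if similar(r1[lhs], r2[lhs], lhs_threshold)
        and not similar(r1[rhs], r2[rhs], rhs_threshold)
    )
    return violations / len(pairs) <= max_violation_ratio


# Illustrative data only: two near-duplicate names that disagree on the city.
rows = [
    {"name": "John Smith", "city": "Salerno"},
    {"name": "Jon Smith", "city": "Milan"},
    {"name": "Mary Jones", "city": "Rome"},
]

strict = rfd_holds(rows, "name", "city", 0.8, 0.8, max_violation_ratio=0.0)
tolerant = rfd_holds(rows, "name", "city", 0.8, 0.8, max_violation_ratio=0.5)
```

Under these assumptions, the same candidate dependency is rejected when no exceptions are tolerated and accepted when up to half of the row pairs may violate it. Each choice of similarity and exception thresholds yields a different candidate, which illustrates why the search space grows so quickly.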

Year: 2017
ISBN: 9781450352253
Publication type: 4.1.1 Proceedings with DOI

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4691706
