A genetic algorithm to discover relaxed functional dependencies from data

Caruccio, Loredana; Deufemia, Vincenzo; Polese, Giuseppe

Approximate functional dependencies are used in many emerging application domains, such as the identification of data inconsistencies or patterns of semantically related data, query rewriting, and so forth. They approximate the canonical definition of functional dependency (FD) by relaxing the data comparison (i.e., by considering data similarity rather than equality), the extent (i.e., by admitting the possibility that the dependency holds on only a subset of the data), or both. Unlike canonical FDs, approximate FDs are difficult to identify at design time. In this paper, we propose a genetic algorithm to discover approximate FDs from data. An empirical evaluation demonstrates the effectiveness of the algorithm.
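
The two kinds of relaxation described above can be illustrated with a minimal sketch. It is not the genetic algorithm proposed in the paper, and everything in it is an assumption made for illustration: the function names (`similar`, `approx_fd_coverage`, `holds`), the use of Python's `difflib` ratio as the similarity measure, and the pairwise definition of coverage. A similarity threshold below 1.0 relaxes the data comparison, while a minimum coverage below 1.0 relaxes the extent.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold):
    """Return True when two values are at least `threshold` similar.

    Assumption: similarity is difflib's ratio on the string forms;
    a threshold of 1.0 reduces to plain equality of those strings.
    """
    return SequenceMatcher(None, str(a), str(b)).ratio() >= threshold


def approx_fd_coverage(rows, lhs, rhs, threshold=1.0):
    """Fraction of row pairs similar on `lhs` that are also similar on `rhs`.

    Returns 1.0 when no pair is similar on `lhs` (the dependency holds vacuously).
    """
    matched = satisfied = 0
    for r1, r2 in combinations(rows, 2):
        if all(similar(r1[a], r2[a], threshold) for a in lhs):
            matched += 1
            if all(similar(r1[a], r2[a], threshold) for a in rhs):
                satisfied += 1
    return satisfied / matched if matched else 1.0


def holds(rows, lhs, rhs, threshold=1.0, min_coverage=1.0):
    """Check lhs -> rhs under a similarity threshold and a minimum coverage."""
    return approx_fd_coverage(rows, lhs, rhs, threshold) >= min_coverage


# Hypothetical sample data, invented for this illustration.
rows = [
    {"city": "Salerno", "zip": "84100"},
    {"city": "Salerno", "zip": "84100"},
    {"city": "Salerno", "zip": "84121"},
    {"city": "Milano", "zip": "20100"},
]

# Canonical FD (equality, full extent): violated by the third row.
print(holds(rows, ["city"], ["zip"]))
# Relaxed comparison: zip codes that are at least 50% similar count as matching.
print(holds(rows, ["city"], ["zip"], threshold=0.5))
```

With `threshold=1.0` and `min_coverage=1.0` the check reduces to a canonical FD test on the string forms of the values; lowering either parameter admits dependencies that hold only approximately.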