Learning Effective Query Management Strategies from Big Data

Loredana Caruccio; Vincenzo Deufemia; Giuseppe Polese
2017-01-01

Abstract

The availability of big data collections, together with powerful hardware and software mechanisms to process them, now makes it possible to learn useful insights from data, which can be exploited for multiple purposes, including marketing and fault prevention. It is also possible to learn important metadata that suggest how data should be manipulated in several advanced operations. In this paper, we show the potential of learning from data by focusing on the problem of relaxing the results of database queries, that is, returning an approximate answer to a query when no exact result for it is available in the database and the system would otherwise return an empty answer set or, even worse, erroneous mismatched results. In particular, we introduce a novel approach to rewrite queries that are in disjunctive normal form and contain a mixture of discrete and continuous attributes. The approach preprocesses data collections to discover the implicit relationships that exist among the various domain attributes, and then uses this knowledge to rewrite the constraints of the failing query. In a first step, the approach learns a set of functional dependencies from the data; these are ranked according to dedicated mechanisms that subsequently make it possible to predict the order in which the extracted dependencies should be used to properly rewrite the failing query. An experimental evaluation of the approach on three real data sets shows its effectiveness in terms of robustness and coverage.
2017
978-1-5386-1417-4

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4703669