Learning Effective Query Management Strategies from Big Data
Loredana Caruccio; Vincenzo Deufemia; Giuseppe Polese
2017
Abstract
The availability of big data collections, together with powerful hardware and software for processing them, now makes it possible to learn useful insights from data, which can be exploited for multiple purposes, including marketing and fault prevention. It is also possible, however, to learn important metadata that suggest how data should be manipulated in several advanced operations. In this paper, we show the potential of learning from data by focusing on the problem of relaxing database queries, that is, returning an approximate answer to a query when no exact result is available in the database and the system would otherwise return an empty answer set or, even worse, erroneous mismatched results. In particular, we introduce a novel approach to rewriting queries that are in disjunctive normal form and contain a mixture of discrete and continuous attributes. The approach preprocesses data collections to discover the implicit relationships that exist among the domain attributes, and then uses this knowledge to rewrite the constraints of the failing query. As a first step, the approach learns a set of functional dependencies from the data; these are ranked by special mechanisms, and the ranking is later used to predict the order in which the extracted dependencies should be applied to rewrite the failing query. An experimental evaluation of the approach on three real data sets shows its effectiveness in terms of robustness and coverage.
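To make the idea of rewriting a failing query through a learned dependency more concrete, the following is a minimal Python sketch. It is not the authors' algorithm: the toy relation, the attribute names, and the helper functions `fd_holds`, `select`, and `relax` are all hypothetical, and the sketch handles only a single conjunction of equality constraints on discrete attributes, with no ranking of dependencies. It assumes that, when a functional dependency lhs -> rhs holds in the data, a constraint on lhs can be replaced by the weaker constraint on the rhs value it implies.

```python
# Toy relation; attribute names and values are invented for illustration.
ROWS = [
    {"model": "A1", "brand": "Acme", "price": 100},
    {"model": "A2", "brand": "Acme", "price": 150},
    {"model": "B1", "brand": "Bolt", "price": 120},
]


def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen = {}
    for row in rows:
        key = row[lhs]
        if key in seen and seen[key] != row[rhs]:
            return False  # same lhs value maps to two different rhs values
        seen[key] = row[rhs]
    return True


def select(rows, constraints):
    """Evaluate a conjunction of equality constraints over rows."""
    return [r for r in rows if all(r[a] == v for a, v in constraints.items())]


def relax(rows, constraints, lhs, rhs):
    """Rewrite a failing query by replacing its lhs constraint with the
    rhs value implied by the dependency lhs -> rhs (hypothetical rule)."""
    if select(rows, constraints) or lhs not in constraints:
        return constraints  # query already succeeds, or nothing to rewrite
    if not fd_holds(rows, lhs, rhs):
        return constraints  # dependency does not hold in this data
    implied = {r[rhs] for r in rows if r[lhs] == constraints[lhs]}
    if len(implied) != 1:
        return constraints  # lhs value absent from the data
    relaxed = {a: v for a, v in constraints.items() if a != lhs}
    # Keep an existing constraint on rhs if the query already has one.
    relaxed.setdefault(rhs, implied.pop())
    return relaxed


query = {"model": "A1", "price": 150}          # no row matches
relaxed = relax(ROWS, query, "model", "brand")  # uses model -> brand
answers = select(ROWS, relaxed)
```

Because every row matching the original lhs constraint also satisfies the implied rhs constraint, the rewritten query can only return a superset of the original answers, which is what makes the rewrite a relaxation in this simplified setting.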