A missing value represents a piece of incomplete information that might appear in database instances. Data imputation is the problem of filling missing values by means of consistent data with respect to the semantic of the entire database instance they belong to. To overcome the complexity of considering all possible candidates for each missing value, heuristic methods have become popular to enhance execution times, while keeping high accuracy. This paper presents RENUVER, a new data imputation algorithm relying on relaxed functional dependencies RFDs for identifying value candidates best guaranteeing the integrity of data. More specifically, the RENUVER imputation process focuses on the fds involving the attribute whose value is missing. In particular, they are used to guide the selection of best candidate tuples from which to take values for imputing a missing value, and to evaluate the semantic consistency of the imputed missing values. Experimental results on real-world datasets highlighted the effectiveness of RENUVER in terms of both filling accuracy and execution times, also compared to other well-known missing value imputation approaches.
RENUVER: A Missing Value Imputation Algorithm based on Relaxed Functional Dependencies
Bernardo Breve
;Loredana Caruccio
;Vincenzo Deufemia;Giuseppe Polese
2022-01-01
Abstract
A missing value represents a piece of incomplete information that might appear in database instances. Data imputation is the problem of filling missing values by means of consistent data with respect to the semantic of the entire database instance they belong to. To overcome the complexity of considering all possible candidates for each missing value, heuristic methods have become popular to enhance execution times, while keeping high accuracy. This paper presents RENUVER, a new data imputation algorithm relying on relaxed functional dependencies RFDs for identifying value candidates best guaranteeing the integrity of data. More specifically, the RENUVER imputation process focuses on the fds involving the attribute whose value is missing. In particular, they are used to guide the selection of best candidate tuples from which to take values for imputing a missing value, and to evaluate the semantic consistency of the imputed missing values. Experimental results on real-world datasets highlighted the effectiveness of RENUVER in terms of both filling accuracy and execution times, also compared to other well-known missing value imputation approaches.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.