UniSa - IRIS Institutional Research Information System

RENUVER: A Missing Value Imputation Algorithm based on Relaxed Functional Dependencies

Bernardo Breve;Loredana Caruccio;Vincenzo Deufemia;Giuseppe Polese

2022

Abstract

A missing value represents a piece of incomplete information that might appear in a database instance. Data imputation is the problem of filling missing values with data that is consistent with the semantics of the entire database instance they belong to. To avoid the cost of considering all possible candidates for each missing value, heuristic methods have become popular for reducing execution times while keeping accuracy high. This paper presents RENUVER, a new data imputation algorithm that relies on relaxed functional dependencies (RFDs) to identify the candidate values that best preserve data integrity. More specifically, the RENUVER imputation process focuses on the RFDs involving the attribute whose value is missing. These RFDs are used to guide the selection of the best candidate tuples from which to take values for imputing a missing value, and to evaluate the semantic consistency of the imputed values. Experimental results on real-world datasets highlight the effectiveness of RENUVER in terms of both filling accuracy and execution time, also in comparison with other well-known missing value imputation approaches.
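The general idea of dependency-guided imputation can be illustrated with a minimal sketch. This is not the RENUVER implementation: the abstract does not give the algorithm's details, so the RFD representation (per-attribute distance thresholds), the numeric-only rows, and the helper names `lhs_distance`, `is_consistent`, and `impute` are all assumptions made for illustration. The sketch only mirrors the two roles the abstract assigns to dependencies: selecting candidate tuples and checking the consistency of an imputed value.

```python
from typing import Optional

# Hypothetical representation (an assumption, not taken from the paper):
# an RFD is (LHS thresholds, RHS attribute, RHS threshold). Two rows that differ
# by at most the threshold on every LHS attribute are expected to differ by at
# most the RHS threshold on the RHS attribute.
RFD = tuple[dict[str, float], str, float]
Row = dict[str, Optional[float]]


def lhs_distance(row: Row, other: Row, lhs: dict[str, float]) -> Optional[float]:
    """Return the summed LHS distance if every threshold is met, else None."""
    total = 0.0
    for attr, threshold in lhs.items():
        if row[attr] is None or other[attr] is None:
            return None
        d = abs(row[attr] - other[attr])
        if d > threshold:
            return None
        total += d
    return total


def is_consistent(rows: list[Row], rfds: list[RFD]) -> bool:
    """Check that no pair of rows with known values violates any RFD."""
    for lhs, rhs, rhs_threshold in rfds:
        for i, r in enumerate(rows):
            for s in rows[i + 1:]:
                if r[rhs] is None or s[rhs] is None:
                    continue
                similar = lhs_distance(r, s, lhs) is not None
                if similar and abs(r[rhs] - s[rhs]) > rhs_threshold:
                    return False
    return True


def impute(rows: list[Row], index: int, attr: str, rfds: list[RFD]) -> Optional[float]:
    """Pick a value for rows[index][attr] from the closest consistent candidate."""
    target = rows[index]
    candidates = []
    # Only dependencies whose RHS is the missing attribute guide the search.
    for lhs, rhs, _ in rfds:
        if rhs != attr:
            continue
        for j, other in enumerate(rows):
            if j == index or other[attr] is None:
                continue
            d = lhs_distance(target, other, lhs)
            if d is not None:
                candidates.append((d, other[attr]))
    # Try candidates from most to least similar; keep the first consistent one.
    for _, value in sorted(candidates):
        trial = rows[:index] + [{**target, attr: value}] + rows[index + 1:]
        if is_consistent(trial, rfds):
            return value
    return None  # no candidate found: the value stays missing


# Made-up toy data, for illustration only.
rows = [
    {"age": 30.0, "salary": 50.0},
    {"age": 31.0, "salary": 52.0},
    {"age": 60.0, "salary": 90.0},
    {"age": 30.5, "salary": None},
]
rfds = [({"age": 1.0}, "salary", 5.0)]
imputed = impute(rows, 3, "salary", rfds)
```

In this toy run only the first two rows fall within the `age` threshold of the incomplete row, so only their `salary` values are considered as candidates. Returning `None` when no candidate passes the consistency check is a design choice of the sketch, which leaves a value missing rather than filling it with inconsistent data.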

Year

2022

Appears in types:

4.1 Contributions in conference proceedings

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4781180
