A semi-automatic data integration process of heterogeneous databases

Barbella, Marcello; Tortora, Genoveffa
2023-01-01

Abstract

Integrating data from various sources is one of the most difficult problems today, which creates a need for automatic Data Integration (DI) methods. The literature offers fully automatic and semi-automatic DI techniques, but they require the involvement of IT experts with specific domain skills. In this paper we present a novel DI methodology that does not require the involvement of IT experts. It merges syntactically or semantically similar entities present in the sources by exploiting an information retrieval technique, a clustering method, and a trained neural network. Although the proposed process is completely automated, we plan some interactions with the Company Manager, a figure who is not required to have IT skills and whose only contribution is to define limits and tolerance thresholds during the DI process, based on the interests of the company. In validation, the proposed approach achieved an integration accuracy between 99% and 100%. © 2023 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4857652