Data warehouse loading and refreshment is typically performed by means of complex software processes called extraction–transformation–loading (ETL). In this paper, we propose a system based on a suite of visual languages for mastering several aspects of the ETL development process, turning it into a visual programming task. The approach can be easily generalized and applied to other data integration contexts beyond data warehouses. It introduces two new visual languages that are used to specify the ETL process, which can also be represented by means of UML activity diagrams. In particular, the first visual language supports data manipulation activities, whereas the second one provides traceability information of attributes to highlight the impact of potential transformations on integrated schemas depending on them. Once the whole ETL process has been visually specified, the designer might invoke the automatic generation of an activity diagram representing a possible orchestration of it based on its dependencies. The designer can edit such a diagram to modify the proposed orchestration provided that changes do not alter data dependencies. The final specification can be translated into code that is executable on the data sources. Finally, the effec- tiveness of the proposed approach has been validated through a user study in which we have compared the effort needed to design an ETL process in our approach with respect to the one required with main visual approaches described in the literature.

A visual language-based system for extraction–transformation–loading development

DEUFEMIA, Vincenzo;GIORDANO, MASSIMILIANO;POLESE, Giuseppe;TORTORA, Genoveffa
2014

Abstract

Data warehouse loading and refreshment is typically performed by means of complex software processes called extraction–transformation–loading (ETL). In this paper, we propose a system based on a suite of visual languages for mastering several aspects of the ETL development process, turning it into a visual programming task. The approach can be easily generalized and applied to other data integration contexts beyond data warehouses. It introduces two new visual languages that are used to specify the ETL process, which can also be represented by means of UML activity diagrams. In particular, the first visual language supports data manipulation activities, whereas the second one provides traceability information of attributes to highlight the impact of potential transformations on integrated schemas depending on them. Once the whole ETL process has been visually specified, the designer might invoke the automatic generation of an activity diagram representing a possible orchestration of it based on its dependencies. The designer can edit such a diagram to modify the proposed orchestration provided that changes do not alter data dependencies. The final specification can be translated into code that is executable on the data sources. Finally, the effec- tiveness of the proposed approach has been validated through a user study in which we have compared the effort needed to design an ETL process in our approach with respect to the one required with main visual approaches described in the literature.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11386/4195304
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 14
  • ???jsp.display-item.citation.isi??? 8
social impact