A novel IVS procedure for handling big data with artificial neural networks

Carratù M.; Liguori C.; Pietrosanto A.
2020-01-01

Abstract

In recent years, the availability of large quantities of data from industrial processes has enabled the development of several data-driven techniques, such as Principal Component Regression, Support Vector Machines, Artificial Neural Networks, and Neuro-Fuzzy Systems. For all of these techniques, the underlying data should be analyzed to find the correlations and dependencies that can improve their design. For this reason, the Input Variable Selection (IVS) process has attracted considerable interest recently. Classical IVS relies on classical statistics, such as the Pearson correlation coefficient, which can only detect linear dependencies among data. Today, given the large amount of data available, detecting non-linear dependencies as well has become necessary, particularly for the design and development of neural networks. This paper proposes using a novel statistical tool, the Maximal Information Coefficient (MIC), to build an IVS procedure that discovers dependencies in large datasets and guides the designer in selecting the input variables of a data-driven application. As a case study, the procedure is applied to a real application from the Swedish forest industry: choosing the input variables of a neural network that estimates the volume of timber bundles, a parameter that is expensive to measure in this context.
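To illustrate why a Pearson-based IVS can miss useful inputs, here is a minimal Python sketch under stated assumptions; it is not the authors' procedure. `mic_approx` is a simplified, hypothetical stand-in for MIC: it maximizes normalized mutual information over grids of equal-frequency bins with kx·ky ≤ n^0.6, whereas the original MIC also optimizes where the bin boundaries fall. The function names, the 0.5 selection threshold in `rank_inputs`, and the candidate names are illustrative assumptions.

```python
import math
import random
from collections import Counter


def pearson(x, y):
    """Sample Pearson correlation: detects linear dependence only."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def equal_freq_bins(values, k):
    """Assign each sample to one of k equal-frequency bins by its rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * k // n
    return bins


def mutual_information(bx, by):
    """Plug-in mutual information (in bits) of two discrete label sequences."""
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # p(a,b) * log2(p(a,b) / (p(a) p(b))), with counts: c*n / (px*py)
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mic_approx(x, y, alpha=0.6):
    """Simplified MIC-like score in [0, 1]: best normalized MI over grid sizes.

    Assumption: bins are equal-frequency and are NOT optimized, unlike true MIC.
    """
    budget = max(4, int(len(x) ** alpha))  # grid budget kx * ky <= n**alpha
    ks = range(2, budget // 2 + 1)
    bins_x = {k: equal_freq_bins(x, k) for k in ks}
    bins_y = {k: equal_freq_bins(y, k) for k in ks}
    best = 0.0
    for kx in ks:
        for ky in range(2, budget // kx + 1):
            mi = mutual_information(bins_x[kx], bins_y[ky])
            best = max(best, mi / math.log2(min(kx, ky)))
    return best


def rank_inputs(candidates, target, threshold=0.5):
    """Hypothetical IVS step: score each candidate input against the target,
    rank by the MIC-like score, and keep those above an illustrative threshold."""
    scores = {name: (pearson(v, target), mic_approx(v, target))
              for name, v in candidates.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1][1], reverse=True)
    selected = [name for name, (_, m) in ranked if m > threshold]
    return ranked, selected


if __name__ == "__main__":
    n = 400
    rng = random.Random(0)
    grid = [-1 + 2 * i / (n - 1) for i in range(n)]  # symmetric inputs in [-1, 1]
    target = [v * v for v in grid]                   # quadratic, non-linear target
    candidates = {
        "x_quadratic": grid,                          # non-linear relation
        "x_linear": [3 * t + 1 for t in target],      # linear relation
        "x_noise": [rng.uniform(-1, 1) for _ in range(n)],  # independent input
    }
    ranked, selected = rank_inputs(candidates, target)
    for name, (r, m) in ranked:
        print(f"{name:12s} pearson={r:+.3f} mic~{m:.3f}")
    print("selected:", selected)
```

On the symmetric grid with a quadratic target, the Pearson coefficient of `x_quadratic` is essentially zero while its MIC-like score reaches 1, so a Pearson-only selection would discard an input that an MIC-based selection keeps; the linear input scores high on both measures, and the independent noise scores low on both.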
2020
ISBN: 978-1-7281-4460-3
Files in this item:
No files are associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4752381
Warning: the data displayed have not been validated by the university.

Citations
  • PubMed Central: n/a
  • Scopus: 5
  • Web of Science: n/a