Bootstrapping partial least squares structural equation modelling with massive data

Lamberti, Giuseppe; La Rocca, Michele
2025

Abstract

New scenarios in data analysis call for new methods capable of fully exploiting the potential of the available data. In the specific context of massive data, this means tackling two principal issues: the computational cost of applying a method, since convergence must be fast, and a level of accuracy that ensures precise estimation. We explore the applicability of the partial least squares structural equation modelling (PLS-SEM) algorithm to a massive data context. Considering the classical bootstrap procedure used to validate model coefficients, we show that bootstrapping becomes very expensive computationally once a sample size becomes massive. We consequently adapted the subsampled double bootstrap (SDB) algorithm to the PLS-SEM context to reduce the computational cost without sacrificing accuracy in confidence interval estimates, using a straightforward procedure that is easy to implement. The accuracy and the speed of convergence of the SDB PLS-SEM are demonstrated in simulation studies, and the new method is successfully tested with a model of European internet use.
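To make the subsampling idea concrete, here is a minimal Python sketch of a subsampled double bootstrap confidence interval. It is an illustration under stated assumptions, not the authors' implementation: a weighted least-squares slope stands in for a PLS-SEM path coefficient, and the names `weighted_slope` and `sdb_ci`, the subset size `b = n ** 0.7`, and the number of subsets are hypothetical choices. Each iteration draws a small subset without replacement, draws a single size-n resample from it (represented by multinomial counts, so only `b` rows are ever processed), and records the difference between the resample estimate and the subset estimate.

```python
import numpy as np


def weighted_slope(x, y, w):
    """Weighted least-squares slope of y on x (stand-in for a path coefficient)."""
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    return np.sum(w * (x - mx) * (y - my)) / np.sum(w * (x - mx) ** 2)


def sdb_ci(x, y, n_subsets=200, b=None, alpha=0.05, seed=0):
    """Sketch of an SDB confidence interval; returns (lower, estimate, upper)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    if b is None:
        b = int(n ** 0.7)  # assumed subset size, much smaller than n
    theta_full = weighted_slope(x, y, np.ones(n))
    roots = np.empty(n_subsets)
    for s in range(n_subsets):
        # 1) random subset of size b, drawn without replacement
        idx = rng.choice(n, size=b, replace=False)
        xs, ys = x[idx], y[idx]
        theta_sub = weighted_slope(xs, ys, np.ones(b))
        # 2) one resample of size n from the subset, stored as counts
        counts = rng.multinomial(n, np.full(b, 1.0 / b)).astype(float)
        theta_star = weighted_slope(xs, ys, counts)
        # 3) root: resample estimate minus subset estimate
        roots[s] = theta_star - theta_sub
    lo_q, hi_q = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    # basic-bootstrap style interval centred on the full-sample estimate
    return theta_full - hi_q, theta_full, theta_full - lo_q


# Usage on synthetic data (true slope 0.5 is an arbitrary choice).
rng = np.random.default_rng(42)
x = rng.normal(size=20_000)
y = 0.5 * x + rng.normal(size=20_000)
lower, estimate, upper = sdb_ci(x, y)
print(lower, estimate, upper)
```

The cost per iteration depends on `b` rather than `n`, which is the source of the computational saving relative to a classical bootstrap that refits the model on n observations for every replicate.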
Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4927636