The Impact of Parameters Optimization in Software Prediction Models

Gravino C.
2022-01-01

Abstract

Several studies have raised concerns about the performance of estimation techniques when they are employed with the default parameters provided by specific development toolkits, e.g., Weka. In this paper, we evaluate the impact of parameter optimization on nine different estimation techniques in the Software Development Effort Estimation (SDEE) and Software Fault Prediction (SFP) domains, in order to obtain more general findings on the impact of parameter optimization. To this aim, we employ three datasets from the SDEE domain (China, Maxwell, Nasa) and three regression-based datasets from the SFP domain (Ant, Xalan, Xerces). For parameter optimization, we consider four optimization algorithms from different families: Grid Search, Random Search, Simulated Annealing, and Bayesian Optimization. The estimation techniques are Support Vector Machine, Random Forest, Classification and Regression Tree, Neural Networks, Averaged Neural Networks, k-Nearest Neighbor, Partial Least Squares, Multilayer Perceptron, and Gradient Boosting Machine. Results reveal that, with both SDEE and SFP datasets, seven out of nine estimation techniques require the optimization/configuration of at least one parameter. In the majority of cases, the parameters of the employed estimation techniques are sensitive to optimization on specific types of data. Moreover, not all parameters need to be optimized, as some of them are not sensitive to optimization.
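The abstract compares tuned parameters against toolkit defaults using several search strategies. As a rough illustration only (not the authors' experimental setup, datasets, or tooling), the stdlib-only Python sketch below tunes the single parameter k of a k-Nearest Neighbor regressor on a synthetic dataset with three of the four named strategies: Grid Search, Random Search, and Simulated Annealing. The data generator, the candidate range 1 to 20, the leave-one-out MAE criterion, the cooling schedule, and treating k = 1 as the "default" are all hypothetical choices made for the example. Bayesian Optimization is omitted because it needs a surrogate model (e.g., a Gaussian process) that would not fit in a short sketch.

```python
import math
import random


def make_data(n=60, seed=0):
    """Hypothetical synthetic regression data y = 3x + noise (not the paper's datasets)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [3 * x + rng.gauss(0, 2) for x in xs]
    return list(zip(xs, ys))


def knn_predict(train, x, k):
    """Predict y for x as the mean target of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def loo_mae(data, k):
    """Leave-one-out mean absolute error of k-NN with parameter k."""
    errors = []
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        errors.append(abs(knn_predict(train, x, k) - y))
    return sum(errors) / len(errors)


def grid_search(data, candidates):
    """Evaluate every candidate k exhaustively; return (best_k, best_mae)."""
    return min(((k, loo_mae(data, k)) for k in candidates), key=lambda t: t[1])


def random_search(data, candidates, budget, seed=1):
    """Evaluate `budget` candidates sampled without replacement; return the best."""
    rng = random.Random(seed)
    sampled = rng.sample(candidates, budget)
    return min(((k, loo_mae(data, k)) for k in sampled), key=lambda t: t[1])


def simulated_annealing(data, candidates, steps=30, temp0=1.0, seed=2):
    """Step to a neighbouring k; accept worse moves with probability exp(-delta / T)."""
    rng = random.Random(seed)
    idx = rng.randrange(len(candidates))
    cur_cost = loo_mae(data, candidates[idx])
    best = (candidates[idx], cur_cost)
    for step in range(steps):
        temp = temp0 * (1 - step / steps) + 1e-9  # hypothetical linear cooling schedule
        new_idx = min(max(idx + rng.choice([-1, 1]), 0), len(candidates) - 1)
        new_cost = loo_mae(data, candidates[new_idx])
        delta = new_cost - cur_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            idx, cur_cost = new_idx, new_cost
            if cur_cost < best[1]:
                best = (candidates[idx], cur_cost)
    return best


if __name__ == "__main__":
    data = make_data()
    ks = list(range(1, 21))
    # k = 1 is treated as a hypothetical "default" for comparison only.
    print("default k=1 :", round(loo_mae(data, 1), 3))
    print("grid search :", grid_search(data, ks))
    print("random (5)  :", random_search(data, ks, budget=5))
    print("annealing   :", simulated_annealing(data, ks))
```

Because grid search evaluates every candidate, its error is a lower bound for the other two strategies over the same candidate set; random search and simulated annealing give up that guarantee in exchange for exploring only part of the space, which matters when each evaluation means retraining an expensive model on a large search space.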
Year: 2022
ISBN: 978-1-6654-6152-8

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4826532