Predicting the classes more likely to change in the future helps developers to focus on the more critical parts of a software system, with the aim of preventively improving its maintainability. The research community has devoted a lot of effort in the definition of change prediction models, i.e., models exploiting a machine learning classifier to relate a set of independent variables to the change-proneness of classes. Besides the good performances of such models, key results of previous studies highlight how classifiers tend to perform similarly even though they are able to correctly predict the change-proneness of different code elements, possibly indicating the presence of some complementarity among them. In this paper, we aim at analyzing the extent to which ensemble methodologies, i.e., machine learning techniques able to combine multiple classifiers, can improve the performances of change-prediction models. Specifically, we empirically compared the performances of three ensemble techniques (i.e., Boosting, Random Forest, and Bagging) with those of standard machine learning classifiers (i.e., Logistic Regression and Naive Bayes). The study was conducted on eight open source systems and the results showed how ensemble techniques, in some cases, perform better than standard machine learning approaches, even if the differences among them is small. This requires the need of further research aimed at devising effective methodologies to ensemble different classifiers.

Ensemble techniques for software change prediction: A preliminary investigation

CATOLINO, GEMMA;Ferrucci, Filomena
2018

Abstract

Predicting the classes more likely to change in the future helps developers to focus on the more critical parts of a software system, with the aim of preventively improving its maintainability. The research community has devoted a lot of effort in the definition of change prediction models, i.e., models exploiting a machine learning classifier to relate a set of independent variables to the change-proneness of classes. Besides the good performances of such models, key results of previous studies highlight how classifiers tend to perform similarly even though they are able to correctly predict the change-proneness of different code elements, possibly indicating the presence of some complementarity among them. In this paper, we aim at analyzing the extent to which ensemble methodologies, i.e., machine learning techniques able to combine multiple classifiers, can improve the performances of change-prediction models. Specifically, we empirically compared the performances of three ensemble techniques (i.e., Boosting, Random Forest, and Bagging) with those of standard machine learning classifiers (i.e., Logistic Regression and Naive Bayes). The study was conducted on eight open source systems and the results showed how ensemble techniques, in some cases, perform better than standard machine learning approaches, even if the differences among them is small. This requires the need of further research aimed at devising effective methodologies to ensemble different classifiers.
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11386/4716221
 Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 12
  • ???jsp.display-item.citation.isi??? 9
social impact