The Smell of Fear: On the Relation between Test Smells and Flaky Tests

Fabio Palomba
2019-01-01

Abstract

Regression testing is the activity developers perform to check that new modifications have not introduced bugs. A crucial requirement for effective regression testing is that test cases are deterministic. Unfortunately, this is not always the case, as some tests suffer from so-called flakiness, i.e., they exhibit both a passing and a failing outcome on the same code. Flaky tests are widely recognized as a serious issue, since they hide real bugs and increase software inspection costs. While previous research has focused on understanding the root causes of test flakiness and devising techniques to fix flaky tests automatically, in this paper we explore an orthogonal perspective: the relation between flaky tests and test smells, i.e., suboptimal development choices made when writing tests. Relying on (1) an analysis of the state of the art and (2) interviews with industrial developers, we first identify five flakiness-inducing test smell types, namely Resource Optimism, Indirect Testing, Test Run War, Fire and Forget, and Conditional Test Logic, and automate their detection. Then, we perform a large-scale empirical study on 19,532 JUnit test methods of 18 software systems, discovering that the five considered test smells causally co-occur with flaky tests in 75% of the cases. Furthermore, we evaluate the effect of refactoring, showing that it not only removes design flaws but also fixes all of the flaky tests that causally co-occur with test smells, i.e., the 75% of cases identified above.
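
To make the idea of automated smell detection more concrete, here is a minimal Java sketch of text-based heuristics for two of the five smells: Conditional Test Logic (control flow such as if/for/while inside a test) and Resource Optimism (using an external resource, here a file, without checking that it exists). The class name `SmellSketch`, the `detect` method, and the regular expressions are hypothetical simplifications chosen for illustration; they are not the detection rules or the tool used in the study. The remaining three smells are not covered, since detecting them is likely to need more context than a single method body (for instance, which class is under test, or which resources other tests share).

```java
import java.util.EnumSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical, deliberately simple heuristics; NOT the detector used in the paper.
class SmellSketch {

    enum Smell { CONDITIONAL_TEST_LOGIC, RESOURCE_OPTIMISM }

    // A control-flow keyword followed by "(" suggests branching or looping inside the test.
    private static final Pattern CONTROL_FLOW =
            Pattern.compile("\\b(if|for|while|switch)\\s*\\(");

    // Constructing a java.io.File or calling Paths.get is taken as a hint of an external file resource.
    private static final Pattern FILE_USE =
            Pattern.compile("new\\s+File\\s*\\(|Paths\\.get\\s*\\(");

    // Any ".exists(" call (File.exists or Files.exists) counts as checking the resource first.
    private static final Pattern EXISTENCE_CHECK =
            Pattern.compile("\\.exists\\s*\\(");

    /** Returns the smells whose (hypothetical) textual patterns match the given test method body. */
    static Set<Smell> detect(String testMethodBody) {
        Set<Smell> smells = EnumSet.noneOf(Smell.class);
        if (CONTROL_FLOW.matcher(testMethodBody).find()) {
            smells.add(Smell.CONDITIONAL_TEST_LOGIC);
        }
        // Resource Optimism: a file is used, but its existence is never checked.
        if (FILE_USE.matcher(testMethodBody).find()
                && !EXISTENCE_CHECK.matcher(testMethodBody).find()) {
            smells.add(Smell.RESOURCE_OPTIMISM);
        }
        return smells;
    }

    public static void main(String[] args) {
        String optimistic = "File f = new File(\"data.csv\"); assertTrue(f.length() > 0);";
        String conditional = "for (int i = 0; i < 3; i++) { if (list.get(i) == null) fail(); }";
        String checked = "File f = new File(\"data.csv\"); assumeTrue(f.exists());";
        System.out.println(detect(optimistic));  // prints [RESOURCE_OPTIMISM]
        System.out.println(detect(conditional)); // prints [CONDITIONAL_TEST_LOGIC]
        System.out.println(detect(checked));     // prints []
    }
}
```

Plain regular expressions keep the sketch short, but they will also fire on matches inside comments and string literals; an analysis over the parsed test code would be more robust.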

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4727927