Intermittent test failures (test flakiness) is common during continuous integration as modern software systems have become inherently non-deterministic. Understanding the root cause of test flakiness is crucial as intermittent test failures might be the result of real non-deterministic defects in the production code, rather than mere errors in the test code. Given a flaky test, existing techniques for root causing test flakiness compare the runtime behavior of its passing and failing executions. They achieve this by repetitively executing the flaky test on an instrumented version of the system under test. This approach has two fundamental limitations: (i) code instrumentation might prevent the manifestation of test flakiness; (ii) when test flakiness is rare passively re-executing a test many times might be inadequate to trigger intermittent test outcomes. To address these limitations, we propose a new idea for root causing test flakiness that actively explores the non-deterministic space without instrumenting code. Our novel idea is to repetitively execute a flaky test, under different execution clusters. Each cluster explores a certain non-deterministic dimension (e.g., concurrency, I/O, and networking) with dedicated software containers and fuzzydriven resource load generators. The execution cluster that manifests the most balanced (or unbalanced) sets of passing and failing executions is likely to explain the broad type of test flakiness.
A container-based infrastructure for fuzzy-driven root causing of flaky tests
Ferrucci F.
2020-01-01
Abstract
Intermittent test failures (test flakiness) is common during continuous integration as modern software systems have become inherently non-deterministic. Understanding the root cause of test flakiness is crucial as intermittent test failures might be the result of real non-deterministic defects in the production code, rather than mere errors in the test code. Given a flaky test, existing techniques for root causing test flakiness compare the runtime behavior of its passing and failing executions. They achieve this by repetitively executing the flaky test on an instrumented version of the system under test. This approach has two fundamental limitations: (i) code instrumentation might prevent the manifestation of test flakiness; (ii) when test flakiness is rare passively re-executing a test many times might be inadequate to trigger intermittent test outcomes. To address these limitations, we propose a new idea for root causing test flakiness that actively explores the non-deterministic space without instrumenting code. Our novel idea is to repetitively execute a flaky test, under different execution clusters. Each cluster explores a certain non-deterministic dimension (e.g., concurrency, I/O, and networking) with dedicated software containers and fuzzydriven resource load generators. The execution cluster that manifests the most balanced (or unbalanced) sets of passing and failing executions is likely to explain the broad type of test flakiness.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.