A delta-debugging approach to assessing the resilience of actor programs through run-time test perturbations

Di Nucci D.;
2020-01-01

Abstract

Among distributed applications, the actor model is increasingly prevalent. This programming model organises applications into fully isolated processes that communicate through asynchronous messaging. Supported by frameworks such as Akka and Orleans, it is believed to facilitate the development of responsive, elastic and resilient distributed applications. While these frameworks do provide abstractions for implementing resilience, it remains up to developers to use them correctly and to test that their implementation recovers from anticipated failures. As manually exploring the reaction to every possible failure scenario is infeasible, there is a need for automated means of testing the resilience of a distributed application. We present the first automated approach to testing the resilience of actor programs. Our approach perturbs the execution of existing test cases and leverages delta debugging to explore all failure scenarios more efficiently. Moreover, we present a further optimisation that uses causality to prune away redundant perturbations and speed up the exploration. However, the effectiveness of this optimisation is sensitive to the program's organisation and the actual location of the fault. Our experimental evaluation shows that our approach can speed up resilience testing by a factor of four compared to random exploration.
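The delta-debugging idea mentioned in the abstract can be illustrated with a minimal sketch of the classic ddmin algorithm applied to a list of perturbations. This is not the authors' implementation: the function name `ddmin`, the representation of a perturbation as an `(actor, message_index)` pair, and the `fails` oracle are all assumptions made for illustration, and the causality-based pruning is not modelled. In a real setting, `fails` would rerun an existing test case while injecting the given perturbations (for example, restarting an actor before it processes a message).

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: restart `actor` before it handles message `index`.
Perturbation = Tuple[str, int]


def ddmin(items: Sequence[Perturbation],
          fails: Callable[[List[Perturbation]], bool]) -> List[Perturbation]:
    """Shrink `items` to a 1-minimal subset for which `fails` still returns True."""
    current = list(items)
    if not fails(current):
        raise ValueError("the full set of perturbations must make the test fail")
    n = 2  # current granularity: number of chunks to split into
    while len(current) >= 2:
        size = len(current)
        bounds = [size * i // n for i in range(n + 1)]
        chunks = [current[bounds[i]:bounds[i + 1]] for i in range(n)]
        reduced = False
        # Step 1: does a single chunk already reproduce the failure?
        for chunk in chunks:
            if fails(chunk):
                current, n, reduced = chunk, 2, True
                break
        # Step 2: otherwise, does removing one chunk still reproduce it?
        if not reduced:
            for i in range(n):
                complement = [p for j, c in enumerate(chunks) if j != i for p in c]
                if fails(complement):
                    current, n, reduced = complement, max(n - 1, 2), True
                    break
        # Step 3: otherwise, refine the granularity or stop.
        if not reduced:
            if n >= len(current):
                break
            n = min(2 * n, len(current))
    return current


# Toy oracle standing in for "run the test under these perturbations":
# the (hypothetical) bug only shows when both restarts below are injected.
all_perturbations: List[Perturbation] = [
    ("inventory", 1), ("cart", 2), ("payment", 3), ("cart", 4),
    ("shipping", 5), ("payment", 6), ("inventory", 7), ("cart", 8),
]


def toy_fails(subset: List[Perturbation]) -> bool:
    return ("cart", 2) in subset and ("payment", 6) in subset


minimal = ddmin(all_perturbations, toy_fails)
print(minimal)
```

The sketch only shows how a failing set of perturbations can be narrowed down with fewer test runs than trying every subset; how perturbations are chosen and injected into a running actor system is outside its scope.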
2020
ISBN: 9781450379571
Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4774984