DR-PETS: Learning-Based Control With Planning in Adversarial Environments

Jesawada, Hozefa; Russo, Giovanni
In press

Abstract

The probabilistic ensembles with trajectory sampling (PETS) algorithm is a recognized baseline among model-based reinforcement learning (MBRL) methods. PETS incorporates planning and handles uncertainty using ensemble-based probabilistic models. However, no formal robustness guarantees against epistemic uncertainty exist for PETS. Providing such guarantees is a key enabler for reliable real-world deployment. To address this gap, we propose a distributionally robust extension of PETS, called DR-PETS. We formalize model uncertainty using a distributional ambiguity set and optimize the worst-case expected return. We derive a tractable convex approximation of the resulting min-max planning problem, which integrates seamlessly into PETS's planning loop as a regularized objective. Experiments on pendulum and cart-pole environments show that DR-PETS certifies robustness against adversarial parameter perturbations, achieving consistent performance in worst-case scenarios where PETS deteriorates.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11386/4913675