DR-PETS: Learning-Based Control With Planning in Adversarial Environments
Jesawada, Hozefa; Russo, Giovanni
In press
Abstract
The probabilistic ensembles with trajectory sampling (PETS) algorithm is a recognized baseline among model-based reinforcement learning (MBRL) methods. PETS incorporates planning and handles uncertainty using ensemble-based probabilistic models. However, no formal robustness guarantees against epistemic uncertainty exist for PETS. Providing such guarantees is a key enabler for reliable real-world deployment. To address this gap, we propose a distributionally robust extension of PETS, called DR-PETS. We formalize model uncertainty using a distributional ambiguity set and optimize the worst-case expected return. We derive a tractable convex approximation of the resulting min-max planning problem, which integrates seamlessly into PETS's planning loop as a regularized objective. Experiments on pendulum and cart-pole environments show that DR-PETS certifies robustness against adversarial parameter perturbations, achieving consistent performance in worst-case scenarios where PETS deteriorates.
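
The worst-case planning problem summarized in the abstract can be written generically as follows. This is a sketch under assumptions, not the paper's exact formulation: the symbols $\hat{P}$ (nominal ensemble model), $\mathcal{B}_\varepsilon(\hat{P})$ (ambiguity set of radius $\varepsilon$ around it), $r$ (reward), $H$ (planning horizon), and $\lambda$, $\Omega$ (regularization weight and penalty term) are illustrative and do not appear in the source.

```latex
\begin{align}
  % Distributionally robust planning: choose the action sequence that
  % maximizes the expected return under the least favorable model in
  % the ambiguity set (all symbols are assumptions, see lead-in).
  \max_{a_{0:H-1}} \; \min_{P \in \mathcal{B}_\varepsilon(\hat{P})} \;
    \mathbb{E}_{P}\!\left[ \sum_{t=0}^{H-1} r(s_t, a_t) \right]
  \\
  % A regularized surrogate of the kind the abstract describes: the
  % nominal expected return minus a penalty that grows with the radius.
  \max_{a_{0:H-1}} \;
    \mathbb{E}_{\hat{P}}\!\left[ \sum_{t=0}^{H-1} r(s_t, a_t) \right]
    - \lambda \, \Omega(a_{0:H-1}; \varepsilon)
\end{align}
```

Read this way, the second line has the same shape as the standard PETS objective with one added penalty term, which is consistent with the abstract's statement that the approximation fits into the existing planning loop as a regularized objective. The specific form of the ambiguity set and of the penalty is left unspecified here because the abstract does not state it.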