A data-driven approximate dynamic programming approach based on association rule learning: Spacecraft autonomy as a case study
D'Angelo G.; Palmieri F.; Glielmo L.
2019-01-01
Abstract
Dynamic programming (DP) and Markov decision processes (MDPs) offer powerful tools for formulating, modeling, and solving decision-making problems under uncertainty. In real-world applications, however, the applicability of DP is limited by severe scalability issues. These issues can be addressed by Approximate Dynamic Programming (ADP) techniques. ADP methods assume that either a proper estimate of the underlying state transition probability distributions is available, or that a simulation mechanism exists capable of generating samples according to such distributions. In this paper, we present a data-driven ADP-based approach that offers an alternative when this assumption cannot be guaranteed. In particular, by varying the set-up of the MDP state transition probability matrix, different policies can be computed through exact DP or ADP methods. These policies are then processed by an Apriori-based algorithm to find frequent association rules within them. A pruning procedure selects the most suitable association rules, and finally an Association Classifier infers the optimal policy in all possible circumstances. We show a detailed application of the proposed approach to the computation of a suitable mission operations plan for spacecraft with a high level of on-board autonomy. (C) 2019 Elsevier Inc. All rights reserved.
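To make the pipeline described in the abstract concrete, the sketch below walks through its four steps on a toy problem: (1) vary the set-up of the MDP state transition probability matrix, (2) compute a policy for each set-up via exact DP (value iteration), (3) mine frequent (state features → action) association rules with a simplified Apriori-style level-wise miner, and (4) prune the rules by confidence and use the survivors as an association classifier. This is a minimal illustrative sketch, not the authors' implementation: the four-state spacecraft-like feature encoding, the reward matrix R, and the thresholds min_support=0.05 and min_confidence=0.6 are all assumptions invented for the example.

```python
import itertools
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)

# Toy state space: 4 states = combinations of two binary spacecraft features.
FEATURES = [("battery=low", "battery=high"), ("contact=no", "contact=yes")]
STATES = list(itertools.product(*FEATURES))
ACTIONS = ["action=charge", "action=transmit"]
R = np.array([[1.0, -1.0],   # battery=low,  contact=no : charging pays off
              [1.0,  0.5],   # battery=low,  contact=yes
              [0.2,  0.0],   # battery=high, contact=no
              [0.0,  2.0]])  # battery=high, contact=yes: transmitting pays off

def random_transitions(n_states, n_actions):
    """One random 'set-up' of the state transition probability matrix."""
    P = rng.random((n_actions, n_states, n_states))
    return P / P.sum(axis=2, keepdims=True)  # make each row stochastic

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Exact DP: returns the greedy policy (one action index per state)."""
    V = np.zeros(R.shape[0])
    while True:
        Q = R + gamma * np.einsum("aij,j->ia", P, V)  # Q[s, a]
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return Q.argmax(axis=1)
        V = V_new

# Steps 1-2: vary the transition set-up; one optimal policy per set-up.
# Each (state, action) pair becomes a transaction of feature + action items.
transactions = []
for _ in range(200):
    policy = value_iteration(random_transitions(len(STATES), len(ACTIONS)), R)
    for s, a in enumerate(policy):
        transactions.append(frozenset(STATES[s]) | {ACTIONS[a]})

# Step 3: simplified Apriori -- level-wise frequent itemset mining.
def apriori(transactions, min_support):
    n = len(transactions)
    level = [frozenset([i]) for i in {i for t in transactions for i in t}]
    frequent = {}
    while level:
        counts = Counter(c for t in transactions for c in level if c <= t)
        survivors = {c: counts[c] / n for c in level
                     if counts[c] / n >= min_support}
        frequent.update(survivors)
        keys = list(survivors)  # join frequent k-itemsets into (k+1)-candidates
        level = list({a | b for a in keys for b in keys
                      if len(a | b) == len(a) + 1})
    return frequent

# Step 4: derive {features} -> action rules, prune by confidence, classify.
def mine_rules(frequent, min_confidence):
    rules = []
    for itemset, support in frequent.items():
        actions = {i for i in itemset if i.startswith("action=")}
        if len(actions) != 1 or len(itemset) < 2:
            continue
        antecedent = itemset - actions
        confidence = support / frequent[antecedent]  # antecedent is frequent too
        if confidence >= min_confidence:
            rules.append((antecedent, next(iter(actions)), confidence))
    return sorted(rules, key=lambda r: -r[2])  # most confident rules first

def classify(rules, state_items):
    """Association classifier: first matching (highest-confidence) rule wins."""
    for antecedent, action, _ in rules:
        if antecedent <= set(state_items):
            return action
    return None

rules = mine_rules(apriori(transactions, min_support=0.05), min_confidence=0.6)
print(classify(rules, STATES[3]))  # action for battery=high, contact=yes
```

The classifier answers a query by returning the most confident pruned rule whose antecedent matches the queried state's features; in the paper's setting this plays the role of inferring a policy when the true transition probabilities are unknown and no reliable simulator is available.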