TY - JOUR
T1 - Stochastically Dominant Distributional Reinforcement Learning
AU - Martin, John D.
AU - Lyskawinski, Michal
AU - Li, Xiaohu
AU - Englot, Brendan
N1 - Publisher Copyright:
© 2020 by the author(s).
PY - 2020
Y1 - 2020
N2 - We describe a new approach for managing aleatoric uncertainty in the Reinforcement Learning (RL) paradigm. Instead of selecting actions according to a single statistic, we propose a distributional method based on the second-order stochastic dominance (SSD) relation. This compares the inherent dispersion of random returns induced by actions, producing a comprehensive evaluation of the environment’s uncertainty. The necessary conditions for SSD require estimators to predict accurate second moments. To accommodate this, we map the distributional RL problem to a Wasserstein gradient flow, treating the distributional Bellman residual as a potential energy functional. We propose a particle-based algorithm for which we prove optimality and convergence. Our experiments characterize the algorithm’s performance and demonstrate how uncertainty and performance are better balanced using an SSD policy than with other risk measures.
UR - https://www.scopus.com/pages/publications/105022411786
M3 - Conference article
AN - SCOPUS:105022411786
VL - 119
SP - 6745
EP - 6754
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
T2 - 37th International Conference on Machine Learning, ICML 2020
Y2 - 13 July 2020 through 18 July 2020
ER -