TY - GEN
T1 - On the Expressivity of Markov Reward (Extended Abstract)
AU - Abel, David
AU - Dabney, Will
AU - Harutyunyan, Anna
AU - Ho, Mark K.
AU - Littman, Michael L.
AU - Precup, Doina
AU - Singh, Satinder
N1 - Publisher Copyright: © 2022 International Joint Conferences on Artificial Intelligence. All rights reserved.
PY - 2022
Y1 - 2022
N2 - Reward is the driving force for reinforcement-learning agents. We here set out to understand the expressivity of Markov reward as a way to capture tasks that we would want an agent to perform. We frame this study around three new abstract notions of “task”: (1) a set of acceptable behaviors, (2) a partial ordering over behaviors, or (3) a partial ordering over trajectories. Our main results prove that while reward can express many of these tasks, there exist instances of each task type that no Markov reward function can capture. We then provide a set of polynomial-time algorithms that construct a Markov reward function that allows an agent to perform each task type, and correctly determine when no such reward function exists.
UR - http://www.scopus.com/inward/record.url?scp=85137883537&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85137883537&partnerID=8YFLogxK
U2 - 10.24963/ijcai.2022/730
DO - 10.24963/ijcai.2022/730
M3 - Conference contribution
AN - SCOPUS:85137883537
T3 - IJCAI International Joint Conference on Artificial Intelligence
SP - 5254
EP - 5258
BT - Proceedings of the 31st International Joint Conference on Artificial Intelligence, IJCAI 2022
A2 - De Raedt, Luc
T2 - 31st International Joint Conference on Artificial Intelligence, IJCAI 2022
Y2 - 23 July 2022 through 29 July 2022
ER -