TY - JOUR
T1 - Least squares policy iteration with instrumental variables vs. direct policy search
T2 - comparison against optimal benchmarks using energy storage
AU - Moazeni, Somayeh
AU - Scott, Warren R.
AU - Powell, Warren B.
N1 - Publisher Copyright:
© 2019 Canadian Operational Research Society (CORS).
PY - 2020
Y1 - 2020
N2 - This article studies least-squares approximate policy iteration (API) methods with parametrized value-function approximation. We examine several variations of the policy evaluation phase, namely, Bellman error minimization, Bellman error minimization with instrumental variables, projected Bellman error minimization, and projected Bellman error minimization with instrumental variables. For a general discrete-time stochastic control problem, Bellman error minimization policy evaluation using instrumental variables is equivalent to both variants of projected Bellman error minimization. An alternative to these API methods is direct policy search based on the knowledge gradient. The practical performance of these three approximate dynamic programming methods, (i) least-squares API with Bellman error minimization, (ii) least-squares API with Bellman error minimization using instrumental variables, and (iii) direct policy search, is investigated in the context of an application in energy storage operations management. We create a library of test problems using real-world data and apply value iteration to find their optimal policies. These optimal benchmarks are then used to compare the developed approximate dynamic programming policies. Our analysis indicates that least-squares API with Bellman error minimization using instrumental variables markedly outperforms least-squares API with plain Bellman error minimization; however, both approaches underperform our direct policy search implementation.
AB - This article studies least-squares approximate policy iteration (API) methods with parametrized value-function approximation. We examine several variations of the policy evaluation phase, namely, Bellman error minimization, Bellman error minimization with instrumental variables, projected Bellman error minimization, and projected Bellman error minimization with instrumental variables. For a general discrete-time stochastic control problem, Bellman error minimization policy evaluation using instrumental variables is equivalent to both variants of projected Bellman error minimization. An alternative to these API methods is direct policy search based on the knowledge gradient. The practical performance of these three approximate dynamic programming methods, (i) least-squares API with Bellman error minimization, (ii) least-squares API with Bellman error minimization using instrumental variables, and (iii) direct policy search, is investigated in the context of an application in energy storage operations management. We create a library of test problems using real-world data and apply value iteration to find their optimal policies. These optimal benchmarks are then used to compare the developed approximate dynamic programming policies. Our analysis indicates that least-squares API with Bellman error minimization using instrumental variables markedly outperforms least-squares API with plain Bellman error minimization; however, both approaches underperform our direct policy search implementation.
KW - Bellman error minimization
KW - dynamic programming
KW - approximate dynamic programming
KW - approximate policy iteration
KW - direct policy search
KW - energy storage
UR - http://www.scopus.com/inward/record.url?scp=85103438812&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85103438812&partnerID=8YFLogxK
U2 - 10.1080/03155986.2019.1624491
DO - 10.1080/03155986.2019.1624491
M3 - Article
AN - SCOPUS:85103438812
SN - 0315-5986
VL - 58
SP - 141
EP - 166
JO - INFOR
JF - INFOR: Information Systems and Operational Research
IS - 1
ER -