TY - JOUR
T1 - Parallel nonstationary direct policy search for risk-averse stochastic optimization
AU - Moazeni, Somayeh
AU - Powell, Warren B.
AU - Defourny, Boris
AU - Bouzaiene-Ayari, Belgacem
N1 - Publisher Copyright: © 2017 INFORMS.
PY - 2017/3/1
Y1 - 2017/3/1
AB - This paper presents an algorithmic strategy to nonstationary policy search for finite-horizon, discrete-time Markovian decision problems with large state spaces, constrained action sets, and a risk-sensitive optimality criterion. The methodology relies on modeling time-variant policy parameters by a nonparametric response surface model for an indirect parametrized policy motivated by Bellman's equation. The policy structure is heuristic when the optimization of the risk-sensitive criterion does not admit a dynamic programming reformulation. Through the interpolating approximation, the level of nonstationarity of the policy, and consequently, the size of the resulting search problem can be adjusted. The computational tractability and the generality of the approach follow from a nested parallel implementation of derivative-free optimization in conjunction with Monte Carlo simulation. We demonstrate the efficiency of the approach on an optimal energy storage charging problem, and illustrate the effect of the risk functional on the improvement achieved by allowing a higher complexity in time variation for the policy.
KW - Derivative-free optimization
KW - Direct policy search
KW - Dynamic optimization
KW - Energy storage
KW - Learning
KW - Parallel optimization
KW - Risk-averse stochastic optimization
UR - http://www.scopus.com/inward/record.url?scp=85019137194&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85019137194&partnerID=8YFLogxK
U2 - 10.1287/ijoc.2016.0733
DO - 10.1287/ijoc.2016.0733
M3 - Article
AN - SCOPUS:85019137194
SN - 1091-9856
VL - 29
SP - 332
EP - 349
JO - INFORMS Journal on Computing
JF - INFORMS Journal on Computing
IS - 2
ER -