TY - JOUR
T1 - Efficient approximate dynamic programming based on design and analysis of computer experiments for infinite-horizon optimization
AU - Chen, Ying
AU - Liu, Feng
AU - Rosenberger, Jay M.
AU - Chen, Victoria C.P.
AU - Kulvanitchaiyanunt, Asama
AU - Zhou, Yuan
N1 - Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2020/12
Y1 - 2020/12
AB - The approximate dynamic programming (ADP) method based on the design and analysis of computer experiments (DACE) approach has been demonstrated in the literature to be an effective method for solving multistage decision-making problems. However, this method is still inefficient for infinite-horizon optimization, given the large volume of state-space sampling and the high-quality value function identification it requires. Therefore, we propose a sequential sampling algorithm and embed it into a DACE-based ADP method to obtain a high-quality value function approximation. Given the limitations of the traditional stopping criterion (the Bellman error bound), we further propose a 45-degree line stopping criterion that terminates value iteration early by identifying an optimally equivalent value function. A comparison of the computational results with those of three other existing policies indicates that the proposed sampling algorithm and stopping criterion yield a high-quality ADP policy. Finally, we discuss the extrapolation issue of the value function approximated by multivariate adaptive regression splines; the results further demonstrate the quality of the ADP policy generated in this study.
KW - Approximate dynamic programming
KW - Extrapolation
KW - State space sampling
KW - Stopping criterion
UR - http://www.scopus.com/inward/record.url?scp=85089819618&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089819618&partnerID=8YFLogxK
DO - 10.1016/j.cor.2020.105032
M3 - Article
AN - SCOPUS:85089819618
SN - 0305-0548
VL - 124
JO - Computers & Operations Research
JF - Computers & Operations Research
M1 - 105032
ER -