A Tale of Two Efficient Value Iteration Algorithms for Solving Linear MDPs with Large Action Space

Zhaozhuo Xu, Zhao Song, Anshumali Shrivastava

Research output: Contribution to journalConference articlepeer-review

3 Scopus citations

Abstract

Markov Decision Process (MDP) with large action space naturally occurs in many applications such as language processing, information retrieval, and recommendation system. There have been various approaches to solve these MDPs through value iteration (VI). Unfortunately, all VI algorithms require expensive linear scans over the entire action space for value function estimation during each iteration. To this end, we present two provable Least-Squares Value Iteration (LSVI) algorithms with runtime complexity sublinear in the number of actions for linear MDPs. We formulate the value function estimation procedure in VI as an approximate maximum inner product search problem and propose a Locality Sensitive Hashing (LSH) type data structure to solve this problem with sublinear time complexity. Our major contribution is combining the guarantees of approximate maximum inner product search with the regret analysis of reinforcement learning. We prove that, with the appropriate choice of approximation factor, there exists a sweet spot. Our proposed Sublinear LSVI algorithms maintain the same regret as the original LSVI algorithms while reducing the runtime complexity to sublinear in the number of actions. To the best of our knowledge, this is the first work that combines LSH with reinforcement learning that results in provable improvements. We hope that our novel way of combining data structures and the iterative algorithm will open the door for further study into the cost reduction in reinforcement learning.

Original languageEnglish
Pages (from-to)788-836
Number of pages49
JournalProceedings of Machine Learning Research
Volume206
StatePublished - 2023
Event26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023 - Valencia, Spain
Duration: 25 Apr 202327 Apr 2023

Fingerprint

Dive into the research topics of 'A Tale of Two Efficient Value Iteration Algorithms for Solving Linear MDPs with Large Action Space'. Together they form a unique fingerprint.

Cite this