Abstract
We present a method for Temporal Difference (TD) learning that addresses several challenges faced by robots learning to navigate in a marine environment. For improved data efficiency, our method reduces TD updates to Gaussian Process regression. To make predictions amenable to online settings, we introduce a sparse approximation with improved quality over current rejection-based methods. We derive the predictive value function posterior and use the moments to obtain a new algorithm for model-free policy evaluation, SPGP-SARSA. With simple changes, we show SPGP-SARSA can be reduced to a model-based equivalent, SPGP-TD. We perform comprehensive simulation studies and also conduct physical learning trials with an underwater robot. Our results show SPGP-SARSA can outperform the state-of-the-art sparse method, replicate the prediction quality of its exact counterpart, and be applied to solve underwater navigation tasks.
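The reduction of TD updates to Gaussian Process regression can be illustrated with a minimal sketch. It is not the SPGP-SARSA algorithm from the paper: it shows only an exact (non-sparse) GP temporal difference posterior, under the common assumption that each observed reward is a noisy measurement of V(x_t) - gamma * V(x_{t+1}) along a single trajectory. The names `rbf_kernel` and `gptd_posterior`, the RBF kernel, and all default hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between rows of A (n x d) and B (m x d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def gptd_posterior(X, r, X_query, gamma=0.9, noise=1e-2, lengthscale=1.0):
    """Posterior mean and variance of the value function at X_query.

    X       : (T, d) states visited along one trajectory
    r       : (T-1,) rewards observed on each transition
    Assumed model: r = H v + eps, v ~ GP(0, k), eps ~ N(0, noise * I),
    where row t of H encodes V(x_t) - gamma * V(x_{t+1}).
    """
    T = X.shape[0]
    idx = np.arange(T - 1)
    H = np.zeros((T - 1, T))
    H[idx, idx] = 1.0
    H[idx, idx + 1] = -gamma

    K = rbf_kernel(X, X, lengthscale)          # (T, T) prior covariance
    Ks = rbf_kernel(X, X_query, lengthscale)   # (T, M) cross-covariance
    G = H @ K @ H.T + noise * np.eye(T - 1)    # covariance of the rewards
    HKs = H @ Ks                               # (T-1, M)

    mean = HKs.T @ np.linalg.solve(G, r)
    # The RBF kernel has unit prior variance, k(x, x) = 1.
    var = 1.0 - np.einsum("ij,ij->j", HKs, np.linalg.solve(G, HKs))
    return mean, var

if __name__ == "__main__":
    # Toy 1-D trajectory with a constant reward of 1 per step.
    X = np.arange(10, dtype=float)[:, None]
    r = np.ones(9)
    mean, var = gptd_posterior(X, r, X)
    print(np.round(mean, 2))
    print(np.round(var, 3))
```

The cost of the exact posterior grows cubically with the trajectory length because of the linear solves against `G`, which is the motivation for sparse approximations of the kind the abstract describes.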
| Original language | English |
|---|---|
| Pages (from-to) | 179-189 |
| Number of pages | 11 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 87 |
| State | Published - 2018 |
| Event | 2nd Conference on Robot Learning, CoRL 2018 - Zurich, Switzerland. Duration: 29 Oct 2018 → 31 Oct 2018 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 14 Life Below Water
Keywords
- Reinforcement Learning
- Sparse Gaussian Process Regression
Fingerprint
Dive into the research topics of 'Sparse Gaussian Process Temporal Difference Learning for Marine Robot Navigation'. Together they form a unique fingerprint.