Abstract
Multi-robot learning has been studied extensively in recent years, yet developing provably correct algorithms for learning decentralized control policies remains challenging. In this letter, we propose a sample-efficient multi-robot learning method based on guided policy search to learn decentralized swarm control policies. The proposed method uses distributed trajectory optimization to provide guiding trajectory samples for policy training. In turn, the learned policy is used to update the trajectory optimization results so that the guiding trajectories are reproducible by the current policy. A learning algorithm is designed to alternate between distributed trajectory optimization and policy optimization, which eventually converges to a solution with good long-term performance. We demonstrate the effectiveness of our method on a multi-robot rendezvous problem. Results in a robot simulator show that our method efficiently learns a decentralized control policy with substantially fewer training samples.
| Original language | English |
|---|---|
| Article number | 9127548 |
| Pages (from-to) | 743-748 |
| Number of pages | 6 |
| Journal | IEEE Control Systems Letters |
| Volume | 5 |
| Issue number | 3 |
| DOIs | |
| State | Published - Jul 2021 |
Keywords
- Multi-robot learning
- distributed trajectory optimization
- guided policy search
- robotic swarm
Fingerprint
Dive into the research topics of 'Multi-Robot Guided Policy Search for Learning Decentralized Swarm Control'. Together they form a unique fingerprint.