TY - JOUR
T1 - Multi-Robot Guided Policy Search for Learning Decentralized Swarm Control
AU - Jiang, Chao
AU - Guo, Yi
N1 - Publisher Copyright:
© 2017 IEEE.
PY - 2021/7
Y1 - 2021/7
N2 - Multi-robot learning has been extensively studied recently. Developing provably correct algorithms for learning decentralized control policies remains challenging. In this letter, we propose a sample-efficient multi-robot learning method based on guided policy search to learn decentralized swarm control policies. The proposed method uses distributed trajectory optimization to provide guiding trajectory samples for policy training. In turn, the learned policy is exploited to update the trajectory optimization results so that the guiding trajectories are reproducible by the current policy. A learning algorithm is designed to alternate between distributed trajectory optimization and policy optimization, which eventually converges to a solution with good long-term performance. We demonstrate the effectiveness of our method in a multi-robot rendezvous problem. The simulation results in a robot simulator show that our method efficiently learns decentralized control policies with substantially fewer training samples.
AB - Multi-robot learning has been extensively studied recently. Developing provably correct algorithms for learning decentralized control policies remains challenging. In this letter, we propose a sample-efficient multi-robot learning method based on guided policy search to learn decentralized swarm control policies. The proposed method uses distributed trajectory optimization to provide guiding trajectory samples for policy training. In turn, the learned policy is exploited to update the trajectory optimization results so that the guiding trajectories are reproducible by the current policy. A learning algorithm is designed to alternate between distributed trajectory optimization and policy optimization, which eventually converges to a solution with good long-term performance. We demonstrate the effectiveness of our method in a multi-robot rendezvous problem. The simulation results in a robot simulator show that our method efficiently learns decentralized control policies with substantially fewer training samples.
KW - Multi-robot learning
KW - distributed trajectory optimization
KW - guided policy search
KW - robotic swarm
UR - http://www.scopus.com/inward/record.url?scp=85089138415&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089138415&partnerID=8YFLogxK
U2 - 10.1109/LCSYS.2020.3005441
DO - 10.1109/LCSYS.2020.3005441
M3 - Article
AN - SCOPUS:85089138415
VL - 5
SP - 743
EP - 748
JO - IEEE Control Systems Letters
JF - IEEE Control Systems Letters
IS - 3
M1 - 9127548
ER -