TY - JOUR
T1 - Reinforcement learning-based multi-AUV adaptive trajectory planning for under-ice field estimation
AU - Wang, Chaofeng
AU - Wei, Li
AU - Wang, Zhaohui
AU - Song, Min
AU - Mahmoudian, Nina
N1 - Publisher Copyright:
© 2018 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2018/11/9
Y1 - 2018/11/9
N2 - This work studies online learning-based trajectory planning for multiple autonomous underwater vehicles (AUVs) to estimate a water parameter field of interest in the under-ice environment. A centralized system is considered, where several fixed access points on the ice layer serve as gateways for communication between the AUVs and a remote data fusion center. We model the water parameter field of interest as a Gaussian process with unknown hyper-parameters. The AUV trajectories for sampling are determined on an epoch-by-epoch basis. At the end of each epoch, the access points relay the observed field samples from all the AUVs to the fusion center, which computes the posterior distribution of the field based on Gaussian process regression and estimates the field hyper-parameters. The optimal trajectories of all the AUVs in the next epoch are determined to maximize a long-term reward defined in terms of the field uncertainty reduction and the AUV mobility cost, subject to the kinematics constraint, the communication constraint, and the sensing area constraint. We formulate the adaptive trajectory planning problem as a Markov decision process (MDP). A reinforcement learning-based online learning algorithm is designed to determine the optimal AUV trajectories in a constrained continuous space. Simulation results show that the proposed learning-based trajectory planning algorithm achieves performance similar to that of a benchmark method that assumes perfect knowledge of the field hyper-parameters.
KW - AUVs
KW - Adaptive trajectory planning
KW - Field estimation
KW - Reinforcement learning
KW - Under-ice exploration
KW - Underwater communication networks
UR - http://www.scopus.com/inward/record.url?scp=85056498701&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85056498701&partnerID=8YFLogxK
U2 - 10.3390/s18113859
DO - 10.3390/s18113859
M3 - Article
C2 - 30424017
AN - SCOPUS:85056498701
SN - 1424-8220
VL - 18
JO - Sensors (Switzerland)
JF - Sensors (Switzerland)
IS - 11
M1 - 3859
ER -