TY - JOUR
T1 - The RobotSlang Benchmark
T2 - 4th Conference on Robot Learning, CoRL 2020
AU - Banerjee, Shurjo
AU - Thomason, Jesse
AU - Corso, Jason J.
N1 - Publisher Copyright:
© 2020 Proceedings of Machine Learning Research. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Autonomous robot systems for applications from search and rescue to assistive guidance should be able to engage in natural language dialog with people. To study such cooperative communication, we introduce Robot Simultaneous Localization and Mapping with Natural Language (RobotSlang), a benchmark of 169 natural language dialogs between a human DRIVER controlling a robot and a human COMMANDER providing guidance towards navigation goals. In each trial, the pair first cooperates to localize the robot on a global map visible to the COMMANDER, then the DRIVER follows COMMANDER instructions to move the robot to a sequence of target objects. We introduce a Localization from Dialog History (LDH) and a Navigation from Dialog History (NDH) task where a learned agent is given dialog and visual observations from the robot platform as input and must localize in the global map or navigate towards the next target object, respectively. RobotSlang is comprised of nearly 5k utterances and over 1k minutes of robot camera and control streams. We present an initial model for the NDH task, and show that an agent trained in simulation can follow the RobotSlang dialog-based navigation instructions for controlling a physical robot platform. Code and data are available at https://umrobotslang.github.io/.
KW - Benchmark
KW - Dialog
KW - Natural Language
KW - Robot
KW - Visual Navigation
UR - http://www.scopus.com/inward/record.url?scp=85175826723&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85175826723&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85175826723
VL - 155
SP - 1384
EP - 1393
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
Y2 - 16 November 2020 through 18 November 2020
ER -