TY - JOUR
T1 - Diagnosis, Feedback, Adaptation: A Human-in-the-Loop Framework for Test-Time Policy Adaptation
T2 - 40th International Conference on Machine Learning, ICML 2023
AU - Peng, Andi
AU - Netanyahu, Aviv
AU - Ho, Mark
AU - Shu, Tianmin
AU - Bobu, Andreea
AU - Shah, Julie
AU - Agrawal, Pulkit
N1 - Publisher Copyright:
© 2023 Proceedings of Machine Learning Research. All rights reserved.
PY - 2023
Y1 - 2023
N2 - Policies often fail due to distribution shift: changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. However, designers don't know which concepts are irrelevant a priori, especially when different end users have different preferences about how the task is performed. We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts. Our key idea is to generate counterfactual demonstrations that allow users to quickly identify possible task-relevant and irrelevant concepts. The knowledge of task-irrelevant concepts is then used to perform data augmentation and thus obtain a policy adapted to personalized user objectives. We present experiments validating our framework on discrete and continuous control tasks with real human users. Our method (1) enables users to better understand agent failure, (2) reduces the number of demonstrations required for fine-tuning, and (3) aligns the agent to individual user task preferences.
AB - Policies often fail due to distribution shift: changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. However, designers don't know which concepts are irrelevant a priori, especially when different end users have different preferences about how the task is performed. We propose an interactive framework to leverage feedback directly from the user to identify personalized task-irrelevant concepts. Our key idea is to generate counterfactual demonstrations that allow users to quickly identify possible task-relevant and irrelevant concepts. The knowledge of task-irrelevant concepts is then used to perform data augmentation and thus obtain a policy adapted to personalized user objectives. We present experiments validating our framework on discrete and continuous control tasks with real human users. Our method (1) enables users to better understand agent failure, (2) reduces the number of demonstrations required for fine-tuning, and (3) aligns the agent to individual user task preferences.
UR - http://www.scopus.com/inward/record.url?scp=85174421704&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85174421704&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85174421704
VL - 202
SP - 27630
EP - 27641
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
Y2 - 23 July 2023 through 29 July 2023
ER -