TY - JOUR
T1 - Measuring Robustness for NLP
AU - Yu, Yu
AU - Khan, Abdul Rafae
AU - Xu, Jia
N1 - Publisher Copyright:
© 2022 Proceedings - International Conference on Computational Linguistics, COLING. All rights reserved.
PY - 2022
Y1 - 2022
N2 - The quality of Natural Language Processing (NLP) models is typically measured by the accuracy or error rate of a predefined test set. Because the evaluation and optimization of these measures are narrowed down to a specific domain like news and cannot be generalized to other domains like Twitter, we often observe that a system reported with human parity results generates surprising errors in real-life use scenarios. We address this weakness with a new approach that uses an NLP quality measure based on robustness. Unlike previous work that has defined robustness using Minimax to bound worst cases, we measure robustness based on the consistency of cross-domain accuracy and introduce the coefficient of variation and (ϵ, γ)Robustness. Our measures demonstrate higher agreements with human evaluation than accuracy scores like BLEU on ranking Machine Translation (MT) systems.
AB - The quality of Natural Language Processing (NLP) models is typically measured by the accuracy or error rate of a predefined test set. Because the evaluation and optimization of these measures are narrowed down to a specific domain like news and cannot be generalized to other domains like Twitter, we often observe that a system reported with human parity results generates surprising errors in real-life use scenarios. We address this weakness with a new approach that uses an NLP quality measure based on robustness. Unlike previous work that has defined robustness using Minimax to bound worst cases, we measure robustness based on the consistency of cross-domain accuracy and introduce the coefficient of variation and (ϵ, γ)Robustness. Our measures demonstrate higher agreements with human evaluation than accuracy scores like BLEU on ranking Machine Translation (MT) systems.
UR - http://www.scopus.com/inward/record.url?scp=85159857579&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85159857579&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85159857579
SN - 2951-2093
VL - 29
SP - 3908
EP - 3916
JO - Proceedings - International Conference on Computational Linguistics, COLING
JF - Proceedings - International Conference on Computational Linguistics, COLING
IS - 1
T2 - 29th International Conference on Computational Linguistics, COLING 2022
Y2 - 12 October 2022 through 17 October 2022
ER -