TY - GEN
T1 - ACCORD
T2 - 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2025
AU - Roewer-Després, François
AU - Feng, Jinyue
AU - Zhu, Zining
AU - Rudzicz, Frank
N1 - Publisher Copyright: © 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
AB - We present ACCORD, a framework and benchmark suite for disentangling the commonsense grounding and reasoning abilities of large language models (LLMs) through controlled, multi-hop counterfactuals. ACCORD introduces formal elements to commonsense reasoning to explicitly control and quantify reasoning complexity beyond the typical 1 or 2 hops. Uniquely, ACCORD can automatically generate benchmarks of arbitrary reasoning complexity, so it scales with future LLM improvements. Indeed, our experiments on state-of-the-art LLMs show performance degrading to below random chance with only moderate scaling, leaving substantial headroom for improvement. We release a leaderboard of the benchmark suite tested in this work, as well as code to automatically generate more complex benchmarks.
UR - https://www.scopus.com/pages/publications/105027432236
U2 - 10.18653/v1/2025.naacl-long.193
DO - 10.18653/v1/2025.naacl-long.193
M3 - Conference contribution
AN - SCOPUS:105027432236
T3 - Proceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025
SP - 3799
EP - 3829
BT - Long Papers
A2 - Chiruzzo, Luis
A2 - Ritter, Alan
A2 - Wang, Lu
Y2 - 29 April 2025 through 4 May 2025
ER -