Skip to main navigation Skip to search Skip to main content

ACCORD: Closing the Commonsense Measurability Gap

  • François Roewer-Després
  • , Jinyue Feng
  • , Zining Zhu
  • , Frank Rudzicz
  • University of Toronto
  • Dalhousie University

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

We present ACCORD, a framework and benchmark suite for disentangling the commonsense grounding and reasoning abilities of large language models (LLMs) through controlled, multi-hop counterfactuals. ACCORD introduces formal elements to commonsense reasoning to explicitly control and quantify reasoning complexity beyond the typical 1 or 2 hops. Uniquely, ACCORD can automatically generate benchmarks of arbitrary reasoning complexity, so it scales with future LLM improvements. Indeed, our experiments on state-of-the-art LLMs show performance degrading to below random chance with only moderate scaling, leaving substantial headroom for improvement. We release a leaderboard of the benchmark suite tested in this work, as well as code to automatically generate more complex benchmarks.

Original languageEnglish
Title of host publicationLong Papers
EditorsLuis Chiruzzo, Alan Ritter, Lu Wang
Pages3799-3829
Number of pages31
ISBN (Electronic)9798891761896
DOIs
StatePublished - 2025
Event2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2025 - Hybrid, Albuquerque, United States
Duration: 29 Apr 20254 May 2025

Publication series

NameProceedings of the 2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies: Long Papers, NAACL-HLT 2025
Volume1

Conference

Conference2025 Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2025
Country/TerritoryUnited States
CityHybrid, Albuquerque
Period29/04/254/05/25

Fingerprint

Dive into the research topics of 'ACCORD: Closing the Commonsense Measurability Gap'. Together they form a unique fingerprint.

Cite this