TY - GEN
T1 - AuditBench
T2 - 2nd AI4Research Workshop: Towards a Knowledge-Grounded Scientific Research Lifecycle, AI4Research 2025 and 1st Workshop on Scalable and Efficient Artificial Intelligence Systems, SEAS 2025, held in conjunction with the 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025
AU - Wang, Rushi
AU - Liu, Jiateng
AU - Zhao, Weijie
AU - Li, Shenglan
AU - Zhang, Denghui
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.
PY - 2025
Y1 - 2025
N2 - Financial statement auditing is essential for stakeholders to understand a company’s financial health, yet current manual processes are inefficient and error-prone. Even with extensive verification procedures, auditors frequently miss errors, leading to inaccurate financial statements that fail to meet stakeholder expectations for transparency and reliability. To this end, we harness large language models (LLMs) to automate financial statement auditing and rigorously assess their capabilities, providing insights on their performance boundaries in the scenario of automated auditing. Our work introduces a comprehensive benchmark using a curated dataset combining real-world financial tables with synthesized transaction data. In the benchmark, we developed a rigorous five-stage evaluation framework to assess LLMs’ auditing capabilities. The benchmark also challenges models to map specific financial statement errors to corresponding violations of accounting standards, simulating real-world auditing scenarios through test cases. Our testing reveals that current state-of-the-art LLMs successfully identify financial statement errors when given historical transaction data. However, these models demonstrate significant limitations in explaining detected errors and citing relevant accounting standards. Furthermore, LLMs struggle to execute complete audits and make necessary financial statement revisions. These findings highlight a critical gap in LLMs’ domain-specific accounting knowledge. Future research must focus on enhancing LLMs’ understanding of auditing principles and procedures. Our benchmark and evaluation framework establish a foundation for developing more effective automated auditing tools that will substantially improve the accuracy and efficiency of real-world financial statement auditing.
AB - Financial statement auditing is essential for stakeholders to understand a company’s financial health, yet current manual processes are inefficient and error-prone. Even with extensive verification procedures, auditors frequently miss errors, leading to inaccurate financial statements that fail to meet stakeholder expectations for transparency and reliability. To this end, we harness large language models (LLMs) to automate financial statement auditing and rigorously assess their capabilities, providing insights on their performance boundaries in the scenario of automated auditing. Our work introduces a comprehensive benchmark using a curated dataset combining real-world financial tables with synthesized transaction data. In the benchmark, we developed a rigorous five-stage evaluation framework to assess LLMs’ auditing capabilities. The benchmark also challenges models to map specific financial statement errors to corresponding violations of accounting standards, simulating real-world auditing scenarios through test cases. Our testing reveals that current state-of-the-art LLMs successfully identify financial statement errors when given historical transaction data. However, these models demonstrate significant limitations in explaining detected errors and citing relevant accounting standards. Furthermore, LLMs struggle to execute complete audits and make necessary financial statement revisions. These findings highlight a critical gap in LLMs’ domain-specific accounting knowledge. Future research must focus on enhancing LLMs’ understanding of auditing principles and procedures. Our benchmark and evaluation framework establish a foundation for developing more effective automated auditing tools that will substantially improve the accuracy and efficiency of real-world financial statement auditing.
KW - Automated Auditing
KW - Error Detection
KW - Financial Statement Auditing
KW - Large Language Models (LLMs)
UR - https://www.scopus.com/pages/publications/105010827899
U2 - 10.1007/978-981-96-8912-5_3
DO - 10.1007/978-981-96-8912-5_3
M3 - Conference contribution
AN - SCOPUS:105010827899
SN - 9789819689118
T3 - Communications in Computer and Information Science
SP - 59
EP - 81
BT - AI for Research and Scalable, Efficient Systems - Second International Workshop, AI4Research 2025, and First International Workshop, SEAS 2025, Held in Conjunction with AAAI 2025, Proceedings
A2 - Wang, Qingyun
A2 - Yin, Wenpeng
A2 - Aich, Abhishek
A2 - Suh, Yumin
A2 - Peng, Kuan-Chuan
Y2 - 25 February 2025 through 4 March 2025
ER -