TY - GEN
T1 - INVESTORBENCH
T2 - 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025
AU - Li, Haohang
AU - Cao, Yupeng
AU - Yu, Yangyang
AU - Javaji, Shashidhar Reddy
AU - Deng, Zhiyang
AU - He, Yueru
AU - Jiang, Yuechen
AU - Zhu, Zining
AU - Subbalakshmi, Koduvayur
AU - Huang, Jimin
AU - Qian, Lingfei
AU - Peng, Xueqing
AU - Xie, Qianqian
AU - Suchow, Jordan W.
N1 - Publisher Copyright: © 2025 Association for Computational Linguistics.
PY - 2025
Y1 - 2025
AB - Recent advancements have underscored the potential of large language model (LLM)-based agents in financial decision-making. Despite this progress, the field currently encounters two main challenges: (1) the lack of a comprehensive LLM agent framework adaptable to a variety of financial tasks, and (2) the absence of standardized benchmarks and consistent datasets for assessing agent performance. To tackle these issues, we introduce INVESTORBENCH, the first benchmark specifically designed for evaluating LLM-based agents in diverse financial decision-making contexts. INVESTORBENCH enhances the versatility of LLM-enabled agents by providing a comprehensive suite of tasks applicable to different financial products, including single equities like stocks, cryptocurrencies and exchange-traded funds (ETFs). Additionally, we assess the reasoning and decision-making capabilities of our agent framework using thirteen different LLMs as backbone models, across various market environments and tasks. Furthermore, we have curated a diverse collection of open-source, multimodal datasets and developed a comprehensive suite of environments for financial decision-making. This establishes a highly accessible platform for evaluating financial agents' performance across various scenarios. The code is available on GitHub: https://github.com/felis33/INVESTOR-BENCH.
UR - https://www.scopus.com/pages/publications/105021015577
M3 - Conference contribution
AN - SCOPUS:105021015577
T3 - Proceedings of the Annual Meeting of the Association for Computational Linguistics
SP - 2509
EP - 2525
BT - Long Papers
A2 - Che, Wanxiang
A2 - Nabende, Joyce
A2 - Shutova, Ekaterina
A2 - Pilehvar, Mohammad Taher
Y2 - 27 July 2025 through 1 August 2025
ER -