Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
8-2025
Abstract
Solving financial problems demands complex reasoning, multimodal data processing, and broad technical understanding, which presents unique challenges for current large language models (LLMs). We introduce **XFinBench**, a novel benchmark of 4,235 examples designed to evaluate LLMs' ability to solve comple**X**, knowledge-intensive **Fin**ancial problems across diverse graduate-level finance topics with multimodal context. Using XFinBench, we identify five core capabilities of LLMs: _terminology understanding_, _temporal reasoning_, _future forecasting_, _scenario planning_, and _numerical modelling_. On XFinBench, we conduct extensive experiments on 18 leading models. The results show that o1 is the best-performing text-only model, with an overall accuracy of 67.3%, but it still lags significantly behind human experts, by 12.5%, especially in temporal reasoning and scenario planning. We further construct a knowledge bank of 3,032 finance terms for knowledge-augmentation analysis and find that supplying knowledge relevant to the question brings consistent accuracy improvements only to small open-source models. Additionally, our error analysis reveals that rounding errors during calculation and blindness to the position and intersection of curves in images are the two primary issues behind models' poor performance on calculation and visual-context questions, respectively.
Discipline
Artificial Intelligence and Robotics | Programming Languages and Compilers
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, July 27 - August 1
First Page
8715
Last Page
8758
Identifier
10.18653/v1/2025.findings-acl.457
Publisher
ACL
City or Country
Austria
Citation
ZHANG, Zhihan; CAO, Yixin; and LIAO, Lizi.
XFinBench: Benchmarking LLMs in complex financial problem solving and reasoning. (2025). Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, July 27 - August 1. 8715-8758.
Available at: https://ink.library.smu.edu.sg/sis_research/10788
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.18653/v1/2025.findings-acl.457