Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

8-2025

Abstract

Large Language Models (LLMs) have significantly advanced fact-checking research. However, existing automated fact-checking evaluation methods rely on static datasets and classification metrics, which fail to automatically evaluate justification production or uncover the nuanced limitations of LLMs in fact-checking. In this work, we introduce FACT-AUDIT, an agent-driven framework that adaptively and dynamically assesses LLMs’ fact-checking capabilities. Leveraging importance sampling principles and multi-agent collaboration, FACT-AUDIT generates adaptive and scalable datasets, performs iterative model-centric evaluations, and updates assessments based on model-specific responses. By incorporating justification production alongside verdict prediction, this framework provides a comprehensive and evolving audit of LLMs’ factual reasoning capabilities, enabling an investigation of their trustworthiness. Extensive experiments demonstrate that FACT-AUDIT effectively differentiates among state-of-the-art LLMs, providing valuable insights into model strengths and limitations in model-centric fact-checking analysis.

Discipline

Programming Languages and Compilers

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, Vienna, Austria, July 27 - August 1, 2025

First Page

360

Last Page

381

Identifier

10.18653/v1/2025.acl-long.17

Publisher

Association for Computational Linguistics

City or Country

USA

Additional URL

https://doi.org/10.18653/v1/2025.acl-long.17
