Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

8-2025

Abstract

Humans possess a remarkable ability to interpret underspecified, ambiguous statements by inferring their meanings from context, such as visual input. This ability, however, may not be as developed in recent pre-trained vision-language models (VLMs). In this paper, we introduce a novel probing dataset called FOCUS to evaluate whether state-of-the-art VLMs have this ability. FOCUS consists of underspecified sentences paired with image contexts and carefully designed probing questions. Our experiments reveal that VLMs still fall short in handling underspecification even when visual inputs that can help resolve the ambiguities are available. To further support research in underspecification, FOCUS will be released for public use. We hope this dataset will inspire further research on the reasoning and contextual understanding capabilities of VLMs.

Discipline

Graphics and Human Computer Interfaces | Programming Languages and Compilers

Research Areas

Data Science and Engineering

Publication

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, ACL 2025, Vienna, Austria, July 27 - August 1

Volume

1

First Page

27565

Last Page

27584

Identifier

10.18653/v1/2025.acl-long.1337

City or Country

Vienna, Austria

Additional URL

https://doi.org/10.18653/v1/2025.acl-long.1337
