Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
2-2024
Abstract
Large language models (LLMs) have shown remarkable performance in natural language processing (NLP) tasks. To comprehend and execute diverse human instructions over image data, instruction-tuned large vision-language models (LVLMs) have been introduced. However, LVLMs may suffer from different types of object hallucinations, yet they are currently evaluated only for coarse-grained object hallucinations (i.e., generated objects that do not exist in the input image). Fine-grained object attributes and behaviors that do not appear in the image may still be generated but go unmeasured by current evaluation methods. In this paper, we therefore focus on reducing fine-grained hallucinations of LVLMs. We propose ReCaption, a framework that consists of two components: rewriting captions using ChatGPT and fine-tuning instruction-tuned LVLMs on the rewritten captions. We also propose a fine-grained probing-based evaluation method named Fine-Grained Object Hallucination Evaluation (FGHE). Our experimental results demonstrate that ReCaption effectively reduces fine-grained object hallucination across different LVLMs and improves their text generation quality. The code can be found at https://github.com/Anonymousanoy/FOHE.
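The abstract names the two ReCaption stages only at a high level. The sketch below is a hedged illustration of how such a pipeline could be wired up; the rewrite prompt wording, the `chat_completion` placeholder, the instruction string, and the training-record fields are assumptions made for illustration, not the authors' actual implementation.

```python
# Minimal sketch of the two ReCaption stages described in the abstract:
# (1) rewrite image captions with a chat model such as ChatGPT, and
# (2) prepare the rewritten captions as fine-tuning data for an
# instruction-tuned LVLM. Prompt text and record format are assumptions.

from typing import Callable, Dict, List

REWRITE_PROMPT = (
    "Rewrite the following image caption so that it explicitly describes "
    "the objects, their attributes (e.g., color, count, size), and their "
    "behaviors, without adding anything not stated in the caption:\n\n{caption}"
)


def rewrite_caption(caption: str, chat_completion: Callable[[str], str]) -> str:
    """Stage 1: ask the chat model to rewrite one caption."""
    return chat_completion(REWRITE_PROMPT.format(caption=caption))


def build_finetune_records(
    samples: List[Dict[str, str]], chat_completion: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Stage 2 (data prep): pair each image with its rewritten caption as an
    instruction-tuning target for the LVLM."""
    records = []
    for sample in samples:
        rewritten = rewrite_caption(sample["caption"], chat_completion)
        records.append(
            {
                "image": sample["image"],
                "instruction": "Describe this image in detail.",
                "response": rewritten,
            }
        )
    return records


if __name__ == "__main__":
    # Stand-in for an actual ChatGPT call so the sketch runs offline:
    # it simply echoes back the caption portion of the prompt.
    echo_model = lambda prompt: prompt.split("\n\n", 1)[-1]
    data = [{"image": "000001.jpg", "caption": "A brown dog chases a red ball."}]
    print(build_finetune_records(data, echo_model))
```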
Keywords
Hallucination Mitigation, Large Vision-Language Models
Discipline
Artificial Intelligence and Robotics | Databases and Information Systems
Research Areas
Data Science and Engineering
Publication
Multimedia Modeling: MMM 2024: International Conference, Amsterdam, January 29 - February 2: Proceedings
First Page
32
Last Page
45
ISBN
9783031533013
Identifier
10.1007/978-3-031-53302-0_3
Publisher
Springer
City or Country
Cham
Citation
WANG, Lei; HE, Jiabang; LI, Shenshen; LIU, Ning; and LIM, Ee-peng.
Mitigating fine-grained hallucination by fine-tuning large vision-language models with caption rewrites. (2024). Multimedia Modeling: MMM 2024: International Conference, Amsterdam, January 29 - February 2: Proceedings. 32-45.
Available at: https://ink.library.smu.edu.sg/sis_research/8750
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1007/978-3-031-53302-0_3