Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

11-2023

Abstract

Hateful meme detection is a challenging multimodal task that requires comprehension of both vision and language, as well as their cross-modal interactions. Recent studies have fine-tuned pre-trained vision-language models (PVLMs) for this task. However, as model sizes grow, it becomes important to leverage powerful PVLMs more efficiently rather than simply fine-tuning them. Another recent line of work converts meme images into textual captions and prompts language models for predictions; this approach performs well but suffers from non-informative image captions. Considering these two factors, we propose a probing-based captioning approach that leverages PVLMs in a zero-shot visual question answering (VQA) manner. Specifically, we prompt a frozen PVLM with hateful-content-related questions and use the answers as image captions (which we call Pro-Cap), so that the captions contain information critical for hateful content detection. The strong performance of models using Pro-Cap on three benchmarks validates the effectiveness and generalization of the proposed method.
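The abstract describes the probing-based captioning idea only at a high level. The snippet below is a minimal sketch of that idea, assuming a BLIP-2 checkpoint from Hugging Face as the frozen PVLM; the model name and the probing questions are illustrative assumptions, not the exact ones used in the paper.

# Minimal sketch of probing-based captioning (Pro-Cap-style), assuming BLIP-2 as the frozen PVLM.
# The probing questions below are illustrative; the paper's exact question set may differ.
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to("cuda")

# Hateful-content-related probing questions (illustrative assumptions).
PROBING_QUESTIONS = [
    "What is shown in the image?",
    "What is the race of the person in the image?",
    "What is the gender of the person in the image?",
    "What is the religion of the person in the image?",
]

def pro_cap(image: Image.Image) -> str:
    """Ask the frozen PVLM each probing question and join the answers into a caption."""
    answers = []
    for question in PROBING_QUESTIONS:
        prompt = f"Question: {question} Answer:"  # BLIP-2 zero-shot VQA prompt format
        inputs = processor(images=image, text=prompt, return_tensors="pt").to("cuda", torch.float16)
        output_ids = model.generate(**inputs, max_new_tokens=20)
        answers.append(processor.decode(output_ids[0], skip_special_tokens=True).strip())
    # The concatenated answers form the image caption used for hateful content detection.
    return " ".join(answers)

caption = pro_cap(Image.open("meme.png").convert("RGB"))

As the abstract indicates, such a caption would then be passed, as text, to a language model that makes the final hateful/non-hateful prediction.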

Keywords

Memes, multimodal, semantic extraction

Discipline

Databases and Information Systems | Graphic Communications | Graphics and Human Computer Interfaces

Research Areas

Data Science and Engineering

Publication

MM '23: Proceedings of the 31st ACM International Conference on Multimedia, Ottawa, Canada, October 29 - November 3, 2023

First Page

5244

Last Page

5252

ISBN

9798400701085

Identifier

10.1145/3581783.3612498

Publisher

ACM

City or Country

New York

Copyright Owner and License

Authors

Creative Commons License

Creative Commons Attribution 3.0 License
This work is licensed under a Creative Commons Attribution 3.0 License.

Additional URL

https://doi.org/10.1145/3581783.3612498
