Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

7-2023

Abstract

Existing research on multimodal relation extraction (MRE) faces two co-existing challenges: internal-information over-utilization and external-information under-exploitation. To address both, we propose a novel framework that simultaneously implements internal-information screening and external-information exploiting. First, we represent the fine-grained semantic structures of the input image and text with visual and textual scene graphs, which are fused into a unified cross-modal graph (CMG). Based on the CMG, we perform structure refinement guided by the graph information bottleneck principle, actively denoising less-informative features. Next, we perform topic modeling over the input image and text, incorporating latent multimodal topic features to enrich the contexts. On the benchmark MRE dataset, our system significantly outperforms the current best model. Further in-depth analyses reveal the great potential of our method for the MRE task.

Keywords

computational linguistics; data mining; extraction; information retrieval

Discipline

Computer Sciences

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics

Volume

Volume 1: Long Papers

First Page

14734

Last Page

14751

Identifier

10.18653/v1/2023.acl-long.823

Publisher

Association for Computational Linguistics

City or Country

Canada

Additional URL

https://doi.org/10.18653/v1/2023.acl-long.823
