Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
10-2024
Abstract
Coreference resolution, an essential task in natural language processing, is particularly challenging in multi-modal scenarios where data comes in various forms and modalities. Despite advancements, limitations due to scarce labeled data and underleveraged unlabeled data persist. We address these issues with a self-adaptive fine-grained multi-modal data augmentation framework for semi-supervised multi-modal coreference resolution (MCR), focusing on enriching training data from labeled datasets and exploiting the underused potential of unlabeled data. For the former issue, we first leverage text coreference resolution datasets and diffusion models to perform fine-grained text-to-image generation with aligned text entities and image bounding boxes. We then introduce a self-adaptive selection strategy that carefully curates the augmented data, enhancing the diversity and volume of the training set without compromising its quality. For the latter issue, we design a self-adaptive threshold strategy that dynamically adjusts the confidence threshold based on the model's learning status and performance, enabling effective utilization of valuable information from unlabeled data. Additionally, we incorporate a distance smoothing term, which smooths distances between positive and negative samples, enhancing the discriminative power of the model's feature representations and mitigating noise and uncertainty in the unlabeled data. Our experiments on the widely-used CIN dataset show that our framework significantly outperforms state-of-the-art baselines by at least 9.57% on MUC F1 score and 4.92% on CoNLL F1 score. Remarkably, against weakly-supervised baselines, our framework achieves a 22.24% improvement in MUC F1 score. These results, underpinned by in-depth analyses, underscore the effectiveness and potential of our approach for advancing MCR tasks.
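To make the two self-adaptive components in the abstract concrete, the sketch below illustrates (i) a confidence threshold that rises and falls with the model's running confidence on unlabeled data and (ii) a smoothed margin between positive and negative pair distances. This is a minimal, hypothetical illustration in PyTorch; the class names, update rules, and hyper-parameters are assumptions and do not reproduce the authors' actual method.

```python
# Illustrative sketch (not the authors' code): a self-adaptive confidence
# threshold for pseudo-labelling unlabeled mention pairs, plus a smoothed
# margin between positive and negative pair distances. All names and
# hyper-parameters are assumptions made for illustration only.
import torch
import torch.nn.functional as F


class AdaptiveThreshold:
    """Tracks model confidence with an EMA and derives a dynamic threshold."""

    def __init__(self, base: float = 0.8, momentum: float = 0.99):
        self.base = base          # floor for the threshold
        self.momentum = momentum  # EMA smoothing factor
        self.ema_conf = base      # running estimate of model confidence

    def update(self, probs: torch.Tensor) -> float:
        # probs: (N,) max softmax probabilities on an unlabeled batch.
        batch_conf = probs.mean().item()
        self.ema_conf = self.momentum * self.ema_conf + (1 - self.momentum) * batch_conf
        # Threshold follows the model's confidence but never drops below the base.
        return max(self.base, self.ema_conf)


def smoothed_pair_loss(pos_dist: torch.Tensor,
                       neg_dist: torch.Tensor,
                       margin: float = 0.5,
                       tau: float = 0.1) -> torch.Tensor:
    """Softplus-smoothed margin loss between positive and negative distances."""
    # Softplus gives a smooth, everywhere-differentiable surrogate for the hinge.
    return F.softplus((pos_dist - neg_dist + margin) / tau).mean() * tau


if __name__ == "__main__":
    thresh = AdaptiveThreshold()
    probs = torch.rand(32)            # mock confidences on unlabeled pairs
    t = thresh.update(probs)
    mask = probs >= t                 # only high-confidence pseudo-labels are kept
    loss = smoothed_pair_loss(torch.rand(16), torch.rand(16))
    print(f"threshold={t:.3f}, kept={int(mask.sum())}/32, loss={loss.item():.4f}")
```

The design intent is simply that the pseudo-label filter tightens as the model matures, while the smoothed margin avoids the hard cutoff of a plain hinge loss on noisy unlabeled pairs.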
Keywords
Coreference Resolution, Multi-modal, Semi-supervised Learning
Discipline
Artificial Intelligence and Robotics | Computer Sciences
Research Areas
Data Science and Engineering; Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
Proceedings of the 32nd ACM International Conference on Multimedia (ACMMM 2024) : Melbourne, Australia, Oct 28-Nov 1
First Page
8576
Last Page
8585
Identifier
10.1145/3664647.3680966
Publisher
Association for Computing Machinery
City or Country
Melbourne, Australia
Citation
ZHENG, Li; CHEN, Boyu; FEI, Hao; LI, Fei; WU, Shengqiong; LIAO, Lizi; and JI, Donghong.
Self-adaptive fine-grained multi-modal data augmentation for semi-supervised multi-modal coreference resolution. (2024). Proceedings of the 32nd ACM International Conference on Multimedia (ACMMM 2024) : Melbourne, Australia, Oct 28-Nov 1. 8576-8585.
Available at: https://ink.library.smu.edu.sg/sis_research/9694
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3664647.3680966