"Improving multimodal human pose estimation by adversarial modality enh" by Jiangnan XIA, Qilong WU et al.
 

Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

3-2025

Abstract

Human pose estimation in computer vision predominantly focuses on the visible modality, with limited research on the infrared modality. No existing method demonstrates robust performance across both modalities, leaving their complementary strengths unexploited. This gap arises from the lack of a multimodal benchmark and the difficulty of developing robust multimodal capabilities. To address this, we introduce MMPD, a novel visible-infrared multimodal pose benchmark with high-quality annotations for both modalities. Leveraging MMPD, we expose the limitations of state-of-the-art methods under modality variance. To overcome this challenge, we propose AMMPE, a novel method-agnostic scheme. Through a Modality Adversarial Enhancement Stage and a Modality Interaction Stage, AMMPE incorporates multimodal information without additional pose annotations and enables effective modality interaction. Extensive experiments demonstrate that AMMPE improves performance in both the visible and infrared modalities, achieving excellent modality robustness.
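(The abstract names AMMPE's two stages but not their internals. The following is a minimal PyTorch-style sketch, added for illustration only, of how a method-agnostic two-stage multimodal wrapper of this kind could be structured. The FGSM-style feature perturbation, the gated fusion, and the backbone.encode/backbone.head interface are assumptions for the example, not the authors' published design.)

import torch
import torch.nn as nn


class TwoStageMultimodalWrapper(nn.Module):
    """Method-agnostic wrapper around any single-modality pose backbone.

    Assumed (hypothetical) backbone interface:
        features = backbone.encode(images)   # (B, feat_dim, H, W)
        heatmaps = backbone.head(features)
    """

    def __init__(self, backbone: nn.Module, feat_dim: int, eps: float = 1e-2):
        super().__init__()
        self.backbone = backbone
        self.eps = eps  # adversarial step size (assumed hyperparameter)
        # Stage 2 stand-in: a learned gate that mixes the two modalities.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * feat_dim, feat_dim, kernel_size=1), nn.Sigmoid()
        )

    def adversarial_enhance(self, feat_vis, feat_ir):
        # Stage 1 stand-in: one FGSM-style step that shifts visible features
        # along the gradient of the inter-modality discrepancy, exposing the
        # model to modality-shifted features without extra pose labels.
        with torch.enable_grad():
            probe = feat_vis.detach().requires_grad_(True)
            gap = (probe - feat_ir.detach()).pow(2).mean()
            (grad,) = torch.autograd.grad(gap, probe)
        # Apply the perturbation to the live features so encoder gradients
        # still flow during training.
        return feat_vis + self.eps * grad.sign()

    def forward(self, x_vis, x_ir):
        f_vis = self.backbone.encode(x_vis)
        f_ir = self.backbone.encode(x_ir)
        f_vis = self.adversarial_enhance(f_vis, f_ir)
        # Stage 2 stand-in: gated cross-modal fusion before the pose head.
        g = self.gate(torch.cat([f_vis, f_ir], dim=1))
        fused = g * f_vis + (1 - g) * f_ir
        return self.backbone.head(fused)

In this sketch, any single-modality pose estimator exposing an encoder and a pose head could be wrapped without additional pose annotations, consistent with the abstract's description of AMMPE as method-agnostic and annotation-free.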

Keywords

Human pose estimation, multi-modality, benchmark

Discipline

Graphics and Human Computer Interfaces

Research Areas

Intelligent Systems and Optimization

Areas of Excellence

Digital transformation

Publication

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025), Hyderabad, India, April 6-11

First Page

1

Last Page

5

ISBN

9798350368741

Identifier

10.1109/ICASSP49660.2025.10888262

Publisher

IEEE

City or Country

Piscataway, NJ

Additional URL

http://doi.org/10.1109/ICASSP49660.2025.10888262
