Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
3-2025
Abstract
Human pose estimation in computer vision predominantly focuses on the visible modality, with limited research on the infrared modality. No existing method demonstrates robust performance across both modalities, leaving their complementary strengths unexploited. This gap stems from the lack of a multimodal benchmark and the difficulty of developing robust multimodal capabilities. To address this, we introduce MMPD, a novel visible-infrared multimodal pose benchmark with high-quality annotations for both modalities. Leveraging MMPD, we expose the limitations of state-of-the-art methods under modality variance. To overcome this challenge, we propose a novel method-agnostic scheme called AMMPE. Through a Modality Adversarial Enhancement Stage and a Modality Interaction Stage, AMMPE incorporates multimodal information without requiring additional pose annotations and enables effective modality interaction. Extensive experiments demonstrate that AMMPE improves performance in both the visible and infrared modalities, achieving strong modality robustness.
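As a rough illustration of the two-stage scheme the abstract describes, the sketch below assumes a gradient-reversal-style modality discriminator for the adversarial enhancement stage and a simple cross-attention block for the interaction stage; module names, shapes, and training details are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a two-stage multimodal enhancement scheme:
# (1) an adversarial stage using a gradient-reversal layer so pose features
#     become modality-invariant, and (2) an interaction stage that fuses the
#     visible and infrared features with cross-attention. Illustrative only.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class ModalityDiscriminator(nn.Module):
    """Predicts whether pooled features came from the visible or infrared image."""

    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, feat, lambd=1.0):
        # Gradient reversal trains the backbone to fool the discriminator,
        # pushing it toward modality-invariant pose features.
        return self.net(GradReverse.apply(feat, lambd))


class ModalityInteraction(nn.Module):
    """Cross-attention fusion between visible and infrared feature tokens."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis_tokens, ir_tokens):
        fused, _ = self.attn(query=vis_tokens, key=ir_tokens, value=ir_tokens)
        return self.norm(vis_tokens + fused)


if __name__ == "__main__":
    B, N, D = 2, 64, 256                  # batch, tokens, channels (illustrative)
    vis = torch.randn(B, N, D)            # visible-modality backbone tokens
    ir = torch.randn(B, N, D)             # infrared-modality backbone tokens

    disc = ModalityDiscriminator(D)
    logits = disc(vis.mean(dim=1))        # adversarial modality classification
    fuse = ModalityInteraction(D)
    out = fuse(vis, ir)                   # modality-interacted pose features
    print(logits.shape, out.shape)        # torch.Size([2, 2]) torch.Size([2, 64, 256])
```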
Keywords
Human pose estimation, multi-modality, benchmark
Discipline
Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025), Hyderabad, India, April 6-11
First Page
1
Last Page
5
ISBN
9798350368741
Identifier
10.1109/ICASSP49660.2025.10888262
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
XIA, Jiangnan; WU, Qilong; GUO, Yanyin; LI, Yi; CHENG, Jianghan; LI, Junwei; and ZHANG, Zhiyuan.
Improving multimodal human pose estimation by adversarial modality enhancement. (2025). Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025), Hyderabad, India, April 6-11. 1-5.
Available at: https://ink.library.smu.edu.sg/sis_research/10137
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
http://doi.org/10.1109/ICASSP49660.2025.10888262