Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
3-2025
Abstract
Human pose estimation in computer vision predominantly focuses on the visible modality, with limited research on the infrared modality. No existing method demonstrates robust performance across both modalities, leaving their complementary strengths unexploited. This gap stems from the lack of a multimodal benchmark and the difficulty of developing robust multimodal capabilities. To address this, we introduce MMPD, a novel visible-infrared multimodal pose benchmark with high-quality annotations for both modalities. Leveraging MMPD, we expose the limitations of state-of-the-art methods under modality variance. To overcome this challenge, we propose a novel method-agnostic scheme called AMMPE. Through a Modality Adversarial Enhancement Stage and a Modality Interaction Stage, AMMPE incorporates multimodal information without requiring additional pose annotations and enables effective modality interaction. Extensive experiments demonstrate that AMMPE improves performance in both the visible and infrared modalities, achieving strong modality robustness.
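As a rough illustration of the two-stage scheme the abstract describes, the sketch below assumes a gradient-reversal-style modality discriminator for the adversarial enhancement stage and a simple cross-attention block for the interaction stage; module names, shapes, and training details are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of a two-stage multimodal enhancement scheme:
# (1) an adversarial stage using a gradient-reversal layer so pose features
#     become modality-invariant, and (2) an interaction stage that fuses the
#     visible and infrared features with cross-attention. Illustrative only.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; reverses (and scales) gradients backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class ModalityDiscriminator(nn.Module):
    """Predicts whether pooled features came from the visible or infrared image."""

    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, feat, lambd=1.0):
        # Gradient reversal trains the backbone to fool the discriminator,
        # pushing it toward modality-invariant pose features.
        return self.net(GradReverse.apply(feat, lambd))


class ModalityInteraction(nn.Module):
    """Cross-attention fusion between visible and infrared feature tokens."""

    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis_tokens, ir_tokens):
        fused, _ = self.attn(query=vis_tokens, key=ir_tokens, value=ir_tokens)
        return self.norm(vis_tokens + fused)


if __name__ == "__main__":
    B, N, D = 2, 64, 256                  # batch, tokens, channels (illustrative)
    vis = torch.randn(B, N, D)            # visible-modality backbone tokens
    ir = torch.randn(B, N, D)             # infrared-modality backbone tokens

    disc = ModalityDiscriminator(D)
    logits = disc(vis.mean(dim=1))        # adversarial modality classification
    fuse = ModalityInteraction(D)
    out = fuse(vis, ir)                   # modality-interacted pose features
    print(logits.shape, out.shape)        # torch.Size([2, 2]) torch.Size([2, 64, 256])
```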
Keywords
Human pose estimation, multi-modality, benchmark
Discipline
Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025), Hyderabad, India, April 6-11
First Page
1
Last Page
5
ISBN
9798350368741
Identifier
10.1109/ICASSP49660.2025.10888262
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
XIA, Jiangnan; WU, Qilong; GUO, Yanyin; LI, Yi; CHENG, Jianghan; LI, Junwei; and ZHANG, Zhiyuan.
Improving multimodal human pose estimation by adversarial modality enhancement. (2025). Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2025), Hyderabad, India, April 6-11. 1-5.
Available at: https://ink.library.smu.edu.sg/sis_research/10137
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
http://doi.org/10.1109/ICASSP49660.2025.10888262