Research Collection School Of Computing and Information Systems

Bridging the modality gap: Advancing multimodal human pose estimation with modality-adaptive pose estimator and novel benchmark datasets

Jiangnan XIA
Zhiyuan ZHANG, Singapore Management University
Yanyin GUO
Qilong WU
Yi LI
Jianghan CHENG
LI Junwei

Publication Type

Conference Proceeding Article

Publication Date

4-2025

Abstract

Visible and infrared images are two complementary and indispensable modalities, each offering distinct information for human pose estimation under different lighting conditions. However, existing work has focused predominantly on a single modality, which creates significant challenges when models are moved to multimodal settings. The performance degradation of state-of-the-art models on multimodal images can be attributed to the substantial modality gap and to the absence of multimodal benchmarks. To address this gap, we introduce novel visible-infrared multimodal human pose datasets in which images from the two modalities are well balanced and accurately labeled. Using these datasets, we establish a comprehensive benchmark that supports rigorous analysis and improvement of multimodal human pose estimation techniques. Our findings highlight the limitations that modality variance imposes on state-of-the-art methods. To overcome this challenge, we propose a method-agnostic scheme called Modality-Adaptive Pose Estimation, designed to integrate seamlessly into existing approaches. By employing Modality-Specific Batch Normalization and a Modality Adaptive Loss, our approach enhances feature interactions between the two modalities and yields superior performance. Extensive experiments with popular baseline methods demonstrate that the proposed approach achieves state-of-the-art results on both modalities. We believe our benchmarks offer a solid platform for studying robustness and will contribute significantly to advancing research in this field.
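The abstract names Modality-Specific Batch Normalization but does not describe its internals. A common reading of the term, assumed here, is a batch normalization layer that keeps a separate set of running statistics and affine parameters for each modality, so that visible and infrared features are normalized independently. The sketch below illustrates that generic idea in plain Python; the class name `ModalitySpecificBatchNorm`, its parameters, and the assumption that each batch comes from a single modality are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


class ModalitySpecificBatchNorm:
    """Hypothetical sketch: batch norm with per-modality statistics and affine params.

    Features are lists of per-channel floats; each batch is assumed to come
    from exactly one modality (e.g. "visible" or "infrared").
    """

    def __init__(self, num_channels: int,
                 modalities: Sequence[str] = ("visible", "infrared"),
                 momentum: float = 0.1, eps: float = 1e-5):
        self.eps = eps
        self.momentum = momentum
        # One set of running statistics per modality.
        self.running_mean: Dict[str, List[float]] = {m: [0.0] * num_channels for m in modalities}
        self.running_var: Dict[str, List[float]] = {m: [1.0] * num_channels for m in modalities}
        # One learnable scale (gamma) and shift (beta) per modality.
        self.gamma: Dict[str, List[float]] = {m: [1.0] * num_channels for m in modalities}
        self.beta: Dict[str, List[float]] = {m: [0.0] * num_channels for m in modalities}

    def __call__(self, batch: List[List[float]], modality: str,
                 training: bool = True) -> List[List[float]]:
        n, c = len(batch), len(batch[0])
        if training:
            # Per-channel batch statistics (biased variance, a simplification).
            mean = [sum(x[j] for x in batch) / n for j in range(c)]
            var = [sum((x[j] - mean[j]) ** 2 for x in batch) / n for j in range(c)]
            # Update only the running statistics of this batch's modality.
            rm, rv = self.running_mean[modality], self.running_var[modality]
            for j in range(c):
                rm[j] = (1 - self.momentum) * rm[j] + self.momentum * mean[j]
                rv[j] = (1 - self.momentum) * rv[j] + self.momentum * var[j]
        else:
            # At inference, use the stored statistics of the requested modality.
            mean, var = self.running_mean[modality], self.running_var[modality]
        g, b = self.gamma[modality], self.beta[modality]
        return [
            [g[j] * (x[j] - mean[j]) / math.sqrt(var[j] + self.eps) + b[j] for j in range(c)]
            for x in batch
        ]


bn = ModalitySpecificBatchNorm(num_channels=2)
out_vis = bn([[1.0, 10.0], [3.0, 30.0]], "visible")
out_ir = bn([[100.0, 0.5], [300.0, 1.5]], "infrared")
```

In this sketch, visible and infrared batches with very different value ranges are each mapped to roughly zero mean and unit variance, and updating one modality's statistics leaves the other's untouched. Sharing the rest of the network while separating only normalization statistics is one generic way to reduce a distribution gap between modalities.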

Keywords

Human pose estimation, Multimodal, Visible, Infrared, Benchmark

Discipline

Artificial Intelligence and Robotics | Databases and Information Systems

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the 13th International Conference, CVM 2025, Hong Kong, China, April 19-21

First Page

125

Last Page

153

Identifier

10.1007/978-981-96-5815-2_8

Publisher

Springer

City or Country

Cham

Citation

XIA, Jiangnan; ZHANG, Zhiyuan; GUO, Yanyin; WU, Qilong; LI, Yi; CHENG, Jianghan; and LI Junwei. Bridging the modality gap: Advancing multimodal human pose estimation with modality-adaptive pose estimator and novel benchmark datasets. (2025). Proceedings of the 13th International Conference, CVM 2025, Hong Kong, China, April 19-21. 125-153.
Available at: https://ink.library.smu.edu.sg/sis_research/10173

Additional URL

https://doi.org/10.1007/978-981-96-5815-2_8

This document is currently not available here.
