Publication Type
Journal Article
Version
acceptedVersion
Publication Date
1-2025
Abstract
Human pose estimation (HPE) models underperform on rare poses because their training datasets are imbalanced (i.e., they contain few image samples of rare poses). From a data perspective, the most intuitive solution is to synthesize data for rare poses. Rule-based methods apply manual manipulations (such as Cutout and GridMask) to the existing data, so the limited diversity of that data still constrains the model. An alternative is to learn the underlying data distribution with deep generative models (such as ControlNet and HumanSD) and then sample “new data” from that distribution. This works well for generating frequent poses in common scenes, but struggles with rare poses and complex scenes (such as multiple persons with overlapping limbs). In this paper, we aim to address these two issues, i.e., rare poses and complex scenes, for person image generation. We propose a two-stage method. In the first stage, we design a controllable pose generator named PoseFactory to synthesize rare poses. This generator is trained on augmented pose data in which each pose is labelled with its level of difficulty and rarity. In the second stage, we introduce a multi-person image generator named MultipGenerator, which is conditioned on multiple human poses and textual descriptions of complex scenes. Both stages are controllable in terms of the diversity of poses and the complexity of scenes. For evaluation, we conduct extensive experiments on three widely used datasets: MS-COCO, HumanArt, and OCHuman. We compare our method against traditional pose data augmentation and person image generation methods, and it outperforms them both quantitatively and qualitatively.
Discipline
Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Areas of Excellence
Digital transformation
Publication
IEEE Transactions on Multimedia
First Page
1
Last Page
13
ISSN
1520-9210
Publisher
Institute of Electrical and Electronics Engineers
Citation
ZHAO, Liuqing; TIAN, Zichen; ZOU, Peng; HONG, Richang; and SUN, Qianru.
Synthesizing multi-person and rare pose images for human pose estimation. (2025). IEEE Transactions on Multimedia. 1-13.
Available at: https://ink.library.smu.edu.sg/sis_research/10151
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Included in
Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons
Comments
Accepted in January 2025; awaiting proofreading and publication notification from IEEE Transactions on Multimedia (TMM).