Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
11-2024
Abstract
Product bundling has been a prevailing marketing strategy that is beneficial in the online shopping scenario. Effective product bundling methods depend on high-quality item representations that capture both the individual items' semantics and cross-item relations. However, previous item representation learning methods, whether based on feature fusion or graph learning, suffer from inadequate cross-modal alignment and struggle to capture the cross-item relations for cold-start items. Multimodal pre-trained models could be a potential solution given their promising performance on various multimodal downstream tasks. However, cross-item relations have been under-explored in current multimodal pre-trained models. To bridge this gap, we propose a novel and simple framework, Cross-Item Relational Pre-training (CIRP), for item representation learning in product bundling. Specifically, we employ a multimodal encoder to generate image and text representations. Then we leverage both the cross-item contrastive loss (CIC) and the individual item's image-text contrastive loss (ITC) as the pre-training objectives. Our method seeks to integrate cross-item relation modeling capability into the multimodal encoder. Therefore, even for cold-start items without explicit relations, their representations are still relation-aware. Furthermore, to eliminate potential noise and reduce the computational cost, we harness a relation pruning module to remove noisy and redundant relations. We apply the item representations extracted by CIRP to the product bundling model ItemKNN, and experiments on three e-commerce datasets demonstrate that CIRP outperforms various leading representation learning methods. The code and dataset are available at https://github.com/HappyPointer/CIRP.
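The two pre-training objectives named in the abstract, ITC and CIC, can be illustrated with a generic in-batch contrastive (InfoNCE-style) loss. The sketch below is only an illustration under stated assumptions: the abstract does not give the loss formulas, so the symmetric InfoNCE form, the temperature value, the unweighted sum of the two terms, and all function and variable names (`info_nce`, `symmetric_contrastive`, `image_emb`, `text_emb`, `related_emb`) are hypothetical and are not taken from the CIRP code base.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def info_nce(anchors, positives, temperature=0.1):
    """One-directional in-batch contrastive loss.

    anchors[i] is expected to match positives[i]; every other entry of
    `positives` in the batch acts as a negative. (Assumed form, not the
    paper's stated formula.)
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [dot(ai, pj) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_contrastive(x, y, temperature=0.1):
    """Average of the x-to-y and y-to-x directions."""
    return 0.5 * (info_nce(x, y, temperature) + info_nce(y, x, temperature))


# Toy embeddings (hypothetical values) for a batch of two items.
image_emb = [[1.0, 0.0], [0.0, 1.0]]
text_emb = [[0.9, 0.1], [0.1, 0.9]]      # text of the same item i
related_emb = [[0.8, 0.2], [0.2, 0.8]]   # embedding of an item related to item i

itc = symmetric_contrastive(image_emb, text_emb)     # image-text alignment term
cic = symmetric_contrastive(image_emb, related_emb)  # cross-item relation term
loss = itc + cic  # unweighted sum is an assumption
```

In this toy setup, correctly paired embeddings give a much smaller loss than mismatched pairs, which is the behaviour a contrastive objective is meant to reward. How CIRP actually samples related item pairs, weights the two terms, and prunes relations is described in the paper and repository, not here.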
Keywords
bundle recommendation, multimodal bundle construction, multimodal pre-train, vision language model
Discipline
Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces
Research Areas
Intelligent Systems and Optimization
Areas of Excellence
Digital transformation
Publication
MM '24: The 32nd ACM International Conference on Multimedia, Melbourne, Australia, October 28 - November 1
First Page
9641
Last Page
9649
Identifier
10.1145/3664647.3681349
Publisher
ACM
City or Country
New York
Citation
MA, Yunshan; HE, Yingzhi; ZHONG, Wenjun; WANG, Xiang; ZIMMERMANN, Roger; and CHUA, Tat-Seng.
CIRP: Cross‑item relational pre‑training for multimodal product bundling. (2024). MM '24: The 32nd ACM International Conference on Multimedia, Melbourne, Australia, October 28 - November 1. 9641-9649.
Available at: https://ink.library.smu.edu.sg/sis_research/10912
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3664647.3681349
Included in
Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons