Multi-way cascade-attention network for multi-modal sequential recommendation
Publication Type
Journal Article
Publication Date
1-2026
Abstract
Sequential recommendation, which aims to predict the desired items for each user based on his/her historical actions, has become a hot topic. Mainstream approaches to this task model user behaviors in a pure item ID-based manner, which often fails to provide satisfactory results due to data sparsity and cold-start issues. Recently, several studies that leverage multi-modal information (i.e., multi-modal sequential recommendation) have shed light on alleviating such issues. However, we argue that three limitations are still not well addressed: 1) they usually extract the ID modality features of an item with a one-hot encoding, which does not include any semantic information; 2) they fail to effectively mitigate the semantic gap issue and to explicitly explore the asynchronous interplay between any two modalities; and 3) during the model prediction stage, they neglect the significance of adaptively fusing multi-modal embeddings for each user. To address these limitations, we propose a novel framework for multi-modal sequential recommendation, namely, a multi-way cascade-attention network (MCN). Specifically, we apply a lightweight graph propagation network to derive informative representations of the ID-oriented modality, explicitly encoding collaborative signals in the user-item interaction graph. Next, we develop a multi-way cascade-attention module (CAM) to accomplish user behavior sequence alignment across different modality spaces. Each CAM consists of a cross-attention block followed by a series of self-attention blocks. The former encodes the asynchronous interplay between two modalities, while the latter captures intra-modal temporal dependencies. Finally, we design a modality-aware attentive strategy to adaptively fuse the user's dynamic interests across different modality spaces. Our extensive experiments on four public datasets demonstrate the superiority of MCN over recent state-of-the-art recommenders.
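The CAM structure described in the abstract (one cross-attention block followed by several self-attention blocks, then an attentive fusion over modality spaces) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it omits learned projections, layer normalization, feed-forward layers, and the graph propagation step. The function names, the residual connections, the causal mask on self-attention, the unmasked cross-attention, and the random `query` vector are all assumptions made for illustration.

```python
import itertools
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(queries, keys, values, causal=False):
    # Scaled dot-product attention without learned projections (simplification).
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    if causal:
        # Hide positions that come after the current step in the sequence.
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)
    return softmax(scores) @ values

def cascade_attention(target_seq, source_seq, num_self_blocks=2):
    # Hypothetical CAM: one cross-attention block (target queries source),
    # then a stack of causal self-attention blocks within the target space.
    # Residual connections are an assumption, not stated in the abstract.
    h = target_seq + attention(target_seq, source_seq, source_seq)
    for _ in range(num_self_blocks):
        h = h + attention(h, h, h, causal=True)
    return h

def attentive_fusion(user_states, query):
    # user_states: (num_ways, d), one interest vector per modality pairing.
    # query: (d,), a stand-in for a learned user-specific vector (assumption).
    weights = softmax(user_states @ query / np.sqrt(query.shape[0]))
    return weights @ user_states, weights

rng = np.random.default_rng(0)
seq_len, d = 5, 8
# Toy behavior sequences of one user in three modality spaces.
modalities = {name: rng.normal(size=(seq_len, d)) for name in ("id", "image", "text")}

# "Multi-way": one CAM per ordered pair of distinct modalities.
states = []
for target, source in itertools.permutations(modalities, 2):
    h = cascade_attention(modalities[target], modalities[source])
    states.append(h[-1])  # last position summarizes the sequence so far
states = np.stack(states)

fused, weights = attentive_fusion(states, rng.normal(size=d))
print(states.shape, fused.shape, weights.round(3))
```

With three modalities the loop produces six ways, so `states` has shape `(6, 8)` and `fused` has shape `(8,)`. The fusion weights are non-negative and sum to one, which is what lets a per-user query emphasize some modality spaces over others.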
Keywords
Cascade-attention, Graph Neural Networks, Multi-modal Recommendation, Sequential Recommendation
Discipline
Databases and Information Systems | Software Engineering
Research Areas
Data Science and Engineering
Publication
IEEE Transactions on Multimedia
First Page
1
Last Page
13
ISSN
1520-9210
Identifier
10.1109/TMM.2026.3664993
Publisher
Institute of Electrical and Electronics Engineers
Citation
WU, Bin; CHEN, Long; FU, Yuheng; MA, Yunshan; XU, Mingliang; and CHUA, Tat-Seng.
Multi-way cascade-attention network for multi-modal sequential recommendation. (2026). IEEE Transactions on Multimedia. 1-13.
Available at: https://ink.library.smu.edu.sg/sis_research/11039
Additional URL
https://doi.org/10.1109/TMM.2026.3664993