Multi-way cascade-attention network for multi-modal sequential recommendation

Publication Type

Journal Article

Publication Date

1-2026

Abstract

Sequential recommendation, which aims to predict the desired items for each user based on their historical actions, has become a hot topic. Mainstream approaches to this task model user behaviors in a purely item ID-based manner, which often fails to provide satisfactory results due to data sparsity and cold-start issues. Recently, several studies that leverage multi-modal information (i.e., multi-modal sequential recommendation) have shed light on alleviating such issues. However, we argue that three limitations are still not well addressed: 1) they usually extract the ID modality features of an item with a one-hot encoding, which does not include any semantic information; 2) they fail to effectively mitigate the semantic gap and to explicitly explore the asynchronous interplay between any two modalities; and 3) during the model prediction stage, they neglect the significance of adaptively fusing multi-modal embeddings for each user. To address these limitations, we propose a novel framework for multi-modal sequential recommendation, namely, a multi-way cascade-attention network (MCN). Specifically, we apply a lightweight graph propagation network to derive informative representations of the ID-oriented modality, explicitly encoding collaborative signals in the user-item interaction graph. Next, we develop a multi-way cascade-attention module (CAM) to align user behavior sequences across different modality spaces. Each CAM consists of a cross-attention block followed by a series of self-attention blocks. The former encodes the asynchronous interplay between two modalities, while the latter captures intra-modal temporal dependencies. Finally, we design a modality-aware attentive strategy to adaptively fuse the user's dynamic interests across different modality spaces. Our extensive experiments on four public datasets demonstrate the superiority of MCN over recent state-of-the-art recommenders.
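
The abstract describes a CAM as one cross-attention block followed by several self-attention blocks, with a modality-aware attentive fusion at the end. The following is a minimal NumPy sketch of that general idea only, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the absence of learned projections, the residual connections, the causal mask in the self-attention blocks, the unmasked cross-attention, the modality pairing, and the scoring vector `w` used for fusion.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, causal=False):
    """Scaled dot-product attention over (seq_len, dim) arrays."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if causal:
        # Assumption: step t may only attend to steps <= t.
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -1e9, scores)
    return softmax(scores) @ v

def cascade_attention(target, source, n_self_blocks=2):
    """Hypothetical CAM: one cross-attention block, then self-attention blocks."""
    # Cross-attention: the target modality queries the source modality.
    h = target + attention(target, source, source)
    # Self-attention: intra-modal temporal dependencies.
    for _ in range(n_self_blocks):
        h = h + attention(h, h, h, causal=True)
    return h

def modality_aware_fusion(user_states, w):
    """Attentive fusion of per-modality user states.

    user_states: (n_modalities, dim); w: (dim,) hypothetical scoring vector.
    """
    weights = softmax(user_states @ w)
    return weights @ user_states, weights

# Toy behavior sequences for three modalities (random stand-ins for embeddings).
rng = np.random.default_rng(0)
seq_len, dim = 5, 8
id_seq, text_seq, image_seq = (rng.normal(size=(seq_len, dim)) for _ in range(3))

# One CAM per (target, source) pair; this pairing is illustrative.
ways = [(id_seq, text_seq), (text_seq, image_seq), (image_seq, id_seq)]
aligned = [cascade_attention(t, s) for t, s in ways]

# Use the last step of each aligned sequence as that modality's user state.
user_states = np.stack([h[-1] for h in aligned])
fused, weights = modality_aware_fusion(user_states, rng.normal(size=dim))
print(fused.shape, weights.round(3))
```

In this sketch the fusion weights are computed per user from that user's own modality states, which is one simple way to make the fusion adaptive; a trained model would learn the projections and the scoring parameters rather than sample them.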

Keywords

Cascade-attention, Graph Neural Networks, Multi-modal Recommendation, Sequential Recommendation

Discipline

Databases and Information Systems | Software Engineering

Research Areas

Data Science and Engineering

Publication

IEEE Transactions on Multimedia

First Page

1

Last Page

13

ISSN

1520-9210

Identifier

10.1109/TMM.2026.3664993

Publisher

Institute of Electrical and Electronics Engineers

Additional URL

https://doi.org/10.1109/TMM.2026.3664993

This document is currently not available here.
