Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
7-2024
Abstract
Texts are often interconnected in a network structure, e.g., academic papers via citations. On the one hand, though Graph Neural Networks (GNNs) have shown promising ability to derive effective embeddings for networked documents, they do not assume latent topics, resulting in uninterpretable embeddings. On the other hand, topic models can infer interpretable document representations. However, most topic models focus on plain text and fail to leverage network structure across documents. In this paper, we propose a GNN-based topic model that both captures network connections and derives semantically interpretable text representations. For network modeling, we build our model with Optimal Transport Barycenter. For semantic interpretability, we extend optimal transport with pre-trained word embeddings.
Keywords
Graph Neural Networks, Text Mining, Optimal Transport, Dirichlet Distribution, Document Networks
Discipline
Artificial Intelligence and Robotics | Computer Sciences
Research Areas
Data Science and Engineering; Intelligent Systems and Optimization
Publication
Proceedings of the 40th IEEE International Conference on Data Engineering (ICDE 2024) : Utrecht, Netherlands, May 13-17
First Page
5743
Last Page
5744
Identifier
10.1109/ICDE60146.2024.00503
Publisher
IEEE
City or Country
Utrecht, Netherlands
Citation
ZHANG, Ce and LAUW, Hady Wirawan.
Topic modeling on document networks with Dirichlet Optimal Transport Barycenter (Extended Abstract). (2024). Proceedings of the 40th IEEE International Conference on Data Engineering (ICDE 2024) : Utrecht, Netherlands, May 13-17. 5743-5744.
Available at: https://ink.library.smu.edu.sg/sis_research/9840
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/ICDE60146.2024.00503
Comments
PDF provided by faculty