Publication Type

Journal Article

Version

acceptedVersion

Publication Date

8-2024

Abstract

Text documents are often interconnected in a network structure, e.g., academic papers via citations and Web pages via hyperlinks. On the one hand, although Graph Neural Networks (GNNs) have shown promising ability to derive effective embeddings for such networked documents, they do not assume a latent topic structure and therefore produce uninterpretable embeddings. On the other hand, topic models can infer semantically interpretable topic distributions for documents by associating each topic with a group of understandable keywords. However, most topic models focus on the plain text within documents and fail to leverage the network structure across documents. Network connectivity reveals topic similarity between linked documents, and modeling it could uncover meaningful semantics. Motivated by the above two challenges, in this paper we propose a GNN-based neural topic model that both captures network connectivity and derives semantically interpretable topic distributions for networked documents. For network modeling, we build the model on the theory of Optimal Transport Barycenter, which captures network structure by allowing the topic distribution of a document to generate the content of its linked neighbors. For semantic interpretability, we extend optimal transport by incorporating semantically related words in the embedding space. Since the Dirichlet prior in Latent Dirichlet Allocation successfully improves topic quality, we also analyze the Dirichlet distribution as an optimal transport prior to improve topic interpretability. We design a rejection sampling procedure to simulate the Dirichlet distribution. Extensive experiments on document classification, clustering, link prediction, and topic analysis verify the effectiveness of our model.

Keywords

Graph Neural Networks, Text Mining, Optimal Transport, Dirichlet Distribution, Document Networks

Discipline

Artificial Intelligence and Robotics | Computer Sciences

Research Areas

Data Science and Engineering; Intelligent Systems and Optimization

Publication

IEEE Transactions on Knowledge and Data Engineering

Volume

36

Issue

3

First Page

1328

Last Page

1340

ISSN

1041-4347

Identifier

10.1109/TKDE.2023.3303465

Publisher

Institute of Electrical and Electronics Engineers

Citation

ZHANG, Ce and LAUW, Hady Wirawan. Topic modeling on document networks with Dirichlet Optimal Transport Barycenter. (2024). IEEE Transactions on Knowledge and Data Engineering. 36, (3), 1328-1340.
Available at: https://ink.library.smu.edu.sg/sis_research/9839

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Comments

PDF provided by faculty

Additional URL

https://doi.org/10.1109/TKDE.2023.3303465

Included in

Artificial Intelligence and Robotics Commons

Research Collection School Of Computing and Information Systems

Topic modeling on document networks with Dirichlet Optimal Transport Barycenter
