Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

8-2022

Abstract

While the Variational Graph Auto-Encoder (VGAE) has shown a promising ability to learn document representations, most existing VGAE methods do not model a latent topic structure and therefore lack semantic interpretability. Exploring the hidden topics within documents and discovering the key words associated with each topic allow us to develop a semantic interpretation of the corpus. Moreover, documents are usually associated with authors. For example, news reports are written by journalists who specialize in certain types of events, and academic papers are written by authors with expertise in certain research topics. Modeling authorship information could benefit topic modeling, since documents by the same authors tend to reveal similar semantics. This observation also holds for documents published in the same venues. However, most topic models ignore auxiliary authorship and publication-venue information. Given these two challenges, we propose a Variational Graph Author Topic Model for documents that integrates both semantic interpretability and authorship and venue modeling into a unified VGAE framework. For authorship and venue modeling, we construct a hierarchical multi-layered document graph with both intra- and cross-layer topic propagation. For semantic interpretability, three word relations (contextual, syntactic, and semantic) are modeled and constitute three word sub-layers in the document graph. We further propose three alternatives for the variational divergence. Experiments verify the effectiveness of our model on both supervised and unsupervised tasks.

Keywords

author topic modeling, graph neural networks, text mining, variational graph auto-encoder

Discipline

Databases and Information Systems | Theory and Algorithms

Research Areas

Software and Cyber-Physical Systems

Publication

KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, August 14-18

First Page

2429

Last Page

2438

ISBN

9781450393850

Identifier

10.1145/3534678.3539310

Publisher

ACM

City or Country

New York

Additional URL

https://doi.org/10.1145/3534678.3539310
