Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

5-2024

Abstract

Text documents are usually connected in a graph structure, resulting in an important class of data named text-attributed graph, e.g., paper citation graph and Web page hyperlink graph. On the one hand, Graph Neural Networks (GNNs) consider text in each document as general vertex attribute and do not specifically deal with text data. On the other hand, Pre-trained Language Models (PLMs) and Topic Models (TMs) learn effective document embeddings. However, most models focus on text content in each single document only, ignoring link adjacency across documents. The above two challenges motivate the development of text-attributed graph representation learning, combining GNNs with PLMs and TMs into a unified model and learning document embeddings preserving both modalities, which fulfill applications, e.g., text classification, citation recommendation, question answering, etc. In this lecture-style tutorial, we will provide a systematic review of text-attributed graph, including its formal definition, recent methods, diverse applications, and challenges. Specifically, i) we will formally define text-attributed graph and briefly review GNNs, PLMs, and TMs, which are the fundamentals of some existing methods. ii) We will then revisit the technical details of text-attributed graph models, which are generally split into two categories, PLMbased and TM-based. iii) Besides, we will show diverse applications built on text-attributed graph. iv) Finally, we will discuss some challenges of existing models and propose solutions for future research.

Keywords

Document representation learning, Graph algorithms, Neural networks, Pre-trained Language Models, Topic models, Text mining

Discipline

Artificial Intelligence and Robotics | Numerical Analysis and Scientific Computing

Research Areas

Data Science and Engineering

Publication

Proceedings of the ACM Web Conference 2024 (WWW 2024) : Singapore, May 13-17

First Page

1298

Last Page

1301

Identifier

10.1145/3589335.3641255

Publisher

Association for Computing Machinery

City or Country

Singapore

Additional URL

https://doi.org/10.1145/3589335.3641255

Share

COinS