Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
5-2024
Abstract
Text documents are usually connected in a graph structure, resulting in an important class of data called text-attributed graphs, e.g., paper citation graphs and Web page hyperlink graphs. On the one hand, Graph Neural Networks (GNNs) treat the text in each document as a general vertex attribute and do not specifically deal with text data. On the other hand, Pre-trained Language Models (PLMs) and Topic Models (TMs) learn effective document embeddings, but most such models focus on the text content of each single document only, ignoring link adjacency across documents. These two limitations motivate the development of text-attributed graph representation learning, which combines GNNs with PLMs and TMs into a unified model and learns document embeddings that preserve both modalities, supporting applications such as text classification, citation recommendation, and question answering. In this lecture-style tutorial, we will provide a systematic review of text-attributed graphs, including their formal definition, recent methods, diverse applications, and challenges. Specifically, i) we will formally define text-attributed graphs and briefly review GNNs, PLMs, and TMs, which are the fundamentals of some existing methods. ii) We will then revisit the technical details of text-attributed graph models, which are generally split into two categories, PLM-based and TM-based. iii) We will also show diverse applications built on text-attributed graphs. iv) Finally, we will discuss some challenges of existing models and propose solutions for future research.
Keywords
Document representation learning, Graph algorithms, Neural networks, Pre-trained Language Models, Topic models, Text mining
Discipline
Artificial Intelligence and Robotics | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
Proceedings of the ACM Web Conference 2024 (WWW 2024) : Singapore, May 13-17
First Page
1298
Last Page
1301
Identifier
10.1145/3589335.3641255
Publisher
Association for Computing Machinery
City or Country
Singapore
Citation
ZHANG, Ce; YANG, Menglin; YING, Rex; and LAUW, Hady Wirawan.
Text-attributed graph representation learning : Methods, applications, and challenges. (2024). Proceedings of the ACM Web Conference 2024 (WWW 2024) : Singapore, May 13-17. 1298-1301.
Available at: https://ink.library.smu.edu.sg/sis_research/9841
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3589335.3641255
Included in
Artificial Intelligence and Robotics Commons, Numerical Analysis and Scientific Computing Commons