Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
8-2021
Abstract
Graph neural networks (GNNs) have emerged as the state-of-the-art representation learning methods on graphs and often rely on a large amount of labeled data to achieve satisfactory performance. Recently, to alleviate label scarcity, some works have proposed pre-training GNNs in a self-supervised manner by distilling transferable knowledge from unlabeled graph structures. Unfortunately, these pre-training frameworks mainly target homogeneous graphs, whereas real interaction systems usually constitute large-scale heterogeneous graphs containing different types of nodes and edges, which pose new challenges of structural heterogeneity and scalability for graph pre-training. In this paper, we first study the problem of pre-training on large-scale heterogeneous graphs and propose a novel pre-training GNN framework, named PT-HGNN. PT-HGNN designs both node- and schema-level pre-training tasks that contrastively preserve heterogeneous semantic and structural properties as a form of transferable knowledge for various downstream tasks. In addition, a relation-based personalized PageRank is proposed to sparsify large-scale heterogeneous graphs for efficient pre-training. Extensive experiments on one of the largest public heterogeneous graphs (OAG) demonstrate that PT-HGNN significantly outperforms various state-of-the-art baselines.
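The abstract only names a relation-based personalized PageRank (PPR) as the sparsification step. The following is a minimal sketch of one plausible reading, not PT-HGNN's actual procedure. It assumes that PPR is computed separately on each relation's subgraph and that every node keeps only its top-k existing neighbors under that relation, ranked by PPR score. The function names `personalized_pagerank` and `sparsify`, the parameters `alpha` and `k`, and the per-relation interpretation are all hypothetical.

```python
def personalized_pagerank(adj, source, alpha=0.15, iters=50):
    """Power-iteration PPR scores personalized to `source`.

    adj: dict mapping node -> list of out-neighbors for ONE relation type.
    alpha: restart probability of jumping back to `source`.
    """
    nodes = set(adj)
    for nbrs in adj.values():
        nodes.update(nbrs)
    nodes.add(source)
    scores = {v: 0.0 for v in nodes}
    scores[source] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in nodes}
        nxt[source] += alpha  # restart mass (total mass stays 1)
        for u, s in scores.items():
            nbrs = adj.get(u, [])
            if nbrs:
                # spread the non-restart mass uniformly over out-neighbors
                share = (1 - alpha) * s / len(nbrs)
                for v in nbrs:
                    nxt[v] += share
            else:
                # dangling node: return its walk mass to the source
                nxt[source] += (1 - alpha) * s
        scores = nxt
    return scores


def sparsify(relations, k=2, alpha=0.15):
    """Hypothetical relation-wise sparsification.

    relations: dict mapping relation name -> adjacency dict.
    For each relation and node, keep only the k existing neighbors with
    the highest PPR score (ties broken by node name for determinism).
    """
    sparse = {}
    for rel, adj in relations.items():
        kept = {}
        for u, nbrs in adj.items():
            ppr = personalized_pagerank(adj, u, alpha)
            ranked = sorted(set(nbrs), key=lambda v: (-ppr[v], str(v)))
            kept[u] = ranked[:k]
        sparse[rel] = kept
    return sparse


if __name__ == "__main__":
    cites = {"p1": ["p2", "p3", "p4"], "p2": ["p3"], "p3": ["p1"], "p4": []}
    print(sparsify({"cites": cites}, k=2))
```

In this toy citation relation, `p3` is reachable from `p1` both directly and through `p2`, so it outranks `p2` and `p4` and survives sparsification. Running full power iteration once per node, as done here, is only practical for small graphs. At the scale of OAG, an approximate local PPR method would be needed instead.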
Keywords
Heterogeneous graph, Self-supervised learning, Pre-training
Discipline
Databases and Information Systems
Research Areas
Data Science and Engineering
Publication
Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), Virtual Online, August 14-18, 2021
First Page
756
Last Page
766
Identifier
10.1145/3447548.3467396
Publisher
ACM
City or Country
New York
Citation
JIANG, Xunqiang; JIA, Tianrui; FANG, Yuan; SHI, Chuan; LIN, Zhe; and WANG, Hui.
Pre-training on large-scale heterogeneous graph. (2021). Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), Virtual Online, August 14-18. 756-766.
Available at: https://ink.library.smu.edu.sg/sis_research/6888
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.