Publication Type
Journal Article
Version
publishedVersion
Publication Date
2-2021
Abstract
In real-world problems, heterogeneous entities are often related to each other through multiple interactions, forming a Heterogeneous Interaction Graph (HIG for short). For modeling HIGs in fundamental tasks, graph neural networks are an attractive option because they can make full use of the heterogeneity and rich semantic information by aggregating and propagating information from different types of neighborhoods. However, learning on such complex graphs, often with millions or billions of nodes, edges, and various attributes, can incur high time cost and memory consumption. In this paper, we attempt to accelerate representation learning on large-scale HIGs by adopting importance sampling of heterogeneous neighborhoods in a batch-wise manner, which naturally fits most batch-based optimizations. Unlike traditional homogeneous strategies, which neglect the semantic types of nodes and edges, we handle the rich heterogeneous semantics within HIGs by devising both type-dependent and type-fusion samplers: the former samples neighborhoods of each type separately, and the latter samples jointly from candidates of all types. Furthermore, to overcome the imbalance between the down-sampled and the original information, we propose heterogeneous estimators, including self-normalized and adaptive estimators, to improve the robustness of our sampling strategies.
Finally, we evaluate the performance of our models on node classification and link prediction using five real-world datasets. The empirical results demonstrate that our approach performs significantly better than other state-of-the-art alternatives, and is able to reduce the number of edges in computation by up to 93%, the memory cost by up to 92%, and the time cost by up to 86%.
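To make the distinction between the two samplers concrete, the following is a minimal sketch and not the authors' implementation. It assumes each neighbor carries a scalar feature and a positive importance score (both hypothetical), samples with replacement, and uses a generic self-normalized importance-sampling estimate of the uniform neighborhood mean. All function and variable names are illustrative; the paper's actual sampling distributions and its adaptive estimator are not reproduced here.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical toy layout: neighbor type -> list of (feature value, importance score).
# Scores are assumed to be strictly positive.
Candidates = List[Tuple[float, float]]
Neighbors = Dict[str, Candidates]


def _draw(cands: Candidates, k: int, rng: random.Random) -> List[Tuple[float, float]]:
    """Draw k candidates with replacement, proportionally to their scores.

    Returns (feature, sampling probability) pairs so an estimator can reweight.
    """
    total = sum(score for _, score in cands)
    probs = [score / total for _, score in cands]
    picks = rng.choices(range(len(cands)), weights=probs, k=k)
    return [(cands[i][0], probs[i]) for i in picks]


def type_dependent_sample(
    nbrs: Neighbors, k_per_type: int, rng: random.Random
) -> Dict[str, List[Tuple[float, float]]]:
    """Sample each neighbor type separately, k_per_type draws per type."""
    return {t: _draw(cands, k_per_type, rng) for t, cands in nbrs.items()}


def type_fusion_sample(
    nbrs: Neighbors, k: int, rng: random.Random
) -> List[Tuple[float, float]]:
    """Pool candidates of all types and sample k of them jointly."""
    pooled = [c for cands in nbrs.values() for c in cands]
    return _draw(pooled, k, rng)


def self_normalized_mean(samples: List[Tuple[float, float]]) -> float:
    """Self-normalized importance-sampling estimate of the uniform mean.

    Each draw is weighted by 1 / (its sampling probability) and the weights
    are renormalized to sum to one, so the constant 1/n target density cancels.
    """
    weights = [1.0 / p for _, p in samples]
    return sum(w * x for w, (x, _) in zip(weights, samples)) / sum(weights)
```

In this sketch the type-dependent sampler keeps a fixed budget per type, so rare types are never crowded out, whereas the type-fusion sampler spends a single budget over the pooled candidates and lets the importance scores decide how it is split across types. Calling `self_normalized_mean` on either sampler's output gives a reweighted aggregate of the sampled neighbors.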
Keywords
Heterogeneous interaction graphs, Large-scale graphs, Type-dependent sampler, Type-fusion sampler, Importance sampling
Discipline
Artificial Intelligence and Robotics | Databases and Information Systems
Research Areas
Data Science and Engineering
Publication
ACM Transactions on Knowledge Discovery from Data
Volume
15
Issue
1
First Page
1
Last Page
22
ISSN
1556-4681
Identifier
10.1145/3418684
Publisher
Association for Computing Machinery (ACM)
Embargo Period
3-28-2021
Citation
JI, Yugang; YIN, Mingyang; YANG, Hongxia; ZHOU, Jingren; ZHENG, Vincent W.; SHI, Chuan; and FANG, Yuan.
Accelerating large-scale heterogeneous interaction graph embedding learning via importance sampling. (2021). ACM Transactions on Knowledge Discovery from Data. 15, (1), 1-22.
Available at: https://ink.library.smu.edu.sg/sis_research/5879
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3418684