Publication Type
Journal Article
Version
publishedVersion
Publication Date
3-2024
Abstract
Subgraph counting is a fundamental component for many downstream applications such as graph representation learning and query optimization. Since obtaining the exact count is often intractable, there have been a plethora of approximation methods on graph sampling techniques. Nonetheless, the state-of-the-art sampling methods still require massive samples to produce accurate approximations on large data graphs. We propose gSWORD, a GPU framework that leverages the massive parallelism of GPUs to accelerate iterative sampling algorithms for subgraph counting. Despite the embarrassingly parallel nature of the samples, there are unique challenges in accelerating subgraph counting due to its irregular computation logic. To address these challenges, we introduce two GPU-centric optimizations: (1) sample inheritance, enabling threads to inherit samples from neighboring threads to avoid idling, and (2) warp streaming, effectively distributing workloads among threads through a streaming process. Moreover, we propose a CPU-GPU co-processing pipeline that overlaps the sampling and enumeration processes to mitigate the underestimation issue. Experimental results demonstrate that deploying state-of-the-art sampling algorithms on gSWORD can perform millions of samples per second. The co-processing pipeline substantially improves the estimation accuracy in the cases where existing methods encounter severe underestimations with negligible overhead.
Keywords
Graph sampling, Subgraph counting, GPU computing
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
Proceedings of the ACM on Management of Data
Volume
2
Issue
12
First Page
15
Last Page
26
ISSN
2836-6573
Identifier
10.1145/3639288
Publisher
ACM
Citation
YE, Chang; LI, Yuchen; SUN, Shixuan; and GUO, Wentian.
gSWORD: GPU-accelerated sampling for subgraph counting. (2024). Proceedings of the ACM on Management of Data. 2, (12), 15-26.
Available at: https://ink.library.smu.edu.sg/sis_research/9320
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Additional URL
https://doi.org/10.1145/3639288
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons
Comments
SIGMOD International Conference on Management of Data 2024