Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

6-2021

Abstract

Fraud detection is a pressing challenge for most financial and commercial platforms. In this paper, we study the processing pipeline of fraud detection in a large e-commerce platform of TaoBao. Graph label propagation (LP) is a core component in this pipeline to detect suspicious clusters from the user-interaction graph. Furthermore, the run-time of the LP component occupies 75% overhead of TaoBao’s automated detection pipeline. To enable real-time fraud detection, we propose a GPU-based framework, called GLP, to support large-scale LP workloads in enterprises. We have identified two key challenges when integrating GPU acceleration into TaoBao’s data processing pipeline: (1) programmability for evolving fraud detection logics; (2) demand for real-time performance. Motivated by these challenges, we offer a set of expressive APIs that data engineers can customize and deploy efficient LP algorithms on GPUs with ease. We propose novel GPU-centric optimizations by leveraging the community as well as power-law properties of large graphs. Extensive experiments have confirmed the effectiveness of our proposed optimizations. With a single GPU, GLP supports a real billion-scale graph workload from the fraud detection pipeline of TaoBao and achieves 8.2x speedup to the current in-house distributed solution running on high-end multicore machines.

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

Proceedings of the 2021 International Conference on Management of Data (SIGMOD ’21), 2021 June 18-27

First Page

2348

Last Page

2356

ISBN

9781450383431

Identifier

10.1145/3448016.3452774

Publisher

ACM

City or Country

Virtual Conference

Share

COinS