Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

6-2024

Abstract

Community discovery, as a fundamental problem in graph mining, finds applications in various domains such as biological analysis, system optimization and fraud detection. Although many efforts have been made to address community discovery based on graph topology, few works have been devoted to community discovery over attributed graphs, where graphs are equipped with attribute information such as node and edge types. Thus, this thesis is devoted to designing innovative solutions that can utilize the attribute information together with graph topology for community discovery. In particular, we study novel problems with efficient algorithms for both homogeneous and heterogeneous attributed graphs and discover communities that satisfy requirements from various downstream applications.

First, we examine the community search problem over homogeneous attributed graphs where each node is attached with node labels describing its property. Several methods based on topology-driven community models have been proposed for community search over homogeneous attributed graphs. However, these methods require memory-intensive indexes to search for communities that comply with certain community models efficiently. To resolve this issue, we propose the index-free approach based on sweep over personalized pagerank vector computed on a weighted graph generated by reweighting each edge according to the number of query-aware motifs containing the edge. Results show that our proposal achieves upto 90% relative improvement on F1-scores while consuming 10x fewer memory than topology-driven baselines.

Second, we investigate the problem of characteristic community discovery over homogeneous attributed graphs. We formulate the problem as finding the largest community, in which the query node is top-k influential, among a set of hierarchical communities generated by fusing the query attributes and graph topology. We propose the local reclustering technique to obtain attribute-related hierarchical communities efficiently. We propose the compressed influence evaluation technique to evaluate the influence. Subsequently, we design the HIMOR-index to accelerate the characteristic community queries by reusing influence values in communities that are not changed by query attributes. In experiments, our proposed method discovers larger and better characteristic communities than topology-driven baselines while achieving upto 25x speedups.

Finally, we study community discovery over heterogeneous graphs and formulate the densest subgraph discovery problem over relational graphs induced by meta-paths. Existing works on community discovery assume that the data graph is materialized and available for access with low latency. However, the relational graphs need to be meticulously extracted from the heterogeneous graphs through meta-paths. To avoid this problem, we propose the sketch-based peeling algorithm, which utilizes sketches to estimate the subgraph densities to avoid materialization of the relational graphs for the densest subgraph discovery. Subsequently, we further devise a novel system through which users can implement the peeling algorithm for the densest subgraph discovery based on different density objectives. A case study on real-world heterogeneous data from Grab demonstrates that our sketch-based methods achieve 97.55% precision for fraud detection.

Keywords

community discovery, attributed graph, local clustering, characteristic community, relational graph

Degree Awarded

PhD in Computer Science

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Supervisor(s)

LI, Yuchen

First Page

1

Last Page

162

Publisher

Singapore Management University

City or Country

Singapore

Copyright Owner and License

Author

Available for download on Wednesday, September 03, 2025

Share

COinS