Publication Type

PhD Dissertation

Version

publishedVersion

Publication Date

6-2024

Abstract

Community discovery, as a fundamental problem in graph mining, finds applications in various domains such as biological analysis, system optimization and fraud detection. Although many efforts have been made to address community discovery based on graph topology, few works have been devoted to community discovery over attributed graphs, where graphs are equipped with attribute information such as node and edge types. Thus, this thesis is devoted to designing innovative solutions that can utilize the attribute information together with graph topology for community discovery. In particular, we study novel problems with efficient algorithms for both homogeneous and heterogeneous attributed graphs and discover communities that satisfy requirements from various downstream applications.

First, we examine the community search problem over homogeneous attributed graphs where each node is attached with node labels describing its property. Several methods based on topology-driven community models have been proposed for community search over homogeneous attributed graphs. However, these methods require memory-intensive indexes to search for communities that comply with certain community models efficiently. To resolve this issue, we propose the index-free approach based on sweep over personalized pagerank vector computed on a weighted graph generated by reweighting each edge according to the number of query-aware motifs containing the edge. Results show that our proposal achieves upto 90% relative improvement on F1-scores while consuming 10x fewer memory than topology-driven baselines.

Second, we investigate the problem of characteristic community discovery over homogeneous attributed graphs. We formulate the problem as finding the largest community, in which the query node is top-k influential, among a set of hierarchical communities generated by fusing the query attributes and graph topology. We propose the local reclustering technique to obtain attribute-related hierarchical communities efficiently. We propose the compressed influence evaluation technique to evaluate the influence. Subsequently, we design the HIMOR-index to accelerate the characteristic community queries by reusing influence values in communities that are not changed by query attributes. In experiments, our proposed method discovers larger and better characteristic communities than topology-driven baselines while achieving upto 25x speedups.

Finally, we study community discovery over heterogeneous graphs and formulate the densest subgraph discovery problem over relational graphs induced by meta-paths. Existing works on community discovery assume that the data graph is materialized and available for access with low latency. However, the relational graphs need to be meticulously extracted from the heterogeneous graphs through meta-paths. To avoid this problem, we propose the sketch-based peeling algorithm, which utilizes sketches to estimate the subgraph densities to avoid materialization of the relational graphs for the densest subgraph discovery. Subsequently, we further devise a novel system through which users can implement the peeling algorithm for the densest subgraph discovery based on different density objectives. A case study on real-world heterogeneous data from Grab demonstrates that our sketch-based methods achieve 97.55% precision for fraud detection.

Keywords

community discovery, attributed graph, local clustering, characteristic community, relational graph

Degree Awarded

PhD in Computer Science

Discipline

Artificial Intelligence and Robotics | Graphics and Human Computer Interfaces

Supervisor(s)

LI, Yuchen

First Page

Last Page

162

Publisher

Singapore Management University

City or Country

Singapore

Citation

NIU, Yudong. Community discovery over attributed graphs. (2024). 1-162.
Available at: https://ink.library.smu.edu.sg/etd_coll/616

Copyright Owner and License

Author

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Download

Included in

Artificial Intelligence and Robotics Commons, Graphics and Human Computer Interfaces Commons

COinS

Dissertations and Theses Collection (Open Access)

Community discovery over attributed graphs

Publication Type

Version

Publication Date

Abstract

Keywords

Degree Awarded

Discipline

Supervisor(s)

First Page

Last Page

Publisher

City or Country

Citation

Copyright Owner and License

Creative Commons License

Included in

Search

Links

Browse

Links

Dissertations and Theses Collection (Open Access)

Community discovery over attributed graphs

Author

Publication Type

Version

Publication Date

Abstract

Keywords

Degree Awarded

Discipline

Supervisor(s)

First Page

Last Page

Publisher

City or Country

Citation

Copyright Owner and License

Creative Commons License

Included in

Share

Search

Links

Browse

Links