Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
9-2011
Abstract
With ever-growing popularity of social networks, web and bio-networks, mining large frequent patterns from a single huge network has become increasingly important. Yet the existing pattern mining methods cannot offer the efficiency desirable for large pattern discovery. We propose Spider- Mine, a novel algorithm to efficiently mine top-K largest frequent patterns from a single massive network with any user-specified probability of 1 − ϵ. Deviating from the existing edge-by-edge (i.e., incremental) pattern-growth framework, SpiderMine achieves its efficiency by unleashing the power of small patterns of a bounded diameter, which we call “spiders”. With the spider structure, our approach adopts a probabilistic mining framework to find the top-k largest patterns by (i) identifying an affordable set of promising growth paths toward large patterns, (ii) generating large patterns with much lower combinatorial complexity, and finally (iii) greatly reducing the cost of graph isomorphism tests with a new graph pattern representation by a multi-set of spiders. Extensive experimental studies on both synthetic and real data sets show that our algorithm outperforms existing methods.
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Software and Cyber-Physical Systems
Publication
Proceedings of VLDB Endowment: 37th VLDB 2011, August 29 - September 3, Seattle
First Page
807
Last Page
818
Identifier
10.14778/3402707.3402720
Publisher
VLDB Endowment
City or Country
Saratoga, CA
Citation
ZHU, Feida; QU, Qiang; LO, David; YAN, Xifeng; HAN, Jiawei; and YU, Philip S..
Mining Top-K Large Structural Patterns in a Massive Network. (2011). Proceedings of VLDB Endowment: 37th VLDB 2011, August 29 - September 3, Seattle. 807-818.
Available at: https://ink.library.smu.edu.sg/sis_research/1394
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.14778/3402707.3402720
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons