Publication Type

Journal Article

Version

submittedVersion

Publication Date

7-2020

Abstract

Node clustering on heterogeneous information networks (HINs) plays an important role in many real-world applications. Previous research mainly clusters same-type nodes independently by exploiting structural similarity search, ignoring the correlations between nodes of different types. In this paper, we focus on the problem of co-clustering heterogeneous nodes, where the goal is to mine the latent relevance of heterogeneous nodes and simultaneously partition them into the corresponding type-aware clusters. This problem is challenging in two respects. First, the similarity or relevance of nodes is associated not only with multiple meta-path-based structures but also with numerical and categorical attributes. Second, clustering and similarity/relevance search usually promote each other. To address this problem, we first design a learnable overall relevance measure that integrates structural and attributed relevance by employing meta-paths and attribute projection. We then propose a novel approach, called SCCAIN, to co-cluster heterogeneous nodes based on constrained orthogonal non-negative matrix tri-factorization. Furthermore, an end-to-end framework is developed to jointly optimize the relevance measures and co-clustering. Extensive experiments on real-world datasets not only demonstrate that SCCAIN consistently outperforms state-of-the-art methods but also validate the effectiveness of integrating attributed and structural information for co-clustering.
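
The abstract combines two ingredients: a meta-path-based relevance matrix between two node types, and co-clustering by orthogonal non-negative matrix tri-factorization (X ≈ F S G^T). The following is a minimal sketch of those two ingredients only, assuming a toy author-paper-venue network and the generic orthogonal NMTF multiplicative updates in the style of Ding et al. (2006). It is NOT SCCAIN: it omits the paper's constraints, attribute projection, learnable relevance weights and end-to-end training. The function name `onmtf` and the toy matrices are hypothetical illustrations.

```python
import numpy as np


def onmtf(X, k, l, n_iter=200, eps=1e-9, seed=0):
    """Generic orthogonal NMTF, X ~ F @ S @ G.T (illustrative sketch, not SCCAIN).

    F (n x k): soft cluster indicators for row-type nodes.
    G (m x l): soft cluster indicators for column-type nodes.
    S (k x l): association between row clusters and column clusters.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    F = rng.random((n, k)) + eps
    S = rng.random((k, l)) + eps
    G = rng.random((m, l)) + eps
    for _ in range(n_iter):
        # Multiplicative update for G (orthogonality-aware denominator).
        XtFS = X.T @ F @ S
        G *= np.sqrt(XtFS / (G @ (G.T @ XtFS) + eps))
        # Multiplicative update for F.
        XGSt = X @ G @ S.T
        F *= np.sqrt(XGSt / (F @ (F.T @ XGSt) + eps))
        # Multiplicative update for the block-association matrix S.
        S *= np.sqrt((F.T @ X @ G) / (F.T @ F @ S @ G.T @ G + eps))
    return F, S, G


# Hypothetical toy HIN: 4 authors, 6 papers, 4 venues.
A_ap = np.array([[1, 1, 0, 0, 0, 0],
                 [0, 1, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 0],
                 [0, 0, 0, 0, 1, 1]], dtype=float)  # author-writes-paper
A_pv = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)        # paper-published-in-venue

# Meta-path A-P-V relevance: entry (i, j) counts path instances author i -> paper -> venue j.
X = A_ap @ A_pv

F, S, G = onmtf(X, k=2, l=2)
author_labels = F.argmax(axis=1)  # hard author clusters
venue_labels = G.argmax(axis=1)   # hard venue clusters
print(author_labels, venue_labels)
```

Taking the argmax of each row of F and G turns the soft indicators into type-aware hard clusters for the two node types at once. This is the sense in which tri-factorization co-clusters heterogeneous nodes rather than clustering each type independently.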

Keywords

co-clustering, heterogeneous information network, meta-paths, matrix tri-factorization, semi-supervised learning

Discipline

Databases and Information Systems | OS and Networks

Research Areas

Data Science and Engineering

Publication

Information Processing and Management

Volume

57

Issue

6

First Page

1

Last Page

12

ISSN

0306-4573

Identifier

10.1016/j.ipm.2020.102338

Publisher

Elsevier

Additional URL

https://doi.org/10.1016/j.ipm.2020.102338
