Publication Type

Journal Article

Version

publishedVersion

Publication Date

9-2019

Abstract

Co-clustering addresses the problem of simultaneously clustering both dimensions of a data matrix. When dealing with high-dimensional sparse data, co-clustering turns out to be more beneficial than one-sided clustering, even if one is interested in clustering along one dimension only. Aside from being high-dimensional and sparse, some datasets, such as document-term matrices, exhibit directional characteristics, and it is useful to L2-normalize such data so that it lies on the surface of a unit hypersphere. Popular co-clustering assumptions such as Gaussian or Multinomial are inadequate for this type of data. In this paper, we extend the scope of co-clustering to directional data. We present the Diagonal Block Mixture of von Mises–Fisher distributions (dbmovMFs), a co-clustering model that is well suited to directional data lying on a unit hypersphere. By framing the estimation of the model parameters under the maximum likelihood (ML) and classification ML approaches, we develop a class of EM algorithms for estimating dbmovMFs from data. Extensive experiments on several real-world datasets confirm the advantage of our approach and demonstrate the effectiveness of our algorithms.
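
The L2 normalization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `l2_normalize_rows` and the small count matrix are hypothetical, chosen only to show how each document vector is rescaled to unit length so that it lies on the unit hypersphere.

```python
import math

def l2_normalize_rows(matrix):
    """Scale each row to unit Euclidean length; all-zero rows are left unchanged."""
    normalized = []
    for row in matrix:
        norm = math.sqrt(sum(x * x for x in row))
        # Guard against division by zero for empty documents.
        normalized.append([x / norm for x in row] if norm > 0 else list(row))
    return normalized

# Hypothetical 3-document x 4-term count matrix (illustrative values only).
doc_term = [
    [2, 0, 1, 0],
    [0, 3, 0, 4],
    [0, 0, 0, 0],
]
unit_rows = l2_normalize_rows(doc_term)
```

After this step, every non-empty row has squared entries summing to 1, so only the direction of each document vector is retained, not its length.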

Keywords

Co-clustering, directional data, document clustering, EM algorithm, von Mises-Fisher distribution

Discipline

Databases and Information Systems

Publication

Advances in Data Analysis and Classification

Volume

13

Issue

3

First Page

591

Last Page

620

ISSN

1862-5347

Identifier

10.1007/s11634-018-0323-4

Publisher

Springer

Additional URL

https://doi.org/10.1007/s11634-018-0323-4
