Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

8-2014

Abstract

Visualization of high-dimensional data such as text documents is widely applicable. The traditional approach is to find an appropriate embedding of the high-dimensional representation in a low-dimensional visualizable space. As topic modeling is a useful form of dimensionality reduction that preserves the semantics in documents, recent approaches aim for a visualization that is consistent with both the original word space and the semantic topic space. In this paper, we address the semantic visualization problem. Given a corpus of documents, the objective is to simultaneously learn the topic distributions and the visualization coordinates of documents. We propose a semantic visualization model that approximates L2-normalized data directly. The key is to associate each document with three representations: a coordinate in the visualization space, a multinomial distribution in the topic space, and a directional vector on a high-dimensional unit hypersphere in the word space. We join these representations in a unified generative model, and describe its parameter estimation through variational inference. Comprehensive experiments on real-life text datasets show that the proposed method outperforms the existing baselines on objective evaluation metrics for visualization quality and topic interpretability.
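
As a rough illustration of what "L2-normalized data" and "a directional vector on a unit hypersphere" mean, the sketch below scales term-count vectors to unit Euclidean length, so that only their direction remains. It is not the paper's model. The function names and the three toy documents are assumptions made up for this example.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit Euclidean length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity, computed as the dot product of the two unit vectors."""
    ua, ub = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


# Hypothetical term counts over a three-word vocabulary (illustrative only).
doc_a = [3, 0, 4]
doc_b = [6, 0, 8]   # same word proportions as doc_a, twice the length
doc_c = [0, 5, 0]   # shares no words with doc_a

unit_a = l2_normalize(doc_a)
unit_b = l2_normalize(doc_b)

# After normalization, document length no longer matters: doc_a and doc_b
# map to (numerically) the same point on the unit sphere.
same_direction = all(abs(x - y) < 1e-12 for x, y in zip(unit_a, unit_b))
print(same_direction, cosine_similarity(doc_a, doc_c))
```

On normalized vectors, cosine similarity reduces to a plain dot product, which is one reason directional representations are a natural fit for text.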

Keywords

dimensionality reduction, semantic visualization, spherical semantic embedding, spherical space, generative model, L2-normalized vector, topic model

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Publication

KDD '14: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, August 24-27

First Page

1007

Last Page

1016

ISBN

9781450329569

Identifier

10.1145/2623330.2623620

Publisher

ACM

City or Country

New York

Additional URL

https://doi.org/10.1145/2623330.2623620
