Conference Proceeding Article
Visualization of high-dimensional data such as text documents is widely applicable. The traditional approach is to find an appropriate embedding of the high-dimensional representation in a low-dimensional, visualizable space. Because topic modeling is a useful form of dimensionality reduction that preserves the semantics of documents, recent approaches aim for a visualization that is consistent with both the original word space and the semantic topic space. In this paper, we address the semantic visualization problem: given a corpus of documents, the objective is to simultaneously learn the topic distributions and the visualization coordinates of the documents. We propose a semantic visualization model that approximates L2-normalized data directly. The key is to associate each document with three representations: a coordinate in the visualization space, a multinomial distribution in the topic space, and a directional vector on a high-dimensional unit hypersphere in the word space. We join these representations in a unified generative model and describe its parameter estimation through variational inference. Comprehensive experiments on real-life text datasets show that the proposed method outperforms existing baselines on objective evaluation metrics for visualization quality and topic interpretability.
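As a rough illustration of what "L2-normalized data" and a directional vector on a unit hypersphere mean in practice, the sketch below scales toy term-count vectors to unit Euclidean length with NumPy. The function name `l2_normalize` and the toy counts are assumptions made for this example and do not come from the paper; the sketch covers only this preprocessing step, not the generative model or its variational inference.

```python
import numpy as np


def l2_normalize(counts):
    """Scale each row (one document) to unit Euclidean length.

    Hypothetical helper for illustration only; not the authors' code.
    """
    counts = np.asarray(counts, dtype=float)
    norms = np.linalg.norm(counts, axis=1, keepdims=True)
    # Leave all-zero rows (empty documents) as zeros instead of dividing by 0.
    safe_norms = np.where(norms == 0.0, 1.0, norms)
    return counts / safe_norms


# Toy term-count matrix: 3 documents over a 4-word vocabulary (made-up values).
toy_counts = np.array([
    [3, 0, 4, 0],
    [6, 0, 8, 0],  # same word proportions as the first row, twice as long
    [0, 0, 0, 0],  # empty document
])

unit_docs = l2_normalize(toy_counts)

# For unit-length rows, the dot product equals the cosine similarity.
cosine_01 = float(unit_docs[0] @ unit_docs[1])
```

In this toy input the first two documents differ only in length, so they map to the same point on the unit sphere and `cosine_01` is 1 up to floating-point error. This is the sense in which a directional representation keeps the relative word usage of a document and discards its length.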
Databases and Information Systems | Numerical Analysis and Scientific Computing
Data Management and Analytics
KDD '14: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24-27, 2014, New York
LE, Tuan M. V. and LAUW, Hady Wirawan.
Semantic Visualization for Spherical Representation. (2014). KDD '14: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 24-27, 2014, New York. 1007-1016. Research Collection School Of Information Systems.
Available at: http://ink.library.smu.edu.sg/sis_research/2250