Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
12-2014
Abstract
A document network refers to a data type that can be represented as a graph of vertices, where each vertex is associated with a text document. Examples of such a data type include hyperlinked Web pages, academic publications with citations, and user profiles in social networks. Such data have very high-dimensional representations, in terms of text as well as network connectivity. In this paper, we study the problem of embedding, or finding a low-dimensional representation of a document network that "preserves" the data as much as possible. These embedded representations are useful for various applications driven by dimensionality reduction, such as visualization or feature selection. While previous works in embedding have mostly focused on either the textual aspect or the network aspect, we advocate a holistic approach by finding a unified low-rank representation for both aspects. Moreover, to lend semantic interpretability to the low-rank representation, we further propose to integrate topic modeling and embedding within a joint model. The gist is to join the various representations of a document (words, links, topics, and coordinates) within a generative model, and to estimate the hidden representations through MAP estimation. We validate our model on real-life document networks, showing that it outperforms comparable baselines comprehensively on objective evaluation metrics.
Keywords
dimensionality reduction, document network, embedding, visualization, topic modeling, generative model
Discipline
Computer Sciences | Databases and Information Systems
Publication
2014 IEEE International Conference on Data Mining ICDM: Shenzhen, China, 14-17 December: Proceedings
First Page
270
Last Page
279
ISBN
9781479943036
Identifier
10.1109/ICDM.2014.119
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
LE, Tuan M. V. and LAUW, Hady W.
Probabilistic Latent Document Network Embedding. (2014). 2014 IEEE International Conference on Data Mining ICDM: Shenzhen, China, 14-17 December: Proceedings. 270-279.
Available at: https://ink.library.smu.edu.sg/sis_research/2594
Copyright Owner and License
LARC
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/ICDM.2014.119