Publication Type

Conference Proceeding Article

Publication Date

7-2015

Abstract

Topic modeling has been widely used in text mining. Previous topic models such as Latent Dirichlet Allocation (LDA) are successful in learning hidden topics, but they do not take document metadata into account. To tackle this problem, many augmented topic models have been proposed to jointly model text and metadata. However, most existing models handle only categorical and numerical metadata. We identify another type of metadata that can be more natural to obtain in some scenarios: relative similarities among documents. In this paper, we propose a general model that links LDA with constraints derived from document relative similarities. Specifically, in our model, the constraints act as a regularizer of the log likelihood of LDA. We fit the proposed model using Gibbs-EM. Experiments with two real-world datasets show that our model is able to learn meaningful topics. The results also show that our model outperforms the baselines in terms of topic coherence and on a document classification task.
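The core idea of the abstract, relative-similarity constraints regularizing the LDA log-likelihood, can be illustrated with a small sketch. The triplet form of the constraints, the hinge penalty, the Euclidean distance in topic space, and all names below are assumptions for illustration, not the authors' exact formulation:

```python
# Hypothetical sketch: relative-similarity constraints as a
# regularizer on the LDA objective. A triplet (a, b, c) encodes
# "document a is more similar to b than to c"; the hinge form
# and distance choice are assumptions, not the paper's model.
import numpy as np

def hinge_penalty(theta, triplets, margin=0.1):
    """Sum of hinge violations: for each (a, b, c), document a's
    topic vector should be closer to b's than to c's by `margin`."""
    penalty = 0.0
    for a, b, c in triplets:
        d_ab = np.linalg.norm(theta[a] - theta[b])
        d_ac = np.linalg.norm(theta[a] - theta[c])
        penalty += max(0.0, margin + d_ab - d_ac)
    return penalty

def regularized_objective(log_likelihood, theta, triplets, lam=1.0):
    """LDA log-likelihood minus a weighted similarity-constraint
    penalty, in the spirit of 'constraints as a regularizer'."""
    return log_likelihood - lam * hinge_penalty(theta, triplets)
```

Here each row of `theta` is a document's topic proportions (e.g. as estimated during the M-step of Gibbs-EM); satisfied triplets contribute zero penalty, so the objective reduces to the plain LDA log-likelihood when all constraints hold.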

Discipline

Computer Sciences | Databases and Information Systems

Research Areas

Data Management and Analytics

Publication

Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015), July 25-31, Buenos Aires, Argentina

First Page

3469

Last Page

3475

ISBN

9781577357384

Publisher

IJCAI

City or Country

Menlo Park, CA

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License

Additional URL

http://ijcai.org/papers15/Papers/IJCAI15-488.pdf