Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

11-2021

Abstract

As a well-established probabilistic method, topic models seek to uncover latent semantics from plain text. We observe that, in addition to having textual content, documents are usually compared in listwise rankings based on their content. For instance, countries worldwide are compared in an international ranking of electricity production based on their national reports. Such document comparisons constitute additional information that reveals documents' relative similarities. Incorporating them into topic modeling could yield comparative topics that help to differentiate and rank documents. Furthermore, based on different comparison criteria, the observed document comparisons usually cover multiple aspects, each expressing a distinct ranked list. For example, a country may be ranked higher in terms of electricity production, but fall behind others in terms of life expectancy or government budget. Each comparison criterion, or aspect, observes a distinct ranking. Considering such multiple aspects of comparisons based on different ranking criteria allows us to derive one set of topics that inform heterogeneous document similarities. We propose a generative topic model aimed at learning topics that are well aligned to multi-aspect listwise comparisons. Comprehensive experiments on public datasets demonstrate the advantage of the proposed method over baselines in jointly modeling topics and ranked lists.

Keywords

Generative Topic Model, Text Mining, Comparative Documents

Discipline

Databases and Information Systems | Data Science

Research Areas

Data Science and Engineering

Publication

CIKM '21: Proceedings of the ACM International Conference on Information and Knowledge Management, November 1-5, Virtual

First Page

2507

Last Page

2516

ISBN

9781450384469

Identifier

10.1145/3459637.3482398

Publisher

ACM

City or Country

New York

Embargo Period

12-13-2021

Copyright Owner and License

Authors

Additional URL

https://doi.org/10.1145/3459637.3482398
