Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
7-2023
Abstract
Automated coherence metrics constitute an important and popular way to evaluate topic models. Previous works present a mixed picture of their presumed correlation with human judgement. In this paper, we conduct a large-scale correlation analysis of coherence metrics. We propose a novel sampling approach to mine topics for the purpose of metric evaluation, and conduct the analysis across three large corpora, showing that certain automated coherence metrics are correlated. Moreover, we extend the analysis to measure topical differences between corpora. Lastly, we examine the reliability of human judgement by conducting an extensive user study, designed as an amalgamation of different proxy tasks to derive finer insight into the human decision-making process. Our findings reveal some correlation between automated coherence metrics and human judgement, especially for generic corpora.
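As a rough illustration of the kind of analysis the abstract describes, the sketch below computes an automated coherence score (NPMI, via gensim) for a few topics and correlates it with human ratings using Spearman's rho. This is not the authors' code: the corpus, topics, and human scores are toy placeholders standing in for the paper's large corpora and user-study judgements.

# Minimal sketch (assumptions noted above): correlate automated NPMI
# coherence with hypothetical human ratings.
from gensim.corpora import Dictionary
from gensim.models.coherencemodel import CoherenceModel
from scipy.stats import spearmanr

# Toy reference corpus: a list of tokenized documents.
texts = [
    ["apple", "banana", "fruit", "juice"],
    ["car", "engine", "road", "fuel"],
    ["fruit", "banana", "smoothie", "apple"],
    ["road", "traffic", "car", "driver"],
]
dictionary = Dictionary(texts)

# Topics to evaluate: each is a list of top words.
topics = [
    ["apple", "banana", "fruit"],    # coherent
    ["car", "road", "engine"],       # coherent
    ["apple", "engine", "traffic"],  # mixed
]

# Automated coherence (C_NPMI) per topic, estimated from the corpus.
cm = CoherenceModel(topics=topics, texts=texts, dictionary=dictionary,
                    coherence="c_npmi")
auto_scores = cm.get_coherence_per_topic()

# Hypothetical human ratings for the same topics; in the paper these
# would come from the large-scale user study.
human_scores = [0.9, 0.85, 0.3]

# Rank correlation between the automated metric and human judgement.
rho, p = spearmanr(auto_scores, human_scores)
print(f"Spearman rho = {rho:.3f} (p = {p:.3f})")

At the paper's scale, the topic set would come from the proposed sampling approach over the reference corpora rather than a hand-picked list, and the correlation would be computed over many metric variants.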
Keywords
Automated metric, Coherence metric, Correlation analysis, Human decision-making, Human judgments, Large corpora, Large-scale correlations, Metric evaluation, Topic Modeling, User study
Discipline
Databases and Information Systems
Research Areas
Data Science and Engineering
Publication
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, Canada, 2023 Jul 9-14
First Page
13874
Last Page
13898
ISBN
9781959429722
Identifier
10.18653/v1/2023.acl-long.776
Publisher
Association for Computational Linguistics
City or Country
Pennsylvania
Citation
LIM, Jia Peng and LAUW, Hady Wirawan.
Large-scale correlation analysis of automated metrics for topic models. (2023). Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Toronto, Canada, 2023 Jul 9-14. 13874-13898.
Available at: https://ink.library.smu.edu.sg/sis_research/8346
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.18653/v1/2023.acl-long.776