Publication Type
Journal Article
Version
publishedVersion
Publication Date
9-2024
Abstract
Automated coherence metrics constitute an efficient and popular way to evaluate topic models. Previous work presents a mixed picture of their presumed correlation with human judgment. This work proposes a novel sampling approach to mining topic representations at a large scale while seeking to mitigate bias from sampling, enabling the investigation of widely used automated coherence metrics via large corpora. Additionally, this article proposes a novel user study design, an amalgamation of different proxy tasks, to derive a finer insight into the human decision-making processes. This design subsumes the purpose of simple rating and outlier-detection user studies. Similar to the sampling approach, the user study conducted is extensive, comprising 40 study participants split into eight different study groups, each tasked with evaluating its respective set of 100 topic representations. Usually, when substantiating the use of these metrics, human responses are treated as the gold standard. This article further investigates the reliability of human judgment by flipping the comparison and conducting a novel extended analysis of human responses at the group and individual levels against a generic corpus. The investigation results show a moderate to good correlation between these metrics and human judgment, especially for generic corpora, and yield further insights into the human perception of coherence. Analyzing inter-metric correlations across corpora shows moderate to good correlation among these metrics. As these metrics depend on corpus statistics, this article further investigates the topical differences between corpora, revealing nuances in the application of these metrics.
Keywords
Vocabulary, decision-making processes, topic models
Discipline
Computational Engineering | Databases and Information Systems | Linguistics
Research Areas
Data Science and Engineering
Areas of Excellence
Digital transformation
Publication
Computational Linguistics
Volume
50
Issue
3
First Page
893
Last Page
952
ISSN
0891-2017
Identifier
10.1162/coli_a_00518
Publisher
Massachusetts Institute of Technology Press
Citation
LIM, Jia Peng and LAUW, Hady Wirawan.
Aligning human and computational coherence evaluations. (2024). Computational Linguistics, 50(3), 893-952.
Available at: https://ink.library.smu.edu.sg/sis_research/9427
Copyright Owner and License
Publisher-CC-NC-ND
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1162/coli_a_00518