Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
12-2023
Abstract
Large language models (LLMs) outperform information retrieval techniques on downstream knowledge-intensive tasks when prompted to generate world knowledge. Yet, community concerns abound regarding the factuality and potential implications of using this uncensored knowledge. In light of this, we introduce CONNER, a COmpreheNsive kNowledge Evaluation fRamework, designed to systematically and automatically evaluate generated knowledge from six important perspectives: Factuality, Relevance, Coherence, Informativeness, Helpfulness, and Validity. We conduct an extensive empirical analysis of the knowledge generated by three different types of LLMs on two widely studied knowledge-intensive tasks, i.e., open-domain question answering and knowledge-grounded dialogue. Surprisingly, our study reveals that the factuality of generated knowledge, even when lower, does not significantly hinder downstream tasks. Instead, the relevance and coherence of the outputs matter more than small factual mistakes. Further, we show how to use CONNER to improve knowledge-intensive tasks by designing two strategies: Prompt Engineering and Knowledge Selection. Our evaluation code and LLM-generated knowledge with human annotations will be released to facilitate future research.
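As an illustration of the Knowledge Selection strategy mentioned in the abstract, the sketch below ranks several LLM-generated knowledge candidates by intrinsic quality (relevance and coherence) rather than factuality alone, in line with the paper's finding. All function names and scoring heuristics here are hypothetical placeholders, not CONNER's actual learned metrics.

from dataclasses import dataclass

@dataclass
class ScoredKnowledge:
    text: str
    relevance: float  # query-knowledge match, 0..1 (toy proxy)
    coherence: float  # internal consistency, 0..1 (toy proxy)

def toy_relevance(query: str, knowledge: str) -> float:
    # Crude lexical-overlap proxy; CONNER itself uses learned evaluators.
    q = set(query.lower().split())
    k = set(knowledge.lower().split())
    return len(q & k) / max(len(q), 1)

def toy_coherence(knowledge: str) -> float:
    # Placeholder: non-empty text counts as coherent in this sketch.
    return 1.0 if knowledge.strip() else 0.0

def select_knowledge(query: str, candidates: list[str]) -> str:
    # Pick the candidate with the best relevance + coherence, reflecting
    # the finding that these matter more than small factual mistakes.
    scored = [ScoredKnowledge(c, toy_relevance(query, c), toy_coherence(c))
              for c in candidates]
    return max(scored, key=lambda s: s.relevance + s.coherence).text

if __name__ == "__main__":
    query = "who wrote On the Origin of Species"
    candidates = [
        "Charles Darwin wrote On the Origin of Species in 1859.",
        "Isaac Newton formulated the law of universal gravitation.",
    ]
    print(select_knowledge(query, candidates))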
Keywords
Comprehensive evaluation, Downstream, Empirical analysis, Evaluation framework, Informativeness, Knowledge evaluation, Knowledge-intensive tasks, Language model, Retrieval techniques, World knowledge
Discipline
Databases and Information Systems | Information Security
Research Areas
Data Science and Engineering; Information Systems and Management
Areas of Excellence
Digital transformation
Publication
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, December 6-10
First Page
6325
Last Page
6341
ISBN
9798891760608
Identifier
10.18653/v1/2023.emnlp-main.390
Publisher
Association for Computational Linguistics
City or Country
Texas
Citation
CHEN, Liang; DENG, Yang; BIAN, Yatao; QIN, Zeyu; WU, Bingzhe; CHUA, Tat-Seng; and WONG, Kam-Fai.
Beyond factuality: A comprehensive evaluation of large language models as knowledge generators. (2023). Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Singapore, December 6-10. 6325-6341.
Available at: https://ink.library.smu.edu.sg/sis_research/9117
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.18653/v1/2023.emnlp-main.390