Research Collection School Of Computing and Information Systems

SeaExam and SeaBench: Benchmarking LLMs with local multilingual questions in Southeast Asia

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

5-2025

Abstract

This study introduces two novel benchmarks, SeaExam and SeaBench, designed to evaluate the capabilities of Large Language Models (LLMs) in Southeast Asian (SEA) application scenarios. Unlike existing multilingual datasets primarily derived from English translations, these benchmarks are constructed based on real-world scenarios from SEA regions. SeaExam draws from regional educational exams to form a comprehensive dataset that encompasses subjects such as local history and literature. In contrast, SeaBench is crafted around multi-turn, open-ended tasks that reflect daily interactions within SEA communities. Our evaluations demonstrate that SeaExam and SeaBench more effectively discern LLM performance on SEA language tasks compared to their translated benchmarks. This highlights the importance of using real-world queries to assess the multilingual capabilities of LLMs.

Discipline

Artificial Intelligence and Robotics | Databases and Information Systems

Publication

Findings of the Association for Computational Linguistics: NAACL 2025: Albuquerque, April 29 - May 4

First Page

6134

Last Page

6151

ISBN

9798891761957

Identifier

10.18653/v1/2025.findings-naacl.341

Publisher

Association for Computational Linguistics (ACL)

City or Country

Albuquerque

Citation

LIU, Chaoqun; ZHANG, Wenxuan; YING, Jiahao; Aljunied, Mahani; LUU, Anh Tuan; and BING, Lidong. SeaExam and SeaBench: Benchmarking LLMs with local multilingual questions in Southeast Asia. (2025). Findings of the Association for Computational Linguistics: NAACL 2025: Albuquerque, April 29 - May 4. 6134-6151.
Available at: https://ink.library.smu.edu.sg/sis_research/11104

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.18653/v1/2025.findings-naacl.341

Included in

Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons
