Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

12-2022

Abstract

English is widely used across many countries and cultures, which has given rise to variants of the language, also known as English creoles. Singlish is one such English creole, used by people in Singapore. Unlike English, however, Singlish is neither taught in schools nor encouraged in formal communication. It therefore remains a low-resource language, lacking both an up-to-date Singlish word dictionary and computational tools for analysing the language. In this paper, we propose Singlish Checker, a tool that helps detect Singlish text as well as Singlish words and phrases. To develop this tool, we first construct a large set of Singlish words and phrases by identifying different sources of Singlish words and their definitions and integrating them. We then propose a Singlish classifier based on a BERT model fine-tuned on a large number of classified Singlish sentences. Our experiments show that the BERT-based classifier achieves a very high F1 score, outperforming the baseline.

Keywords

Singlish, Singlish classification, Singlish dictionary

Discipline

Asian Studies | Databases and Information Systems | South and Southeast Asian Languages and Societies

Research Areas

Data Science and Engineering

Publication

ICADL '22: Proceedings of the International Conference on Asia-Pacific Digital Libraries, Hanoi, November 30 - December 2

Volume

13636

First Page

115

Last Page

124

ISBN

9783031217555

Identifier

10.1007/978-3-031-21756-2_9

Publisher

Springer

City or Country

Cham

Additional URL

https://doi.org/10.1007/978-3-031-21756-2_9
