Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
12-2022
Abstract
English is widely used in many countries with different cultures, and this has given rise to variants of English, also known as English creoles. Singlish is one such English creole, used by people in Singapore. Unlike English, however, Singlish is neither taught in schools nor encouraged in formal communication. Hence, it remains a low-resource language, lacking an up-to-date Singlish word dictionary and computational tools for analysing the language. In this paper, we therefore propose Singlish Checker, a tool that helps detect Singlish text as well as Singlish words and phrases. To develop this tool, we first construct a large set of Singlish words and phrases by identifying different sources of Singlish words and their definitions and integrating them. We then propose a Singlish classifier based on a BERT model fine-tuned on a large number of classified Singlish sentences. Our experiments show that the BERT-based classifier achieves very high F1 performance, outperforming the baseline.
Keywords
Singlish, Singlish classification, Singlish dictionary
Discipline
Asian Studies | Databases and Information Systems | South and Southeast Asian Languages and Societies
Research Areas
Data Science and Engineering
Publication
ICADL '22: Proceedings of the International Conference on Asia-Pacific Digital Libraries, Hanoi, November 30 - December 2
Volume
13636
First Page
115
Last Page
124
ISBN
9783031217555
Identifier
10.1007/978-3-031-21756-2_9
Publisher
Springer
City or Country
Cham
Citation
HSIEH, Lee-Hsun; CHUA, Nam Chew; KWEE, Agus Trisnajaya; LO, Pei-Chi; LEE, Yang-Yin; and LIM, Ee-peng.
Singlish Checker: A tool for understanding and analysing an English Creole language. (2022). ICADL '22: Proceedings of the International Conference on Asia-Pacific Digital Libraries, Hanoi, November 30 - December 2. 13636, 115-124.
Available at: https://ink.library.smu.edu.sg/sis_research/7775
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1007/978-3-031-21756-2_9
Included in
Asian Studies Commons, Databases and Information Systems Commons, South and Southeast Asian Languages and Societies Commons