This paper describes the creation of a grapheme-to-phoneme (G2P) converter for Standard Malay (SM). A fundamental step in building text-to-speech (TTS) and automatic speech recognition (ASR) engines is to build a good G2P system that can automatically generate accurate phonemic representations for words. Our goal is to generate phonemes that reflect real speech, thereby facilitating more accurate alignment of phonemes with actual waveforms (obtained from voice-data collection) while keeping human intervention to a minimum. We discuss the key areas in SM that require considerable phonemic alteration, including letter elisions, consonant insertions, and multiple ways of uttering a letter or digraph; these are areas that any good G2P system for SM should address. We also discuss the application of these rules to two corpora and examine the generated phonemes, both to measure accuracy and to refine the rules further.
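As a rough illustration of the kinds of rules the abstract mentions (digraph handling, consonant insertion, position-dependent realisation of a letter), a rule-based converter might be sketched as below. This is a minimal sketch, not the authors' rule set: the function name `g2p`, the tables `DIGRAPHS` and `GLIDES`, and the specific rules chosen are assumptions made for illustration, drawn from commonly described Malay pronunciation patterns rather than from the paper.

```python
# Hypothetical, simplified rule tables; they are NOT taken from the paper.
DIGRAPHS = {"ng": "ŋ", "ny": "ɲ", "sy": "ʃ", "kh": "x"}  # two letters, one phoneme
VOWELS = set("aeiou")
GLIDES = {"i": "j", "u": "w"}  # glide inserted after i/u when another vowel follows


def g2p(word):
    """Return a list of phoneme symbols for one word (illustrative rules only)."""
    word = word.lower()
    phonemes = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            # Longest match first: consume both letters of a digraph.
            phonemes.append(DIGRAPHS[pair])
            i += 2
            continue
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "k" and nxt == "":
            phonemes.append("ʔ")  # assumed rule: word-final k as a glottal stop
        elif ch == "e":
            phonemes.append("ə")  # assumed default; 'e' is ambiguous in spelling
        else:
            phonemes.append(ch)   # placeholder: other letters pass through unchanged
        if ch in GLIDES and nxt in VOWELS:
            phonemes.append(GLIDES[ch])  # consonant (glide) insertion between vowels
        i += 1
    return phonemes


if __name__ == "__main__":
    for w in ["nyanyi", "dia", "anak", "buah"]:
        print(w, g2p(w))
```

A real converter would need a much larger rule inventory and an exception lexicon, for example to choose between the two pronunciations of "e", which a fixed default like the one above cannot do.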
Speech recognition, Speech synthesis, Grapheme-to-Phoneme, Malay language
Physical Sciences and Mathematics
COCOSDA Jakarta Conference, December 2005
LI, Haizhou; Aljunied, Mahani; and Teoh, Boon Seong.
A Grapheme to Phoneme Converter for Standard Malay. (2005). COCOSDA Jakarta Conference, December 2005. Research Collection Lee Kong Chian School Of Business.
Available at: http://ink.library.smu.edu.sg/lkcsb_research/2781