Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
5-2022
Abstract
While GPT has become the de facto method for text generation tasks, its application to the pinyin input method remains unexplored. In this work, we make the first exploration of leveraging Chinese GPT for the pinyin input method. We find that a frozen GPT achieves state-of-the-art performance on perfect pinyin. However, performance drops dramatically when the input includes abbreviated pinyin. One reason is that an abbreviated pinyin can be mapped to many perfect pinyin, which in turn link to an even larger number of Chinese characters. We mitigate this issue with two strategies: enriching the context with pinyin and optimizing the training process to help distinguish homophones. To further facilitate the evaluation of the pinyin input method, we create a dataset consisting of 270K instances from fifteen domains. Results show that our approach improves performance on abbreviated pinyin across all domains. Model analysis demonstrates that both strategies contribute to the performance boost.
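The ambiguity described in the abstract, where one abbreviated pinyin expands to several perfect pinyin and then to even more characters, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the six-syllable lexicon is hand-picked, the helper names (`abbreviate`, `build_abbrev_index`, `candidate_counts`) are hypothetical, and abbreviation is simplified to "first letter of the syllable". It is not the paper's method or data.

```python
from collections import defaultdict

# Toy lexicon (an assumption, not the paper's data): each perfect pinyin
# syllable maps to a few of the characters that share that pronunciation.
PINYIN_TO_CHARS = {
    "ni": ["你", "尼", "泥"],
    "na": ["那", "拿", "哪"],
    "neng": ["能"],
    "hao": ["好", "号", "毫"],
    "hua": ["话", "花", "画"],
    "hen": ["很", "恨"],
}


def abbreviate(syllable: str) -> str:
    """Simplified abbreviation: keep only the first letter of the syllable.

    Real input methods may also treat zh/ch/sh as two-letter initials;
    that case is ignored here.
    """
    return syllable[0]


def build_abbrev_index(lexicon: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group perfect pinyin syllables by their abbreviated form."""
    index = defaultdict(list)
    for syllable in lexicon:
        index[abbreviate(syllable)].append(syllable)
    return dict(index)


def candidate_counts(abbrev: str, lexicon: dict[str, list[str]]) -> tuple[int, int]:
    """Return (number of perfect pinyin, number of characters) for an abbreviation."""
    syllables = build_abbrev_index(lexicon).get(abbrev, [])
    chars = [ch for s in syllables for ch in lexicon[s]]
    return len(syllables), len(chars)


if __name__ == "__main__":
    for abbrev in ("n", "h"):
        n_pinyin, n_chars = candidate_counts(abbrev, PINYIN_TO_CHARS)
        print(f"{abbrev!r}: {n_pinyin} perfect pinyin, {n_chars} characters")
```

Even in this tiny lexicon, the single letter "n" fans out to three syllables and seven characters. With a full syllable inventory and character set the candidate space grows much larger, which is consistent with the performance drop the abstract reports for abbreviated input.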
Keywords
Chinese characters, Input methods, ITS applications, Model analysis, Training process
Discipline
Artificial Intelligence and Robotics | Databases and Information Systems | Programming Languages and Compilers
Research Areas
Data Science and Engineering
Publication
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022 May 22-27
First Page
1899
Last Page
1909
Identifier
10.18653/v1/2022.acl-long.133
Publisher
ACL
City or Country
Dublin
Citation
TAN, Minghuan; DAI, Yong; TANG, Duyu; FENG, Zhangyin; HUANG, Guoping; JIANG, Jing; LI, Jiwei; and SHI, Shuming.
Exploring and adapting Chinese GPT to pinyin input method. (2022). Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022 May 22-27. 1899-1909.
Available at: https://ink.library.smu.edu.sg/sis_research/7474
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Additional URL
https://doi.org/10.18653/v1/2022.acl-long.133
Included in
Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons, Programming Languages and Compilers Commons