Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
7-2021
Abstract
In both formal and informal texts, missing punctuation marks make the text confusing and hard to read. This paper conducts exhaustive experiments to investigate the benefits of pre-trained Transformer-based models on two Vietnamese punctuation datasets. The experimental results show that our models achieve encouraging results, and that adding Bi-LSTM and/or CRF layers on top of the proposed models can further boost performance. Finally, our best model significantly outperforms state-of-the-art approaches on both the novel and news datasets for the Vietnamese language, with gains of up to 21.45% and 18.27%, respectively, in overall F1-score.
Keywords
Punctuation prediction, Transfer learning, Transformer models
Discipline
Numerical Analysis and Computation | South and Southeast Asian Languages and Societies | Theory and Algorithms
Publication
Advances and trends in artificial intelligence: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2021, July 26-29, Kuala Lumpur, Virtual
First Page
47
Last Page
58
ISBN
9783030794620
Identifier
10.1007/978-3-030-79463-7_5
Publisher
Springer
City or Country
Cham
Citation
TRAN, Hieu; DINH, Cuong V.; PHAM, Hong Quang; and NGUYEN, Binh T.
An efficient transformer-based model for Vietnamese punctuation prediction. (2021). Advances and trends in artificial intelligence: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2021, July 26-29, Kuala Lumpur, Virtual. 47-58.
Available at: https://ink.library.smu.edu.sg/sis_research/7102
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1007/978-3-030-79463-7_5
Included in
Numerical Analysis and Computation Commons, South and Southeast Asian Languages and Societies Commons, Theory and Algorithms Commons