Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

7-2021

Abstract

In both formal and informal texts, missing punctuation marks make the text confusing and challenging to read. This paper conducts exhaustive experiments to investigate the benefits of pre-trained Transformer-based models on two Vietnamese punctuation datasets. The experimental results show that our models achieve encouraging results, and that adding Bi-LSTM and/or CRF layers on top of the proposed models can further boost performance. Finally, our best model significantly surpasses state-of-the-art approaches on both the novel and news datasets for the Vietnamese language, with corresponding gains of up to 21.45% and 18.27% in the overall F1-scores.

Keywords

Punctuation prediction, Transfer learning, Transformer models

Discipline

Numerical Analysis and Computation | South and Southeast Asian Languages and Societies | Theory and Algorithms

Publication

Advances and Trends in Artificial Intelligence: International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE 2021, July 26-29, Kuala Lumpur (virtual)

First Page

47

Last Page

58

ISBN

978-3-030-79462-0

Identifier

10.1007/978-3-030-79463-7_5

Publisher

Springer

City or Country

Cham

Additional URL

https://doi.org/10.1007/978-3-030-79463-7_5
