Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
January 2020
Abstract
Adding appropriate punctuation marks to text is an essential step in speech-to-text, where such information is usually not available. While this has been extensively studied for English, there is no large-scale dataset or comprehensive study of the punctuation prediction problem for the Vietnamese language. In this paper, we collect two massive datasets and conduct a benchmark with both traditional methods and deep neural networks. We aim to publish both our data and all implementation code to facilitate further research, not only in Vietnamese punctuation prediction but also in other related fields. Our project, including datasets and implementation details, is publicly available at https://github.com/BinhMisfit/vietnamese-punctuation-prediction.
Keywords
Attention model, BiLSTM, Conditional random field, Punctuation prediction
Discipline
Numerical Analysis and Scientific Computing | South and Southeast Asian Languages and Societies
Publication
SOFSEM 2020: Theory and Practice of Computer Science: Limassol: January 20-24: Proceedings
Volume
12011
First Page
388
Last Page
400
ISBN
9783030389185
Identifier
10.1007/978-3-030-38919-2_32
Publisher
Springer
City or Country
Cham
Citation
PHAM, Thuy; NGUYEN, Nhu; PHAM, Hong Quang; CAO, Han; and NGUYEN, Binh. Vietnamese punctuation prediction using deep neural networks. (2020). SOFSEM 2020: Theory and Practice of Computer Science: Limassol: January 20-24: Proceedings. 12011, 388-400.
Available at: https://ink.library.smu.edu.sg/sis_research/7817
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1007/978-3-030-38919-2_32
Included in
Numerical Analysis and Scientific Computing Commons, South and Southeast Asian Languages and Societies Commons