Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
5-2022
Abstract
Translate-train is a general training approach to multilingual tasks. The key idea is to use a translator for the target language to generate training data and thereby mitigate the gap between the source and target languages. However, its performance is often hampered by artifacts in the translated texts (translationese). We discover that such artifacts have common patterns across different languages and can be modeled by deep learning, and we subsequently propose an approach to conduct translate-train using Translationese Embracing the effect of Artifacts (TEA). TEA learns to mitigate such effects on the training data of a source language (whose original and translationese versions are both available), and applies the learned module to facilitate inference on the target language. Extensive experiments on the multilingual QA dataset TyDiQA demonstrate that TEA outperforms strong baselines.
Discipline
Databases and Information Systems | Programming Languages and Compilers
Research Areas
Data Science and Engineering
Publication
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022 May 22-27
First Page
362
Last Page
370
Identifier
10.18653/v1/2022.acl-short.40
Publisher
Association for Computational Linguistics
City or Country
Dublin, Ireland
Citation
YU, Sicheng; SUN, Qianru; ZHANG, Hao; and JIANG, Jing.
Translate-train embracing translationese artifacts. (2022). Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022 May 22-27. 362-370.
Available at: https://ink.library.smu.edu.sg/sis_research/7475
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.18653/v1/2022.acl-short.40