Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
5-2022
Abstract
Translate-train is a general training approach to multilingual tasks. The key idea is to use a translator for the target language to generate training data and thereby mitigate the gap between the source and target languages. However, its performance is often hampered by artifacts in the translated texts (translationese). We discover that such artifacts have common patterns across different languages and can be modeled by deep learning, and we subsequently propose an approach to conduct translate-train using Translationese Embracing the effect of Artifacts (TEA). TEA learns to mitigate such effects on the training data of a source language (whose original and translationese versions are both available), and applies the learned module to facilitate inference on the target language. Extensive experiments on the multilingual QA dataset TyDiQA demonstrate that TEA outperforms strong baselines.
Discipline
Databases and Information Systems | Programming Languages and Compilers
Research Areas
Data Science and Engineering
Publication
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022 May 22-27
First Page
362
Last Page
370
Identifier
10.18653/v1/2022.acl-short.40
Publisher
Association for Computational Linguistics
City or Country
Dublin, Ireland
Citation
YU, Sicheng; SUN, Qianru; ZHANG, Hao; and JIANG, Jing.
Translate-train embracing translationese artifacts. (2022). Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, Dublin, Ireland, 2022 May 22-27. 362-370.
Available at: https://ink.library.smu.edu.sg/sis_research/7475
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.18653/v1/2022.acl-short.40