Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
5-2022
Abstract
A recent study by Ahmed and Devanbu reported that fine-tuning multilingual Pre-trained Language Models (PLMs) on a corpus of code written in multiple programming languages achieves higher performance than fine-tuning on a corpus of code written in just one programming language. However, no analysis was made with respect to fine-tuning monolingual PLMs. Furthermore, some programming languages are inherently different, and code written in one language usually cannot be interchanged with code in another; for example, Ruby and Java code have very different structures. To better understand how monolingual and multilingual PLMs affect different programming languages, we investigate 1) the performance of PLMs on Ruby for two popular Software Engineering tasks, Code Summarization and Code Search; 2) the strategy (for selecting programming languages) that works well when fine-tuning multilingual PLMs for Ruby; and 3) the performance of the fine-tuned PLMs on Ruby given different code lengths.
Keywords
Pre-trained language models, Low-resource languages
Discipline
Databases and Information Systems
Research Areas
Data Science and Engineering
Publication
Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, ICPC 2022, Virtual Event, May 16-17
First Page
401
Last Page
412
Identifier
10.1145/3524610.3527917
Publisher
Association for Computing Machinery
City or Country
New York
Citation
CHEN, Fuxiang; FARD, Fatemeh H.; LO, David; and BRYKSIN, Timofey.
On the transferability of pre-trained language models for low-resource programming languages. (2022). Proceedings of the 30th IEEE/ACM International Conference on Program Comprehension, ICPC 2022, Virtual Event, May 16-17. 401-412.
Available at: https://ink.library.smu.edu.sg/sis_research/7693
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3524610.3527917