Publication Type

Journal Article

Version

publishedVersion

Publication Date

1-2021

Abstract

Commit messages recorded in version control systems contain valuable information for software development, maintenance, and comprehension. Unfortunately, developers often commit code with empty or poor-quality commit messages. To address this issue, several studies have proposed approaches to generate commit messages from commit diffs. Recent studies use neural machine translation algorithms to translate git diffs into commit messages and have achieved some promising results. However, these learning-based methods tend to generate high-frequency words but ignore low-frequency ones. In addition, they suffer from exposure bias, which leads to a gap between the training phase and the testing phase. In this paper, we propose CoRec to address these two limitations. Specifically, we first train a context-aware encoder-decoder model which randomly selects either the previous output of the decoder or the embedding vector of a ground-truth word as context, making the model gradually aware of previous alignment choices. Given a diff for testing, the trained model is reused to retrieve the most similar diff from the training set. Finally, we use the retrieved diff to guide the probability distribution for the final generated vocabulary. Our method combines the advantages of both information retrieval and neural machine translation. We evaluate CoRec on a dataset from Liu et al. and a large-scale dataset crawled from 10k popular Java repositories on GitHub. Our experimental results show that CoRec significantly outperforms the state-of-the-art method NNGen by 19% on average in terms of BLEU.
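The three steps named in the abstract (random context selection during training, retrieval of the most similar training diff, and retrieval-guided word probabilities) can be illustrated with a minimal sketch. This is not the CoRec implementation: the abstract does not specify the similarity measure or how the distributions are combined, so the cosine similarity, the weighted-sum mixing rule, and every function and variable name below are assumptions chosen for illustration only.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_most_similar(query_vec, train_vecs):
    """Return (index, similarity) of the training diff vector closest to the query."""
    scores = [cosine(query_vec, v) for v in train_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def combine(p_query, p_retrieved, weight):
    """Mix two next-word distributions and renormalise.

    `weight` scales the distribution conditioned on the retrieved diff.
    The weighted sum is a hypothetical mixing rule, not taken from the paper.
    """
    words = set(p_query) | set(p_retrieved)
    mixed = {w: p_query.get(w, 0.0) + weight * p_retrieved.get(w, 0.0) for w in words}
    total = sum(mixed.values())
    return {w: p / total for w, p in mixed.items()}


def pick_context(prev_output, ground_truth, p_truth, rng=random):
    """Training-time context choice: ground truth with probability p_truth,
    otherwise the decoder's own previous output."""
    return ground_truth if rng.random() < p_truth else prev_output


# Toy encoder outputs for three training diffs and one test diff (made-up numbers).
train_vecs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
idx, sim = retrieve_most_similar([0.5, 0.9], train_vecs)

# Toy next-word distributions from the test diff and from the retrieved diff.
p_query = {"fix": 0.5, "update": 0.5}
p_retrieved = {"fix": 0.2, "typo": 0.8}
p_final = combine(p_query, p_retrieved, weight=sim)
```

In this toy setup a word such as "typo", which the test diff alone never proposes, receives probability mass through the retrieved diff. That is the intuition behind using retrieval to recover low-frequency words, under the assumptions stated above.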

Discipline

Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

ACM Transactions on Software Engineering and Methodology

Volume

30

Issue

4

First Page

1

Last Page

29

ISSN

1049-331X

Identifier

10.1145/3464689

Publisher

Association for Computing Machinery (ACM)

Citation

WANG, Haoye; XIA, Xin; LO, David; HE, Qiang; WANG, Xinyu; and GRUNDY, John. Context-aware retrieval-based deep commit message Generation. (2021). ACM Transactions on Software Engineering and Methodology. 30, (4), 1-29.
Available at: https://ink.library.smu.edu.sg/sis_research/6776

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Included in

Databases and Information Systems Commons

Research Collection School Of Computing and Information Systems

Context-aware retrieval-based deep commit message Generation
