Research Collection School Of Computing and Information Systems

Duplicate pull request detection: When time matters

Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

10-2019

Abstract

In open source communities (e.g., GitHub), developers frequently submit pull requests to fix bugs or add new features during the development process. Because pull requests are submitted in an uncoordinated and distributed manner, many of them duplicate one another. Usually, only the first pull request accepted by reviewers can be merged into the main branch of the repository, and the others are regarded as duplicates by maintainers. Since duplicates substantially increase the workload of project reviewers and maintainers, they delay the evolution of open source repositories. To identify duplicate pull requests automatically, Ren et al. proposed a state-of-the-art approach that models a pull request with nine features and determines whether a given request duplicates any existing request. Nevertheless, we notice that their approach overlooks the time factor, which is a significant feature for this task. In this study, we investigate the influence of the time factor and improve the pull request representation. We assume that two pull requests are more likely to be duplicates when their creation times are close to each other. We verify this assumption on 26 open source repositories from GitHub with over 100,000 pairs of pull requests. We integrate the time feature into the nine features proposed by Ren et al., and the experimental results show that it substantially improves the performance of Ren et al.'s approach by 14.36% and 11.93% in terms of F1-score@1 and F1-score@5, respectively.
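The central idea of the abstract, that pull requests created close together in time are more likely to be duplicates, can be illustrated with a small sketch. This record does not specify how the authors encode the time feature, so the function names, the choice of hours as the unit, and the simple "append to the existing feature vector" scheme below are illustrative assumptions, not the paper's method. The nine base features of Ren et al. are represented only as a placeholder list of numbers.

```python
from datetime import datetime


def time_gap_hours(created_a: datetime, created_b: datetime) -> float:
    """Absolute gap between two pull requests' creation times, in hours (assumed unit)."""
    return abs((created_a - created_b).total_seconds()) / 3600.0


def add_time_feature(base_features: list[float],
                     created_a: datetime,
                     created_b: datetime) -> list[float]:
    """Hypothetical helper: append the time gap to an existing pairwise feature
    vector (e.g., a placeholder for the nine features of Ren et al.)."""
    return base_features + [time_gap_hours(created_a, created_b)]


def rank_by_time(new_created: datetime,
                 candidates: list[tuple[str, datetime]],
                 k: int) -> list[str]:
    """Hypothetical helper: return the ids of the k candidate pull requests
    whose creation times are closest to the new pull request's."""
    ranked = sorted(candidates, key=lambda c: time_gap_hours(new_created, c[1]))
    return [pr_id for pr_id, _ in ranked[:k]]


if __name__ == "__main__":
    new_pr = datetime(2019, 10, 28, 9, 0)
    others = [
        ("pr1", datetime(2019, 10, 20)),
        ("pr2", datetime(2019, 10, 28, 12, 0)),
        ("pr3", datetime(2019, 9, 1)),
    ]
    print(rank_by_time(new_pr, others, k=2))
```

Ranking on time alone is only meant to show the intuition; in the study the time feature is combined with the other nine features rather than used by itself.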

Keywords

GitHub, Duplicate Pull Request, Time Factor

Discipline

Software Engineering

Publication

Internetware '19: Proceedings of the 11th Asia-Pacific Symposium on Internetware, Fukuoka, Japan, October 28-29

First Page

1

Last Page

10

ISBN

9781450377010

Identifier

10.1145/3361242.3361254

Publisher

ACM

City or Country

New York

Citation

WANG, Qingye; XU, Bowen; XIA, Xin; WANG, Ting; and LI, Shanping. Duplicate pull request detection: When time matters. (2019). Internetware '19: Proceedings of the 11th Asia-Pacific Symposium on Internetware, Fukuoka, Japan, October 28-29. 1-10.
Available at: https://ink.library.smu.edu.sg/sis_research/10217

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

http://doi.org/10.1145/3361242.3361254

Included in

Software Engineering Commons
