Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

10-2022

Abstract

Semantic code clone detection is the detection of functionally similar code fragments that may otherwise be lexically, syntactically, or structurally dissimilar. It has important applications in aspect mining and product line analysis. Detecting semantic code clones accurately is challenging, and various techniques have been proposed. However, these techniques have been evaluated on different datasets, so there is no clear picture of their performance relative to each other. SemanticCloneBench was recently introduced as a benchmark for semantic clones, which makes it possible to evaluate and compare semantic code clone detection techniques on a common dataset. In this paper, we compare the semantic code clone detection performance of three code clone detection techniques, namely FACER-CD, CodeBERT, and NIL, for Java code clones using SemanticCloneBench. FACER-CD performs API usage similarity-based clustering to detect clones, CodeBERT is a deep-learning-based approach that uses a pre-trained programming language model, and NIL is a token-based detector of large-gapped code clones. FACER-CD, NIL, and CodeBERT show a recall of 64.3%, 12.7%, and 83.2%, respectively, on SemanticCloneBench. Using all three techniques together on the SemanticCloneBench dataset gives an overall recall of 95.5%, which is currently the best performance achieved on SemanticCloneBench.
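
The combined figure in the abstract is not the sum of the individual recalls, since the three tools partly detect the same clone pairs. The sketch below illustrates one plausible way such a combined recall could be computed: take the union of the pairs reported by each tool and measure it against the benchmark's reference pairs. It is not the authors' evaluation code. The function names, the tool labels, and the toy data are assumptions made for illustration, and each clone pair is assumed to be already normalised to a fixed order.

```python
def recall(detected, reference):
    """Fraction of reference clone pairs that also appear in `detected`."""
    if not reference:
        raise ValueError("reference set must not be empty")
    return len(detected & reference) / len(reference)


def combined_recall(detectors, reference):
    """Recall of the union of all detectors' reported pairs."""
    found = set().union(*detectors.values())
    return recall(found, reference)


# Toy data (hypothetical): four reference clone pairs, three tools.
reference = {("a1", "a2"), ("b1", "b2"), ("c1", "c2"), ("d1", "d2")}
detectors = {
    "tool_a": {("a1", "a2"), ("b1", "b2")},
    "tool_b": {("b1", "b2"), ("c1", "c2")},
    "tool_c": set(),
}

for name, pairs in detectors.items():
    print(name, recall(pairs, reference))
print("combined", combined_recall(detectors, reference))
```

In this toy data, two tools each reach a recall of 0.5, but they share one pair, so the combined recall is 0.75 rather than 1.0.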

Keywords

Semantic Clone Detection, SemanticCloneBench, Deep Learning, Semantic Similarity, CodeBERT, Large-Variance Clones

Discipline

Software Engineering

Publication

Proceedings of the 2022 IEEE 16th International Workshop on Software Clones (IWSC), Limassol, Cyprus, October 2

First Page

16

Last Page

22

ISBN

9781665484473

Identifier

10.1109/IWSC55060.2022.00011

Publisher

IEEE

City or Country

Los Alamitos, CA

Additional URL

https://doi.org/10.1109/IWSC55060.2022.00011
