Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
10-2022
Abstract
Semantic code clone detection is the detection of functionally similar code fragments that may otherwise be lexically, syntactically, or structurally dissimilar. It has important applications in aspect mining and product line analysis. Accurately detecting semantic code clones is challenging, and various techniques have been proposed. However, these techniques have been evaluated on different datasets, so there is no clear picture of how they perform relative to each other. SemanticCloneBench was recently introduced as a benchmark for semantic clones, which makes it possible to evaluate and compare semantic code clone detection techniques on a common dataset. In this paper, we compare the semantic code clone detection performance of three techniques, namely FACER-CD, CodeBERT, and NIL, for Java code clones using SemanticCloneBench. FACER-CD performs API usage similarity-based clustering to detect clones, CodeBERT is a deep-learning-based approach that uses a pre-trained programming language model, and NIL is a token-based detector of large-gapped code clones. FACER-CD, NIL, and CodeBERT show a recall of 64.3%, 12.7%, and 83.2%, respectively, on SemanticCloneBench. Using all three techniques together on the SemanticCloneBench dataset gives an overall recall of 95.5%, which is currently the best performance achieved on SemanticCloneBench.
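The relationship between per-technique recall and combined recall can be illustrated with a minimal sketch. It assumes that "using all three techniques together" means taking the union of the clone pairs each technique reports, which the abstract does not state explicitly. The function names, tool labels, and pair identifiers below are hypothetical toy data, not results or code from the paper.

```python
def recall(detected, benchmark):
    """Fraction of benchmark clone pairs that appear in the detected set."""
    if not benchmark:
        raise ValueError("benchmark must not be empty")
    return len(detected & benchmark) / len(benchmark)


def combined_recall(detections, benchmark):
    """Recall of the union of all tools' detections (assumed combination rule)."""
    union = set().union(*detections.values())
    return recall(union, benchmark)


# Toy benchmark of ten clone pairs (hypothetical, NOT from SemanticCloneBench).
benchmark = {f"pair{i}" for i in range(1, 11)}

# Hypothetical detections by three placeholder tools.
detections = {
    "tool_a": {"pair1", "pair2", "pair3", "pair4", "pair5", "pair6"},
    "tool_b": {"pair6", "pair7"},
    "tool_c": {"pair1", "pair2", "pair3", "pair4",
               "pair5", "pair7", "pair8", "pair9"},
}

per_tool = {name: recall(found, benchmark) for name, found in detections.items()}
overall = combined_recall(detections, benchmark)

for name, value in per_tool.items():
    print(f"{name}: {value:.1%}")
print(f"combined: {overall:.1%}")
```

Under this assumption the combined recall is at least as high as the best single technique and at most the sum of the individual recalls, because pairs found by more than one technique are counted only once.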
Keywords
Semantic Clone Detection, SemanticCloneBench, Deep Learning, Semantic Similarity, CodeBERT, Large-Variance Clones
Discipline
Software Engineering
Publication
Proceedings of the 2022 IEEE 16th International Workshop on Software Clones (IWSC), Limassol, Cyprus, October 2
First Page
16
Last Page
22
ISBN
9781665484473
Identifier
10.1109/IWSC55060.2022.00011
Publisher
IEEE
City or Country
Los Alamitos, CA
Citation
RABBANI, Sohaib Masood; GULZAR, Nabeel Ahmad; ARSHAD, Saad; ABID, Shamsa; and SHAMAIL, Shafay.
A comparative analysis of clone detection techniques on SemanticCloneBench. (2022). Proceedings of the 2022 IEEE 16th International Workshop on Software Clones (IWSC), Limassol, Cyprus, October 2. 16-22.
Available at: https://ink.library.smu.edu.sg/sis_research/10205
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/IWSC55060.2022.00011