Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
12-2023
Abstract
Reinventing the wheel is a detrimental programming practice in software development that frequently results in the introduction of duplicated components. This practice not only leads to increased maintenance and labor costs but also poses a higher risk of propagating bugs throughout the system. Despite numerous issues introduced by duplicated components in software, the identification of component-level clones remains a significant challenge that existing studies struggle to effectively tackle. Specifically, existing methods face two primary limitations that are challenging to overcome: 1) Measuring the similarity between different components presents a challenge due to the significant size differences among them; 2) Identifying functional clones is a complex task as determining the primary functionality of components proves to be difficult. To overcome the aforementioned challenges, we present a novel approach named C3 (Component-level Code Clone detector) to effectively identify both textual and functional cloned components. In addition, to enhance the efficiency of eliminating cloned components, we develop an assessment method based on six component-level clone features, which assists developers in prioritizing the cloned components based on the refactoring necessity. To validate the effectiveness of C3, we employ a large-scale industrial product developed by Huawei, a prominent global ICT company, as our dataset and apply C3 to this dataset to identify the cloned components. Our experimental results demonstrate that C3 is capable of accurately detecting cloned components, achieving impressive performance in terms of precision (0.93), recall (0.91), and F1-score (0.9). Besides, we conduct a comprehensive user study to further validate the effectiveness and practicality of our assessment method and the proposed clone features in assessing the refactoring necessity of different cloned components. 
Our study establishes solid alignment between assessment outcomes and participant responses, indicating the accurate prioritization of clone components with a high refactoring necessity through our method. This finding further confirms the usefulness of the six "golden features" in our assessment.
Keywords
Clone detection, Code clone, Community detection algorithms, Component levels, Component-level clone detection, Component-level clone metric, Maintenance cost, Programming practices, Refactorings
Discipline
Databases and Information Systems | Software Engineering | Theory and Algorithms
Research Areas
Software and Cyber-Physical Systems
Publication
ESEC/FSE '23: Proceedings of ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, December 3-9
First Page
1832
Last Page
1843
ISBN
9798400703270
Identifier
10.1145/3611643.3613883
Publisher
ACM
City or Country
New York
Citation
YANG, Yanming; ZOU, Ying; HU, Xing; LO, David; NI, Chao; GRUNDY, John C.; and XIA, Xin.
C³: Code clone-based identification of duplicated components. (2023). ESEC/FSE '23: Proceedings of ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, December 3-9. 1832-1843.
Available at: https://ink.library.smu.edu.sg/sis_research/8575
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/3611643.3613883
Included in
Databases and Information Systems Commons, Software Engineering Commons, Theory and Algorithms Commons