Publication Type

Journal Article

Version

acceptedVersion

Publication Date

2-2022

Abstract

Code clones are duplicate code fragments that share (nearly) similar syntax or semantics. Code clone detection plays an important role in software maintenance, code refactoring, and reuse. A substantial amount of research has been conducted in the past to detect clones. A majority of these approaches use lexical and syntactic information to detect clones. However, only a few of them target semantic clones. Recently, motivated by the success of deep learning models in other fields, including natural language processing and computer vision, researchers have attempted to adopt deep learning techniques to detect code clones. These approaches use lexical information (tokens) and/or syntactic structures like abstract syntax trees (ASTs) to detect code clones. However, they do not make sufficient use of the available structural and semantic information, which limits their capabilities. This paper addresses the problem of semantic code clone detection using program dependency graphs and geometric neural networks, leveraging the structured syntactic and semantic information. We have developed a prototype tool, HOLMES, based on our novel approach and empirically evaluated it on popular code clone benchmarks. Our results show that HOLMES performs considerably better than the other state-of-the-art tool, TBCCD. We also evaluated HOLMES on unseen projects and performed cross-dataset experiments to assess the generalizability of HOLMES. Our results affirm that HOLMES outperforms TBCCD, since most of the pairs that HOLMES detected were either undetected or suboptimally reported by TBCCD.
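The distinction the abstract draws between lexical similarity and semantic similarity can be illustrated with a minimal sketch. This is not the HOLMES method and uses no program dependency graphs or neural networks; the two fragments, the function names, and the token-set Jaccard measure are all hypothetical stand-ins chosen for illustration. It only shows how two fragments can behave identically on sample inputs while sharing a limited set of tokens.

```python
import re

# Two hypothetical fragments that compute the same result (sum of 1..n)
# with different syntax: a loop versus a recursive call.
SRC_A = """
def total_loop(n):
    acc = 0
    for i in range(1, n + 1):
        acc += i
    return acc
"""

SRC_B = """
def total_rec(k):
    if k <= 0:
        return 0
    return k + total_rec(k - 1)
"""


def tokens(src):
    """Split source text into a set of identifier, number, and punctuation tokens."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", src))


def jaccard(a, b):
    """Jaccard similarity of two token sets (an assumed stand-in for a lexical detector)."""
    return len(a & b) / len(a | b)


def same_behavior(f, g, inputs):
    """Compare outputs on a finite sample of inputs; this is evidence, not a proof."""
    return all(f(x) == g(x) for x in inputs)


# Load both fragments into one namespace so they can be called.
namespace = {}
exec(SRC_A, namespace)
exec(SRC_B, namespace)

lexical = jaccard(tokens(SRC_A), tokens(SRC_B))
behavioral = same_behavior(namespace["total_loop"], namespace["total_rec"], range(0, 20))
print(f"lexical similarity: {lexical:.2f}, same behavior on samples: {behavioral}")
```

A detector that relies only on token overlap sees these two fragments as partly dissimilar, even though they agree on every sampled input; this is the kind of pair that semantic clone detection targets.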

Keywords

Program representation learning, semantic code clones, graph-based neural networks, Siamese neural networks, program dependency graphs

Discipline

Graphics and Human Computer Interfaces | OS and Networks | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

IEEE Transactions on Software Engineering

Volume

48

Issue

10

First Page

3771

Last Page

3789

ISSN

0098-5589

Identifier

10.1109/TSE.2021.3105556

Publisher

Institute of Electrical and Electronics Engineers

Citation

MEHROTRA, Nikita; AGARWAL, Navdha; GUPTA, Piyush; ANAND, Saket; LO, David; and PURANDARE, Rahul. Modeling functional similarity in source code with graph-based Siamese networks. (2022). IEEE Transactions on Software Engineering. 48, (10), 3771-3789.
Available at: https://ink.library.smu.edu.sg/sis_research/7658

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1109/TSE.2021.3105556

Included in

Graphics and Human Computer Interfaces Commons, OS and Networks Commons, Software Engineering Commons

Research Collection School Of Computing and Information Systems

Modeling functional similarity in source code with graph-based Siamese networks
