Publication Type

Journal Article

Version

publishedVersion

Publication Date

9-2022

Abstract

Code representation is a fundamental problem in many software engineering tasks. Despite considerable research effort, existing methods still struggle to fully extract the syntactic, structural, and sequential features of source code, which together form the hierarchical semantics of a program and are necessary for deeper code understanding. To alleviate this difficulty, we propose a new supervised approach based on a novel use of Tree-LSTM that explicitly incorporates the sequential and global semantic features of programs into the representation model. Unlike previous techniques, our model learns not only low-level syntactic information within each statement but also high-level semantic information between statements over the constructed semantic graph. In addition, because sequential semantics are also critical for developers to understand dependency paths and data flow, we propose a DFS-based method that generates a topological order of the statements to be processed; the statements are then fed, together with their in-neighbor information and syntactic embeddings, into the proposed model to learn richer statement-level semantic features. Extensive experiments on multiple program comprehension tasks, e.g., code clone detection, demonstrate that our method achieves promising performance compared with existing baselines.
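
To make the DFS-based ordering mentioned in the abstract more concrete, here is a minimal sketch of a generic depth-first topological sort over a statement graph. It is not the authors' implementation: the function name `dfs_topological_order`, the integer statement IDs, the `(src, dst)` edge format, and the choice to skip back edges (for example, those introduced by loops) are all assumptions made for illustration.

```python
from collections import defaultdict


def dfs_topological_order(num_statements, edges):
    """Order statements so that each one follows the statements it depends on.

    edges: iterable of (src, dst) pairs, read as "dst depends on src".
    Back edges (cycles, e.g. from loops) are skipped so that an order
    always exists; this is an assumption of the sketch, not the paper.
    Returns (order, in_neighbors).
    """
    successors = defaultdict(list)
    in_neighbors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
        in_neighbors[dst].append(src)

    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = [WHITE] * num_statements
    postorder = []

    def visit(node):
        color[node] = GRAY
        for nxt in successors[node]:
            if color[nxt] == WHITE:
                visit(nxt)
            # A GRAY successor is a back edge; it is ignored here.
        color[node] = BLACK
        postorder.append(node)

    for node in range(num_statements):
        if color[node] == WHITE:
            visit(node)

    # Reversed DFS postorder is a topological order of the acyclic part.
    order = postorder[::-1]
    return order, {n: list(in_neighbors[n]) for n in range(num_statements)}


# Hypothetical four-statement graph: 0 feeds 1 and 2, which both feed 3.
order, preds = dfs_topological_order(4, [(0, 1), (0, 2), (1, 3), (2, 3)])
print(order)      # [0, 2, 1, 3]
print(preds[3])   # [1, 2]
```

In a model of the kind the abstract describes, a statement would be processed only after the statements listed in its `in_neighbors` entry, so their hidden states would already be available. How the paper actually combines those states with the syntactic embeddings is not specified on this page.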

Keywords

Code representation, Graph-LSTM, Hierarchical semantics, Program classification, Clone detection, Vulnerability detection, Deep learning

Discipline

Databases and Information Systems | Graphics and Human Computer Interfaces | Software Engineering

Research Areas

Software and Cyber-Physical Systems

Publication

Journal of Systems and Software

Volume

191

First Page

1

Last Page

21

ISSN

0164-1212

Identifier

10.1016/j.jss.2022.111355

Publisher

Elsevier

Citation

JIANG, Yuan; SU, Xiaohong; TREUDE, Christoph; and WANG, Tiantian. Hierarchical semantic-aware neural code representation. (2022). Journal of Systems and Software. 191, 1-21.
Available at: https://ink.library.smu.edu.sg/sis_research/8766

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

https://doi.org/10.1016/j.jss.2022.111355

Included in

Databases and Information Systems Commons, Graphics and Human Computer Interfaces Commons, Software Engineering Commons

Research Collection School Of Computing and Information Systems

Hierarchical semantic-aware neural code representation
