Publication Type

Conference Proceeding Article

Version

acceptedVersion

Publication Date

3-2022

Abstract

Program semantics learning is fundamental to various code intelligence tasks, e.g., vulnerability detection and clone detection. A considerable number of existing works propose diverse approaches to learning program semantics for different tasks, and these works have achieved state-of-the-art performance. However, a comprehensive and systematic study evaluating different program representation techniques across diverse tasks is still missing. From this starting point, we conduct an empirical study to evaluate different program representation techniques. Specifically, we categorize current mainstream code representation techniques into four categories, i.e., Feature-based, Sequence-based, Tree-based, and Graph-based program representation techniques, and evaluate their performance on three diverse and popular code intelligence tasks, i.e., Code Classification, Vulnerability Detection, and Clone Detection, on publicly released benchmarks. We further design three research questions (RQs) and conduct a comprehensive analysis to investigate the performance. From the extensive experimental results, we conclude that (1) the graph-based representation is superior to the other selected techniques across these tasks; (2) compared with the node type information used in tree-based and graph-based representations, the node textual information is more critical to learning program semantics; and (3) different tasks require task-specific semantics to achieve their highest performance; however, combining various program semantics from different dimensions, such as control dependency and data dependency, can still produce promising results.

Discipline

Programming Languages and Compilers | Software Engineering

Research Areas

Intelligent Systems and Optimization

Publication

Proceedings of the 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering, Honolulu, Hawaii, March 15-18

ISBN

9781665437875

Identifier

10.1109/SANER53432.2022.00073

Publisher

IEEE

City or Country

Honolulu, Hawaii

Citation

SIOW, Jing Kai; LIU, Shangqing; XIE, Xiaofei; MENG, Guozhu; and LIU, Yang. Learning program semantics with code representations: An empirical study. (2022). Proceedings of the 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering, Honolulu, Hawaii, March 15-18. 1-12.
Available at: https://ink.library.smu.edu.sg/sis_research/7501

Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.

Additional URL

http://doi.org/10.1109/SANER53432.2022.00073

Included in

Programming Languages and Compilers Commons, Software Engineering Commons

Research Collection School Of Computing and Information Systems

Learning program semantics with code representations: An empirical study
