Publication Type
Conference Proceeding Article
Version
acceptedVersion
Publication Date
3-2022
Abstract
Learning program semantics is fundamental to various code intelligence tasks, e.g., vulnerability detection and clone detection. A considerable number of existing works propose diverse approaches to learning program semantics for different tasks, and these works have achieved state-of-the-art performance. However, a comprehensive and systematic study evaluating different program representation techniques across diverse tasks is still missing. From this starting point, we conduct an empirical study to evaluate different program representation techniques. Specifically, we categorize current mainstream code representation techniques into four categories, i.e., Feature-based, Sequence-based, Tree-based, and Graph-based program representation techniques, and evaluate their performance on three diverse and popular code intelligence tasks, i.e., Code Classification, Vulnerability Detection, and Clone Detection, on the publicly released benchmark. We further design three research questions (RQs) and conduct a comprehensive analysis to investigate the performance. From the extensive experimental results, we conclude that (1) the graph-based representation is superior to the other selected techniques across these tasks; (2) compared with the node type information used in tree-based and graph-based representations, the node textual information is more critical to learning program semantics; and (3) different tasks require task-specific semantics to achieve their highest performance; however, combining various program semantics from different dimensions, such as control dependency and data dependency, can still produce promising results.
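To make the distinction between the representation categories concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a toy Python snippet as input and uses only the standard-library `tokenize` and `ast` modules; the function names `token_sequence`, `node_types`, and `node_texts` are hypothetical and chosen for illustration. It shows a sequence-based view (flat tokens) next to a tree-based view, split into node type information and node textual information, the two kinds of information compared in finding (2). Graph-based representations would add edges such as control and data dependencies on top of the tree; that step is not shown here.

```python
import ast
import io
import tokenize

# Toy input program (an assumption for illustration only).
SOURCE = "def add(a, b):\n    return a + b\n"


def token_sequence(source):
    """Sequence-based view: the flat list of lexical tokens."""
    # Layout-only tokens carry no program text, so they are dropped.
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
            tokenize.DEDENT, tokenize.ENDMARKER, tokenize.COMMENT}
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string for tok in tokens if tok.type not in skip]


def node_types(source):
    """Tree-based view, node type information only (AST class names)."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def node_texts(source):
    """Tree-based view, node textual information (identifiers on nodes)."""
    texts = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            texts.append(node.name)   # function name
        elif isinstance(node, ast.arg):
            texts.append(node.arg)    # parameter name
        elif isinstance(node, ast.Name):
            texts.append(node.id)     # variable reference
    return texts


if __name__ == "__main__":
    print(token_sequence(SOURCE))
    print(node_types(SOURCE))
    print(node_texts(SOURCE))
```

In this sketch, the node types describe only the syntactic shape of the program (for example `FunctionDef`, `Return`, `BinOp`), while the node texts carry the identifiers a developer chose (`add`, `a`, `b`).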
Discipline
Programming Languages and Compilers | Software Engineering
Research Areas
Intelligent Systems and Optimization
Publication
Proceedings of the 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering, Honolulu, Hawaii, March 15-18
First Page
1
Last Page
12
ISBN
9781665437875
Identifier
10.1109/SANER53432.2022.00073
Publisher
IEEE
City or Country
Honolulu, Hawaii
Citation
SIOW, Jing Kai; LIU, Shangqing; XIE, Xiaofei; MENG, Guozhu; and LIU, Yang.
Learning program semantics with code representations: An empirical study. (2022). Proceedings of the 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering, Honolulu, Hawaii, March 15-18. 1-12.
Available at: https://ink.library.smu.edu.sg/sis_research/7501
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://doi.org/10.1109/SANER53432.2022.00073